Neuro-Informatics and Neural Modelling
HANDBOOK
OF
BIOLOGICAL
PHYSICS
Series Editor: A.J. Hoff
Volume 1: Structure and Dynamics of Membranes
A: From Cells to Vesicles
B: Generic and Specific Interactions
Volume 2:
Transport Processes in Eukaryotic and Prokaryotic Organisms
Volume 3:
Molecular Mechanisms in Visual Transduction
Volume 4:
Neuro-Informatics and Neural Modelling
Neuro-Informatics and Neural Modelling

Editors:

F. Moss
Center for Neurodynamics, University of Missouri at St. Louis, St. Louis, MO 63121, USA

S. Gielen
Department of Medical Physics and Biophysics, University of Nijmegen, 6525 EZ Nijmegen, The Netherlands
2001
ELSEVIER
Amsterdam - London - New York - Oxford - Paris - Shannon - Tokyo

ELSEVIER SCIENCE B.V., Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands
© 2001 Elsevier Science B.V. All rights reserved. This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use: Photocopying Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also contact Global Rights directly through Elsevier's home page (http://www.elsevier.nl), by selecting "Obtaining Permissions'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (978) 7508400, fax: (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London WIP 0LP, UK: phone: (+44) 207 631 5555: fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments. Derivative Works Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations. Electronic Storage or Usage Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Global Rights Department, at the mail, fax and e-mail addresses noted above. Notice No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2001 Library of Congress Cataloging in Publication Data A catalog record from the Library of Congress has been applied for.
ISSN: 1383-8121
ISBN: 0 444 50284 X
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
General Preface
Biological Physics encompasses the study of the processes of life with physical concepts and methods based on the laws of nature, which are assumed to be equally valid for living and dead matter. A multidisciplinary approach brings together elements from biology - knowledge of the problem that is attacked - and from the physical sciences - the techniques and the methodology for solving the problem. In principle, Biological Physics covers the physics of all of biology, including medicine, and therefore its range is extremely broad. There clearly is a need to bring order to the growing complexity of research in Biological Physics. The Handbook of Biological Physics answers this need with a series of interconnected monographs, each devoted to a certain subfield that is covered in depth and with great attention to the clarity of presentation. The Handbook is structured such that interrelations between fields and subfields are made transparent. Evaluations are presented of the extent to which the application of physical concepts and methodologies (often with considerable effort in terms of personal and material input) have advanced our understanding of the biological process under examination, and areas in which a concentrated effort might solve a long-standing problem are identified. Individual volumes of the Handbook are devoted to an entire "system" unless the field is very active or extended (as e.g. for membranes or vision research), in which case the system is broken down into two or more subsystems. The guiding principle in planning the individual volumes is that of going from simple, well-defined concepts and model systems on a molecular and (supra)cellular level, to the highly complex structures and functional mechanisms of living beings. Each volume contains an introduction defining the (sub)field and the contribution of each of the following chapters. Chapters generally end with an overview of areas that need further attention, and provide an outlook into future developments. The first volume of the Handbook, Structure and Dynamics of Membranes, deals with the morphology of biomembranes and with different aspects of lipid and lipid-protein model membranes (Part A), and with membrane adhesion, membrane fusion and the interaction of biomembranes with polymer networks such as the cytoskeleton (Part B). The second volume, Transport Processes in Eukaryotic and Prokaryotic Organisms, continues the discussion of biomembranes as barriers be-
tween the inside of the cell and the outside world, or between distinct compartments of the cellular inner space, across which a multitude of transport processes occur. The third volume, Molecular Mechanisms of Visual Transduction, extends the scope of the previous volumes to perhaps the most intensely studied signal transduction process, visual transduction. The molecular mechanisms of phototransduction in vertebrates and invertebrates are treated in depth. The structure and comparative molecular biology of numerous visual pigments are discussed. The primary photoreactions in rhodopsin and in the microvillar and ciliary photoreceptors of invertebrates are examined and compared. Finally, the visual processes in insect photoreceptors are modelled in detail. The present volume, Neuro-Informatics and Neural Modelling, is perhaps the most ambitious of the Handbook volumes so far. It addresses the next step on the road from sensory and transduction phenomena to perception and consciousness, viz. neural information transmission, and the storage and decoding of this information by neural networks. The volume is divided into two sections, one on the dynamics associated with non-linear processes in a complex neural system, and one on structure and properties of single neurons and neural networks. Information coding and decoding in a stochastic environment is treated in detail. The dynamics of learning, data storage and retrieval are analyzed, and learning rules designed. It is shown that non-trivial brain anatomy emerges in a natural way from the combination of stimulus properties and learning rules. In the other section, it is shown that noise and fluctuations are natural to all biological systems, and often enhance rather than corrupt neural signal transmission and processing. Chaos is introduced in neural systems and thoroughly analyzed. Synchronization on various levels, such as single ion channels in cell membranes or populations of spiking neurons, is discussed and the tantalizing idea put forward that synchronized coherent structures in the brain may play a role in information transmission. All chapters have been written following a tutorial approach, providing a sound conceptual and theoretical framework. Each chapter focuses on a particular research topic; together they provide senior and junior researchers a challenging overview of the field, which, we hope, will inspire exciting new research in the years to come.

Planned volumes
The "bottom-up" approach adopted for individual volumes of the Handbook is also the guideline for the entire series. Having started with two volumes treating the molecular and supramolecular structure of the cell, Volume 3 is the first of several volumes on cellular and supracellular systems. The present volume, No. 4, on neuro-informatics and neural modelling provides the bridge between sensory cellular input, perception and cognitive processes. The next two planned volumes are on Molecular Motors as chemo-mechanical transduction devices, and on Biological Electron Transport processes. Further planned volumes are:
• Vision - perception, pattern recognition, imaging
• The vestibular system
• Hearing
• The cardio-vascular system, fluid dynamics and chaos
• Electro-reception and magnetic field effects

Further volumes will be added as the need arises. We hope that the present volume of the Handbook will find an equally warm welcome in the Biological Physics community as the first three volumes, and that those who read these volumes will communicate their criticisms and suggestions for the future development of the project.
Leiden, Spring 2001
Arnold J. Hoff
Editor
Preface to Volume 4 Neuro-Informatics and Neural Modelling
How do sensory neurons transmit information about environmental stimuli to the central nervous system? How do networks of neurons in the CNS decode that information, thus leading to perception and consciousness? These questions are among the oldest in neuroscience. Quite recently, new approaches to exploration of these questions have arisen, often from interdisciplinary approaches combining traditional computational neuroscience with dynamical systems theory, including nonlinear dynamics and stochastic processes. In this volume, we present in two sections a selection of contributions about these topics from a collection of well-known authors. One section highlights some insights that have recently developed out of the nonlinear systems approach. The second section focuses on computational aspects from single neurons to networks with a major emphasis on the latter. All authors were instructed to prepare papers following a tutorial approach providing a conceptual and theoretical framework, which can serve as a standard for researchers in this field for at least the next ten years. Following the general tutorial introduction, each chapter presents a developing focus on particular research topics toward the end, confronting conceptual and theoretical predictions with recent experimental data. The result is a collection of outstanding tutorial papers with highlights from contemporary research. In the first section (edited by Frank Moss), we look at some unique contemporary phenomena arising from dynamical systems approaches, which have sparked recent attention and interest in neuroscience. Here we encounter Stochastic Resonance and Synchronization and learn how a random process, or "noise" as it is often called, can enhance rather than corrupt the detection and/or transmission and processing of information in neural systems. And it is shown that noise and fluctuations are ubiquitous in biological settings at all levels from the subcellular to the organismal level. Next, chaos in neural systems is introduced and questions concerning the meaning and information content of the unstable periodic orbits characteristic of chaotic systems are raised. Tutorials and research on how these orbits are detected, counted and classified topologically in biological systems are presented. Following this, we look at how these orbits are controlled in cardiac preparations - applications that one day might be developed into useful therapies
for the treatment of heart diseases. We then look more closely at synchronization, a general process that is widely found in biological systems, from the subcellular level to the level of populations of neurons in the brain. The role of noise and fluctuations on the quality of synchronization among populations of oscillators, represented by single ion channels in cell membranes or populations of spiking neurons giving rise to the data of magnetoencephalographic recordings from the brain, is discussed. Finally we come to self-organized criticality - arising from complex systems that have no single time or length scale - and ask if the noisy but coherent structures resulting from such systems can carry or transmit useful information. For example, can the synchronized propagation of noisy coherent structures in the brain transmit useful information? And could the quantitative characterization of the statistics of such structures be developed into a useful diagnostic of brain disease? In the second section (edited by Stan Gielen), we are led to look closely at how single neurons function. These, in turn, show us how well-known properties of brain architecture and functions, like perception, cognition and motor control, can be understood as emergent properties of single neuron behavior and learning in a population of neurons. The first part of this section starts with an extensive description of biological (spiking) neurons and a formal description of the dynamical behavior of neurons. This part provides a general framework for the information processing by single neurons in various parts of the nervous system. The aim is to provide a formal representation, which captures the complexity of the biological neuron, yet allows an analytical approach to understand the relation between structure and function of a neuron. The stochastic behavior of neurons necessitates a probabilistic approach to characterize the input-output relation. This brings us to the second part of this section, where we focus on statistical mechanics as a tool to characterize the behavior of populations of stochastic neurons. After a general broad introduction, we address extensively the storage capacity, convergence, and the stability of attractor states, which are the result of learning processes in neuronal networks with various types of connectivity (full connectivity, sparse connectivity, symmetric and asymmetric connections, connectivity decreasing with distance between neurons). Next, this analysis is extended by studying the dynamics of learning, data storage and retrieval, and the evolution of the states of a neural network. This requires a simultaneous analysis at two time scales: a fast time scale of milliseconds (evolution of the neuronal states) and a slow time scale (seconds to minutes) corresponding to the rate of synaptic plasticity. This is followed by a thorough analysis of learning and how learning rules should be designed in order to guarantee optimal convergence to the desired network performance. The next question to be addressed deals with the efficiency of information storage and transmission by an ensemble of neurons. How is information coded in recruitment, firing rate and synchrony of firing of neurons? How can we interpret the neuronal activity in a population of neurons in terms of sensory stimuli or motor behavior? An answer to this question is absolutely necessary if one wants to gain some understanding of brain function from measured neuronal activity (either single-unit or multi-unit).
The overview presented here provides a framework for all future work on this topic. The last part of this section is devoted to the self-organizing properties of neuronal networks with regard
to connectivity. It explains how the well-structured (for example topographical) organization that is frequently observed in brain structures, emerges in a natural way from the combined effect of stimulus properties and learning rules, leading to non-trivial brain anatomy. Furthermore, it explains how the different architectures in various parts of the brain form the basis for different sensory-motor functions. We hope that students and researchers alike will be challenged and stimulated by the selections collected in this volume. In tutorial fashion the authors do answer many questions. But many more remain unanswered or only partially addressed. We hope the articles in this volume will inspire the readers to consider some of the topics treated and the questions raised for their own research. We promise that such enterprises will be challenging and fruitful. Stan Gielen and Frank Moss July, 2000
Contents of Volume 4
General Preface ... v
Preface to Volume 4 ... ix
Contents of Volume 4 ... xiii
Contributors to Volume 4 ... xvii

SECTION 1: STATISTICAL AND NONLINEAR DYNAMICS IN NEUROSCIENCE

Stochastic Resonance, Noise and Information in Biophysical Systems
1. K.A. Richardson and J.J. Collins
   Electrical Stimulation of the Somatosensory System ... 1
2. L. Schimansky-Geier, V.S. Anishchenko and A. Neiman
   Phase Synchronization: From Periodic to Chaotic and Noisy ... 23
3. P. Århem and H. Liljenström
   Fluctuations in Neural Systems: From Subcellular to Network Levels ... 83

Chaos and the Detection of Unstable Periodic Orbits in Biological Systems
4. K. Dolan, M.L. Spano and F. Moss
   Detecting Unstable Periodic Orbits in Biological Systems ... 131
5. R. Gilmore and X. Pei
   The Topology and Organization of Unstable Periodic Orbits in Hodgkin-Huxley Models of Receptors with Subthreshold Oscillations ... 155

Chaos Control in Cardiac and Other Applications
6. D.J. Christini, K. Hall, J.J. Collins and L. Glass
   Controlling Cardiac Arrhythmias: The Relevance of Nonlinear Dynamics ... 205
7. D.J. Gauthier, S. Bahar and G.M. Hall
   Controlling the Dynamics of Cardiac Muscle Using Small Electrical Stimuli ... 229

Synchronization
8. J.A. White and J.S. Haas
   Intrinsic Noise from Voltage-Gated Ion Channels: Effects on Dynamics and Reliability in Intrinsically Oscillatory Neurons ... 257
9. M. Rosenblum, A. Pikovsky, C. Schäfer, P.A. Tass and J. Kurths
   Phase Synchronization: From Theory to Data Analysis ... 279

Self Organized Criticality in Biophysical Applications
10. P. Jung, A.H. Cornell-Bell, M. Dreher, A. deGrauw, R. Strawsburg and V. Trinkaus-Randall
    Statistical Analysis and Modeling of Calcium Waves in Healthy and Pathological Astrocyte Syncytia ... 323

SECTION 2: BIOLOGICAL PHYSICS OF NEURONS AND NEURAL NETWORKS

Biophysical Models for Biological Neurons
11. C. Meunier and I. Segev
    Neurones as Physical Objects: Structure, Dynamics and Function ... 353
12. W. Gerstner
    A Framework for Spiking Neuron Models: The Spike Response Model ... 469

Introduction to Neural Networks
13. H.J. Kappen
    An Introduction to Stochastic Neural Networks ... 517
14. A.C.C. Coolen
    Statistical Mechanics of Recurrent Neural Networks I - Statics ... 553
15. A.C.C. Coolen
    Statistical Mechanics of Recurrent Neural Networks II - Dynamics ... 619
16. J.A. Flanagan
    Topologically Ordered Neural Networks ... 685

Learning in Neural Networks
17. K. Fukumizu
    Geometry of Neural Networks: Natural Gradient for Learning ... 731
18. J.L. van Hemmen
    Theory of Synaptic Plasticity ... 771

Information Coding in Neural Networks
19. A. Treves
    Information Coding in Higher Sensory and Memory Areas ... 825
20. C.C.A.M. Gielen
    Population Coding: Efficiency and Interpretation of Neuronal Activity ... 853
21. D. Golomb, D. Hansel and G. Mato
    Mechanisms of Synchrony of Neural Activity in Large Networks ... 887

Self-Organisation in Cortex
22. U. Ernst, M. Tsodyks and K. Pawelzik
    Emergence of Feature Selectivity from Lateral Interactions in the Visual Cortex ... 969
23. M. Lappe
    Information Transfer Between Sensory and Motor Networks ... 1001

Epilogue to Volume 4 ... 1043
Subject Index ... 1045
Contributors to Volume 4
V.S. Anishchenko, Nonlinear Dynamics Laboratory, Department of Physics, Saratov State University, Saratov 410026, Russian Federation
P. Århem, Agora for Biosystems and Department of Neuroscience, Karolinska Institutet, SE-171 77 Stockholm, Sweden
S. Bahar, Center for Neurodynamics, University of Missouri - St. Louis, 8001 Natural Bridge Rd., St. Louis, MO 63121, USA
D.J. Christini, Division of Cardiology, Department of Medicine, Cornell University Medical College, NY 10021, USA
J.J. Collins, Department of Biomedical Engineering and Center for BioDynamics, Boston University, Boston, MA 02215, USA
A.C.C. Coolen, Department of Mathematics, King's College London, Strand, London WC2R 2LS, UK
A.H. Cornell-Bell, Viatech Imaging/Cognetix, 58 Main Street, Ivoryton, CT 06442, USA
A. deGrauw, Division of Neurology, Children's Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, USA
M. Dreher, Viatech Imaging/Cognetix, 58 Main Street, Ivoryton, CT 06442, USA
K. Dolan, Center for Neurodynamics, University of Missouri at St. Louis, 8001 Natural Bridge Rd., St. Louis, MO 63121, USA
U. Ernst, Institute for Theoretical Physics, University of Bremen, Kufsteiner Str., D-28334 Bremen, Germany
J.A. Flanagan, Neural Network Research Center, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland
K. Fukumizu, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan
D.J. Gauthier, Department of Physics, Department of Biomedical Engineering, and Center for Nonlinear and Complex Systems, Duke University, Box 90305, Durham, NC 27708, USA
W. Gerstner, Center for Neuro-mimetic Systems, Computer Science Department, EPFL-DI, Swiss Federal Institute of Technology, CH-1015 Lausanne EPFL, Switzerland
C.C.A.M. Gielen, Department of Medical Physics and Biophysics, University of Nijmegen, Geert Grooteplein Noord 21, NL 6525 EZ Nijmegen, The Netherlands
R. Gilmore, Department of Physics, Drexel University, Philadelphia, PA 19104, USA
L. Glass, Departments of Physics and Physiology, McGill University, Montreal, Que., Canada
D. Golomb, Zlotowski Center for Neuroscience and Department of Physiology, Faculty of Health Sciences, Ben Gurion University of the Negev, Be'er-Sheva 84105, Israel
G.M. Hall, The Corporate Executive Board, 2000 Pennsylvania Avenue, N.W., Suite 6000, Washington, DC 20006, USA
K. Hall, Departments of Physics and Physiology, McGill University, Montreal, Que., Canada
D. Hansel, Laboratoire de Neurophysique et de Physiologie du Système Moteur, EP 1848 CNRS, Université René Descartes, 45 rue des Saints Pères, 75270 Paris Cedex 06, France
J.S. Haas, Department of Biomedical Engineering, Center for Biodynamics, 44 Cummington Street, Boston, MA 02215, USA
P. Jung, Department of Physics and Astronomy and Program for Neuroscience, Ohio University, Athens, OH 45701, USA
H.J. Kappen, SNN University of Nijmegen, Geert Grooteplein Noord 21, 6525 EZ Nijmegen, The Netherlands
J. Kurths, Department of Physics, University of Potsdam, Am Neuen Palais 10, D-14415 Potsdam, Germany
M. Lappe, Computational and Cognitive Neuroscience Laboratory, Department of Zoology and Neurobiology, Ruhr-University, 44780 Bochum, Germany
H. Liljenström, Agora for Biosystems and Department of Biometry and Informatics, SLU, SE-750 07 Uppsala, Sweden
G. Mato, Comisión Nacional de Energía Atómica and CONICET, Centro Atómico Bariloche and Instituto Balseiro (CNEA and UNC), 8400 San Carlos de Bariloche, R.N., Argentina
C. Meunier, Laboratoire de Neurophysique et Physiologie du Système Moteur (EP 1848 CNRS), Université René Descartes, 75270 Paris Cedex 06, France
F. Moss, Center for Neurodynamics, University of Missouri at St. Louis, 8001 Natural Bridge Rd., St. Louis, MO 63121, USA
A. Neiman, Center for Neurodynamics, University of Missouri at St. Louis, St. Louis, MO 63121, USA
K. Pawelzik, Institute for Theoretical Physics, University of Bremen, Kufsteiner Str., D-28334 Bremen, Germany
X. Pei, Center for Neurodynamics, University of Missouri, St. Louis, MO 63121, USA
A. Pikovsky, Department of Physics, University of Potsdam, Am Neuen Palais 10, D-14415 Potsdam, Germany
K.A. Richardson, Center for Biodynamics and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
M. Rosenblum, Department of Physics, University of Potsdam, Am Neuen Palais 10, D-14415 Potsdam, Germany
I. Segev, Department of Neurobiology, Institute of Life Sciences, Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
C. Schäfer, Centre for Nonlinear Dynamics, Department of Physiology, McGill University, 3655 Drummond Street, Montreal, Que., Canada H3G 1Y6
L. Schimansky-Geier, Institut für Physik, Humboldt-Universität zu Berlin, Invalidenstr. 110, D-10115 Berlin, Germany
M.L. Spano, NSWC, Carderock Laboratory, 9500 MacArthur Blvd., Code 681, W. Bethesda, MD 20817, USA
R. Strawsburg, Division of Neurology, Children's Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, USA
P.A. Tass, Institute of Medicine (MEG), Research Centre Jülich, D-52425 Jülich, Germany
A. Treves, SISSA, Cognitive Neuroscience, Trieste, Italy
V. Trinkaus-Randall, Department of Ophthalmology, Boston University School of Medicine, 80 E. Concord St., Boston, MA 02118, USA
J.L. van Hemmen, Physik Department der TU München, D-85747 Garching bei München, Germany
J.A. White, Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
CHAPTER 1
Electrical Stimulation of the Somatosensory System

K.A. RICHARDSON and J.J. COLLINS

Center for Biodynamics and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA

© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction ... 3
2. Somatosensory anatomy ... 3
   2.1. Skin structure ... 3
   2.2. Receptor types ... 4
3. Electrical stimulation of the somatosensory system ... 6
   3.1. Electrical excitation of peripheral nerve fibers ... 6
   3.2. Convergence of electrical and mechanical stimulation ... 7
4. Computational studies ... 8
   4.1. Modeling basics ... 8
   4.2. Our computational model ... 10
   4.3. Point current source excitation ... 13
   4.4. Circular electrode excitation ... 15
   4.5. Electrical noise excitation ... 18
Appendix A ... 19
References ... 21
1. Introduction According to the theory of stochastic resonance (SR), the ability of certain nonlinear systems to detect weak signals can be enhanced with the addition of noise. SR has been demonstrated in a wide variety of biological systems [1-9], including the human somatosensory system [10-12]. Previous studies dealing with SR in biological systems have largely employed noise that is of the same modality as the stimulus. For example, we have used random vibrations to enhance the ability of human subjects to detect weak mechanical cutaneous stimuli [10,11]. However, if a noise-based technique is to be used to enhance tactile sensation in humans, it is possible that the noise will be introduced electrically via stimulation systems, such as glove electrodes and sock electrodes. Accordingly, in a follow-on psychophysical study with healthy young subjects, we examined the effects of electrical input noise on the detectability of weak mechanical cutaneous stimuli [12]. We found that the ability of an individual to detect a subthreshold mechanical cutaneous stimulus can be significantly enhanced by introducing a particular level of electrical noise. This work has laid the foundation for the development of noise-based sensory prosthetics for individuals with elevated sensory thresholds, such as older adults, stroke patients, and diabetic patients with peripheral neuropathy. To optimize the development of such devices, it is important to have an understanding of the effects of electrical stimulation on the somatosensory system. This chapter will explore this topic in detail. In Section 2, we briefly describe the anatomy of the somatosensory system, as it relates to electrical stimulation. In Section 3, we review some of the key neurophysiological and psychophysical studies that have involved somatosensory stimulation. In Section 4, we discuss our computational studies that explore the effects of electrical stimulation on cutaneous mechanoreception.
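As a purely illustrative aside (a minimal sketch, not the authors' experimental protocol or model), the SR effect can be demonstrated with a simple threshold detector: a subthreshold periodic signal alone produces essentially no threshold crossings, an intermediate level of added noise produces crossings that track the signal, and excessive noise washes the signal out again. All signal, threshold and noise values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0.0, 10.0, 1e-3)
signal = 0.6 * np.sin(2 * np.pi * 1.0 * t)   # subthreshold: amplitude < threshold
threshold = 1.0

for noise_sd in (0.2, 0.5, 3.0):             # too little, intermediate, too much noise
    crossings = (signal + rng.normal(0.0, noise_sd, t.size)) > threshold
    # The correlation between detector output and input signal is typically
    # largest at the intermediate noise level -- the signature of SR.
    corr = np.corrcoef(crossings.astype(float), signal)[0, 1]
    print(f"noise sd {noise_sd:.1f}: crossing/signal correlation {corr:.2f}")
```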
2. Somatosensory anatomy 2.1. Skin structure
In human glabrous skin, there are two major tissue layers: the dermis and the epidermis, as seen in Fig. 1 [13]. The outer layer, the epidermis, is constructed mostly of stratified epithelium cells. The dermis, which lies below the epidermis, is made of dense connective tissue that contains nervous and vascular networks [14]. The dermis structure follows the undulations of the epidermis and forms papillae, which are finger-like projections up into the epidermis. Nerve endings are found in both the dermis and epidermis layers; the location of each receptor termination depends on the receptor type. Free nerve endings travel through the connective
tissue of the upper portion of the dermal papillae and enter the epidermis. Other fibers terminate in various depths of the dermis. Fibers that terminate in the upper part of the dermis do so in the upper part of the dermal papillae [15].

[Fig. 1. Skin anatomy: a diagram of the skin found at the human fingertip. Shown are the two major layers of the skin, along with the different receptors found in each layer (free nerve endings, Meissner's corpuscles, hair follicles, Pacinian corpuscle endings). Adapted from Ref. [13].]
2.2. Receptor types There are four basic types of mechanoreceptors found in human glabrous skin. These receptors are grouped according to their adaptation properties. The Pacinian corpuscles (PCs) and the Meissner corpuscles adapt rapidly to sustained stimuli and encode information about transient stimuli, such as vibration or flutter. The Merkel cells and Ruffini's endings adapt slowly to sustained stimuli and encode information about spatial discrimination and the presence of a signal [16]. Each of these receptor types is innervated by large diameter myelinated fibers. The transduction of mechanical stimuli is thought to occur through stretch sensitive gates at the nerve terminus [17]. Information about this process has been extracted largely from work on PCs. Mechanical stimuli induce a stretch on the membrane of the receptor terminal. This causes the stretch-gated ion channels in the membrane to open, which results in a local depolarization of the cell. This local potential is graded so that a more intense stimulation causes a larger depolarization over a larger area of the receptor membrane. Local potentials set up on the membrane can then summate both temporally and spatially. Once the potential reaches a certain value, the receptor generates an action potential and the mechanical signal is transduced into an electrical signal. This signal is then centrally processed to obtain information about the original mechanical stimulus. It is interesting to note that typically one afferent firing can cause a psychophysical response [16,18].
The receptors, while similar in transduction elements (i.e., stretch-gated ion channels), are anatomically distinguishable by the terminal structure that contains their transduction elements and by the location of their termination (see Fig. 1). The PCs are large structures found deep in the dermal layer. The axon of each PC terminates inside an encapsulating bulb. The myelinated fiber enters the bulb and ends in an unmyelinated nerve terminal called a neurite. The bulb itself is made of many layers called lamellae, which are responsible for rapid adaptation to stimuli. When a mechanical stimulus is introduced to the receptor, the fluid between the layers redistributes and the neurite no longer experiences a load. The stretch-gated channels are located on the tip of this neurite; the site of action potential generation is further down the neuron, just before myelin begins to ensheath the axon but not before it leaves the capsule. Thus, the site of transduction is within the capsule. The PCs are the most widely studied mechanoreceptor type and thus much is known about how their structure relates to action potential initiation [17]. Although the remaining three receptors - the Meissner corpuscles, Ruffini's endings, and Merkel cells - are not as widely studied as the PCs, general information about their location and terminal structures is available. The Meissner corpuscles are found in the dermal papillae. In young subjects, each papilla contains at least one corpuscle; in older subjects, the distribution of receptors is sparser [15]. The nerve endings of these receptors are encapsulated in a connective tissue sheath, but several neurons may innervate a given capsule. Once the myelinated fiber enters the capsule, it loses its myelin sheath, branches and then tangles around other fibers, axon terminals and connective tissue. The Ruffini's endings are also encapsulated receptors found in the dermal layer. The connective tissue compartment of each Ruffini's ending encloses the axon and branching terminal nerve fibers. Finally, the Merkel cells are found in the epidermis. The lower part of the cell is in direct contact with a nerve terminus that contains many vesicles. It is thought that the Merkel cells function as a transducer, communicating via a chemical transmitter to the nerve terminal directly below it [14,15,19]. There are also other types of fibers, in addition to these mechanoreceptors, innervating the glabrous skin of the hand and foot. These include smaller fibers, both myelinated and unmyelinated, that are involved in nociception and thermoreception. The nociceptors are specialized to different types of injurious stimuli. For instance, there are mechanical, thermal and chemical nociceptors. These are high-threshold units that are found throughout the dermis [20]. In general, the foot and hand (the major areas of interest for noise-based prosthetics) contain a wide variety of receptor types with different stimulus specificities. These receptors are found in various tissue layers in the skin and have different terminal structures. Such features may affect the ability of electrical stimulation to excite these endings. For example, the capsules that encase some of the endings may act as insulation. Alternatively, a superficial receptor may be excited by proximity alone. These issues need to be taken into account when considering the ability of electrical stimuli to excite various receptors and fiber types.
3. Electrical stimulation of the somatosensory system
3.1. Electrical excitation of peripheral nerve fibers
Sensory effects from electrical stimulation have been reported since the 1700s, and systematic experiments on this topic have been conducted since at least the early 1900s [21]. Of note, there have been many investigations that have studied the effects of electrical stimuli on somesthetic sensation. Several of these studies have shown that the ability to excite certain receptors is related to the anatomical structure and location of a given receptor and the configuration of the stimulating electrode. Garnsworthy et al. [22], for instance, conducted a series of animal experiments and found that most unmyelinated fibers, at threshold, can be activated with low-current, high-voltage stimulation from needle electrodes placed on the surface of the skin. The stimulation selectively excites unmyelinated C fibers (which are of small diameter) rather than the large diameter myelinated fibers. This is probably due to the fact that C fibers terminate more superficially and are recruited first, based on proximity to the localized stimulating electrode. Low-threshold C fibers are not present in humans, so it is presumed that the poly-modal C fibers and possibly the Aδ (smallest diameter of the largest fibers) nociceptive fibers would be activated in humans during electrical stimulation. These two types of human receptors have been shown to have similar properties. Pfeiffer [21], in a 1968 review article, noted that according to the studies performed by Von Frey, a small electrode causing a localized current density is best for stimulating individual receptors. This type of electrode configuration typically results in stinging pain, a temperature sensation, or a vibrating pressure sensation, depending on the type of receptor that is stimulated. When a large area is stimulated, the sensation is that of hammering or buzzing, and it is felt to be deep in the skin and non-painful, until higher intensities. The notion of gating is commonly invoked to explain the flow of different types of information in the somatosensory system. It has been hypothesized that there is a pain gate that modulates the input of nociceptive information at the level of the spinal cord, and that tactile input inhibits information transmission from nociceptors. Apkarian et al. [23] examined the reciprocal form of control; they found that painful heat can reduce tactile sensitivity (i.e., increase tactile sensory thresholds), suggesting the presence of a touch gate. The increased thresholds were not due to shifts in attention, as auditory thresholds were unaffected by the heat pain. Kauppila et al. [24] also investigated the possibility of a touch gate. They found that pain causes an increase in the mechanoreceptive fields, which results in a loss of two-point discrimination. If there is a touch gate and electrical stimulation excites nociceptors, then there may be an inhibitory effect, depending on the extent of the excitation of the nociceptors. These receptors generally have high thresholds, but they are superficial and may be excited by proximity alone [22]. Many aspects of these studies are relevant to the development of noise-based sensory prosthetics. For instance, according to the work of Garnsworthy et al. [22], Von Frey [21] and Adrian [21], the location of nerve excitation (receptor or nerve
track) is dependent on the configuration of the stimulating electrode. Localized current (e.g., from a needle electrode) excites the receptor endings that favor excitation of superficial receptors; a wider electrode configuration favors excitation of nerve tracks. Since it is possible that there is inhibition of mechanoreception with nociceptive excitation, it is important, in the context of noise-based sensory prosthetics, to ensure that the employed electrical stimulation does not excite the superficial nociceptors.
3.2. Convergence of electrical and mechanical stimulation
Vernon [25] and Békésy [26], in the early 1900s, independently examined the effects of electrical stimulation on mechanoreception. Vernon specifically considered the effects of a subthreshold periodic electrical stimulus on vibrotactile detection thresholds. In Vernon's experiments, the vibrotactile stimulus and electrical stimulus were of the same frequency. He found that if the electrical and mechanical stimuli were in phase, then the detection threshold was lower than that for the mechanical vibration alone; if the two stimuli were out of phase, then the detection threshold was not significantly different from that for the mechanical vibration alone. Békésy, after performing similar experiments, suggested that the interaction of the two modalities (i.e., electrical stimuli and mechanical stimuli) may be through the nerve tracks and not at the end organs. He also speculated that since the two types of stimulation travel at different speeds through the affected area (the electrical stimulus moving faster than the mechanical stimulus), there may be a spatially varying phase delay between the two signals. Békésy also found that if the electrical stimulus and mechanical stimulus are presented at the same sensation magnitude, then the sensation area for the mechanical stimulus is larger. In addition, contrary to Vernon's results, Békésy found that the electrical stimulus could almost fully cancel out the mechanical sensation. One of the basic ingredients necessary for SR-type effects is that of information summation, i.e., the noise signal needs to add in some way (not necessarily directly) to the signal of interest. For noise-based sensory prosthetics, this implies that the electrical stimulation needs to enter the system close to the area where the transduction of the mechanical stimulus occurs. Thus, stimulation along the nerve track may not lead to functional enhancement unless the detection task requires central processing. Typically, though, for low threshold receptors such as the PC, a single afferent firing can generate a psychophysical response. An action potential in a sensory neuron occurs when summation of local potentials in the nerve endings exceeds the receptor's threshold. Electrical stimulation may only enhance detection in areas of local potential summation because a subthreshold stimulus may not conduct any information along the nerve track. A limited number of studies on PCs have examined the interactions between local potentials arising from mechanical stimuli and antidromic electrical stimulation [27,28]. The objective of these studies was to identify the site of action potential generation. It was found that in a decapsulated PC, an electrical signal sent antidromically (i.e., toward the receptor) can summate with subthreshold activity from a
mechanical stimulus [27]. It was also found that an antidromically initiated action potential can cause depression of the nerve terminal, leading to an increase in the receptor's threshold to mechanical stimuli [28]. These studies indicate that depolarization of a receptor's terminal parts arising from electrical stimulation can interact with end-organ activity. It is clear from previous studies that the cutaneous mechanoreceptive system is tuned to derive certain information from mechanical stimuli. It is not understood, however, exactly how this system responds to electrical stimulation. It is not completely known what types of receptors are excited, where they are excited, and what kind of information is introduced into the system with electrical stimulation. Vernon and Békésy generally found that the interaction between electrical stimulation and mechanical stimulation is not simply linear summation. Gray [27] and Hunt et al. [28] indicate that there is indeed local summation of potentials when signals from both modalities meet near the nerve terminal. These studies indicate that further work is needed to understand the effects of electrical stimuli on mechanoreceptors. We have initiated computational studies to begin to address this important issue. In the next section, we discuss the modeling background and some of our preliminary results.
4. Computational studies

4.1. Modeling basics

4.1.1. Membrane dynamics
Information in the nervous system propagates as a change in membrane voltage down an axon. The axon membrane can be described by a simple electrical circuit composed of capacitors and resistors, as shown in Fig. 2. The stimulating current either flows through the ion channels, modeled as conductances, or it charges the capacitance of the membrane. By solving Kirchhoff's law, the following expression for the time-varying change in membrane voltage can be derived: dV/dt = (-I_ion + I_app)/C_m. The membrane can be modeled either as a passive system with a constant conductance or as an active system with a nonlinear conductance. In the axon of excitable neurons, the ion channels are voltage-gated and are therefore active. When there is a change in the membrane potential, the gates open or close, allowing or disallowing the flow of ions particular to the channel. In the 1950s, Hodgkin and Huxley provided the first mathematical description of this membrane behavior that accounted for nonlinear conductances [29]. The parameters used in the model for each of the ion channels were derived from fits to experimental data.

[Fig. 2. Circuit model of neural membrane. This model shows the circuit elements that represent the membrane components. The conductance (G_m) represents the ion channels in the membrane and the capacitance (C_m) is an intrinsic property of the membrane.]
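To make this single-compartment description concrete, the sketch below integrates the passive form of the equation, C_m dV/dt = -G_m V + I_app(t), with a forward-Euler step. The parameter values and the 1 ms current pulse are illustrative choices for the example, not values taken from this chapter.

```python
import numpy as np

def simulate_passive_membrane(g_m, c_m, i_app, dt=1e-5, t_end=0.01):
    """Forward-Euler integration of a passive membrane patch:
    C_m dV/dt = -G_m V + I_app(t), with V the deviation from rest (volts)."""
    n_steps = int(t_end / dt)
    v = np.zeros(n_steps)
    for k in range(1, n_steps):
        dvdt = (-g_m * v[k - 1] + i_app(k * dt)) / c_m
        v[k] = v[k - 1] + dt * dvdt
    return v

# Example: a 1 nA, 1 ms pulse applied to illustrative membrane parameters.
pulse = lambda t: 1e-9 if 0.002 <= t < 0.003 else 0.0
v = simulate_passive_membrane(g_m=1e-8, c_m=1e-10, i_app=pulse)
print(f"peak depolarization: {1e3 * v.max():.2f} mV")
```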
4.1.2. External excitation
Recent computational studies on external excitation of sensory neural fibers have used a modified version of the Hodgkin-Huxley fiber model - the Frankenhaeuser-Huxley (FH) model - to describe the excitable membrane of nerve fibers. Frankenhaeuser and Huxley [30] modeled the membrane as having four major ion channels. In this model, the initial depolarization is due to the influx of sodium ions; potassium ions are responsible for the repolarization of the neuron; a delayed depolarizing current from nonspecific ions is largely mediated by sodium concentrations; and lastly, there is a small repolarizing leakage current that is a linear function of membrane voltage. McNeal [31], in an attempt to model external neuronal stimulation, proposed a model wherein ions could pass through the membrane at specific areas (called nodes) between the insulating myelinated sections. The external stimulation, in this case, was an electric field set up by a current point source located a certain distance above one of the nodes (Fig. 3). McNeal examined subthreshold activity just prior to action potential initiation, so he only modeled the node directly under the electrode with FH dynamics. All other nodes were modeled as passive membranes, as subthreshold activity does not cause large enough depolarizations at these nodes to affect their conductances. Reilly [32] extended this system to include FH conductances at each of the nodes so that properties of the external stimulation could be chosen arbitrarily. He then derived a mathematical equation for the time-varying
change in membrane voltage by applying Kirchhoff's law to the model. The exact form of the resulting equations is presented in Section 4.2.

[Fig. 3. Fiber model used by McNeal [31] and Reilly [32]. The external stimulation is introduced above node 0. Each node is separated by insulating myelin sheaths. In the McNeal model, only node 0 is modeled as active membrane; in the Reilly model, all nodes are modeled with FH nonlinearities.]
4.1.3. Activating function
Rattay [33-35], using the same model as Reilly, noticed that the effects of the external stimulation are built into the resulting equations for myelinated fibers as the second difference function f = V_{e,n-1} - 2V_{e,n} + V_{e,n+1} (see Eq. (7) in Section 4.2.2). This so-called activating function is a good approximation of how well a stimulus can excite the myelinated fiber. The function f depends on the field set up by the electrical stimulation and the membrane properties of the target fiber. The shape of the activating function for a myelinated fiber with a point current source directly above one node is shown in Fig. 4.
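A small sketch of this construction (an illustration, not the authors' code): the extracellular potential is sampled at the nodes for a point current source, using the expression that appears later as Eq. (9) in Section 4.3, and the activating function is then its discrete second difference. The electrode height (z = 0.1 cm) and internodal distance (L = 0.2 cm) are the values quoted later in the chapter; the tissue resistivity is a placeholder, since no value is given in this excerpt.

```python
import numpy as np

def point_source_potential(n, x_e, z, i_app, rho_e, L):
    """Extracellular potential at node index n for a point current source at
    axial position x_e (cm) and height z (cm) above the fiber (cf. Eq. (9))."""
    n = np.asarray(n, dtype=float)
    return rho_e * i_app / (4.0 * np.pi * np.sqrt(z**2 + (x_e - n * L)**2))

nodes = np.arange(-4, 5)                      # source directly above node 0
v_e = point_source_potential(nodes, x_e=0.0, z=0.1, i_app=-0.1e-3,
                             rho_e=300.0, L=0.2)   # rho_e is a placeholder value
f = v_e[:-2] - 2.0 * v_e[1:-1] + v_e[2:]      # activating function, interior nodes
print(np.round(1e3 * f, 3))                   # central lobe is depolarizing for I_app < 0
```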
4.2. Our computational model As noted above, the voltage difference across the membrane is related to current flow in the system. Current can flow through ion channels, charge the capacitance of the membrane, and when considered in an extended system such as a fiber, flow axially along the axon. The model used for the present studies consists of a modified version of the spatially extended nonlinear node (SENN) model first developed by Reilly [32]. For these studies, we have added a specialized receptor ending to the model. A diagram of the fiber of interest is given in Fig. 5. The receptor (node 0) is modeled as an equipotential sphere with a passive membrane; all other nodes are modeled with FH-type dynamics and are considered to be cylindrical in shape.
[Fig. 4. Activating function. The spatial variation of the activating function (f) is shown as a function of node number. (a) Excitation with negative polarity: the solid-line trace (symbol o) shows I_app = -0.1 mA, and the dotted-line trace (symbol Δ) denotes I_app = -0.05 mA. (b) Activating function for positive polarity stimulation: the solid-line trace (symbol o) shows I_app = 0.1 mA, and the dotted-line trace (symbol Δ) denotes I_app = 0.05 mA.]

[Fig. 5. Neuron model. The end of the mechanoreceptor (node 0) is modeled as a passive, equipotential sphere with membrane parameters C_m^rec and G_m^rec. The myelin sheaths between each node are considered electrically insulating. All nodes n ≠ 0 are modeled as excitable cylindrical-shaped membranes with parameters C_m^axon and G_m^axon; the value of G_m^axon, however, is a nonlinear function of time and membrane voltage. V_{e,n} is the value of the external voltage field induced by the electrical stimulation. The parameter G_a is the axial conductance.]

4.2.1. Receptor dynamics
Typically, mechanoreceptors are thought to have stretch-gated ion channels in their terminal membrane. A mechanical stimulus creates voltage changes across the membrane as the channels open and close in response to the stimulus. These gates operate relatively independently of the receptor voltage and thus the membrane is considered to be electrically passive. The circuit describing this node is shown in Fig. 6. Note that the receptor is the terminal end of the neuron and axial current only flows down the axon (I_+, to the right in the diagram). The equation for the sum of the currents through the model is:
I_C + I_ion = I_+.   (1)

Since the receptor membrane is electrically passive, the current I_ion is modeled with a linear conductance, G_m^rec. Both the expressions for the ionic current and the axial current are derived from Ohm's law, V = IR, where R is the reciprocal of conductance. The current across the capacitor obeys the constitutive relation for capacitance, I = C dV/dt. The resulting expression for the equivalence of currents is

-C_m^rec dV_0/dt - G_m^rec V_0 = G_a(V_{i,0} - V_{i,1}),   (2)

where V_0 is the voltage across the membrane of node 0, C_m^rec the membrane capacitance, G_a the axial conductance, V_{i,0} and V_{i,1} the internal voltages at nodes 0 and 1, respectively, and G_m^rec is the conductance across the membrane. By noting that V_{i,n} = V_{e,n} + V_n for all n, the final expression becomes

dV_0/dt = (1/C_m^rec)[-G_a(V_0 - V_1 + V_{e,0} - V_{e,1}) - G_m^rec V_0].   (3)
The value of V_{e,n} is governed by the external excitation and is related to the shape of the electrode as well as the electrode distance (from the point of interest). The neural transduction of mechanical stimuli is a complicated process involving the mechanical properties of the receptor and specialized ion channels on the
[Fig. 6. Circuit diagram for the receptor. A voltage difference across the membrane causes current to flow through the passive conductance, G_m^rec, and allows the membrane capacitance, C_m^rec, to be charged. These two currents sum to become the axial current, I_+, which flows down the axon to node 1 (V_{i,1}).]
membrane. We will not include the transduction process in our modeling in order to simplify the generation of local receptor potentials set up by mechanical stimuli; instead, we will simply model the mechanical stimulus as a discrete pulse of a local voltage potential, spatially confined to the receptor node. For the present computational studies, the mechanical waveform will be a square voltage pulse. Eq. (3) then becomes

dV_0/dt = (1/C_m^rec)[-G_a(V_0 - V_1 + V_{e,0} - V_{e,1}) - G_m^rec V_0 + M_stim].   (4)
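As a minimal illustration (not the authors' code), the right-hand side of Eq. (4) for the receptor node can be written as a small function; here m_stim stands for the mechanical-stimulus term of Eq. (4) and is taken to be zero outside the stimulus pulse.

```python
def receptor_dv0_dt(v0, v1, ve0, ve1, g_a, g_m_rec, c_m_rec, m_stim=0.0):
    """Eq. (4): dV0/dt for the receptor (node 0).  v0 and v1 are the membrane
    voltages at nodes 0 and 1; ve0 and ve1 the external potentials there."""
    return (-g_a * (v0 - v1 + ve0 - ve1) - g_m_rec * v0 + m_stim) / c_m_rec
```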
4.2.2. Fiber dynamics
The end of the receptor is modeled as being the site of mechanical transduction, and therefore it does not behave as active membrane. Local potential information from node 0 leaks to the first node of the axon; it is here that the model allows for the initiation of an action potential. At this node, and all other nodes n, where n ≠ 0, the membrane is active and contains a number of nonlinear conductances. The detailed circuit diagram for these nodes is shown in Fig. 7. The expression for the current flow across the membrane, which is derived from solving Kirchhoff's law, is

I_C + I_ion = I_- + I_+.   (5)

With the circuit elements for each current at node n, the expression becomes

C_m^axon dV_n/dt + I_ion = -G_a(V_{i,n} - V_{i,n-1}) - G_a(V_{i,n} - V_{i,n+1}).   (6)
[Fig. 7. Circuit diagram for all active nodes. Current induced by the charged membrane capacitance, I_C, adds to the current traveling through the four ion channels. This summed current then can travel either up the axon to node n - 1 or leak down the axon to node n + 1 through the axial conductance G_a.]
By again using V_{i,n} = V_{e,n} + V_n, we obtain

dV_n/dt = (1/C_m^axon)[G_a(V_{n-1} - 2V_n + V_{n+1} + V_{e,n-1} - 2V_{e,n} + V_{e,n+1}) - I_ion],   (7)

where

I_ion = πdl(i_Na + i_K + i_P + i_L).   (8)
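For comparison with the receptor node, here is a sketch of the corresponding right-hand side for an interior axon node, following Eq. (7). The Frankenhaeuser-Huxley ionic current is passed in as a user-supplied function because its gating equations and parameters appear only in Appendix A; handling of the end nodes (including the fixed boundary node) is left to the caller.

```python
def axon_dvn_dt(v, v_e, n, g_a, c_m_axon, fh_ionic_current):
    """Eq. (7): dVn/dt for an interior node n.  v and v_e are sequences of the
    membrane and external potentials over all nodes; fh_ionic_current(n)
    must return the total FH ionic current I_ion at node n."""
    coupling = g_a * (v[n - 1] - 2.0 * v[n] + v[n + 1]
                      + v_e[n - 1] - 2.0 * v_e[n] + v_e[n + 1])
    return (coupling - fh_ionic_current(n)) / c_m_axon
```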
The sodium current, i_Na, is responsible for the initial depolarization of the membrane. The potassium current, i_K, is responsible for the hyperpolarization of the membrane. The nonspecific current, i_P, is a delayed depolarizing current carried mostly by sodium ions, and the leakage current, i_L, is another hyperpolarizing current. The first three channels are modeled with nonlinear gating functions; the last channel, responsible for the leakage current, is modeled with a passive membrane. The FH expressions for the gated ion channels and the model parameters used in our simulations are included in Appendix A. A total of nine nodes, including the receptor (node 0), were solved in the simulations. The last node, node 8, was held fixed at dV/dt = 0 for the end boundary condition.

4.3. Point current source excitation
The expression for the activating function is given in Eq. (7) as f = V_{e,n-1} - 2V_{e,n} + V_{e,n+1}. The term V_{e,n} represents the field induced by a current source at node n. Node 0 is considered to be the origin of the fiber axis. For a point source, the voltage field set up by the electrical stimulation is given by the expression

V_{e,n} = ρ_e I_app / (4π √(z² + (x_e - nL)²)),   (9)
where Pe is the resistivity of the volume conductor surrounding the fiber, lapp the current source, z the distance between the electrode and the fiber, xe the distance along the fiber, n the number of the node of interest, and L is the internodal distance. The electrical excitation must create a depolarization as close as possible to the receptor site in order that such changes can add constructively with the depolarizations set up by the mechanical stimulus, i.e., regions of depolarization in the activating function (see Fig. 4) must be near the depolarizations resulting from the mechanical stimulus. One of the first modeling predictions that we can make is that electrical stimulation with different polarities results in different system responses, depending on the location of the stimulation. In this study, the mechanical stimulus was modeled as a depolarizing square voltage pulse (duration: 1 ms), introduced at the receptor (node 0). We also included a pulsed current signal (of 1 ms duration) that was assumed to be applied simultaneously by a stimulating electrode. The electrode was held at a perpendicular distance of 0.1 cm above the nerve fiber (z = 0.1 cm). The distance along the fiber (xe) was allowed to vary from - 0 . 6 (behind the receptor, on the opposite side as the axon), to 0.6 cm (over the axon, at node 3). (The location Xe - 0 cm is directly above the receptor). Note that in our simulations, the internodal distance is taken to be 0.2 cm. The amplitude of the electrical stimulus was allowed to vary from -0.1 to 0.1 mA. The minimum amplitude of the mechanical depolarization necessary to generate an action potential (the threshold) was found as a function of the electrode position and the amplitude of the electrical stimulus. The results are shown in Fig. 8. We found that when the polarity of the electrical stimulus is negative, there is an enhancement effect (i.e., the receptor threshold is reduced) only when the electrode is located over the axon (see Fig. 8). The threshold is decreased in this case because the center depolarization caused by the negative stimulation (see Fig. 4a) can add to the depolarization resulting from the mechanical stimulus. If, however, the electrode is shifted to the other side of the receptor (not over the axon), the threshold is increased. This change in threshold is due to the outer hyperpolarizing lobes of the electrical stimulation being located next to the receptor, which suppresses the depolarization resulting from the mechanical impulse. The hyperpolarizing and depolarizing regions of the electrical stimulation are not of equal strengths, and thus the changes in threshold due to each of these regions are not symmetric. We found the opposite effect when the polarity of the electrical stimulus is positive. Specifically, we found that there is an enhancement effect (i.e., the receptor threshold is decreased) when the electrode lies away from the fiber, and there is a detrimental effect (i.e., the receptor threshold is increased) when the electrode is placed over the axon (see Fig. 8). The enhancement effect is present when the outer lobes of excitation, which are depolarizing for positive excitation (see Fig. 4b), are located near the receptor. The increases in threshold, then, are seen when the hyperpolarizing region, directly below the electrode, is located near the receptor and
Electrical stimulation of the somatosensory system
15
'...
0.8
'....
1,, ~ " 0.5
vE
0.6
; %/,,
0.4
o
0.2
o c.. (a - 0 . 5 (D (.<:1
-1,,
-0.2
-10.1"5"~,
-0.4 9 .
0.05 ..
Iapp (mA)
0
"
......."
9 i...'"'
. "
. ""'.
~
. "
..-"":
> I
"
".
~
-0.1
..
.f,:-
_5><~ -0.4
-o.2
...
..
0.2 -
o
""......
...
0.4
-0.6 0.6 -0.8
x (cm) e
Fig. 8. Change in threshold for a point current source. Plotted here is the change in mechanical threshold as a function of the amplitude and location of the electrical stimulation. The threshold is decreased in the front right region, where the electrical stimulation is of negative polarity and located above the fiber, and the back left region, where the electrical stimulation is of positive polarity and located away from the fiber. The threshold is increased in the back right region, where the electrical stimulation is of positive polarity and located above the fiber, and the front left region, where the electrical stimulation is of negative polarity and located away from the fiber.
acts to suppress any activity that occurs there. Again, these changes in threshold are not symmetric because the hyperpolarizing and depolarizing regions of the electrical stimulation are not of equal strengths.
4.4. Circular electrode excitation Wiley and Webster [36], motivated by a desire to gain an understanding of burns suffered by patients undergoing electro-surgery, developed a model of the voltage field induced under circular electrodes. The model is basically that of a dispersive electrode with a circular surface directly in contact with a conductive medium. The skin, in which the receptors and fibers lie, is considered to be a homogeneous conductive medium. To find an expression for the field induced by such an electrode configuration, one must find the solution of v Z v = 0, subject to the boundary conditions V-V0
for
6V/6z-O V~0
for
z-O, for
z-0,
r~+oo,
r<_a, r>a, z~-oo.
and
(lo)
16
K.A. Richardson and J.J. Collins
The solution, found by Wiley and Webster, is
V,rz 2 ~ = ~
)
[(r-- a) 2 -t- z2] 1/2 + [(r 4- a) 2 4- z2] 1/2
(11)
for z -r 0. In the above expression, V0 is the voltage on the equipotential surface of the electrode, a the radius of the electrode, r the radial distance from the center of the electrode, and z is the perpendicular distance from the electrode (z = 0 is the surface of the electrode). The activating function, as a function of electrode radius, that results from this type of stimulation is shown in Fig. 9. Note that the area of excitation increases as the size of the electrode increases. This effect is also apparent in Fig. 10. Figure 10a shows the change in threshold for a small electrode (a = 0.05 cm). The decreases in threshold are considerable only when the center of the electrode is close to the receptor; this finding holds for both positive and negative polarities. This behavior is similar to that seen for a simple point source electrode (Fig. 8). Figure 10b clearly shows that the area of beneficial excitation is increased for a larger electrode (a --- 0.25 cm).
40
~a=
II a = O.05cm
O. 15cm
20
0
--
--
7.
207
0
40[-
~ a =
-
70.25cm
-7 40
0 a = 0.35cm
20
2O
_~~ -7
0
7
-207
node Fig. 9. Activating function for circular electrodes. The activating function, which is a measure of the relative excitability of the membrane to an external stimulus, is shown as a function of the location and size of a circular electrode. The center of the electrode is placed over node 0. The electrode widths, shown at the top of each plot as a black rectangle, are in proportion to the nodal distances. Electrode size increases from left to right and top to bottom.
Fig. 10. Thresholri changes for sm:ill : ~ n dI;lrgr electrodes, ( ; I ) Change in threshold for :in clcctrodc of r:idius 11 = 0.05 cnl a f ~ ~ n c t i oofn s t i m ~ ~ l ustrength s itnd elt.ctrodt. position. The p ~ s i t i o n.I-, is the 1oc:ition of the center nf the electrode :11ons the asis of the ason, with s, = ( I heing located ilirectl!. :~hovcthc rcccptor. ( h ) C1i;tngr in thrrshold for ;in electrode of r:udiu< a = 11.25 cm :is :I Ihnction of stimulus strength iind electrode pclsitinn.
:IS
18
K.A. Richardson and J.J. Collins
4.5. Electrical noise e x c i t a t i o n
The above findings have ramifications for the use of noisy electrical signals to stimulate the somatosensory system, e.g., in the context of noise-based sensory prosthetics. For instance, these findings suggest that when an electrode is placed over an axon and the excitation is a bipolar noisy waveforrn (e.g., a zero-mean Gaussian noise signal), then the threshold can be either increased or decreased at any given time. To explore these points further, we conducted a series of computational studies in which we applied noisy electrical signals, using both the point source electrode model and the circular electrode model. For each case, we varied the location and amplitude of the electrical stimulus. For each condition, we conducted 10 trials and determined the mean and standard deviation of the receptor threshold for each set. We found that with a point source electrode, a significant decrease in threshold can be obtained with an electrical noise signal primarily if the electrode is placed over node 1 (Xe = 0.2 cm). The mean and standard deviation of the change in threshold is shown in Fig. 1 l a. While node 1 is the best location for this electrode configuration, decreases in threshold are also found at other locations. However, the standard deviation of these threshold changes are large enough to include a zero change in threshold, which implies that for some trials, the threshold is increased rather than decreased. The computational results shown in Fig. 11 a indicate that a point current source electrode is relatively ineffective in causing mechanical threshold changes as only a small range of current amplitudes and electrode locations show significant reductions in threshold. At low current values, there is a small decrease in the threshold mean, but the standard deviation is large, which implies that threshold increases are seen during some simulations. At intermediate current values, the mean threshold
(a) 0.3
(b) 0.3
0.0
0.0
-0.3
-0.3
.=. o -0.6 ,.E: (1.2 ,.= -0.9 ,i..a
"6 -o.6 -0.9 .,,,.a
-1.2 -1.5 0.0(
"~ i
i
i
i
i
0.02
0.04
0.06
0.08
0.10
lap p ( m A }
0.12
-1.2 .00
0.04
0.08
O. 2
0.16
0.20
0.24
lap p {Ill_A.}
Fig. 11. Mean threshold changes for a point source electrode over node 1. The change in threshold (from a no-noise threshold value of 1.354 mV) is shown as a function of the applied current (/app). (a) Mean and standard deviation of 10 threshold determinations for a noise signal with a sampling frequency of 10 kHz. (b) Mean and standard deviation of 10 threshold determinations for a noise signal with a sampling frequency of 100 kHz. The asterisks placed on the graphs indicate the number of times, out of the 10 simulations, the threshold went to zero, i.e., when the fiber responds to the electrical stimulation alone.
Electrical stimulation o f the somatosensory system
19
is decreased and the standard deviation is small enough so that an increase in threshold is rarely seen. At higher values, the mean threshold change remains significant, but the fiber now begins to fire from electrical excitation alone (at lap p -- 0.11 mA). The computational results for an electrical noise signal with a faster sampling frequency (100 kHz) are shown in Fig. 1 l a. Firstly, note that a larger range of current values (up to lapp - 0.21 mA) can be applied before the fiber responds to the electrical stimulation alone. This effectively stretches out the curve on the left so that a larger region of stimulus amplitudes can cause significant decreases in threshold. The magnitude of the change in threshold also increases considerably: the mean threshold change for Iapp --0.19 mA and a sampling frequency of 100 kHz is 0.90 (66% of the no-noise threshold), whereas the mean threshold change for lapp-0.09 mA and a sampling frequency of 10 kHz is only 0.70 (52% of the no-noise threshold). In addition, with a faster sampling frequency, small but significant changes in threshold can be found at other electrode locations (xe = -0.2, 0 cm), for intermediate and large current amplitudes. Consistent with the results described in Section 4.4, we found that the circular electrode, with a noisy electrical stimulus, is superior to the point source electrode in causing reductions in threshold. With a circular electrode of radius a - 0.2 cm, we found significant changes in threshold over a wider range of locations both away from the fiber (xe = -0.2 cm) and over the fiber (xr - 0 . 0 - 0 . 3 cm). We also found that a larger range of stimulus amplitudes can be used to reduce the threshold significantly. These computational studies provide a series of novel predictions that can be tested in neurophysiological and psychophysical studies. The information derived from such studies could be incorporated into the optimal design of noise-based somatosensory prosthetics. Appendix
A
The expression for the ionic currents, as shown in Eq. (7), is /ion = ~ d l ( i N a -k- iK + ip + iL).
The mathematical representation for each current is as follows iNa = PNahm2(EF 2/RT) [Na]~ - [Na]i Es
1-
iK -- P~zn2(EF 2/RT) ip
-
-
iL -
[K]o -
[K]i
eEF/RT
e EF/RT
1 -- eEF/RT
'
Pp p2 (EF 2/RT) [Na]~ - [Na]i eEF/RT 1 - eEF/RT gL(V.
-
vL),
K.A. Richardson and J.J. Coll&s
20
where E - V~ + V~. The variables h, m, n and p represent nonlinear gating of the channels. The dynamic equations describing these variables are Otm(1
dm/dtdh/dt
-
-
-
m ) - ~m m
Oth ( 1 -- h) - ~hh
or,,(1 - n) - 13,,n
dn/dt-
-- Otp(1 - p ) - ~pp,
dp/dt
where 0.36(V~ - 22) (~m
2 2 - t}~
[1 - e( ---~---)1 0 . 1 ( - 1 0 - V~) 0t h - -
[1 - e(~)] O.OI(V~ - 35) (~n
--
[1 - e ("-~" ~)] __
0.006( V~ - 40) 4o-vn
[1 - e(-w-)] 0.4(13V,) [1 [Vn-133
~m
-- e,-r~,]
4.5 13h-
45-Vn
[1 + e(-w-)] 0.05(10 - V~)
[1
Vn-IO .
-
e(~)]
0 . 0 9 ( - V~ - 25) 25+Vn .
[1 - e(-rr
l
The values of the above expressions were determined empirically by F r a n k e n h a e u s e r and Huxley [30]. The initial conditions used in our simulations were: V~(O) = 0 re(o)
=
for all n
0.0005
h(0) = 0.8249 n(O) -- 0.0268 p(0) = 0.0049.
Electrical stimulation of the somatosensory system
21
Membrane and medium parameter values Constant Pi
Pe Cm
gm l L/D
a/D Vr Oaxon Orec
Value
Description
110~ cm 300 ~ cm 2 gF/cm 2 30.4 mmho/cm 2 2.5 gm 100 0.7 -70 mV 0.002 cm 0.01 cm
Resistivity of axoplasm Resistivity of external medium Membrane capacitance per unit area Membrane conductance per unit area Nodal gap width Ratio of internodal length to fiber diameter Ratio of axon diameter to fiber diameter Resting membrane potential Fiber diameter Receptor diameter
Ionic current parameter values Constant
Value
Description
/Y~Na e~
8 • 10 -3 cm/s 1.2 • 10 -3 cm/s
Pp gL VL [Na]o [Na]i [K]o [K]i F R T
0.54 • 10-3 cm/s 30.3 mmho/cm 2 0.026 mV 114.5 mM 13.7 mM 2.5 mM 120 mM 96514.0 C/g/mole 8.3144 J/K/mole 295.18 K
Sodium permeability Potassium permeability Non-specific permeability Leak conductance Leakage current equilibrium potential External sodium concentration Internal sodium concentration External potassium concentration Internal potassium concentration Faraday's constant Gas constant Absolute temperature
References 1. Chiou-Tan, F.Y., Magee, K.N., Robinson, L.R., Nelson, M.R., Tuel, S.S., Krouskop, T.A. and Moss, F. (1996) Int. J. Bifurc. Chaos 6, 1389-1396. 2. Collins, J.J., Imhoff, T.T. and Grigg, P. (1996) J. Neurophysiol. 76, 642-645. 3. Cordo, P., Inglis, J.T., Verschueren, S., Collins, J.J., Merfeld, D.M., Rosemblum, S., Buckley, S. and Moss, F. (1996) Nature 383, 769-770. 4. Douglass, J.K., Wilkens, L., Pantazelou, E. and Moss, F. (1993) Nature 365, 337-340. 5. Gluckman, B.J., Netoff, T.I., Neel, E.J., Ditto, W.L., Spano, M.L. and Schiff, S.J. (1996) Phys. Rev. Lett. 77, 4098-4101. 6. Levin, J.E. and Miller, J.P. (1996) Nature 380, 165-168. 7. Morse, R.P. and Evans, E.F. (1996) Nat. Medi. 2, 928-932. 8. Pei, X., Wilkens, L.A. and Moss, F. (1996) J. Neurophysiol. 76, 3002-3011. 9. Simonotto, E., Riani, M., Seife, C., Roberts, M., Twitty, J. and Moss, F. (1997) Phys. Rev. Lett. 78, 1186-1189. 10. Collins, J.J., Imhoff, T.T. and Grigg, P. (1996) Nature 383, 770. 11. Collins, J.J., Imhoff, T.T. and Grigg, P. (1997) Phys. Rev. E 56, 923-926. 12. Richardson, K.A., Imhoff, T.T., Grigg, P. and Collins, J.J. (1998) Chaos 8, 599-603.
22
K.A. Richardson and J.J. Collins
13. Bear, M.F., Conners, B.W. and Paradiso, M.A. (1996) Neuroscience: Exploring the Brain, Williams and Wilkins, Baltimore. 14. Odland, G.F. (1991) in: Physiology, Biochemistry, and Molecular Biology of the Skin, ed. L.A. Goldsmith, pp. 3-62, Oxford University Press, New York. 15. Miller, M.R., Ralston, H.J. and Kasahara, M. (1959) in: Advances in Biology of Skin, Vol 1.: Cutaneous Innervation. Proceedings of the Brown University Symposium on the Biology of Skin, ed. W. Montagna, pp. 1-47, Pergamon Press, Oxford. 16. Greenspan, J.D. and LaMotte, R.H. (1993) J. Hand. Ther. 6, 75-82. 17. Bell, J., Bolanowski, S. and Holmes, M.H. (1994) Prog. Neurobiol. 42, 79-128. 18. Johansson, R.S. and Vallbo, ,~.B. (1979) J. Physiol. 297, 405-422. 19. Halata, Z. (1975) The mechanoreceptors of the mammalian skin, eds A. Brodal, W. Hild, J. van Limborgh, R. Ortmann, T.H. Schiebler, G. Wiirzburg and E. Wolff, Advances in Anatomy, Embryology and Cell Biology, Vol. 50, Fasc. 5, Springer, Berlin. 20. Lynn, B. (1991) in" Physiology, Biochemistry, and Molecular Biology of the Skin, ed. L.A. Goldsmith, pp. 779-815, Oxford University Press, New York. 21. Pfeiffer, E.A. (1968) Med. Biol. Eng. 6, 637-651. 22. Garnsworthy, R.K., Gully, R.L., Kenins, P. and Westerman, R.A. (1998) J. Neurophysiol. 59, 11161127. 23. Apkarian, A.V., Stea, R.A. and Bolanowski, S.J. (1994) Somat. Mot. Res. 11, 259-267. 24. Kauppila, T., Mohammadian, P., Nielson, J. Anderson, O.K. and Arendt-Nielson, L. (1998) Brain Res. 797, 361-367. 25. Vernon, J.A. (1953) J. Exp. Psych. 45, 283-287. 26. B~k~sy, G. (1959) J. Acoust. Soc. Amer. 31,338-349. 27. Gray, J.A.B. (1959) in" Neurophysiology, ed. H.W. Magoun, Handbook of physiology, Sec. l, pp. 123-145, American Physiological Society, Washington. 28. Hunt, C.C. and Takeuchi, A. (1962) J. Physiol. 160, 1-21. 29. Hodgkin, A.L. and Huxley, A.F. (1952) J. Physiol. 117, 500-544. 30. Frankenhaeuser, B. and Huxley, A.F. (1964) J. Physiol. 171, 302-315. 31. McNeal, D.R. (1976) IEEE Trans. Biomed. Eng. 23, 329-337. 32. Reilly, J.P. (1985) IEEE Trans. Biomed. Eng. 32, 1001-1011. 33. Rattay, F. (1986) IEEE Trans. Biomed. Eng. 33, 974-977. 34. Rattay, F. (1988) IEEE Trans. Biomed. Eng. 35, 199-202. 35. Rattay, F. (1999) Neuroscience 89, 335-346. 36. Wiley, J.D. and Webster, J.G. (1982) IEEE Trans. Biomed. Eng. 29, 381-389.
CHAPTER 2
Phase Synchronization" From Periodic to Chaotic and Noisy L. S C H I M A N S K Y - G E I E R
V.S. A N I S H C H E N K O
Institut ffir Physik, Humboldt-Universit~'t zu Berlin, Invalidenstr 110, D-lOll5 Berlin, Germany
Nonlinear Dynamics Laboratory, Department of Physics, Saratov State University, Saratov 410026, Russian Federation
A. N E I M A N Center for Neurodynamics, University of Missouri at St. Louis, St. Louis, MO 63121, USA
9 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
23
Contents 1.
Introduction
2.
Synchronization in aperiodic stochastic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.................................................
27
25
2.1. Synchronizing the stochastic Schmitt trigger . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
2.2. Noise-enhanced synchronization of excitable systems 3.
.....................
Synchronization of periodic oscillations - the classical theory
...................
32 34
3.1. Amplitude and phase description of synchronization in periodically oscillating systems
..........................................
34
3.2. Resonance in periodically driven linear dissipative oscillators 3.3. Nonlinear oscillators: the Van der Pol oscillator
................
........................
39
3.5. Synchronization as phase and frequency locking . . . . . . . . . . . . . . . . . . . . . . . .
42
3.6. Mutual synchronization: two coupled Van der Pol oscillators 4.1. Langevin equation description
................
5.1. Phases in the analytic signal representation 5.2. Periodically driven chaotic systems
48
...............................
Synchronization of systems with complex dynamics
46 47
...................................
4.2. F o k k e r - P l a n c k equation description
6.
38
3.4. Bifurcation analysis of synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Synchronization in the presence of noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.
36
49
.........................
51
...........................
51
................................
54
5.3. Synchronization of stochastic systems with continuous output . . . . . . . . . . . . . . . .
56
5.4. Noise-enhanced synchronization of excitable media
60
Synchronization in stochastic point processes
......................
.............................
66
6.1. Phases for discrete events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2. Synchronization of noisy bistable systems by stochastic signals
66 ...............
6.3. Analytical approach to phase synchronization of bistable systems Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
.............
68 74 79 80
I. Introduction
One of the fundamental nonlinear phenomena observed in nature and, particularly, one basic mechanism of self-organization of complex systems [1] is synchronization. From the most general point of view synchronization is understood as an adjustment of some relations between characteristic times, frequencies or phases of two or more than two dynamical systems during their interaction. Synchronization has attracted much attention in different fields of natural sciences. For instance, applications of synchronization in engineering sciences [2] have achieved great practical importance and are widely employed. Moreover, and specifying to biophysics, several kinds of synchronization have been observed in biological systems. We would like to mention here, the behavior of cultured cells [3], of neurons [4,5,16] and of biological populations [6]. More complex types of synchronization have been reported recently for the human cardio-respiratory system [7] as well as in magnetoencephalography [8]. Synchronization of regular, chaotic and stochastic oscillations have been reported for ensembles of interacting oscillators [9-15]. It has resulted in a saturation of the growth of the attractor dimensions in arrays of oscillators and led to the appearance of stable spatio-temporal structures. The classical theory of synchronization distinguishes between forced synchronization by an external periodic driving force, and mutual synchronization between coupled oscillators. In both cases manifestations of synchronization are the same. They are determined by an interplay of time scales by: 9 phase locking or, respectively, natural frequency entrainment or due to 9 suppression of inherent frequencies. From the mathematical point of view the theory of synchronization of periodic selfsustained oscillators is well established [2,17,18]. If ~(t) is the phase of a periodic oscillator and ~(t) is the phase of external periodic force or, otherwise, the phase of another periodic oscillator coupled with the first one, then the condition of synchronization can be formalized as (1)
Ira. ~(t) - n. V(t)J < const,
where m and n are integers. This condition defines locking of two phases ~(t) and W(t) and requires that the phase difference should be bounded function in time. In other words, periodic oscillations should be in phase. Synchronization is also defined as frequency entrainment, providing that the frequencies of the oscillator and the driving force are in rational relation. In this paper we mostly consider the simplest case of 1:1 synchronization (m -- n = 1). More complicated synchronization regimes are discussed in paper [19] published in this book.
25
26
L. Schimansky-Geier et al.
Synchronization of periodic self-sustained oscillators in the presence of noise was studied in detail by Stratonovich [20]. The theory shows, that noise acts against synchronization in a sense that under the noise influence synchronization occurs only for a limited period of time. Recent developments of nonlinear dynamics have opened new aspects in the theory of synchronization. From the modern point of view, the regular (e.g. periodic or quasi-periodic) oscillatory regimes are only a small fraction of all possible types of dynamical behavior. With the increase of a degree of nonlinearity and the dimension of the phase space of the forced or the mutually interacting dynamical systems nonperiodic, or chaotic behavior is more typical. Also the effect of noise in nonlinear systems far from equilibrium is a nontrivial problem. Nonequilibrium noise might even change the qualitative behavior of the dynamical system, inducing new regimes which are absent in the noiseless system [21]. That is why the problem of an extension of the concept of synchronization to chaotic and to stochastic motions is of great interest and importance. Nowadays, there are few concepts of synchronization of chaos. 9 One of the first approaches of synchronization in chaotic dynamics has described the appearance of stable periodic motion if the system stands under the influence of external periodic force [22]. In this situation periodic oscillations occur for sufficiently strong amplitudes of the external force, hence, there is threshold amplitude. The chaotic motions are suppressed by the external periodic force. It does not exhaust all possible mechanisms of chaos synchronization. 9 Another approach frequently discussed in the literature [23,24,12] corresponds to the situation when chaotic systems are coupled and the corresponding dynamic variables of the sub-systems become completely identical. This approach has been extended to the more general case of generalized synchronization as the appearance of a functional dependence between the dynamical variables of the subsystems [25,14]. 9 A third concept of chaos synchronization based on a generalization of the classical notion of frequency entrainment and frequency suppression has been proposed in [26]. As was shown this concept works well for chaotic oscillators with a Feigenbaum-type scenario (that is period-doubling cascade) of the chaos onset. 9 Finally, in [27], the classical approach based on a generalization of the notion of an instantaneous phase to the case of chaotic motion has been proposed. This allowed to define synchronization of chaos as a phase locking effect, e.g. in terms of classical theory. It has also been shown, that a wide class of nonlinear systems can make use of noise, providing an optimal coherent behavior at a nonzero noise level in response to a second system or an inputting signal. Dynamical systems of this kind possesses a noise-controlled characteristic time scale which plays the role of natural period of a self-sustained oscillator. A double-well potential system driven by noise is a typical representative of this class. When two of such systems coupled, the mutual stochastic synchronization could be observed [28]. Otherwise, when stochastic bistable system is driven by a weak periodic force, the response of the system to this weak signal can be significantly amplified by tuning the noise intensity. This phenomenon, known as
Phase synchronization: from periodic to chaotic and noisy
27
stochastic resonance (SR) [30], has attracted recently much attention [31,32]. It has been shown that stochastic resonance can be characterized in terms of frequency entrainment [33,34] and phase locking [32,35-37]. Thus, recent studies have shown that synchronization from a general point of view can be extended to deterministic nonperiodic or chaotic and to stochastic systems. As we tried to show in this introductory section, synchronization includes a wide range of problems from different fields of nonlinear dynamics and statistical physics. It is therefore impossible to review all these problems in the framework of a single tutorial paper. The application of synchronization analysis to biomedical data will be discussed in the paper of Rosenblum et al. [19] of this book. In our contribution we will concentrate on the synchronization of stochastic system which exhibits the phenomenon of stochastic resonance and, therefore, possesses noisecontrolled time scales. That is why in Section 2 we present an illustrative example of how synchronization looks like in a periodically driven overdamped stochastic system. The classical case of synchronization of self-sustained oscillator is discussed in Section 3. Details of forced, mutual and synchronization under noisy perturbation are presented. Section 4 gives several examples of synchronization in chaos and excitable media. Synchronization of stochastic systems by external signals is considered in Sections 4 and 5. In Section 6 synchronization of SR systems is considered from the point of view of stochastic discrete processes.
2. Synchronization in aperiodic stochastic systems To exemplify how synchronization effects may look in a stochastic system which does not possess any deterministic time scale, we introduce in this section the effect of noise-induced synchronization in a stochastic Schmitt trigger driven by a weak periodic signal. We also consider an excitable system, the FitzHugh-Nagumo neuron model as another example of nonlinear dynamical system with noise-controlled characteristic time scale. 2.1. Synchronizing the stochastic Schmitt trigger
The Schmitt trigger is one of the best experimentally and theoretically investigated examples of nonlinear stochastic dynamics exhibiting SR [38]. The input/output characteristic of the Schmitt trigger is shown in Fig. 1. The input of this device is an arbitrary function of time ~np(t), but the output Vout(t) has only two possible values, say 4-1. The Schmitt trigger is characterized by a threshold voltage A V and it is bistable for the input voltage in the range - A V / 2 < I~np(t) < A V / 2 . Let us consider the input as a sum of the periodic signal s(t) - a sin(cOst + qb0) with the amplitude a, the frequency cOs and the initial phase ~0, and the Gaussian colored noise ~(t) with intensity D and correlation time %. The ideal Schmitt trigger can be modeled as y(t + At) - sign[Ky(t) - ~(t) - s(t)],
(2)
where y(t) stands for the dichotomic output voltage gout, K the threshold voltage, and sign(z) is +1 or - 1 for the positive or negative argument, respectively. We
28
L. Schimansky-Geier et al.
~ut
IA
I
V/np
.2
AV
Fig. 1.
Input/output characteristic of the Schmitt trigger.
suppose that the amplitude a of the signal is below the threshold and switches without noise do not occur. An example of the input and the output time series is shown in Fig. 2. Without signal switches y -~ - y occur on a time scale exponentially depending on the ratio of the threshold and the intensity of noise, T(D) cx exp AV/2D. Thus, for a small noise the mean time between switchings could be much larger than the half period of the signal Tp/2---~/0~s. Otherwise, for a large noise, the trigger switches so fast, that the mean time between transitions becomes much smaller than the half period of the signal. Intermediate situation of an optimal noise intensity, when the mean time between switchings matches approximately the half period of the signal corresponds to SR, when the response of the trigger to the weak periodic signal is maximal [31]. Originally, the phenomenon of SR is quantified in terms of the power spectrum of the output process. In the case of a weak signal, the power spectrum of the output can be decomposed into a continuous part, or noise background, and a discrete part with Dirac-delta functions at the signal frequency and its odd multiples (due to the symmetry of the Schmitt trigger even multiples are suppressed). This second contribution is called the signal part. The mentioned merging condition is obeyed for a given finite frequency of the input at a finite noise intensities. SR is manifested in the existence of a maximum in the dependence of the weight of the first peak in the power spectrum versus noise intensity. Moreover the spectral power amplification, that is the ratio of the weight of the first harmonic peak at the output to that at the input, can be significantly larger than 1. Therefore, the optimal amplification of the signal can be achieved by tuning noise intensity. How now does synchronization come into play? What would change if one finds that the input and the output are synchronized in the process of SR? What can be improved in the process of SR? Note that these questions are important for practical purposes concerning the agreement between the output and the input. If one would like to amplify a signal by SR, i.e. by adding noise, the most preferable situation is just a synchronized state comparable to a nonlinear amplifier. Investigations so far have proven that the averaged response at the output becomes maximal. But it does
Phase synchronization: from periodic to chaotic and noisy
(a)
29
x
i I i I I I I I I
Illl
I i I I I
I
,ll
II li II II
,al
I I I I
II II II
II
I I I I
i
I I I I
Ill,iriliiil! i i liiiiii,
0 !i! ! VVVVVV 1!VIIVVII I ' ., ', ,- ,
'. r
-,,,. ,,,,
....
I
,, .. - . ,,,, .il ,
(b) ]
t
Fig. 2.
Numeric simulations: (a) input signal and noise with A V = 0.3, a = 0.25, cos = 0.1, Vc = 0.1; (b) the corresponding dichotomic output signal.
not necessarily mean that the switchings at the output are phase locked with the input signal during a large number of periods. We will show now, that the synchronization of the switchings by external weak signal is possible. We have performed analog and numerical simulations of the periodically driven Schmitt trigger described above. Results are presented and discussed in detail in [34]. In these experiments we used a Schmitt trigger made with an operational amplifier with the threshold A V - 150 mV, Gaussian noise generator with the cut-off frequency is foff = 100 kHz and the periodic signal with the frequency f0 = 100 Hz. The output switching sequences were digitized and stored in a computer. From these sequences we calculated the mean switching frequency (that is the reciprocal of the
30
L. Schimansky-Geier et al.
time interval between switchings averaged over the whole number of switchings in a sequence) of the trigger as a function of noise voltage, signal amplitude and the results of the measurements are shown in Fig. 3a. For a weak signal the dependence of the mean frequency versus noise intensity follows the Kramers law cx 1 / T ( D ) . However, for sufficiently larger values of the amplitude being nevertheless lower than the trigger's threshold, the dependence of the mean frequency is qualitatively different. There a range of noise intensities appears in which the mean frequency does not change with the growth of the noise level and coincides in the limits of experimental error with the signal frequency. Thus, the effect of locking of the mean switching frequency is observed [34]. Repeating the measurements of the mean frequency for different values of the signal amplitude we will obtain synchronization regions on the parameter plane "noise intensity- signal amplitude" inside which the mean switching frequency equals the signal frequency. The synchronization regions similar to Arnold tongues are shown in Fig. 3b. As seen from this figure, there is a threshold value of the amplitude ath above which synchronization of switchings can be observed. After the achievement of the threshold value the dynamics of stochastic transitions is effectively controlled by the periodic signal despite the fact that the input periodic signal cannot reach the threshold by itself. The increase of the external frequency leads to a worsening of synchronization: the Arnold tongues shrink and the threshold value of the amplitude increases (see Fig. 3b). To verify that the output indeed merges with the input inside the Arnold tongues one may compare both using the Kullback entropy [32,39]. It is a widely used measure for the distance of two functions. We introduced a time step At and mapped the in- and output from the experiment onto binary strings with i~ - sign[s(t + kAt)] and o h - y ( t + k A t ) . The temporal interval was chosen At = Tp/12 where Tp is period of the input. In result we obtained two symbolic sequences of length n denoted i~ = i~, i2,..., i, and 6, - oi, o 2 , . . . , o,, for the in- and output, respectively.
(a)
(b)
2~
140
200 ~"
120
150
8O
30
40
50
60
70 Vn (mY)
80
90
100
10
40
20
30
~
40
~
50
~
70 Vnth80 60 Vn (mY)
~
90
1O0
Fig. 3. (a) Dependence of the mean switching frequency of the output signal of the Schmitt trigger versus noise voltage Vn for different values of the amplitude of periodic signal: a = 0 mV (A), a = 60 mV (D), a = 100 mV (*) (the results of analog simulation); (b) synchronization regions for the Schmitt trigger for different values of the external frequency: fo -- prl O0 Hz (*), f0 = 250 Hz (A), f0 -- 500 Hz (El).
Phase synchronization: from periodic to chaotic and noisy
31
We can now determine the probabilities of occurrence of certain "words" (that are various combination of symbols) at the input and at the output, p0i " - - pinn (/-s and Pi := p~ The Kullback entropy K[p~ is defined by K[p~
"-- Z p
i logP--~.
i
(3)
Pi
Here, p0 and p denote the input and the output probability distributions, both with respect to the same set of events. Therefore, K[p~ gives the average information gained when replacing p0 by p. In particular, it vanishes if the input and the output sequences are identical. Results for the Schmitt trigger are shown in Fig. 4; n ranges from 1 to 8. Common to all curves is a pronounced minimum for values of V~where the frequency is locked. This indicates that for these values of noise intensity both distributions maximally match. We note, that the value of the noise intensity V~ at which the Kullback entropy takes its minimum exactly corresponds to the onset of the synchronization region (see Fig. 3). Therefore, a periodic signal with a sufficiently large value of amplitude synchronizes the stochastic dynamics [39]. We underline again, that although the signal amplitude is not small, it is still insufficient to cause the switchings in the absence of noise. Thus, noise is a necessary component of synchronization, but this phenomenon requires signal amplitudes which are beyond the limits of linear response theory. Thus, synchronization of the stochastic Schmitt trigger is manifested by the mean switching frequency locking and by specific behavior of the Kullback entropy. In the following sections we will show that the classical description of this type synchronization in terms of phase locking is also possible. This provides further generalization of synchronization to stochastic systems which possess statistical noise-dependent characteristic frequencies instead of deterministic natural ones in conventional self-sustained oscillators.
1.0
I
"?
K __.~)----< f
0.0
~. 30
~-'~>-<~ " ~~ 40 50 60 70
~ 80
90
100
110
120
Vn
Fig. 4.
Kullback entropy vs noise intensity for different length n of binary sequences: a = 100 mV, all other parameters as in Fig. 3.
L. Schimansky-Geieret al.
32
2.2. Noise-enhanced synchronization of excitable systems As a second introductory example we consider stochastic excitable systems using the example of the FitzHugh-Nagumo model [40]. This model has appeared as a reduction of the four-dimensional Hodgkin-Huxley equations [41] to a two-dimensional system and represents basic properties of excitable systems. The model is described by two coupled equations for fast u(t) and slow w(t) variables:
~/~--U
//3 ~---w,
Zw w - u + a + v ~ D ~(t),
(4)
where e << 1, ~(t) is white Gaussian noise with intensity D, and a is the parameter of the system. From the point of view of simulations of a nerve activity u(t) represents the membrane potential, while w(t) refers to a slow recovery variable. The parameter a is responsible for the excitory properties of the system. In the absence of noise for a < 1 the system possesses a stable periodic solution (limit cycle) generating a periodic sequence of spikes. The parameter value a = 1 corresponds to the birth of the limit cycle, and for a > 1 the system is excitable: it possesses a stable equilibrium of focus type, however it generates spikes, or fires, under perturbations which exceed the threshold. The presence of noise leads to the existence of a peak at a nonzero frequency in the spectral density even in the excitable regime. The frequency of this peak is a function of both the noise intensity and the parameter a. A typical time series of noisy excitable system (4) is shown in Fig. 5. The important information about the stochastic activity of the system can be obtained from
tl
t2
t3
t4
t5
t6
7
8
0
I -1
-2
Fig. 5.
f 0
I
10
.....
/
,
20
30
Time series of system (4) for ~ = 0.01, a = 1.05, cy = ~ = 0.08. The upper part of the figure represents the process as a sequence of delta impulses (5).
Phase synchronization: from periodic to chaotic and noisy
33
the time sequence ti which correspond to the firing events (see upper part of Fig. 5). In this way we map the continuous stochastic process u(t) into a point process ti and therefore represent the spike train in the form O(3
Us(t) -- Z
(5)
~)(t- ti).
i:l
The mean firing rate, (f), which plays the role of the mean frequency is ( f ) = Xlim ~ ~ Z (1t i +Nl
--ti) -1.
(6)
i=1
In the excitable regime the mean firing rate grows exponentially with the increase of noise intensity and then saturates [42]. The same kind of dependence is observed with decreasing a while the noise level is small. The interspike intervals T/= ti+l - ti play the role of an instantaneous period of the process. The peculiarity of excitable systems is that the coherence of noise-induced activity can be optimized by tuning the noise strength. In [43] this effect was called coherence resonance and explained by different noise dependence of the activation time (that is the time needed to excite the system from the equilibrium state) and the recovery time (that is the time needed to return from the excited state to the equilibrium state) [44]. The appropriate measure of coherence of noisy spike train is the coefficient of variation (CV) of the interspike intervals which is widely used in neurophysiology
'
(7)
where
L. Schimansky-Geier et al.
34
1.1
. . . . . . . . .
,
0
5
t
~
. . . . . . . . .
,
. . . . . . . . .
,
. . . . . . . . .
,
. . . . . . . . .
10 15
0.9
0.7
0
> ro
0
5
10 15
5
10 15
' ' ' '
0.5
0.3
0 , 1
Fig. 6.
_
. . . . . . . .
0.0
,
i
.
0.1
.
.
.
.
.
.
.
.
i
.
0.2
.
.
.
.
.
.
.
.
i
.
0.3
0.4
0.5
Coefficient of variation (7) versus noise variance cy = v / ~ for e = 0.01, a = 1.05. The insets show the ISIH for the selected values of noise intensity.
exhibits a global maximum with respect to noise intensity and driving frequency. The coherence resonance amplifies SR as analytically shown in [46]. Additionally forced synchronization of the firing events was observed in [47] where stochastic Arnold tongues were obtained. On the other hand, synchronization of two coupled excitable noisy oscillators was recently studied in [48] both numerically and experimentally. Noise-enhanced synchronization of globally coupled excitable elements was a subject of study by several authors [49,50]. It was shown that the noise intensity can be used as a control parameter which is able to greatly increase the coherent behavior of the system via synchronization. This positive, e.g. ordering, role of noise has been observed also in extended excitable systems, including spatio-temporal stochastic resonance [51], noiseinduced spiral waves [52], noise-enhanced wave propagation [53,54]. Recently in Ref. [55] a new effect of noise-induced transition from pulsating spots to global oscillations in excitable media modeled by cellular automata has been reported. On the other hand, the authors of [56] studied synchronization of oscillators with randomly distributed natural frequencies. These phenomena will be discussed in detail in Section 5.4. 3. Synchronization of periodic oscillations - the classical theory
3.1. Amplitude and phase description of synchronization in periodically oscillating systems To study synchronization one has to assign a phase to the dynamical variable(s) of the system. First we discuss the notion of a phase for periodic motions. Later, in the following sections, this notion will be generalized for more complicated types of dynamics.
35
Phase synchronization: from periodic to chaotic and noisy
The term "phase" was originally introduced for harmonic processes with ~0) (see, for instance [17,57]). As long as x(t) is strictly periodic with a constant amplitude A one assigns ~(t) = cot + qb0 as the instantaneous phase. The amplitude and phase can be found from x(t) and a second conjugated variable, that is its time derivative 5c(t). This definition of amplitude and phase corresponds to the transition to the polar system of coordinates with the radius-vector A and the angle (I) given by
x ( t ) - A cos(cot +
A=
~/X
~2 tan~= 032~
2+~
JC Xco
(8)
if 03 is the circular frequency of the harmonic process which can be found, for example, comparing the maximal elongation of the coordinate and the velocity. A similar approach is the analytic extension of periodic signal on the complex plane. Instead of the real signal x(t) we introduce x(t) = A exp i(cot + qb0) = A[cos(cot + ~0) + i sin(cot + ~0)],
(9)
which is now complex. Harmonic oscillation is pictured as the rotation of the vector A with a constant angular velocity 03 (see Fig. 7) on the complex plane. The phase of oscillations corresponds to the angle of the vector A, and the angle ~0 at t - 0 is the initial phase. However, these definitions cannot be used directly in nonlinear dissipative systems since the oscillations in such systems are not harmonic. How to proceed in more complex situations? Several approaches will be discussed below for different examples. The particular procedure depends on both, the type of the process generated by the dynamical system and the structure of driven force. An often used first generalization for periodic oscillations will be discussed in this section. Suppose that x(t) is a nearly periodic process representing a nonlinear system. It can be written in the form [57]
Re x
Fig. 7.
The instantaneous phase of oscillations as a turn angle of vector A. q~0 = 0.
L. Schimansky-Geier et al.
36
x(t) = A(t) cos(~(t)),
~c(t) = v(t) = - A ( t ) cosin(tI)(t)),
(10)
where co is a characteristic frequency of the process. With this definition we introduce the notions of the instantaneous amplitude A(t) and the instantaneous phase 9 (t) of a nonlinear quasiharmonic oscillator. So far, (10) is simply a nonlinear variable transformation. O(t) can be further specified as ~ ( t ) = cot + ~(t) which distinguishes between a cot and a, possibly, slowly varying part ~(t). The later circumstance, whether ~(t) is slowly changing compared to cot depends on the choice of co and is generally conditioned by some requirements concerning the parameters of the oscillator. For example, a good selection for external synchronization in periodically driven system is that co equals to the driving frequency or its rational multiples [57] and external synchronization becomes perfect if ~(t) ~ const. Using the nonlinear transformation (10) we can rewrite the equations of motion of the oscillator in terms of amplitude and phase. Under an obeyed assumption of slowly changing A (t) and phase ~(t) these equations can be further simplified.
3.2. Resonance in periodically driven linear dissipative oscillators Let us consider a linear dissipative oscillator driven by periodic force 5~+ 75c + coZx - a cos(colt),
(11)
where 7 is the damping coefficient, and coo is the natural frequency of the oscillator. Without periodic force it has a single stable attractor at the origin, x0 = ~c0 - 0. Let us discuss the dynamical properties of this linear system in the amplitude-phase description. In the limit of small friction 7 << 2co0 and small amplitude of driving force the nonlinear transformation to the amplitude A(t) and phase ~(t) by (note that a and A have different dimensions here and later on)
x(t) = A(t) cos tI)(t), ~(t) = v(t) = -A(t)col sin tI)(t)
(12)
will give us well-known asymptotic solution. It is reasonable to decompose the phase variable tI)(t) into two parts of fast and slow motion: ~ ( t ) = colt + q~(t). The new function ~(t) represents the phase difference between driving force and the response of the system and is slowly changing function in comparison with the external force. We proceed with an usual technique employed in the theory of oscillations developed by Bogolubov and Mitropolski [57]. Inserting (12) into (11), gives a dynamical system which describes the time evolution of A(t) and ~(t). With the assumption made about the smallness of the friction coefficient and the applied force, we can suppose that the amplitude and the phase change on time scales larger than the period of applied force. It gives the possibility to average the dynamics of A and ~ over one period fixing their values during this period. In the result we derive the following approximate, the so-called reduced amplitude-phase equations: 7 d = - ~A - ~tsin q~,
~t + -- A - ~ cos q~.
(13)
37
Phase synchronization: from periodic to chaotic and noisy
Therein g = a/2co1 is the normalized amplitude of the force, A - ((o2_ o32)/20)1 coO -- co! is the frequency mismatch or the detuning between the natural frequency of the autonomous generator and the frequency of the external signal. The long-time asymptotics of the amplitude and the phase are given by the stationary solutions of (13) A0 =
a
,
v/(O;o _ 0,2)2 +
qb0 = - a r c t a n
7co-------L--t 0,0 -
(14)
which fully characterize the response of a linear dissipative oscillator to the periodic force. We note that expressions for the stationary amplitude and the phase shift (14) obtained in the framework of the amplitude-phase approximation are in full agreement with the exact theory discussed in textbooks. The response amplitude versus driving frequency is shown in Fig. 8 for different values of the friction coefficient 7. The phenomenon of resonance manifests itself in the abrupt increment of the amplitude of the forced oscillations when the external frequency 0 ) 1 coincides with the natural frequency COoof the oscillator. In detail, the peak is approached if
(01 --" COp - -
V/o) 2 -
72/2 ~
COo.
Resonance is characterized by
this typical dependence of the amplitude A(co~) which is called resonance curve.
4.0 ----
3.0
7=0.05
....
7-- 0.1
---
7--0.3
2.0
1.0
0.0
o.o
'
ols
'
11o - , i s
_..3
20
(0j
Fig. 8. Dependence of the amplitude of forced oscillations in system (11) on the external frequency for different values of dissipation coefficient 7. Other parameters are: COo-- 1.0, a = 0.2.
38
L. Schimansky-Geier et al.
3.3. Nonlinear oscillators: the Van der Pol oscillator 3.3.1. Self-sustained oscillations in the Van der Pol oscillator
The Van der Pol model is the most simple example for a Thomson type generator with an internal feedback. It is the prototype for many mechanical, electronic and biological systems exhibiting self-sustained oscillations. It is a generic model for the investigation of different types of synchronization if the oscillator is periodically driven or coupled with another oscillator. First we discuss the unperturbed regime of the autonomous Van der Pol oscillator. Afterwards we elucidate different types of bifurcations and elaborate when synchronization to an external periodic force appears. Finally, we consider mutual synchronization of two coupled Van der Pol oscillators. The autonomous Van der Pol oscillator is described by the second-order differential equation -
-
x
+
-
o,
(15)
where ~ is a small parameter characterizing the degree of nonlinearity and corresponding to the feedback strength in an electronic realization of the oscillator, COois the frequency of the oscillator. For -2CO0 < e < 0 this system possesses a single stable state of equilibrium of the focus type at the origin with the eigenvalues S1,2 -- -~ + i
-
.
(16)
Crossing e = 0, the system undergoes the A n d r o n o v - H o p f bifurcation and a limit cycle is born. For e > 0 the limit cycle is the single stable limit set of the system and its basin of attraction is the whole phase plane. From the physical point of view, the notion stable "limit cycle" corresponds to self-sustained oscillations. The properties of this regime, e.g. amplitude and frequency, do not depend on initial conditions and are fully determined by the internal properties of the system. For 0 < e << 1 the period of self-sustained oscillations is (Andronov-Hopf theorem) 2 rt ~/ g2 T = CO(----~, CO(~;)= COo 2 4'
(17)
for example, the frequency of oscillations is close to the natural frequency COoof the resonance circuit. 3.3.2. Periodically driven Van der Pol oscillators
To discuss synchronization we add a periodic force to the right-hand side of Eq. (15) 3~-- t;(1 - x2)k + to2x -- a cos(colt + r
(18)
We consider the case of small nonlinearities 0 < ~ << 1 and introduce the instantaneous phase and amplitude according to the nonlinear transformation (10)
Phase synchronization: from periodic to chaotic and noisy
39
with O(t) = 031t d-- dp(t). Sufficient conditions of slow amplitude and phase evolution is the choice of 03o ~ 031, small feedback e and that the amplitude of the periodic force also scales with e. The exact equations for the phase and the amplitude contain fast oscillating terms which can be neglected, taking into account frequency selectivity of the Van der Pol oscillator, by averaging over the period of external force. Omitting this procedure of simple but awkward transformations [57], we further discuss the slow dynamics of the first approximation for the instantaneous amplitude A(t) and phase eA J - -~- 1 -
- ~tsin q~,
~t qb -- A - ~cos ~,
(19)
again with ~ t - a/2031, the frequency mismatch A = (032_ 032)/2031 ~ 030- 03, and A0 = 2 being the amplitude of the stable cycle of the autonomous case. System (19) can be considered as a Poincar6 map of the original Van der Pol oscillator (18) where a stationary fixed point of system (19) corresponds to the periodic solution of the initial system (18), and a periodic solution of (19) corresponds to a quasi-periodic one of (1 8).
3.4. Bifurcation analysis of synchronization Assume that system (19) has a fixed point A - 0, qb - 0 as solution which is stable. The condition qb = 0 means that + - COl, that is, the frequency of the forced oscillations coincides with the frequency of the external force. Thus, the generator is tuned up to the external signal frequency and the effect of forced synchronization takes place. This phenomenon can be studied in more detail. The phase space of the dynamical system (19) is a two-dimensional cylinder. Setting the right-hand sides of system (19) equal to zero, we find the coordinates of the stationary fixed points pictured in Fig. 9a. One finds easily three different points with amplitudes A > 0: a stable node at O1, a saddle at O2 and an unstable node 03. The presence of the stable point O1 corresponds to the regime of synchronization (A -- const, qb -- const). If the detuning parameter A increases, the points O1 and 02 approach closer to each other, merge and disappear through a saddle-node bifurcation at a certain critical value of A (see Fig. 9b). As a result, the limit cycle C1 of second kind (that is, surrounding the whole cylinder) is born. This bifurcation corresponds to the lost of synchronization and a regime with two frequencies appears in the original system (18). Indeed, if d~(t) changes periodically then ~) ~: COl. The region in the space of the control parameters A, ~t in which the fixed point O1 remains stable is a synchronization region with frequencies relation 1:1. In order to find the boundaries of this region one solves the linearized equations of (19) in the vicinity of the fixed point O1 for its eigenvalues. A vanishing real part of the eigenvalues indicates the values of the control parameters where the stability of this point is lost or O1 disappears. This problem can be solved analytically and the results are shown in Fig. 10. Inside the first synchronization region (region I) the
L. Schimansky-Geier et al.
40
(a) A
/
//
.~t Fig. 9.
~~
. rd2
(b)
!,,\"~\
'NK
"
;
~/2
7[
-~
-rd2
0
~/2
Structure of the phase space of system (19) for different values of the parameters: (a) = 0.1,/a = 0.052, A = 0.026; (b) 8 = 0.1, ~t = 0.052, A = 0.032.
Fig. 10. Bifurcation diagram for systems (19) and (18). The lines la and lc correspond to the saddle-node bifurcation in system (19); lb is the line of the Andronov-Hopf bifurcation in system (19); ld is the bifurcation line of the cycle C2 crisis; lh is the bifurcation line of torus birth from the resonant cycle in the original system (18).
Inside the first synchronization region (region I) the fixed point O1 is a stable node. On the lines la the saddle-node bifurcation of O1 and O2 takes place. Moving from I to III by crossing the lines la corresponds to the birth of the limit cycle C1. Thus, the synchronous regime is destroyed by increasing the absolute frequency mismatch or by decreasing the effective amplitude of the external force. The bifurcation points B and C are of cusp type, where all three fixed points O1, O2 and O3 merge. The line lc corresponds to the merging and disappearance of another pair of fixed points, O2 and O3. This line can be determined by the stability analysis of these points. However, since the fixed point O1 is unaffected by this local bifurcation, it still exists and is stable above the line lc (see Fig. 11a). Therefore, inside region II synchronization still takes place. Further, let us increase the absolute value of the detuning with fixed μ, above lc in region II. Starting from a certain critical value of |Δ| the point O1 loses its stability through the Andronov-Hopf bifurcation, indicated in Fig. 10 by the lines lb. Outside the region bounded by lb a stable limit cycle C2 of the first kind (lying fully within the surface between −π and π) appears. This bifurcation again destroys the synchronous oscillations, as the original system (18) performs quasiperiodic oscillations (see Fig. 11b). The picture of simple bifurcations is completed if the detuning parameter is increased further. This leads to a nonlocal homoclinic bifurcation (crisis): the cycle C2 disappears and, as a consequence of this bifurcation, the cycle C1 of the second kind is born (see Fig. 11c). The lines ld correspond to the crisis of the cycle C2. Thus, the regions of 1:1 synchronization for the Van der Pol oscillator are bounded by the bifurcation lines la below the points B, C and by the lines lb beyond these points. Inside the regions of synchronization the original system (18) exhibits a stable limit cycle whose frequency coincides with the frequency of the external force. This means that both the frequency and the phase are locked by the external force. Let us summarize the bifurcations described above. Assume that the parameters of the oscillator are inside region I of Fig. 10. As seen from Fig. 9a, inside region I the separatrices of the saddle O2 point into the stable node O1. Both points lie on the closed invariant curve l going around the full cylinder.
Fig. 11. Phase portraits of system (19) for different values of the parameters: (a) ε = 0.1, μ = 0.056, Δ = 0.028; (b) ε = 0.1, μ = 0.056, Δ = 0.031; (c) ε = 0.1, μ = 0.056, Δ = 0.033.
This curve is an image of a two-dimensional resonant torus in the original system (18). The stable point corresponds to the synchronous regime of the oscillator. At the onset of the saddle-node bifurcation on the lines la of Fig. 10 the curve l takes the form of an ergodic curve, being the image of a two-dimensional ergodic torus in the original phase space. In region III the motion is quasiperiodic and is represented by an ergodic torus in the phase space of the system. On the line lc the ergodic torus is destroyed and the invariant curve disappears. However, the point O1 still exists and is stable. Therefore, the synchronization regime in region II is no longer related to a resonance on the torus, but to a stable limit cycle of the original system (18). At the transition from region III to region II the synchronous regime is reached by passing through the bifurcation lines lb. Quasiperiodic oscillations in system (18) disappear softly and the regime of the stable limit cycle arises. This mechanism is called synchronization via asynchronous suppression of oscillations. In terms of the reduced equations (19) this mechanism corresponds to the suppression of periodic oscillations of the amplitude A(t) and to the appearance of the regime with A = const [58]. The bifurcation lines la and lb converge in points D, which are called Bogdanov-Takens points.

It is clear that the phenomena for higher values of the parameter μ, especially the crisis, cannot be exactly described by the reduced equations (19), derived under the assumption of small forcing. In this case numerical methods should be applied to the original system (18) to build the bifurcation lines. The results of the numerical study are incorporated in Fig. 10. For the case of weak external driving (μ < 0.05 in region I) numerical and analytical results coincide completely. Also the lines of the Andronov-Hopf bifurcation (lines lh in Fig. 10) and the Bogdanov-Takens points D were confirmed numerically. However, this good quantitative agreement disappears above and to the left and right, respectively, of the points D (torus birth lines). Nevertheless, it is important to emphasize that the numerically obtained bifurcations inside regions I and II, as well as their borders, coincide with the results of the analytical study of the reduced dynamics. Moreover, these bifurcations are typical for any periodic self-sustained oscillator synchronized by an external periodic force in the case of small detuning.

3.5. Synchronization as phase and frequency locking
Though the bifurcation scenario depends significantly on the dynamics of the instantaneous amplitude, synchronization can be well represented by the long-time behavior of the instantaneous frequency of the oscillator (18). The instantaneous frequency is given by the time derivative of the instantaneous phase,

$$\omega(t) = \frac{\mathrm{d}\Theta(t)}{\mathrm{d}t}, \tag{20}$$

and is the left-hand side of the phase dynamics. Obviously, this additional reduction of the dynamics, allowing the consideration of the phase dynamics only, requires further assumptions on the amplitude dynamics. It will be valid for the interesting situation of a small amplitude of the external periodic force, ε > a/(ω₁A₀), where
A₀ is the amplitude of the unperturbed Van der Pol oscillator. This condition can be established in region I of the bifurcation diagram of Fig. 10. It is possible to show that in this case the amplitude changes much faster than the phase (to do so we compared the coefficients of the linear terms, which define the relaxational time scales). That is why we can substitute φ = const into the first equation of system (19) and use the unperturbed amplitude A₀ in the equation for the phase:

$$\frac{\mathrm{d}\phi}{\mathrm{d}t} = \Delta - \frac{\mu}{A_0}\cos\phi. \tag{21}$$
This equation is one of the canonical equations in the theory of phase synchronization [59]. It can be re-written in potential form, φ̇ = −dU(φ)/dφ, with the potential U(φ) = −Δφ + (μ/A₀) sin φ. Therefore, the dynamics of the phase difference φ can be viewed as the motion of an overdamped particle in the tilted potential U(φ) (see Fig. 12). The detuning parameter Δ determines the slope of the potential and μ/A₀ gives the height of the potential barriers. For Δ < μ/A₀ the minima of the potential, φ_k = arccos(ΔA₀/μ) + 2πk, exist and correspond to synchronization, as the instantaneous phase difference remains constant in time. The instantaneous frequency is constant and matches the driving frequency in the regime of synchronization. Otherwise it changes in time and we have to calculate the mean frequency as $\langle\omega\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T \omega(t)\,\mathrm{d}t$. The dependence of the mean frequency versus detuning is shown in Fig. 13 by the solid line. As clearly seen from this figure, the mean frequency coincides with the external frequency ω₁ in a finite range of Δ. The plateau in Fig. 13 corresponds to the synchronization region. Outside this region, ω(t) differs from the external frequency and two-frequency oscillations occur. If we increase the detuning further, then higher-order regimes of synchronization can occur. To study these regimes we introduce the ratio of the driving frequency to the mean frequency of the oscillator, θ = ω₁/⟨ω⟩, which is also called the winding number. This ratio tells how many periods of the external force are contained within one period of the oscillator.
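The locking plateau of Fig. 13 (the noise-free curve) follows directly from Eq. (21). The sketch below (a minimal illustration; μ/A₀ and the range of Δ are assumed values, not taken from the text) integrates the phase equation for a range of detunings and estimates the beat frequency ⟨φ̇⟩ = ⟨ω⟩ − ω₁.

```python
import numpy as np

def beat_frequency(delta, mu_over_A0, dt=1e-2, T=5000.0):
    """Time-averaged d(phi)/dt for the reduced phase equation (21)."""
    phi = 0.0
    for _ in range(int(T / dt)):
        phi += dt * (delta - mu_over_A0 * np.cos(phi))
    return phi / T

mu_over_A0 = 0.026                            # assumed value of mu/A0
for delta in np.linspace(-0.06, 0.06, 13):
    print(f"Delta = {delta:+.3f}   <omega> - omega_1 = {beat_frequency(delta, mu_over_A0):+.4f}")
# for |Delta| < mu/A0 the beat frequency is (numerically) zero: the plateau of Fig. 13;
# outside the plateau it approaches sign(Delta) * sqrt(Delta**2 - (mu/A0)**2)
```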
Fig. 12. Schematic potential profile U(φ) in the case of phase locking.
Fig. 13. Dependence of the difference between the mean frequency of oscillations in system (18) and the frequency of the external signal vs. the detuning parameter for different values of the noise intensity (D = 0, 0.02, 0.07).

Up to now we have studied the regime of 1:1 synchronization, when θ = 1. As we already know, this situation corresponds to the existence of the resonant stable limit cycle on a two-dimensional torus. However, with the increase of the driving frequency ω₁, θ also increases and can take both rational and irrational values. The structure of the phase trajectories on the torus will undergo bifurcations. Irrational values of θ belong to ergodic motion on the torus. In this case the phase trajectories cover the whole surface of the torus. Rational values of θ conform to resonant limit cycles lying on the torus surface. Such resonant motion on the torus is unambiguously related to synchronization, with locked frequency relations corresponding to the winding number. Some regions of high-order synchronization for different winding numbers are qualitatively presented in Fig. 14a. These regions are called "Arnold tongues". The rational values m:n of the winding number are indicated by the numbers in the plot. As additionally seen from the figure, the tongues are topologically equivalent to the synchronization region at the basic tone 1:1. The phenomenon of synchronization, whose mathematical image is represented by a resonant torus with winding number θ = m:n, can be described using the circle map. The Poincaré section along the small torus circle gives rise to a one-dimensional map of the circle to itself. It has the form

$$\phi_{n+1} = \phi_n + f(\phi_n), \qquad f(\phi_n) \equiv f(\phi_n + 2\pi k). \tag{22}$$
Each iteration of the map corresponds to one turn of a phase trajectory along the large torus circle and, in the general case, leads to a shift of the representative point on the circle by a certain angle.
Fig. 14. (a) Typical resonance regions for the indicated values of the winding number; (b) dependence of the winding number vs. the detuning parameter.
If a finite number of points is fixed on the circle as n → ∞, then we observe the image of a resonant torus. If the number of points is infinite and they cover the circle densely, then we deal with the image of an ergodic torus in the form of an invariant circle. The circle map is governed by the following difference equation:

$$x_{k+1} = x_k + \delta - \frac{K}{2\pi}\sin(2\pi x_k), \qquad \mathrm{mod}\ 1. \tag{23}$$
For K = 0 the parameter δ represents the winding number, which characterizes the ratio of the two frequencies of the uncoupled oscillators. If 0 < K < 1, the map (23) may have a period-n cycle (n = 1, 2, ...) even in cases when δ is an irrational number. In this case we deal with the effect of synchronization. To illustrate this, let us calculate the winding number θ,

$$\theta = \lim_{k\to\infty}\frac{x_k - x_0}{k}, \tag{24}$$
as a function of the parameter δ (in addition one needs to omit the operation mod 1). The results are shown in Fig. 14b and indicate the presence of plateaus, which correspond to synchronization regions with different rational winding numbers θ = m:n. The graph in Fig. 14b also demonstrates the property of self-similarity. The self-similarity manifests itself in the fact that between any two plateaus with winding numbers θ₁ = r:s and θ₂ = p:q there always exists one more region of synchronization with winding number θ = (r + p)/(s + q). For this reason, the dependence θ(δ) in Fig. 14b is called the "devil's staircase". On the parameter plane (K, δ) the synchronization regions, inside which the winding number θ = m:n is rational, form Arnold tongues.
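The devil's staircase of Fig. 14b can be reproduced with a few lines of code; the sketch below (parameter values are illustrative) iterates the circle map (23) without the mod-1 operation and evaluates the winding number (24).

```python
import numpy as np

def winding_number(delta, K, n_transient=1000, n_iter=5000):
    """Winding number (24) of the circle map (23); mod 1 is omitted so that
    the accumulated rotation x_k - x_0 can be measured directly."""
    step = lambda x: x + delta - K / (2 * np.pi) * np.sin(2 * np.pi * x)
    x = 0.0
    for _ in range(n_transient):
        x = step(x)
    x0 = x
    for _ in range(n_iter):
        x = step(x)
    return (x - x0) / n_iter

K = 0.9                                        # 0 < K < 1: invertible circle map
for delta in np.arange(0.0, 1.0001, 0.05):
    print(f"delta = {delta:.2f}   theta = {winding_number(delta, K):.4f}")
# plateaus at rational theta (1/2, 1/3, 2/3, ...) are the steps of the devil's staircase
```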
3.6. Mutual synchronization: two coupled Van der Pol oscillators

So far we were concerned with forced synchronization, when the driving influence on the oscillator is unidirectional, without feedback to the force. However, let us imagine that the periodic force originates from a second Van der Pol oscillator and that both generators interact but have different natural frequencies ω₀₁ and ω₀₂. The interaction is symmetric and we assume that each oscillator is driven additively by the other one, proportionally to the difference of their coordinates and with strength γ. From the physical point of view it may be realized by a spring with constant γ which tries to synchronize the motion of the two oscillators. The particular equations are:

$$\ddot{x}_1 - \varepsilon(1 - x_1^2)\dot{x}_1 + \omega_{01}^2 x_1 = \gamma(x_2 - x_1), \qquad \ddot{x}_2 - \varepsilon(1 - x_2^2)\dot{x}_2 + \omega_{02}^2 x_2 = \gamma(x_1 - x_2), \tag{25}$$
and starting from here we call γ the coupling parameter. The question which should be sketched here is: is it possible to observe the effect of synchronization in this case, and what are its peculiarities? The answer can be given qualitatively by means of an analysis of the bifurcations in system (25). The structure of the bifurcation diagram for system (25) on the parameter plane is pictured in Fig. 15. As seen from the figure, the bifurcation diagram for the case of the two coupled oscillators is topologically equivalent to the situation in Fig. 14a. From the viewpoint of bifurcation analysis, the case of mutual synchronization is completely equivalent to the earlier studied case of forced synchronization.
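A direct simulation of Eq. (25) illustrates mutual frequency locking. The following sketch (all parameter values are assumptions chosen for illustration, not the values used for Fig. 15) integrates the two coupled oscillators and compares their mean frequencies obtained from the unwrapped Hilbert phases.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.signal import hilbert

def coupled_vdp(t, s, eps, w1, w2, gamma):
    """Two symmetrically coupled Van der Pol oscillators, Eq. (25)."""
    x1, v1, x2, v2 = s
    return [v1, eps * (1 - x1**2) * v1 - w1**2 * x1 + gamma * (x2 - x1),
            v2, eps * (1 - x2**2) * v2 - w2**2 * x2 + gamma * (x1 - x2)]

eps, w1, w2, gamma = 0.2, 1.0, 1.02, 0.1       # assumed parameters
t = np.linspace(0.0, 2000.0, 200000)
sol = solve_ivp(coupled_vdp, (t[0], t[-1]), [2.0, 0.0, -1.0, 0.0],
                t_eval=t, args=(eps, w1, w2, gamma), rtol=1e-8)

for idx, name in ((0, "oscillator 1"), (2, "oscillator 2")):
    phase = np.unwrap(np.angle(hilbert(sol.y[idx])))
    print(name, "mean frequency:", (phase[-1] - phase[0]) / (t[-1] - t[0]))
# for gamma = 0 the two frequencies differ; for sufficiently strong coupling they coincide (1:1 locking)
```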
Fig. 15. Resonance regions for system (25) on the parameter plane "detuning-coupling". The parameters are p = ω₀₂/ω₀₁, ε = 2.0.
Concluding this section, we point out that the analysis of the types of synchronization allows us to formulate some fundamental properties and criteria of synchronization. The fact that we first considered the case of forced synchronization is not of principal importance. It is important that the external force is periodic. This follows, in particular, from the qualitative consideration of the dynamics of two symmetrically coupled generators. It is known that oscillations in real generators are periodic but not harmonic. The basic indication of both forced and mutual synchronization is the appearance of an oscillatory regime with a constant and rational winding number θ = m:n which holds in some finite region of the system's parameter space. This region is called the synchronization region and is characterized by the effects of phase and frequency locking. Frequency locking means a rational ratio of the two initially independent frequencies, ω₁/ω₂ = m:n, everywhere in the synchronization region. Phase locking means that the instantaneous phase difference is constant in the synchronization region (φ̇ = 0, φ_st = const).

4. Synchronization in the presence of noise
The above-considered problems do not take into account the presence of random perturbations. Noise is inevitably present in any real system, in the form of natural (or internal) fluctuations caused by the presence of dissipation as well as in the form of random perturbations from the environment. The introduction of a phase in a noisy oscillating system requires a probabilistic approach. With the transformation of the previous section,

$$x(t) = A(t)\cos\Phi(t), \qquad \dot{x}(t) = v(t) = -A(t)\,\omega_1\sin\Phi(t), \tag{26}$$
the instantaneous amplitude and phase become stochastic variables, since x(t) and ẋ(t) are stochastic. With noise taken into account, the amplitude and phase dynamics are described by stochastic differential equations including a noise term ξ(t). In the physical literature this type of equation is called a Langevin equation, first developed for Brownian motion. To extract information from the stochastic dynamics we have to calculate the moments of A(t), Φ(t) and ω(t) = Φ̇(t), or consider the transition probability density P(A, Φ, t | A₀, Φ₀, t₀), which is sufficient in the Markovian approximation. It gives the conditional probability to observe the amplitude A and the phase Φ at time t if started at time t₀ with A₀ and Φ₀, respectively. In noisy systems the phase Φ(t), as well as the difference φ(t) = Φ(t) − ω₁t with respect to the external driving, performs a motion similar to that of a Brownian particle in the potential U(φ) (see Fig. 12). The stochastic process φ(t) can be decomposed into two parts: a deterministic part given by its mean value or the mean value of the instantaneous frequency, and a fluctuating part characterized, for example, by the diffusion coefficient around its mean value. Synchronization as a fixed relation between two phases is always interrupted by randomly occurring abrupt changes of the phase difference, also known as phase slips. Therefore, in noisy oscillating systems the notion of synchronization must be expressed mathematically by relations and
conditions between the moments of the fluctuating phase or its corresponding probability density. The noise influence on the periodically driven Van der Pol generator was first studied in detail by Stratonovich [20]. He considered different types of noise and found conditions of synchronization perturbed by noise. We will restrict ourselves to basic results for weak additive Gaussian white noise.
4.1. Langevin equation description

The stochastic force ξ(t) is added to the deterministic differential equation of a periodically driven Van der Pol oscillator as

$$\ddot{x} - \varepsilon(1 - x^2)\dot{x} + \omega_0^2 x = a\cos(\omega_1 t + \phi_0) + \sqrt{2D}\,\xi(t). \tag{27}$$
For simplicity we argue that the noise is a part of the external driving. We assume ξ(t) to be Gaussian white noise with zero mean, and the new parameter D is the noise intensity. Following [20], with the ansatz (26) we can obtain reduced equations for the stochastic amplitude and phase difference:

$$\dot{A} = \frac{\varepsilon A}{2}\left(1 - \frac{A^2}{A_0^2}\right) - \mu\sin\phi + \frac{D}{2A\omega_1^2} + \frac{\sqrt{2D}}{\omega_1}\,\xi_1(t), \qquad \dot{\phi} = \Delta - \frac{\mu}{A}\cos\phi + \frac{\sqrt{2D}}{A\,\omega_1}\,\xi_2(t), \tag{28}$$

where ξ₁,₂ are statistically independent Gaussian noise sources: ⟨ξᵢ(t)ξⱼ(t + τ)⟩ = δᵢ,ⱼδ(τ) and ⟨ξᵢ(t)⟩ = 0, with i, j = 1, 2. Again let us consider the most interesting situation corresponding to region I in Fig. 10. With small noise, D ≪ 2εA₀², and a weak external signal, the probability distribution of the amplitude is centered at A ≈ A₀, i.e., the amplitude remains very close to its unperturbed value A₀. This gives us the possibility to consider the second equation of (28) separately by substituting A₀ instead of A:

$$\dot{\phi} = \Delta - \frac{\mu}{A_0}\cos\phi + \frac{\sqrt{2D}}{A_0\,\omega_1}\,\xi_2(t). \tag{29}$$
Therefore, the dynamics of the phase difference φ can be viewed as the motion of an overdamped Brownian particle in the tilted potential U(φ) (see Fig. 12), with the slope defined by the detuning. The parameters μ/A₀ and Δ define the height of the potential barriers. The presence of noise leads to diffusion of the instantaneous phase difference in the potential U(φ): φ(t) fluctuates for a long time inside a potential well (which means phase locking) and rarely jumps from one potential well to another (i.e., displays phase slips), changing by 2π. Time series of the phase difference for different values of the noise intensity D, obtained by numerical integration of Eq. (29), are shown in Fig. 16. As clearly seen from this figure, for small noise intensities (D = 0.02) the instantaneous phase difference remains bounded during a long observation time.
Fig. 16. Time dependence of the instantaneous phase difference for the indicated values of noise intensity (D = 0.02, 0.07, 0.22). Other parameters are Δ = 0.06, ε = 0.15.
The increase of the noise intensity leads to a decrease of the residence times inside a potential well and causes hopping dynamics of the phase difference (D = 0.07). Although the phase-locking epochs, φ ≈ const, are clearly seen, the mean value of the phase difference increases in time. Evidently, for a large slope (detuning) and for a small amplitude of the periodic force the jumps from one metastable state to another become very frequent and the duration of the phase-locking segments becomes very short. This leads to the growth of the phase difference (see the dependence of φ(t) for D = 0.22), causing a change of the mean frequency of oscillations.
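The phase-slip dynamics of Fig. 16 can be reproduced by an Euler-Maruyama integration of Eq. (29). In the sketch below Δ follows the caption of Fig. 16, whereas μ, ω₁ and A₀ are assumed values chosen so that the locking condition μ/A₀ > Δ holds.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_phase(delta, mu, D, omega1=1.0, A0=2.0, dt=1e-2, T=6000.0):
    """Euler-Maruyama integration of the noisy phase equation (29)."""
    n = int(T / dt)
    phi = np.empty(n)
    phi[0] = 0.0
    kick = np.sqrt(2.0 * D * dt) / (A0 * omega1)     # noise increment per step
    for i in range(1, n):
        drift = delta - (mu / A0) * np.cos(phi[i - 1])
        phi[i] = phi[i - 1] + dt * drift + kick * rng.standard_normal()
    return phi

for D in (0.02, 0.07, 0.22):                          # noise levels of Fig. 16
    phi = noisy_phase(delta=0.06, mu=0.16, D=D)
    slips = int(round((phi[-1] - phi[0]) / (2.0 * np.pi)))
    print(f"D = {D}: net number of 2*pi phase slips = {slips}")
```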
4.2. Fokker-Planck equation description

The Fokker-Planck equation corresponding to the stochastic differential equation (29) is (further on we assume A₀ = 1 and replace D → 2Dω₁² for simplicity)

$$\frac{\partial p(\phi, t)}{\partial t} = -\frac{\partial}{\partial\phi}\left[(\Delta - \mu\cos\phi)\,p(\phi, t) - D\,\frac{\partial p(\phi, t)}{\partial\phi}\right]. \tag{30}$$
The phase difference φ is an unbounded variable and the stochastic process defined by Eq. (30) is nonstationary. However, since the coefficients of the Fokker-Planck equation are periodic with respect to φ, we can introduce the probability distribution P(φ, t) of the wrapped phase, which is bounded in [−π, π]:

$$P(\phi, t) = \sum_{n=-\infty}^{\infty} p(\phi + 2\pi n,\, t). \tag{31}$$
The Fokker-Planck equation for P(φ, t) has the same structure as Eq. (30), but now we can find the stationary probability density P_st(φ), taking into account the periodic boundary conditions P(−π, t) = P(π, t) and the normalization condition ∫_{−π}^{π} P(φ, t) dφ = 1 [20]:
$$P_{\mathrm{st}}(\phi) = \frac{N}{D}\int_{\phi}^{\phi+2\pi}\exp\!\left[\frac{U(\psi) - U(\phi)}{D}\right]\mathrm{d}\psi, \qquad U(\phi) = -\Delta\phi + \mu\sin\phi, \qquad -\pi \le \phi \le \pi, \tag{32}$$
where N is the normalization constant. In the particular case Δ = 0, i.e., when the natural frequency of the oscillator matches exactly the driving frequency, the stationary probability density of the wrapped phase difference takes the simple form

$$P_{\mathrm{st}}(\phi) = \frac{1}{2\pi I_0(\mu/D)}\exp\!\left[\frac{\mu}{D}\cos\!\left(\phi + \frac{\pi}{2}\right)\right], \qquad -\pi \le \phi \le \pi, \tag{33}$$
where I₀(z) is the modified Bessel function. For a large noise intensity I₀(μ/D) ≈ 1 and exp[(μ/D) cos(φ + π/2)] ≈ 1, thus the stationary probability density tends to the uniform one, P_st(φ) = 1/(2π). This situation corresponds to the absence of synchronization. Otherwise, for very weak noise, cos(φ + π/2) ≈ 1 − (φ + π/2)²/2 and I₀(μ/D) ≈ exp(μ/D)/√(2πμ/D), and the stationary probability density has a Gaussian shape, P_st(φ) ≈ √(μ/(2πD)) exp[−μ(φ + π/2)²/(2D)], centered at φ₀ = −π/2. The well-expressed Gaussian peak in the stationary probability density of the phase difference indicates phase locking. In the limit D → 0 the probability density becomes a δ-function: lim_{D→0} P_st(φ) = δ(φ + π/2). The mean frequency of oscillations ⟨ω⟩ can be found via the stationary probability density P_st(φ) of the wrapped phase difference:

$$\langle\omega\rangle = \langle\dot{\phi}\rangle + \omega_1 = \omega_1 + \int_{-\pi}^{\pi}\left[\Delta - \mu\cos\phi\right]P_{\mathrm{st}}(\phi)\,\mathrm{d}\phi, \tag{34}$$
where ω₁ is the frequency of the synchronizing signal. The dependence of the difference between the mean frequency and the external frequency versus the detuning parameter is presented in Fig. 13 (curves 2 and 3) for different values of the noise intensity. With the increase of noise intensity the region of frequency locking shrinks, which is another manifestation of the noise-induced breakdown of synchronization in the Van der Pol oscillator. Let us now go back to Fig. 16, where the noise-induced diffusion of the unwrapped phase difference is shown. Let the distribution of the phase difference initially be concentrated at some value φ₀, P(φ, t = 0) = δ(φ − φ₀), so that the variance ⟨φ²(t = 0)⟩ − ⟨φ(t = 0)⟩² = 0. Due to noise the phase difference diffuses according to the law [20] ⟨φ²(t)⟩ ∝ D_eff · t, where D_eff is the effective diffusion constant which measures the rate of diffusion:
$$D_{\mathrm{eff}} = \frac{1}{2}\,\frac{\mathrm{d}}{\mathrm{d}t}\left[\langle\phi^2(t)\rangle - \langle\phi(t)\rangle^2\right]. \tag{35}$$
In the absence of noise there is no phase diffusion, D_eff = 0. With the increase of noise intensity the effective diffusion constant also increases, so that the diffusion is sped up. In Fig. 16 this situation corresponds to very frequent phase slips. The effective diffusion constant is therefore connected to the mean duration of the phase-locking epochs:
the longer the phase-locking segments, the slower the spreading of the phase difference and, thus, the smaller the effective diffusion constant. An analytical estimate of the effective diffusion constant for the case of Eq. (29) can be obtained by solving the Kramers problem [60] of the escape from a well of the potential U(φ) [20]:

$$D_{\mathrm{eff}} = \frac{\sqrt{\mu^2 - \Delta^2}}{2\pi}\left[1 + \exp\!\left(-\frac{2\pi\Delta}{D}\right)\right]\exp\!\left[-\frac{2}{D}\left(\sqrt{\mu^2 - \Delta^2} - \Delta\arcsin\frac{\Delta}{\mu}\right)\right]. \tag{36}$$
Thus, the effective diffusion measures the number of 2π-jumps of the phase difference per unit time and grows exponentially with the increase of noise intensity. Due to the phase diffusion, the definition of synchronization in the presence of noise appears to be "blurred". That is why the conditions of synchronization should be defined in a statistical way, by using the notion of effective synchronization [29]. This can be done by imposing restrictions on some statistical measures of the corresponding stochastic processes. In particular, effective synchronization can be defined based on: (i) the stationary probability density of the wrapped phase difference: a peak in P_st(φ) should be well expressed in comparison to the uniform distribution; (ii) the mean frequency: it should match (up to some small statistical error) the driving frequency; (iii) the effective diffusion constant: this measure should be small enough that the phase-locking segments are much longer than the period of the external force. In other words, this restriction requires that the phase of the oscillator is locked during a considerable number of periods of the external signal and can be expressed as

$$D_{\mathrm{eff}} \leqslant \frac{\omega_1}{2\pi n}, \tag{37}$$
where n ≫ 1 is the number of periods of the external force. Using these definitions of effective synchronization we can define synchronization regions in the parameter space.
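The three criteria above can be tested on simulated data. The sketch below (assumed parameter values; A₀ = 1 and ω₁ = 1) evolves an ensemble of trajectories of Eq. (29), estimates ⟨ω⟩ as in (34) and D_eff as in (35), and checks criterion (37).

```python
import numpy as np

rng = np.random.default_rng(1)

def phase_ensemble(delta, mu, D, n_traj=200, dt=1e-2, T=2000.0):
    """Ensemble Euler-Maruyama integration of Eq. (29) with A0 = 1, omega_1 = 1."""
    phi = np.zeros(n_traj)
    for _ in range(int(T / dt)):
        phi += dt * (delta - mu * np.cos(phi)) \
               + np.sqrt(2.0 * D * dt) * rng.standard_normal(n_traj)
    return phi, T

delta, mu, omega1, n_periods = 0.02, 0.1, 1.0, 100
for D in (0.01, 0.05, 0.2):
    phi, T = phase_ensemble(delta, mu, D)
    mean_freq = omega1 + phi.mean() / T                    # estimate of <omega>, cf. Eq. (34)
    D_eff = 0.5 * phi.var() / T                            # variance grows linearly, cf. Eq. (35)
    locked = D_eff <= omega1 / (2.0 * np.pi * n_periods)   # criterion (37) with n = 100
    print(f"D = {D}: <omega> = {mean_freq:.4f}, D_eff = {D_eff:.2e}, "
          f"effectively synchronized over {n_periods} periods: {locked}")
```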
5. Synchronization of systems with complex dynamics

5.1. Phases in the analytic signal representation

Complex behavior requires new concepts of the phase definition compared with the case of harmonic and quasi-harmonic oscillations. In this section we consider a possible generalization of the definition of the instantaneous phase based on the analytic signal representation. We do not aim at full generality, since there are many other possibilities and the choice of an applicable definition depends strongly on the dynamical system and the signals under consideration. Nevertheless, we will show that using this concept, synchronization of nonperiodic and even
stochastic oscillations can be studied within the classical notion of synchronization presented in the previous sections for periodic oscillations. The concept of the analytic signal [61,62] is a generalization of the analytic expansion of oscillations as presented in Section 3.1. For a signal x(t) one constructs a complex signal w(t) by
$$w(t) = x(t) + iy(t) = A(t)\,e^{i\Phi(t)}. \tag{38}$$
The definition of the instantaneous amplitude and phase is straightforward
$$A(t) = \sqrt{x^2(t) + y^2(t)}, \qquad \Phi(t) = \arctan\!\left(\frac{y}{x}\right), \tag{39}$$
as well as of the instantaneous frequency
$$\omega(t) = \frac{\mathrm{d}\Phi(t)}{\mathrm{d}t} = \frac{x(t)\dot{y}(t) - y(t)\dot{x}(t)}{A^2(t)}. \tag{40}$$
So far the choice of y(t) was arbitrary. The key point of our further consideration is the analytic signal representation [61,62], in which the Hilbert transform of the original process x(t) is selected as y(t). There are several possible ways to motivate the usage of the Hilbert transform. We will sketch two mathematically equivalent procedures. Both originate from the fact that the new representation coincides with the usual phase definition in the limit of periodic harmonic processes. How did we proceed for harmonic processes? From the signal x(t) = A cos(ωt) we defined y(t) earlier in (12); particularly, we took y(t) = −Aω sin(ωt). The map x → y differs for positive and negative frequencies. For positive frequencies y is advanced with respect to x, while in the case of negative frequencies it is delayed. For harmonic processes one easily sees that the mapping consists of a phase shift, y(ωt) = x(ωt − π/2) for ω > 0 and y(ωt) = x(ωt + π/2) for negative frequencies, respectively. Additionally one may require that y(t) results from a convolution of x(t), i.e.
$$y(t) = \int_{-\infty}^{\infty} K(t - \tau)\,x(\tau)\,\mathrm{d}\tau. \tag{41}$$
What does the convolution kernel look like for the harmonic process? Convolution of x(t) means multiplication of the Fourier transforms of x, y and K:

$$y_\Omega = K_\Omega\, x_\Omega. \tag{42}$$
The necessary phase shift is obtained simply by multiplying x_Ω by ±i, depending on the sign of the frequency, and the Fourier transform of the kernel reads

$$K_\Omega = -i\,\mathrm{sign}(\Omega), \tag{43}$$
with sign(0) = 0. Performing the backward Fourier transform gives

$$K(t) = \frac{1}{\pi t}. \tag{44}$$
The described procedure can be generalized to complex processes and tells us how to construct the analytic signal. With the kernel (44) we define y(t) in (38) as

$$y(t) = H[x] = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty}\frac{x(\tau)}{t - \tau}\,\mathrm{d}\tau = \frac{1}{\pi}\int_{0}^{\infty}\frac{x(t - \tau) - x(t + \tau)}{\tau}\,\mathrm{d}\tau, \tag{45}$$
where the integral is taken in the sense of a Cauchy principal value. Expression (45) represents the Hilbert transform of the original process x(t). Similarly, we can start with the spectral decomposition of the original process [64]
$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x_\Omega\, e^{i\Omega t}\,\mathrm{d}\Omega = \int_{0}^{\infty}\left(x_c(\Omega)\cos(\Omega t) + x_s(\Omega)\sin(\Omega t)\right)\mathrm{d}\Omega, \tag{46}$$
where x_c and x_s are the cosine and sine transforms of x(t). For the harmonic process we obviously have x_c(Ω) = Aδ(Ω − ω) and x_s(Ω) = 0. The conjugated y is defined by shifting the phase by −π/2, which can be realized by multiplying the spectral amplitudes by exp(−iπ/2). Thus we obtain the spectral representation of the Hilbert transform as
$$y(t) = \int_{0}^{\infty}\left(x_c(\Omega)\sin(\Omega t) - x_s(\Omega)\cos(\Omega t)\right)\mathrm{d}\Omega, \tag{47}$$

and for the harmonic process we indeed obtain the correct answer, y(t) = A sin(ωt). From (47) one particularly finds, with Ω > 0,

$$H[\exp(i\Omega t)] = -i\,\exp(i\Omega t), \qquad H[\exp(-i\Omega t)] = i\,\exp(-i\Omega t), \tag{48}$$
which again determines the Fourier transform of the kernel of the assumed convolution,

$$K_\Omega = -i\,\mathrm{sign}(\Omega). \tag{49}$$
As a result one retrieves the Hilbert transform in the shape of a convolution (41) with the kernel (44). As a linear transformation, H[x] obeys several useful properties [64]. The Hilbert transform of a linear superposition of two signals is the superposition of the separate Hilbert transforms. If the time of the signal is shifted by some amount, the argument of the Hilbert transform is shifted as well. The Hilbert transform of a Hilbert transform gives the negative of the original signal. Even functions give odd Hilbert transforms and vice versa. The original signal and its Hilbert transform are orthogonal. The full energy of the original signal, the integral of x²(t) over all times, equals the energy of the transformed one. The Hilbert transform can also be performed for stochastic variables. In the case of a stochastic signal x(t) the convergence of this integral should be understood in the
mean square sense [63]. The transformed signal is correlated in the same manner as the original signal. But both are anticorrelated, with a cross-correlation function being the Hilbert transform of the autocorrelation function of the original signal. The stochastic instantaneous amplitude A(t) and phase Φ(t) are thus defined by Eq. (38). Parameters for an effective synchronization can be found from the study of the mean frequency
$$\langle\omega\rangle = \lim_{T\to\infty}\frac{1}{T}\int_{0}^{T}\omega(t)\,\mathrm{d}t \tag{50}$$
and the diffusion coefficient (35) of the instantaneous phase difference φ(t) = Φ(t) − ω_s t with respect to an input periodic signal of frequency ω_s. The concept of the analytic signal has found applications in the theory of nonlinear oscillations [64] as a unique technique to separate motions with different time scales. To demonstrate this wide-ranging property of the Hilbert transform, let us consider an amplitude-modulated signal of the form x(t) = 2q(t)cos(ω_f t), where q(t) = cos(ω_s t) is the slowly modulated amplitude with ω_f > ω_s. The Hilbert transform of x(t) is

$$H[2q(t)\cos(\omega_f t)] = H[\cos((\omega_f + \omega_s)t) + \cos((\omega_f - \omega_s)t)] = \sin((\omega_f + \omega_s)t) + \sin((\omega_f - \omega_s)t) = 2q(t)\sin(\omega_f t). \tag{51}$$
Hence, only the fast part of the signal was transformed. A similar expression holds for a high-frequency sine carrier. Let A(t) be an amplitude and φ(t) a phase which are not harmonic but slowly varying compared with a carrier frequency ω_c, so that the Fourier spectrum is concentrated in a band |Ω| < ω_c/2. How are the instantaneous amplitude and phase of the signal x(t) = A(t) cos(ω_c t + φ(t)) defined from the analytic signal representation? One may expand the effective amplitudes of the signal x(t) = A(t) cos φ(t) cos(ω_c t) − A(t) sin φ(t) sin(ω_c t) in Fourier modes, all having frequencies smaller than ω_c. Hence, every mode obeys the derived property. As a result, the Hilbert transform of x(t) is

$$y(t) = H[A(t)\cos(\omega_c t + \phi(t))] = A(t)\sin(\omega_c t + \phi(t)) \tag{52}$$

and the slowly varying parts remain unaffected.
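Numerically, the analytic signal and the instantaneous amplitude and phase of Eqs. (38), (39) and (52) are conveniently obtained with an FFT-based Hilbert transform, e.g. scipy.signal.hilbert. The sketch below applies it to an artificial signal with a slowly modulated, strictly positive amplitude (all signal parameters are assumptions for the demonstration).

```python
import numpy as np
from scipy.signal import hilbert

fs, T = 1000.0, 20.0                                  # sampling rate and duration
t = np.arange(0.0, T, 1.0 / fs)
w_c, w_s = 2 * np.pi * 10.0, 2 * np.pi * 0.5          # carrier and modulation frequencies
A_true = 1.0 + 0.5 * np.cos(w_s * t)                  # slowly varying amplitude, cf. Eq. (52)
x = A_true * np.cos(w_c * t)

w = hilbert(x)                                        # analytic signal w(t) = x(t) + i H[x](t), Eq. (38)
A = np.abs(w)                                         # instantaneous amplitude, Eq. (39)
phase = np.unwrap(np.angle(w))                        # instantaneous phase, Eq. (39)

mid = slice(len(t) // 4, 3 * len(t) // 4)             # avoid end effects of the FFT-based transform
print("max envelope error       :", np.max(np.abs(A[mid] - A_true[mid])))
print("mean frequency / carrier :", np.mean(np.gradient(phase[mid], t[mid])) / w_c)
```

In agreement with Eq. (52), the recovered envelope should follow A(t) and the phase should advance essentially at the carrier frequency.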
5.2. Periodically driven chaotic systems

In this section we demonstrate phase synchronization of chaos using the example of a periodically driven oscillator with inertial nonlinearity [58a]:

$$\dot{x} = mx + y - xz + B\sin(\omega_1 t), \qquad \dot{y} = -x, \qquad \dot{z} = -gz + \tfrac{1}{2}\,g\,(x + |x|)\,x, \tag{53}$$

where m, g are parameters of the system. This oscillator is representative of systems with a saddle-focus separatrix loop [65], and a regime of dynamical chaos can be realized in the autonomous case (53) for m = 1.1, g = 0.3.
Fig. 17. The dependence of the difference between the mean frequency of chaotic oscillations and the external signal frequency vs. the detuning parameter for different values of the amplitude of the periodic force (B = 0.02, 0.04). Other parameters are m = 1.1, g = 0.3.
This regime is characterized by a broadband power spectrum; however, the spectrum also contains sharp peaks at a basic frequency ω₂, its harmonics nω₂ and sub-harmonics ω₂/2n. We take the external frequency as ω₁ = ω₂ − Δ and consider some characteristics of the chaotic oscillations for different values of the amplitude B and detuning Δ. The results of the calculation of the region of synchronization and of the phase difference using the analytic signal concept are presented in Figs. 17 and 19. As seen from these figures, chaotic synchronization occurs for Δ = −0.02121 and breaks down with the increase of the detuning parameter. Using the definition of the mean frequency (50), we calculate the dependence of the difference ⟨ω₂⟩ − ω₁ vs. Δ.
Fig. 18. Power spectra of the chaotic oscillations and the external signal inside (a) and outside (b) the synchronization region. Other parameters are: m = 1.1, g = 0.3, B = 0.02.
Fig. 19. The instantaneous phase difference for different values of the detuning parameter (Δ = 0.03171, −0.02121, −0.02421). Other parameters are m = 1.1, g = 0.3, B = 0.02.
The mean frequency locking of chaotic oscillations takes place in a finite range of detuning parameter values. The width of the synchronization region is determined by the external force amplitude B (see Fig. 17). The effect of frequency locking can also be illustrated in terms of the power spectrum. In Fig. 18 we show the power spectra of the chaotic oscillations and of the external signal for Δ = 0.01 (a) and Δ = 0.03 (b). As clearly seen, the basic frequency ω₂ is tuned as a result of synchronization and coincides with the external frequency, ω₂ = ω₁. As we already mentioned, the entrainment of the basic frequency is accompanied by mean frequency locking and also by phase locking. A peculiarity of system (53) is the presence of a basic frequency in the power spectrum which almost coincides with the mean frequency (ω₂ ≈ ⟨ω⟩). A similar picture is observed in all cases when a chaotic attractor in the phase space of a synchronized system is born in accordance with Shilnikov's theorem (for example, the case of synchronization of the Rössler system). Recently the problem of the bifurcation mechanisms of the transition to phase-synchronous chaos has been addressed [66-68] in terms of unstable periodic orbits. In particular, it has been shown [67] that for chaotic systems with a broad spectrum of intrinsic time scales perfect synchronization cannot be achieved, due to the existence of periodic orbits with very long periods.
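For the reader who wishes to reproduce this frequency-locking behavior, a minimal numerical experiment is sketched below. It first measures the mean Hilbert frequency of the autonomous system (53) as a proxy for the basic frequency ω₂, and then drives the system at ω₁ shifted by a small and a larger detuning, with B = 0.02 as in Fig. 18; the integration settings and initial conditions are our own assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.signal import hilbert

def oscillator(t, s, m, g, B, w1):
    """Periodically driven oscillator with inertial nonlinearity, Eq. (53)."""
    x, y, z = s
    return [m * x + y - x * z + B * np.sin(w1 * t), -x,
            -g * z + 0.5 * g * (x + abs(x)) * x]

def mean_frequency(B, w1, m=1.1, g=0.3, T=4000.0):
    t = np.linspace(0.0, T, int(50 * T))
    sol = solve_ivp(oscillator, (0.0, T), [0.1, 0.1, 0.1], t_eval=t,
                    args=(m, g, B, w1), rtol=1e-8, max_step=0.05)
    x = sol.y[0][len(t) // 4:]                        # discard the transient
    phase = np.unwrap(np.angle(hilbert(x - x.mean())))
    return (phase[-1] - phase[0]) / (t[-1] - t[len(t) // 4])

w_free = mean_frequency(B=0.0, w1=1.0)                # proxy for the basic frequency omega_2
for delta in (0.01, 0.03):                            # inside / outside the region, cf. Fig. 18
    w1 = w_free - delta
    print(f"Delta = {delta}: <omega> - omega_1 = {mean_frequency(0.02, w1) - w1:+.4f}")
```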
5.3. Synchronization of stochastic systems with continuous output

From the previous consideration of the Van der Pol oscillator one may conclude that noise plays a negative role, shrinking the regions of synchronization. The period of the limit cycle defines the mean characteristic time of the oscillator. Noise causes deviations from the mean value of this time which become more and more widely spread and, in consequence, noise deteriorates synchronization in the Van der Pol oscillator.
In contrast, there are systems whose characteristic time only appears in the presence of noise and in which noise plays a positive role. One example, the noisy Schmitt trigger, was already presented in Section 2. Generally these are dynamical systems with a single or several thresholds which can be surmounted in the presence of noise only. Candidates for such behavior are excitable, bi- and multistable systems. After surmounting the threshold(s), and if the motion is bounded and ergodic, trajectories return to their original value or to a neighborhood of it. The characteristic period of such processes is determined by noise. Let us consider the motion of a particle in a potential U(x),

$$m\ddot{x} + \nu\dot{x} + \frac{\mathrm{d}U(x)}{\mathrm{d}x} = 0, \tag{54}$$
where m is the mass of the particle and ν is the friction coefficient (Stokes friction). For the linear potential U(x) = mω₀²x²/2 one obtains, with γ = ν/m, the above-studied linear damped oscillator of Section 3.1. In this section we consider the motion of a particle in a double-well potential U(x) = −ax²/2 + bx⁴/4. Under the assumption of large friction the inertia of the particle can be neglected (mẍ ≪ νẋ). The velocity in (54) is adiabatically eliminated, yielding the dynamics for the coordinate

$$\dot{x} = \alpha x - \beta x^3. \tag{55}$$
Therein α = a/ν and β = b/ν are the parameters defining the depth of the potential wells and the steepness of their slopes. System (55) possesses three states of equilibrium, two stable ones at x = ±√(α/β) and one unstable between them, corresponding to the minima and the maximum of the potential U as shown in Fig. 20a. In the absence of external excitations the particle approaches one of the stable states with characteristic time τ_r = 1/(2α). Adding to (55) a noise source with intensity D (for simplicity, we choose zero-mean Gaussian white noise),
Fig. 20. (a) The double-well potential; (b) time series x(t) for an overdamped Brownian particle.
$$\dot{x} = \alpha x - \beta x^3 + \sqrt{2D}\,\xi(t), \tag{56}$$
changes the situation qualitatively. The stochastic system (56) has a new physical feature in comparison with (55): under the influence of noise the particle performs transitions from one potential well to the other. These transitions are noise-assisted escapes over the potential barrier ΔU at random time moments (Fig. 20b). The noise leads to the appearance of random switchings between the two potential wells. These stochastic "oscillations" can be characterized by means of their mean return time

$$T_{\mathrm{ret}} = T_1 + T_2, \tag{57}$$
with T₁ = T₂ = T_mfpt for the symmetric system (56) being the mean times to escape from one well. Analytic expressions for these times were first given by the mean first passage time analysis of Pontryagin et al. [69] and by the escape rate analysis of Kramers [70], who calculated the rate r_k of leaving an attractor region. Explicitly it yields

$$T_{\mathrm{mfpt}} = r_k^{-1} \approx \frac{\pi\sqrt{2}}{\alpha}\,\exp\!\left(\frac{\Delta U}{D}\right). \tag{58}$$
Despite the fact that the jumps occur according to a Poissonian waiting time distribution, we will look at the dynamics of (56) in the sense mentioned above: the motions forward and backward over the threshold can be considered as stochastic self-sustained "oscillations". Then it is natural to pose the problem of synchronization [35]. Is it possible to synchronize the random oscillations in (56) by an additional harmonic force? And if yes, what are the features of this effect? To answer these questions let us consider the driven overdamped stochastic bistable system, central to studies of stochastic resonance [71,31]:

$$\dot{x} = \alpha x - \beta x^3 + \sqrt{2D}\,\xi(t) + a\cos(\omega_1 t + \psi_0). \tag{59}$$
System (56) has no natural deterministic frequency. At the same time, this system is characterized by a noise-controlled time scale represented by the mean time of escape from a potential well (58). In the frequency domain this time scale defines the mean switching frequency of the system. The added periodic signal with amplitude a represents an external "clock" with frequency ω₁ and initial phase ψ₀. Suppose that α, β > 0 and ψ₀ = 0 in (59), and that the periodic modulation amplitude a is always sufficiently small,

$$a < a_0 = \frac{2\alpha}{3}\sqrt{\frac{\alpha}{3\beta}}, \tag{60}$$
which guarantees that transitions do not occur without noise. Furthermore, we suppose that the modulation frequency is low compared to the intrawell relaxation rate 1/τ_r. The direct application of the analytic signal concept to the investigated model (59) gives the following stochastic differential equation (SDE) for the analytic signal w(t) [35]:
$$\dot{w} = \alpha w - \frac{\beta}{4}\left(3A^2 w + w^3\right) + E(t) + a\,e^{i\omega_1 t}, \tag{61}$$
where E(t) = ξ(t) + iη(t) is the analytic noise, with η(t) being the Hilbert transform of ξ(t). From Eq. (61) it is easy to derive the SDEs for the instantaneous amplitude and phase:

$$\dot{A} = \alpha A - \frac{\beta}{2}A^3\left[1 + \cos^2(\phi + \omega_1 t)\right] + a\cos\phi + \xi_1(t), \qquad \dot{\phi} = -\omega_1 - \frac{a}{A}\sin\phi - \frac{\beta}{4}A^2\sin[2(\phi + \omega_1 t)] + \frac{1}{A}\,\xi_2(t), \tag{62}$$
where φ(t) = Φ(t) − ω₁t is the instantaneous phase difference and the noise sources ξ₁,₂(t) are defined by the following expressions:

$$\xi_1(t) = \xi(t)\cos\Phi + \eta(t)\sin\Phi, \qquad \xi_2(t) = \eta(t)\cos\Phi - \xi(t)\sin\Phi. \tag{63}$$
As seen, the second SDE in (62), describing the evolution of the phase difference, has a structure similar to Eq. (19) with noise (29). But in contrast to the case of the periodic oscillator, the term corresponding to the natural frequency is absent. Instead, only the term −ω₁ appears explicitly in (62). This once again indicates that system (56) has no deterministic time scales, i.e., the corresponding rotation term is still hidden in the nonlinear dynamics with multiplicative noise and appears only after averaging. For computational reasons it is more convenient to integrate numerically the original SDE (59) and then to perform the Hilbert transform by well-established techniques (see, for example, [72]). Numerically obtained phase differences versus time, defined by means of the analytic signal concept, are shown in Fig. 21a for different values of the noise intensity. The slope of the curves gives the difference between the instantaneous frequency (40) of x and ω₁. As seen, there exists an optimal noise level for which the frequencies converge, i.e. the slope vanishes. Additionally, for this noise level and the selected amplitude of the driving force the phase is locked during the observation time. Deviations from this optimal noise intensity give rise to growing phase differences and to the appearance of phase slips, which lead to a systematic nonvanishing slope of the curves. The resulting mean frequencies defined via (50) are presented in Fig. 21b versus noise intensity for different values of the amplitude of the signal. These curves demonstrate locking of the mean frequency of the output for optimal noise. This was first reported in [34] for the case of the stochastic Schmitt trigger (see Section 2). Fig. 21b illustrates both the presence of a threshold for synchronization and the broadening of the synchronization region with growing amplitude of the modulating force. In the absence of the signal the mean frequency grows monotonically in accordance with the Kramers formula (58) (multiplied by π). For larger amplitudes, starting at about a ≈ 1, the dependence of ⟨ω⟩ versus D has a plateau where ⟨ω⟩ does not depend on D.
Fig. 21. (a) The instantaneous phase difference calculated through the analytic signal concept for the indicated values of noise intensity (D = 0.44, 0.80, 1.05). The parameters are a = 3, α = 5, β = 1, ω₁ = 0.01; (b) the mean frequency (50) (solid) and the mean switching frequency (68) (dots) vs. noise intensity for different values of the periodic force amplitude: a = 0 (1), a = 1 (2), a = 2 (3), a = 3 (4). Other parameters are the same as in (a).
5.4. Noise-enhanced synchronization of excitable media In this section we describe the mutual synchronization of locally coupled non identical FitzHugh-Nagumo oscillators [73]. This discrete network of diffusively
Phase synchronization: from periodic to chaotic and noisy
10 0
.
,
,
t '
.
~
9
,
,
~
61
,
r ,
,
9
i
~.. 4 10 -~
,/~..~j~,.,..~
.r162
~
10 -2
=m 10~ Q 10~
2~
o
f
*--**-z0
o a.-.3.o
10 -s
10 -6
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
D
Fig. 22. The effective diffusion coefficient vs. noise intensity for different values of the external signal amplitude. Other parameters are a = 3, ~ = 5, 13= 1, co] --0.01.
This discrete network of diffusively coupled oscillators mimics a noisy excitable medium, which is of high interest in biology, chemistry and physics, and is described by the following set of stochastic differential equations:

$$\varepsilon\,\dot{u}(t, n) = u - \frac{u^3}{3} - w + \gamma\sum_{n'}\left[u(t, n') - u(t, n)\right], \qquad \dot{w}(t, n) = u + a(n) + \sqrt{2D}\,\xi(t, n), \tag{64}$$

where u(t, n) and w(t, n) are again fast and slow variables, respectively. In the one-dimensional case these variables are defined on a chain n = 1, ..., N, while in the two-dimensional case u and w are defined on a square lattice. The sum over the neighbors stands for the discrete Laplace operator in one and two dimensions, modeling the local interactions with coupling strength γ. The parameter a(n) depends on the spatial variable n and is assumed to be a uniformly distributed random variable. In this way we simulate a network of nonidentical FitzHugh-Nagumo elements. Further on we assume stochastic forcing by Gaussian white noise ξ, statistically independent in space and with zero mean, ⟨ξ(t, n)ξ(t + τ, m)⟩ = δ_{m,n}δ(τ). The number of parameters in the model can be reduced by introducing the spacing of the lattice, l, and scaling it as l = √γ l₀. Then the coupling factor in front of the Laplacian becomes one, but the noise intensity changes. As a result, the effect of the noise and the dependence on the coupling strength can be discussed in terms of a common parameter Q = D/γ^{d/2}, where d equals 1 or 2 for the one- or two-dimensional case, respectively. For example, strong coupling decreases the action of the noise, and the large-noise case corresponds to the weak-coupling limit. That is why in the following we fix γ and use the noise intensity as a control parameter.
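For completeness, a minimal Euler-Maruyama sketch of the one-dimensional network (64) is given below (chain length, integration step, simulation time and the spike threshold are assumptions; ε, γ, the range of a(n) and the noise level σ = √(2D) ≈ 0.025 follow the values quoted in the next paragraph and in Fig. 23).

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_fhn_chain(D, N=100, eps=0.01, gamma=0.05, dt=1e-4, T=100.0):
    """Euler-Maruyama integration of the one-dimensional network (64) with free boundaries."""
    a = rng.uniform(1.03, 1.1, size=N)        # quenched disorder in the activation parameter
    u = rng.uniform(-0.1, 0.1, size=N)        # random initial conditions
    w = np.zeros(N)
    raster = []
    for i in range(int(T / dt)):
        lap = np.zeros(N)                      # discrete Laplacian, free boundary conditions
        lap[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]
        lap[0], lap[-1] = u[1] - u[0], u[-2] - u[-1]
        du = dt * (u - u**3 / 3.0 - w + gamma * lap) / eps
        dw = dt * (u + a) + np.sqrt(2.0 * D * dt) * rng.standard_normal(N)
        u, w = u + du, w + dw
        if i % 1000 == 0:
            raster.append(u > 1.0)             # crude "firing" indicator for a Fig. 23-like raster
    return np.array(raster)

raster = simulate_fhn_chain(D=0.5 * 0.025**2)  # sigma = sqrt(2 D) = 0.025, the optimal value of Fig. 23
print("space-time fraction of firing cells:", raster.mean())
```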
For a network of coupled FitzHugh-Nagumo oscillators it is natural to expect that for strong enough coupling the firing events of the individual elements will be synchronized. In our numerical simulations we fixed ε = 0.01 and γ = 0.05, while the activation parameters a(n) are random numbers distributed uniformly on [1.03, 1.1]. This leads to a distribution of spiking times if noise is applied. We also use free boundary and random initial conditions. In the absence of noise any initial state of the system evolves to an equilibrium state. Depending on the noise strength D, for a sufficiently large value of the coupling strength three basic types of space-time behavior can be observed. For small noise, centers of excitation are nucleated very seldom, at random positions in the medium, giving rise to propagating waves. In this case different cells in the medium are correlated only on the short time scale of the mean time of wave propagation and there is no synchronization between distant cells. For a large noise strength, the nucleation rate is very high and the medium is represented by stochastically firing cells. However, for an optimal noise intensity the medium becomes phase coherent: firings of different and distant cells occur almost in phase. These three cases are shown in Fig. 23. Two-dimensional excitable media (see the lower part of Fig. 23) demonstrate qualitatively the same behavior: for weak noise the system possesses noise-induced target waves which are initiated at random positions in the medium. Collapsing of such waves cannot produce stable spiral waves, since the velocity of the waves at the intersection is always directed outwards from the intersecting region. Therefore no new open spirals may occur. However, in the case of parametric noise the propagating fronts may locally backfire small directed spots which break propagating excitations and make spirals possible [74]. At the optimal noise level the whole medium oscillates nearly periodically (see the middle row in the lower part of Fig. 23). Finally, the case of large noise is represented by randomly flashing clusters. The same behavior has been observed in a model of the visual cortex [75]. We describe this effect for the one-dimensional case in terms of phase synchronization. We introduce the instantaneous phase Φ(t, n) of the nth element using the analytic signal representation. The analytic signal z(t, n) is defined as z(t, n) = u(t, n) + iy(t, n), where y(t, n) is the Hilbert transform of the original variable u(t, n) in the time domain:
$$y(t, n) = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty}\frac{u(\tau, n)}{t - \tau}\,\mathrm{d}\tau \qquad\text{and}\qquad \Phi(t, n) = \arctan\frac{y(t, n)}{u(t, n)}.$$
We choose the central cell of the medium (n = N/2) as a reference element and then calculate the phase differences φ(t, k) = Φ(t, N/2) − Φ(t, N/2 + k), k = −N/2, ..., N/2. The results of the calculations of the phase differences are shown in Fig. 24 for three values of the noise intensity. For the optimal noise level the phases of different oscillators are locked during the whole time of the computation. For large distances between oscillators the phase fluctuations do indeed grow. Nevertheless, the phase difference is still bounded within a certain range during long periods of time. For nonoptimal noise intensities a partial phase synchronization with randomly occurring phase slips can be observed only between neighboring elements (top graph).
Fig. 23. Spatio-temporal evolution of the system described by Eqs. (64) for different values of the noise intensity (from left to right) σ = √(2D) = 0.015, 0.025, 0.1. The upper figure corresponds to the one-dimensional case with N = 500 elements: the vertical axis corresponds to time and the horizontal axis is the space variable. Black dots indicate firing elements. The lower figure represents three frame sequences for the two-dimensional case of a 200 × 200 lattice.
Fig. 24. Phase differences between the indicated oscillators for different values of the noise variance σ = √(2D): 0.015 (1), 0.025 (2), 0.1 (3). Other parameters are the same as in the previous figure.
For larger distances the diffusion of the phase differences becomes very strong and synchronization breaks down. Indeed, the same results were obtained with another definition of the phase, through the spike times. In our case an appropriate measure of stochastic synchronization is the cross-diffusion coefficient defined as

$$D_{\mathrm{eff}}(k) = \frac{1}{2}\,\frac{\mathrm{d}}{\mathrm{d}t}\left[\langle\phi^2(t, k)\rangle - \langle\phi(t, k)\rangle^2\right]. \tag{65}$$
This quantity describes the spreading in time of an initial distribution of the phase difference between the N/2-th and all other elements. If this diffusion constant decreases, longer phase-locking epochs appear and, therefore, phase synchronization becomes stronger. A single measure is obtained by averaging D_eff(k) over the spatial distance:

$$\bar{D}_{\mathrm{eff}} = \frac{1}{N}\sum_{k=-N/2}^{N/2} D_{\mathrm{eff}}(k). \tag{66}$$
The dependence of this averaged effective cross-diffusion constant versus noise intensity is shown in Fig. 25 and demonstrates a global minimum at a non-zero noise level. Thus, phase synchronization can be enhanced by tuning the noise intensity. Synchronization is also defined as a frequency locking effect. In the case of a stochastic excitable system one must use the mean frequencies ⟨ω(n)⟩ = ⟨Φ̇(t, n)⟩ of the oscillators [20]. Due to the given distribution of the a(n), the elements of the network have different, randomly scattered frequencies for vanishing coupling.
Fig. 25. The averaged effective cross-diffusion constant versus noise intensity. Other parameters are the same as in the previous figure. The dashed line corresponds to the uncoupled lattice (γ = 0).

We have numerically built the distribution of the mean frequencies, calculated for every element across the network, P(⟨ω⟩), for different noise intensities. The results are shown in Fig. 26. A remarkable effect of noise-enhanced space-time synchronization can be seen from this figure.
Fig. 26. Distribution of the mean frequencies of the oscillators for different values of the noise variance σ = √(2D): 0.015 (1), 0.025 (2), 0.1 (3).
For the optimal noise intensity, when the phases of the different oscillators are locked for long periods of time, the mean frequencies are entrained and the distribution of the mean frequencies becomes extremely narrow. For non-optimal noise the mean frequencies show rather wide distributions, indicating the lack of synchronization. The last figure clearly indicates noise-induced space-time ordering in the system, based on a synchronization mechanism. This behavior can be quantified further by calculating the mean square deviation of the mean frequencies averaged over the network, which shows a deep minimum at the same optimal noise intensity as the effective diffusion constant [73]. The mechanism of the noise-induced synchronization is rooted in the behavior of a single uncoupled element. The noise-induced oscillations are most coherent at a non-zero noise intensity, where the quality factor of the noise-induced peak in the power spectrum is maximal. In this regime the mean firing rate (or the mean frequency) of the system approaches the peak frequency of the power spectrum. In the case of weak noise the mean firing rate depends exponentially on the control parameter a (a > 1). However, with increasing noise the dependence of the mean frequency on a becomes very weak. That is why, with the increase of noise from a very low level, the mismatch between the characteristic frequencies of the elements in the coupled array decreases, providing better conditions for mutual synchronization. At the same time, the noise-induced oscillations become more coherent. Both effects tend to facilitate synchronization among the elements of the network. Large noise, on the other hand, again destroys the coherence of the local stochastic oscillations (the frequency and phase fluctuations grow rapidly) and also leads to the destruction of spatially coherent structures. The optimal noise intensity at which synchronization is most pronounced depends on the range of the distribution of activation parameters a(n): with increasing range of the disorder the optimal noise intensity shifts towards smaller values.
6. Synchronization in stochastic point processes

6.1. Phases for discrete events

In this section we will deal with synchronization in connection with stochastic point processes. We will interpret changes of the phase in a stochastic system as events at random times t_k whose occurrence is due to a distribution function or a dynamical process. For example, the above-considered Schmitt trigger as well as the stochastic bistable system are well described by a point process: in this case the random times t_k are the moments at which switches from one state to another occur. Thus, we map a continuous process x(t) onto a point process t_k, where t_k are the moments of time at which the trajectory of the system crosses some secant plane. The time between two successive crossings is T(t) = t_{k+1} − t_k, t_k < t < t_{k+1}. In this case the instantaneous phase of x(t) is defined as

\Phi_{\mathrm{lin}}(t) = \pi\,\frac{t - t_k}{t_{k+1} - t_k} + \pi k, \qquad t_k < t < t_{k+1},   (67)
which is a piecewise linear function of time. The mean frequency reads

\langle\omega\rangle = \lim_{M\to\infty}\frac{\pi}{M}\sum_{k=1}^{M}\frac{1}{t_{k+1} - t_k}.   (68)
A second definition of a phase neglects the linear interpolation between two subsequent events, taking

\Phi_{\mathrm{discr}}(t) = \pi\,k(t) = \pi\sum_k \theta(t - t_k),   (69)

with k(t) being the sequence of increasing integers and t_k again subject to some dynamics or distribution; θ(x) stands for the Heaviside function. The mean frequency of this definition takes the average of a sequence of delta pulses,

\langle\omega\rangle(t) = \pi\left\langle \sum_k \delta(t - t_k)\right\rangle.   (70)
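The two event-based phase definitions translate directly into code. The sketch below implements Eqs. (67) and (69) together with the mean frequency (68) for an arbitrary ordered list of event times; the jittered event train at the bottom is only an illustrative example and is not taken from the text.

```python
import numpy as np

def phases_from_events(t_events, t):
    """Piecewise linear phase (67) and piecewise constant phase (69) on a grid t."""
    t_events = np.asarray(t_events)
    k = np.clip(np.searchsorted(t_events, t, side="right") - 1, 0, len(t_events) - 2)
    t_k, t_k1 = t_events[k], t_events[k + 1]
    phi_lin = np.pi * (t - t_k) / (t_k1 - t_k) + np.pi * k            # Eq. (67)
    phi_discr = np.pi * np.searchsorted(t_events, t, side="right")    # Eq. (69)
    return phi_lin, phi_discr

def mean_frequency(t_events):
    """Mean frequency, Eq. (68): average of pi / (t_{k+1} - t_k)."""
    return np.mean(np.pi / np.diff(np.asarray(t_events)))

# Example: nearly periodic events with some jitter in the return times
rng = np.random.default_rng(1)
T0 = 2.0                                                              # illustrative mean return time
intervals = T0 * (1.0 + 0.2 * rng.standard_normal(2000).clip(-0.9, 0.9))
t_events = np.cumsum(intervals)
print("mean frequency <omega> =", mean_frequency(t_events), " (compare pi/T0 =", np.pi / T0, ")")
```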
For the overdamped bistable oscillator k(t) defines a dichotomic process,

x(t) = \exp[\mathrm{i}\,\Phi_{\mathrm{discr}}(t)],   (71)

which represents a time sequence of the successive states +1 and −1. The sequence of states may be periodic or random; accordingly, we speak of dichotomic periodic and dichotomic random sequences. Periodic processes additionally require averaging over the initial state to become stationary. If the probability that k subsequent events of x(t) have taken place during the time interval [0, t] is given by

P_k(t) = \frac{(\gamma t)^k}{k!}\,\exp(-\gamma t),   (72)
then x(t) is the Markovian dichotomic process, exponentially correlated in time with correlation time t_c = 1/γ. Switches occur according to a Poissonian distribution with mean switching time T = 1/γ. Let us compare the phase definitions (67) and (69) with the analytical signal concept. For this purpose we calculate the Hilbert transform of a dichotomic process x(t) (an even k stands for a transition −1 → +1),

y(t) = \frac{2}{\pi}\sum_k \ln\left|\frac{t - t_{2k}}{t - t_{2k+1}}\right|.   (73)
Assume that at even k's a transition −1 → +1 takes place. For t < t_k, therefore, x(t) = −1 and the Hilbert transform y(t) decreases monotonically, reaching −∞ at t = t_k. At this moment x switches to +1 and y(t) starts to grow, becoming +∞ at t = t_{k+1}. Thus an arrow following x and y in the phase space completes a full circle of 2π during two subsequent transitions, returning to the initial state x = −1.
Adding one π during every transition in definition (39), the instantaneous phase from the analytical signal reads

\Phi_{\mathrm{Hilbert}}(t) = \pi\,k(t) + \arctan\frac{y(t)}{x(t)}.   (74)

Fig. 27 shows the dichotomic signal, its Hilbert transform, the instantaneous amplitude and the phase. As seen, the instantaneous phase increases monotonically, in agreement with the previously introduced phases of stochastic point processes. Its mean slope defines the mean frequency of the signal. Hence we find that the phase concept for stochastic point processes approximates well the findings from the analytical signal. This is presented in Fig. 28, where the three different phases are compared. Both the piecewise linear and the piecewise constant definition fit Φ_Hilbert from the analytical signal well. For simplicity we will later on use the piecewise constant phase.
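For a numerical comparison with the analytic-signal phase, the short sketch below builds a telegraph signal with Poissonian switching, computes its Hilbert phase with scipy.signal.hilbert, and checks that its mean slope agrees with the piecewise-constant phase (69). The sampling step, switching rate and duration are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(2)

# Dichotomic (telegraph) signal: switches at Poissonian times with rate gamma
gamma, dt, n = 0.05, 0.05, 200_000                 # illustrative values
t = np.arange(n) * dt
switch_times = np.cumsum(rng.exponential(1.0 / gamma, size=int(3 * gamma * n * dt)))
switch_times = switch_times[switch_times < t[-1]]
x = np.where(np.searchsorted(switch_times, t, side="right") % 2 == 0, -1.0, 1.0)

# Phase from the analytic signal (74): unwrapped angle of x + i*H[x]
phi_hilbert = np.unwrap(np.angle(hilbert(x)))

# Piecewise constant phase (69): pi per switching event
phi_discr = np.pi * np.searchsorted(switch_times, t, side="right")

print("mean slope, Hilbert phase :", (phi_hilbert[-1] - phi_hilbert[0]) / (t[-1] - t[0]))
print("mean slope, discrete phase:", (phi_discr[-1] - phi_discr[0]) / (t[-1] - t[0]))
print("pi * gamma                :", np.pi * gamma)
```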
6.2. Synchronization of noisy bistable systems by stochastic signals

Although the majority of SR studies consider periodic input signals, it was shown that SR can also be observed with nonperiodic signals [42,76]. For such complex signals the effect was called aperiodic stochastic resonance (ASR) [42], and it was shown that ASR can be described within the conventional theory of SR [77]. In this section we show that noisy systems which do not possess any deterministic natural frequency can be synchronized by a stationary stochastic driving signal represented by a dichotomic Markovian process. This new type of synchronization [36] will be demonstrated with a simple but generic kinetic model. A bistable system is driven by two noises: the first is broad-band Gaussian noise, which represents internal (or thermal) noise, while the second, dichotomic noise, stands for the input signal. The bistable system driven by dichotomic noise is mapped onto a simplified 4-state Markovian model in which the output is also approximated by a dichotomic process. Therefore, within this model we can define phases according to stochastic point processes, as in the previous section.
Fig. 27. Dichotomic signal, its Hilbert transform, the instantaneous amplitude and phase according to (74). Note that y and A reach infinitely large values when the dichotomic input switches.
Fig. 28. Comparison of the three different definitions of the instantaneous phase. Both the linear interpolation and the definition by piecewise constant segments approximate well the phase obtained from the analytical signal.
6.2.1. The simplified 4-state Markovian model

Let a stochastic bistable system possess two symmetric stable states σ(t) = ±1, which we call the output. The dynamics of transitions between them is characterized by the rate r_K (58). Suppose now that a Markovian dichotomic process (71) with values d(t) = ±1 drives the bistable dynamics as an additive input signal. Again we assume that transitions +1 → −1 occur with the rate γ, which is assumed to be smaller than the intrawell relaxation rate: γ ≪ α (see (55)). In this approximation one can make use of an adiabatic treatment for the calculation of the transition rates. The input changes the barrier height between the two states, ΔU → ΔU ± Q. Here Q is the magnitude of the signal and it is smaller than the barrier height, ΔU > Q. In this adiabatic approach the rate (58) is modified depending on the state of the output σ(t) and the input d(t). Transitions between the two states of σ are determined by the rates

W_{\sigma\to-\sigma}(d(t)) = r_K \exp\left[-\frac{Q\,\sigma(t)\,d(t)}{D}\right].   (75)
Hence, with σ(t)d(t) = ±1 the original rate decomposes into two new rates,

a_1 = \alpha_0 \exp\left(-\frac{\Delta U + Q}{D}\right), \qquad a_2 = \alpha_0 \exp\left(-\frac{\Delta U - Q}{D}\right).   (76)
This allows the distinction of the following rate separations: a_1 ≤ a_2 ≪ γ, a_1 ≤ γ ≤ a_2, and γ ≪ a_1 ≤ a_2. We point out that, with the assumed γ ≪ α, all three cases can be reached by growing noise intensity D with the other parameters fixed. Synchronization will be established in the second case of the rate separation. The two states of the system output σ(t) and the two states of the input signal d(t) form a four-state Markovian system, which is schematically drawn in Fig. 29. The states of the system {σ, d} are marked by two indexes, referring to the output and input, respectively. The stochastic dynamics of the system is governed by the master equation for the conditional probability density P_{σ,d} = P(σ, d, t|σ_0, d_0, t_0) (77). It possesses the simple stationary solutions

P^s_{\sigma=d} = \frac{a_2 + \gamma}{2(a_1 + a_2 + 2\gamma)}, \qquad P^s_{\sigma=-d} = \frac{a_1 + \gamma}{2(a_1 + a_2 + 2\gamma)}.   (78)
In aperiodic SR the cross-correlation function between the output and the input stochastic signal,

\rho = \frac{\langle \sigma d\rangle}{\langle \sigma^2\rangle},   (79)

is the central measure indicating stochastic resonance. Using the stationary solutions (78) of the master equation, this correlation coefficient is easily calculated as

\rho = \frac{a_2 - a_1}{a_1 + a_2 + 2\gamma}.   (80)
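A minimal numerical check of Eq. (80), with the rates (76), might look as follows. The parameter values mirror those quoted in the figure captions (ΔU = 1/4, α₀ = 1/(√2 π), γ = 0.001, Q = 0.1); the grid of noise intensities is an arbitrary choice.

```python
import numpy as np

def rates(D, dU=0.25, Q=0.1, alpha0=1.0 / (np.sqrt(2.0) * np.pi)):
    """Adiabatic rates a1, a2 of Eq. (76) for barrier dU and signal magnitude Q."""
    a1 = alpha0 * np.exp(-(dU + Q) / D)
    a2 = alpha0 * np.exp(-(dU - Q) / D)
    return a1, a2

def rho(D, gamma=0.001, **kw):
    """Input-output correlation coefficient, Eq. (80)."""
    a1, a2 = rates(D, **kw)
    return (a2 - a1) / (a1 + a2 + 2.0 * gamma)

D = np.linspace(0.01, 0.2, 400)
r = rho(D, gamma=0.001, Q=0.1)
print("maximal correlation rho = %.3f at D = %.3f" % (r.max(), D[np.argmax(r)]))
```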
Fig. 29. Sketch of the model. The first index marking the states of the system refers to the output σ, while the second index refers to the state of the input dichotomic noise.
Its dependence on the noise intensity D is shown in Fig. 30 for different values of the signal magnitude Q and of the flipping rate γ. One sees that the degree of output-input correlation is maximal at a certain noise intensity, which recalls aperiodic stochastic resonance [42]. In the limit Q → 0 we recover the results of linear response theory [78]. With the increase of Q the correlation coefficient increases. Note that for sufficiently large signal magnitudes (Q = 0.2) the dependence ρ(D) flattens and the correlation coefficient takes its maximal values over a whole region of noise intensity. This behavior reflects the synchronization effect of mean switching frequency locking and cannot be revealed within linear response theory. First, the effect of mean switching frequency locking can be estimated analytically. The output two-state stochastic process can be characterized by the mean durations of the upper and lower states, ⟨T⟩_+ and ⟨T⟩_−. The mean "period" of return is therefore ⟨T⟩_s = ⟨T⟩_+ + ⟨T⟩_−. In the frequency domain this quantity corresponds to the mean switching frequency (MSF) [79],

\langle\omega^s_{\mathrm{out}}\rangle = \frac{2\pi}{\langle T\rangle_s} = \frac{2\pi}{\langle T\rangle_+ + \langle T\rangle_-}.   (81)
In the same way we define the MSF for the input dichotomic noise,

\langle\omega_{\mathrm{in}}\rangle = \pi\gamma.   (82)
To calculate the mean switching rate at the output of the system we impose an absorbing boundary condition [79] at the state σ = 1 and seek the mean time of leaving the state σ = −1. Initially we suppose that both states d = ±1 of the dichotomic stochastic signal are equally populated. The evolution of the probability to find the system in, say, the left potential well, P_{−1}(t) = P_{−1,−1}(t) + P_{−1,+1}(t), is described by the following equations (d = ±1):
Fig. 30. Correlation coefficient (80) versus noise intensity for the indicated values of the signal magnitude Q and fixed flipping rate γ = 0.001 (a), and for different values of the flipping rate with fixed magnitude Q = 0.2 (b). The rates a_1 and a_2 are given by Eqs. (76) with ΔU = 1/4 and α_0 = 1/(√2 π).
\frac{d}{dt}P_{-1,d} = -\left(W_{\sigma=-1\to\sigma=1}[d] + \gamma\right)P_{-1,d} + \gamma P_{-1,-d},   (83)

which have to be solved with the initial condition P_{−1,−1}(t = 0) = P_{−1,+1}(t = 0) = 1/2. The eigenvalues of this problem are

r_{1,2} = \frac{1}{2}\left[a_1 + a_2 + 2\gamma \pm \sqrt{(a_1 - a_2)^2 + 4\gamma^2}\,\right].   (84)
The global relaxation rate is determined by the smaller eigenvalue and gives the required estimate of the MSF,

\langle\omega^s_{\mathrm{out}}\rangle = \frac{\pi}{2}\left[a_1 + a_2 + 2\gamma - \sqrt{(a_1 - a_2)^2 + 4\gamma^2}\,\right].   (85)
This MSF versus noise intensity is plotted in Fig. 31 for different values of the signal magnitude Q. For small Q the dependence ⟨ω^s_out⟩(D) follows the exponential Arrhenius law. However, for larger driving magnitudes the Arrhenius law is modified, and for sufficiently large Q the MSF remains nearly constant over a large range of noise intensities, where it equals the mean frequency of the input signal ⟨ω_in⟩. For small noise a_2(D) ≪ γ and the MSF approaches πa_1/2, while for large noise ⟨ω^s_out⟩ approaches πa_2 [79] and the systems become desynchronized. In other words, the MSF is locked in a finite region of noise intensity, in a similar way as was observed for the periodically driven stochastic bistable systems of the previous sections. Imposing the condition

\left|\langle\omega^s_{\mathrm{out}}\rangle - \pi\gamma\right| \le \varepsilon, \qquad \varepsilon \ll 1,   (86)

we can obtain regions of MSF locking in the parameter plane of noise intensity D versus signal magnitude Q. These synchronization regions, shown in Fig. 31 for different values of the flipping rate, look similar to Arnold tongues. Their width decreases with an increase in flipping rate, in the same manner as for the periodically driven stochastic dynamics of Section 2. The tongues occur even for Q → 0, comparable to the case of a forced Van der Pol oscillator.
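The locking region defined by (86) can be scanned directly from the analytical MSF (85). The sketch below does this for a few signal magnitudes; γ and ε follow the values quoted for Fig. 31, while the range and resolution of D are illustrative assumptions.

```python
import numpy as np

def msf_out(D, gamma, Q, dU=0.25, alpha0=1.0 / (np.sqrt(2.0) * np.pi)):
    """Mean switching frequency of the output, Eq. (85), with the rates of Eq. (76)."""
    a1 = alpha0 * np.exp(-(dU + Q) / D)
    a2 = alpha0 * np.exp(-(dU - Q) / D)
    return 0.5 * np.pi * (a1 + a2 + 2 * gamma - np.sqrt((a1 - a2) ** 2 + 4 * gamma ** 2))

gamma, eps = 0.001, 1e-5                     # flipping rate and tolerance as in Fig. 31
D = np.linspace(0.02, 0.10, 2000)
for Q in (0.05, 0.10, 0.20):
    locked = np.abs(msf_out(D, gamma, Q) - np.pi * gamma) <= eps     # condition (86)
    if locked.any():
        print(f"Q = {Q:.2f}: MSF locked for D in [{D[locked].min():.3f}, {D[locked].max():.3f}]")
    else:
        print(f"Q = {Q:.2f}: no locking region at this tolerance")
```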
Fig. 31. (a) Mean switching frequency (MSF) versus noise intensity D for different values of the signal magnitude Q: Q = 0.05 (1); Q = 0.1 (2); Q = 0.2 (3). Other parameters are the same as in the previous figure. The MSF of the input is shown by the dashed line. (b) Regions of MSF locking defined by (86) with ε = 10⁻⁵ for different values of the flipping rate: γ = 0.0005 (1); γ = 0.001 (2) and γ = 0.002 (3).
6.2.2. The overdamped noisy bistable system with dichotomic input
The situation considered above represents an oversimplified model of an overdamped stochastic bistable oscillator driven by a dichotomic input signal d(t),

\dot{x} = x - x^3 + Q\,d(t) + \sqrt{2D}\,\xi(t).   (87)
We again assume that the magnitude Q is small, i.e., the signal alone cannot cause the noise-free system with D = 0 to switch from one state to another. To compare numerical results with the theoretical predictions, we carefully calculated the two rates a_1 and a_2 for the adiabatically slowly driven bistable system (87). We first determined the equilibria of the noiseless system from x³ − x + Q = 0. Denoting the coordinate of the unstable fixed point by x_u and that of the left equilibrium by x_l, the mean first passage time T^± to reach the potential top from the left basin of attraction reads, for both possible values of the input,

T^{\pm} = \frac{2\pi}{\sqrt{U''_{\pm}(x_l)\,\left|U''_{\pm}(x_u)\right|}}\;\exp\!\left[\frac{U_{\pm}(x_u) - U_{\pm}(x_l)}{D}\right], \qquad U_{\pm}(x) = \frac{x^4}{4} - \frac{x^2}{2} \mp Q\,x.   (88)
The inverse values of the mean first passage times give the corresponding rates a_1 and a_2 for (87), which are then inserted in (85). During the numerical simulations of (87) the intrawell motion was filtered out; the information about the phase is carried by the switching times, counting only transitions with t_k ≫ 1/α. We used the piecewise constant definition (74) for the phases. In this way we computed the MSF (68) of system (87) as a function of the internal noise intensity for different values of the driving-signal amplitude; see Fig. 32. This figure shows the effect of MSF locking: over a wide range of noise intensities the MSF of the output equals the mean frequency of the input. We also note good agreement between the theory (85), with the rates from (88), and the numerical simulations. Fig. 33a shows time series of the phase difference Φ(t) = Φ_out(t) − Φ_in(t) for different levels of internal noise. The evolution of Φ(t) for stochastic inputs is similar to that of classically synchronized oscillators with noise: there are stretches of nearly constant phase difference corresponding to phase-locked regimes. These episodes are interrupted by phase slips, where the phase difference jumps by 2π. The duration of the phase-locked segments is maximized at noise intensities where the MSF locking takes place. To quantify this effect we again compute the effective diffusion constant (35). The noise-enhanced phase coherence [36] is manifested through the existence of a minimum in the dependence of D_eff on D, which is shown in Fig. 33.
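For the Langevin equation (87) itself, a minimal Euler-Maruyama simulation with a dichotomic Markovian input might look as follows. The two-threshold switch detector stands in for the intrawell filtering described above; the time step, the thresholds and the chosen D are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative parameters: pi*gamma = 0.002 as in Fig. 32; D is one example value
Q, D = 0.1, 0.05
gamma = 0.002 / np.pi
dt, n_steps = 0.05, 2_000_000

x, d, state = -1.0, 1.0, -1
next_flip = rng.exponential(1.0 / gamma)
n_switches, t = 0, 0.0

for _ in range(n_steps):
    t += dt
    if t >= next_flip:                          # dichotomic Markovian input d(t)
        d = -d
        next_flip += rng.exponential(1.0 / gamma)
    # Euler-Maruyama step for Eq. (87)
    x += (x - x**3 + Q * d) * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal()
    # two-threshold detection filters out intrawell excursions near the barrier top
    if state == -1 and x > 0.5:
        state, n_switches = 1, n_switches + 1
    elif state == 1 and x < -0.5:
        state, n_switches = -1, n_switches + 1

omega_out = np.pi * n_switches / t              # output mean switching frequency, cf. (68)
print(f"output MSF = {omega_out:.5f},  input MSF = pi*gamma = {np.pi * gamma:.5f}")
```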
Fig. 32. Mean frequency ⟨ω^s⟩ of the overdamped bistable oscillator versus internal noise intensity D for different values of the magnitude Q of the external dichotomic noise: Q = 0.05 (circles); Q = 0.1 (squares) and Q = 0.2 (triangles). The flipping rate of the signal is πγ = 0.002. The theoretical curves for the MSF ⟨ω^s⟩ from Eq. (85), with the rates a_1 and a_2 calculated from (88), are shown as solid lines.
Fig. 33. (a) Instantaneous phase difference Φ(τ) for the overdamped bistable oscillator for the indicated values of the internal noise intensity D, with Q = 0.3, πγ = 0.002. The time axis is given in units of the mean switching frequency of the dichotomic noise: τ = t·πγ. (b) The effective diffusion constant (35) versus noise intensity for the indicated values of the signal magnitude.
6.3. Analytical approach to phase synchronization of bistable systems

On the basis of the simplified 4-state model (77) an analytical treatment of synchronization in a stochastic bistable system is possible. As will become clear from our analysis, the key assumption yielding this description is to let the transition rates of the bistable system depend on the phase difference Φ = Φ_out − Φ_in. In passing,
we note that the transition rates of the by now classical two-state model of McNamara and Wiesenfeld [71] can be re-expressed to depend explicitly on the phase difference Φ.
6.3.1. Transition rates and the master equation

We start from the standard ansatz by McNamara and Wiesenfeld [71],

W_{\mp} = r_K \exp\left[\mp\frac{a\,x_0\cos(\Omega t)}{D}\right],   (89)
where a is again the amplitude of the input. Identifying Φ_in = Ωt and Φ_out = lπ and setting x_0 = 1 we find

W_{\mp} = W(\Phi_{\mathrm{in}}, \Phi_{\mathrm{out}}) = r_K \exp\left[-\frac{a}{D}\cos(\Phi_{\mathrm{out}})\cos(\Phi_{\mathrm{in}})\right]   (90)
and rewrite

\cos(\Phi_{\mathrm{out}})\cos(\Phi_{\mathrm{in}}) = \frac{1}{2}\left[\cos(\Phi) + \cos(2\Phi_{\mathrm{out}} - \Phi)\right].   (91)
Since 2Φ_out is always an even multiple of π, and because the cosine is an even function, we arrive at [37]

W_{\mp} = g(\Phi) = r_K \exp\left[-\frac{a}{D}\cos(\Phi)\right].   (92)
This is an expression for the rates which depends explicitly on the phase difference Φ and which can also be used for the dichotomic Markovian process (DMP). The definition introduces the two noise-dependent time scales a_1 and a_2. The function cos(Φ) favors phase differences at even multiples of π, i.e. in-phase configurations. The stochastic evolution of the phase difference is based on the probabilities P(Φ, t|Φ_0, t_0) to find a phase difference Φ at time t conditioned on a phase difference Φ_0 at time t_0. Differently from previous studies of two-state systems [71], this distribution is tailored to study the phenomenon of phase synchronization. The periodic signal we treat as a dichotomic process, and the phase is approximated by discrete jumps according to (74). As a result, due to the discrete character of Φ, which allows only multiples of π, we briefly denote P_l = P(Φ = lπ, t|Φ_0, t_0). The probabilistic evolution equation then reads, with g_l = g(Φ = lπ),
\frac{\partial P_l(t)}{\partial t} = \hat{L} P_l(t) + g_{l-1} P_{l-1}(t) - g_l P_l(t).   (93)
While the last two terms on the right-hand side of (93) account for the change of Φ by transitions of the output, the operator \hat{L} reflects the input switches,

\hat{L} P_l = \gamma\,(P_{l+1} - P_l) \quad\text{(DMP)}, \qquad \hat{L} P_l = \sum_{n=-\infty}^{\infty}\delta\!\left(t - \frac{\varphi_0 + n\pi}{\Omega}\right)(P_{l+1} - P_l) \quad\text{(DPP)}.   (94)
The nonstationary character of the dichotomic periodic process (DPP) can be removed by averaging over the initial phase φ_0. Performing this average yields

\left\langle \hat{L}_{\mathrm{DPP}} \right\rangle_{\varphi_0} = \int_0^{2\pi}\frac{d\varphi_0}{2\pi}\sum_{n=-\infty}^{\infty}\delta\!\left(t - \frac{\varphi_0 + n\pi}{\Omega}\right) = \frac{\Omega}{\pi}.   (95)
We see that the φ_0-averaged periodic process formally corresponds to the Markovian process with the transition rate γ replaced by Ω/π.
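The phase-difference dynamics behind the master equation (93) can also be sampled directly as a jump process: output switches occur with the Φ-dependent rate (92) (values a_1 in phase, a_2 out of phase), input switches with rate γ. The following Gillespie-style sketch uses example values of a_1, a_2 and γ that are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)

# Example rates: a1 (in-phase, slow) and a2 (out-of-phase, fast), cf. Eqs. (76), (92)
a1, a2, gamma = 5e-4, 8e-3, 1e-3
T_end = 2.0e5

t, l_out, l_in = 0.0, 0, 0                     # Phi_out = pi*l_out, Phi_in = pi*l_in
while t < T_end:
    in_phase = (l_out - l_in) % 2 == 0
    g = a1 if in_phase else a2                 # output rate depends on the phase difference
    total = g + gamma
    t += rng.exponential(1.0 / total)
    if rng.random() < g / total:               # output switch: Phi increases by pi
        l_out += 1
    else:                                      # input switch (DMP): Phi decreases by pi
        l_in += 1

print("final phase difference Phi =", np.pi * (l_out - l_in))
print("output rate pi*N_out/T =", np.pi * l_out / T_end, " vs input pi*gamma =", np.pi * gamma)
```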
6.3.2. Effective frequency locking

Using standard techniques [80], from Eq. (93) we can derive the evolution of the mean phase difference ⟨Φ⟩ [37],

\frac{\partial}{\partial t}\langle\Phi\rangle = -\langle\omega_{\mathrm{in}}\rangle + \langle\omega_{\mathrm{out}}\rangle = -\langle\omega_{\mathrm{in}}\rangle + \frac{\pi}{2}(a_1 + a_2) - \frac{\pi}{2}(a_2 - a_1)\langle\sigma\rangle.   (96)
Here, ⟨ω_in⟩ denotes the average drift of the input phase and equals γπ for the DMP and Ω for the φ_0-averaged DPP. Assuming higher moments uncoupled, i.e. ⟨σ(Φ)⟩ = σ(⟨Φ⟩), (96) is the phase dynamics for locking (21). In this context the mismatch or detuning Δ = −⟨ω_in⟩ + (π/2)(a_1 + a_2) is given as the difference between the mean flipping rate of the output and the mean frequency of the signal. The factor preceding ⟨σ⟩ reflects the possibility of phase locking. This factor becomes stronger with increasing difference of the rates a_1 and a_2, i.e. with increasing input amplitude. For the short-time evolution of a given initial state Φ_0 = 0 the necessary condition for locking is

|\Delta| = \left|-\langle\omega_{\mathrm{in}}\rangle + \frac{\pi}{2}(a_1 + a_2)\right| < \frac{\pi}{2}(a_2 - a_1),   (97)
which gives rise to the famous Arnold tongues. The kinetic equation for ⟨σ⟩ can be evaluated explicitly, yielding

\frac{d}{dt}\langle\sigma\rangle = -\left[\frac{2}{\pi}\langle\omega_{\mathrm{in}}\rangle + (a_1 + a_2)\right]\langle\sigma\rangle + (a_2 - a_1).   (98)
From (98) we can see that ⟨σ⟩ approaches a stationary value at times large compared with the relaxation time τ_rel = (2π⁻¹⟨ω_in⟩ + a_1 + a_2)⁻¹,

\langle\sigma^s\rangle = \frac{a_2 - a_1}{2\pi^{-1}\langle\omega_{\mathrm{in}}\rangle + a_1 + a_2}.   (99)
For the DMP this expression coincides with the cross-correlation coefficient ρ defined in (80). In this stationary limit the output phase velocity can be derived from (96) by insertion of (99),

\langle\omega^s_{\mathrm{out}}\rangle = \frac{\pi}{2}(a_1 + a_2) - \frac{\pi}{2}(a_2 - a_1)\langle\sigma^s\rangle.   (100)
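Equations (99) and (100) give the frequency-locking plateau in closed form. A small numerical evaluation for the DMP input, using the same barrier and prefactor as before and an arbitrary 10% tolerance band around the input MSF, might look like this:

```python
import numpy as np

def omega_out_stationary(D, a, gamma, dU=0.25, r0=1.0 / (np.sqrt(2.0) * np.pi)):
    """Stationary output phase velocity, Eqs. (99)-(100), for DMP input (omega_in = pi*gamma)."""
    a1 = r0 * np.exp(-(dU + a) / D)
    a2 = r0 * np.exp(-(dU - a) / D)
    sigma_s = (a2 - a1) / (2.0 * gamma + a1 + a2)                     # Eq. (99)
    return 0.5 * np.pi * (a1 + a2) - 0.5 * np.pi * (a2 - a1) * sigma_s  # Eq. (100)

gamma = 0.001
D = np.linspace(0.02, 0.08, 300)
for a in (0.1, 0.2):
    w = omega_out_stationary(D, a, gamma)
    locked = np.abs(w - np.pi * gamma) < 0.1 * np.pi * gamma          # 10% band (assumption)
    if locked.any():
        print(f"a = {a}: plateau near pi*gamma for D in [{D[locked].min():.3f}, {D[locked].max():.3f}]")
    else:
        print(f"a = {a}: no locking plateau at this tolerance")
```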
In Fig. 34 we show the mean output switching rate ⟨ω_out⟩ as a function of the noise intensity D for several values of the input signal amplitude a. The formation of a plateau for sufficiently large signal amplitude indicates the emergence of a frequency-locked range. With increasing amplitude the synchronization region, i.e. the range of D for which ⟨ω^s_out⟩ ≈ ⟨ω_in⟩, widens. Setting a narrow band of width Δω around the mean "carrier frequency" ⟨ω_in⟩ defines Arnold tongues in the D vs. a plane wherein the frequency is effectively locked. These are shown in Fig. 35 and agree almost exactly with the numerical data from Section 6.2 [35,36]. We want to emphasize that the frequency-locking region is observed near noise intensities where the so-called "resonance" condition (in our case, by definition, ⟨ω_in⟩ = ⟨ω_out⟩(D)) holds, which maximizes the spectral power amplification for periodic processes. It deviates significantly from the values of D where the signal-to-noise ratio attains its maximum (D ≈ ΔU).
Fig. 34. Mean switching frequency of the output versus noise intensity for different values of a and flipping rate γ = 0.001. The MSF of the input (DMP) is shown by the horizontal line. a = 0, a = 0.1 and a = 0.2 (different line styles). ΔU = 1/4 and α_0 = 1/(√2 π).
Fig. 35. Regions of locking of the mean switching rate of the output: amplitude a of the driving force (DMP) versus noise intensity of the bistable dynamics. From left to right: γ = 0.001, γ = 0.002, γ = 0.005. Other parameters as in Fig. 34.
6.3.3. Effective phase locking
The phenomenon of phase locking can be demonstrated analytically by calculating the diffusion coefficient of the phase difference, ∂_t[⟨Φ²⟩ − ⟨Φ⟩²]. Performing the whole calculation yields for the DMP

\partial_t\left[\langle\Phi^2\rangle - \langle\Phi\rangle^2\right]_{\mathrm{DMP}} = \pi^2\left[\gamma + \frac{\langle\omega_{\mathrm{out}}\rangle}{\pi}\right] - \pi(a_2 - a_1)\left[\langle\Phi\sigma\rangle - \langle\Phi\rangle\langle\sigma\rangle\right]   (101)
and for the DPP (we have to be careful with ∂_t[⟨Φ⟩⟨σ⟩] for periodic processes because the discontinuous jumps give rise to an extra term)

\partial_t\left[\langle\Phi^2\rangle - \langle\Phi\rangle^2\right]_{\mathrm{DPP}} = \pi\,\langle\omega_{\mathrm{out}}\rangle - \pi(a_2 - a_1)\left[\langle\Phi\sigma\rangle - \langle\Phi\rangle\langle\sigma\rangle\right].   (102)
The correlator ⟨δΦ δσ⟩ = ⟨Φσ⟩ − ⟨Φ⟩⟨σ⟩ can be computed from the corresponding kinetic equation in the stationary long-time limit. Inserting ⟨δΦ δσ⟩^s into (101) and (102) eventually renders the diffusion coefficient for the Markovian process,

D^{\mathrm{DMP}}_{\mathrm{eff}} = \frac{\pi^2}{2}\left[\frac{\langle\omega^s_{\mathrm{out}}\rangle}{\pi} + \gamma - \left(2\gamma - (a_1 + a_2)\right)\langle\sigma^s\rangle^2 - \frac{a_2 - a_1}{2}\,\langle\sigma^s\rangle\left(1 + \langle\sigma^s\rangle^2\right)\right],   (103)
and for the periodic process
D^{\mathrm{DPP}}_{\mathrm{eff}} = \frac{\pi^2}{2}\left[\frac{\langle\omega^s_{\mathrm{out}}\rangle}{\pi} - \left(\frac{2\Omega}{\pi} - (a_1 + a_2)\right)\langle\sigma^s\rangle^2 - \frac{a_2 - a_1}{2}\,\langle\sigma^s\rangle\left(1 + \langle\sigma^s\rangle^2\right)\right],   (104)
with ⟨σ^s⟩ given by (99) and ⟨ω^s_out⟩ by (100). These expressions are plotted in Fig. 36 and exemplify that, for sufficiently large input signal amplitude, the diffusion coefficient drastically diminishes. The smaller this quantity, the longer the episodes between two subsequent phase slips. One may define the phase-locked region by, for example, the demand D^DMP_eff < D^DMP_in = π²γ/2, which yields the tongues for phase locking for the Markovian input. The difference between a Markovian and a periodic input becomes evident for small noise intensities. In this limit the output does not switch at all, whereas the two inputs behave differently: the Markovian process diffuses proportionally to γ, while fluctuations in the periodic process are absent. This gives the different behavior shown in Fig. 36 [37].
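The expressions (99), (100) and (103) can be evaluated together to locate the minimum of the effective diffusion coefficient and the phase-locking region D_eff^DMP < π²γ/2. The sketch below assumes the same barrier and prefactor as in the figures, and uses Eq. (103) in the reconstructed form given above.

```python
import numpy as np

def dmp_coefficients(D, a, gamma, dU=0.25, r0=1.0 / (np.sqrt(2.0) * np.pi)):
    a1 = r0 * np.exp(-(dU + a) / D)
    a2 = r0 * np.exp(-(dU - a) / D)
    sig = (a2 - a1) / (2.0 * gamma + a1 + a2)                     # Eq. (99), DMP input
    w_out = 0.5 * np.pi * ((a1 + a2) - (a2 - a1) * sig)           # Eq. (100)
    # Eq. (103): effective diffusion coefficient of the phase difference
    d_eff = 0.5 * np.pi**2 * (w_out / np.pi + gamma
                              - (2.0 * gamma - (a1 + a2)) * sig**2
                              - 0.5 * (a2 - a1) * sig * (1.0 + sig**2))
    return w_out, d_eff

gamma, a = 0.001, 0.2
D = np.linspace(0.02, 0.08, 400)
_, d_eff = dmp_coefficients(D, a, gamma)

d_in = 0.5 * np.pi**2 * gamma                  # diffusion of the input phase (DMP)
locked = d_eff < d_in                          # phase-locking criterion D_eff < pi^2*gamma/2
print("minimum D_eff = %.2e at D = %.3f" % (d_eff.min(), D[np.argmin(d_eff)]))
if locked.any():
    print("phase locked for D in [%.3f, %.3f]" % (D[locked].min(), D[locked].max()))
```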
Fig. 36. Diffusion coefficient of the instantaneous phase difference Φ versus the noise intensity D for different values of the signal magnitude a and flipping rate γ = 0.001. The value at D = 0 is determined by the input diffusion and vanishes for strictly periodic signals. a = 0, a = 0.1 and a = 0.2 (different line styles). Other parameters are the same as in Fig. 34.
Acknowledgements

We acknowledge fruitful discussions and help by Jan A. Freund (Berlin), A. Silchenko, G. Strelkova (Saratov) and F. Moss (St. Louis). The work was supported by the Deutsche Forschungsgemeinschaft Sfb-555, the Fetzer Institute, the Alexander-von-Humboldt Foundation, the Royal Society of the UK and by the Russian Ministry of Higher Education (grant # 97-0-8.3-47).
References 1. Haken, H., Advanced Synergetics, Springer, Berlin. 2. Blekhman, I, Synchronization of Dynamical Systems, Nauka, Moscow, 1971 (in Russian); I. Blekhman, Synchronization in Science and Technology, Nauka, Moscow, 1981; English translation: ASME Press, New York, 1988. 3. Soen, Y., Cohen, N., Lipson, D. and Braun, E. (1999) Phys. Rev. Lett 82, 3556. 4. Neiman, A., Pei, X., Russell, D., Wojtenek, W., Wilkens, L., Moss, F., Braun, H., Huber, M. and Voigt, K. (1999) Phys. Rev. Lett. 82, 660. 5. Elson, R.C., Selverston, A.I., Huerta, R., Rulkov, N.F., Rabinovich, M.I. and Abarbanel, H.D.I. (1998) Phys. Rev. Lett. 81, 5692. 6. Winfree, A.T, J. Theor. Biol. 16, 15; Winfree, A.T, The Geometry of Biological Time, Springer, New York, 1980; Buck, J. (1988) Quart. Rev. Biol. 63, 265; Strogatz, S.H. and Stewart, I. (1993) Sci. Am. 269, (6), 102. 7. Sch~ifer, C., Rosenblum, M., Kurths, J. and Abel, H. (1998) Nature 392, 239; Phys. Rev. E60, 857. 8. Tass, P., Rosenblum, M., Weule, J., Kurths, J., Pikovsky, A., Volkmann, J., Schnitzler, A. and Freund, H. (1998) Phys. Rev. Lett. 81, 3291. 9. Vasilev, V.A., Romanovsky, Yu.M., Chernavsky, D.S. and Yakhno, V.G. (1987) Autowave Processes in Kinetic Systems, VEB Deutscher Verlag der Wissenschaften, Berlin. 10. Anishchenko, V., Aranson, I., Postnov, D. and Rabinovich, M. (1986) Dokladi Academii Nauk SSSR 286, 1120 (in Russian). 11. Belykh, V., Verichev, N., Kocarev, L. and Chua, L. (1993) J. Circuits, Syst. Comput. 50, 1874. 12. Pecora, L. and Carroll, T. (1990) Phys. Rev. Lett. 64, 821. 13. Osipov, G.V., Pikovsky, A.S., Rosenblum, M.G. and Kurths, J., (1997) Phys. Rev. E 55, 2353. 14. Kocarev, L. and Partlitz, U. (1995) Phys. Rev. Lett. 74, 5028. 15. Afraimovich, V.S., Nekorkin, V.I., Osipov, G.V. and Shalfeev, V.D. (1994) Stability, Structures and Chaos in Nonlinear Synchronization Networks, World Scientific, Singapore. 16. Abarbanel, H.D.I., Rabinovich, M.I., Selverston, A., Bazhenov, M.V., Huerta, R., Sushchik, M.M., Rubchinskii, L.L. (1996) Physics-Uspekhi 39, 337. 17. Andronov, A., Vitt, A. and Khaykin, S. (1966) Theory of Oscillations, Pergamon Press, Oxford. 18. Hayashi, C. (1964) Nonlinear Oscillations in Physical Systems, McGraw-Hill, New York. 19. Rosemblum, M., Pikovsky, A., Sch~ifer, C., Tass, P.A. and Kurths, J. Phase synchronization: from theory to data analysis (chapter 9 in this book). 20. Stratonovich, R.L. (1967) Topics in the Theory of Random Noise, vol. 2, Gordon and Breach, New York. 21. Horsthemke, W. and Lefever, R. (1984) Noise Induced Transitions, Springer, Berlin. 22. Dykman, G., Landa, P. and Neimark, Y. (1992) Chaos, Solitons and Fractals 1, 339. 23. Fujisaka, H. and Yamada, Y. (1983) Prog. Theor. Phys. 69, 32. 24. Afraimovich, V.S., Verichev, N.N. and Rabinovich, M.I. (1986) Radiophys. Quantum Electron. 29, 795. 25. Rulkov, N.F., Sushchik, K.M., Tsimring, L.S. and Abarbanel, H.D.I. (1995) Phys. Rev. E 51, 980. 26. Anishchenko, V., Vadivasova, T., Postnov, D. and Safonova, M. (1992) Int. J. of Bifurcation and Chaos 2, 633. 27. Rosenblum, M.G., Pikovsky, A.S. and Kurths, J. (1996) Phys. Rev. Lett. 76, 1804. 28. Neiman, A. (1994) Phys. Rev. E 49, 3484. 29. Malakhov, A.N. (1968) Fluctuations in Auto-oscillation Systems, Nauka, Moscow. 30. Benzi, R., Sutera, A. and Vulpiani, A. (1981) J. Phys. A: Math. Gen. 14 L453; Benzi, R., Parisi, G., Sutera, A. and Vulpiani, A., (1982) Tellus 34 10; Nicolis C. (1982) Tellus 34 1. 31. For reviews see: Moss, F. (1994) in: Some Contemporary Problems in Statistical Physics, vol. 205, ed. 
by Weiss, G., SIAM, Philadelphia; Moss, F., Pierson, D. and O'Gorman, D. (1994) Int. J. Bifur. Chaos 4, 1383; Jung, P. (1995) Phys. Rep. 234, 175; Wiesenfeld, K. and Moss, F. (1995) Nature 373, 33; Bulsara, A.R. and Gammaitoni, L. (1996) Physics Today March, 39; Gammaitoni, L., Hfinggi, P., Jung P. and Marchesoni, F. (1998) Rev. Mod. Phys. 70, 223.
32. Anishchenko, V.S., Neiman, A.B., Moss, F. and Schimansky-Geier, L. (1999) Uspekhi fiz. nauk 69, 7 (Sov. Phys. Usp. 42, 7). 33. Neiman, A. and Schimansky-Geier, L. (1995) Phys. Lett. A 197, 379. 34. Shulgin, B., Neiman, A. and Anishchenko, V. (1995) Phys. Rev. Lett. 75, 4157. 35. Neiman, A., Silchenko, A., Anishchenko, V. and Schimansky-Geier, L. (1998) Phys. Rev. E 58, 7118. 36. Neiman, A., Schimansky-Geier, L., Moss, F., Shulgin, B. and Collins, J.J. (1999) Phys. Rev. E 60 284. 37. Freund, J.A., Neiman, A. and Schimansky-Geier, L. (2000) Europhys. Lett 50, 8.; in: Stochastic Climate Models: Progress in Probability, eds. Imkeller, P. and von Storch, J. Birkh~iuser, Basel. 38. Fauve, S. and Heslot, F. (1983) Phys. Lett. A 97, 5. 39. Neiman, A., Shulgin, B., Anishchenko, V., Ebeling, W., Schimansky-Geier, L. and Freund, J. (1996) Phys. Rev. Lett. 76, 4299; Schimansky-Geier, L., Freund, J., Neiman, A. and Shulgin, B. (1998) Int. J. Bifurcation and Chaos 8, 869. 40. FitzHugh, R. (1961) Biophys. J. 1, 445; Nagumo J., Arimoto S., Yoshizawa S. (1962) Proc. IRE 50, 2061. 41. Hodgkin, A.L. and Huxley, A.F. (1952) J. Physiol. (London) 117, 500. 42. Collins, J.J., Chow, C.C. and Imoff, T.T. (1995) Phys. Rev. E 52, R3321. 43. Pikovsky, A.S. and Kurths, J. (1997) Phys. Rev. Lett. 78, 775. 44. Lindner, B. and Schimansky-Geier, L. (1999) Phys. Rev. E 60, 7270. 45. Longtin, A. (1995) I1 Nuovo Cimento D 17, 835. 46. Lindner, B. and Schimansky-Geier, L. (2000) Phys. Rev. E 61, 6103. 47. Longtin, A. and Chialvo, D.R. (1998) Phys. Rev. Lett. 81, 4012. 48. Han, S.K., Yim, T.G., Postnov, D.E. and Sosnovsteva, O.V. (1999) Phys. Rev. Lett 83, 1771. 49. Kurrer, Ch. and Schulten, K. (1995) Phys. Rev. E 51, 6213. 50. Rappel, W.-J. and Karma, A. (1996) Phys. Rev. Lett. 77, 3256. 51. Jung, P. and Mayer-Kress, G. (1995) Phys. Rev. Lett. 74, 2130; L6cher, M., Jonson, G.A. and Hunt, E.R. (1996) Phys. Rev. Lett 77, 4698; Kuperman, M.N., Wio, H.S., Izus, G. and Deza, R. (1998) Phys. Rev. E 57, 5122. 52. Jung, P. (1997) Phys. Rev. Lett. 78, 1723. 53. Jung, P., Cornell-Bell, A., Moss, F., Kadar, S., Wang, J. and Showalter, K. (1998) Chaos 8, 567. 54. L6cher, M., Cigna, D. and Hunt, E.R., (1998) Phys. Rev. Lett. 80, 5212; Linder, J.F., Chandramouli, S., Bulsara, A.R., L6cher, M. and Ditto, W. (1999) Phys. Rev. Lett. 81, 5048. 55. Hempel, H., Schimansky-Geier, L., and Garcia-Ojalvo, J. (1999) Phys. Rev. Lett. 82, 3713. 56. Braiman, Y., Linder, J.F. and Ditto, W.L. (1995) Nature 378, 465; Braiman, Y., Ditto, W.L., Wiesenfeld, K. and Spano, M.L. (1995) Phys. Lett. A 206, 54; Wiesenfeld, K., Colet, P. and Strogatz, S.H. (1996) Phys. Rev. Lett. 76, 404. 57. Bogolyubov, N.N., Mitroplski, Yu.A. (1974) Assymptotic methods in the theory of nonlnears oscillators, 4th edn, Moscow, Nauka (in Russian). 58. Anishchenko, V.S., Neiman, A., Vadivasova, T., Astakhov, V. and Schimansky-Geier, L., Nonlinear Dynamics of Chaotic and Stochastic Systems, Springer, 2001 (in press). 58a. Anishchenko, V.S. (1995) Dynamical Chaos-Model and Experiment, World Scientific, Singapore. 59. Kuramoto, Y. (1984) Cehemical Oscillations, Waves, and Turbulence, Springer, Berlin. 60. H~inggi, P., Talkner, P. and Borkovec, M. (1990) Rev. Mod. Phys., 62, 251. 61. Gabor, D. (1946) J. lEE London 93, 429. 62. Panter, P. (1965) Modulation, Noise and Spectral Analysis, McGraw-Hill, New York. 63. Middleton, D. (1960) An Introduction to Statistical Communication Theory, Mc. Graw-Hill, New York. 64. Vainshtein, L. 
and Vakman, D., Frequency separation in Theory of Oscillations and Waves, Nauka, Moscow (in Russian). 65. Shilnikov, L.P., Shilnikov, A.L., Turaev, D.V. and Chua, L.O. (1998) Methods of Qualitative Theory in Nonlinear Dynamics, Part I, World Scientific, Singapore. 66. Pikovsky, A., Osipov, G., Rosenblum, M., Zaks, M. and Kurths, J. (1997) Phys. Rev. Lett. 79, 47. 67. Zaks, M.A., Park, E.-H., Rosenblum, M.G. and Kurths, J. (1999) Phys. Rev. Lett. 82, 4228.
68. Rosa, E., Ott, E. and Hess, M.H. (1998) Phys. Rev. Lett. 80, 1642. 69. Pontryagin, L., Andronov, A. and Vitt, A. (1933) Zh. Eksp. Teor. Fiz. 3, 165; see for English translation in: Noise in Nonlinear Dynamical Systems, vol.1, eds. Moss, F. and McClintock, P.V.E. (1989) pp. 329-348, Cambridge University Press, Cambridge. 70. Kramers, H.A. (1940) Physica 7, 284. 71. McNamara, B. and Wiesenfeld, K. (1989) Phys. Rev. A 39, 4854. 72. Bendat, J.S. and Piersol, A.G. (1986) Random Data. Analysis and Measurement Procedures Wiley, New York. 73. Neiman, A., Schimansky-Geier, L., Cornell-Bell, A. and Moss, F. (1999) Phys. Rev. Lett. 83, 4896. 74. Garcia-Ojalvo, J. and Schimansky-Geier, L. (1999) Europhys. Lett. 47, 298. 75. Fohlmeister, C., Ritz, R., Gerstner, W. and van Hemmen, J.L. (1995) Neural Comput. 7, 905. 76. Neiman, A., Schimansky-Geier, L. (1994) Phys. Rev. Lett. 72, 19. 77. Neiman, A., Schimansky-Geier, L., Moss, F. (1997) Phys. Rev. E 55, R9. 78. Dykman, M.I., Mannella, R., McClintock, P.V.E. and Stocks, N.G. (1990) Phys. Rev. Lett. 65, 2606; Dykman, M.I., Mannella, R., McClintock, P.V.E. and Stocks, N.G. (1990) JETP Lett. 52, 144. 79. Van den Broeck, C. (1993) Phys. Rev. E 47, 4579; Ziicher, U. and Doering, C.R. (1993) Phys. Rev. E 47, 3862. 80. van Kampen, N.G. (1981) Stochastic Processes in Physics and Chemistry, Elsevier North-Holland, Amsterdam.
CHAPTER 3
Fluctuations in Neural Systems: From Subcellular to Network Levels
P. ÅRHEM
H. LILJENSTRÖM
Agora for Biosystems and Department of Neuroscience, Karolinska Institutet, SE-171 77 Stockholm, Sweden
Agora for Biosystems and Department of Biometry and Informatics, SLU, SE-750 07 Uppsala, Sweden
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. What are the issues?
2. Classification of irregularity: noise and chaos
   2.1. Strict determinism and indeterminism
   2.2. Empirical determinism and indeterminism
   2.3. Randomness
   2.4. Noise
   2.5. Chaos
3. Microscopic fluctuations: ion channel kinetics
   3.1. The channel idea
   3.2. Type of fluctuations
   3.3. Molecular background: structure
   3.4. Molecular background: gating
4. Mesoscopic fluctuations: cellular processes and neural coding
   4.1. Temporal fluctuations: single-channel induced impulses
   4.2. Temporal fluctuations: stochastic transmission at synapses
   4.3. Impulse-amplitude fluctuations
5. Macroscopic fluctuations: networks and functional efficiency
   5.1. Neurodynamics
   5.2. Computational approaches
   5.3. Functional significance
6. Origin of fluctuations: philosophical implications
   6.1. A neurophysical approach
   6.2. An argument for interactionism
   6.3. Concluding remark
7. Conclusion
Acknowledgements
Appendix A: Mathematics of channel kinetics
Appendix B: A cortical neural network model
References
1. What are the issues?
Living systems are highly ordered, and it is generally assumed that disorder is destructive or at best without functional significance. In particular, this would apply to the brain, where it seems almost self-evident that its structure and neural information processing would require a high degree of order. Any disorder would presumably make the system work less efficiently, since disorder corrupts information. This provides a general problem for all kinds of communication. (See Ref. [1] for an exhibition of aspects on disorder and order in brain function.) This line of argument was early recognized and developed by Schrödinger in his influential book "What is Life?" [2]. Schrödinger asks fundamental questions regarding the stability and sensitivity of our body in general, and of the brain and sensory organs in particular. He argues that our sense organs (and the brain itself) would be useless if they were too sensitive and reacted to single atomic motions. Schrödinger points at a general statistical law, the so-called √n law, as a measure of the degree of inaccuracy to be expected in any physical law. The relative error of such a law is of the order of 1/√n (i.e. n^(-1/2)), where n is the number of molecules that cooperate to bring about that law. Hence, if n = 1 million, the relative error in any measurement would be 1/1000, or 0.1%. From this one can see, as Schrödinger argues, "that an organism must have a comparatively gross structure in order to enjoy the benefit of fairly accurate laws, both for its internal life and for its interplay with the external world. For otherwise the number of co-operating particles would be too small, the 'law' too inaccurate". Similar arguments would apply to single action potentials (APs), or other events dependent on single cell activities in the nervous system. If we could determine the number of APs (or active neurons) involved in, say, the perception of an object, the √n law would give the inaccuracy of that particular brain process. Conversely, if a certain degree of accuracy is needed for any particular process, the same law would give the approximate number of events, or "particles", necessary to be involved in the process. Since the brain consists of large numbers of continuously active neurons, it should not normally be sensitive to single APs, according to Schrödinger's argument. The activity of single cells appears to be largely unpredictable and noisy, but the mass of cells cooperates to produce a coherent pattern. It is the mass action of millions of cells that makes the orderly dynamics necessary for cognitive functions
(cf. [3]). However, it is evident that amplified irregular events at a microscopic level are essential in biological processes. This is manifested already in the evolutionary process itself, where mutations are fundamental. This insight, to some extent contradicting the Schrödinger view, is now being increasingly recognized also in neuroscience. Internally generated fluctuations, either amplified micro events or system
generated macro events may have a functional value. The opening of a single ion channel can be amplified, resulting in an AP, which in turn can result in a cascade of neural activity [4,5]. A single AP can cause an avalanche of neural activity that is experienced consciously [6]. Irregular spontaneous activity is essential for the development of the synaptic organization during ontogeny (see Ref. [7]), and there are reasons to believe that irregular spontaneous activity plays a role for conscious processes (see Refs. [8-10]). Internal system generated fluctuations can create state transitions, break down one kind of order to make place for and replacing it with a new kind of order. Externally generated fluctuations can cause increased sensitivity in receptor cells through the phenomenon of stochastic resonance. Often considered as a nonlinear problem, the effect is known in linear response theory. The typical example is when signals with the addition of fluctuations overcome a threshold. However, it has been shown that a similar effect can occur also in threshold-free systems. (For a general overview, see Refs. [11-13].) The increased interest in the functional role of fluctuations in the nervous system is paralleled by an increased interest in fluctuation analysis as a tool for understanding brain function. Fluctuation analysis is a way to obtain information on lowlevel processes by studying the activity at a higher level. Thus membrane current fluctuations are studied to give information on ion channel kinetics and systems level fluctuations, such as electroencephalography (EEG), to give information on underlying cellular mechanisms (see Ref. [14]). In the following, we will discuss and review some aspects of fluctuations in nervous systems. By necessity, this will be rather fragmentary and perhaps somewhat biased, but is intended to provide concrete examples of the problems involved, and of the ways they could be approached. We will focus on underlying mechanisms and the possible functional role. We will approach the problem at different complexity levels: (1) the microscopic, or molecular level, (2) the mesoscopic, or cellular level, and (3) the macroscopic, or systems (neural network and brain) level. At the microscopic level, stochastic processes are evident in, for example, ion channel activity. We will discuss the molecular background and give mathematical descriptions of channel kinetics. We will also discuss the functional role of microscopic fluctuations/disorder at the higher, cellular and systems levels. At the mesoscopic, or cellular level, we will discuss the fluctuations/disorder of nerve impulse activity, both in terms of interval and amplitude variability. We will investigate the underlying mechanisms of the interval variability by considering channel kinetics and synapse function. We will investigate the underlying mechanisms of amplitude variability by considering some recent studies of hippocampal neurons. At the macroscopic level, we will focus on the fluctuations/disorder of extracellular recordings of brain activity, such as different forms of EEG. In particular, we will review some large-scale simulations of cortical models and the complex dynamics displayed by such systems. Finally, we will briefly discuss the role of fluctuations for the classical mind-brain problem. We will argue for an interactionistic solution to this problem. We will also briefly touch upon some philosophical consequences of this view. 
We suggest that it implies a strictly indeterministic worldview. However,
to give a background for the discussions, we will first give a brief overview of basic concepts used in the field.
2. Classification of irregularity: noise and chaos The characterization of fluctuations and related phenomena requires the use of concepts and principles related to order and disorder, which often are used in different contexts and with different meanings, and consequently often leads to confusion. Examples of such concepts include fluctuations, random and stochastic processes, noise and chaos. For natural reasons, these concepts are harder to define than those related to more ordered processes, such as oscillations or steady states. Some of these concepts are also directly related to a (metaphysical) worldview, such as determinism or indeterminism. Our aim here is to try to clarify and to provide an overview of some of the basic concepts used here and elsewhere in the literature (for a more extensive discussion, see Ref. [1]). When referring to order in the neural information processing, we mean regular or controllable activity at all complexity levels, such as constant or oscillatory activity, or sets of electrical pulses (APs) with well-defined and controlled intervals. When referring to disorder we mean irregular, aperiodic activity, that is generally considered as noise or chaos. 2.1. Strict determin&m and indeterminism Basal to the discussion on these issues are the concepts determinism and its antithesis indeterminism. These concepts are notoriously difficult to define (see, e.g. Refs. [15,16]), and have been used in a number of more or less imprecise meanings. A general definition of determinism in a strict, mathematical sense implies that a future time course is completely determined by actual conditions, past and present. The future development of a strict deterministic process can only occur in one unique way. A strict determinism can never be verified for real physical processes, in particular not for a unique process, such as the evolution of the universe. Still, the concept has a profound importance for our conception of the world, and it provides a basic idea for the foundations of a rational natural science. Although deterministic worldviews may have lost ground during the past century, there is still a widely held idea that strictly deterministic, fundamental natural laws govern "the reality", at the basis of all phenomena of the world. Such laws, e.g. Newton's laws of motion, quantum mechanical laws, and the theories of relativity, are all formulated in ways that are considered to be strictly deterministic. A classical formulation of strictly deterministic world view is that of Laplace: An intelligence knowing all the forces acting in nature at a given instant, as well as the momentary positions of all things in the universe, would be able to comprehend in one single formula the motions of the largest bodies as well as the lightest atoms in the world, provided that its intellect were sufficiently powerful to subject all data to analysis; to it nothing would be uncertain, the future as well as the past would be
present to its eyes. [17,18]. Yet, this formulation is more than a strictly deterministic view; it claims that the universe also in principle should be predictable. This view we here refer to as empirical determinism. Today we know that Laplace was wrong. Strictly deterministic laws are not always predictable. This was early pointed out (see the discussion on a result of Hadamard in Ref. [15]), but has been highlighted by recent advances in the study of chaotic processes. In contrast to the concept of strict determinism, that of strict indeterminism is more difficult to handle. There is no mathematical way to define it. Nevertheless, it is intuitively clear and we will use it without further attempts to clarify it. 2.2. Empirical determinism and indeterminism In many respects the concepts of determinism and indeterminism are more fertile when used in an empirical sense. An empirically deterministic process is a process which, when starting from initial conditions that are indistinguishable, continue along time courses that cannot be separated. The concept of empirical indeterminism is defined correspondingly. A process that is not empirically deterministic is empirically indeterministic. In contrast to the strict determinism and indeterminism, the empirical concepts are context and model independent. Closely related to empirical determinism is the concept of predictability. Systems that start at similar initial conditions, and will continue to follow similar time courses, are predictable. This definition can be made stricter to distinguish predictable processes from empirically deterministic ones (see Ref. [16]). For the present purpose, however, we use empirical determinism and predictability synonymously.
2.3. Randomness As with a strictly deterministic process, a strictly indeterministic process is not verifiable for real systems. Neither can any known mathematical framework generate it. It is therefore reasonable to treat random or stochastic processes as well as the associated concepts, in an empirical sense, referring to probabilistic situations where some random variables can attain various values with certain probabilities. In this view, a random process is an empirically indeterministic process. Most, if not all, systems that are measured accurately enough could be considered empirically indeterministic. As already noted, this understanding is not necessarily contradictory to the concept of strict determinism: A strict deterministic process can have random features in this sense. Thus, randomness is always associated with a probability distribution. Hence, we here use randomness in an empirical sense. A random process is an empirically indeterministic process, thus unpredictable. This means that strictly deterministic processes can be random and have probability distributions, since strictly deterministic processes can be empirically indeterministic. We are aware that this nomenclature may seem counterintuitive, but argue that it is consistent. This will be more clear when we consider the major classes of empirically indeterministic processes below.
2.4. Noise There are two major classes of empirically indeterministic processes. The first, and the traditional representative of stochastic processes, is noise. This relates to processes generated by uncorrelated influences from a large number of incoherent, uncontrolled influences. (For a general presentation of noise see Ref. [19].) It is important to distinguish external and internal noise [19]. External noise is simply described as disturbances from outside a system, by variations of temperature, irregular movements, irregular spreading of substances in the air, sensory input and so on. This is described by some probabilistic framework, which is independent of the basic dynamics of a studied system. In contrast, internal noise is part of the system under study, e.g. due to thermalfluctuations. Effects of external noise can be significantly reduced although never completely avoided, but this is not possible with internal noise. Any activity is inevitably associated with internal noise. In general, properties of thermal fluctuations are related to features of thermal equilibrium and to the dissipation that governs the spreading of energy or momentum by transport processes such as electric or thermal conduction. This is, for instance, evident in the archetype of noise, Brownian motion. The functional importance of Brownian motion and thermal fluctuations in biology relates to the fact that it is a major source of structural transformations in macromolecules, and consequently, also a source of mesoscopic and macroscopic fluctuations.
2.5. Chaos The other major type of empirically indeterministic processes is what is referred to, somewhat inappropriately, as chaos. Chaotic processes are generated by mechanisms described by a relatively simple deterministic, mathematical framework. This means that it is described by a small number of variables (preferably 3-5, usually not more than 10). It provides an irregular pattern, which is confined to a finite (compact) region without any stable fix point, or stable periodic solutions. It may cover an infinite region, which can be transformed to a finite region, but this must still be without stable fix points. The most important feature is that the mathematical equations lead to instabilities that make the process extremely sensitive to variations at the initial state or during the time course. This means that a chaotic process could be predictable at short times, but not at long times. A chaotic process cannot at long times be distinguished from other random processes by general methods such as analysis of correlations, spectral densities or entropy functions. To establish that a process is chaotic, it is necessary to test special consequences of the mathematical framework, such as strong short-time correlations, occurrence of unstable periodic orbits, or fractal features. In summary, the distinction between chaos and noise lies in the mechanisms and the simplicity: chaos is generated by simple, controllable mechanisms, noise by a large number of uncontrollable mechanisms. But the complexity border is not sharp. There is a continuous transition between chaos and noise, when increasing the complexity of the system. It is also important to emphasize the time scale. At short time scales, chaos is predictable and thus empirically deterministic, while noise is unpredictable at all
time scales. At long time scales, chaotic processes are not distinguishable from noise when referring to probabilistic properties. However, what can be considered as short or long time scale is system and level dependent and not well defined. For neural systems at different organizational levels, this is an important issue, if one wants to investigate the underlying mechanisms behind the process irregularities.
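A minimal numerical illustration of this short-time predictability (a generic textbook example, not one of the neural processes discussed in this chapter) is the logistic map: two trajectories started from nearly identical initial conditions remain close for a number of iterations and then diverge completely, so that the long-time behavior of this strictly deterministic rule is as unpredictable as noise:

# Sensitive dependence on initial conditions in the logistic map x -> r*x*(1 - x).
# Two trajectories starting 1e-9 apart agree at short times but diverge at long times.
r = 4.0                      # parameter value giving chaotic dynamics
x, y = 0.2, 0.2 + 1e-9       # two almost identical initial states

for n in range(60):
    if n % 10 == 0:
        print(f"n = {n:2d}   x = {x:.6f}   y = {y:.6f}   |x - y| = {abs(x - y):.2e}")
    x = r * x * (1.0 - x)
    y = r * y * (1.0 - y)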
3. Microscopic fluctuations: ion channel kinetics
All electrical activity of the nervous system, from intracellularly recorded impulse patterns of single neurons to extracellularly recorded field potentials of complex neuron populations, depends ultimately on discrete irregular unit events. These events are caused by the activity of specific integral membrane proteins, ion channels. What type of fluctuations do these channel-induced currents represent? To better understand this, we will here discuss the kinetics of channel gating, with special emphasis on their molecular background. We will also briefly review how these stochastic events at the microscopic level may affect the more or less ordered processes at higher levels of the nervous system. This is a question of fundamental importance for the understanding of a wide range of brain phenomena, including the elusive one of consciousness (see Refs. [9,10,20]). Nevertheless, it is a question remarkably little studied.
3.1. The channel idea

It was not until the voltage clamp studies on giant squid axons by Hodgkin and Huxley in the early 1950s (see Ref. [21]), followed up by Frankenhaeuser [22] for the more complex vertebrate nerve fiber, that more detailed ideas about ion channel mechanisms were developed. The voltage clamp studies showed that the cause of the AP was time- and potential-dependent Na+ and K+ currents. By pharmacological analysis (see Ref. [23]) it was shown that the current pathways were separate, suggesting that they depended on specific protein molecules in the membrane. Although the idea of water-filled pores in biological membranes goes back to the work of Brücke [24], it was not until the development of the patch clamp technique [25] that solid evidence for the view that ion currents pass through pores in specific protein molecules was presented (for a historical account see Ref. [23]).
3.2. Type of fluctuations

The development of the patch clamp technique is one of the more dramatic breakthroughs in experimental neurobiology in recent decades. It made it possible to directly record and analyze single-channel kinetics and thus single-molecule dynamics (Fig. 1). Not unexpectedly, the current records for single channels showed discrete all-or-nothing events, reflecting switching between open and closed states. The probability distributions of open and closed dwell times could well be described by kinetic models assuming random, memoryless, first-order state transitions (see Refs. [23,26]). Such models, a subgroup of models describing Markov processes, had
already successfully been used to describe macroscopic data in the pre-patch-clamp era. Most of them assume a relatively limited number of states with time-independent transition rates. However, other types of models, some of them non-Markovian, have also been suggested to explain the channel kinetics. A reason for such interpretations is the argument that recent results in protein physics are not compatible with the traditional kinetic models used. As mentioned, traditional channel models assume relatively few states separated by discrete and relatively high energy barriers, while protein data suggest a very large number of states with relatively low energy barriers. Such a nontraditional approach is still highly controversial, as is the question of what constitutes the physical basis of the randomness in the channel models (i.e. do we need a quantum mechanical interpretation?).

Fig. 1. The complex dynamics of single-channel activity. Current recording from hippocampal neuron. Arrows indicate sub-conductance openings (courtesy Dr. Staffan Johansson).

3.3. Molecular background: structure
Ion channels show great diversity and a widespread occurrence. A first classification distinguishes between channels regulated by membrane potential and channels regulated by ligands. They can be further subclassified according to different schemes, e.g. according to their ligands or to their selective permeability for specific ions. The channels relevant for the propagation of nerve impulses are voltage-gated channels of the Na, K and Ca channel superfamilies, including the Na and delayed rectifier K channels of the classical studies by Hodgkin and Huxley [27]. All the channels belonging to this class show a general building plan, a central water-filled pore surrounded by four symmetric domains or subunits. Early pharmacological studies
(see Refs. [23,28]) suggested that the pore consists of a wide external mouth, a narrow selectivity filter and a wide inner vestibule. Molecular biology studies [29,30] revealed that each subunit or domain consists of about 500 amino acid residues forming six transmembrane helical segments (S1-S6), where S4 is characterized by regularly located positively charged residues (arginine and lysine), early on conceived as the voltage sensor of the channel [29]. The loop between S5 and S6 forms a reentrant hairpin loop, tucked down into the channel (P-loop), forming the walls of the external part of the pore. The outer portion of this loop forms the wider vestibule and the inner portion the selectivity filter. The K channels are structurally simpler than Na and Ca channels, and consist of four separate subunits aggregated around the pore [31], while Na and Ca channels form four linked domains (I-IV), making the total channel protein consist of about 2000 amino acid residues [29].
3.4. Molecular background: gating

To understand the role of channels for cell dynamics it is essential to understand two basic features of the channels: the capacity to selectively allow specific ions to pass through the channel pore and the capacity to gate the pore by activation and inactivation processes. These features were treated as separate mechanisms by Hodgkin and Huxley in their classical investigation [27], and this separation has been a bedrock principle in later analyses of channel biophysics.
3.4.1. Activation

The gating processes in Na and K channels have since the classical Hodgkin-Huxley analysis been separated into activation and inactivation processes. The activation process was early on conjectured to be associated with a ratchet mechanism, where a charge movement precedes the opening of the channel (see Ref. [28]). Measurements of these gating charge movements were possible already in the beginning of the 1970s [32,33]. With later insights into the structural details, the charge movement was assumed to be associated with movements of the positively charged S4 segment [29]. Mutation and cysteine substitution experiments with concomitant application of thiol reagents on both Na and K channels confirmed this conclusion and suggested that the S4 segment at depolarization moves outward in a spiral rotation (the helical screw model [34]). The movement seems to proceed in several steps, the first occurring independently for each subunit or domain and the later ones in a concerted fashion (for a review see Ref. [35]), dependent on stabilizing negative countercharges. Another remarkable feature of the S4 movement in Na channels is the immobilization of the segment in the depolarized position when the membrane is repolarized. In recent fluorescence experiments it has been shown that this immobilization mainly concerns domains III and IV, where the S4 segments are locked by the inactivation process, while the corresponding segments in domains I and II are freely movable. The S4 movement is followed by a process that opens the pore by some conformational change. Early pharmacological experiments suggested that this gating mechanism is located at the internal side, allowing a wide inner vestibule to accommodate relatively large compounds [36]. The recent crystallographic picture of a
2TM K channel reveals that the walls of this inner vestibule are formed by the S6 segments [37]. Electron paramagnetic resonance (EPR) studies suggest that they are also directly involved in the gating [38]. How this is related to the movements of the S4 segments, which initiate the gating process in 6TM K channels, is unclear. The cytoplasmic loops between S4 and S5 seem to be of critical importance here, either as links in the process or as direct gates.
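As a quantitative aside (not part of the structural argument above), the steady-state voltage dependence of such a charge-carrying conformational change is commonly summarized by a Boltzmann function of membrane potential; the sketch below uses assumed, illustrative values for the effective gating charge and the midpoint voltage:

import math

def boltzmann_activation(V_mV, z=4.0, V_half_mV=-30.0, T=293.15):
    """Steady-state probability that a voltage sensor is in its activated
    position, for an effective gating charge z (in elementary charges) and a
    midpoint voltage V_half. All parameter values are illustrative assumptions."""
    k_B = 1.381e-23   # Boltzmann constant, J/K
    e0 = 1.602e-19    # elementary charge, C
    x = z * e0 * (V_mV - V_half_mV) * 1e-3 / (k_B * T)
    return 1.0 / (1.0 + math.exp(-x))

for V in range(-80, 41, 20):
    print(f"V = {V:4d} mV   P_activated = {boltzmann_activation(V):.3f}")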
3.4.2. Inactivation

Na, Ca and many (but not all) K channels show a second gating process (or rather a complex of gating processes), an inactivation process separate from the activation. The main component of this inactivation, the fast inactivation, seems to be of another, mechanistically simpler, kind than the activation process. Early studies of Na channels suggested that the channels open before they inactivate, that the inactivation process is relatively voltage independent, and that the inactivation depends on an internal particle that prevents closing of the activation gate (the ball and chain model by Armstrong and Bezanilla [39]). This suggestion has to a large extent been confirmed in later structural studies. The most detailed investigation has so far been performed on Shaker channels, inactivating K channels in the fruit fly, Drosophila melanogaster. In these channels, the first 20 amino acid residues of the N-terminal end of each subunit form a blocking particle and the next 63 residues form the chain [40]. The resulting fast inactivation has consequently been named N-inactivation. In accordance with this view, mutated channels with only one inactivation particle are fourfold slower [41]. The receptor for the inactivation particle seems to be located in the internal mouth of the pore, presumably critically dependent on residues of the loop between S4 and S5 and of the P-loop. For Na channels the inactivation process seems slightly different. Only one gating structure seems to be involved here, most likely a portion of the linker between domain III and domain IV. Of special interest is a sequence of three hydrophobic residues of this linker forming the bonds to a receptor, presumably formed by residues of the S6 segment. In addition to these fast inactivation processes, there are, as mentioned above, other, slower inactivation processes. One such inactivation process seems to involve a constriction of the external part of the pore rather than block of the internal portion of the pore. Mutation studies show that mainly residues of the C-terminal end are involved; consequently this form of inactivation is referred to as C-inactivation.
3.4.3. Physics of gating

What forces are involved in these gating processes? No doubt, electrostatic forces play an essential role in the rotating movement of the S4 segment at activation and inactivation. The S4 segment is moving in a landscape of charges. Some of these charges have been the subject of extensive investigations. By taking advantage of the fact that some metal ions seem to neutralize fixed charges on the external surface of the channel protein by screening [42], it has been possible to analyze the role of these charges in the gating. Recent investigations suggest that charges of the loop between S5 and the pore region are the determinants of the functional surface charge density
[43]. They further suggest that the segment comprises a helical structure, with a consequent dipole moment stabilizing the S4 segment [44]. However, few attempts to analyze the S4 movement from electrostatic first principles have been made. In a strict sense, such an approach is most likely impossible. As repeatedly pointed out, perhaps most insistently by Mayr [45], biological mechanisms are not fully explainable without including an evolutionary history, most likely not predictable from physical first principles. Nevertheless, with the increased insights into molecular details, analyzing channel gating in terms of physical principles seems a profitable enterprise. The same situation applies to the issue of the physical basis of stochastic gating. The dominating view is that thermal fluctuations play a fundamental role. However, alternative views assuming a greater role of deterministic atomic, electrostatic and hydrophobic forces as driving mechanisms have been proposed. Yet, as already pointed out, it has not been possible so far to determine whether this view provides a better description of the observed data than the traditional one. An analysis along the lines of Moss and Pei [46,47], i.e. an analysis of the occurrence of unstable periodic orbits, seems an interesting alternative to resolve this question.
3.4.4. Mathematics of gating

The origin of most mathematical treatments of channel kinetics is the set of equations developed by Hodgkin and Huxley [27] in their classical study of the squid giant axon. These form a system of ordinary differential equations describing the Na and K currents in terms of activation (m and n) and inactivation (h) parameters, determined by time-independent but voltage-dependent rate constants (see Appendix A). This system can readily be transformed into an equivalent system of differential equations in terms of channel states instead of activation and inactivation parameters, and consequently into a state scheme [48]. Thus, the Hodgkin-Huxley equations for the K channel are described by the following state diagram:

C1 ⇌ C2 ⇌ C3 ⇌ C4 ⇌ O

Scheme 1. Here C1-C4 denote closed states, O an open state, and the forward and backward rate constants are 4α, 3α, 2α, α and β, 2β, 3β, 4β, respectively. Such a scheme lends itself to a probabilistic interpretation. In this case the scheme describes a time-homogeneous Markov process, with the rate constants reflecting transition probabilities (they are not probabilities proper, since they can be larger than 1). Scheme 1 has, not unexpectedly, been shown too simplistic to explain detailed results from gating current and single-channel experiments. Consequently it has been replaced by other schemes, the most detailed perhaps presented for the Shaker K channel of D. melanogaster, comprising considerably more discrete states than Scheme 1 and cooperative state transitions [49].
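To make the correspondence concrete, one can integrate the master equation of Scheme 1 numerically and compare the resulting open probability with n(t)^4 from the n-gate formulation; the two agree, since the five-state scheme is just the expanded form of four identical, independent gates. The following minimal sketch uses fixed, arbitrary values of α and β instead of the voltage-dependent expressions of Appendix A:

import numpy as np

alpha, beta = 0.5, 0.2           # fixed rate constants (1/ms), illustrative values
# Transition-rate matrix Q for Scheme 1 (states C1, C2, C3, C4, O), with
# forward rates 4a, 3a, 2a, a and backward rates b, 2b, 3b, 4b; dp/dt = p Q.
Q = np.array([
    [-4*alpha,  4*alpha,            0,                  0,               0],
    [ beta,   -(beta + 3*alpha),    3*alpha,            0,               0],
    [ 0,        2*beta,           -(2*beta + 2*alpha),  2*alpha,         0],
    [ 0,        0,                  3*beta,           -(3*beta + alpha), alpha],
    [ 0,        0,                  0,                  4*beta,         -4*beta],
])

dt, T = 0.01, 20.0               # Euler time step and duration (ms)
p = np.array([1.0, 0, 0, 0, 0])  # all channels start in C1 (corresponding to n = 0)
n = 0.0                          # Hodgkin-Huxley activation variable

for _ in range(int(T / dt)):
    p = p + dt * p @ Q                            # master equation of Scheme 1
    n = n + dt * (alpha * (1 - n) - beta * n)     # Hodgkin-Huxley n equation

print("P(open) from Scheme 1 :", round(p[-1], 4))
print("n(t)^4 from HH        :", round(n**4, 4))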
The Na channel of the classical studies of Hodgkin and Huxley [27] is correspondingly described by the following scheme (see Ref. [23]):

C1 ⇌ C2 ⇌ C3 ⇌ O
⇅    ⇅    ⇅    ⇅
I1 ⇌ I2 ⇌ I3 ⇌ I4

Scheme 2. Here C1-C3, O, α and β have the same meanings as above (with forward activation rate constants 3α, 2α and α, and backward rate constants β, 2β and 3β along the upper row), and I1-I4 denote inactivated states; the vertical inactivation transitions are governed by their own forward and backward rate constants. Also here studies have shown that this scheme does not explain the more detailed results from whole-cell, gating and single-channel current experiments. Perhaps the most used Na channel model today is the following coupled scheme presented by Patlak [50]:

C1 ⇌ C2 ⇌ C3 ⇌ C4 ⇌ O
               ⇅    ⇅
               I1 ⇌ I2

Scheme 3. In recent years, several additions to the Patlak scheme have been suggested on the basis of new experimental data. For instance, the idea of several open states and of cooperative state transitions seems necessary to explain certain experimental findings and to make the model more structurally relevant. However, it should be noted that all models successfully used so far are extensions of the type just discussed, i.e. described by a limited number of states and by time-independent transition rates. Adopting a probabilistic interpretation, they become Markov models. Such models explain macroscopic features of currents from populations of channels as well as statistical mean values of single-channel behavior, such as mean open and closed dwell times [26,51]. They predict autocorrelation functions of current fluctuations as sums of exponentials and frequency spectra as sums of Lorentzian functions. The dwell-time distributions of open and closed times under stationary conditions are predicted to be sums of exponentials. However, to simulate stochastic properties of the channels, numerical stochastic simulations are required. Such simulations have been used, although up till now sparingly, to investigate for instance the variability in impulse firing [52] and fluctuations under nonstationary conditions [53]. The importance of such studies is likely to increase in the future.
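A stochastic (Gillespie-type) simulation of a scheme of this kind is straightforward. The sketch below simulates Scheme 1, the simplest of the schemes above, as a continuous-time Markov chain with arbitrary, fixed rate constants, and checks two of the predictions mentioned: the mean open dwell time (1/(4β) for Scheme 1) and the stationary open probability:

import random

random.seed(0)
alpha, beta = 0.5, 0.2      # illustrative rate constants (1/ms)

# Scheme 1 as a dictionary: state -> list of (target state, rate constant)
rates = {
    "C1": [("C2", 4 * alpha)],
    "C2": [("C1", 1 * beta), ("C3", 3 * alpha)],
    "C3": [("C2", 2 * beta), ("C4", 2 * alpha)],
    "C4": [("C3", 3 * beta), ("O",  1 * alpha)],
    "O":  [("C4", 4 * beta)],
}

state, t, t_end = "C1", 0.0, 50000.0      # a long run (ms) for decent statistics
time_open, open_dwells = 0.0, []

while t < t_end:
    targets = rates[state]
    total = sum(r for _, r in targets)
    dwell = random.expovariate(total)     # exponentially distributed waiting time
    if state == "O":
        time_open += dwell
        open_dwells.append(dwell)
    t += dwell
    # choose the destination state with probability proportional to its rate
    u, acc = random.uniform(0.0, total), 0.0
    for target, r in targets:
        acc += r
        if u <= acc:
            state = target
            break

n_inf = alpha / (alpha + beta)
print(f"mean open dwell time : {sum(open_dwells) / len(open_dwells):.2f} ms  (theory 1/(4*beta) = {1 / (4 * beta):.2f} ms)")
print(f"fraction of time open: {time_open / t:.3f}  (theory n_inf**4 = {n_inf**4:.3f})")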
By definition, the discussed Markov processes are probabilistic, described by strictly indeterministic or high-dimensional chaotic models. As mentioned above, and as pointed out by Liebovitch and Todorov [54], alternative low-dimensional chaotic models have also been suggested as explanations of certain single-channel behavior. However, so far it has not been possible to experimentally extract chaotic components from single-channel recordings. The attempts to do so raise a number of interesting questions, some with far-reaching principal consequences. The models suggested to date are physically uncommitted; the parameters have no defined physical meaning. The problems around physically (or ontologically) committed and uncommitted models touch basic problems within the theory of science and will be dealt with in the last section of this chapter.

3.4.6. Functional role of stochastic gating
What role does the stochasticity of channels play for cellular and other higher-level events? This question is remarkably little studied; the whole area is characterized by a conspicuous dearth of data. However, both experimental [4,5] and theoretical [52] investigations suggest that spike patterns are directly affected by the stochasticity of single-channel openings under certain conditions. In a few instances, specific neurons that function as random generators of impulses have been found; impulses in high-resistance olfactory receptor cells [4] and cultured hippocampal interneurons [5] have been shown to be caused by single-channel openings. Due to the central role of hippocampus in brain information processing, the results from this investigation will briefly be described in Section 4. In summary, the results suggest that single-channel openings can cause intrinsic spontaneous impulse generation in a subset of small hippocampal neurons under physiological conditions. These neurons will obviously function as cellular random generators. Thus, understanding the stochastic nature of the channel kinetics seems essential for understanding the activity at both cellular and network levels. What functional consequences will the findings of cellular random generators have for the higher-level activities of the brain? And what consequences will the type of stochastic process have? We will discuss these issues in more detail when dealing with mesoscale fluctuations in Section 4. We will also discuss the results of macroscale simulations of cortical network models in Section 5. In short, we could show that spontaneous activity of critically spaced neurons can induce global, synchronized activity parameter oscillations of the excitatory layer with a frequency in the gamma range (30-70 Hz) [55,56]. As the density of spontaneously active neurons or the activity level was increased, the oscillations tended to change into irregular patterns. The importance of these conclusions relates to the fact that synchronized oscillations have been suggested to play a central role in brain functions such as memory and consciousness. Specifically, oscillations in the gamma range have been implicated as essential for binding together neurons involved in a common perceptual task (the binding problem) [57,58].
4. Mesoscopic fluctuations: cellular processes and neural coding

Since the neuron traditionally is described as the functional unit of the nervous system, the AP may be regarded as the unit process. But while the origin of the
AP per se is relatively well known since the classical work of Hodgkin and Huxley in the early 1950s (see Ref. [21]), and while the molecular background to the underlying ion channel activity is now beginning to be revealed (see Ref. [23]), much less is known about how the APs are used by the nervous system to code information. It seems natural to assume that neural information is encoded as some ordered sequence of impulses. But is it the rate of impulses that is the main information carrier, or does the temporal pattern of impulses play the main role? These are the classical alternatives discussed. The traditional view when modeling information processing in the brain is to use rate coding. In recent years, however, both theoretical (how to convert intensity to temporal patterns [59]) and empirical (the variability of interspike intervals is larger than predicted by the rate-code interpretation [60]) studies have stressed the temporal-code possibility. Recently, Moss and Braun [61] demonstrated fairly complete evidence for temporal coding in a crustacean mechanoreceptor system. This demonstration included: (i) identification and definition of the signal in the noisy environment, (ii) statistically significant occurrence of the signal, (iii) the signal's connection to a stimulus, and (iv) a signal-induced, defined dynamical response. A signal-induced behavioral response has not been reported so far. There are also a number of recent alternative coding types, such as amplitude modulation [56] and spatial integration over a large number of neurons [62]. However, most likely the neural coding is not uniform; information is coded differently in different parts of the nervous system and in different situations. These issues will be discussed in more detail below. In particular, we will consider the case of amplitude coding. It is less evident that the stochastic features of the pulse trains play some functional role. We will here discuss this issue by taking up two aspects of the stochastic features: the fluctuating interval between APs and the fluctuating amplitude. While the distribution of spike intervals has been studied extensively for a number of neuron types, less is known about the distribution of spike amplitudes. Even less is known about what type of stochastic pattern the impulse trains belong to: noise or low-dimensional chaos? We will here first discuss interval fluctuations, and then amplitude fluctuations. We will also briefly discuss coding and the functional role of stochastic pulse patterns.
4.1. Temporal fluctuations: single-channel induced impulses

The mechanisms of irregular spontaneous neuronal activity (Fig. 2) are poorly understood (in contrast to regular spontaneous activity; see Ref. [63]). However, extensive studies of interval distributions in different neural systems have been performed, showing activity-dependent patterns [60,62,64]. Cortical neurons show different distributions at low, normal and high activities [62]. Recently, it was shown in simulation experiments that interspike-interval distributions of pyramidal cells are compatible neither with the spike patterns predicted by traditional integrate-and-fire neurons nor with those predicted by more realistic model neurons. As an explanation,
coincidence detectors and excessive feedback have been invoked [60,65]. One possible cause is the mechanism of single-channel induced impulses indicated above, observed in olfactory neurons and hippocampal neurons [4,5]. Another extensively studied mechanism is based on the probabilistic nature of synaptic activity [66]. Here we will focus on the single-channel induced impulses in hippocampal neurons, but we will also briefly treat the synapse-induced interval fluctuations.

Fig. 2. Spontaneous activity in small hippocampal interneurons, induced by single-channel openings. (a) Action potentials associated with plateau potentials, caused by single-channel openings. Whole-cell recordings in primary cultures. (b) Spontaneous action currents associated with single-channel openings. Cell-attached recordings in primary cultures. (c) Action currents and single-channel openings in "intact" neurons. Cell-attached recordings in hippocampal slice. Experimental details for (a) and (b) in Ref. [5].

Hippocampus is a cortical structure that, in spite of its relatively simple architecture, seems essential for the cognitive activity of the brain, e.g. memory formation. In an analysis of a subset of hippocampal neurons in the rat, neurons with a soma diameter of less than 10 µm, we found impulse activity caused by single-channel openings [5]. Many of these neurons are most likely inhibitory interneurons. The reason for focusing on small-sized neurons is that, in spite of their abundance in cortical structures, they are little investigated, for experimental reasons. Classical microelectrode measurements require relatively large cells, such as pyramidal cells, which can be more than 100 µm in diameter. However, with the introduction of the patch-clamp technique in the beginning of the 1980s it has become possible to study small cells as well with good precision [25,67].
We used this technique to analyze these neurons, isolated in primary culture as well as embedded in intact slices. The conclusion that single-channel events induce APs in small hippocampal neurons was based on experiments with whole-cell and cell-attached recordings. The whole-cell recordings revealed a clear correlation between the generation of impulses and plateau-potential events, the time course of which clearly suggests that they were caused by single-channel openings and closures (Fig. 3). The cell-attached recordings confirmed the findings, showing a good correlation between action currents (and consequently APs) and single-channel openings preceding the action currents [5]. The cell-attached recordings, in addition, showed exponential relaxation of single-channel currents, indicating that the membrane potential was significantly affected by the current flowing through single ion channels, as required by the hypothesis.
Fig. 3. Different types of spontaneous activity in neurons (bursting, pacing, irregular and silent). Whole-cell recordings from different cells in the hypothalamic preoptic nucleus of rat (courtesy Dr. Staffan Johansson).

The natural frequency of the spontaneous activity was relatively low
(less than 1 Hz), making time-series analysis difficult. Furthermore, since the type or types of channels involved have not yet been identified, the channel kinetics under more physiological conditions are consequently unknown. Assuming that the channels are ligand activated, the normal spontaneous impulse frequency under physiological conditions may be considerably higher than that of the isolated cells. This was one reason to investigate corresponding neurons in a slice preparation (manuscript in preparation). The frequency of spontaneous impulses was, not unexpectedly, considerably higher than in the cultured neurons. Using the cell-attached configuration to avoid effects of unphysiological internal solutions, the mean frequency was found to be 12 Hz (mean for 11 neurons). Preliminary data on the interspike-interval distribution suggest a skewed distribution. A more detailed analysis of the correlation between channel openings and spontaneous impulses suggests a causal relation. In conclusion, there is reasonably strong evidence that single-channel openings can cause intrinsic spontaneous impulse generation in a subset of small hippocampal neurons under physiological conditions. Under these conditions, understanding the stochastic nature of the channel kinetics is clearly essential for understanding the activity at the cellular level. These neurons will evidently function as cellular random generators. What type of stochastic process do these random generators demonstrate? Noise, which is the dominant view today [26], or low-dimensional chaos [54,68]? The functional role of these cellular random or pseudo-random generators is yet unknown. Even less is known about the relevance, if any, of the detailed stochastic nature of the channels for the cellular and the cortical network functions, or about the mechanism underlying the stochasticity. It has been argued, although not at all generally accepted, that quantum mechanical processes may play a role here (see e.g. Refs. [8,69,70]).
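Why a single channel can drive such a small cell to threshold follows from a simple order-of-magnitude estimate; the numbers below are assumed, illustrative values, not measurements from the study:

# Back-of-the-envelope estimate of the voltage deflection produced by one open channel
# in a small, high-input-resistance neuron. All numbers are assumed, illustrative values.
i_single = 2e-12        # single-channel current, 2 pA
R_input  = 5e9          # input resistance of a very small neuron, 5 GOhm
C_m      = 5e-12        # whole-cell capacitance, 5 pF

dV  = i_single * R_input          # steady-state depolarization (V)
tau = R_input * C_m               # membrane time constant (s)

print(f"steady-state depolarization: {dV * 1e3:.1f} mV")
print(f"membrane time constant:      {tau * 1e3:.1f} ms")

With gigaohm input resistances, a picoampere-scale single-channel current thus produces a depolarization of the order of 10 mV, comparable to the distance from rest to firing threshold.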
4.1.1. Functional role of cellular random generators

One obvious possibility is that the discussed random generators are used to generate spontaneous activity in the brain, with consequent phase transitions etc. Spontaneous brain activity seems essential for normal brain function. A special case has been made for the role of spontaneous activity in shaping synaptic plasticity during ontogeny (see Refs. [5,7]), and it has even been argued that spontaneous activity plays a role in conscious processes (see Refs. [8,9,71]). To investigate these questions for hippocampal circuits, we have simulated spontaneous activity in cortical network models. In a first step, we used a model based on a simplified description of the architecture of the CA1 area of hippocampus, comprising a layer of fast inhibitory feedforward interneurons (corresponding to basket cells), a layer of excitatory neurons (pyramidal cells) and a layer of feedback inhibitory interneurons [72]. The model is described in more detail in Appendix B, and simulation results are discussed below and in Section 5.3. The results showed that a very small number of spontaneously active neurons in the excitatory layer, critically located close to each other and at a critical activity level, may induce global, synchronized activity parameter oscillations with a frequency in the gamma range (30-70 Hz) [55]. As the number of spontaneously active neurons or the activity level is increased, the oscillations tend to change into more irregular patterns. When the spontaneously active neurons are spatially spread out, no oscillatory activity is induced. In another series of simulations we analyzed the effects of spontaneously active feedforward interneurons. This was prompted by the reported experimental finding that single inhibitory neurons can synchronize the activity of up to 1000 pyramidal cells [73]. We could show that a single spontaneously active cell in the inhibitory feedforward layer could induce periods of synchronous activity oscillations of the cells in the excitatory layer with a frequency in the gamma range, interrupted by periods of irregular activity. The relevance of these simulations for the present discussion about the role of cellular random generators is the following. They suggest that spontaneous inhibitory activity (in small interneurons), acting directly on either cells in the excitatory layer (pyramidal cells) or cells in the feedforward inhibitory layer (basket cells), may induce global oscillatory activity in hippocampus. The importance of this conclusion relates to the fact that synchronized oscillations have been suggested to play a central role in brain function as memory states. Specifically, oscillations in the gamma range have been implicated as essential for binding together neurons involved in a common perceptual task (the binding problem), and even as associated specifically with conscious states [57,58]. Furthermore, spontaneous activity has been shown to improve system performance by reducing recall time in associative memory tasks [74]. This will be further discussed in Section 5.

4.2. Temporal fluctuations: stochastic transmission at synapses
Another source of fluctuations in the temporal pattern of impulse sequences is the probabilistic nature of synaptic function. The information processing at a synapse may conceptually be broken down into three steps. The first step consists of transmitter substances affecting the postsynaptic membrane on the dendrites or the soma, giving rise to summed excitatory or inhibitory potential changes. The second step consists of the triggering of the neuron by the integrated input signals. The third step consists of the exocytosis of neurotransmitter substances at the presynaptic terminal. At all levels, irregular fluctuations play a role. The relative role of these steps in vivo is unclear. However, a number of suggestions are presented by recent studies of in vitro preparations. For instance, synaptic release of transmitter has been found to be very unreliable, with exocytosis/AP frequencies ranging from as low as 0.01 in sympathetic terminals to close to one in certain neocortical neurons [66,75].

4.2.1. Functional role of synaptic fluctuations

What could be the functional role of the probabilistic nature of the synaptic transmission, and what would be the mechanism? An information processing system which only transmits every 10th, or 100th, "word" seems rather inefficient. Is this apparent inefficiency due to construction difficulties? Or does it indeed have
a functional role? There are reasons to believe that it could have an adaptive value. Central neurons have been shown to contain different active sites with different probabilities, some sites with probabilities close to one [66]. The conclusion drawn is that the probabilistic activity is not due to construction deficiencies, but instead is of functional significance. A proposed role is the increased dynamic range implied by features such as pulse facilitation and depression, found in the neuromuscular junction and explained by probabilistic release of transmitter. The probabilistic features may also explain plasticity. A number of recent studies suggest that synaptic plasticity can be implemented as a change in release probability [75].
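Such statements are usually made quantitative with the binomial (quantal) description of release: N release sites, each releasing a vesicle with probability p per action potential. The sketch below, with assumed values of N and p, illustrates how strongly the mean output and the failure rate depend on p, which is why a change in release probability is an effective way to implement plasticity:

import random

random.seed(2)

def quanta_released(n_sites, p_release):
    """Number of vesicles released by one action potential under the binomial model."""
    return sum(1 for _ in range(n_sites) if random.random() < p_release)

for p in (0.01, 0.1, 0.5, 0.9):          # assumed release probabilities
    counts = [quanta_released(n_sites=10, p_release=p) for _ in range(5000)]
    mean = sum(counts) / len(counts)
    failures = counts.count(0) / len(counts)
    print(f"p = {p:4.2f}   mean quanta = {mean:4.2f}   failure rate = {failures:.2f}")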
4.3. Impulse-amplitude fluctuations

The all-or-nothing principle has been the central dogma of nervous conduction since the work of Gotch at the beginning of the twentieth century. This means that a nerve impulse shows constant amplitude for all suprathreshold stimulation under constant conditions. However, several recent studies suggest deviations from this principle [56,76]. Studies of, for instance, hippocampal interneurons, preoptic neurons of hypothalamus, and crab axons show spontaneous graded APs [77]. In fact, it may be argued that amplitude variability is a rather common feature in the brain [76], the layered structure of the cortex being one piece of evidence. Here we will focus on an analysis of graded spontaneous impulses in a subset of hippocampal neurons, small-sized interneurons (Fig. 4). The study was performed on both cultured cells and cells in intact tissue. The amplitude variation of spontaneous impulses was considerable, in whole-cell as well as in cell-attached recordings [78]. The mechanism of the amplitude variation in cultured cells was investigated with a stimulus protocol during whole-cell recording. The study showed that the AP amplitude systematically depended on stimulus magnitude: the amplitude increased with increased stimulus amplitude. A voltage-clamp analysis was performed to obtain quantitative details about underlying currents. This revealed voltage-gated Na channels and two types of voltage-gated K channels (A-type and delayed-rectifier channels). The voltage dependence and time dependence of these channels were described in terms of modified Frankenhaeuser-Huxley equations [79], an array of first-order differential equations, for use in computer simulation experiments [78] (see Appendix A). The study of corresponding cells embedded in intact slices was performed to investigate whether the impulse-amplitude variability occurred physiologically, and to exclude artifactual culture conditions as a cause. Results from the cell-attached configuration clearly showed amplitude fluctuations of the same magnitude as those recorded in cultured cells. The results thus support the view that amplitude-modulated impulse trains may be a mode of normal information transmission in hippocampus. Similar amplitude variations of spontaneous as well as of stimulus-elicited impulses have also been observed in the preoptic nuclei of hypothalamus [77], suggesting a more general role in the brain.
Fig. 4. Computed and recorded action potentials, showing graded and all-or-none responses. A - Computed graded responses for increasing stimulus amplitude. Model of hippocampal neuron. B - Experimentally recorded graded action potential from isolated hippocampal neuron. C - Computed all-or-none action potentials for increasing stimulus amplitude. Model of myelinated axon. From Ref. [78].
4.3.1. Mechanisms of amplitude regulation

The mechanism of the observed amplitude variability was investigated by computer simulations of the cellular membrane properties. These simulations show that graded impulses can be produced with the Frankenhaeuser-Huxley equations obtained from the voltage-clamp experiments [79]. They further show that in this model the deviation from the all-or-nothing principle critically depends on the density of Na channels [78]. There seems to exist a critical permeability window: a higher or lower number of Na channels was found to make the cell respond either with all-or-nothing impulses or with nonregenerative passive potential changes. The density of K channels contributed to the range of variability, but was not essential for the phenomenon. Neither was the large membrane time constant (mean value 33 ms). An interesting finding was that a critical increase (about 17 times) in the number of Na channels could transform the neuron model into a bistable memory device [80].
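The dependence of the response mode on Na channel density can be illustrated with a generic excitable-membrane model. The sketch below uses the standard squid-axon Hodgkin-Huxley equations as a stand-in for the Frankenhaeuser-Huxley-type hippocampal model of Ref. [78], so the numbers are not those of the study; the printed peak depolarizations show how the sharpness of the threshold and the response amplitude depend on the maximal Na conductance:

import math

def hh_peak_response(gNa_scale, I_stim):
    """Peak depolarization (mV above rest) of a standard squid-axon Hodgkin-Huxley
    membrane patch when the maximal Na conductance is scaled by gNa_scale and a
    1-ms current step of amplitude I_stim (uA/cm^2) is applied. Euler integration."""
    gNa, gK, gL = 120.0 * gNa_scale, 36.0, 0.3       # mS/cm^2
    ENa, EK, EL, Cm = 50.0, -77.0, -54.4, 1.0         # mV, uF/cm^2
    V, m, h, n = -65.0, 0.053, 0.596, 0.317           # resting state
    dt, peak = 0.01, V
    for step in range(int(30.0 / dt)):                # 30 ms of simulated time
        t = step * dt
        am = 0.1 * (V + 40.0) / (1.0 - math.exp(-(V + 40.0) / 10.0))
        bm = 4.0 * math.exp(-(V + 65.0) / 18.0)
        ah = 0.07 * math.exp(-(V + 65.0) / 20.0)
        bh = 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
        an = 0.01 * (V + 55.0) / (1.0 - math.exp(-(V + 55.0) / 10.0))
        bn = 0.125 * math.exp(-(V + 65.0) / 80.0)
        I = I_stim if t < 1.0 else 0.0
        INa = gNa * m**3 * h * (V - ENa)
        IK = gK * n**4 * (V - EK)
        IL = gL * (V - EL)
        V += dt * (I - INa - IK - IL) / Cm
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        n += dt * (an * (1 - n) - bn * n)
        peak = max(peak, V)
    return peak + 65.0

for scale in (1.0, 0.3, 0.1):                         # relative Na channel density
    peaks = [round(hh_peak_response(scale, I), 1) for I in (2, 5, 10, 20, 40)]
    print(f"gNa x {scale:3.1f}: peak depolarizations {peaks} mV")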
4.3.2. Functional role of graded impulses

The finding of impulse-amplitude variability suggests that amplitude modulation may be a mode of information transmission in the brain, perhaps complementing frequency modulation. Most discussions about neuronal information transmission have been based on the assumption that information is coded as frequency-modulated impulse trains, a consequence of the generally accepted all-or-nothing principle for neuronal signaling. Consequently, the present debate has mainly concerned the question of whether the neuronal information is rate or temporally coded [59,60]. Most experimental support for these discussions has, however, been obtained from large cortical neurons, mainly pyramidal cells. Theoretically, the addition of amplitude-modulated impulse transmission to the basic frequency-modulated transmission would increase the neuronal information transmission considerably. In a recent study of photoreceptors, transmission with graded (i.e. amplitude-modulated) impulses in insect eyes was found to carry fivefold more information than transmission with exclusively all-or-nothing impulses [81,82]. Thus, the amplitude variation experimentally observed in small hippocampal neurons may be of considerable interest for the study of information processing in the brain. A main question is whether the amplitude variability is input related, i.e. extrinsic, or whether it mainly depends on the internal state, i.e. is an intrinsic property. To show that such amplitude-modulated impulses have a functional role in hippocampus it is necessary to demonstrate that the modulation of the AP amplitude in the soma is reflected in the output from the neuron. In principle, this would occur if graded impulses could propagate along the axon, or if the output was located at the soma or at the dendrites. An experimental demonstration requires simultaneous recordings from synaptically connected neurons, which has not yet been done. In principle, graded axonal conduction does not seem unreasonable. Decrement-free conduction of graded potentials has been demonstrated in axons of insect neurons [83]. Furthermore, most small hippocampal interneurons have short axons, shorter than their probable length constants, suggesting that passive electrotonic conduction will suffice for impulse transmission. It has been argued that the layered cortical structure in itself suggests that nonregenerative, and consequently graded, impulses may play an important role in cortical information processing [76]. The information-processing capacity of larger cortical networks comprising such impulse transmission is still largely unexplored.
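The information-theoretic intuition behind such estimates can be stated in one line: if, in addition to its timing, each impulse can take one of K reliably distinguishable amplitude levels, it can carry up to log2 K extra bits. A toy calculation with assumed numbers (not data from the cited studies):

import math

rate = 40.0               # impulses per second (assumed)
bits_per_interval = 2.0   # bits carried by the timing of each impulse (assumed)

for K in (1, 2, 4, 8):    # number of reliably distinguishable amplitude levels
    extra = math.log2(K)
    total = rate * (bits_per_interval + extra)
    print(f"{K} amplitude levels: +{extra:.1f} bits/impulse, "
          f"{total:.0f} bits/s instead of {rate * bits_per_interval:.0f} bits/s")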
5. Macroscopic fluctuations: networks and functional efficiency

The macroscopic activity of the brain can be studied with experimental techniques such as electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), or functional magnetic resonance imaging (fMRI). These methods reveal a very complex neurodynamics, which seems to be more or less correlated with mental processes. Typically, there are oscillations within certain dominant frequency bands, often interspersed with aperiodic, chaotic-like or noisy behavior. There are also spatio-temporal activity patterns that change rapidly and appear over extended areas of the brain (see e.g. Refs. [3,84]).
What is the origin and significance of this complex activity? How is this activity related to that of the lower levels of organization? In particular, what are the effects of the lower-level fluctuations discussed in Sections 3 and 4 on this higher, network level? Could such fluctuations just be regarded as background noise, without any significance, or could the fluctuations sometimes be amplified and significantly affect the network dynamics? How much of the complex dynamics is due to the network circuitry, and how much is due to the activity of its constituent parts? For example, is the oscillatory activity seen with EEG due to pacemaker neurons, or to the interplay of excitatory and inhibitory neurons in feedback loops? How can the complex neurodynamics be regulated and controlled? Computational methods may aid in approaching some of these questions. In this section, we will discuss the network dynamics, primarily from a computational point of view, and we present simulation results that may give a clue to the origin and role of this dynamics, in particular its irregular part.
5.1. Neurodynamics

If the nervous system has been optimized during evolution to deal with complex and rapid environmental changes at time scales shorter than, or comparable to, the life span of the individual, it should be reflected by a correspondingly rich and complex internal dynamics, operating at approximately the same time scale. Such dynamics would presumably underlie an increasingly efficient way of interacting with the world, a trait which presumably has been crucial throughout evolution. (For a more extensive discussion on these issues, see Ref. [85].) Below, we will briefly discuss the kind of neurodynamics that could account for an efficient information processing of an organism, which increases its chance of survival, both at a very basic level and at the level of higher cognitive functions. The rich dynamics of the brain can be well exemplified by the olfactory system (primarily bulb and cortex), which has been extensively studied by e.g. Freeman and co-workers [3,86-89]. This system processes odor information, determining the quality and quantity of odor objects in a fluctuating environment. An essential feature in its dynamics is spatio-temporal patterns of activity, which do not seem to depend critically on the detailed functioning of individual neurons. Self-organization of patterns appears at the collective level of a very large number of neurons, and oscillations occur at various frequencies, in particular around 5 Hz (theta rhythm) and 40 Hz (gamma rhythm). There are also waves of activity moving across the surface of the olfactory cortex. EEG studies of bulb and cortex also show evidence of chaos, or at least aperiodic behavior different from noise and with some degree of order (see Fig. 5, upper graph). Similar dynamics are also displayed by the hippocampus, the structure that more than any other is associated with learning and memory. The possible existence of chaos in various brain structures, as revealed by EEG and other methods, is discussed, for example, in Refs. [84,90,91]. The causal origin of this dynamics, and what it might mean to the system, is still uncertain but can be investigated with computational methods, as described below. However, regardless of whether the network dynamics is a result of underlying cellular or
circuitry dynamics, and regardless of whether it is useful to the system or not, it should be important to modulate this dynamics by means of regulatory or control mechanisms of some kind. Many factors influence the dynamical state of brain structures, for example the excitability of neurons and the synaptic strengths in the connections between them. A number of chemical agents, such as neuropeptides, acetylcholine (ACh) and serotonin (5-HT), can change the excitability of a large number of neurons simultaneously, or the synaptic transmission between them (see e.g. Ref. [92]). Such changes normally also result in changes in network dynamics. Other means of regulating the network dynamics include various feedback mechanisms [93] and fast, nonsynaptic effects (gap junctions and/or electromagnetic fields) that could cause synchronization over large cortical areas [94,95].

Fig. 5. Real (top) and simulated (bottom) EEG, showing the complex dynamics of cortical structures. The upper trace is from rat olfactory cortex (data courtesy of Leslie Kay), whereas the bottom trace is from a simulation with the current model of the olfactory cortex. The x-axis shows milliseconds, and the y-axis is in microvolts. From Ref. [74].
5.2. Computational approaches

Computational methods have long been used in neuroscience, most successfully for the description of APs, by the work of Hodgkin and Huxley in the early 1950s [27]. Also when approaching the problem of interactions between different neural levels, perhaps in particular for fluctuations at the network level, computational
models can prove useful, and sometimes be the sole method of investigation. The main problem for the modeler is to find an appropriate level of description, or level of detail. An appropriate level is one that is sufficient to capture any particular feature, process or function of the real system, but that does not include more details than necessary for the problem under investigation. In short, the model used should be "as simple as possible, but no simpler". A great number of neural network models have been developed with the aim of capturing some feature(s) of biological neural networks. Such attempts include the so-called multilayer perceptron, the neo-cognitron, and self-organizing feature maps. Many of these models have also been used, more or less successfully, for some kind of pattern recognition. We will not further discuss this type of network models, but refer to any textbook on artificial neural networks (see for example Ref. [96]). A good account of different computational approaches in neuroscience and brain theory is given in Refs. [97,98]. Today, different models cover a range of functions and systems, from early sensory processing and spinal motor control to perception and associative memory, as well as many intermediate processing stages. Many of these models are based on recurrent, attractor neural networks, the most well known of which is the Hopfield net [99,100]. The function of such an associative, or content-addressable, memory is based on the ability to retrieve a pattern stored in memory in response to the presentation of an incomplete or noisy version of that pattern. The Hopfield net is based on an analogy with the spin-glass model in statistical mechanics, with a large number of identical two-state elements, corresponding to electronic spin up and spin down of atoms in a magnetic crystal. In the original Hopfield net, the basic processing unit used is the formal neuron of McCulloch and Pitts [101]. A Hamiltonian, or energy function, for the system gives a multidimensional "energy landscape", which determines the system dynamics. In the simplest case, this landscape is more or less fixed, with "valleys" and "ridges" that are statically determined by the network connections, and where the valleys correspond to fixed-point attractor memory states. As mentioned above, the dynamics of a biological neural network is not based on point attractor dynamics, but is rich and complex, and seems more associated with limit cycle and chaotic attractor dynamics. Perhaps a more realistic picture than that of a static landscape would be that of a roaring sea, which is constantly changing, and where the memory states would rather correspond to the rolling waves. Alternative models with a more realistic brain dynamics have been developed and investigated, for example for describing the olfactory system, as will be described below (see Ref. [84]). In contrast to the study of neural systems at a microscopic or mesoscopic level, the study of the "macroscopic" dynamics of a biological neural network may not require very detailed network elements. Instead, more importance could be given to the network structure. For example, in many cases it may be sufficient to model the network elements as single-compartment units, with a continuous input-output relation, where the output corresponds to the mean firing frequency of a large population of neurons. However, in some cases, such as when the temporal
relation in the neuronal firing is considered important, spiking network elements, perhaps also with several compartments, would be needed. A good overview of neural computation at the single neuron level is given in Ref. [102].
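As a concrete illustration of the fixed-point attractor retrieval described above for the Hopfield net (a generic textbook sketch, not one of the biologically motivated models used in this chapter), the following example stores three random patterns with the Hebbian outer-product rule and recovers one of them from a corrupted cue by repeated asynchronous updates:

import numpy as np

rng = np.random.default_rng(3)
N, n_patterns = 100, 3

# Store random +/-1 patterns with the Hebbian outer-product rule.
patterns = rng.choice([-1, 1], size=(n_patterns, N))
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0.0)

# Cue: the first stored pattern with 20% of its elements flipped.
cue = patterns[0].copy()
flip = rng.choice(N, size=20, replace=False)
cue[flip] *= -1

# Asynchronous updates descend the energy function until a fixed point is reached.
state = cue.copy()
for _ in range(10):
    for i in rng.permutation(N):
        state[i] = 1 if W[i] @ state >= 0 else -1

print("overlap of cue with stored pattern    :", patterns[0] @ cue / N)
print("overlap after retrieval (1.0 = exact) :", patterns[0] @ state / N)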
5.2.1. Simulation results: a model example

We have used a cortical neural network model [55,72], resembling the three-layered structure of the olfactory cortex and the hippocampus, for the study of fluctuations at different neuronal levels. The model has one layer of excitatory units, corresponding to pyramidal cells, and two layers of inhibitory units, corresponding to two different kinds of interneurons, slow feedforward and fast feedback, respectively. Network units are mostly modeled with a non-spiking, continuous input-output relation, but many of the model properties and simulation results discussed below have also been reproduced with spiking network units of integrate-and-fire type. The basic model is described in more detail in Appendix B. Simulations with our three-layered cortical model display a range of dynamics found in the olfactory cortex and the hippocampus, in particular oscillations at different frequencies and spatio-temporal activity waves, but also more complex behavior that can be controlled by various means. Here, we will primarily focus on simulations of noisy and nonregular, chaotic behavior, but we will start by giving a brief description of the oscillatory dynamics of the model. The model accurately reproduces response patterns associated with a continuous random input signal and with a shock pulse applied to the cortex [72]. In the latter case, waves of activity move across the model cortex, consistent with corresponding global dynamic behavior of the functioning cortex. A strong pulse gives a biphasic response with a single fast wave moving across the surface, whereas a weak pulse results in an oscillatory response, showing up as a series of waves with diminishing amplitude. For a constant random input, the network is able to oscillate with two separate frequencies simultaneously, around 5 Hz (theta rhythm) and around 40 Hz (gamma rhythm), purely as a result of its intrinsic network properties. Under certain conditions, the system can also display chaotic-like behavior, similar to that seen in EEG traces [3,90] (see Fig. 5, and discussion below). In associative memory tasks, the network initially displays a chaotic-like dynamics, which can converge to a near limit cycle attractor, representing a stored memory (of an activity pattern). Such a case is shown in Fig. 6a. In Fig. 6b two attractors are shown, corresponding to two different memory states. All of these phenomena depend critically upon the network structure, in particular on the feedforward and feedback inhibitory loops and the long-range excitatory connections, modeled with distance-dependent time delays. Details concerning neuron structure or spiking activity do not seem necessary for this type of dynamic behavior. Instead, a balance between inhibition and excitation, in terms of connection strength and timing of events, is necessary for coherent frequency and phase of the oscillating neural units. (There are no inhibitory-inhibitory connections in this model, since there is no clear evidence for their existence in the real system. Yet, in a test simulation such disinhibitory effects resulted in decreased frequencies in the network oscillations. The oscillations ceased completely for large connection strengths between feedback inhibitory units.)
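The origin of the gamma-range oscillation in such a circuit can be caricatured by a single excitatory unit coupled to a fast feedback inhibitory unit. The sketch below (a linearized toy with assumed parameter values, not the three-layer model of Appendix B) shows that the pair by itself is a damped oscillator at roughly 40 Hz, and that weak noise injected into the excitatory unit is enough to sustain irregular oscillatory activity:

import numpy as np

rng = np.random.default_rng(4)

# One excitatory unit (E) reciprocally coupled to one fast feedback inhibitory unit (I),
# linearized around a working point and driven by weak noise in the excitatory unit.
tau = 10.0        # membrane time constant of both units (ms), assumed
w_ei = 5.0        # strength of inhibition of E by I, assumed
w_ie = 1.25       # strength of excitation of I by E, assumed
sigma = 0.5       # noise amplitude on the excitatory unit, assumed

A = np.array([[-1.0 / tau, -w_ei / tau],
              [w_ie / tau, -1.0 / tau]])
eig = np.linalg.eigvals(A)[0]
print(f"eigenvalues: {eig.real:.2f} +/- {abs(eig.imag):.2f}i per ms "
      f"-> damped oscillation at {abs(eig.imag) / (2 * np.pi) * 1000:.1f} Hz")

dt, T = 0.1, 2000.0                     # time step and duration (ms)
steps = int(T / dt)
x = np.zeros(2)
trace = np.empty(steps)
for k in range(steps):
    noise = np.array([sigma * np.sqrt(dt) * rng.standard_normal(), 0.0])
    x = x + dt * A @ x + noise
    trace[k] = x[0]

freqs = np.fft.rfftfreq(steps, d=dt * 1e-3)          # Hz
power = np.abs(np.fft.rfft(trace - trace.mean()))**2
band = freqs > 5.0
print(f"spectral peak of the noise-driven activity: {freqs[band][np.argmax(power[band])]:.1f} Hz")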
Fig. 6. Network attractor dynamics, where a near limit cycle attractor corresponds to a memory state. (a) Transient chaos obtained for an intermediate neuronal excitability, with Q = 14.0 (see Eq. (B.2)). The activity of three excitatory units is plotted against each other for 1000 simulated milliseconds. After the initial chaotic phase (dotted line), the activity converges to a near limit cycle state. (Noise is added to the system.) (b) Noise-induced state transition, from one (near) limit cycle attractor to another. The "thinner" limit cycle is obtained when the noise level is 0.0. When the noise level is increased to 0.03 the system switches to another ("thicker") limit cycle state. The activity of two excitatory units is plotted against time for 1000 ms, with the initial (transient) 100 ms removed. From Ref. [74].
With a proper choice of parameter values the network can be viewed as two sets of coupled oscillators, each characterized by an intrinsic frequency. One set consists of the excitatory units connected to the "fast feedback" inhibitory units, producing high-frequency oscillations. The other set is made up of the excitatory units connected to the "slow feedforward" inhibitory units, producing low-frequency oscillations. If the time constants (and delays) in the "two sets of coupled oscillators" differ by a factor of five or more, the system can oscillate with two different frequencies simultaneously, corresponding to the experimentally determined rhythms of the real cortex. If the significant time constants instead are very close, the system locks into a single-frequency oscillation. However, for values in between, the two "natural frequencies" of the system interfere with each other and can give rise to aperiodic, or chaotic, behavior. These results are in no way surprising; they are well in accordance with findings from electrical oscillatory circuits [103]. Oscillations and chaos are known to appear in systems with feedback loops and nonlinearities under certain conditions, especially when time delays are included. What we have shown is that the circuitry of the brain can also have these characteristics and that the complex dynamic behavior described in the literature does not require any detailed knowledge at the single-cell level. Time delays given by the geometrical structure of the present model are necessary for the spatio-temporal patterns (single or consecutive waves of activity across the cortical surface), but oscillations and chaotic activity can arise locally even without them. Randomness is introduced in artificial neural networks in various ways: in the connectivity (structure), in the initial states, or in the differential equations governing the dynamics of the system. Structural randomness could correspond to genetic, developmental, or learning differences in the nervous system of individuals, whereas "activity fluctuations" would correspond to the neuronal and synaptic noise in any functional neural system. Although noise could be used to stabilize a system, fluctuations can also result in state transitions [74,104]. An increased noise level in all network units can result in a transition from a stationary to an oscillatory state, or from an oscillatory to a chaotic state, or, alternatively, in a shift between two different oscillatory states (see Fig. 6b). Even if only a few network units are noisy (have a high intrinsic random activity), and the rest are quiescent, coherent oscillatory activity can be induced in the whole network under certain circumstances (see also Refs. [55,56,105]).
Fig. 7. Synchronous oscillatory activity is induced by five noisy units in a network with 32 by 32 network units in each of the three layers of the hippocampal model, corresponding to a 10 by 10 mm square of the real cortex. The noise is turned on at t = 100 ms. (a) The onset of the network oscillations appears at approx. t = 500 ms when the noisy units are densely packed. (b) Simulation results when the five noisy units are spread out (same parameter values as in (a)). Here, the onset of network oscillations appears at approx. t = 800 ms. From Ref. [105].
(Fig. 7: snapshots of network activity between 200 and 1000 ms for cases (a) and (b); see the caption above.)
The onset of global oscillatory activity depends on, for example, the connectivity, the noise level, the number of noisy units, and the duration of the noise activity. (In some of our simulations, even a single network unit with Gaussian noise activity is sufficient to induce global oscillations.) In Fig. 7 we demonstrate the phenomenon with five noisy units. The location and spatial distribution of these units in the network are important for the onset of oscillations. If the noisy units are separated beyond a certain distance, or if the noise level is too low, no oscillations occur. Likewise, no transition to global oscillations occurs if the noise "frequency" is too low (that is, if the number of random "peaks" within a certain time interval is too low). In Fig. 8, we show that global network activity can also be induced if the five noisy units are spontaneously active for only 200 ms (the noise is turned on at t = 100 ms and turned off at t = 300 ms). In this case, global chaotic-like activity begins after the noisy activity is turned off. This activity eventually converges to global oscillations, similar to those shown in Fig. 7. If, instead, the noisy activity of the five units lasts for 400 ms or more, global oscillations are immediately induced.
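As a concrete illustration of how such noise-driven effects can be simulated, the following minimal Python sketch integrates a single excitatory-inhibitory rate pair with additive Gaussian noise that is switched on at t = 100 ms. It is not the three-layer model of Appendix B; the gain function, weights, time constants and noise level are illustrative assumptions, chosen only to show how the noise term enters the integration and how its effect on the activity can be quantified.

```python
import numpy as np

# Minimal excitatory-inhibitory rate pair driven by additive Gaussian noise.
# All parameter values are illustrative assumptions, not taken from Appendix B.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(noise_level, t_noise_on=100.0, t_max=1000.0, dt=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = int(t_max / dt)
    ue, ui = 0.1, 0.1                      # excitatory / inhibitory activity
    tau_e, tau_i = 10.0, 20.0              # time constants in ms, slower inhibition
    w_ee, w_ei, w_ie, w_ii = 12.0, 10.0, 10.0, 1.0
    drive = 0.5                            # weak constant external input
    trace = np.empty(n)
    for k in range(n):
        t = k * dt
        A = noise_level if t >= t_noise_on else 0.0
        xi = rng.normal(0.0, 1.0) * np.sqrt(2.0 * A * dt)   # Euler-Maruyama increment
        due = (-ue + sigmoid(w_ee * ue - w_ei * ui + drive)) / tau_e
        dui = (-ui + sigmoid(w_ie * ue - w_ii * ui)) / tau_i
        ue += dt * due + xi                # noise added only to the excitatory unit
        ui += dt * dui
        trace[k] = ue
    return trace

quiet = simulate(noise_level=0.0)
noisy = simulate(noise_level=0.03)
print("std of excitatory activity, noise off :", quiet[5000:].std())
print("std of excitatory activity, noise 0.03:", noisy[5000:].std())
```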
Fig. 8. Global activity can be induced in the network, even when the noisy activity of the five excitatory units of Fig. 7a only lasts for 200 ms. In this case, there is a long period of spatio-temporal chaos-like behavior before convergence to synchronized global oscillations. The time evolution is given for three excitatory network units for 1000 ms. The middle trace shows the activity of one of the five noisy units, where the noise is turned on at t = 100 ms and turned off at t = 300 ms; the two other traces are from initially "silent" units. Other parameters as in Fig. 7a.
Global oscillations can further be induced if inhibition of excitatory activity is reduced during short periods. Such disinhibition may be caused by the spontaneous activity of small inhibitory interneurons in the hippocampus. In these cases, an oscillatory behavior can be induced in the network for periods much longer than the "triggering" disinhibition. The oscillations die out as a result of neuronal adaptation, or as an effect of the noise being too strong at times, interfering too much with the oscillatory activity. In addition to the direct effects on the dynamics, neuronal noise can reduce recall time (convergence time) if noise amplitudes within certain ranges are applied during learning and recall. Consonant with stochastic resonance theory [106-108], we obtain an optimal value of the noise amplitude for which the recall time reaches a minimum [74,109]. Stated differently, the rate of information processing, which in this case is the rate of convergence to a (near) limit cycle memory state, can be maximized for optimal noise levels. This value seems to be independent of the number of stored patterns in the network. In Fig. 9 we show that a maximum in convergence ("information processing") rate is reached for a noise level that is optimal for a particular set of parameter values. To summarize, our three-layered cortical model is able to reproduce many of the characteristics of the real system dynamics. In particular, it can display oscillations in two different frequency domains, as well as chaotic-like behavior.
(Fig. 9: rate of convergence plotted against the noise level [A], 0.02-0.12; see the caption below.)
Fig. 9. The graph shows the rate of convergence to a stored limit cycle memory state, when a distorted version of the pattern is presented to the network, plotted for various noise levels. A maximum rate is obtained for an optimal noise level. Details are given in Ref. [55].
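The protocol behind Fig. 9 can be mimicked in a much simpler toy system. The sketch below is an illustration under assumed parameters, not the cortical network of Ref. [55]: a particle is started in the shallower well of a tilted double-well potential (a "distorted" input) and the time it takes to reach and stay near the deeper minimum (the "stored" state) is averaged over trials for several noise amplitudes. Without noise the particle never escapes the wrong well, while very strong noise prevents it from settling, so one typically finds the shortest mean "recall time" at an intermediate noise level; the potential, the tolerance and the dwell criterion are arbitrary choices.

```python
import numpy as np

# Toy 'recall' experiment: a particle in a tilted double well must reach and
# stay near the deeper minimum (the stored pattern) starting from the
# shallower one (a distorted input).  Every number here is an illustrative
# assumption, not a value from the cortical network model.

def recall_time(D, x0=0.95, x_target=-1.05, tol=0.2, dwell=1.0,
                dt=0.01, t_max=200.0, seed=None):
    rng = np.random.default_rng(seed)
    x = x0
    inside_since = None
    t = 0.0
    for k in range(int(t_max / dt)):
        # tilted double-well force: -V'(x) with V(x) = x^4/4 - x^2/2 + 0.1*x
        force = -(x**3 - x + 0.1)
        x += force * dt + np.sqrt(2.0 * D * dt) * rng.normal()
        t = (k + 1) * dt
        if abs(x - x_target) < tol:
            if inside_since is None:
                inside_since = t
            elif t - inside_since >= dwell:
                return inside_since          # converged: entered and stayed
        else:
            inside_since = None
    return t_max                             # did not converge within t_max

for D in [0.0, 0.01, 0.03, 0.1, 0.3]:
    times = [recall_time(D, seed=s) for s in range(50)]
    print(f"D = {D:5.2f}   mean recall time = {np.mean(times):7.2f}")
```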
Also spatio-temporal patterns associated with memory storage in the olfactory cortex have been demonstrated, and neuromodulatory and stochastic control of this dynamics has been shown to enhance memory performance [74,92]. For example, simulation results show that oscillations during the recall phase can be used for a fast convergence to a memory state. This would be especially useful for patterns that are difficult to distinguish, and where a longer "search process" is necessary. Oscillatory "resonance" could thus quickly recruit all network nodes belonging to any particular memory pattern. Simulations with nonsynaptic (gap junction) effects also show an increased synchronization of network units, resulting in a more accurate recall of stored memory patterns [110].
5.3. Functional significance

What is the significance of the complex cortical neurodynamics described and simulated above? How is it related to computation and information processing in the brain, and in particular, to cognitive functions? As discussed above, the complex dynamics of the brain should reflect and adapt to the dynamics of the environment, but it could have many explanations. The simplest would be to assume that it is only an epiphenomenon, a by-product of the network circuitry or neuronal properties, without any specific biological role. This possibility cannot be ruled out, but there are many indications that it may have a functional role in neural information processing. It is reasonable to believe that the complex dynamics of the brain, exemplified above by the olfactory cortex, is due to an evolutionarily optimized strategy to deal with rapid changes in the environment. Biological systems have presumably evolved to become efficient with respect to time, energy or accuracy. A highly advanced system would be able to shift strategy for different situations, within the limits set by evolution, and the complex dynamics could be an appropriate way to deal with this. This will be discussed briefly below.
5.3.1. Functional role of network oscillations

The simplest and perhaps most direct role of the cortical oscillations could be to enhance weak signals and speed up information processing. However, more elaborate roles have also been attributed to cortical oscillations. One possible role for the higher, gamma frequency (around 40 Hz) oscillations in the olfactory system could be as a means to compare an incoming stimulus with previously stored patterns (see e.g. Ref. [111]). The slow theta rhythm (around 5 Hz) could serve to hierarchically trace down the olfactory information in a feedback loop between the olfactory bulb and cortex [112]. Yet another possibility is that oscillatory bursts at this low frequency, approximately 5 Hz, would allow the system to relax to an attractor (limit cycle) within a 200 ms cycle, without disturbance of successive sensory information during the interburst intervals. Cortical oscillations may also have a more direct role in cognitive functions, including segmentation of sensory input, learning, perception, and attention [3,57,58,113,114]. For example, it has been shown that theta rhythm oscillations are
optimal for induction of long-term potentiation (LTP) [115], a type of synaptic modification believed to be related to learning. The finding of strongly correlated stimulus-evoked oscillations at approximately 40 Hz in the cat visual cortex [113,114,116] has led to a theory for visual awareness (as part of consciousness) that could solve the so-called binding problem, as previously mentioned [57,58]. The binding problem relates to the fact that an object is perceived as a whole, in spite of its different aspects being represented by different sets of neurons. The idea is that separate cell assemblies, responding to the different aspects of an object, could be "labeled" by frequency and/or phase to form the perception of one single object.
5.3.2. Functional role of network chaos

The observed highly aperiodic and irregular behavior in the olfactory cortex, and elsewhere in the brain, could perhaps be classified as "chaotic", if that term is not used in too narrow a way. Adding the activity of many neurons, oscillating at different frequencies and phases, could easily yield the collective chaotic-like behavior apparent in the EEG. However, it may not be important whether the dynamics found is "truly chaotic" in a mathematical sense or not. The importance could rather be to have a disordered aperiodic state, with a great deal of "uncertainty", which readily switches over to an ordered oscillatory or other state. Such a chaotic-like state would give the system flexibility and rapid response to a (small) change in the input. In this sense, "chaotic" states would probably be most useful in biological systems if they were transient, i.e. if they did not persist for any longer periods of time. One of the characteristics of chaotic dynamics is that the solutions tend to diverge asymptotically when starting from initial values that are arbitrarily close. Computation in neural systems would seem to demand convergence. For instance, for associative memory, if the input is somewhat corrupted or noisy, the system should still converge to the same solution, a memory state somehow stored in the network connections. Further, most computations in neural systems should be done on a time scale on the order of 100 ms. If one has to wait for a second or longer to determine any characteristics of "true chaos", it is probably of no biological significance. Yet, there is some evidence that a chaotic-like dynamics can improve the system characteristics, for example by providing an initial state that is very sensitive to the (sensory) input. It has, for example, been proposed [3,90] that an animal presented with a novel sensory (odor) input would initially display a (pseudo-)chaotic dynamics in its sensory systems, subsequently converging to a limit cycle dynamics, and resulting in an ordered state of behavior. Computer simulations with our model of the olfactory cortex seem to support this view, in that the system can rapidly converge to a (near) limit cycle memory state from an initial chaotic-like state [109,117]. It also seems plausible that the chaos-like behavior of a neural system, such as the olfactory system or the hippocampus, could yield flexibility to the information processing, by allowing the system to rapidly shift from one complex activity pattern to another in response to a small external or internal input. It should be important to avoid getting stuck in any stable limit cycle (or other) attractor state, and a chaotic
dynamics could provide the necessary aperiodicity. At a higher level, it could be responsible for the brain's capacity to generate novel activity patterns, corresponding to its internal self-generated ("creative") thought processes [3]. Several other roles for chaos in neural systems have been suggested (see for example Refs. [91,118]).
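The sensitive dependence on initial conditions invoked in this subsection is easy to demonstrate numerically. The short sketch below iterates the logistic map, a standard textbook chaotic system (not the cortical model discussed here), from two initial conditions differing by 10^-10 and prints how quickly the separation grows before it saturates at the size of the attractor.

```python
# Divergence of two nearby trajectories of the logistic map x -> r*x*(1-x),
# a standard chaotic toy system, not the cortical model of Appendix B.
r = 4.0
x, y = 0.4, 0.4 + 1e-10          # initial separation of 1e-10
for n in range(60):
    if n % 10 == 0:
        print(f"n = {n:2d}   |x - y| = {abs(x - y):.3e}")
    x = r * x * (1.0 - x)
    y = r * y * (1.0 - y)
```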
5.3.3. Functional role of network noise

As discussed in previous sections, noise appears at the subcellular (microscopic) and cellular (mesoscopic) levels, but it is uncertain to what degree this noise affects the network (macroscopic) level. For example, it is not clear how "pure" noise, which does not originate at a lower level, could be generated by network activity at a macroscopic level. However, irregular (high-dimensional) chaotic-like behavior, perhaps indistinguishable from noise, could be generated by the interplay of neural excitatory and inhibitory activity at the network level. This activity could of course also contain noise, but noise generated at the cellular or subcellular level and only indirectly transferred to the macroscopic level. In contrast to a chaotic dynamics, which can be controlled and easily shifted into an oscillatory or other state, noise is not equally controllable. Nevertheless, a certain degree of randomness in the neural activity seems inevitable at every level of organization, and organisms have presumably evolved to cope with it. The question is whether it is also of advantage to the system, and in that case, what this advantage could be. The simplest answer would be that the noise or (high-dimensional) chaos could be used for maintaining a baseline activity, necessary for neural survival, and/or for a readiness to respond to input. However, the computer simulations we have described above show that noise could induce global synchronous oscillations and shift the system dynamics from one dynamical state to another. This in turn could change the efficiency of the information processing of the system. We also demonstrated that system performance could be maximized at an optimal noise level, analogous to the case of stochastic resonance. Thus, in addition to the (pseudo-)chaotic network dynamics, the noise produced by a few (or many) neurons could be used for making the system flexible, for increasing the responsiveness of the system, and for preventing the system from getting stuck in any undesired oscillatory mode.

5.3.4. Sensitivity, flexibility and regulation

Oscillatory or complex dynamics provide a means for fast response to an external input, such as a sensory signal. If sensitivity to small changes in the input is desired, a chaotic-like dynamics should be optimal, but too high a sensitivity should be avoided. Oscillations can also be used for enhancing weak signals, and by "resonance" large populations of neurons can be activated for any input. In addition, such "recruitment" of neurons in oscillatory activity can eliminate the negative effects of noise in the input, by canceling out the fluctuations of individual neurons. As discussed above, noise can, however, also have a positive effect, which we will return to shortly. Finally, from an energy point of view, oscillations in the neuronal activity should be much more efficient than if a static neuronal output (from large populations of neurons) were required. In engineering, great efforts are made to eliminate oscillations in the system, but if the system can perform as well (or better) with oscillations, energy can be saved.
It seems extremely difficult to determine whether the neurodynamics, as seen with, for example, EEG, is of a chaotic or noisy origin, but it can be concluded that a (low-dimensional) deterministic process would be more controllable than a pure noise process. If the dynamics is to a large extent determined by a deterministic mechanism, it could easily shift between different dynamical states, depending on one single control parameter. Such a control parameter could correspond to, say, the level of any neuromodulator, such as acetylcholine, or to the level of arousal of an animal [3,90]. Thus, if an irregular behavior is advantageous to the system, it should be easier to generate and control such behavior via some sort of (low-dimensional) deterministic mechanism. If the underlying mechanism were purely noisy, a shift in any parameter would only result in a change in the noise level, not in the qualitative character of the dynamics. Of course, both types of processes, chaos and noise, could co-exist and together result in any desired irregular dynamics. If the dynamical behavior is important for the system performance, neuromodulatory regulation of gain and/or connection strengths should be an efficient way to control this dynamics. The level of arousal and neuronal excitability can be regulated by neuromodulators, such as acetylcholine and serotonin, but possibly also by the spontaneous activity of the neurons themselves. A complex system, like the brain, can be kept near a threshold to a highly active state by noise or "chaos", due to the more or less uncorrelated activity of its constituents (the neurons). Away from any threshold, noise can also have a stabilizing effect on the system. An "appropriate" activity, e.g. finding a particular memory state, could emerge as a collective behavior of the parts, if the system is pushed above this threshold by any correlated input, e.g. presented as a spatio-temporal pattern of activity. As demonstrated in our computer simulations, an oscillatory mode can be induced by increasing the neuronal excitability, but also by noise in a system with only excitatory network units modeled with neuronal adaptation. The results also show that recall time can be minimized for optimal noise amplitudes, and that transitions between different attractor states can be induced by noise.
6. Origin of fluctuations: philosophical implications

Finally, we will briefly touch upon one of the central problems of the classical philosophical debate, the question of a deterministic or indeterministic worldview. This question relates naturally to the mind-brain problem, the problem of the relation between conscious processes and neural activity. Thus, it critically affects theories about the nature of brain dynamics. The main question seems to be whether a psycho-physical interactionism or parallelism best describes the facts available. We will here argue that an interactionistic theory is more reasonable than the alternative parallelistic one. To us, this implies that a strictly indeterministic worldview is more compatible with a scientific worldview than a strictly deterministic one. A brief outline of the argument will be given below.
It is essential for the debate to realize that cognition and consciousness are not equivalent. Not all cognitive processes are conscious. On the contrary, it has been suggested that most cognitive processes are unconscious. Furthermore, it is apparent that conscious processes per se show different levels of complexity, ontogenetically as well as phylogenetically. There are reasons to believe that a fully developed human consciousness is more complex than that of any other species. For instance, it has not been shown that any other species displays more than rudimentary forms of self-consciousness. It can be argued that consciousness is central for higher cognitive functions [10,85]. In spite of the more or less gradual evolution of conscious processes, we think the emergence of conscious cognitive processes implied a major transition in the evolution of life. We think conscious cognition would imply (although not by logical necessity) a more goal-directed behavior and allow for prediction, expectation, wills, plans, goals, hopes, etc. beyond the immediate perception. We do not think that a strict definition of consciousness is necessary, or even desired, at this stage. Basically, we see consciousness as the state of being aware, of experiencing. But as also mentioned above, we do not see it as a single state; there are many levels of consciousness, displaying different characteristics. Edelman [119] distinguishes between an evolutionarily primitive form, primary consciousness, and a more advanced form, higher-order consciousness. A capacity for selective attention and simultaneous processing characterizes primary consciousness, while higher-order consciousness is accompanied by a sense of a person, of a self. We think there are strong arguments for the view that this higher-order consciousness is a specifically human consciousness. The main reasons are based on the high degree of complexity and organization of the human brain, as well as on the complexity of human language and behavior [119-121]. Of all biological communication systems the human language is unique in its complexity. No other language has, for example, a developed descriptive function, still less an argumentative function [122].

6.1. A neurophysical approach
A natural way to approach the problem is to identify neural structures and processes underlying conscious cognition and experiences. However, although such an approach gives us tools to follow the evolution of cognition and consciousness, it does not give us the solution to the question of how conscious cognition arises. We have not solved the classical mind-brain problem. A prerequisite for approaching an answer is, of course, first to understand what physical states are directly associated with conscious states. If a certain spiking pattern in certain neurons is critically associated with conscious events, we must ask what physical feature of the spike pattern is relevant: Is it the dynamic configuration of charged particles? Or is it associated with electromagnetic fields? Or is it still something else? Most neurophysiological studies of correlates to consciousness are somewhat vague on this issue. However, some main answers to the question can be distinguished in the literature. One line argues that the neural correlate should be found in the quantum mechanical events at a subcellular level [69,123,124]. Another line
argues that it is the electromagnetic fields resulting from the electrical activity of the neurons that show the closest correlation to mental processes [8,62,125,126]. That would imply that the electrical activity measured with, e.g., EEG or with electrodes measuring the local field potentials in the extracellular fluid actually reflects some of the information processing going on in the brain. If so, the 40 Hz oscillations found to be correlated with visual awareness, discussed above [58,59,127], could indeed be a close correlate to mental activity, and not merely a sign of synchronized cellular activity, as many researchers seem to suggest.
6.2. An argument for interactionism

As mentioned above, the basis for the present discussion is the assumption that the problem of consciousness is a central issue for the discussion of cognition (see Ref. [10]). The modern discussion of consciousness started with Descartes' interactionist hypothesis; mental events and brain events were seen as separate and interacting substances, the separation being based on the notion of extension (see Ref. [128]). A well-known problem with this solution is to understand how extended substance can interact with unextended substance within the framework of Descartes' view of causality, i.e., action by contact. Discussions of this problem (the mechanistic argument [8]) led to a number of alternative solutions that can all be classified as forms of psycho-physical parallelism. Versions of this solution came to dominate the philosophical discussion, and do so even today in the form of the identity theory (see Ref. [129]). Most neurobiology-oriented discussions of the problem also seem to be based on some form of identity theory [130,131]. In this theory, mental states or events are assumed to be, in some respect, identical to certain physical states or events, i.e., certain brain states or events. It is clear that the identity theory says rather little if the relevant physical states are not further specified, and usually they are not. The identity theory, as well as parallelistic theories in general, has a further major weakness: it is not easily reconcilable with the theory of evolution. The reason is the following: According to the theory of evolution, outstanding features of organisms have evolved because they have a survival value for the organism. They causally affect the physical states of the organism. Consciousness is an outstanding feature of man, and presumably of other species as well, and must thus be causally effective and interact with physical events of the organism. This argument was probably first used by William James [132,133], and has been further developed by Beloff [134], Popper [128] and Hodgson [135] (for an analysis of the argument, see Ref. [136]). We think this is a strong, although not conclusive, argument for an interactionist view of the mind-brain problem. Indeed, this interaction may be a driving force in evolution, giving a bias to the probabilistic laws at work. In the form of conscious cognition, it may provide a basis for the execution of some kind of will, where an appropriate choice/selection is made out of a set of possible actions. Other recently proposed arguments against parallelist and identity mind-brain theories concern the nature of human thinking. They all criticize a basic thesis of the identity theory, namely that conscious processes are assumed to be fully
described by algorithms. The rather specific case of mathematical understanding has been discussed by Penrose [123,124], who concludes that mathematical understanding involves non-computable, non-algorithmic components. A more general approach has been taken by Hodgson [135], analyzing formal and plausible reasoning. His conclusion is stated in terms of a criticism of mechanism, the view that the world is completely describable in terms of physical quantitative laws (i.e., algorithms). The conclusion is thus that conscious cognition cannot, in principle, be fully described by algorithms, as argued by some proponents of the identity theory and parallelism [137]. These considerations suggest some form of an interactionist solution to the consciousness problem. As mentioned, however, parallelist and identity solutions still dominate. We have already mentioned the mechanistic argument against Descartes' version of interactionism [8] - that something immaterial cannot influence something material. This argument is based on a Cartesian concept of matter, as something extended and impenetrable, and a Cartesian notion of causality, confined to action by contact. However, the mechanistic argument should already have lost its power with the development of the Newtonian concept of action at a distance, and with the modern concept of force, introduced by Faraday and Maxwell (see Refs. [8,125,128]). Nevertheless, it is still used; even modern arguments are to some extent based on the Cartesian notions of matter and causality (see Ref. [8]). Perhaps the time is now ripe to replace this old dichotomy of mind/matter with the interaction between non-computable and computable processes (see Ref. [138]).
6.3. Concluding remark

To sum up, the presented argument, based on an evolutionary perspective, leads us to suggest an interactionistic solution to the mind-brain problem. Further, we suggest a shift in the discussion of this problem; it may be more fertile to investigate the relation between computational and noncomputational (algorithmic/non-algorithmic) processes than between material and immaterial events. Such a view is supported by the analysis of mathematical understanding and plausible reasoning. Conscious processes seem not to be identical with any known physical process. They may most simply be regarded as an emergent phenomenon. This has consequences for theorizing on the origin of fluctuations. Is the universe strictly deterministic or is it indeterministic? We think it is difficult to avoid the tentative conclusion of an indeterministic worldview in the light of an interactionistic mind-brain theory. For the present purpose of understanding the role and nature of fluctuations in neural systems, however, this may be of little importance.

7. Conclusion
The central question in the present chapter is whether fluctuations in nervous systems should be regarded as a nuisance under all circumstances. As we hope to have pointed out, evidence shows that neural fluctuations may well be beneficial for the organism, giving it an evolutionary advantage in some respect.
It may not be important whether the neuronal fluctuations, the irregular activity of the nervous system, have a chaotic or a noise basis, i.e. whether they are governed by some deterministic process or by uncorrelated processes. Both types might even co-exist (but perhaps be dominant under different conditions). What might be more important is that aperiodic, uncorrelated processes actually do exist, with little or no structure, keeping the system going even in the absence of external stimuli. Indeed, it could be advantageous to have such irregular activity as a "resting activity", in order to prevent unwanted regular activity patterns from interfering with any signal entering the system. Such activity could also be responsible for much of the flexibility of the nervous system, and for the transition between different dynamical states. Noise and chaos may be energy consuming, but could still be selected because of their advantageous effects on system performance. It seems important to maintain a high level of activity in the brain, also in times of little or no external stimulation. During such periods, the internal activity, maintained by the spontaneous activity of the neurons and by the recurrent, oscillatory and (pseudo-)chaotic activity in the feedback loops of the neuronal circuits, could keep the cells close to threshold and ready to receive external and internal signals. The activity of the nervous system may always be high, even at rest, but the net effect could be small, because excitation is balanced by inhibition. If this is true, the picture emerges of a rather energy consuming system that is designed more for rapid and accurate response than for saving energy. It would be "easier" in this way to amplify a signal, or suddenly increase the neural activity, by reducing the inhibitory effects. This could be done either by a direct disinhibition of inhibitory neurons, or by increasing (through some mechanism) the probability of synaptic transmission. It may be that energy consumption must not be kept low at all costs. Nature may have other preferences. Apparently, a high information processing rate, for nervous systems as well as for the making of proteins and nucleic acids, should be crucial for organisms in their struggle for life. In the long run, winning this race may in effect also ensure a sustained energy consumption for the organism and its offspring.
Acknowledgements

This work was made possible through grants from the Swedish Research Council for Engineering Sciences (TFR) and the Swedish Medical Research Council (MFR), as well as from the Agora Consortium (The Bank of Sweden Tercentenary Fund, the Swedish Council for Planning and Coordination of Research, the Swedish Council for Research in the Humanities and Social Sciences, the Swedish Foundation for International Cooperation in Research and Higher Education, and the Swedish Natural Science Research Council).
Appendix A: Mathematics of channel kinetics

The mathematical models of the K and Na channels developed by Hodgkin and Huxley [27] are sets of equations describing the relation between voltage and K and
Na currents through the membrane. A basic idea behind the theory is that the opening and closing of the pathways for ions (diffusion through pores in channels was not established at the time) depend critically on membrane-bound particles moving in the electric field over the membrane. For the K channel, the probability of a single particle to move into its critical site to open the pathway was denoted n and its kinetics was described by

dn/dt = α_n(1 - n) - β_n n, (A.1)

where α_n and β_n denote voltage-dependent rate constants. The opening of the channel was assumed to require four independent particles moving to critical sites. Thus the probability that the channel is open is n^4. The corresponding conductance g_K is then given by

g_K = ḡ_K n^4, (A.2)

where ḡ_K is the maximum conductance, i.e. g_K = ḡ_K when the probability n = 1. For the Na channel three independent particles were assumed, the probability of each particle to move to the critical site being m. In addition an inactivating particle was assumed, the probability to move to its site being h. The activation and inactivation kinetics were assumed to be described by the following equations:

dm/dt = α_m(1 - m) - β_m m, (A.3)

dh/dt = α_h(1 - h) - β_h h, (A.4)

where α_m, α_h, β_m and β_h denote voltage-dependent rate constants (α_h and β_h correspond to γ and δ in Scheme 2). The conductance g_Na is given by

g_Na = ḡ_Na m^3 h. (A.5)

The voltage dependence of the rate constants in all the cases above was described by empirical equations of the type

α = A(E - B)/[1 - exp((B - E)/C)],
β = A(B - E)/[1 - exp((E - B)/C)], (A.6)

where E is the voltage and A, B and C are experimentally determined constants. As described in Section 3, the Hodgkin-Huxley models above have been extensively modified to account for new experimental data. The transitions in these
modified versions are often assumed to be one-barrier transitions, and according to the Eyring rate theory [139] the rate constants are consequently described as

k_f = k_eq exp{zβF(E - E_eq)/(RT)}, (A.7)

k_b = k_eq exp{z(1 - β)F(E_eq - E)/(RT)}, (A.8)

where k_f and k_b denote forward and backward rate constants, respectively; k_eq the rate constant at equilibrium (i.e. when k_f = k_b); E_eq the membrane potential at equilibrium; β a symmetry factor, signifying the location of the barrier peak in terms of potential drop; F, R and T have their usual significances. The equations above can be used to simulate an AP. Assuming that the total membrane current is

I = I_C + I_Na + I_K + I_L,

where I_C is a capacitive current and I_L a leak current, we obtain

I = C dE/dt + g_Na(E - E_Na) + g_K(E - E_K) + g_L(E - E_L), (A.9)

which can be solved numerically. Fig. 10 shows such a solution giving an AP, computed by Frankenhaeuser and Huxley [79] for a myelinated axon.
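For readers who wish to reproduce this kind of computation, the sketch below integrates Eqs. (A.1)-(A.5) and (A.9) with a simple forward-Euler scheme. The rate functions (of the form of Eq. (A.6)) and the maximal conductances are the classical squid-axon values of Ref. [27], with voltage measured in mV from rest; they are not the myelinated-axon parameters of Ref. [79] used for Fig. 10, so the resulting action potential resembles the one shown there only qualitatively.

```python
import numpy as np

# Forward-Euler integration of Hodgkin-Huxley kinetics, Eqs. (A.1)-(A.5) and (A.9).
# Rate functions and conductances are the classical squid-axon values of Ref. [27],
# NOT the myelinated-axon parameters of Ref. [79].

def vtrap(x, y):
    # x / (exp(x/y) - 1), with its limiting value y as x -> 0
    return y if abs(x) < 1e-7 else x / (np.exp(x / y) - 1.0)

def rates(V):
    a_n = 0.01 * vtrap(10.0 - V, 10.0);  b_n = 0.125 * np.exp(-V / 80.0)
    a_m = 0.10 * vtrap(25.0 - V, 10.0);  b_m = 4.0 * np.exp(-V / 18.0)
    a_h = 0.07 * np.exp(-V / 20.0);      b_h = 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)
    return a_n, b_n, a_m, b_m, a_h, b_h

C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3        # uF/cm^2, mS/cm^2
ENa, EK, EL = 115.0, -12.0, 10.6              # mV relative to rest

dt, t_max = 0.01, 20.0                        # ms
V = 0.0
a_n, b_n, a_m, b_m, a_h, b_h = rates(V)
n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)  # resting values

for k in range(int(t_max / dt)):
    t = k * dt
    I_ext = 20.0 if 1.0 <= t < 2.0 else 0.0   # brief suprathreshold stimulus (uA/cm^2)
    a_n, b_n, a_m, b_m, a_h, b_h = rates(V)
    n += dt * (a_n * (1.0 - n) - b_n * n)     # Eq. (A.1)
    m += dt * (a_m * (1.0 - m) - b_m * m)     # Eq. (A.3)
    h += dt * (a_h * (1.0 - h) - b_h * h)     # Eq. (A.4)
    I_ion = gNa * m**3 * h * (V - ENa) + gK * n**4 * (V - EK) + gL * (V - EL)
    V += dt * (I_ext - I_ion) / C             # Eq. (A.9)
    if k % 100 == 0:
        print(f"t = {t:5.2f} ms   V = {V:7.2f} mV")
```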
Fig. 10. Computed action potential of myelinated axon (membrane potential in mV versus time in ms). Equations and parameter values in Ref. [79].
Appendix B: A cortical neural network model
The neural network model used for the studies discussed in Sections 4 and 5 is of intermediate complexity with simple network units and realistic connections. The network units correspond to populations of neurons with a continuous input-output relation, describing pulse density characteristics. The gross connectivity is in accordance with known facts about the architecture of the olfactory cortex, mainly based on the circuitry as determined by Haberly [140]. This implies a three-layered structure with two layers of inhibitory units and one layer of excitatory units (see Fig. 11). The top layer consists of inhibitory "feedforward interneurons", which receive inputs from an external source ("olfactory bulb") and from the excitatory "pyramidal cells" in the middle layer. They project only locally to the excitatory units. The bottom layer consists of inhibitory "feedback interneurons", receiving inputs only from the excitatory units and projecting back to those. The two sets of inhibitory units are characterized by two different time constants and somewhat different connections to the excitatory units. In addition to the feedback from inhibitory units, the excitatory units receive extensive inputs from each other and from the "olfactory bulb". All connections are modeled with time delays for signal propagation, corresponding to the geometry and fiber characteristics of the real cortex. The time evolution for a network of N neural units is given by a set of coupled nonlinear first-order differential equations for all the N internal states (u). (The equations used here are similar in structure to those used for regular Hopfield nets [99,100], but differ primarily in how the network units are connected and in the transfer function.)
Fig. 11. The three-layered cortical neural network model used. The top layer corresponds to feedforward inhibitory interneurons, the middle layer to excitatory pyramidal cells, and the bottom layer to feedback inhibitory interneurons. Spontaneously active, noisy units are indicated.
With external input, I(t), characteristic time constant, τ, and connection weight w_ij between units i and j, separated by a time delay δ_ij, we have for each unit activity, u_i(t), at time t,

du_i/dt = -u_i/τ_i + Σ_{j≠i} w_ij g_j[u_j(t - δ_ij)] + I_i(t). (B.1)
The input-output function, g_i(u_i), is a continuous sigmoid function, experimentally determined by Freeman [141], with a single gain parameter, Q, determining slope, threshold and amplitude of the curve:

g_i = C·Q_i{1 - exp[-(exp(u_i) - 1)/Q_i]}. (B.2)
Fig. 12 shows the sigmoid curves of Eq. (B.2) for three different Q values. For simulations with general neuromodulation, we let all excitatory units be determined by the same constant Q value, Q_i = Q_ex, and all inhibitory units by a constant Q_in. In a more detailed description of specific cholinergic effects the gain parameter Q_i of each unit i is made dependent upon the previous unit activity. Then, it is not the gain per se that is changed; instead an increased excitability is implemented as a suppression of neuronal adaptation. This is described more thoroughly in Ref. [92]. External input, I_i(t), is given with distance-dependent time delays to each one of the excitatory units as well as to the feedforward inhibitory units, simulating the afferent input to the cortex from the olfactory bulb via LOT, the lateral olfactory tract.
(Fig. 12, below: sigmoid curves shown for Q = 5 and Q = 10, among the three values.)
Fig. 12. The input-output function for network units (See Eq. (B.2)) for three different values of the gain parameter, Q.
The input is in most cases a random pattern that may be constant or varying in time. When simulating shock pulses to the LOT all input "fibers" are activated with high amplitude for two simulated milliseconds. Noise is introduced to the system as a Gaussian function, ω_i(t), such that ⟨ω_i(t)⟩ = 0 and ⟨ω_i(t)ω_i(s)⟩ = 2Aδ(t - s), which is added to the differential equation of each network node activity. In some cases we introduce temporal synaptic noise in the connections, q(t) = q_0 + ω(t), which enters as a multiplicative factor in the summation of all synaptic currents. Noise effects are studied by continuously increasing the level A of the additive or multiplicative noise. In some of the simulations, the noise level is changed equally for all network units, whereas in other simulations, the change takes place only in a few excitatory or inhibitory units. To allow for learning and associative memory the connection weights w_ij are incrementally changed, according to a learning rule of Hebbian type, suitable for the dynamics of this particular system. It takes into account that there is a conduction delay, δ_ij, between the output (presynaptic) activity of one network unit and its (postsynaptic) effect on the receiving unit. The change in connection strength is also dependent on the absolute value of that connection weight, so that this value cannot exceed some maximum weight strength, w_max. With learning rate η the change at time t in the connection weight between unit j and i is given by
Δw_ij = η·g_i[u_i(t)]g_j[u_j(t - δ_ij)](w_max - w_ij). (B.3)
In the simulations, we typically use 32 by 32 excitatory units and 32 by 32 units of each of the inhibitory types (i.e. 3072 units in total, but simulations with up to 96 by 96 units in each layer have been run without any qualitative difference observed). The conduction delays are set so that the network corresponds to a 10 mm square of the cortex.
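A minimal sketch of how Eqs. (B.1) and (B.2) can be integrated is given below. It uses a small, fully connected toy network with random weights and delays rather than the three-layer olfactory-cortex architecture of Fig. 11, and every numerical value (weights, delays, time constants, Q, noise level) is a placeholder assumption; the point is only to show the delayed coupling, the sigmoid gain of Eq. (B.2) and the Euler-Maruyama treatment of the additive noise term.

```python
import numpy as np

# Toy integration of Eqs. (B.1)-(B.2): a small, fully connected network with
# random delays, additive Gaussian noise and Freeman's sigmoid gain.
# The three-layer connectivity and the actual parameter values are NOT
# reproduced here; everything numerical is a placeholder assumption.

rng = np.random.default_rng(1)
N, dt, t_max = 8, 0.1, 500.0               # units, ms
tau = np.full(N, 5.0)                      # time constants (ms)
Q, Cg = 10.0, 1.0                          # gain parameter and constant of Eq. (B.2)
w = 0.4 * rng.standard_normal((N, N)); np.fill_diagonal(w, 0.0)
delay_steps = rng.integers(1, 20, size=(N, N))   # delays delta_ij in time steps
A = 0.005                                  # additive noise level

def g(u):                                  # Eq. (B.2), Freeman's sigmoid
    u = np.clip(u, -50.0, 50.0)            # avoid overflow in exp for large u
    return Cg * Q * (1.0 - np.exp(-(np.exp(u) - 1.0) / Q))

steps = int(t_max / dt)
max_delay = int(delay_steps.max()) + 1
u_hist = np.zeros((max_delay, N))          # circular buffer of past activities
u = 0.01 * rng.standard_normal(N)

for k in range(steps):
    u_hist[k % max_delay] = u
    # delayed presynaptic output g_j[u_j(t - delta_ij)] for every pair (i, j)
    delayed = g(u_hist[(k - delay_steps) % max_delay, np.arange(N)])
    syn = np.sum(w * delayed, axis=1)      # sum over j of w_ij g_j[...]
    I_ext = 0.1                            # weak constant external input
    noise = np.sqrt(2.0 * A * dt) * rng.standard_normal(N)
    u = u + dt * (-u / tau + syn + I_ext) + noise     # Eq. (B.1) plus additive noise
    if k % 1000 == 0:
        print(f"t = {k*dt:6.1f} ms   mean u = {u.mean():+.3f}")
```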
References

1. Århem, P., Blomberg, C. and Liljenström, H. eds (2000) Disorder vs. Order in Brain Function - Essays in Theoretical Neurobiology. World Scientific, London. 2. Schrödinger, E. (1944) What is Life? Cambridge University Press, Cambridge. 3. Freeman, W.J. (1991) Sci. Am. 264, 78-85. 4. Lynch, J. and Barry, P. (1989) Biophys. J. 55, 755-768. 5. Johansson, S. and Århem, P. (1994) Proc. Natl. Acad. Sci. USA 91, 1761-1765. 6. Ochoa, J. and Torebjörk, E. (1983) J. Physiol. 342, 633-654. 7. Thompson, I. (1994) Curr. Biol. 4, 458-461. 8. Lindahl, B.I.B. and Århem, P. (1994) J. Theor. Biol. 171, 111-122. 9. Århem, P. (1996) BioSystems 38, 191-198. 10. Århem, P. and Liljenström, H. (1997) J. Theor. Biol. 187, 601-612. 11. Moss, F. and McClintock, P.V.E. eds (1989) Noise in Nonlinear Dynamical Systems, 3 Vols. Cambridge University Press, Cambridge. 12. Wiesenfeld, K. and Moss, F. (1995) Nature 373, 33-36. 13. Dykman, M.I. and McClintock, P.V.E. (1998) Nature 391, 344. 14. DeFelice, L.L. (1981) Introduction to Membrane Noise. Plenum Press, New York. 15. Popper, K.R. (1982) The Open Universe. Rowman and Littlefield, Totowa, NJ. 16. Earman, J. (1986) A Primer on Determinism. D. Reidel, Dordrecht.
17. Laplace, P.S. (1820) in: Théorie analytique des probabilités. V. Courcier, Paris. 18. Nagel, E. (1961) in: The structure of science. Harcourt, Brace and World, New York. 19. van Kampen, N.G. (1981) Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam. 20. Århem, P. and Lindahl, B.I.B. (1997) in: Matter matters? On the Material Basis of the Cognitive Activity of Mind, eds P. Århem, H. Liljenström and U. Svedin. Springer, Berlin. 21. Hodgkin, A.L. (1964) The Conduction of the Nervous Impulse. Liverpool University Press, Liverpool. 22. Dodge, F. and Frankenhaeuser, B. (1958) J. Physiol. 143, 76-90. 23. Hille, B. (1992) Ionic Channels of Excitable Membranes. Sinauer, Sunderland, MA. 24. Brücke, E. (1843) Ann. Phys. Chem. 58, 77-94. 25. Neher, E. and Sakmann, B. (1976) Nature 260, 799-802. 26. Colquhoun, D. and Hawkes, A.G. (1995) in: Single Channel Recording, eds B. Sakmann and E. Neher, pp. 397-482. Plenum Press, New York. 27. Hodgkin, A.L. and Huxley, A.F. (1952) J. Physiol. 117, 500-544. 28. Armstrong, C. (1992) Physiol. Rev. 72, S5-S13. 29. Noda, M., Shimizu, S., Tanabe, T., Takai, T., Kayano, T., Ikeda, T., Takahashi, H., Nakayama, H., Kanaoka, Y., Minamino, N., Kangawa, K., Matsuo, H., Raftery, M.A., Hirose, T., Inayama, S., Hayashida, H., Miyata, T. and Numa, S. (1984) Nature 312, 121-127. 30. Tanabe, T., Takeshima, H., Mikami, A., Flockerzi, V., Takahashi, H., Kangawa, K., Kojima, M., Matsuo, H., Hirose, T. and Numa, S. (1987) Nature 328, 313-318. 31. Tempel, B., Papazian, D., Schwartz, T., Jan, Y. and Jan, L. (1987) Science 237, 770-775. 32. Armstrong, C.M. and Bezanilla, F. (1973) Nature 242, 459-461. 33. Keynes, R. and Rojas, E. (1974) J. Physiol. 239, 393-434. 34. Catterall, W.A. (1986) Trends Neurosci. 9, 7-10. 35. Keynes, R.D. and Elinder, F. (1999) Proc. R. Soc. Lond. 266, 843-852. 36. Armstrong, C.M. (1971) J. Gen. Physiol. 58, 413-437. 37. Doyle, D.A., Cabral, J.M., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R. (1998) Science 280, 69-77. 38. Perozo, E., Cortes, D. and Cuello, L. (1998) Nature Struct. Biol. 5, 459-469. 39. Armstrong, C.M. and Bezanilla, F. (1977) J. Gen. Physiol. 70, 567-590. 40. Aldrich, R.W., Corey, D.P. and Stevens, C.F. (1983) Nature 306, 436-441. 41. MacKinnon, R., Aldrich, R.W. and Lee, A.W. (1993) Science 262, 757-759. 42. Frankenhaeuser, B. and Hodgkin, A.L. (1957) J. Physiol. (Lond.) 137, 218-244. 43. Elinder, F., Madeja, M. and Århem, P. (1996) J. Gen. Physiol. 108, 325-332. 44. Elinder, F. and Århem, P. (1999) Biophys. J. 77, 1358-1362. 45. Mayr, E. (1989) Toward a New Philosophy of Biology. The Belknap Press, Cambridge, MA. 46. Pei, X. and Moss, F. (1996) Nature 379, 618-621. 47. Pei, X. and Moss, F. (1996) Int. J. Neural Systems 7, 429-435. 48. FitzHugh, R. (1965) J. Cell. Comp. Physiol. 66, 111-117. 49. Zagotta, W., Hoshi, T. and Aldrich, R.W. (1994) J. Gen. Physiol. 103, 321-362. 50. Patlak, J.H. (1991) Physiol. Rev. 71, 1047-1080. 51. Dempster, J. (1993) Computer Analysis of Electrophysiological Signals. Academic Press, London. 52. Strassberg, A. and DeFelice, L. (1993) Neural Comput. 5, 843-855. 53. Elinder, F., Frankenhaeuser, B. and Århem, P. Manuscript. 54. Liebovitch and Todorov (1996) Int. J. Neural Systems 7, 321-331. 55. Liljenström, H. (1996) Int. J. Neural Systems 7, 497-505. 56. Århem, P. and Johansson, S. (1996) Int. J. Neural Systems 7, 369-376. 57. Crick, F. and Koch, C. (1990) Semin. Neurosci. 2, 263-275. 58. Koch, C. and Crick, F.
(1994) in: Large Scale Neuronal Theories of the Brain, eds C. Koch and J.L. Davis, pp. 93-110. The MIT Press, Cambridge, MA. 59. Hopfield, J.J. (1995) Nature 376, 33-36. 60. Softky, W.R. and Koch, C. (1993) J. Neurosci. 13, 334-350.
61. Moss, F. and Braun, H. (2000) in: Disorder vs. Order in Brain Function - Essays in Theoretical Neurobiology, eds P. Århem, C. Blomberg and H. Liljenström, pp. 117-134. World Scientific, London. 62. Freeman, W. (1996) Int. J. Neural Systems 7, 473-480. 63. Berridge, M.J. and Rapp, P.E. (1979) J. Expl. Biol. 81, 217-279. 64. Tuckwell, H.C. (1989) Stochastic Processes in the Neurosciences. Society for Industrial and Applied Mathematics, Philadelphia, PA. 65. Douglas, R.J., Koch, C., Mahowald, M., Martin, K.A.C. and Suarez, H.H. (1996) Science 269, 981-985. 66. Stratford, K.J., Tarczy-Hornoch, K., Martin, K.A., Bannister, N.J. and Jack, J.J.B. (1996) Nature 382, 258-261. 67. Hamill, D.P., Marty, A., Neher, E., Sakmann, B. and Sigworth, F.J. (1981) Pflügers Arch. 391, 85-100. 68. Liebovitch, L.S. and Toth, T.I. (1991) J. Theor. Biol. 148, 243-267. 69. Beck, F. and Eccles, J.C. (1992) Proc. Natl. Acad. Sci. USA 89, 11357-11361. 70. Beck, F. (1996) Int. J. Neural Systems 7, 343-353. 71. Århem, P. and Lindahl, B.I.B. (1966) in: Matter matters? On the Material Basis of the Cognitive Activity of Mind, eds P. Århem, H. Liljenström and U. Svedin, pp. 235-253. Springer, Berlin. 72. Liljenström, H. (1991) Int. J. Neural Systems 2, 1-15. 73. Cobb, S.R., Buhl, E.H., Halasy, K.O., Paulsen, O. and Somogyi, P. (1995) Nature 378, 75-78. 74. Liljenström, H. and Wu, X. (1995) Int. J. Neur. Syst. 6, 19-29. 75. Stevens, C.F. and Wang, Y. (1994) Nature 371, 704-707. 76. Bullock, T.H. (1981) in: Neurons Without Impulses, eds A. Roberts and B.M.H. Bush, pp. 269-284. Cambridge University Press, Cambridge. 77. Johansson, S., Sundgren, A. and Klimenko, V. (1995) Brain Res. 700, 240-244. 78. Johansson, S. and Århem, P. (1992) J. Physiol. 445, 157-167. 79. Frankenhaeuser, B. and Huxley, A. (1964) J. Physiol. 171, 302-315. 80. Johansson, S. (1995) J. Theor. Biol. 164, 515-529. 81. de Ruyter van Steveninck, R.R. and Laughlin, S.B. (1996) Nature 379, 642-645. 82. de Ruyter van Steveninck, R.R. and Laughlin, S.B. (1996) Int. J. Neural Systems 7, 437-444. 83. Zettler, F. and Järvilehto, M. (1971) Zeitschrift für vergleichende Physiologie 75, 402-421. 84. Arbib, M.A., Erdi, P. and Szentagothai, J. (1997) Neural Organization - Structure, Function and Dynamics. MIT Press, Cambridge, MA. 85. Liljenström, H. (1997) in: Matter matters? On the Material Basis of the Cognitive Activity of Mind, eds P. Århem, H. Liljenström and U. Svedin, pp. 177-213. Springer, Berlin. 86. Freeman, W.J. (1975) Mass Action in the Nervous System. Academic Press, New York. 87. Freeman, W.J. (1978) Elect. Clin. Neurophys. 44, 586-605. 88. Bressler, S.L. and Freeman, W.J. (1980) Elect. Clin. Neurophysiol. 50, 19-24. 89. Freeman, W.J. and Skarda, C.A. (1985) Brain Res. Rev. 10, 47-175. 90. Skarda, C.A. and Freeman, W.J. (1987) Brain Behav. Sci. 10, 161-195. 91. Babloyantz, A. and Lourenco, C. (1996) Int. J. Neural Systems 7, 461-471. 92. Liljenström, H. and Hasselmo, M.E. (1995) J. Neurophysiol. 74, 288-297. 93. Lourenco, C. and Babloyantz, A. (1996) Int. J. Neural Systems 7, 507-517. 94. Jefferys, J.G.R. (1995) Physiol. Rev. 75, 689-723. 95. Liljenström, H. and Aronsson, P. (1999) in: Proceedings of the 9th Workshop on Virtual Intelligence - Dynamic Neural Networks, eds T. Lindblad, M.L. Padgett and J. Kinser, SPIE Vol. 3728, pp. 46-66. 96. Haykin, S. (1999) Neural Networks - A Comprehensive Foundation. Macmillan, New York. 97. Koch, C. and Segev, I. (1989) Methods of Neuronal Modeling: From Synapses to Networks. MIT Press, Cambridge, MA. 98.
Arbib, M.A. ed (1995) The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA. 99. Hopfield, J.J. (1982) Proc. Natl. Acad. Sci. USA 79, 2554-2558.
100. Hopfield, J.J. (1984) Proc. Natl. Acad. Sci. USA 81, 3088-3092. 101. McCulloch, W.S. and Pitts, W. (1943) Bull. Math. Biophys. 5, 115-133. 102. Koch, C. (1999) Biophysics of Computation - Information Processing in Single Neurons. Oxford University Press, Oxford. 103. Kennedy, M.P., Kriegh, K.R. and Chua, L.O. (1989) IEEE Trans. Circuits and Systems 36, 1133-1139. 104. Kelso, S. (2000) in: Disorder versus Order in Brain Function, eds P. Århem, C. Blomberg and H. Liljenström, pp. 185-204. World Scientific, London. 105. Liljenström, H. and Århem, P. (1997) in: Computational Neuroscience, ed J.M. Bower, pp. 711-716. Plenum Press, New York. 106. Bulsara, A., Jacobs, E.W., Zhou, T., Moss, F. and Kiss, L. (1991) J. Theor. Biol. 152, 531-555. 107. Mandell, A.J. and Selz, K.A. (1993) J. Stat. Phys. 70, 355-373. 108. Anishchenko, V.S., Neiman, A.B. and Safanova, M.A. (1993) J. Stat. Phys. 70, 183-196. 109. Liljenström, H. (1995) Int. J. Intelligent Systems 10, 119-153. 110. Aronsson, P. and Liljenström, H. (2000) Biosystems (in press). 111. Wilson, M.A. and Bower, J.M. (1990) J. Neurophys. 67, 981-995. 112. Ambros-Ingerson, J., Granger, R. and Lynch, G. (1990) Science 247, 1344-1348. 113. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M. and Reitboeck, H.J. (1988) Biol. Cybern. 60, 121-130. 114. Gray, C.M., König, P., Engel, A.K. and Singer, W. (1989) Nature 338, 334-337. 115. Larson, J. and Lynch, G. (1986) Science 232, 985-988. 116. Gray, C.M. (1994) J. Computat. Neurosci. 1, 11-38. 117. Wu, X. and Liljenström, H. (1994) Network: Comput. Neural Systems 5, 47-60. 118. Tsuda, I. (1991) World Futures 32, 167-184. 119. Edelman, G. (1992) Bright Air, Brilliant Fire. Allen Lane, The Penguin Press, London. 120. Thorpe, W.H. (1978) Purpose in a World of Chance. Oxford University Press, Oxford. 121. Popper, K.R. (1994) Knowledge and the Body-Mind Problem. Routledge, London. 122. Popper, K.R. (1976) Unended Quest. An Intellectual Autobiography. Fontana/Collins, London. 123. Penrose, R. (1989) The Emperor's New Mind. Oxford University Press, Oxford. 124. Penrose, R. (1994) Shadows of the Mind. Oxford University Press, Oxford. 125. Popper, K.R., Lindahl, B.I.B. and Århem, P. (1993) Theor. Med. 14, 167-180. 126. Libet, B. (1996) J. Theor. Biol. 178, 223-224. 127. Singer, W. (1994) in: Large Scale Neuronal Theories of the Brain, eds C. Koch and J.L. Davis, pp. 93-110. The MIT Press, Cambridge, MA. 128. Popper, K.R. and Eccles, J. (1977) The Self and its Brain. Springer, Berlin. 129. Churchland, P.M. (1988) Matter and Consciousness: A Contemporary Introduction to the Philosophy of Mind, Revised edition. The MIT Press, Cambridge, MA. 130. Changeux, J.-P. (1983) L'Homme Neuronal. Fayard, Paris. 131. Crick, F. (1994) The Astonishing Hypothesis - The Scientific Search for the Soul. Charles Scribner's Sons, New York. 132. James, W. (1879) Mind 4, 1-22. 133. Richards, R.J. (1987) Darwin and the Emergence of Evolutionary Theories of Mind and Behavior. The University of Chicago Press, Chicago. 134. Beloff, J. (1962) The Existence of Mind. MacGibbon and Kee, London. 135. Hodgson, D. (1991) The Mind Matters. Clarendon Press, Oxford. 136. Lindahl, B.I.B. (1997) J. Theor. Biol. 187, 613-629. 137. Dennett, D. (1991) Consciousness Explained. Little, Brown and Company, Boston. 138. Århem, P., Liljenström, H. and Svedin, U. eds (1996) Matter matters? On the Material Basis of the Cognitive Activity of Mind. Springer, Berlin. 139. Glasstone, S.K., Laidler, K.J. and Eyring, H. (1941) Theory of Rate Processes.
McGraw-Hill, New York. 140. Haberly, L.B. (1985) Chem. Sens. 10, 219-238. 141. Freeman, W.J. (1979) Biol. Cybern. 33, 237-247.
CHAPTER 4
Detecting Unstable Periodic Orbits in Biological Systems
K. DOLAN, M.L. SPANO* and F. MOSS
Center for Neurodynamics, University of Missouri at St. Louis, 8001 Natural Bridge Rd., St. Louis, MO 63121, USA
*NSWC, Carderock Laboratory, 9500 MacArthur Blvd., Code 645, W. Bethesda, MD 20817, USA
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
© 2001 Elsevier Science B.V. All rights reserved
Contents
1. Introduction 133
2. Techniques for finding periodic orbits in data 135
   2.1. Direct observation 135
   2.2. The topological recurrence method 136
   2.3. Periodic orbit transform method 140
   2.4. Choosing the proper surrogate 143
3. Applications to biology 146
   3.1. The crayfish caudal photoreceptor, and the topological recurrence method 146
   3.2. UPO's in neural tissue 147
   3.3. Determinism in canine ventricular fibrillation 150
   3.4. Determinism in human atrial fibrillation 150
4. Discussion 152
Acknowledgements 152
References 152
1. Introduction
With the rise to prominence of nonlinear dynamics over the past few decades, chaotic systems have become of great interest to science and engineering. The defining quality of these systems, extreme sensitivity to small perturbations [1], implies that there must be strong instabilities present in these systems that cause this sensitivity. These instabilities manifest as unstable periodic orbits (UPO's). In both driven and autonomous chaotic systems, the motion (or orbit) of the system in phase space can at times be almost periodic. In driven systems, where the drive period is denoted as T, the motion will occasionally approach various periodic motions with period T, 2T, 3T, etc. In autonomous systems, it is also possible for multiple periodicities to appear, despite the lack of a reference drive period. But in either case the approach to one of these periodicities is soon followed by a retreat from it and a subsequent approach to another periodic motion. This continues indefinitely as the system transiently visits all the orbits available to it. Thus in a chaotic system there are numerous UPO's. The definition of chaos, the sensitivity to perturbations mentioned above, implies that a chaotic attractor (the entirety of the path in phase space traversed by the system) is dense in such UPO's. Thus every point in a chaotic attractor lies arbitrarily close to an unstable periodic orbit of some period. Another important property of the system is that it is ergodic; that is, the system explores the entire allowed region of phase space. Thus time averages can be replaced with simpler phase space averages. Unstable periodic orbits have become of great experimental importance in the last decade. The original experimental use for UPO's was in the control of chaos [2,3]. Ott et al. [2] described how to balance the system state point on any UPO of interest. But before controlling a chaotic system, one must first determine that it is chaotic. Global techniques (such as the Lyapunov exponent and the fractal dimension) for detecting chaos abound, but most fail miserably when faced with real-world data. These techniques often require very long and very clean data sets, while experimental data (especially in biology) are often of strictly limited duration and may have a large noise component. In addition, real-world data sets may be nonstationary. This last point almost guarantees that any global technique must fail. As an alternative, one can look for local signatures of chaos in the data [4]. By far the easiest local signature to observe is a UPO. Since chaotic systems consist of an infinite set of unstable periodic orbits, the trajectory of the system will occasionally come close to a relatively low period orbit, causing the system to behave visibly in an almost periodic manner for a short time. Such encounters with unstable periodic orbits of low period can be detected, sometimes even in the presence of very strong noise [5]. By detecting and characterizing many of these UPO's representing
different unstable periodic motions of the system, one can be assured that the experimental system in question is chaotic. Often, due to the presence of noise and short data sets, the analysis of stable and unstable orbits in a system is the only practical way to extract information. This can be particularly true in extremely noisy chaotic systems, where determinism in data taken from the system may be very difficult to find using more traditional nonlinear techniques. Unstable periodic orbits, and in some cases higher period stable orbits, can even influence the dynamics of stable periodic systems, if there is enough noise present. But the analysis of periodic orbits can be used for more than just the detection of dynamical behavior in noisy systems. As mentioned above, once an unstable periodic orbit has been detected and its properties identified, it is possible to stabilize it using small perturbations, thereby forcing the system into a stable periodic behavior. This chaos control has been experimentally demonstrated in physical systems [3,6-8] and has the potential to become a powerful tool in biology as well [9,10]. Unstable periodic orbits can also be used for the anticontrol (or maintenance) of chaos. Sometimes a chaotic attractor will have two or more basins of attraction, which may merge during a dynamical crisis. Often this crisis is mediated by a UPO. Or a chaotic attractor may have two basins separated by a small region of phase space. Noise can then drive transitions between two different types of behavior (the two basins) even in stationary conditions. By analyzing the unstable periodic orbits in the region of either transition, small perturbations can be used to lock the system's dynamics into one basin of attraction [11]. Another use for UPO's is in what might be termed "chaotic spectroscopy". Although all chaotic systems have an infinite number of UPO's, the weights given to each UPO (i.e., the time the system spends near each UPO) are different for each system. Thus one can characterize the chaos by identifying not only the predominant UPO's, but also by noting the weight given each UPO. The analysis of unstable periodic orbits is also of interest in biology. Classical analysis methods are often completely unable to distinguish biological data from purely stochastic processes. The situation is further complicated by the fact that the underlying dynamical processes in biology are often not well understood. In fact, learning more about these underlying processes is often the motivation behind the analysis. Searching for UPO's in biological data provides a method for determining whether a process is dynamical or purely stochastic, as well as for obtaining information about the type of dynamics. Most biological data are inherently very noisy, and although some of the UPO analysis methods that we will present here are very robust with respect to noise, it is still a limiting factor in any analysis. Statistical methods must be used to determine if the UPO's detected in a data set are real or just the result of random chance. This becomes an especially difficult problem when looking for high period orbits, which by their very nature are not encountered very often. Simply analyzing very long data sets in order to sharpen the statistical significance of a test is often not an option, since the length of a biological data file is usually limited either by the protocol under which the experiment is done or by nonstationarity.
2. Techniques for finding periodic orbits in data

In many mathematical maps, it is quite easy to locate UPO's. But measurements of physical systems are usually continuous. That is, trajectories of these systems trace out loops in phase space. A useful tool would enable one to convert the continuous phase space trajectory into a map for the location of UPO's. Such a tool is available; it is called a Poincaré section. In essence it allows one to take a slice of the phase space of the system, thereby reducing the dimensionality of the phase space by one. Points on the section do not move continuously, but rather jump from one location to another in a seemingly random fashion. However, in a low-dimensional system, when such a point is near a UPO, one notes that it always approaches the UPO from the same direction (the stable direction or eigenvector) and departs its vicinity along another consistent direction (the unstable direction or eigenvector). This structure in phase space, called a saddle point, is what Ott et al. exploited to implement their chaos control scheme [2].

Calculating a system's Poincaré section can be done by actually plotting the intersections of the phase space trajectory with the sectioning plane. But in driven experimental systems, it is often easier to simply strobe the continuous data at the drive frequency. This can often be done in hardware, thereby obviating a computationally intensive calculation and simultaneously sectioning the data much more precisely than can usually be done in software.

Another technique for obtaining a map from a physical system is called a delay coordinate embedding [12-14]. This relies on the fact that, instead of measuring all of the system's n phase space coordinates at one time, the geometry of the attractor can often be satisfactorily reconstructed by measuring one system variable at n times. This embedding can then be used to look for UPO's in the same fashion as a Poincaré section.

A final technique that is of importance to biological systems is to construct an embedding of data that consists of intervals between events. This was first used in the water drop experiment [15] where the interval between two successive drops was measured to characterize the system. This method was extended to biological systems by plotting the intervals between heartbeats in a rabbit heart preparation [9] versus the previous interval. That such an interval plot is indeed a proper embedding has recently been shown by Sauer et al. [16].
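As an illustration of these embedding steps, the following sketch (our own, not taken from the chapter) builds a delay coordinate embedding of a scalar time series and the interval return map used throughout the rest of this chapter; the synthetic spike times exist only to make the snippet runnable.

```python
import numpy as np

def delay_embed(x, dim, lag=1):
    """Delay-coordinate vectors (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}), one per row."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[(dim - 1 - k) * lag:(dim - 1 - k) * lag + n]
                            for k in range(dim)])

def interval_return_map(event_times):
    """First return map of inter-event intervals: returns (T_n, T_{n+1})."""
    intervals = np.diff(np.asarray(event_times, dtype=float))
    return intervals[:-1], intervals[1:]

# Illustrative use with synthetic spike times.
rng = np.random.default_rng(0)
spike_times = np.cumsum(rng.uniform(0.05, 0.15, size=500))
Tn, Tn1 = interval_return_map(spike_times)
Z = delay_embed(np.diff(spike_times), dim=3)
```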
2.1. Direct observation

In many low-dimensional chaotic mathematical systems, the behavior of the system state point around UPO's is apparent to the eye, without recourse to detailed mathematical analysis. In low-dimensional chaotic physical systems with very low noise, it is also easy to pick out UPO's by eye. Such was done in the original experiments on the control of chaos [3], where the UPO, along with its associated stable and unstable directions, could clearly be observed. In addition it was possible to calculate the UPO's eigenvalues by taking simple ratios of the positions of successive points in the neighborhood of the UPO. Figure 1 illustrates this technique. As should be obvious, this method is of most use in an almost noise-free system of reasonably low dimension.
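The eigenvalue estimate just mentioned can be sketched in a few lines. This helper is our own illustration: it assumes the section points near the fixed point have already been projected onto one eigendirection, and simply averages the ratios of successive signed distances.

```python
import numpy as np

def eigenvalue_from_ratios(distances):
    """Mean ratio of successive signed distances from the fixed point along one
    eigendirection; for a clean saddle this approximates the eigenvalue."""
    s = np.asarray(distances, dtype=float)
    return np.mean(s[1:] / s[:-1])

# Synthetic check: distances contracting by 0.15 (stable) and expanding by -2.6 (unstable).
stable   = 0.20  * 0.15 ** np.arange(5)
unstable = 0.001 * (-2.6) ** np.arange(5)
print(eigenvalue_from_ratios(stable), eigenvalue_from_ratios(unstable))
```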
[Fig. 1 plots position (V) versus previous position (V) for the magnetoelastic ribbon experiment; the in-plot annotations give the fitted stable direction (slope = 0.0146, intercept = 3.33) and unstable direction (slope = -2.64, intercept = 12.3).]

Fig. 1. Low-dimensional attractor consisting of 2000 points from the magnetoelastic ribbon experiment. A sequence starting at point 1280 is highlighted with red dots. The fixed point is at the intersection of the attractor with the diagonal. The saddle-like geometry around the fixed point is clearly visible to the eye. This pattern is repeated at the same location with the same directions over a hundred times in this clean physical data set.

It is also possible to observe visually the pattern of motion associated with UPO's in higher-dimensional chaotic systems. If the data from such a system are projected onto a two-dimensional plot, one may see the typical UPO pattern. This is because one can model the stable and unstable directions with a linearization around the UPO. In a projection onto a two-dimensional plot, these straight lines project as straight lines, thereby preserving the essentials of the motion near a UPO. However, while such a plot may reveal the presence of UPO's, it can only do so if there is one unstable direction and if one of the stable directions is substantially stronger than the others. In addition, a projection direction along the unstable direction would prevent observation of the UPO in question. While one can observe UPO's in this fashion, one should be warned that the eigenvalues and eigenvectors measured in this fashion have more to do with the projection than with the details of the physical system itself. Thus this technique is unsuitable for a detailed analysis of higher-dimensional systems.

2.2. The topological recurrence method
A simple statistical method for detecting unstable periodic orbits is the topological recurrence method. This method addresses many of the problems associated with
noisy systems by searching for the occurrences of a simple pattern that is indicative of an encounter with a UPO. This method operates on time series data such as the time intervals between neuron firings or on nearly any other type of embedding. The attractor can be reconstructed from this time series as an n + 1 dimensional return map, that is {T_i vs T_{i-1} vs ... vs T_{i-n}}, where n is the dimensionality of the system. In this type of embedding, encounters with an unstable periodic orbit will appear as a sequence of points converging towards the UPO and then diverging away from it along the stable and unstable manifolds, respectively. This is difficult to search for, since the dimension of the system is usually not known and often cannot be determined with any reliability. However, if we look only at the first return map {T_i vs T_{i-1}}, then the stable and unstable manifolds will project as straight lines. The overall structure of the attractor may be lost, particularly in the presence of noise, but in the vicinity of a fixed point, this converging and diverging behavior will still be present [5], as mentioned above and as shown in Fig. 2 [17]. Note that in such a plot the 45° line represents consecutive points that are nearly equal, so any period-1 fixed points in the system will lie on this line. Only period-1 UPO's (period-1 fixed points) will be discussed here, but this method can be modified to locate higher period orbits as well.

To locate encounters with fixed points in the data, we search for the following pattern: three points on the return map whose orthogonal distances to the 45° line grow consecutively smaller, followed by three points whose orthogonal distances to the 45° line grow consecutively larger. The point of nearest approach is shared, so that five points on the return map are used (six time series data points in total). The number of times that this pattern is seen in a file is then counted.

As with any statistical test, this pattern can occur even in pure noise with some probability. It is therefore necessary to determine if the number of encounters found is significantly larger than what we would expect to find just due to chance. This is usually done by analyzing surrogate files. Some number of surrogate data files is analyzed (typically 100), and the number of encounters found in each surrogate is tabulated. The mean and standard deviation are calculated, and a statistical significance can be determined with the following formula:

K = (N - N_s)/σ_s,    (1)
where K is the statistical significance in units of standard deviation, N is the number of encounters found in the original data file, N_s is the mean number of encounters found in the surrogates, and σ_s is the standard deviation of that mean. Assuming Gaussian statistics, a value of K > 3 indicates with a probability of 99% that the encounters in the file are not due to random chance. This assumption of Gaussian statistics is valid as long as the number of surrogates is fairly large (more than 20), and the number of encounters typically found in the surrogates is not close to zero. Specific details about the various types of surrogates relevant to this analysis will be discussed in Section 2.4.
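A minimal sketch of the pattern count and of the K statistic of Eq. (1) follows. It is our own illustration rather than the authors' code: it uses shuffled surrogates only and omits the extra slope and intersection criteria introduced later in the chapter.

```python
import numpy as np

def count_encounters(intervals):
    """Count the TR pattern: distances to the 45-degree line that shrink over
    three return-map points and then grow over three, sharing the closest point."""
    T = np.asarray(intervals, dtype=float)
    d = np.abs(T[1:] - T[:-1]) / np.sqrt(2.0)   # orthogonal distance of (T_i, T_{i+1})
    count = 0
    for i in range(len(d) - 4):
        w = d[i:i + 5]
        if w[0] > w[1] > w[2] and w[2] < w[3] < w[4]:
            count += 1
    return count

def tr_statistic(intervals, n_surrogates=100, seed=None):
    """K of Eq. (1): encounters in the data compared with shuffled surrogates."""
    rng = np.random.default_rng(seed)
    N = count_encounters(intervals)
    Ns = np.array([count_encounters(rng.permutation(intervals))
                   for _ in range(n_surrogates)])
    return (N - Ns.mean()) / Ns.std(ddof=1)
```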
[Fig. 2: in both panels the axes are T_{n+1} (ms) versus T_n (ms).]

Fig. 2. (a) First return map of interspike time intervals from the rat facial cold receptor [17]. Points near the 45° line represent consecutive time intervals that are nearly equal. (b) A typical encounter found in the preceding data. The converging points are shown with circles, and the diverging points with triangles. The two arrows denote the stable and unstable directions.

One strength of this type of analysis is that it is very fast. Even a relatively slow computer can analyze several hundred points of data in a fraction of a second, making the TR method ideal for real-time analysis. This is especially important for control techniques, since the location and properties of the fixed point can be determined very quickly, even in a nonstationary environment where the properties of a fixed point under control are changing with time. Alternatively, in post-experiment
analysis, where speed is not as important, additional criteria can be added to sharpen the statistic. For example, we could require that the converging and diverging points do so exponentially or put restrictions on how close the intersection of the calculated stable and unstable manifolds must be to the 45° line.

Another strength of the TR method is that it can operate on very small data files. Not only is this useful for control, since it allows you to quickly respond to changes in the system, but it is also useful for analyzing data that are sampled as some system parameter changes, e.g., when generating a bifurcation diagram. Small samples for which the parameter is essentially constant can be individually analyzed and changes in the K statistic can be used to locate bifurcations or other changes in the dynamics of the system.

To demonstrate the effectiveness of this analysis method, we apply it to a simple deterministic system, the Henon map:
X_{n+1} = 1 - a X_n^2 + Y_n,
Y_{n+1} = b X_n + ξ_n,    (2)

where ξ_n is a uniform random deviate from -ε to +ε, ε being the magnitude of the noise. A uniform deviate is used because adding too large a random number could cause the system to diverge. Because the noise is added internally at each iteration, it cannot simply be filtered out. This is a problem very common in physical nonlinear noisy systems. The parameters a = 1.4 and b = 0.3 were used, resulting in chaotic dynamics. Noise was added in varying quantities from ε = 0.000 to ε = 0.010 in steps of 0.002. Data sets of 5000 points were generated, since this is typical of the length of many of our actual biological data sets. Only the variable X_n was recorded, since it contains all of the deterministic dynamical information of the system. The resulting time series is thus already suitable for analysis by the topological recurrence (TR) method. For this analysis 100 surrogates were used. The surrogates were generated by randomizing the order in which the numbers making up the original file occur, thus maintaining the distribution of the original data, but destroying all other information contained in the file. The resulting statistics for each file are shown in Table 1.

Table 1
Noise (ε)   # of encounters (N)   Surrogate mean (N_s)   Standard deviation (σ_s)   Statistic (K)
0.000       502                   256.68                 11.06                      22.18
0.002       494                   255.94                 11.40                      20.88
0.004       481                   251.31                 12.20                      18.83
0.006       489                   254.27                 11.03                      21.27
0.008       479                   255.57                 10.55                      21.17
0.010       486                   253.62                  9.97                      23.32
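For readers who wish to reproduce the flavour of this test, the sketch below iterates the noisy Henon map of Eq. (2) under our own naming conventions; exact encounter counts will differ from Table 1 because they depend on the noise realization and the initial condition.

```python
import numpy as np

def noisy_henon(n, a=1.4, b=0.3, eps=0.0, seed=None, transient=1000):
    """Iterate X_{n+1} = 1 - a X_n^2 + Y_n, Y_{n+1} = b X_n + xi_n,
    with xi_n uniform on [-eps, +eps]; return the X series after a transient."""
    rng = np.random.default_rng(seed)
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n + transient):
        x, y = 1.0 - a * x * x + y, b * x + rng.uniform(-eps, eps)
        if i >= transient:
            out[i - transient] = x
    return out

# e.g. K = tr_statistic(noisy_henon(5000, eps=0.004)), using the helper sketched
# in Section 2.2; values of roughly 20 standard deviations are typical.
```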
The TR method is able to detect the presence of the unstable fixed point with much more than 99% confidence even in the presence of very large internal noise. The robustness of this method with respect to noise comes from the fact that it searches for a pattern that is very rare in non-dynamical data. The presence of noise may cause some of the actual encounters with UPO's to be missed by the algorithm, but data from systems with real UPO's will typically have many more encounters in them than what we would expect to see by chance. Thus only when the noise is so high that the majority of the real encounters are not detected should our algorithm fail.

2.3. Periodic orbit transform method
Ott et al. recognized that, although a chaotic attractor is highly nonlinear, the local region around an unstable periodic orbit can be modeled as a linear map [2]. They used this to formulate their theory for controlling chaos. So et al. [18,19] used this fact to devise a transformation applicable to time series data that concentrates the data from this local (linear) region onto the fixed point. These transformed data are then used to construct a histogram that will exhibit peaks at the locations of the fixed points. Data outside the linear region do not map to the fixed point, but rather are scattered randomly across the histogram.

Following the exposition of So et al. [18], we will initially describe this technique in a one-dimensional map. Start by considering a small region around a fixed point. The linear approximation to the map is x_{n+1} ≈ A x_n + B, where A is the slope of the map and B is the intercept. Thus the fixed point is x* ≈ A x* + B, which yields an expression for the fixed point location x* ≈ B/(1 - A). Substituting for B, we obtain x* ≈ (x_{n+1} - A x_n)/(1 - A).
This estimate transforms x_{n+1} onto the fixed point. So et al. thus supply the following formula for calculating the nth estimate for the fixed point, x̂_n:

x̂_n = [x_{n+1} - s_n(k) x_n]/[1 - s_n(k)],    (3)

where the slope is approximated as

s_n(k) = (x_{n+2} - x_{n+1})/(x_{n+1} - x_n) + k(x_{n+1} - x_n).    (4)
Ignore for the moment the second term in Eq. (4). Once we have transformed the data, it only remains to construct a histogram of the transformed data. This histogram is however subject to spurious singularities due to the differences in the denominators in Eq. (3) and in the first term of Eq. (4). Thus the second term in Eq. (4) was introduced by So et al. to avoid these divergences. We set k = κR, where R is a random number in the range [-1, 1] and κ is the magnitude of the randomization. The transformation is applied to a given data point a large number of times, typically between 100 and 500, each time with a different random number R. This is repeated for each data point in the set. When the histogram is plotted, points that are not near a fixed point are scattered by this randomization unless x_{n+1} ≈ x_n, where the second term in Eq. (4) vanishes. In this fashion spurious singularities are removed, but peaks in the histogram due to fixed points are retained.
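A compact sketch of the one-dimensional transform of Eqs. (3) and (4) is given below; the structure and names are our own, and κ still has to be tuned to the data as described above.

```python
import numpy as np

def pot_transform_1d(x, kappa, n_draws=200, seed=None):
    """Map each triple (x_n, x_{n+1}, x_{n+2}) to a fixed-point estimate via
    Eqs. (3)-(4), repeating n_draws times with k = kappa * R, R uniform on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    xn, xn1, xn2 = x[:-2], x[1:-1], x[2:]
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_draws):
        k = kappa * rng.uniform(-1.0, 1.0, size=xn.size)
        s = (xn2 - xn1) / (xn1 - xn) + k * (xn1 - xn)   # Eq. (4)
        est = (xn1 - s * xn) / (1.0 - s)                # Eq. (3)
        estimates.append(est[np.isfinite(est)])         # drop rare exact singularities
    return np.concatenate(estimates)

# A histogram of pot_transform_1d(data, kappa) shows peaks near period-1 UPO's.
```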
The constant κ must be chosen carefully. It is highly data dependent and must be chosen large enough to remove the unwanted singularities, while being kept small enough not to destroy the peaks marking a fixed point. (This is necessary because, although the second term in Eq. (4) does vanish at the fixed point, it erodes the size of the linear region around the fixed point and thus the quantity of data that is transformed onto the fixed point.)

The POT method can be generalized to evaluate higher-dimensional data sets [18,19]. In this case the transform is given by

ẑ_n = (1 - S_n)^{-1} (z_{n+1} - S_n z_n),    (5)

where z_n = (x_n, x_{n-1}, ..., x_{n-d+1})^t is a delay coordinate vector constructed with a suitable choice of dimension d and z_n^t denotes the transpose of z_n. S_n is defined as

S_n = [ a_n^(1)  a_n^(2)  ...  a_n^(d-1)  a_n^(d) ;
          1        0      ...     0         0     ;
          0        1      ...     0         0     ;
          ...                                     ;
          0        0      ...     1         0     ]  +  κ R ||z_{n+1} - z_n||,    (6)

that is, the first row of S_n holds the local slope estimates a_n^(i) and the remaining rows form a shifted identity matrix. For computational simplicity the norm was arbitrarily chosen to be the L1 norm, ||z|| = Σ_{i=1}^{d} |z_i|. The coefficients a_n = (a_n^(1), ..., a_n^(d))^t are obtained from the local linear fit

a_n = [ (z_n - z_{n-1})^t ; (z_{n-1} - z_{n-2})^t ; ... ; (z_{n-(d-1)} - z_{n-d})^t ]^{-1} (z_{n+1} - z_n).    (7)

Here κ is again the magnitude of the randomization, but R is a d × d random matrix with each matrix element chosen randomly as for the scalar R. It should be noted in passing that So et al. later revised the POT method slightly. In their longer paper, they chose the second term of Eq. (6) to be κ R (z_{n+1} - z_n). Although this choice is analytically more tractable, it increases the time needed for computation significantly.

As with the TR method, the question also arises as to the significance of the UPO's found by the periodic orbit transform (POT) method. Once again we resort to surrogates to address this issue. For this analysis 100 surrogates were used. As done previously, the surrogates were generated by randomizing the order in which the numbers making up the original file occur, thus maintaining the distribution of the original data, but destroying all other information contained in the file. In order to facilitate the comparison of the transformed surrogate data with the transform of the original data, we calculate a statistic T_i at each bin i of the histogram:

T_i = [(Transformed data)_i - (Average of the transformed surrogates)_i] / (Standard deviation of the transformed surrogates)_i.    (8)

(So et al. calculate a similar statistic, but choose to divide by the maximum value of the standard deviation over the entire histogram.) Thus a value of T_i above 3 indicates the location of a UPO with a statistical confidence greater than 99%.
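The bin-by-bin comparison of Eq. (8) can be sketched as follows; the helper is our own, and `transform` stands for any transform, for instance the one-dimensional POT sketched above with a fixed κ.

```python
import numpy as np

def pot_significance(data, surrogates, transform, bins=100):
    """Histogram the transformed data and each transformed surrogate on common
    bins, then form the per-bin statistic T_i of Eq. (8)."""
    h_data, edges = np.histogram(transform(data), bins=bins)
    h_surr = np.array([np.histogram(transform(s), bins=edges)[0] for s in surrogates])
    mean, std = h_surr.mean(axis=0), h_surr.std(axis=0, ddof=1)
    std[std == 0] = np.inf                      # bins the surrogates never populate
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, (h_data - mean) / std       # bins with T_i > 3 mark candidate UPO's
```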
[Fig. 3 legend: Original Data; Surrogates; Surrogate Average; Surrogate Average ± 3 std. devs.; Analytic value of fixed point. The horizontal axis is X; the upper pane shows the transform histograms and the lower pane the POT statistic.]

Fig. 3. Periodic Orbit Transform for the same Henon data presented on the first line of Table 1. The transform of the original data (green) is well above the band of the transformed surrogate data (purple). The lower plane shows the POT statistic (blue) calculated from the values in the upper plane. The result agrees well with the analytic value (red hash marks) calculated from the Henon equations.

Fig. 3 shows the POT method applied to the same Henon data presented in Table 1. The results agree perfectly with the analytic calculation of the UPO when a = 1.4, b = 0.3, and noise is zero, giving a UPO position of 0.631. The POT method is elegant in its application and has several advantages over the other methods discussed so far. Unlike the TR method, which uses only the stable and unstable directions, the POT method takes all the geometry around the fixed
point (eigenvectors and eigenvalues) into account. In addition, since the POT method performs the transform on each data point a large number of times (typically 100-500), there is an effective "amplification" of the peaks in the histogram. This amplification makes the POT method very useful on short data sets such as those typically found in biological systems. However, because of this, the POT method is also computationally intensive and thus is unsuitable for use in real time applications such as chaos control or anticontrol.
2.4. Choosing the proper surrogate

In the previous sections we discussed the need for testing against surrogate data files in order to establish the statistical significance of our results. There are many different methods for generating surrogate data files, and it is a nontrivial task to determine which surrogate is appropriate for these types of analyses. In any statistical test, the type of surrogate that should be used depends on what one is trying to prove. If we are trying to detect the presence of periodic orbits in a data set, then what we want to prove is that there actually are periodic orbits in the data. In order to prove such a strong statement, one would have to create a surrogate data file that does not contain any periodic orbits, but which is in every other way identical to the original data set. Unfortunately this is for all practical purposes impossible. Both of the algorithms described in Sections 2.2 and 2.3 are based on the fact that if a fixed point is present in the system, then at various points in the data file that fixed point will be visited. The most we can hope for from a surrogate is that it destroys any and all 'real' encounters with fixed points and also preserves any statistical properties of the data which could contribute to false positive results from the analysis. Here we will discuss the two types of surrogates used most often in this kind of analysis, shuffled surrogates and amplitude adjusted Fourier transformed surrogates. Each of these has its own strengths and weaknesses.

The simplest surrogate one could create would be to just generate a series of random numbers from some distribution. This would certainly satisfy the first requirement of destroying any real encounters with fixed points in the original data, but this surrogate would not preserve any of the other statistical properties of the data, and thus would not be very useful. A vast improvement that is still quite simple is the shuffled surrogate. Here the order in which the data appear in the file is randomized, but the distribution (shape, mean, width, etc.) of the data is unchanged. Maintaining the distribution of the original data file is very important, since it is clear from the descriptions of both the TR algorithm and the POT algorithm that the distribution can have a significant influence on the results [20]. A simple shuffling algorithm that accomplishes this is as follows: given a time series {x_1, x_2, ..., x_{L-1}, x_L}, where L is the length of the data file, for each x_i select a random integer j from 1 to L. The elements x_i and x_j are then switched. The result of this is that each element x_i from the original file will appear as element x_j in the
surrogate file, with each j a random number uniformly distributed from 1 to L. To ensure complete randomness, this procedure may be repeated a number of times.

Although preserving the distribution is a step in the right direction, shuffled surrogates are still not perfect. The distribution of the surrogate will match that of the original file, but in every other respect the surrogate is simply white noise. Therefore technically the only null hypothesis that can be rejected is that the original data are white noise. A statistically significant result from either of the two tests described previously will certainly suggest the presence of either stable or unstable periodic orbits, but it does not prove it. There are many other statistical properties of the original data that could, in principle, affect the results of our test. One property that has been known to have an impact on nonlinear analysis tests is the power spectrum. Although we have found no evidence that the power spectrum does influence the results of the TR and POT algorithms, it is possible that it could. To account for this possibility, we apply a type of surrogate originally developed for correlation dimension analysis [21]. These surrogates, which we will refer to as amplitude adjusted Fourier transformed (AAFT) surrogates, preserve both the power spectrum and the distribution of the original data set. The following algorithm will generate such surrogates (a code sketch of both surrogate types appears below):

1. Generate L points of Gaussian white noise.
2. Reorder the white noise sequence so that its ranks match those of the original data. This is done by sorting both the white noise and the original data (determining which element is the smallest, next smallest, etc.) using the original data as the sorting key.
3. Create a surrogate of the white noise sequence using the Fourier transform or the windowed Fourier transform method.
4. Re-rank the original data so that its ranks agree with those of the surrogate created in step 3. This assures that the distributions of the surrogate and original data are identical, but has very little effect on the power spectrum.

Using these surrogates we can reject the null hypothesis that the data are a linear stochastic process passed through a static nonlinear transformation [21], i.e., what we would commonly refer to as colored noise.

Given this more advanced surrogate, one might wonder, "Why bother with shuffled surrogates at all?" The answer is that the AAFT surrogates are much more computationally intensive. For post-experiment analysis, the time required to generate AAFT surrogates is usually not a problem, but for real-time analysis this computation time can become a serious issue. The TR method is ideal for real-time analysis since it can operate on very small data files and can do so in a fraction of a second on a moderately fast computer. This type of analysis is very important in the control of chaos, where the dynamical parameters of the system may be changing with time and the control must compensate for these changes. If surrogate analysis is to be used in this type of situation, shuffled surrogates are currently the only ones that can be generated quickly enough to be useful.
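The sketch below implements both surrogate generators described above: the shuffle via a standard random permutation, and the four AAFT steps, with an FFT phase randomization standing in for the 'Fourier transform method' of step 3. It is our own illustration rather than the authors' code.

```python
import numpy as np

def shuffled_surrogate(x, seed=None):
    """Random permutation: keeps the distribution, destroys all temporal structure."""
    return np.random.default_rng(seed).permutation(np.asarray(x, dtype=float))

def aaft_surrogate(x, seed=None):
    """Amplitude adjusted Fourier transform surrogate: approximately preserves
    both the distribution and the power spectrum of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks_x = np.argsort(np.argsort(x))
    # Steps 1-2: Gaussian white noise reordered to the ranks of the data.
    g = np.sort(rng.standard_normal(n))[ranks_x]
    # Step 3: phase-randomized (Fourier) surrogate of the reordered noise.
    spec = np.fft.rfft(g)
    phases = np.exp(2j * np.pi * rng.uniform(size=spec.size))
    phases[0] = 1.0                              # keep the mean
    if n % 2 == 0:
        phases[-1] = 1.0                         # keep the Nyquist component real
    g_pr = np.fft.irfft(spec * phases, n)
    # Step 4: re-rank the original data to the ranks of the Fourier surrogate.
    return np.sort(x)[np.argsort(np.argsort(g_pr))]
```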
In the above we state that a good surrogate should destroy any real UPO encounters in the original file, while preserving any statistical properties that could influence the results of the algorithm being used. This raises the question of whether the power spectrum of the data does in fact have such an influence, specifically on the TR algorithm. To test this property, random data files were generated with non-white power spectra using a two-dimensional Ornstein-Uhlenbeck process [22] as follows:

dx/dt = (1/τ)(y - x),
dy/dt = -(1/τ) x + (1/τ) √(2D) ξ(t),    (10)

where ξ(t) is Gaussian white noise of unit variance, D is the noise intensity, and τ is the correlation time of the resulting process. The data are then embedded by measuring the time between positive zero crossings of the variable y, thus making it suitable for analysis with the TR method. Various correlation times were used, and both shuffled surrogates and AAFT surrogates were tested to determine if the power spectrum had any effect on the results (see Table 2a). Both surrogates produce statistically indistinguishable results and are not "fooled" by the colored noise into thinking that UPO's are present.

To determine if the surrogates are also capable of properly detecting UPO's when they are present, we use the same data files, but this time with encounters added to the file. This is done by taking the following pattern of numbers, [50, 108, 73, 87, 72, 121], which meets the criterion stipulated above for an event and inserting it into the data file many times. This sequence is inserted between every encounter found in the data file, overwriting whatever data were there originally, in order to preserve the file length. The event described is scaled in order to match the distribution of the file; that is, each number in the sequence is multiplied by the same constant, which is chosen to make the sequence match the original data distribution as closely as possible. The sequence still remains an encounter because rescaling of this type does not affect order statistics. These new data sets were then analyzed with both the shuffled and AAFT surrogates, and the results appear in Table 2b. In this case both surrogates are able to detect the UPO's in our new files. This test may seem somewhat artificial, but it should be noted that many real biological systems are very difficult to distinguish from the Ornstein-Uhlenbeck process described here, indicating that the TR method can be very useful for identifying dynamics in these types of systems.
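An Euler-Maruyama sketch of the process in Eq. (10), together with the zero-crossing embedding, is given below; the integration step and run length are our own illustrative choices, and the resulting interval series can be fed directly to a TR test such as the one sketched in Section 2.2.

```python
import numpy as np

def ou_crossing_intervals(tau=50.0, D=1.0, dt=0.1, n_steps=500_000, seed=None):
    """Integrate Eq. (10) with the Euler-Maruyama method and return the intervals
    between successive positive-going zero crossings of y."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    crossings = []
    for i in range(n_steps):
        x_new = x + dt * (y - x) / tau
        y_new = y - dt * x / tau + np.sqrt(2.0 * D * dt) / tau * rng.standard_normal()
        if y <= 0.0 < y_new:                     # positive-going zero crossing
            crossings.append(i * dt)
        x, y = x_new, y_new
    return np.diff(crossings)
```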
Table 2
Two-dimensional Ornstein-Uhlenbeck process analyzed with the TR method, using 100 surrogates (a) and with inserted encounters (b)

τ (ms)                      25      50      75      100
(a) TR method
  SS surrogates            0.39    0.85    0.57    1.05
  AAFT surrogates          0.22    0.56    0.69    0.82
(b) Inserted encounters
  SS surrogates            7.71    5.39    6.59    7.42
  AAFT surrogates          6.35    7.56    6.52    9.26
Some biological systems that are also difficult to distinguish from noise differ in that they possess a very strong peak in their power spectrum. Since the power spectrum in these types of systems is qualitatively very different from the OU process described above, we will perform the same tests using another random process known as harmonic noise [23,24]. These data are produced by finding the positive zero crossings of the phase space trajectory of a damped linear harmonic oscillator driven by Gaussian white noise:

d^2x/dt^2 + Γ dx/dt + ω_0^2 x = √(2DΓ) ξ(t),    (11)

where ξ(t) is Gaussian white noise of unit variance, D is the noise intensity, Γ is the damping, and ω_0 is the natural frequency. Four different values of ω_0 were used, and the data were analyzed just as in the previous example. The results are shown in Table 3. Once again the two surrogates both produce essentially the same results, suggesting that the power spectrum of a data set does not have a significant influence on the results of the TR algorithm. Of course it is impossible to test every possible type of colored noise, but these results nevertheless demonstrate that both the AAFT and shuffled surrogates seem to work equally well.
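A corresponding sketch for harmonic noise, Eq. (11), follows; the values of Γ and D are illustrative assumptions, since only the four natural frequencies ω_0 are quoted above. Its output can be analyzed exactly as in the previous example, giving results of the kind collected in Table 3 below.

```python
import numpy as np

def harmonic_noise_intervals(omega0=126.0, gamma=50.0, D=1.0, dt=1e-4,
                             n_steps=500_000, seed=None):
    """Integrate Eq. (11), d^2x/dt^2 + gamma dx/dt + omega0^2 x = sqrt(2 D gamma) xi(t),
    and return intervals between positive-going zero crossings of x."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 0.0
    crossings = []
    for i in range(n_steps):
        v_new = v + dt * (-gamma * v - omega0**2 * x) \
                  + np.sqrt(2.0 * D * gamma * dt) * rng.standard_normal()
        x_new = x + dt * v
        if x <= 0.0 < x_new:
            crossings.append(i * dt)
        x, v = x_new, v_new
    return np.diff(crossings)
```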
Table 3
Harmonic noise analyzed with the TR method using 100 surrogates (a) and with inserted encounters (b)

ω_0 (rad/s)                 251     126     84      63
(a) TR method
  Shuffled surrogates      0.10   -0.75    0.68   -0.62
  AAFT surrogates          0.46   -0.50    1.21   -0.55
(b) Inserted encounters
  SS surrogates            6.23    3.51    6.15    4.50
  AAFT surrogates          5.87    4.34    7.16    5.20

3. Applications to biology

Until now we have discussed general methods for analyzing noisy time series data and some of the issues that arise when using them. We will now demonstrate these algorithms on some experimental biological systems.

3.1. The crayfish caudal photoreceptor, and the topological recurrence method

Pei and Moss [25,26] studied the hydrodynamically sensitive hair mechanoreceptors, the attached sensory neurons, and the terminal ganglion of the crayfish Procambarus clarkii. Extracellular recordings were taken from the output neuron of one of the two caudal photoreceptors. The photoreceptor cell was illuminated via optical fiber,
and the entire preparation was moved sinusoidally along the direction of maximum sensitivity. This yields three control parameters: the hydrodynamic stimulus amplitude and frequency, and the light intensity. The three control parameters were varied over ranges consistent with what the animal would experience in its natural environment, and the resulting interspike time intervals were analyzed using a simple variation of the Topological Recurrence method discussed in Section 2.2. In addition to looking for sequences of three points converging towards the line of periodicity followed by three points diverging, two requirements were placed on the sequence for it to qualify as an encounter:

1. Straight lines fitted to the converging and diverging sets of data should have negative slopes. That is, the stable slope should lie between 0 and -1, and the unstable slope should be between -1 and -∞.
2. The intersection of the two lines must lie within a perpendicular distance ε of the 45° line, where ε is less than half the average distance of the six points making up the encounter.

The first requirement is motivated by the hypothesis that the caudal photoreceptor is a biological example of a nonlinear oscillator. This cell is well known to exhibit pacemaker activity when all sensory input from the tailfan to the ganglion is removed, and the sensory neurons are very noisy. Thus we expect the specimen to behave as a nonlinear oscillator driven by a noisy periodic signal. If this is the case, then any UPO's present in the system should be of the flip-saddle type and will exhibit negative slopes on the return map. The second criterion is used to help eliminate false encounters. In principle the stable and unstable manifolds should intersect at the fixed point. Of course the presence of noise and the inherent nonlinearity of the system will cause the fitted lines to vary a little, but this criterion puts a limit on how much they can vary before we reject the encounter as being false.

The stimulus amplitude was held at a constant value of 0.78 μm. Four light intensities were used: 0.04, 0.08, 0.12, and 0.20 μW/mm^2. The stimulus frequency was varied from 3 to 24 Hz. The statistic K was calculated for each of these parameters using 100 shuffled surrogates, and the results are shown in Fig. 4. Here we see that, for low frequencies, UPO's are present in statistically insignificant numbers. As we increase the frequency, the statistic K grows considerably and then drops back down to insignificant levels for high frequencies. This is very consistent with the behavior we would expect to see from a periodically forced nonlinear oscillator: as the frequency increases, bifurcations from limit cycles to chaos and, at high enough frequencies, a return to limit cycles.

Fig. 4. The statistic K versus stimulus frequency with constant stimulus amplitude of 0.78 μm-peak, for four light intensities: 0.04 μW/mm^2 (diamonds), 0.08 μW/mm^2 (triangles), 0.12 μW/mm^2 (squares), and 0.20 μW/mm^2 (circles). The dotted line at K = 3.0 marks the 99% confidence level.
3.2. UPO's in neural tissue
In a paper on the control of chaos in neural tissue [10], Schiff et al. reported controlling chaos in the electrical population spiking of the in vitro rat brain hippocampus. As a preliminary to control, they sought UPO's of period-1 in the signal. The intervals between successive population spikes were plotted on a delay diagram, as shown in Fig. 5. Sequences of points approaching the diagonal from a
single direction and departing along another definite direction were identified by direct observation. The sequences were clearly seen, despite the fact that these data appear to be of dimensionality greater than 2 and are relatively noisy. Nevertheless, as shown in Fig. 5a, excellent unstable saddle points could be observed, and, as shown in Fig. 5b, these sequences were repeated over and over again. A later paper by So et al. [19] examined data from the same preparation and confirmed these earlier results using the POT method.
Fig. 5. (a) Return plots of interburst intervals. Seven sequential points are color-coded. Note that as the trajectory crosses the line of identity along the direction from 1 to 2, the next points take a peculiar sequence that starts close to the line of identity at point 3, and then alternates on either side of the line of identity and progressively diverges from it along a nearly straight line for points 3-7. The points colored in green (1-2) plus point 3 (red) define a stable direction or manifold, while the points in red (3-7) define an unstable manifold. The intersection of these manifolds with the line of identity defines the unstable fixed point. (b) Return plot showing multiple color-coded trajectories that did not follow each other in time. The starting point for each sequence, numbered 1 in each, began at spike numbers 87 (blue), 210 (red), and 317 (green), out of a total series of 320 spikes. The stable manifold is shown with arrows pointing towards the unstable fixed point, and the unstable manifold has arrows drawn in the direction away from the unstable fixed point. Note that each sequence starts at a point roughly the same distance from the unstable fixed point in a region close to the stable manifold. Each trajectory follows the same path, and after closely approaching the unstable fixed point for the points labeled 2, there evolves for each trajectory a sequence of exponentially diverging jumps (on alternating sides of the line of identity along the unstable manifold). Although the trajectories are clustered along the unstable manifold for four spikes, the fifth spikes are widely scattered (not shown). This is the manifestation of the sensitivity to initial conditions typical of chaotic systems. These plots show clear evidence for nonrandom structure in these trajectories, and their patterns bear the hallmarks of deterministic chaos.

[Both panels of Fig. 5 plot the interburst interval I_n versus the previous interval I_{n-1}, in seconds.]
[Fig. 6 in-plot annotations: fixed point FP = (0.96, 0.96), stable eigenvalue λ_s = -0.03, unstable eigenvalue λ_u = -2.31; the stable and unstable manifolds are marked, and the plot shows A(i+1) versus A(i).]
Fig. 6. (a) Return plot of the time series generated by the ratio of negative to positive area A(i) under the transmembrane current for a typical 1 min interval of in vivo ventricular fibrillation illustrating the local structure of the chaotic attractor. Coordinates for the calculated unstable fixed point, stable eigenvalue, and unstable eigenvalue for this visitation of the unstable fixed point are provided above the plot.
3.3. Determinism in canine ventricular fibrillation

Witkowski et al. [27] published a study examining a dog heart preparation undergoing ventricular fibrillation (VF). VF is a frenzied and irregular heart rhythm that renders the heart incapable of sustaining life and is always fatal within minutes unless externally terminated by passing a large electrical current through the heart. Witkowski et al. were examining the time series formed from the intervals between beats as well as the time series formed by successive measurements of the ratio of the negative to positive areas under the transmembrane current curve (A(i)) for each beat. They sought signs of deterministic chaos as a preliminary to trying to control the fibrillation. Fig. 6 shows a typical return plot of the A(i) from about one minute of their data. The saddle-like structure is clear.
3.4. Determinism in human atrial fibrillation

A paper [28] by Ditto et al. reports attempts at controlling atrial fibrillation in humans. As usual, they begin by searching for UPO's in their data. Figure 7 shows
their analysis of the data using the POT method on two sets of data from different subjects. In both cases statistically significant indications of UPO's were found.

Fig. 7. (a, upper pane) The POT transform of the data (green). The thick purple line indicates the average of 100 surrogates with error bars denoting the standard deviation at each point. (a, lower pane) So statistic (blue) calculated using the data from the upper pane. (b) The same analysis of the data from a second patient. [In all panes the horizontal axis is the interbeat interval, roughly 0.10-0.30 s.]

4. Discussion
The study of unstable periodic orbits has enabled researchers in the field of nonlinear dynamics to work with short and noisy sets of nonlinear data. This in turn allows them to study biological systems that noise and non-stationarity had previously made intractable. Algorithms based on the topological recurrence and periodic orbit transform methods have been used to identify deterministic dynamics in many biological systems. One is the crayfish neural system, whose data had previously been considered indistinguishable from exponentially correlated noise. Another is canine ventricular fibrillation, which is a high-dimensional system that is difficult to analyze by direct observation. And a final example is human atrial fibrillation, which is by its very nature highly non-stationary.

These methods not only allow for the detection of determinism in extremely noisy systems, but they also provide a great deal of information about the type of dynamics present. The methods described here have been shown to be very robust to the presence of dynamical noise and can operate on extremely short data files. They have also been demonstrated to be capable of distinguishing between dynamical data and colored noise, something which is difficult for more traditional nonlinear techniques, even when very long data files are available. In addition, the topological recurrence method in particular is extremely fast to implement, allowing for the online statistical analysis of data as they are generated. This is very important when responding to the system during real-time control.

These methods are not only diagnostic. In addition to providing new methods for extracting dynamical information from biological systems, they can also be useful for developing new treatments for illnesses. Analysis of periodic orbits could lead to new insights as to what dynamics are involved in pathologies like cardiac arrhythmia, and it plays a key role in the control of chaos, which could ultimately be useful as a treatment for chronic illnesses like atrial fibrillation.

Acknowledgements
The authors gratefully acknowledge the support from the Office of Naval Research, Physical Sciences Division. MLS also acknowledges assistance from the NSWC ILIR Program.

References
1. Li, T.Y. and Yorke, J.A. (1975) Amer. Math. Monthly 82, 985.
2. Ott, E., Grebogi, C. and Yorke, J.A. (1990) Phys. Rev. Lett. 64, 1196.
3. Ditto, W.L., Rauseo, S.N. and Spano, M.L. (1990) Phys. Rev. Lett. 65, 3211.
4. Kaplan, D.T. (1994) Physica D 73, 38.
5. Pierson, D. and Moss, F. (1995) Phys. Rev. Lett. 75, 2124.
6. Hunt, E.R. (1991) Phys. Rev. Lett. 67, 1953.
7. Roy, R., Murphy, T.W., Jr., Maier, T.D., Gills, Z. and Hunt, E.R. (1992) Phys. Rev. Lett. 68, 1259.
8. Petrov, V., Gaspar, V., Masere, J. and Showalter, K. (1993) Nature 361, 240.
9. Garfinkel, A., Spano, M.L., Ditto, W.L. and Weiss, J. (1992) Science 257, 1230.
10. Schiff, S.J., Jerger, K., Duong, D.H., Chang, T., Spano, M.L. and Ditto, W.L. (1994) Nature 370, 615.
11. In, V., Spano, M.L., Neff, J.D., Ditto, W.L., Daw, S., Edwards, K.D. and Nguyen, K. (1997) Chaos 7, 605.
12. Takens, F. (1981) In: Dynamical Systems and Turbulence, eds D. Rand and L.S. Young, p. 230, Springer, Berlin.
13. Eckmann, J.-P. and Ruelle, D. (1985) Rev. Mod. Phys. 57, 617.
14. Sauer, T., Yorke, J.A. and Casdagli, M. (1991) J. Stat. Phys. 65, 579.
15. Shaw, R. (1984) The Dripping Faucet as a Model Chaotic System, Aerial Press, Santa Cruz.
16. Sauer, T. (1994) Phys. Rev. Lett. 72, 3811.
17. Braun, H.A., Dewald, M., Schäfer, K., Voigt, K., Pei, X., Dolan, K. and Moss, F. (1999) J. Comp. Neurosci. 7, 17.
18. So, P., Ott, E., Schiff, S.J., Kaplan, D.T., Sauer, T. and Grebogi, C. (1996) Phys. Rev. Lett. 76, 4705.
19. So, P., Ott, E., Sauer, T., Gluckman, B.J., Grebogi, C. and Schiff, S.J. (1997) Phys. Rev. E 55, 5398.
20. Dolan, K., Witt, A., Spano, M.L., Neiman, A. and Moss, F. (1999) Phys. Rev. E 59, 5235.
21. Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. and Farmer, J.D. (1992) Physica D 58, 77.
22. Uhlenbeck, G.E. and Ornstein, L.S. (1930) Phys. Rev. 36, 823.
23. Schimansky-Geier, L. and Zülicke, C. (1990) Z. Phys. B 79, 451.
24. Neiman, A. and Schimansky-Geier, L. (1994) Phys. Rev. Lett. 72, 2988.
25. Pei, X. and Moss, F. (1996) Int. J. Neural Syst. 7, 429.
26. Pei, X. and Moss, F. (1996) Nature 379, 618.
27. Witkowski, F.X., Kavanagh, K.M., Penkoske, P.A., Plonsey, R., Spano, M.L., Ditto, W.L. and Kaplan, D.T. (1995) Phys. Rev. Lett. 75, 1230.
28. Ditto, W.L., Spano, M.L., In, V., Neff, J., Meadows, B., Langberg, J.J., Bolmann, A. and McTeague, K. (2000) Int. J. Bif. and Chaos 10, 593.
CHAPTER 5

The Topology and Organization of Unstable Periodic Orbits in Hodgkin-Huxley Models of Receptors with Subthreshold Oscillations

R. GILMORE
Department of Physics, Drexel University, Philadelphia, PA 19104, USA

X. PEI
Center for Neurodynamics, University of Missouri, St. Louis, MO 63121, USA

© 2001 Elsevier Science B.V. All rights reserved

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
Contents

1.  Introduction ................................................... 157
2.  Hodgkin-Huxley equations ....................................... 158
3.  Modified Hodgkin-Huxley equations .............................. 159
4.  Model behavior ................................................. 161
5.  Phase space .................................................... 163
6.  Classification of strange attractors by integers ............... 166
7.  Topological analysis program ................................... 174
    7.1. Topological organization of periodic orbits ............... 177
    7.2. Identify a branched manifold .............................. 177
    7.3. Validate the branched manifold ............................ 178
    7.4. Construct a flow model .................................... 178
    7.5. Validate the model ........................................ 179
8.  Topological analysis of modified Hodgkin-Huxley equations ...... 180
9.  Jelly rolls .................................................... 183
10. Flows without equations ........................................ 187
11. Chaos in higher dimensions ..................................... 193
12. Discussion and conclusions ..................................... 199
    Acknowledgements ............................................... 202
    References ..................................................... 202
1. Introduction
If the behavior of a system is bounded, deterministic and nonperiodic, then it is chaotic. Chaos is generated by the interplay between two mechanisms that operate on a flow in phase space: stretching and squeezing. The first mechanism is responsible for 'sensitivity to initial conditions', while the second is responsible for recurrently building up a self-similar (fractal) structure. The purpose of the present article is to describe a set of equations that has been proposed to model neurons with subthreshold oscillations [1], to show that this model can generate chaotic behavior, and to describe how the stretching and squeezing mechanisms operate on the phase space for this model.

These stretching and squeezing mechanisms in phase space are described in terms of topology. They act concurrently to build up a strange attractor. They simultaneously organize all the unstable periodic orbits in the strange attractor in a unique way. This unique organization is a tool that has been used to classify strange attractors [2]. It is also the probe which is used to identify the mechanism which generates chaotic behavior in specific dynamical systems. We will apply this probe to understand how the flow in the phase space appropriate for neurons with subthreshold oscillations is deformed under the deterministic equations of the model. We will show that the mechanism involved is one which has previously been identified [3]. It is responsible for chaos in the Duffing oscillator and in the YAG laser. It has been affectionately called the 'jelly roll' or the 'gâteau roulé' mechanism. The reason is simply that these delightful edibles are built up by exactly the same mechanism.

This contribution is organized as follows. In Section 2 we review briefly the basic Hodgkin-Huxley model of electrical activity in a neuron. The modified version of the Hodgkin-Huxley model is discussed in Section 3. This model describes nerve cells which fire without external inputs. In Section 4 we describe the behavior of the output of this model, while in Section 5 we discuss how this behavior is manifested both in the phase space of the model as well as in an appropriate, reduced phase space of smaller dimension. In Section 6 we describe the classification of low-dimensional strange attractors by topological means, and outline the Topological Analysis Program in Section 7. We apply this program to the analysis of the strange attractors generated by the modified Hodgkin-Huxley equations in Section 8. There we find that the strange attractor is of 'jelly roll' type. Strange attractors of this type have previously been analyzed. As a result, many of the properties exhibited by the modified Hodgkin-Huxley equations have already been well studied. Some of these results are summarized in Section 9.

Sections 10 and 11 are devoted to a description of how it is possible to go beyond the program of analyzing low-dimensional chaotic dynamical systems which is described
in Sections 6 and 7. In Section 10 we describe how to simulate dynamical systems without necessarily going through the intermediary of developing a system of equations to mimic the flow ('flows without equations'). In Section 11 we present preliminary results on how the stretching and squeezing mechanisms can occur in higher (than 3) dimensional dynamical systems, and under what conditions a discrete classification theory, as exists for three-dimensional strange attractors, can exist in higher dimensions. Our results are summarized in Section 12.
2. Hodgkin-Huxley equations

Hodgkin and Huxley treated the axon as an electrical device [4]. This device carries a current, I, along the axon of the neuron. The membrane is permeable to ionic currents, I_ions, produced by important anions and cations. In addition, the membrane has a capacitance, C_M, per unit length. A potential difference, V, exists across the neuron membrane. By visualizing the axon as a series of Gaussian pillboxes whose walls consist of the cylindrical membrane bounded by a pair of circular sections perpendicular to the axis, they were able to construct a partial differential equation describing propagation of electrical pulses along the axis of the neuron. The current conservation equation for each pillbox has the form

I = C_M dV/dt + Σ I_ions.    (1)
The ionic species which contribute most substantially to the current flow through the membrane are Na+, K+, Ca++, and Cl-. Under rest conditions, the membrane is polarized so that the potential in the interior is about 65 mV less than the exterior potential (V ≈ -65 mV). This potential difference is maintained by ion pumps. At rest, the concentrations of Na+ and Cl- outside the neuron greatly exceed the concentrations inside: [Na+]_outside/[Na+]_inside ≈ 9 and [Cl-]_outside/[Cl-]_inside ≈ 6, while the reverse is true for the potassium ion: [K+]_inside/[K+]_outside ≈ 20. These ratios depend strongly on the potential V and weakly on the ambient temperature, T.

As a depolarization wave moves down the axon, sodium channels open and Na+ flows into the axon. The membrane becomes depolarized. The Na+ influx stops after a short time (≈ 1 ms), and the potassium channels open. A fast efflux of K+ then takes place, acting to restore the polarization across the membrane. Neural dynamics was modeled with the current conservation equation

C_M dV/dt = I - (I_Na+ + I_K+ + I_Cl-).    (2)
The ionic currents are

I_ion = g_ion (V - μ_ion).    (3)
Here μ_ion is a chemical potential for each ion. This is established by ion pumps which transport ions through the membrane. The ionic conductances, g_ion, depend sensitively on V. These conductances are modeled by the following phenomenological equations:

g_Na+ = m³ h ḡ_Na+,
g_K+ = n⁴ ḡ_K+,    (4)
g_Cl- = ḡ_Cl-.
The factors m, h, n which appear in the equations for the ionic conductances were assumed to obey simple relaxation equations of the form dc/dt = -(c - c_∞)/τ:

dm/dt = -(m - m_∞)/τ₁,
dh/dt = -(h - h_∞)/τ₂,    (5)
dn/dt = -(n - n_∞)/τ₃.
These three functions were interpreted as probabilities. Their respective steady-state values, m_∞, h_∞, and n_∞, depend strongly on V but weakly on T. In the absence of current flows (I = 0), the Hodgkin-Huxley equations for an isolated segment of an axon reduce to a four-dimensional dynamical system in the four variables (V, m, h, n). This system consists of the current conservation equation (2) and the three relaxation equations (5), together with the constitutive equations (3) and (4). This system has a stable fixed point whose equilibrium values are (m, h, n) = (m_∞, h_∞, n_∞), with V determined from the transcendental equation

I_Na+ + I_K+ + I_Cl- = 0.    (6)
As a result, the Hodgkin-Huxley equations model passive ("Platonic") nerve cells. In the absence of external stimulations, including coupling to other nerve cells, this model does not generate time-dependent membrane potentials.
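As an illustration of how the rest potential follows from Eq. (6), the following minimal Python sketch finds the zero of the total ionic current by bisection. The steady-state activation curves and the conductance and reversal-potential values used below are illustrative placeholders, not the fitted Hodgkin-Huxley functions, so only the procedure, and not the numbers, should be taken from it.

```python
import numpy as np

# Illustrative placeholder steady-state activations (simple sigmoids in V, mV);
# the actual Hodgkin-Huxley fits are more elaborate.
def m_inf(V): return 1.0 / (1.0 + np.exp(-(V + 40.0) / 7.0))
def h_inf(V): return 1.0 / (1.0 + np.exp((V + 62.0) / 7.0))
def n_inf(V): return 1.0 / (1.0 + np.exp(-(V + 53.0) / 15.0))

# Placeholder maximal conductances and reversal potentials (assumed values).
g_Na, g_K, g_Cl = 120.0, 36.0, 0.3
V_Na, V_K, V_Cl = 50.0, -77.0, -54.4

def total_ion_current(V):
    """Left-hand side of Eq. (6): I_Na+ + I_K+ + I_Cl- evaluated at steady state."""
    I_Na = g_Na * m_inf(V) ** 3 * h_inf(V) * (V - V_Na)
    I_K = g_K * n_inf(V) ** 4 * (V - V_K)
    I_Cl = g_Cl * (V - V_Cl)
    return I_Na + I_K + I_Cl

# Bisection on the transcendental equation (6); with these placeholder values
# the zero lies near -65 mV, as described in the text.
lo, hi = -90.0, 0.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if total_ion_current(lo) * total_ion_current(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
print("resting potential ~ %.2f mV" % (0.5 * (lo + hi)))
```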
3. Modified Hodgkin-Huxley equations

A number of experimental findings have challenged the idea that neurons fire only in response to external stimuli. Experiments have been carried out on neurons isolated from the inputs of other neurons [5], thermosensitive neurons isolated from thermal inputs [6], electroreceptors isolated from electrical stimuli [7], and mechanoreceptors isolated from mechanical stimuli [8]. There is a growing body of evidence that neural oscillations are intrinsic [9]. In order to account for the possibility that neural spike train activity could be intrinsic, Braun et al. [1] refined the basic Hodgkin-Huxley equations by modeling in greater detail the nature of the ion current flows. In this elaboration of the basic Hodgkin-Huxley model, the ion current flows are dominated by two physical processes:
1. Specific ion gates open at appropriate membrane potentials. This gate opening allows ions to flow into or out of the axon in a way which attempts to equilibrate the internal and external ion concentrations. This is a fast process. Gate time scales are short.
2. Ion pumps operate to transfer specific ions through the membrane and attempt to maintain an ion concentration gradient across the membrane. This is a slow process. Pump time scales are long.
To simplify the model of nerve cell dynamics, Braun et al. begin with the basic Hodgkin-Huxley equation (2), neglect the Cl⁻ ion current, assume that the current flow along the axon is in the nature of a 'leakage current', and include a noise term:

C_M dV/dt = I_l + noise - (I_Na+ + I_K+).    (7)
Then both current flows are written explicitly in terms of gate (fast) and pump (slow) current terms

I_Na+ = I_Na-g + I_Na-p = I_d + I_sd,    (8)
I_K+ = I_K-g + I_K-p = I_r + I_sr.
The current flows I_i (i = d, r, sd, sr) are

I_i = ρ ḡ_i a_i (V - V_i).    (9)

Here ḡ_i is the maximum ion conductance, V_i the reverse potential, and a_i is an activation variable. The activation variables obey relaxation equations

da_i/dt = -φ (a_i - a_i,∞)/τ_i,    i = d, r, sd,    (10)

da_sr/dt = -φ (η a_sr + k I_sd)/t_sr = -φ (a_sr + η' I_sd)/τ_sr = -φ (a_sr - a_sr,∞)/τ_sr.    (11)

In the last expression τ_sr = t_sr/η, η' = k/η, and a_sr,∞ = -η' I_sd. The weakly temperature-dependent scale factors ρ(T), φ(T) are

ρ = 1.3^((T - T₀)/10°C),    φ = 3.0^((T - T₀)/10°C),    T₀ = 25°C.    (12)
The asymptotic relaxation values are assumed to have a Fermi function form

a_i,∞ = 1/[1 + exp(-(V - V_0,i)/Δ_i)],    i = d, r, sd.    (13)
Here V_0,i is the half-activation value of V (a_i,∞ = ½ when V = V_0,i) and Δ_i is a 'wall thickness', determining the range over which activation takes place. Specifically, the activation a_i changes through the central 46.22% of its range as V changes from V_0,i - Δ_i to V_0,i + Δ_i. The Hodgkin-Huxley model, as modified by Braun et al., is a five-dimensional dynamical system. The five ordinary differential equations for this model consist of
the current conservation equation (7), the three relaxation equations (10) for the activation coefficients a_i (i = d, r, sd), and Eq. (11) for the activation coefficient a_sr. It is this last equation which is responsible for subthreshold oscillatory activity in this modification of the Hodgkin-Huxley model.

4. Model behavior
The modified Hodgkin-Huxley equations (7), (10), and (11) describe neurons with subthreshold oscillations. This five-dimensional dynamical system has been studied both with noise [1] and without noise [1,3]. We have studied this system by integrating the equations of motion using standard Runge-Kutta methods with a fixed time step Δt = 0.05 ms for the parameter values shown in Table 1. The system has been studied as a function of the ambient temperature, T, in the range 10°C ≤ T ≤ 25°C.
Table 1. Parameter values used in the integration of the modified Hodgkin-Huxley equations.

i    Ion    Speed    ḡ_i            τ_i          Δ_i       V_i        V_0,i
                     Conductance    Decay time   Width     Reverse    Activation
                     (μA/cm²)       (ms)         (mV)      (mV)       (mV)
d    Na⁺    Fast     1.5            –            4         +50        -25
r    K⁺     Fast     2.0            2            4         -90        -25
sd   Na⁺    Slow     0.25           10           11        +50        -40
sr   K⁺     Slow     0.40           20           –         -90        –
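A minimal sketch of the fixed-step Runge-Kutta integration described above (Δt = 0.05 ms) is given below. The conductances, reversal potentials, half-activation voltages, widths, and the time constants τ_r, τ_sd, τ_sr follow Table 1; C_M, τ_d, the leak term, the constant η' of Eq. (11), the initial condition, and the omitted noise term are illustrative assumptions that are not legible in the source, so the sketch shows the structure of the integration rather than reproducing Fig. 1 quantitatively.

```python
import numpy as np

# Temperature scalings of Eq. (12).
T, T0 = 12.0, 25.0
rho = 1.3 ** ((T - T0) / 10.0)
phi = 3.0 ** ((T - T0) / 10.0)

CM = 1.0                                              # membrane capacitance, assumed
g = dict(d=1.5, r=2.0, sd=0.25, sr=0.40)              # Table 1
Vrev = dict(d=50.0, r=-90.0, sd=50.0, sr=-90.0)       # Table 1
V0 = dict(d=-25.0, r=-25.0, sd=-40.0)                 # Table 1
width = dict(d=4.0, r=4.0, sd=11.0)                   # Table 1
tau = dict(d=0.05, r=2.0, sd=10.0, sr=20.0)           # tau_d is an assumed value
g_l, V_l = 0.1, -60.0                                 # leak term, assumed
eta_prime = 0.07                                      # eta' of Eq. (11), assumed

def a_inf(V, i):
    """Fermi-function steady state of Eq. (13)."""
    return 1.0 / (1.0 + np.exp(-(V - V0[i]) / width[i]))

def rhs(y):
    """Right-hand side of Eqs. (7), (10), (11); noise omitted."""
    V, ad, ar, asd, asr = y
    I = {i: rho * g[i] * a * (V - Vrev[i])                       # Eq. (9)
         for i, a in zip(("d", "r", "sd", "sr"), (ad, ar, asd, asr))}
    dV = (g_l * (V_l - V) - sum(I.values())) / CM                # Eq. (7)
    dad = -phi * (ad - a_inf(V, "d")) / tau["d"]                 # Eq. (10)
    dar = -phi * (ar - a_inf(V, "r")) / tau["r"]
    dasd = -phi * (asd - a_inf(V, "sd")) / tau["sd"]
    dasr = -phi * (asr + eta_prime * I["sd"]) / tau["sr"]        # Eq. (11)
    return np.array([dV, dad, dar, dasd, dasr])

def rk4_step(y, dt=0.05):
    """One fixed-step fourth-order Runge-Kutta step."""
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2)
    k4 = rhs(y + dt * k3)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

y = np.array([-60.0, 0.0, 0.0, 0.0, 0.0])   # initial condition, assumed
steps = 100_000                              # 100000 * 0.05 ms = 5 s of model time
traj = np.empty((steps, 5))
for n in range(steps):
    y = rk4_step(y)
    traj[n] = y
```

Whether this sketch bursts chaotically or settles onto a fixed point depends on the assumed constants; the slow pair (a_sd, a_sr) produced by Eqs. (10) and (11) is what carries the subthreshold oscillation in any case.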
"6"1.5
t
" .-".~:.... '-'.
-
-~
3r
4f 4f 4f 4f 4f 3r
4f
"""
1.0 - ] ~ ~:- ~::. -" .
~
"Z.
9
:: ;. -'~"~..:.5~;"'-:-9
ooo
ooo
- : ~.:r
ooo
.ooo
Time (msec)
~ ~ 7~:(~::ii;:")~'-: "
,ooo
.ooo
"X_
10
15
Temperature (~
20
25
Fig. 1. Interspike time intervals plotted as a function of ambient temperature, T, for the modified Hodgkin-Huxley equations. This bifurcation diagram shows alternation between periodic and chaotic behavior. Inset: typical model output of the slow sodium activation a_sd = y₄ in a chaotic regime (T ≈ 17°C).
In certain temperature ranges there is one spike per burst, and the model output is periodic. In other temperature ranges the model output is also periodic, with two spikes per burst (22°C ≲ T ≲ 25°C), three spikes per burst (17°C ≲ T ≲ 22°C), and so on. The burst types occur in the sequence

⋯ 2r  3f  3r  4f  4r  5f  5r  ⋯    (14)
Periodic behavior consists of bursts of one type, specifically nf. Chaotic behavior consists of unpredictable mixtures of two (or more) types of bursts that are adjacent in this sequence. In fact, with a little imagination, we can designate the bursts of type nr (n full spikes followed by a feeble failure to spike) by the fraction n + ½, since these bursts are intermediate between bursts with n spikes and those with n + 1 spikes.
5. Phase space
The state of a neuron modeled by the modified Hodgkin-Huxley equations is determined by the coordinates of a point (V, a_d, a_r, a_sd, a_sr) = (y₁, y₂, y₃, y₄, y₅) ∈ R⁵. It is very difficult to visualize phase space trajectories in R⁵ which are generated by these equations. Projections of phase space trajectories onto the 10 two-dimensional subspaces (y_i, y_j) (1 ≤ i < j ≤ 5) are easier to interpret. Three of these projections, in a chaotic regime, are shown in Fig. 2; the projection onto the (y₄, y₅) plane is shown in Fig. 3.
Fig. 2. Projection of the phase space trajectory in a chaotic regime (T = 12°C) into three two-dimensional subspaces: (a) (y₁, y₂) = (V, a_d); (b) (y₁, y₃) = (V, a_r); (c) (y₂, y₃) = (a_d, a_r). Arrows indicate the direction of flow. Since the projection is a closed curve in each case, the five-dimensional system is effectively three-dimensional.
Fig. 3. Strange attractor at T = 12°C projected onto the (y₄, y₅) = (a_sd, a_sr) plane.
The surest way to demonstrate the presence of chaotic motion in a dynamical system is to locate unstable periodic orbits. This can be done by constructing a Poincaré section and looking for fixed points in the return map. The attractor shown in Fig. 3 has a 'hole in the middle'. This means that we can construct a global Poincaré section by attaching a half plane to an axis which passes through this hole. A very convenient Poincaré section is obtained by the condition dy₄/dt = 0, y₄ < some threshold. The threshold is taken below the minimum value of y₄ which occurs between the spikes in one burst, and above the maximum value which the minimum of y₄ assumes during the repolarization stage. This Poincaré section is valid throughout the entire temperature range shown in Fig. 1. In particular, it shows that the periodic orbits with n spikes per burst (n = 1, 2, 3, 4, 5) are all period one orbits. The value of y₄ is recorded each time the phase space trajectory intersects this Poincaré section. This information is used to construct a file of intersections y₄(i), where y₄(i) is the value of y₄ at the ith intersection with the Poincaré section. The first return map, y₄(i + 1) vs. y₄(i), is shown in Fig. 4 for three values of the temperature. This figure provides two important pieces of information. First, the return map is very 'thin' at all temperatures. This means that this dynamical system is very dissipative at all the temperatures studied. The dimension of the strange attractor shown in Fig. 3 is 2 + ε, where ε is close to zero. This is compatible with the reduction in dimension 5 → 3 discussed above. Second, as the temperature changes, the return map changes systematically. This can be seen because we have labeled the branches of the return map by burst type. As the temperature increases, the return map 'moves to the left'. At T = 12°C the return map shows six intersections with the diagonal. These are the branches which describe bursts with the morphology 6f, 5r, 5f, 4r, 4f, 3r. The two least unstable branches are 5f and 4f. The branches nr are extremely unstable, as can be seen from their almost vertical slopes. This means that such bursts are very rarely seen. Peaks with morphology of this type are destroyed by small amounts of noise.
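A minimal sketch of this construction is given below, assuming a uniformly sampled record of y₄ from the integration: local minima of y₄ that lie below the threshold realize the section condition dy₄/dt = 0, y₄ < threshold, and period one orbits are then estimated from where the first return map crosses the diagonal. The function names and the interpolation step are ours, not the authors'.

```python
import numpy as np

def section_values(y4, threshold):
    """Values of y4 at the Poincare section dy4/dt = 0, y4 < threshold,
    i.e. at local minima of the sampled series that lie below the threshold."""
    idx = np.where((y4[1:-1] < y4[:-2]) & (y4[1:-1] <= y4[2:]) &
                   (y4[1:-1] < threshold))[0] + 1
    return y4[idx]

def diagonal_crossings(s):
    """Estimate where the first return map crosses the diagonal (candidate
    period one orbits): sort the points (y4(i), y4(i+1)) by abscissa, look for
    a sign change of y4(i+1) - y4(i), and refine by linear interpolation."""
    x, y = s[:-1], s[1:]
    order = np.argsort(x)
    x, g = x[order], (y - x)[order]
    roots = []
    for i in range(len(g) - 1):
        if g[i] == 0.0:
            roots.append(x[i])
        elif g[i] * g[i + 1] < 0.0:
            roots.append(x[i] - g[i] * (x[i + 1] - x[i]) / (g[i + 1] - g[i]))
    return roots

# usage, with y4 the slow sodium activation recorded from the integration:
# s = section_values(y4, threshold=0.1)   # threshold chosen as described above
# print(diagonal_crossings(s))
```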
Fig. 4. First return map on the Poincaré section. Intersections with the diagonal indicate unstable period one orbits. The branches which intersect the diagonal depend on the temperature. (a) T = 12°C: 6f, 5r, 5f, 4r, 4f, 3r; (b) T = 13.5°C: 5f, 4r, 4f; (c) T = 16.5°C: 4f, (3r). As T increases, the first return map slides to the left and changes shape slightly.
At this temperature, the model output consists of spike train bursts almost entirely of types 5f, 4r, and 4f. As the temperature increases to 13.5°C, two things happen. First, the branches 6f, 5r, which are very unstable at T = 12°C, no longer intersect the diagonal at 13.5°C. The disappearance of these intersections has no observational effect on the spike train output. Second, the branch 3r, which nearly intersects the diagonal at T = 12°C, drops further below the diagonal at T = 13.5°C and in fact no longer appears in the return map. At this temperature, the return map exhibits only the three branches 5f, 4r, 4f. As the temperature increases further, to T = 16.5°C, the shape of the first return map, and its intersection with the diagonal, continue to change. The unstable branches 5f, 4r fail to intersect the diagonal, and drop from the return map. Then the intersection angle between the branch 4f and the diagonal approaches -45°, and as it does, an inverse period doubling cascade occurs along this branch (14°C < T < 16°C). At T = 16.0°C the branch 4f is again slightly unstable, but the branches (3r, 3f) almost intersect the diagonal. That this is the case can be seen by approaching T = 16.5°C from above. As the temperature descends towards T = 17.0°C, the branch 3f is stable, but this period one orbit takes increasingly long times to complete its orbit as T → 17.0°C. This divergence of the period is a clear signature of an impending tangency between a peak of the return map and the first return diagonal. The reason is that the flow slows down in the neighborhood of the branch 3r. This tangency occurs at T = 17.0°C. Slightly below this temperature there is an intermittency between the 'virtual' period one branch with three spikes per burst (3r, 3f) and the slightly unstable branch 4f. As T decreases from 17.0°C, this intermittency decreases in importance as the peak in the branch (3r, 3f) moves further away from the diagonal. The intermittency ends (at T ≈ 16.3°C) when the branch 4f becomes stable at the intersection with the diagonal. The return map has the structure of a series of inverted parabolas (logistic maps) lined up side by side. Each carries two branches (nr, nf). They are contiguous and lined up from left to right in order of descending n. As T decreases, the diagonal moves towards the left, so that period one orbits with increasing numbers of spikes per burst are encountered. The bifurcation diagram (Fig. 1) clearly shows periodic (period one) behavior, period doubling cascades, episodes of intermittency, and chaos. We have used these return maps to locate unstable periodic orbits in this dynamical system. In particular, we have used the return map at T = 12°C to locate three distinct period one orbits of types 5f, 4r, and 4f. These, and some other low period orbits, were subsequently used to determine the topological structure of the strange attractor generated by the modified Hodgkin-Huxley equations. These orbits are shown in Fig. 5, using the same projection as in Fig. 3. The following remarks about the projection from a five-dimensional phase space to a three-dimensional reduced phase space are in order.
1. Reduction in dimension always simplifies computations. In the present case this reduces the number of two-dimensional projections from 10 to 3.
Fig. 5. Period one orbits 4f (a), 4r (b), 5f (c) and period two orbit (4f, 5f) (d) for the strange attractor shown in Fig. 3.
2. It is the significant separation in time scales that allows the reduction in dimension. Since the relaxations of a_d and a_r to their (moving) asymptotic values are much faster than the other two relaxation rates (of a_sd and a_sr), it is possible to 'adiabatically eliminate' these two variables. This leaves a reduced dynamical system depending on only the three variables (y₁, y₄, y₅).
3. Adiabatic elimination of variables is an effective but at best ad hoc procedure for reducing the dimensionality of a dynamical system. A more general and powerful result, involving projection to an inertial manifold, has yet to be developed for dynamical systems.
4. The classification theory for strange attractors is now in mature form for low (3) dimensional strange attractors [2]. It does not yet exist for higher-dimensional (d > 3) strange attractors. This means that the reduction in dimension 5 → 3 described above is essential for the topological analysis which will be carried out in the following sections.
6. Classification of strange attractors by integers

A powerful theory has recently been developed to classify low-dimensional strange attractors [2]. Strange attractors of three-dimensional chaotic dynamical systems can be classified by their topological properties. These properties are summarized, in turn, by sets of integers. We will describe how to extract these integers from chaotic data. This classification theory extends to N-dimensional dynamical systems provided their strange attractors have Lyapunov exponents which obey

λ₁ > λ₂ = 0 > λ₃ > ⋯    and    -λ₃ > λ₁.    (15)
Such strongly contracting dynamical systems have a Lyapunov dimension, d_L, which is less than three by the Kaplan-Yorke conjecture [10,11]:

d_L = 2 + λ₁/|λ₃| < 3.    (16)
Another way of looking at this condition is that the dynamical system has an inertial manifold of dimension three [2]. With the development of the topological classification theory, there are now three approaches to the analysis of chaotic dynamical systems.

(1) Metric methods are based on determining the local geometric (fractal) structure of a strange attractor. Calculations of fractal dimensions and scaling functions require very long, clean data sets. The amount of data required grows very quickly with the underlying dimension. The result of a dimension calculation is a real number which provides information about the minimum dimension of the dynamical system. Some dimension calculations resemble a black art. Calculations degrade rapidly with increasing noise. The dimension estimate comes with no reasonable error estimates, no independent way to verify the estimate, and no underlying statistical theory. Metric calculations provide no information on the mechanism which builds up fractal structures through repeated self-similar actions. Nor do they provide any information about what happens when control parameters are varied.

(2) Dynamical methods are based on determining how rapidly phase space is deformed on average. This information is contained in the spectrum of Lyapunov exponents. These exponents can be computed with shorter, noisier data sets than are required for metric calculations. The largest Lyapunov exponent is relatively easy to compute; smaller exponents can be computed with increasing difficulty. The spectrum of Lyapunov exponents, λ_i, and the partial dimensions, d_i (0 ≤ d_i ≤ 1), associated with each exponent are ordered as

λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_N,    d₁ ≥ d₂ ≥ d₃ ≥ ⋯ ≥ d_N.    (17)
In general, the first p exponents are positive, the (p + 1)st is zero (it corresponds to the flow direction), and the remaining N - (p + 1) are negative. The partial dimension d_i is +1 for all the positive exponents and for the flow direction. In the contracting directions it measures the fractal nature of the attractor, so generally for i > p + 1, d_i < 1. The partial sum

S_k = Σ_{i=1}^{k} d_i λ_i    (18)

increases as k increases from k = 1 to k = p, S_p = S_{p+1}, and decreases for k > p + 1 until it reaches 0 at k = n, and remains at 0 for n ≤ k ≤ N. The Lyapunov dimension is

d_L = Σ_{i=1}^{N} d_i.    (19)
If the smallest (in magnitude) negative exponent λ_{p+2} is larger than the sum of all the positive exponents, then S_{p+2} = 0, d_{p+2} = Σ_{i=1}^{p} λ_i / |λ_{p+2}|, and

d_L = p + 1 + Σ_{i=1}^{p} λ_i / |λ_{p+2}|,    (20)
and the Kaplan-Yorke expression results. In the more general case that the system is not strongly contracting (in the sense that two or more negative Lyapunov exponents contribute to the dimension formula), the Kaplan-Yorke estimate provides an upper bound on the dimension of the strange attractor [11]. Calculations based on dynamical methods provide more information than calculations based on metric methods. However, the drawbacks are similar: they depend on globally averaged measures; they provide no information on the mechanisms which build up the strange attractor; and they provide no information on what might happen when control parameters are varied.

(3) Topological methods are based on an idea due to Poincaré [12]. That is, we can understand dynamical systems if we understand how the flow affects various neighborhoods in the appropriate phase space. This method requires small data sets (much smaller than required for dimension calculations), degrades gracefully with noise, results in integer values, can be tested by internal consistency checks, and has predictive value in a number of different directions [2]. It predicts which new periodic orbits can be created or annihilated under both small and large parameter variations. Most important, it describes the stretching and squeezing mechanisms which operate together to build up a strange attractor and to organize all the unstable periodic orbits in the strange attractor in a unique way. At the present time the analysis tool based on topology can be applied only to low-dimensional dynamical systems. This includes three-dimensional dynamical systems, but also includes dynamical systems with only one positive Lyapunov exponent but which are strongly contracting, so that the Lyapunov dimension obeys 2 < d_L < 3. This includes strange attractors which exist in three-dimensional inertial manifolds, such as the one identified for the modified Hodgkin-Huxley equations.

The basic idea behind the topological analysis method is illustrated in Fig. 6. In this figure we begin with a blob of points (mathematicians call this a neighborhood) in phase space (Fig. 6a), and follow it as it is deformed under the flow. At first the blob is stretched (Fig. 6b). Stretching is responsible for 'sensitivity to initial conditions', which is responsible for positive Lyapunov exponents. However, two points in phase space cannot continue to be stretched apart indefinitely if the dynamics occurs in a bounded region (mathematicians call this a compact domain) of phase space. Ultimately, distant points must be squeezed back together (Fig. 6c,d). Squeezing is the repetitive process that builds up the 'self-similar' fractal structure which exists in the contracting directions, and which is responsible for negative Lyapunov exponents.
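Before turning to the topological picture, the dynamical dimension estimate of Eqs. (18)-(20) can be written down in a few lines. The sketch below implements the Kaplan-Yorke expression for a given spectrum of Lyapunov exponents; as noted above, this equals d_L for strongly contracting systems and is an upper bound in general.

```python
def lyapunov_dimension(exponents):
    """Kaplan-Yorke estimate of Eqs. (18)-(20): take the largest k with
    lambda_1 + ... + lambda_k >= 0, then add the fractional part carried by
    the next (contracting) exponent."""
    lam = sorted(exponents, reverse=True)
    s, k = 0.0, 0
    while k < len(lam) and s + lam[k] >= 0.0:
        s += lam[k]
        k += 1
    if k == len(lam):              # contraction never closes the partial sum
        return float(len(lam))
    return k + s / abs(lam[k])

# example: a strongly contracting three-dimensional spectrum
print(lyapunov_dimension([0.2, 0.0, -1.5]))   # -> 2 + 0.2/1.5 = 2.133...
```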
Fig. 6. Evolution of a cube of initial conditions under a flow. A cube of initial conditions (a) is stretched (b), (c). If motion occurs in a compact domain, distant points must eventually be returned to closer proximity (c), (d). The deformed neighborhood then returns to the original neighborhood (a), and the process is repeated.

Fig. 6 is a cartoon of what occurs during one cycle of a process which builds up a strange attractor by the 'Smale horseshoe mechanism'. This derives its name in part from the horseshoe shape of the blob in phase space after it is deformed during one cycle, and in part from the fact that it was first described by Smale [13]. The mechanism itself is a simple 'stretch and fold' process which is repeated. It is essentially this process, a 'stretch and roll' mechanism, which generates strange attractors in the modified Hodgkin-Huxley dynamics. We point out here what should be obvious, but which nevertheless should be stated. Topology is qualitative, so it may not be clear how knowledge of the stretching and squeezing mechanisms which it provides is connected to the quantitative information resulting from metric and dynamical analysis methods. The qualitative becomes quantitative when we describe exactly how much stretching (the positive Lyapunov exponents) and how much squeezing (the negative exponents) go on. These exponents need not be constant: they can be position dependent. Using them as well as topological information, it is possible to compute fractal dimensions, scaling functions, and average Lyapunov exponents. However, from the metric and dynamical measures alone, it is not possible to recover topological information about stretching and squeezing mechanisms.

There is a very nice way to classify the stretching and squeezing mechanisms which generate a low-dimensional strange attractor. It is based on a theorem by Birman and Williams [14]. This theorem was originally stated for three-dimensional dissipative dynamical systems which possess a hyperbolic strange attractor, for which λ₁ > λ₂ = 0 > λ₃ and λ₁ + λ₂ + λ₃ < 0. Birman and Williams proved that it was possible to project the flow down along the stable direction onto a two-dimensional branched manifold. The branched manifold is essentially the unstable invariant manifold, with some singularities. In this process, the periodic orbits are
projected onto the branched manifold also. Their topological organization is unchanged during the projection. The projection is similar to increasing the dissipation of the dynamical system (λ₃ → -∞), so that the dimension approaches 2 (d_L = 2 + λ₁/|λ₃| → 2). Periodic orbits cannot change their topological organization during the projection because they would have to cross through each other, thus violating the uniqueness theorem of ordinary differential equations. One of the branched manifolds originally studied by Birman and Williams is shown in Fig. 7. This describes the organization of all the closed magnetic field lines (these are the analogs of periodic orbits for a dynamical system) produced by a constant current flowing through a wire tied into the shape of a figure-8 knot. It can be seen that the branched manifold is made up of two types of units, shown in Fig. 8.
Fig. 7. One of the first branched manifolds studied by Birman and Williams [14]. This branched manifold describes the topological organization of all the closed magnetic lines of force established by a constant current flowing in a figure-8 knot. The figure also tabulates the Markov transition matrix M(i,j), the crossing matrix T(i,j), and the joining array for its eight branches.
Fig. 8. Building blocks for stretching and squeezing mechanisms. (a) A cube of initial conditions evolves (flow direction is down) by stretching in one direction, shrinking in the other. Both directions are transverse to the flow. In the limit of high dissipation the flow becomes two-dimensional. The flows going to different parts of phase space are separated by a singular point ('splitting point'). This singular point describes initial conditions going to a fixed point. (b) Two cubes of initial conditions in different parts of phase space are squeezed together. In the high dissipation limit, the two branches are joined at a singular line ('branch line'). This singularity describes loss of information about the previous history of the flow.

Splitting chart. This describes stretching. It contains a singularity: the splitting point. This represents initial conditions which propagate into a fixed point.

Joining chart. This describes squeezing. It contains a different type of singularity: the branch line. This represents points at which information about the past is lost.

These two types of two-dimensional charts are limits of three-dimensional neighborhoods in which stretching and squeezing take place. This relation is shown in Fig. 8. The two can be combined into a single type of chart ('joining and splitting') which includes both types of singularities and simultaneously describes stretching and squeezing. Every branched manifold is made up of splitting charts and joining charts joined together in Lego® fashion, with no free ends. Alternatively, every branched manifold can be constructed from 'joining and splitting' charts, in which the output ends of charts are connected to the input ends of other (or the same) chart(s) by two-dimensional bands which twist and writhe around in phase space in various ways.
The branches in a branched manifold are determined as follows. Extend each splitting point back against the flow direction to the nearest branch line. Then a branch is a component of the projected flow between two branch lines. For example, the branched manifold shown in Fig. 7 has eight branches. It is possible to describe branched manifolds algebraically. The information which must be encoded includes:

Twisting. How the branches twist around their axes.

Crossing. How the branches cross over or under each other.

Joining. The order (from top to bottom) in which the branches are joined at a branch line, in the projection of the branched manifold which is adopted.

Connecting. Which branches are joined to which.

This information is encoded in two N × N matrices and an array with N components. Here N is the number of branches in the branched manifold.

Crossing matrix T(i,j). This matrix encodes the topological information about twisting and crossing. The diagonal matrix element T(i,i) describes the twist of branch i. This is the signed number of crossings of one edge of branch i over the other edge. The off-diagonal matrix element T(i,j) is the signed number of crossings of the two branches i and j. The sign convention is the usual: take tangent vectors to the upper and lower components (edge, branch) in the projection, and rotate the upper tangent vector into the lower tangent vector through the smaller angle. If the rotation is right handed, the sign is +1; if left handed, it is -1.

Joining array J(i). This array encodes information about the order in which branches are joined at branch lines. One useful convention is: the smaller J(i) is, the closer the branch is to the observer.

Markov transition matrix M(i,j). This matrix encodes information about which branches are connected to which. The convention adopted is: M(i,j) = 1 if branch i flows into branch j, zero otherwise.

The crossing matrix T, joining array J, and Markov transition matrix M for the branched manifold shown in Fig. 7 are also shown in that figure. An algebraic description for a branched manifold is useful because it can be used to identify the periodic orbits which exist in a strange attractor, and to compute the topological organization of these orbits through their linking numbers (LN). The algebraic description of a branched manifold is not unique. For example, it changes if the branched manifold is viewed from a different perspective. Further, branched manifolds can be deformed in many ways without altering either the spectrum or the topological organization of the periodic orbits which they contain [14]. Four nonlinear dynamical systems which exhibit chaotic behavior have been extensively studied. These are, in historical order: the Duffing oscillator [15], which is periodically driven; the van der Pol oscillator [16], which is periodically driven; the Lorenz equations [17]; the Rössler equations [18]. The branched manifolds for the strange attractors which these four systems generate over a certain standard range of parameter values are shown in Fig. 9. It can be seen by visual inspection that these four branched manifolds are inequivalent. There is no deformation of one that converts it to any of the others. Alternatively, there is no similarity transformation that takes the matrix representation of one to the matrix representation of any of the others.
Fig. 9. Branched manifolds for four standard dynamical systems used to study chaotic dynamics: (a) periodically driven Duffing oscillator; (b) periodically driven van der Pol oscillator; (c) Lorenz equations; (d) Rössler equations.
This means that it is futile to search for a smooth change of variables which transforms any one system into any of the others. Several remarks are in order:

1. A global Poincaré section for a dynamical system can be determined from its branched manifold. It is the union of the branch lines.

2. For simple systems (Duffing, van der Pol, Lorenz, Rössler), the period of a closed orbit is intuitively obvious. For more complicated systems the period is not so obvious. It is useful to define the period as the LN of the orbit with the union of closed loops G [14]:

Period(orbit) = LN(∪_i C_i, orbit) = Σ_i LN(C_i, orbit).    (21)

Here C_i is a loop in R³, the space which contains the branched manifold. Each loop encircles one or more branches of the branched manifold. The loops are chosen so that all periods computed in this way are positive integers, and are as small as possible. For the branched manifold of the figure-8 knot (Fig. 7), one loop suffices, and it is the figure-8 knot.

3. The topological entropy of a chaotic dynamical system is bounded above by the logarithm of the maximum eigenvalue of the Markov transition matrix which describes the branched manifold for the strange attractor. If all periodic orbits allowed by the branched manifold actually exist in the strange attractor (i.e., the attractor is hyperbolic), then equality occurs.

4. A strange attractor is classified by projecting it down to its branched manifold. Once it has been so projected, it can be recovered, to some extent, by the reverse process of 'expansion' or 'blow-up' [2] (cf. Fig. 8). In this process each two-dimensional splitting and joining chart is replaced by its three-dimensional counterpart, and branches are replaced by 'flow tubes' [2]. The exponents for these flow tubes can be adjusted to reflect the properties of the original attractor. In this blow-up process the global Poincaré section, the union of branch lines, blows up to a union of disks, each disk being the blow-up of a branch line. The definition of period remains unchanged.

5. There is a common perception that the topological program works only for highly dissipative dynamical systems. This is not so. The fact that one of the first branched manifolds discussed by Birman and Williams carried the organization for all closed field lines of a conservative system should be enough to dispel this misconception.
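As an illustration of the algebraic description and of remark 3, the following sketch writes down T, J, and M for a two-branch Smale horseshoe template (an assumed toy example, not the template of Fig. 7 nor the one identified below for the modified Hodgkin-Huxley equations) and evaluates the entropy bound.

```python
import numpy as np

# Algebraic description of a two-branch Smale horseshoe template (illustrative),
# in the conventions defined in the text.
T = np.array([[0, 0],
              [0, 1]])        # branch 1 carries one half-twist, branch 0 none
J = np.array([0, 1])          # branch 0 is joined in front of branch 1
M = np.array([[1, 1],
              [1, 1]])        # each branch flows into both branches

# Remark 3: topological entropy <= log of the largest eigenvalue of M.
# For the fully connected two-branch template this bound is log 2.
h_top_bound = np.log(np.max(np.abs(np.linalg.eigvals(M))))
print(h_top_bound)            # 0.6931... = log 2
```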
7. Topological analysis program

It is possible to extract the information about the branched manifold which describes a low-dimensional strange attractor directly from data. The procedure for doing this has come to be known as the topological analysis program. It was first announced in a paper which analyzed data from the Belousov-Zhabotinskii reaction [19]. Perhaps the most surprising thing about this program is that it is possible to extract a set of integers from chaotic data, to test whether this set of integers is correct, and then to use this set of integers to make testable predictions. The topological analysis program for low-dimensional dynamical systems consists of several steps. These are summarized in Fig. 10. We describe these steps below.

Locate periodic orbits. Periodic orbits are present in abundance in a strange attractor. As control parameters are changed, some may even be stable over intervals. In a hyperbolic strange attractor, all periodic orbits are unstable. They are dense in hyperbolic strange attractors. Their topological organization uniquely identifies the strange attractor, and is in fact the Achilles' heel by means of which the topological structure of the strange attractor may be identified. By topological structure, we mean the two-dimensional branched manifold which describes the stretching and squeezing mechanisms which act together to build up the strange attractor.
~" ~"
TEMPLATE VERIFICATION MODEL DYNAMICS ~
]
VALIDATE MODEL
I
Fig. 10. The Topological Analysis Program consists of a number of steps: embedding, close returns, topological invariants, template identification, template verification, model dynamics, and model validation. Unstable periodic orbits can be identified either before or after the embedding. The topological part of this program ends with template verification. Vertical arrows describe 'feedback loops' which are used to reject, or increase confidence in, the steps that are identified.
It is often possible, especially in low (2 + ε)-dimensional strange attractors, to locate unstable periodic orbits. One powerful method is the 'method of close returns'. This method has been used in Section 5 to locate some of the period one orbits in a strange attractor generated by the modified Hodgkin-Huxley equations. These orbits were shown in Fig. 5.

Embed in a phase space. Since the topological structure is determined by the linking numbers of periodic orbits, and these are defined in three-dimensional spaces, a means must be found to construct a three-dimensional phase space. This problem has to be approached from different directions, depending on circumstances. For three-dimensional systems (Duffing, van der Pol, Lorenz, Rössler) it is sufficient to use the variables intrinsic to the model. For higher-dimensional systems (modified Hodgkin-Huxley equations), a means must somehow be found to project into a three-dimensional space. The optimum would be to have a mathematical theorem that guarantees the existence of a three-dimensional inertial manifold for the dynamics if the Lyapunov dimension of the strange attractor is less than 3. This should be accompanied by a prescription for the projection into the inertial manifold. Unfortunately, we have not yet achieved this optimum situation. A fallback, which has been extremely useful in many fields, is the use of adiabatic elimination of fast ('slaved') variables. We have used this method in Section 5 to construct an effectively three-dimensional dynamical system from the original five-dimensional modified Hodgkin-Huxley equations. Another useful alternative is the reduction of dimension using a singular value decomposition [20].

In the real world, acquiring data is a difficult and expensive proposition. Usually one is happy with a scalar (one-dimensional) time series. This means we must somehow create 3-vectors from scalars. Several ways exist to do this. The default is the time delay embedding. Although this has been proposed independently by Packard et al. [21], Mañé [22], and Takens [23], the original idea goes back to Whitney [24]. The basic idea is the idea of transversality. If two manifolds M₁^{m₁} and M₂^{m₂} of dimensions m₁ and m₂ are contained in a manifold of dimension n (M₁^{m₁} ⊂ Nⁿ, M₂^{m₂} ⊂ Nⁿ), then either they do not intersect (M₁^{m₁} ∩ M₂^{m₂} = ∅) or they intersect in a manifold M^{m̄} = M₁^{m₁} ∩ M₂^{m₂} of dimension m̄ = (m₁ + m₂) - n. If m̄ ≥ 0, the manifold M^{m̄} is stable under perturbations. If m̄ < 0, the intersection disappears under perturbation (M₁^{m₁} ∩ M₂^{m₂} → ∅). If a dynamical system of dimension m is to be embedded in Rⁿ, then self-intersections of the embedded manifold M^m will generally occur unless 2m - n < 0 (i.e., n ≥ 2m + 1). At such self-intersections the uniqueness theorem of the theory of ODEs is generally violated. According to this theorem, to reconstruct the dynamics of a three-dimensional system it is sufficient to embed it in a seven-dimensional space. However, it is not necessary, and our experience [2] has shown that it almost always suffices to embed data generated by a three-dimensional dynamical system in a three-dimensional space. We always attempt such an embedding first, and go to higher-dimensional embeddings only when low-dimensional embeddings fail.
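A minimal sketch of this default construction, the time delay embedding, is given below; the embedding dimension and delay are user choices that must be tuned to the data.

```python
import numpy as np

def delay_embed(x, dim=3, delay=1):
    """Time-delay embedding: map a scalar series x(i) to the vectors
    (x(i), x(i+delay), ..., x(i+(dim-1)*delay))."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[k * delay : k * delay + n] for k in range(dim)])

# usage: y = delay_embed(x, dim=3, delay=10) for a scalar time series x
```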
A second embedding procedure is the differential embedding [2,19], where a vector time series y(t) is created from a scalar time series x(t) by

x(t) → (y₁(t), y₂(t) = ẏ₁(t), y₃(t) = ẏ₂(t)),    y₁(t) = x(t).    (22)
Since the data are discretely sampled (x(t) → x(t_i) = x(i)), the time derivatives are estimated by the usual difference formulas. This embedding is useful for two reasons. First, it already has the desired dynamical system form. Since dy₁/dt = y₂ and dy₂/dt = y₃, it is sufficient to determine an equation of motion only for y₃. Second, in an embedding using three differentially related variables, it is a simple matter to compute linking numbers of periodic orbits. In Fig. 11 we show a projection of a phase space constructed with the differential embedding (22). The y₁ axis is horizontal, the y₂ axis is vertical, and the y₃ axis is out of the page. Now consider two segments which cross in the upper half plane y₂ > 0. The slope of either is given by

slope = dy₂/dy₁ = (dy₂/dt)/(dy₁/dt) = y₃/y₂.    (23)

It is then clear that y₃ = (slope) × y₂. Thus, the larger the slope, the closer the segment is to the observer. As a result, all crossings in the upper half plane are left handed with crossing number -1, and those in the lower half plane are right handed with crossing number +1. This makes computing linking numbers particularly convenient in this representation of the data (see below). The differential-integral embedding is closely related to (22). The difference is that in this embedding y₂(t) = x(t), so that y₁(t) = ∫^t (x(τ) - x̄) dτ and y₃(t) = ẋ(t). This has the same two virtues as the differential embedding. It is different in the following two ways. As a general rule of thumb, each differential or integral operation on experimental data decreases the signal to noise ratio by an order of magnitude.
Fig. 11. Crossing information. In a differential embedding, all crossings in the upper half plane are negative, those in the lower half plane are positive.
Except for very clean data, in the differential embedding the coordinate y₃ = d²x/dt² may have an unacceptable signal to noise ratio, since S/N is reduced by two orders of magnitude. Unacceptable means, in this case, that we cannot compute linking numbers. For the differential-integral embedding, both y₁ (the integral of x) and y₃ (the derivative of x) have S/N reduced by only one order of magnitude, which is often acceptable. However, since y₁ integrates the data (subtracting out the data average), it is susceptible to long-term secular trends. These can produce nonstationarity in the embedded data which are harmless in all other types of embeddings. When this occurs, the nonstationarity must be addressed by appropriate filters [2]. Another method for constructing vector from scalar data involves the singular value decomposition (SVD) [2,20]. This use of the SVD has been discussed in the literature and used effectively in many applications.
7.1. Topological organization of periodic orbits

Once a selection of periodic orbits has been extracted from the data and an embedding in R³ has been adopted, it becomes possible to compute the topological invariants of these orbits. The topological invariants which are always useful are the linking numbers of pairs of orbits and the local torsions of individual orbits. If the attractor is contained in a solid torus D² × T¹ (D² is the two-dimensional disk and T¹ is the circle) then the relative rotation rates [25] are even more powerful topological invariants than LNs. In a differential embedding, linking numbers of orbit pairs are easily computed by counting the number of crossings in the lower half and upper half planes, subtracting the second from the first, and dividing by 2.
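The crossing rule just quoted translates directly into code. The sketch below is a brute-force O(N²) implementation for two closed orbits stored as (N, 3) arrays in a differential embedding; the helper names are ours, not the authors'.

```python
import numpy as np

def _seg_cross(p1, p2, q1, q2):
    """Intersection point of planar segments p1-p2 and q1-q2, or None."""
    r, s = p2 - p1, q2 - q1
    denom = r[0] * s[1] - r[1] * s[0]
    if denom == 0.0:
        return None                         # parallel segments
    d = q1 - p1
    t = (d[0] * s[1] - d[1] * s[0]) / denom
    u = (d[0] * r[1] - d[1] * r[0]) / denom
    if 0.0 < t < 1.0 and 0.0 < u < 1.0:
        return p1 + t * r
    return None

def linking_number(orbit_a, orbit_b):
    """Linking number of two closed orbits in a differential embedding:
    crossings of the (y1, y2) projections count -1 in the upper half plane
    (y2 > 0) and +1 in the lower half plane; the LN is half the signed sum
    (which is even for disjoint closed curves)."""
    a = np.vstack([orbit_a, orbit_a[:1]])[:, :2]   # close the curves
    b = np.vstack([orbit_b, orbit_b[:1]])[:, :2]
    signed = 0
    for i in range(len(a) - 1):
        for j in range(len(b) - 1):
            pt = _seg_cross(a[i], a[i + 1], b[j], b[j + 1])
            if pt is not None:
                signed += -1 if pt[1] > 0.0 else +1
    return signed // 2
```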
7.2. Identify a branched manifold

The next step is to use the topological information gained in the previous step to guess an appropriate branched manifold. For complicated branched manifolds, such as that for the figure-8 knot, this is not easy. For branched manifolds which can be embedded in the solid torus D² × T¹ the task is simpler. In this case, each branch carries a period one orbit. The topological structure of the branched manifold is determined by:
• Computing the linking numbers of all the period one orbits with each other. The off-diagonal matrix element T(i,j) is twice this integer.
• Computing the local torsion of each period one orbit. This gives the diagonal matrix elements T(i,i).
• Computing the linking numbers of some of the period two orbits with the period one orbits. This gives the array information J(i).
In case some of the period one and/or period two orbits are not available, higher period orbits can be used to fill in the missing information. This procedure is not entirely straightforward. The reason is as follows. The periodic orbits in phase space are simple closed curves. The periodic orbits on
branched manifolds are labeled by the branches which are traversed. Each periodic orbit is labeled by a sequence of symbols. The problem is to identify a symbol sequence (on the branched manifold) with a closed curve (in phase space). In other words, we need a 1-1 mapping between orbits and symbols. When the return map is very 'thin', as is the case shown in Fig. 4 for the modified Hodgkin-Huxley equations, creating this 1-1 map (a symbolic dynamics) is straightforward. In fact, we did this without even mentioning any problems in Section 5. However, for systems which are not strongly dissipative, creating consistent partitions on appropriate Poincaré sections in phase space is a longstanding problem [26]. Fortunately, Plumecoq and Lefranc have now proposed a useful solution to the partition problem [26].

7.3. Validate the branched manifold

Once a branched manifold has been proposed (tentatively identified), it is possible to compute the LNs and relative rotation rates of all the periodic orbits which the branched manifold supports. These must be compared with the corresponding topological invariants for all the remaining periodic orbits extracted from the data, which were not used in the first place to identify the branched manifold. If the two sets of topological invariants (one from the orbits extracted from data, one from the corresponding orbits on the branched manifold) agree, then we have added confidence that the initial identification of the branched manifold was correct. If there is not complete agreement, then either the branched manifold was identified incorrectly, or the partition needs to be modified. The problems of creating a symbolic dynamics (creating a partition and a 1-1 mapping between symbol sequences and periodic orbits in phase space) and identifying the correct branched manifold are global problems. They must be solved simultaneously. There must be complete agreement between the topological invariants of all orbits extracted from data and their corresponding symbol sequences on a branched manifold. This internal self-consistency check (rejection criterion) is absent from both the metric and dynamical approaches to the analysis of chaotic data.

Strictly speaking, the topological analysis program stops here. However, there is always the desire to do better: to construct an appropriate model to describe data which have been analyzed. We describe here the next two steps which can be taken in this effort.

7.4. Construct a flow model

A dynamical system model has the form dy/dt = F(y), y ∈ Rⁿ. To model data, the functions F_i(y) are usually expanded as a linear superposition of some set of basis functions Φ_α(y):

dy_i/dt = Σ_α A_{i,α} Φ_α(y).    (24)

This is a general linear model, so standard methods (least squares, maximum likelihood) can be used to estimate the expansion coefficients A_{i,α}. Standard methods (χ² test) can also be used to test whether this model is any good. For the present purposes, y ∈ R³. For a differential embedding two of the functions are already known: F₁(y) = y₂, F₂(y) = y₃. Only the third function must be modeled. Thus the differential embedding has the added utility that it reduces by a factor of three the effort which is required to develop a model of the dynamics [2]. Once a model has been created, the qualitative validity of the model can be tested. This is done by subjecting its output to a Topological Analysis. If the branched manifolds determined from the data and from the model are not equivalent, the model is not a good representation of the data, and must be rejected. On the other hand, if the branched manifolds are the same, the model cannot be rejected.
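A hedged sketch of this fitting step for a differential embedding, where only dy₃/dt needs to be modeled, is given below. The basis functions are taken to be monomials in (y₁, y₂, y₃), which is one common choice and not necessarily the one used by the authors.

```python
import numpy as np

def fit_flow_model(y, dt, degree=3):
    """Least-squares estimate of the expansion coefficients A of Eq. (24) for
    the single unknown component dy3/dt of a differential embedding, using
    monomials of (y1, y2, y3) up to the given total degree as basis functions."""
    dy3 = np.gradient(y[:, 2], dt)                  # target: numerical dy3/dt
    powers = [(i, j, k) for i in range(degree + 1)
                        for j in range(degree + 1 - i)
                        for k in range(degree + 1 - i - j)]
    Phi = np.column_stack([y[:, 0] ** i * y[:, 1] ** j * y[:, 2] ** k
                           for (i, j, k) in powers])
    A, *_ = np.linalg.lstsq(Phi, dy3, rcond=None)
    return powers, A

# usage: powers, A = fit_flow_model(embedded_orbit, dt=0.05)
```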
7.5. Validate the model

A model of a physical process may pass the qualitative test just described, and still not be a very good representation of the dynamics. It would be useful to have some goodness of fit criterion for nonlinear models, analogous to the χ² goodness of fit test for linear models. At the present time there is a very useful goodness of fit criterion for nonlinear models. Unfortunately, it lacks a quantitative underpinning. It is hoped that this quantitative underpinning will be supplied during the next decade. The idea behind this goodness of fit test was proposed by Fujisaka and Yamada [27] and independently by Brown et al. [28]. It goes back to an observation by Huyghens made 300 years ago. Huyghens observed that two pendulum clocks on opposite walls gained/lost time at slightly different rates. When they were placed on the same wall close enough they would synchronize their timekeeping. The synchronization effect provides the basis for a nonlinear goodness of fit test. The idea is as follows. Assume that a real physical system satisfies the dynamical system equation ẋ = F(x), and a model for this process is ẏ = G(y), x ∈ Rⁿ, y ∈ Rⁿ, where y is supposed to describe x. Then in general, no matter how good the model is, sensitivity to initial conditions and sensitivity to control parameter values will guarantee that the distance between x(t) and y(t) will eventually become large. A perturbation term can be added to the model equation which reduces y_i when it gets larger than x_i and increases y_i when it gets too small. A linear perturbation with this property has the form -λ_i(y_i - x_i). The appropriately modified dynamical system becomes
dy_i/dt = G_i(y) - λ_i (y_i - x_i).    (25)
If the model is 'good', a small value of λ = (λ₁, λ₂, ..., λₙ) will cause the model output to follow the data. We then say that the data entrain the model output. The better the model, the smaller the value of λ which causes entrainment. The entrainment test has been used effectively to test the validity of some models [2]. Unfortunately, the entrainment test for nonlinear systems has not yet been made quantitative, as has the χ² test for linear systems.
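A minimal sketch of the entrainment test of Eq. (25) is given below, using a simple Euler integration of the perturbed model alongside the data; in practice one scans λ and looks for the smallest value at which the model output follows the data. The function and variable names are ours.

```python
import numpy as np

def entrainment_error(G, x_data, dt, lam):
    """Integrate dy/dt = G(y) - lam*(y - x(t)) (Eq. (25)) alongside the data
    x_data (shape (N, n)) and return the mean distance between y(t) and x(t).
    A good model is entrained (small error) already for small lam."""
    y = x_data[0].copy()
    err = 0.0
    for x in x_data[1:]:
        y = y + dt * (G(y) - lam * (y - x))     # simple Euler step
        err += np.linalg.norm(y - x)
    return err / (len(x_data) - 1)

# usage: scan lam over, e.g., [0.0, 0.01, 0.1, 1.0] and find the smallest value
# for which the error stays small; G is the fitted model of the previous step.
```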
8. Topological analysis of the modified Hodgkin-Huxley equations

The analysis of strange attractors generated by the modified Hodgkin-Huxley equations follows the procedure described in the previous section. The first step is the determination of unstable periodic orbits. This has already been done. It is facilitated using an appropriate first return map. For temperature regions in which the output is periodic (e.g., T ≈ 20°C) the first return map consists of a single point. For regions in which a strange attractor exists, the first return map has the form shown in Fig. 4. From this map it is possible to locate initial conditions for unstable period one orbits which exist in the strange attractor. The second iterate of the map has been used to locate period two orbits. The pth return allows location of period p orbits.

The second step in the topological analysis program is the construction of a useful embedding. One already exists (cf. Fig. 3): it is the projection into the three-dimensional inertial manifold (y₁, y₄, y₅). However, we also constructed differential and integral embeddings based on single variables. The strange attractor shown in Fig. 3 in the (y₁, y₄, y₅) embedding is replotted in the differential embedding (Fig. 12a) and integral-differential embedding (Fig. 12b) obtained from the coordinate y₄. These embeddings are useful since they simplify the computation of linking numbers. We computed the branched manifold for this system in each of these embeddings, and found it to be embedding independent.

The third step is the computation of the topological invariants of the unstable periodic orbits extracted from the strange attractor. The local torsion of the period one orbits was computed by displacing the initial condition slightly from that for the closed orbit, and computing the number of crossings of the closed orbit with its perturbation (which was not closed). The local torsion of the orbit nr is π × 2n while that for the orbit nf is π × (2n - 1). The relation between this sequence of orbits and local torsions is systematic:
Fig. 12. Strange attractor of Fig. 3 replotted: (a) in the differential embedding; (b) in the integral-differential embedding.
orbit:          6f   5r   5f   4r   4f   3r   3f   2r   2f   1r
local torsion:  11   10    9    8    7    6    5    4    3    2    (26)
The local torsions are measured in units of π in the series above. The linking number of adjacent period one orbits with local torsions nπ and (n + 1)π is n/2 or (n + 1)/2, whichever is an integer. Two branched manifolds are compatible with the information obtained from the period one orbits. We know this because an identical mechanism has already been studied in the driven Duffing equation [29] and in the YAG laser [30]. Both branched manifolds are simple extensions of the branched manifold which describes the Smale horseshoe mechanism, which is illustrated in Fig. 6. Both of these branched manifolds roll up: one from outside to inside; the other from inside to outside. These two scrolling mechanisms are shown in Fig. 13. These figures summarize how neighborhoods in phase space are deformed under the flow. The algebraic description for each of these two branched manifolds is presented below each of the branched manifolds. It can be seen from these figures that the topological organization of adjacent branches is identical in both the inward winding scroll and the outward winding scroll. Therefore, it is impossible to distinguish between the two branched manifolds on the basis of orbits extracted from a strange attractor that contains only two distinct, adjacent, unstable period one orbits. In order to distinguish between the two, we must study a strange attractor that possesses at least three inequivalent unstable period one orbits. Such strange attractors exist only at lower temperatures. For this reason, we concentrated on the strange attractor which exists at 12°C. The return map shows intersections with six contiguous branches, but we were able to find only three unstable period one orbits in the strange attractor. These were 5f, 4r, 4f. The other three branches (6f, 5r, 3r) are extremely unstable. That the orbits located belonged to contiguous branches simplified the calculations somewhat. The linking numbers for all three pairs of period one orbits in both branched manifolds are 8/2 = 4. This can be seen from the template matrices, since T(4f, 4r) = T(4f, 5f) = T(4r, 5f) = 8 in both matrices. The two branched manifolds could only be distinguished by locating unstable period two orbits and computing their LNs. In fact, the orbits AB and BC (A = 4f, B = 4r, C = 5f) have identical linking numbers with the period one orbits in both branched manifolds. It is only the period two orbit AC which has different linking numbers with the period one orbits in the two branched manifolds. The linking numbers in the two cases are
                      LN(A, AC)   LN(B, AC)   LN(C, AC)
Outside to inside         7           8           8
Inside to outside         8           8           9                (27)
These LNs were computed using a general purpose code designed to compute linking numbers and relative rotation rates for periodic orbits [31]. The inputs to the code consist of the algebraic description of a branched manifold and a list of orbits by their symbolic dynamics. The output consists of a table of their LNs or a table of their relative rotation rates.
Fig. 13. (a) Inward and (b) outward winding scroll templates. The algebraic description (crossing matrix and branch array) of each template is displayed with it.
These linking numbers were also computed visually, as illustrated in Fig. 14. The stretching and squeezing sections of the two branched manifolds are shown, with the outside to inside scroll on the left. The three period one orbits are shown propagating through the middle of the three branches. The period two orbit AC is shown propagating through the two outside branches. The linking number of the period two orbit (half the signed number of crossings) with each of the period one orbits is shown beneath each of the branches. For the outside to inside scroll these three integers are (-1, 0, 0), while for the inside to outside scroll the three integers are (0, 0, +1). Each must be added to the LN in the return part of the map. This consists of 8 half-twists. These entangle each period one orbit with the period two orbit with a linking number of 8. The results of this computation are summarized in Eq. (27). We therefore located the period two orbit AC = (4f, 5f) in the second return map at T = 12°C. This orbit is shown in two embeddings in Figs. 15 and 16. The LNs of this orbit with the three period one orbits were computed. This computation showed clearly that the branched manifold which describes the strange attractor generated by the modified Hodgkin-Huxley equations is the outside to inside scroll template.
Fig. 14. Distinguishing between (a) outside to inside and (b) inside to outside scroll templates. Three branches (4f, 4r, 5f) of the two scroll templates are shown. The three period one orbits are shown as vertical lines through the middle of each branch. The period two orbit (4f, 5f) is shown going through the outer edge of the two exterior branches. Linking numbers of this period two orbit with the three period one orbits are half the sum of the signed crossings shown, plus half the sum of the additional 16 crossings in the return flow, which has 8 half twists.
Fig. 15. Periodic orbits in the differential embedding. (a) 4f; (b) 4r; (c) 5f; (d) (4f, 5f).
Table 2 provides the LNs of all orbits up to period three which can be found in the three branches 4f, 4r, 5f of this branched manifold.

Fig. 16. Periodic orbits in the integral-differential embedding. (a) 4f; (b) 4r; (c) 5f; (d) (4f, 5f).

9. Jelly rolls
The branched manifold which describes the strange attractor generated by the modified Hodgkin-Huxley equations has been observed previously in both the periodically driven Duffing oscillator [29] and in experimental data generated by a YAG laser [30]. It has been affectionately named the 'jelly roll' (Duffing) and the 'gâteau roulé' (YAG laser). The three systems which exhibit this jelly roll behavior are all slightly different. The YAG laser is a nonautonomous dynamical system, driven by external forcing with fixed periodicity. The Duffing oscillator is also a nonautonomous dynamical system, driven by external forcing with fixed periodicity. However, its stretching and squeezing mechanism operates in an identical way over two half-cycles, so that the branched manifold for the Duffing oscillator is actually the second iterate of the jelly roll. In both systems, at any given forcing frequency, all coexisting unstable period one orbits have the same period. By contrast, the modified Hodgkin-Huxley equations form an autonomous dynamical system. Coexisting unstable period one orbits have somewhat different periods. This can be seen from the original bifurcation diagram (Fig. 1). The time duration of a period one orbit is the sum of its interspike time intervals. This sum increases nonmonotonically as T decreases, with peaks at intermittency, that is, when orbits of type nr are present. The jelly roll template will be used to provide a very simple, intuitive, and appealing description of the dynamics of receptors with subthreshold oscillations. The description involves two useful ratios. We first provide this description for the YAG laser. We then describe the small modifications needed to carry over the description to receptors with subthreshold oscillations. As a first step, we unroll the scroll shown in Fig. 13a. The result is the distorted rectangle shown in Fig. 17. The flow is from left to right. At the beginning of a period (t = 0) a set of initial conditions exists along the vertical edge at the left. The vertical edge at the right (t = P) marks the end of a period. Fiducial marks measure the right-hand edge in units (i.e., π) of the left-hand edge. Each unit carries an
Table 2
Linking numbers for all orbits up to period three that occur on the three-branch template: A = 4f, B = 4r, C = 5f

Orbit   A    B    C    AB   AC   BC   AAB  AAC  ABB  ABC  ACB  ACC  BBC  BCC
A       -    4    4    7    7    8    11   11   11   11   11   11   12   12
B       4    -    4    8    8    8    12   12   12   12   12   12   12   12
C       4    4    -    8    8    9    12   12   12   13   12   13   13   13
AB      7    8    8    -    15   16   22   22   22   23   22   23   24   24
AC      7    8    8    15   -    16   22   22   23   23   23   23   24   24
BC      8    8    9    16   16   -    24   24   24   25   24   25   26   26
AAB     11   12   12   22   22   24   -    33   33   34   33   34   36   36
AAC     11   12   12   22   22   24   33   -    34   35   34   35   36   36
ABB     11   12   12   22   23   24   33   34   -    35   34   35   36   36
ABC     11   12   13   23   23   25   34   35   35   -    35   36   37   38
ACB     11   12   12   22   23   24   33   34   34   35   -    35   36   36
ACC     11   12   13   23   23   25   34   35   35   36   35   -    37   38
BBC     12   12   13   24   24   26   36   36   36   37   36   37   -    39
BCC     12   12   13   24   24   26   36   36   36   38   36   38   39   -
integer which reflects the torsion when this structure is rolled back up to the original scrolled structure. As time evolves, the set of initial conditions (left-hand edge) moves to the right, stretches (sensitivity to initial conditions, positive Lyapunov exponent), and drifts upward (increasing torsion). When the set of initial conditions arrives at the right-hand edge, it is spread over several contiguous segments. The ratio of its length at the right-hand edge to its original length is R = e^{λ1}, where λ1 is the positive Lyapunov exponent (R = stretch ratio). The rate of upward drift is the ratio of the two time scales of the laser. There is an intrinsic oscillation time τ, and the externally imposed drive period, P. The image of the left-hand edge extends along the right-hand edge from about (P/τ) to about (P/τ) + R.
Fig. 17. Intuitive description of scroll dynamics for the YAG laser. The scroll shown in Fig. 13 is unrolled. A set of initial conditions flows from the left edge to the right, drifting upward and expanding. The right end is then rolled back up and the flow is reinjected back into the left edge. For the strange attractor generated by the modified Hodgkin-Huxley equations, the wavy line indicates the duration of a period one orbit.
The two ratios which characterize the dynamics are the time scale ratio P/τ and the expansion ratio R = e^{λ1}.
1. The longer the period, P, the more scrolling (torsion) occurs.
2. The larger the stretching, R, the more branches are involved in the strange attractor.
Only slight modifications are required to port this intuitive description from the nonautonomous YAG laser to the autonomous receptor with subthreshold oscillations. In the latter case the period depends on the orbit. We have indicated this by a wavy solid line in Fig. 17. The role of P, the period of the external drive in the YAG laser, is replaced by 1/T (T = temperature) in the modified Hodgkin-Huxley equations. We now cut away all the branches which are not visited in this deformed rectangle, and rescroll the remaining branches. The resulting structure has the form shown in Fig. 18a. What happens next can best be illustrated using a cut rubber band, half a pair of suspenders, or a stretchy belt (all of which are useless!). Imagine taking one of these deformable structures, stretching it by pulling it along its long axis, and then twisting it about its long axis several times. What results has the form shown in Fig. 18a. If the tension is now relaxed, the structure 'untwists', as shown in Fig. 18b. Mathematicians would describe this deformation as the conversion of twist into writhe. Indeed, there is a well-known conservation relation among the three quantities Link, Twist, and Writhe:

Link = Twist + Writhe.    (28)
Mathematically, this is a remarkable relation, since neither term on the right is a topological quantity. They are both geometric, and when computed, may be real
Fig. 18. Illustration of Eq. (28). (a) A rubber band is twisted about its stretched length. (b) When the tension is relaxed, it deforms, exchanging twist for writhe. (c) When the two ends are reconnected, the shape of the flow generated by the modified Hodgkin-Huxley equations is apparent.
rather than integer-valued. However, their sum is a topological quantity and always an integer [32]. In fact, one does not even have to go so far as using rubber bands or belts to visualize the transformation of twist to writhe. Anyone who has used a telephone (not cordless) has experienced this. This also occurs in DNA. We make a geometrical model of the flow in the reduced phase space of the modified Hodgkin-Huxley model as follows. We relax the twist out of the branched manifold almost entirely. This converts, for example, 4 full twists on the branch 4r into 4 loops (writhes) without twist. The resulting structure mimics very well the flow in the phase space. Passage through the maximum of each loop corresponds to a spike in a burst. This is indicated in Fig. 18c. It is not difficult to make predictions about what happens when control parameters are changed. The flow is pushed to contiguous branches. In the algebraic description of the branched manifold, the topological matrix T(i,j) remains unchanged, as does the joining information contained in the array J(i). The only part of the algebraic description which changes with control parameters is the Markov transition matrix. For flows involving branches 4f, 4r, 5f this matrix is

Label   Torsion   2  3  4  5  6  7  8  9  10  11
1r         2      0  0  0  0  0  0  0  0   0   0
2f         3      0  0  0  0  0  0  0  0   0   0
2r         4      0  0  0  0  0  0  0  0   0   0
3f         5      0  0  0  0  0  0  0  0   0   0
3r         6      0  0  0  0  0  0  0  0   0   0
4f         7      0  0  0  0  0  1  1  1   0   0
4r         8      0  0  0  0  0  1  1  1   0   0
5f         9      0  0  0  0  0  1  1  1   0   0
5r        10      0  0  0  0  0  0  0  0   0   0
6f        11      0  0  0  0  0  0  0  0   0   0
                                                    (29)
As control parameters are changed (e.g., the ambient temperature), the block of 1's on the diagonal moves up or down the diagonal, possibly contracting to a 2 × 2 submatrix, perhaps expanding to a 4 × 4 matrix. The direction in which the allowed transition block moves depends on the ratio P/τ in the YAG laser, or its analog in the nerve cell. The size of the block depends on the stretching factor or Lyapunov exponent, and is n × n, where n = [R] + 1 or [R] + 2, [R] is the integer part of R, and R = e^{λ1}.
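For concreteness, this bookkeeping can be written down in a few lines. The sketch below is our own illustration (the branch list and the rule n = [R] + 1 follow the text; the function name and the parameter 'extra' are assumptions, not code from the original analysis).

```python
import numpy as np

# Branches in order of increasing torsion (torsion 2, 3, ..., 11), as in Eq. (29).
BRANCHES = ['1r', '2f', '2r', '3f', '3r', '4f', '4r', '5f', '5r', '6f']

def transition_matrix(lowest_branch, R, extra=1):
    """Markov transition matrix for the scroll template.

    A block of 1's of size n x n sits on the diagonal, with n = [R] + extra
    (extra = 1 or 2 in the text); its position is fixed by the lowest branch
    visited by the flow, which drifts as the control parameters are changed.
    """
    n = int(R) + extra
    i0 = BRANCHES.index(lowest_branch)
    i1 = min(i0 + n, len(BRANCHES))
    M = np.zeros((len(BRANCHES), len(BRANCHES)), dtype=int)
    M[i0:i1, i0:i1] = 1
    return M

# Reproduces the matrix (29): branches 4f, 4r, 5f are the active block.
print(transition_matrix('4f', R=2.2, extra=1)[5:8, 5:8])
```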
10. Flows without equations

Branched manifolds provide more than a means of classifying strange attractors. They even provide more than a very good representation of the flow in strongly dissipative strange attractors of dimension 2 + ε, ε ≈ 0. They provide a means for
accurately representing the flow in strange attractors which are far from the dissipative limit (ε not small). The basic idea is that the branched manifold provides a 'backbone' or 'skeleton' for the flow. If one 'blows up' the branched manifold by expanding it in the transverse direction, this is the same as expanding against the contracting direction. This replaces two-dimensional splitting and joining charts by their three-dimensional counterparts (cf., Fig. 8), and replaces two-dimensional branches by three-dimensional 'flow tubes' [2]. The result is a flow in R^3 which has the topological organization of the initial strange attractor, but for which the flow is now generated without the benefit of dynamical system equations of motion. How does this work for the modified Hodgkin-Huxley equations? Flow modeling of this type has already been done for the Duffing oscillator [29]. We simply take the results of that study and apply them to the current problem, with appropriate modifications. The flow takes place in a topological torus D^2 × T^1, which has been deformed by the conversion of twist to writhe. Two views of a structure of this type (a 'writhing torus'), with a writhe of 2, are shown in Fig. 19 [33]. We first imagine the flow to take place within a solid cylinder. Then we map the cylinder into the writhing torus, identifying the two ends of the cylinder. This produces a strange attractor with the correct topological structure, provided suitable care is taken. We will describe the necessary 'suitable care' below. The flow in the cylinder is modeled in two phases. The two processes described in Fig. 17 provide the skeleton for this model. However, those described in Fig. 17 are noninvertible, while the model we provide below is for an invertible map, and subsequently, for a flow. The first phase, which models stretching, occurs from s = 0 to s = 1 (s measures distance along the axis of the cylinder of length 2). In this phase, a set of initial
Fig. 19. Two projections of a 'writhing torus' with writhe = 2 [33]. This figure was kindly supplied by J. Palencia.
conditions in a strip at s = 0 is stretched out and rotated around the axis of the cylinder. The second phase, which models squeezing, occurs from s = 1 to s = 2. In this phase, a set of initial conditions, the circle at s = 1, is deformed into the interior of an open strip in the interval 0 ≤ θ ≤ π. The first (stretching) map, M_{1←0}(r0, θ0), from s = 0 to s = 1 is
θ0 → θ1 = φ + λθ0,    r0 → r1 = r0 e^{-γ(θ1 - θ0)}.    (30)
In this map the 'drift' is measured by the angle qb, and the increase in torsion is measured by the ratio ~/rt. The parameter )~ is the stretch factor R. The parameter 7 describes the dissipation: how quickly the flow spirals down to the axis of the cylinder. At 0 = 0 the strip has outer and inner radii a and b. In order to guarantee invertibility of this map, we require ae -2rcv < b. In the second (squeezing) process, a set of initial conditions at s = 1 in the interior of a circle of radius a is mapped into the interior of the strip at s = 2 extending from 0 = 0 to 0 - rc (cf., Fig. 20b). This is conveniently done in two stages. First, the circle is transformed into a long, thin ellipse by squeezing along one diameter. Then this ellipse is mapped into the interior of the strip described above. For a point in the circle with coordinates (rl, 01) --+ (Xl,Yl) = (rl cos 01, rl sin 01), the map M2+-I (rl, 01) from s = 1 to s = 2 is Circle
--+
Ellipse
x'
=
89
b)(x,/a),
(x,,yl)
--+
(x',y')
Y'
:
Yl,
Ellipse
---+
Strip
02
-
89(r + r
(xt,y')
--+
(r2, 02)
=
[89 + b / + x')]e
+ 89(r -
r
(31)
In this map the two angles 0 ≤ ψ2 < ψ1 ≤ π describe how extended the mapping of the ellipse into the strip is. The map from one end of the cylinder to the other is the composition of the two maps above
(r2, θ2) = M_{2←0}(r0, θ0) = M_{2←1}(r1, θ1) ∘ M_{1←0}(r0, θ0).    (32)
Iteration of this composition map can then reproduce the dynamics which is observed on a plane, for example, a Poincaré section of the flow. This map is fairly complicated. It depends on the seven unfolding parameters (a, b, φ, λ, γ, ψ1, ψ2). It has been studied over a limited range of these parameters [29].
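As a concrete illustration, the composition can be iterated directly. The sketch below implements Eqs. (30)-(32) as reconstructed above; the parameter values in the example are arbitrary illustrative choices, not values fitted to the modified Hodgkin-Huxley flow.

```python
import numpy as np

def M_1_from_0(r0, th0, phi, lam, gamma):
    """Stretching map of Eq. (30): rotate and stretch the angle, spiral inward."""
    th1 = phi + lam * th0
    r1 = r0 * np.exp(-gamma * (th1 - th0))
    return r1, th1

def M_2_from_1(r1, th1, a, b, gamma, psi1, psi2):
    """Squeezing map of Eq. (31): circle -> thin ellipse -> strip between psi2 and psi1."""
    x1, y1 = r1 * np.cos(th1), r1 * np.sin(th1)
    xp = 0.5 * (a - b) * (x1 / a)       # squeeze one diameter of the circle
    yp = y1
    th2 = 0.5 * (psi1 + psi2) + 0.5 * (psi1 - psi2) * (yp / a)
    r2 = (0.5 * (a + b) + xp) * np.exp(-gamma * th2)
    return r2, th2

def M_2_from_0(r0, th0, params):
    """Composition of Eq. (32), depending on the seven unfolding parameters."""
    a, b, phi, lam, gamma, psi1, psi2 = params
    return M_2_from_1(*M_1_from_0(r0, th0, phi, lam, gamma), a, b, gamma, psi1, psi2)

params = (1.0, 0.6, 2.5, 2.0, 0.1, np.pi, 0.2)    # (a, b, phi, lam, gamma, psi1, psi2)
r, th = 0.8, 0.3
for _ in range(5):
    r, th = M_2_from_0(r, th, params)
    print(f"r = {r:.4f}, theta = {th:.4f}")
```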
Fig. 20. Mappings D^2(s = 0) → D^2(s = 1) → D^2(s = 2). Top: The cylinder has circular cross-sections at s = 0, s = 1, and s = 2. (a) The cross-section at s = 0 is mapped to the cross-section at s = 1 according to Eq. (30). The shaded region in the strip on the left spirals around and down toward the axis of the cylinder. (b) The cross-section at s = 1 is mapped to the cross-section at s = 2 in two steps according to Eq. (31). First, the disk is compressed to thickness a - b along one axis. Second, this long thin ellipse is mapped to the strip extending from θ = 0 to θ = π in the disk at s = 2. Top right: The sigmoidal function f(s) which interpolates a flow from a map has derivative(s) equal to zero at both ends.

In this range of control parameter values, it describes the alternation between periodic and chaotic behavior in periodically driven dynamical systems, with sequential increase of torsion (number of spikes/burst) as the ratio P/τ increases (modeled by φ/π above). This model also describes the bifurcation sequences that can occur when branches with higher torsion are created and those with lower torsion are annihilated. It clearly shows a 'snake' of period one orbits as a function of the parameter φ/π (cf., Fig. 21). As φ increases, branches with torsion 2n and 2n + 1 (e.g., 4 and 5) come into existence via saddle node bifurcations, and then branches with torsion 2n - 1 and 2n (e.g., 3 and 4) go out of existence via inverse saddle node bifurcations. This is the sequence of events that occurs in the bifurcation diagram of the modified Hodgkin-Huxley equations as the temperature is lowered. The even (orientation preserving) branches are unstable when born, and remain unstable throughout their lives. The unstable period one orbit nr occurs on the branch with torsion 2n. Orbit nf occurs on the branch with torsion 2n - 1. Bifurcation sequences can occur along the odd branches. However, these bifurcation sequences must reverse themselves before the odd branches are annihilated, for example, as the temperature is lowered.
Fig. 21. Period one snake. For any value of φ, one or more period one orbits exist in the map (32). As φ increases, these orbits are created and annihilated in a systematic way. In the modified Hodgkin-Huxley flow, period one behavior nf and nr occur along the branches 2n - 1 and 2n, respectively.
Eq. (32) above defines a map from one face of a cylinder at s = 0 to another face at s = 2. The modified Hodgkin-Huxley equations define a flow. It is possible to construct a flow from a map by simple interpolation schemes. For example, if x ∈ R^n (n = 2 in our case) has values x0 at s = 0 and x1 at s = 1, then it is a simple matter to interpolate at intermediate positions 0 ≤ s ≤ 1 by means of a weighted average
x(s) = x0 × (1 - f(s)) + x1 × f(s),    0 ≤ s ≤ 1,    f(0) = 0,  f(1) = 1.    (33)
The simplest interpolation is linear, with f(s) = s. This interpolation has the disadvantage of having discontinuous first derivatives at the endpoints of the interval. To remove this disadvantage, it is useful to use a sigmoidal-shaped function f(s) with the properties

f(0) = 0,        f(1) = 1,
f^(1)(0) = 0,    f^(1)(1) = 0,
...,
f^(k)(0) = 0,    f^(k)(1) = 0.    (34)
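A minimal sketch of one such function (the quintic polynomial below is a standard smoothstep-type choice satisfying the conditions (34) for k = 2; it is our illustration, not a form prescribed in the text):

```python
import numpy as np

def smooth_step(s):
    """f(s) = 6 s^5 - 15 s^4 + 10 s^3 on [0, 1]: f(0) = 0, f(1) = 1, and the
    first and second derivatives vanish at both endpoints (k = 2 in Eq. (34))."""
    return 6.0 * s**5 - 15.0 * s**4 + 10.0 * s**3

def interpolate(x0, x1, s):
    """Weighted average of Eq. (33): x(s) = x0 (1 - f(s)) + x1 f(s)."""
    f = smooth_step(s)
    return x0 * (1.0 - f) + x1 * f

s = np.linspace(0.0, 1.0, 5)[:, None]
print(interpolate(np.zeros(2), np.ones(2), s))   # smooth passage from x0 to x1
```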
This guarantees that the flow and its first k derivatives (f^(k)(1) is the kth derivative of f, evaluated at s = 1) are continuous everywhere. A similar interpolation can be used to create a smooth flow between the faces of the cylinder at s = 1 and s = 2. Once a smooth flow has been constructed in the cylinder, it is possible, by 'wraparound', to generate a smooth flow in the torus D^2 × T^1 by identifying the end at s = 2 with the end at s = 0. In order to generate a flow of the type shown in Fig. 3, the cylinder should be mapped into a writhing torus of the type shown in Fig. 19. There are two subtleties about this map (i.e., the 'suitable care' which must be taken):
1. All radii of curvature of the writhing torus must be sufficiently large. Otherwise, singularities of the involution map will lie inside the image of the cylinder, and the uniqueness theorem of the theory of ordinary differential equations will be violated.
2. The writhe of the torus and the twist of the flow must add to the proper linking number (cf., Eq. (28)). The writhe of the torus is its number of loops (cf., Fig. 19), while the twist of the flow is determined by the angle φ.
The Lyapunov exponents and dimensions for any of the flows defined in this way can be estimated as follows. The Jacobian for the map M_{1←0} is the ratio of two areas, the area A(0) of the strip in the disk at s = 0, and the area A(1) of the elongated strip in the disk at s = 1. The area of the original strip is A(0) ≈ π(a - b)e^{-γπ/2}, while its image has area A(1) ≈ λπ(a - b)e^{-γ[φ + λπ/2]}. As a result,

A(1)/A(0) = λπ(a - b)e^{-γ[φ + λπ/2]} / (π(a - b)e^{-γπ/2}).

In a similar way, the Jacobian of the second map is the ratio of areas. The ratio of the area of the ellipse to that of the circle is (a - b)/(2a). The ellipse is stretched by a factor (ψ1 - ψ2)/2 and squeezed by a factor of e^{-γ(ψ1+ψ2)/2} on being mapped into the strip in the disk at s = 2, so that

J1 = A(2)/A'(1) = [(a - b)/(2a)] [(ψ1 - ψ2)/2] e^{-γ(ψ1+ψ2)/2}.
The product of these two Jacobians is equal to e^{λ1+λ2+λ3}, where λi are the Lyapunov exponents for the flow. In regions where chaotic behavior is observed, λ1 > 0 and λ3 < 0. The exponent along the flow direction (λ2) is zero. Taking logarithms, we find

λ1 + λ3 ≈ ln(λ) - {γ(φ + λπ/2) + γ(ψ1 + ψ2)/2 + ln(a/(a - b)) + ln(π/(ψ1 - ψ2)) + ln(4)}.    (35)
If it is a good approximation to identify the factor λ with the stretch factor R = e^{λ1}, then λ1 ≈ ln(λ). The negative Lyapunov exponent, λ3, is then what is left over in the expression above. This has larger magnitude than λ1 = ln(λ), since the map M_{2←0} contracts areas. From these expressions, we derive an estimate for the Lyapunov dimension of the strange attractor generated by the interpolated flow

d_L ≈ 2 + ln(λ) / {γ(φ + λπ/2) + γ(ψ1 + ψ2)/2 + ln(a/(a - b)) + ln(π/(ψ1 - ψ2)) + ln(4)}.    (36)
This estimate holds independent of the interpolating functions f(s) used to create smooth flows from the maps. Although the process of creating flows without dynamical equations of motion involves a fair amount of bookkeeping, it has the virtue that it is relatively straightforward and the topological structure of the strange attractor can easily be controlled: it is an input. In addition, it is likely to be easier to create models of interacting neurons through coupled geometric models of the type just described,
than by proliferating sets of dynamical equations of motion, and then searching through a very large control parameter space for the subset of values which reproduces the observed properties.
11. Chaos in higher dimensions

Before discussing chaos in higher dimensions (than 3), we first address two questions:
1. Why should we bother?
2. How is it done?
We address the first question first, since it is simpler. If a single receptor with subthreshold oscillations exhibits chaotic (deterministic nonperiodic) behavior, then two such cells (the flow is now in R^{5+5}) can also exhibit this kind of behavior. If we would like to understand realistic neural networks, we must be able to understand chaos in higher dimensions. The second question has a simple answer: 'We don't know'. However, there are some hints as to what a theory of deterministic nonperiodic behavior in higher dimensions might look like. In some sense, we have already addressed the question of chaos in higher dimensions. The modified Hodgkin-Huxley equations are definitely not a three-dimensional dynamical system: the phase space is R^5. However, there is only one unstable Lyapunov exponent, and the system is strongly contracting. This means that it is possible to project this system into a three-dimensional inertial manifold (in our case, D^2 × T^1 ⊂ R^3) in which linking numbers can be computed. If the phase space cannot be reduced to three dimensions, by projection, adiabatic elimination of slaved variables, or otherwise, then we cannot use the Birman-Williams theorem to provide information about the topological organization of unstable periodic orbits, since they do not link in spaces of dimension greater than 3. However, the most important part of the Birman-Williams theorem may not be about the organization of unstable periodic orbits at all, but simply the fact that it is possible to project down along part of the stable invariant manifold to get something closely related to the initial dynamical system, but possessing important singularities. It is, after all, the singularities (splitting points, branch lines) which identify the stretching and squeezing mechanisms in R^3. It is likely that a theorem of the following form might be valid. Arrange the Lyapunov exponents of a strange attractor as described in Eq. (17). Then
1. The flow can be projected along the most strongly contracting directions corresponding to the most negative exponents λ_{n+1}, ..., λ_N. This projects the flow into an n-dimensional inertial manifold.
2. Within the inertial manifold, the flow can be projected along the remaining more weakly contracting directions, corresponding to the remaining negative exponents λ_{p+2}, ..., λ_n. This projects the flow onto a (p + 1)-dimensional branched manifold.
In a theorem of this type it will be necessary to replace 'Lyapunov exponent' by 'local Lyapunov exponent'. That is, the Lyapunov exponents are functions of
position, λi → λi(x), where x is in the basin of the strange attractor SA. The integer n (cf., Eq. (19)) is a function of position, n(x). The dimension, n_M, of the inertial manifold containing the strange attractor is then

n_M = Max_{x ∈ SA} n(x).
Within this inertial manifold the n_M remaining Lyapunov exponents will vary in value, some even changing signs. Therefore the number of positive Lyapunov exponents, p, is also a function of position. We define the maximum number of positive Lyapunov exponents, p_M, as we defined n_M above. Then we project along a minimum number n_M - (p_M + 2) of stable directions onto a branched manifold of dimension p_M + 1. We should remark here that Lyapunov exponents are averages over local Lyapunov exponents

{λ1, λ2, ..., λ_N} = ⟨{λ1(x), λ2(x), ..., λ_N(x)}⟩.    (37)
It is possible that none of 'the' Lyapunov exponents (the averaged quantities) is positive, while one or more of the local Lyapunov exponents are positive in regions of phase space where stretching and squeezing take place. Strange attractors with no positive (averaged) Lyapunov exponents can be created in this way. Such attractors have been called 'strange nonchaotic attractors'. To understand why such a theorem could be useful, we consider first a simple case: the receptor with subthreshold oscillations. The phase space is five-dimensional, and the flow takes place in the disk D^4 × T^1 ⊂ R^5. There is a global Poincaré section for this flow, corresponding to a fixed phase in T^1. This Poincaré section is four-dimensional. The first part of the hypothetical theorem guarantees a projection into an inertial manifold D^2 × T^1 ⊂ D^4 × T^1. The global Poincaré section in this inertial manifold is now two-dimensional. The second part of the hypothetical theorem guarantees a further projection of the flow down to a two-dimensional branched manifold. The intersection of the two-dimensional branched manifold with the Poincaré section is typically (mathematics: generically) a one-dimensional manifold, M^1. Under the flow, M^1 is mapped back onto itself (first return map). This first return map must possess a singularity, otherwise it would be invertible, information about the past would not be lost, and entropy would not be generated by the flow. The theory of singularities of mappings is well known, at least in low dimensions. If we ask: what singularities can occur (generically) in mappings of M^1 to itself, the answer was given by Whitney [34]: the only local singularity is the fold map, A2. The stretching and squeezing involved in the fold return map are shown in Fig. 22a. The canonical mechanism displayed leads directly to the logistic map. Blowing this back up, as described above, leads to an orientation preserving Hénon map, described in Fig. 22b. The one-dimensional intersection M^1 can look like a piece of R^1. It can also have nontrivial boundary conditions, and be topologically equivalent to the circle T^1.
Fig. 22. Folds and strange attractors. (a) A line segment R^1 is deformed by stretching in the plane, then compressed by projecting it down to another line segment. The second line segment is mapped back to the first by an affine transformation. The result is a logistic map. (b) A long thin rectangle is deformed by stretching into a parabolic shape, then squeezed. The resulting horseshoe shape is mapped back to the original rectangle by an affine transformation. An orientation preserving map of Hénon type results.

In this case, the first return map is equivalent to the standard circle map. Blowing this up produces the annulus map. The T^1 singularity describes systems with Hopf bifurcations in general, and the periodically driven van der Pol oscillator in particular. In this way, the simplest singularity in the reduced phase space D^2 × T^1 describes the following often studied maps:
        Noninvertible    Invertible
R^1     Logistic         Hénon
T^1     Circle           Annulus
                                        (38)
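The first row of this table can be made concrete in a few lines of code. The maps and parameter values below are the standard textbook choices (they are ours, not taken from Fig. 22 itself): the fold gives a logistic-type map of R^1, and thickening it gives the invertible Hénon map, which collapses back onto the logistic map as the thickness parameter goes to zero.

```python
def logistic_like(x, a=1.4):
    """Noninvertible fold map of R^1 (the A2 singularity): x -> 1 - a x^2."""
    return 1.0 - a * x * x

def henon_like(x, y, a=1.4, b=0.3):
    """Invertible thickened version (Henon map). The Jacobian determinant is -b,
    so the sign of b fixes the orientation; as b -> 0 the second coordinate
    collapses and the dynamics reduce to logistic_like."""
    return 1.0 - a * x * x + y, b * x

x, y = 0.0, 0.0
for _ in range(1000):
    x, y = henon_like(x, y)
print(x, y)   # a point on the Henon attractor for the classic values (1.4, 0.3)
```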
By adding additional fold singularities, we are able to describe more extended nonlocal singularities, such as the scrolling which gives rise to the branched manifolds for the receptor with subthreshold oscillations. All branched manifolds in R^3 can be classified according to their singularity structure [2]. A similar treatment can be carried out in higher dimensions [35]. We give just one example to give a flavor of the riches that await our fuller understanding. We assume a flow has constant Lyapunov exponents λ1 > λ2 > λ3 = 0 > λ4 > ... > λ_{N+1} and is contained in a disk D^N × T^1. We also assume λ1 + λ2 + λ3 + λ4 < 0. Then this flow can be projected into a four-dimensional inertial manifold D^3 × T^1. There is once again a global Poincaré section. There is a further projection of the flow down to a three-dimensional branched manifold. Typically, the intersection of this three-di-
mensional branched manifold with the global three-dimensional Poincaré section is a two-dimensional manifold M^2 (by transversality, and 3 + 3 - 4 = 2). The return map induced by the flow must be singular, for the reasons expressed above. This raises a well-posed question: What are the singularities of mappings M^2 → M^2? Again, the answer has been given by Whitney [34]. The only local singularities are the fold A2 and the cusp A3. Figure 23 illustrates the stretching and squeezing generated by the cusp singularity. As with the fold, nonlocal singularities are also possible. Again, as with the fold, M^2 may have the topology of a neighborhood of R^2, or it may have more interesting boundary conditions. In fact, rather than there being only two topologically inequivalent manifolds as in one dimension, with M^1 = R^1 or M^1 = T^1, there is now a countable set including R^2, S^2 (sphere), T^1 × T^1 (torus), R^1 × T^1 (cylinder), as well as some nonorientable manifolds like the Möbius strip, the Klein bottle, and the real projective plane. Topological inequivalence has important consequences for the study of chaos. For example, in the one-dimensional cases described above, generic noninvertible maps R^1 → R^1 can have any number of nonlocal folds, but generic noninvertible maps T^1 → T^1 can only have an even number of folds, and they must be related to each other in very specific ways. Further, in a family of mappings F(a): T^1 → T^1, important precursor phenomena (e.g., mode locking) occur even before the noninvertible limit is reached. In the case of two dimensions (M^2 → M^2), in which a much larger variety of inequivalent global boundary conditions exist, the spectrum of behavior is much richer.
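The mode-locking precursor mentioned here is easy to exhibit numerically. The sketch uses the standard sine circle map as a generic family F(a): T^1 → T^1 (an assumption made for illustration; it is not a map derived in this chapter): below the noninvertible limit the winding number locks onto rational values over finite parameter intervals.

```python
import numpy as np

def circle_lift(theta, omega, K):
    """Lift of the standard sine circle map; the map of T^1 is this taken mod 1.
    For K < 1 the circle map is invertible; invertibility is lost at K = 1."""
    return theta + omega - (K / (2.0 * np.pi)) * np.sin(2.0 * np.pi * theta)

def winding_number(omega, K, n_iter=2000):
    """Mean rotation per iterate, computed from the lift (no mod 1)."""
    theta = 0.0
    for _ in range(n_iter):
        theta = circle_lift(theta, omega, K)
    return theta / n_iter

# Mode locking below the noninvertible limit: at K = 0.95 the 0/1 tongue
# extends to |omega| = K/(2 pi) ~ 0.151, so the first three drives lock to 0.
for omega in (0.05, 0.10, 0.14, 0.20):
    print(omega, round(winding_number(omega, K=0.95), 3))
```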
Fig. 23. Cusps and strange attractors. A segment of a plane R^2 is deformed by stretching in R^3, then compressed by projecting it down into a plane, producing a cusp singularity. The second plane section is mapped back to the first by an affine transformation. The plane section can be thickened, analogous to the construction shown in Fig. 22b. The resulting stretched and squeezed shape is mapped back to the original thickened plane by an affine transformation.
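A hedged numerical illustration of the cusp (the planar normal form (x, y) → (x^3 + xy, y) matches the germ displayed in Fig. 23; the counting scheme below is ours): inside the cusp-shaped region bounded by the two fold lines every image point has three preimages, outside only one.

```python
import numpy as np

def cusp_map(x, y):
    """Whitney cusp normal form for a singular map of the plane."""
    return x**3 + x * y, y

def count_preimages(u, v):
    """Number of real solutions x of x^3 + v x = u, i.e. preimages of (u, v)."""
    roots = np.roots([1.0, 0.0, v, -u])
    return int(np.sum(np.abs(roots.imag) < 1e-9))

# Along the line y = v < 0 the fold lines sit at u = +/- 2 (-v/3)**1.5;
# between them there are three preimages, outside just one.
v = -1.0
for u in (-1.0, -0.2, 0.0, 0.2, 1.0):
    print(u, count_preimages(u, v))   # -> 1, 3, 3, 3, 1
```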
More generally, it appears that there is a class of discretely classifiable branched manifolds for higher-dimensional dynamical systems. We use the ideas outlined in Section 10 and in Figs. 22 and 23 to outline the theory. Imagine a strange attractor which has (for simplicity) constant Lyapunov exponents and for which the inertial manifold is D^N × T^1. We assume that there are N_u positive and N_s negative Lyapunov exponents, so that N_u + N_s = N. Then we project the flow onto a branched manifold of dimension N_u + 1 (+1 for the flow direction). The intersection of the branched manifold with a Poincaré section defined by constant phase in T^1 is an N_u-dimensional manifold, M^{N_u}. The flow around T^1 induces a first return map: M^{N_u} → M^{N_u}. This map must possess singularities, otherwise we would not lose information about the past (this is the 'branch line argument'). This means that we can determine the backbone of the strange attractor by determining what singularities exist in mappings M^{N_u} → M^{N_u}. This is a well-defined question. Without going into generalities, we state here that among all singularities there are two infinite series of simple singularities which are already encountered in Catastrophe theory [36]: the cuspoids (A_k, k ≥ 2) and the umbilics (D_k, k ≥ 4). These singularities are defined by their germs, which are mappings R^K → R^K for which the Jacobian vanishes identically at the singular point. For A_k and D_k, K = 1 and 2. These germs have perturbations, as follows:
Singularity    Germ                                              Perturbation
A_k            x1 → f1 = x1^k                                    Σ_{j=1}^{k-2} a_j x1^j
D_k            x1 → f1 = x1^{k-2} + x2^2,  x2 → f2 = x1 x2       Σ_{j=1}^{k-3} a_j x1^j + a_{k-2} x2
If we regard the control parameters a_j as coordinates z_j, then the cuspoid singularity can be encountered stably in a space with dimension 1 (for x1) + (k - 2) (for z1, z2, ..., z_{k-2}) = k - 1. The fold singularity A2 can therefore first be encountered in one dimension, and in fact occurs generically as a local singularity in mappings R^1 → R^1. Similarly, the cusp singularity A3 first occurs in mappings R^2 → R^2. The umbilic D_k first occurs in mappings R^k → R^k (k ≥ 4), by similar arguments. Since the flow produces a singular map of an N_u-dimensional space onto itself, we can make the identification: #germ variables (x) + #unfolding coordinates (z) = #unstable directions (N_u). We now observe (cf., Fig. 22) that the fold singularity x → x^2 cannot continuously deform the line R^1 into the folded line x^2 unless there is additional room for the deformation to take place continuously. This is generally true for all singular maps. In order for the singularity of the map M^{N_u} → M^{N_u} to be created continuously, the manifold M^{N_u} must be smoothly deformed in a space of higher dimension (recall that M^{N_u} occurs in D^N, N = N_u + N_s). The minimal number of dimensions that is
required for this deformation is the dimension of the germ of the singularity: 1 for A_k and 2 for D_k. The deformation in the phase space D^N × T^1 can be treated as described in Section 10 for flows in D^2 × T^1. We first imagine that the flow takes place in a solid cylinder D^N × R^1 of length 2. Then we group the variables into three types:

x    germ variables       }
z    unfolding variables  }   unstable manifold,
y    deforming directions     stable manifold.
During the first phase of the flow, from s = 0 to s = 1, stretching takes place. During the second phase, from s = 1 to s = 2, squeezing takes place. These processes take place as follows:
(x, z, y) at s = 0   → (stretching) →   (x, z, y + r f(x; z)) at s = 1   → (squeezing) →   (εx, z, y + f(x; z)) at s = 2.    (39)
Here f(x; z) represents the perturbed singularity (= germ + perturbation). During the stretching phase, r increases from 0 to 1. During the squeezing phase, ε decreases from 1 to a sufficiently small value. For ε ≠ 0, this map is invertible. For noninvertible maps (i.e., of the branched manifold to itself) we begin with all coordinates y = 0, and go all the way to the limit ε = 0. The final step is to wrap the solid cylinder around, and identify the output at s = 2 with the input at s = 0. The most general way this can be done, without altering distances (i.e., leaving the Lyapunov exponents unchanged), is through an affine transformation (orthogonal rotation + translation). The result is: a discrete classification of dynamical systems in higher dimensions, analogous to that described in [2] for three-dimensional systems, is possible when

N_u ≥ dim(germ) + dim(unfolding),    N_s ≥ dim(germ).    (40)
In particular, stretching and squeezing mechanisms more complicated than of cuspoid type can first be encountered in seven-dimensional dynamical systems. The mechanism is the D_4^+ singularity; the dynamical system must have four unstable directions and two stable directions, in addition to the flow direction. The dimension of the strange attractor obeys 5 < d_L < 7, the branched manifold is five-dimensional, and its intersection with a Poincaré section is four-dimensional. The hypothetical theorem described at the beginning of this section has another application. If in fact it is possible to carry out the double projection as hypothesized, then the resulting branched manifold serves again as a skeleton for the flow in the inertial manifold. The branched manifold can be 'blown up' to construct a series of flow tubes in the reduced phase space. These tubes split and recombine (stretch
and squeeze) to generate flows in spaces often more complicated than the cyclic spaces D^N × T^1. Further, these flow tubes have full dimension in the reduced phase space. This means they cannot cross through each other. So although linking numbers for periodic orbits fail to exist for reduced phase spaces of dimension greater than three, linking numbers of flow tubes in inertial manifolds are well defined, no matter what the dimension. This means that the topological organization of the flow is rigidly determined. If the organization of these tubes in the reduced phase space is known, then it is possible to create higher-dimensional strange attractors without dynamical system equations of motion. The method is as described in Section 10. There is a great deal to be said for modeling dynamics geometrically, without equations. First, equations (such as the modified Hodgkin-Huxley equations) are opaque to our understanding. It is usually hard work to squeeze meaning out of a set of equations. Specifically, the topological and geometric description of the flow in phase space generated by these equations provides more information and understanding than the equations themselves. This was the message contained in Sections 8 and 9. Further, it is often difficult to find parameter values for which a set of equations exhibits chaotic behavior, and the difficulty appears to increase with dimension (this is Arnold's Principle of 'the fragility of fine things'). Finally, the effects of noise, both large and small control parameter variations, and other perturbations, can be visualized more easily in a geometric setting than in the traditional setting of coupled ordinary nonlinear differential equations.
12. Discussion and conclusions
Several different types of neuron receptors fire even in the absence of sensory inputs (subthreshold oscillations). It seems that these nerve cells are probing their surroundings (other nerve cells), and in turn are being probed, even in the absence of external stimulation. The equations originally designed to describe the electrical activity of the neuron [1] do not exhibit this phenomenon (Section 2). These equations describe 'Platonic' nerve cells: no input, no output. In order to account for subthreshold activity in many sensor neurons, the Hodgkin-Huxley equations have been modified by Braun and his colleagues [3]. The modified equations specifically take into account the differences in time scales between ion pump (slow) and ion gate (fast) processes (Section 3). By including the transfer of only two ion types, Na + and K +, across the neuron membrane, Braun et al. were able to construct a five-dimensional model of neurons which exhibit subthreshold oscillations. This model shows a complicated mixture of periodic and nonperiodic behavior as a function of ambient temperature. The bifurcation diagrams (interspike intervals vs. T), both with and without noise, strongly suggest that these equations generate chaotic behavior over some temperature ranges (Section 4). If in fact neurons do probe each other even in the absence of external inputs, then it seems that they would want to respond very quickly to any changes in ambient conditions. It is a well-known principle of en-
gineering design [36] that if a system is to respond very quickly to changes, it must be intrinsically unstable. A system operating in a chaotic regime satisfies this condition exquisitely. Indeed, if a nerve cell produces no output, or a periodic output, over a wide range of operating conditions (i.e., it is stable), then it is not responsive to inputs, and might as well be dead. If not, it soon will be, according to Darwinian laws. By integrating the five-dimensional dynamical system [3], we have shown that it is effectively three-dimensional (Section 5). In retrospect, we should be able to predict this because of the wide separation in time scales in the relaxation equations (cf., Table 1). However, multitime scaling methods are not yet sufficiently powerful to allow us to do this. Nor is there yet a useful theorem for projecting a dynamical system into a suitable lower-dimensional inertial manifold. It has recently become possible to test unequivocally for the presence of chaos in dynamical systems [2] (Section 6). This topological test probes for the stretching and squeezing mechanisms which generate chaotic behavior. It has built-in self-consistency checks and degrades gracefully with noise, as opposed to the other two (metric, dynamical) methods of analyzing data. However, at present, the topological test requires the dynamics to be 'low dimensional' (Section 7). We have confirmed that the modified Hodgkin-Huxley equations generate a chaotic voltage output in certain temperature ranges [3]. We have done this by identifying the stretching and squeezing mechanisms which operate on the (reduced) phase space of this dynamical system (Section 8). It is the 'jelly roll' or 'gâteau roulé' mechanism. This mechanism has previously been identified for the periodically driven Duffing oscillator [29] and the YAG [30] laser. As a result, it is possible to port many of the observations and predictions about behavior from these two systems to the behavior of neurons with subthreshold oscillations (Section 9). We have been able to construct a model of the flow in phase space which is appealing at an intuitive level, since it relates the spikes in an output burst to the conversion of twist to writhe in the phase space flow. Further, this model has a number of predictive capabilities, many of which have been verified. The effects of noise are twofold. One effect is intrinsic to the system under study; the other applies to all systems for which a topological analysis is applicable. In the first case, orbits of type nr are exceptionally unstable against perturbations. What makes them so unstable is the last incomplete rise before the voltage returns to its repolarization minimum. The duration of this half peak is particularly long compared to the duration of the preceding spikes. This means that noise has an enhanced effect on this feature because of its length. In particular, noise will either enhance or destroy the partial depolarization rise. That is, in the presence of noise, we have the stochastic destruction

nr → (n + 1)f or nf.
This means, for example, that noise will destroy all orbits in Table 2 containing the symbol B (A = 4f, B = 4r, C = 5f). To period three, the only periodic orbits we would expect to find in the presence of significant noise are the five orbits (A, C; AC; AAC, ACC). However, their LNs remain as described in Table 2.
The second effect of noise is more general. The higher the noise level, the more difficult it is to identify orbits of higher period. For once Murphy (author of the famous law) is on vacation: the most important orbits in this analysis methodology are the lowest period orbits. Orbits of higher period are principally used for confirmation purposes. Until the noise level reaches the stage that orbits of period one and two can no longer be located in the data, this analysis method will succeed. We have also seen that it is possible to model chaotic dynamics geometrically, without aid of dynamical system equations of motion (Section 10). Although this may not be an important point for a single isolated neuron with subthreshold oscillations, this ability can play a powerful role in studying interacting neurons. If the (full) phase space for the single neuron is D^4 × T^1, then for two neurons it is (D^4 × T^1)^{⊗2} = D^8 × T^2. At this stage it would be really useful to have an inertial manifold theorem to reduce dimension (of D^8 to D^4, for example). If the two neurons do not interact, there is not necessarily any relation between their time evolution. But if there is an interaction, their time evolution might easily become synchronized. In this case there is a further reduction T^2 → T^1. One possible mechanism for this reduction is very beautiful, and intrinsic to this model. At any temperature, the time duration of each unstable period one orbit is different. A small coupling will then act to encourage synchronization of the outputs by causing them to phase lock. This is easy because of the temporal difference of the period one orbits. The bifurcation T^2 → T^1 (T^n → T^1 for n neurons) which describes phase locking synchronization is likely to play a very important role in learning and behavior modification. We close this discussion by listing the areas in which new contributions would be very useful.
• Multitime scale analysis should be developed to the point where it is possible to determine the effective dimensionality of a system involving a number of equations of relaxation type, with different time scales.
• A theorem of the type discussed in Section 11 would be useful in several ways. First, it would relate the dimension of an inertial manifold to the spectrum of local Lyapunov exponents. Second, it would relate the local Lyapunov dimension of a strange attractor to the local Lyapunov exponents.
• Such a theorem would be even more useful if it included a prescription for the projection into an inertial manifold.
• Generalization of the Birman-Williams theorem to higher dimensions would allow projection of a strange attractor to a suitable branched manifold. These are the objects for which a classification theory exists in low dimensions. By analogy, they should be classifiable in higher dimensions, and provide caricatures for the flows generating the strange attractors, as well.
• A classification theory for higher-dimensional dynamical systems is badly needed. Otherwise, how is it possible to analyze data without knowing what we are looking for? Such a classification theory should include a description of squeezing mechanisms (such as singularity theory provides). It should also include a rich spectrum of results, depending on global boundary conditions.
• A good model for coupling between interacting neurons with subthreshold oscillations is required. Only then will it be possible to study the phenomena involved in the interactions between such neurons.

Acknowledgements
R. Gilmore thanks Prof. P. Glorieux for the comfort, support and shelter which he provided at the Université des Sciences et Technologies de Lille during his sabbatical leave. He also thanks his colleagues during this period, D. Derozier, S. Bielawski, and M. Lefranc, for exciting and stimulating discussions. X.-P. was supported in part by the Office of Naval Research, Physical Sciences Division, and by the Department of Energy. Both authors thank F. Moss for useful discussions and encouragement.
References

1. Braun, H.A., Huber, M.T., Dewald, M., Schäfer, K. and Voigt, K. (1998) Int. J. Bifur. Chaos 8, 881-889.
2. Gilmore, R. (1998) Revs. Mod. Phys. 70, 1455-1530.
3. Gilmore, R., Pei, X. and Moss, F. (1999) Chaos 9, 812-817.
4. Hodgkin, A.L., Huxley, A.F. and Katz, B. (1949) Arch. Sci. Physiol. 3, 129-150; Hodgkin, A.L., Huxley, A.F. and Katz, B. (1952) J. Physiol. 116, 424-448; Hodgkin, A.L. and Katz, B. (1949) J. Physiol. 109, 240-400; Hodgkin, A.L. and Huxley, A.F. (1952) J. Physiol. 116, 449-472, 473-496, 497-506; Hodgkin, A.L. (1951) Biol. Rev. 26, 339-409; Hodgkin, A.L. and Huxley, A.F. (1952) J. Physiol. 117, 500-544.
5. Llinas, R.R. (1988) Science 242, 1654-1664.
6. Braun, H.A., Bade, H. and Hensel, H. (1980) Pflügers Arch. 386, 1-9; Braun, H.A., Schäfer, K., Wissing, H. and Hensel, H. (1984) in: Sensory Receptor Mechanisms, eds W. Hamann and A. Iago, pp. 147-156; Braun, H.A., Wissing, H., Schäfer, K. and Hirsch, M.C. (1994) Nature 367, 270-273.
7. Schäfer, K., Braun, H.A. and Rempe, L. (1991) Experientia 47, 47-50; Schäfer, K., Braun, H.A., Peters, R.C. and Bretschneider, F. (1995) Pflügers Arch. 429, 378-385.
8. Sokabe, M., Nunogaki, K., Naruse, K., Soga, H., Fujitsuka, N., Yoshimura, A. and Ito, F. (1993) J. Neurophysiol. 70, 275-283.
9. Llinas, L.L. and Yarom, Y. (1986) J. Physiol. 376, 163-182; McCormack, D.A. and Feeser, H.R. (1990) Neurosci. 39, 103-113; Pare, D., Pape, H.C. and Dong, J. (1995) J. Neurophysiol. 74, 1179-1191.
10. Kaplan, J.L. and Yorke, J.A. (1979) in: Functional Differential Equations and Approximation of Fixed Points, eds H.-O. Peitgen and H.O. Walther, Springer Lecture Notes in Mathematics, No. 730, pp. 204-240, Springer, Berlin.
11. Eckmann, J.P. and Ruelle, D. (1985) Revs. Mod. Phys. 57, 617-656.
12. Poincaré, H. (1892) Les Méthodes Nouvelles de la Mécanique Céleste. Gauthier-Villars, Paris.
13. Smale, S. (1967) Bull. Am. Math. Soc. 73, 747-817.
14. Birman, J. and Williams, R.F. (1983) Topology 22, 47-82; (1983) Contemp. Math. 20, 1-60.
15. Duffing, G. (1918) Erzwungene Schwingungen bei veränderlicher Eigenfrequenz. Vieweg, Braunschweig.
16. van der Pol, B. (1926) Philos. Mag. 2(7), 978-992.
17. Lorenz, E.N. (1963) J. Atmos. Sci. 20, 130-141.
18. Rössler, O.E. (1976) Phys. Lett. A 57, 397-398.
19. Mindlin, G.B., Solari, H.G., Natiello, M.A., Gilmore, R. and Hou, X.-J. (1991) J. Nonlinear Sci. 1, 147-173.
20. Broomhead, D.S. and King, G.P. (1986) Physica D 20, 217-236.
21. Packard, N.H., Crutchfield, J.P., Farmer, J.D. and Shaw, R.S. (1980) Phys. Rev. Lett. 45, 712-715.
22. Mañé, R. (1981) in: Dynamical Systems and Turbulence, Warwick, 1980, eds D. Rand and L.-S. Young, Lecture Notes in Mathematics, Vol. 898, pp. 230-242, Springer-Verlag, New York.
23. Takens, F. (1981) in: Dynamical Systems and Turbulence, Warwick, 1980, eds D. Rand and L.-S. Young, Lecture Notes in Mathematics, Vol. 898, pp. 366-381, Springer-Verlag, New York.
24. Whitney, H. (1936) Ann. Math. 37, 645-680.
25. Solari, H.G. and Gilmore, R. (1988) Phys. Rev. A 37, 3096-3109.
26. Giovannini, F. and Politi, A. (1991) J. Phys. A 24, 1837-1848; Wu, Z.-B. (1996) Phys. Rev. E 53, 1446-1452; Plumecoq, J. and Lefranc, M. (2000) Physica D 144, 231-258, 259-278.
27. Fujisaka, H. and Yamada, T. (1983) Prog. Theor. Phys. 69, 32-47.
28. Brown, R., Rulkov, N.F. and Tracy, E.R. (1994) Phys. Rev. E 49, 3784-3800.
29. Gilmore, R. and McCallum, J.W.L. (1995) Phys. Rev. E 51, 935-956.
30. Boulant, G., Lefranc, M., Bielawski, S. and Derozier, D. (1997) Phys. Rev. E 55, 5082-5091.
31. Gilmore, R. (1989) (unpublished).
32. Kauffman, L.H. (1987) On Knots, Princeton University Press, Princeton, NJ.
33. Palencia, J. (1999) (unpublished).
34. Whitney, H. (1955) Ann. Math. 62, 374-410.
35. Gilmore, R. (1997) in: Applications of Soft Computing, eds B. Bosacchi, J.C. Bezdek and D.B. Fogel, Proc. SPIE, Vol. 3165, pp. 245-257, SPIE, Bellingham, WA.
36. Gilmore, R. (1981) Catastrophe Theory for Scientists and Engineers, Wiley, New York.
CHAPTER 6

Controlling Cardiac Arrhythmias: The Relevance of Nonlinear Dynamics

D.J. CHRISTINI
Division of Cardiology, Department of Medicine, Weill Medical College of Cornell University, New York, NY 10021, USA

K. HALL
Departments of Physics and Physiology, McGill University, Montreal, Que., CANADA

J.J. COLLINS
Department of Biomedical Engineering and Center for BioDynamics, Boston University, Boston, MA 02215, USA

L. GLASS
Departments of Physics and Physiology, McGill University, Montreal, Que., CANADA

Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
© 2001 Elsevier Science B.V. All rights reserved
Contents

1. Introduction
2. The electrophysiological nature of the heart
3. Arrhythmias
   3.1. Automaticity-related arrhythmias
   3.2. Reentrant arrhythmias
   3.3. Diagnosis of arrhythmia mechanisms
4. Clinical methods of arrhythmia control
   4.1. Antiarrhythmic drugs
   4.2. Radiofrequency ablation
   4.3. Implantable cardiac devices
5. Nonlinear dynamical arrhythmia control
   5.1. Control of reentrant atrioventricular nodal conduction dynamics
   5.2. Algorithmic improvements necessary for real-world arrhythmia control
6. Prospects
Acknowledgements
1. Introduction
In recent years, the study of the heart's electrical activity (called cardiac electrophysiology) has evolved from a discipline of interest primarily to physicians and physiologists to one that has caught the attention of physicists, mathematicians, and engineers. Such scientists have come to realize that cardiac dynamics are characterized by many of the same principles that underlie the physical systems with which they are intimately familiar. The corresponding influx of new (to cardiology) analyses and techniques has led to many important contributions. In particular, techniques from the field of nonlinear dynamics have led to important advancements in the understanding of cardiac dynamics [1-3] and more recently to success in one of the most important clinical problems: control of abnormal heart rhythms (arrhythmias) [4-8]. This chapter outlines the basic electrophysiological properties of the heart, the nature of cardiac arrhythmias, and the analysis and control of arrhythmias. In addition to focusing on nonlinear dynamics, we discuss the remarkable advancements made from within the medical community in order to give dynamicists a better appreciation of the current state of the art in arrhythmia management: its strengths, weaknesses, and most importantly, areas in which nonlinear dynamics might have the greatest impact. This chapter begins with an outline of the basic electrophysiological properties of the normal heartbeat (Section 2). Next, we describe the origin, dynamics, and diagnosis of arrhythmias (Section 3). We emphasize those arrhythmias for which the field of nonlinear dynamics is particularly relevant. Next, we describe the advantages and disadvantages of the current clinical methods of suppressing or curing such arrhythmias (Section 4). Finally, we describe some recent advancements in nonlinear dynamical control of temporal arrhythmias that could lead to improved arrhythmia management (Section 5).
2. The electrophysiological nature of the heart
While the primary function of the heart (pumping blood throughout the body) is mechanical, the muscular pumping contractions are the result of electrical activity. Each heartbeat is the result of a wave of electrical activity that originates near the top of the right atrium in a small region of tissue called the sinoatrial node (see Fig. 1). The sinoatrial node is known as the heart's pacemaker because it spontaneously initiates each beat (a property known as automaticity) with near-regular timing (a property known as rhythmicity) [9].

Fig. 1. A schematic showing normal conduction from the sinoatrial (SA) node, through the right and left atria (RA, LA) and into the atrioventricular (AV) node. From the AV node, the cardiac impulse passes through the bundle of His (His) and into the Purkinje fibers (PF), via which it is distributed throughout the right and left ventricles (RV, LV).
The electrical wave propagates through the atria causing them to contract. The wave then converges on a specialized structure called the atrioventricular (AV) node, which is the only electrical connection between the atria and ventricles in a normal heart. The remainder of the wall between the atria and the ventricles comprises nonconductive connective tissue. Conduction through the AV node is slow, producing a significant delay between the activation of the atria and the ventricles. This delay is essential to the heart's pumping efficiency because the delay provides the time necessary for the blood to flow from the atria into the ventricles during atrial contraction. The electrical impulse exits the AV node into the bundle of His, which is the trunk of the ventricular conduction system. From the bundle of His, the impulse enters the Purkinje fibers, which are a network of highly conductive fibers that rapidly distribute the impulse throughout the ventricles. Subsequent contraction of the ventricles pumps the blood throughout the body.
3. Arrhythmias

Disruption of cardiac impulse formation or conduction can result in an arrhythmia [10,11]. Arrhythmias that originate in the atria or atrioventricular node are known as supraventricular arrhythmias, while those that originate in the ventricles are known as ventricular arrhythmias. Arrhythmias that are characterized by abnormally slow rhythmic activity are known as bradycardia, while those that are characterized by abnormally fast rhythmic activity are known as tachycardia. Depending on the mechanism and location of the arrhythmia, the pumping capacity of the heart can be impaired to a life-threatening degree. While there are many types of arrhythmias, here we discuss two that are of particular interest to nonlinear dynamicists: abnormal automaticity and reentry.
3.1. Automaticity-related arrhythmias

Automaticity is not unique to sinoatrial-nodal tissue: atrioventricular, Purkinje fiber, and other myocardial tissue can also fire spontaneously. Because the natural firing frequencies of these secondary pacemakers are usually slower than that of the sinoatrial node, secondary pacemakers, under normal conditions, are suppressed: they are entrained by the sinoatrial-nodal rhythm and therefore fire only in response to the sinoatrial-nodal electrical impulse. However, on rare occasions, a secondary pacemaker, known as an ectopic focus, will overtake the sinoatrial-nodal rhythm and initiate a heartbeat (which has atypical conduction dynamics due to its abnormal initiation point) [10,11]. An ectopic focus may become a pacemaker when (1) its own rhythmicity becomes enhanced, (2) sinoatrial-nodal rhythmicity becomes depressed, or (3) all conduction pathways between the ectopic focus and the sinoatrial node are blocked. A persistent ectopic focus can produce a tachycardic rhythm, which may lead to a dangerous decrease in cardiac pumping efficiency. Alternatively, a single critically timed ectopic beat may cause the heart to succumb to another arrhythmia called reentry.
3.2. Reentrant arrhythmias

The normal cardiac impulse propagates from the sinoatrial node via a well-defined path and extinguishes once it has excited the ventricles. This electrical wave normally leaves behind a wake of refractoriness: tissue that cannot be reexcited immediately. On rare occasions, the propagating impulse will deviate from its typical path and propagate back into a previously depolarized area that has recovered excitability [10,11]. Depending on the conduction properties of the tissue, the impulse may propagate indefinitely around a reentrant circuit. This reexcitation, known as reentry, is an arrhythmia that is particularly interesting from a nonlinear dynamics perspective. Reentry often occurs following a myocardial infarction (a "heart attack"), an event in which one of the coronary arteries (the arteries that supply oxygenated blood to the myocardial tissue itself) becomes blocked such that cardiac cells are cut off from blood. The resulting oxygen deprivation (ischemia) can cause myocardial tissue death, leading to a scar of non-conducting tissue. A reentrant impulse might propagate around or through (via remaining strands of viable conducting tissue) such scar tissue. Reentry can also occur around normal anatomical structures such as valves. For example, atrial flutter is a reentrant supraventricular arrhythmia in which an electrical impulse propagates around the tricuspid valve [11]. Yet another form of reentry occurs when an abnormal atrio-ventricular conduction pathway
exists in the normally nonconductive atrio-ventricular wall. This anatomical structure may allow the cardiac impulse to reenter the atria after propagating through the AV node and His-Purkinje system (see Fig. 2). The impulse reactivates the atria and sends another wave through the AV node, perpetuating the cycle. Reentry typically causes tachycardia which, depending on the rate and anatomical location of the circuit, can be life threatening. The stability of reentry is highly dependent on the conduction velocity and circuit length: the rhythm can sustain itself as long as the "head" of the reentrant impulse does not catch up to the "tail". Destabilization of reentry is occasionally preceded by an oscillation [12-14] or an alternation [15,16] in the period of the rhythm. If the rhythm becomes destabilized, reentry may terminate and thereby restore the normal heartbeat. Alternatively, a reentrant wave may break into multiple wavelets of excitation, with each wave traveling into a distinct nonrefractory tissue region. This seemingly random excitation pattern is known as fibrillation. Fibrillation causes heart tissue to lose all synchronicity and rhythmicity: the muscle twitches spastically as if it were a bag of writhing worms. Fibrillation can occur either in the atria or the ventricles. Atrial fibrillation will generally lead to an abnormally fast and irregular ventricular rate, but is typically not life threatening. In contrast, ventricular fibrillation effectively eliminates the heart's ability to pump blood, a situation that leads to death within minutes if not corrected.
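The head-tail condition can be stated as a simple inequality: reentry persists only while the circuit is longer than the excitation wavelength, roughly the conduction velocity multiplied by the refractory period. The short Python sketch below is only a back-of-the-envelope illustration; the velocity, refractory period, and circuit length are invented example values, not measurements discussed in this chapter.

```python
def reentry_sustained(conduction_velocity_cm_s, refractory_period_s, circuit_length_cm):
    """Head-tail criterion: reentry persists only if the excitation wavelength
    (velocity x refractory period) fits inside the reentrant circuit."""
    wavelength_cm = conduction_velocity_cm_s * refractory_period_s
    return circuit_length_cm > wavelength_cm

# Illustrative numbers only: 50 cm/s conduction, 200 ms refractoriness, 12 cm circuit.
print(reentry_sustained(50.0, 0.20, 12.0))   # True: 12 cm circuit > 10 cm wavelength
print(reentry_sustained(50.0, 0.30, 12.0))   # False: the head catches up to the tail
```

Destabilization mechanisms such as the period oscillations and alternations cited above are, of course, not captured by this static inequality.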
Fig. 2. A schematic showing normal conduction from the sinoatrial (SA) node, through the right and left atria (RA, LA), the atrioventricular (AV) node, and the right and left ventricles (RV, LV). (The bundle of His and Purkinje fibers are omitted from this figure for simplification.) An abnormal retrograde pathway between the right ventricle and atrium produces an orthodromic reentrant tachycardia. Reprinted with permission from Ref. [8].
3.3. Diagnosis of arrhythmia mechanisms

Accurate diagnosis of the mechanism of an arrhythmia is crucial given that different arrhythmia mechanisms have vastly different treatments. Furthermore, a treatment that is therapeutic for one type of arrhythmia can exacerbate another. The mechanism of an arrhythmia can often be diagnosed on the basis of body surface measurements of the heart's electrical signal (an electrocardiogram). Intensive study of these electrical signals over the past century has produced a wealth of interpretive techniques that are invaluable for the assessment of cardiac health. Indeed, clinical cardiac electrocardiography is one of the most widely used and informative medical procedures. However, sometimes different arrhythmia mechanisms produce indistinguishable surface electrocardiograms, making it difficult to obtain a definitive diagnosis. For example, tachycardia resulting from an ectopic pacemaker can sometimes be confused with reentry. To overcome such confusion, more detailed information about the heart's electrical activity can be obtained by threading catheters through blood vessels and placing electrodes at strategic locations on the inner surface of the heart. These electrodes are used to monitor local electrical activity and deliver electrical stimuli. Such assessment of the heart's response to electrical stimuli, known as electrophysiological testing, is used to assist arrhythmia diagnosis. There are several complex discriminatory techniques currently used during electrophysiological testing. (Because of the complexity of these techniques, we refer the interested reader to Ref. [10].) However, these techniques are often difficult, cumbersome, and time consuming. It may be possible to improve such discriminatory techniques by analyzing the dynamics of the arrhythmia subsequent to delivering a sufficiently large electrical stimulus through an intracardiac electrode at the appropriate time and location. Such a stimulus causes a time-shift of the rhythm called resetting. Recently, we developed a nonlinear dynamical diagnostic technique that identifies the different spatio-temporal symmetry properties of targets and spirals by measuring the reset-response to stimuli applied at two electrodes [17]. The diagnostic technique is based on the observation that in order to reset a rhythm, the wave generated by a stimulus must interact with the source of the rhythm (i.e., the ectopic pacemaker or the tip of the reentrant spiral wave). If the stimulation electrode is some distance away from the source, a stimulus given late in the phase of the rhythm will generate a wave that collides with a wave traveling away from the source. Because these waves annihilate each other, the source will remain unaffected by the stimulus and the rhythm will not be reset. On the other hand, a stimulus given early enough can generate a wave that has time to reach the source of the rhythm and reset it. Thus, there is a critical stimulus phase defined by the latest resetting stimulus. The critical stimulus phase is different for stimuli applied at different locations (unless the electrodes happen to be the same distance from the source). Using symmetry arguments, we have shown that in the case of the ectopic rhythm, the change in the critical stimulus phase for stimuli applied at two locations can be predicted from the activation delay between the two electrodes. This is not true for
reentry. Thus, one can distinguish an ectopic rhythm from reentry by applying resetting stimuli interchangeably between two electrodes and comparing the change in critical stimulus phase with the predicted change based on activation delays. Because activation delays and resetting are routinely measured in cardiac electrophysiology clinics for other diagnostic purposes, cardiologists should be able to test the clinical viability of this diagnostic method in the near future.
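Schematically, the comparison just described reduces to a single check. The sketch below is our own illustrative rendering, not the protocol of Ref. [17]: the function name, the assumed sign convention (that for an ectopic rhythm the shift in critical stimulus phase between two stimulation sites equals the inter-electrode activation delay), the tolerance, and the numerical values are all assumptions.

```python
def classify_rhythm(critical_phase_site1_ms, critical_phase_site2_ms,
                    activation_delay_ms, tolerance_ms=10.0):
    """If the measured critical-phase shift between the two sites matches the shift
    predicted from their activation delay, the source behaves like an ectopic focus;
    a mismatch is the signature expected for a reentrant (spiral-wave) rhythm."""
    measured_shift = critical_phase_site2_ms - critical_phase_site1_ms
    predicted_shift = activation_delay_ms
    if abs(measured_shift - predicted_shift) <= tolerance_ms:
        return "consistent with an ectopic focus"
    return "consistent with reentry"

# Illustrative values only (ms).
print(classify_rhythm(180.0, 150.0, -30.0))   # shift matches the delay: ectopic-like
print(classify_rhythm(180.0, 150.0, 25.0))    # shift does not match: reentry-like
```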
4. Clinical methods of arrhythmia control

Due to the life-threatening nature of arrhythmias, the primary focus of clinical electrophysiology is arrhythmia elimination or suppression. Clinical cardiac electrophysiology is an evolving discipline characterized by rapidly improving arrhythmia identification and control techniques. Prior to the 1980s, the main method of arrhythmia control was antiarrhythmic drugs. However, the past decade has seen major advancements in other methods of arrhythmia management, most notably in the areas of radiofrequency ablation and implantable cardiac devices. In this section we discuss the advantages and disadvantages of each of these approaches to give dynamicists a better appreciation of the current state of the art in arrhythmia management.
4.1. Antiarrhythmic drugs

For many years, the only method of controlling cardiac arrhythmias was pharmacologic therapy. While newer methods have overtaken medication as the preferred method of control for many types of arrhythmias, drugs are still important tools. Antiarrhythmic drugs function in a variety of different ways: some slow impulse initiation while others prolong refractoriness. Thus, with proper drug selection, a wide variety of arrhythmias can be managed [10,11]. One disadvantage of pharmacologic therapy is that drugs can be proarrhythmic, meaning that they can actually cause life-threatening arrhythmias (usually a different arrhythmia than that for which they were prescribed). As mentioned in Section 3.3, a drug that is therapeutic for one arrhythmia may be proarrhythmic for another, hence the importance of accurate diagnosis of arrhythmia mechanism prior to pharmacologic therapy. Furthermore, a mountain of accumulating evidence from extensive clinical trials indicates that newer therapies (ablation and implantable defibrillators; discussed in the next two sections) are often more effective than antiarrhythmic drugs.
4.2. Radiofrequency ablation

Ablation, one technique that is supplanting pharmacologic therapy, is used if there exists a spatially identifiable area of cardiac tissue that is a vital component of the substrate underlying a cardiac arrhythmia (for example, an ectopic focus or one of the pathways of a reentrant arrhythmia), but is not necessary for normal conduction. In such a case, ablation, which uses energy to kill cardiac cells and make them nonconductive, destroys the arrhythmogenic substrate.
The ablation procedure is performed in the electrophysiological testing clinic where catheters are moved into the appropriate (depending on the nature of the arrhythmia) chambers of the heart and positioned against the heart wall to monitor or stimulate the tissue. Electrogram monitors, fluoroscopic imaging, and a sequence of stimulating patterns are used to determine the precise location of the arrhythmogenic substrate to be ablated. At that point, alternating radiofrequency energy is passed through the appropriate electrode, resulting in thermal tissue damage [10]. The ablation is continued until the arrhythmogenic tissue has been destroyed and the arrhythmia can no longer be induced by the appropriate stimulation pattern. Unlike other therapies, when ablation is effective, it is a cure, i.e., the arrhythmia cannot recur because the arrhythmogenic substrate has been eliminated. For this reason, ablation is often the therapy of choice for eliminating certain arrhythmias such as ectopic tachycardia and certain reentrant tachycardia. Precise localization of the arrhythmogenic tissue can be difficult. In the case of an ectopic focus, the most commonly used localization strategy is simple: find the position of earliest activation because that site must correspond to the source of the rhythm. In practice, this requires a lengthy procedure where electrodes are dragged over the endocardial surface until an early activation site is found. The placement of electrodes is directed by a process of trial and error. As a result, it is often necessary to apply radiofrequency current at several early activation sites before locating the ectopic focus. Recently, we proposed a localization technique by which the site of origin of an ectopic depolarization could be estimated from the activation sequence of electrodes placed away from the ectopic focus [18]. This technique is similar to "sound-ranging", an artillery localization strategy developed during World War I. Sound-ranging was devised for locating long-range enemy guns by listening to a gunshot at three locations. Because the listening posts were different distances from the gun, the gunshot was heard at different times. The listening posts were connected electrically so that the three detonations could be recorded on one chronograph. By measuring the time delays between the recorded gunshots, soldiers could compute the location of the enemy gun. In our method, the ectopic focus and intracardiac electrodes are analogous to the gun and listening posts, respectively. Subsequent to an ectopic depolarization, a time delay between activation of a pair of electrodes implies that one of the electrodes is closer to the ectopic focus than the other. The difference in distance from the ectopic focus is given by the product of the measured time delay and the conduction velocity. Because the curve defined by a fixed differential distance between two points is a hyperbola, the ectopic focus lies somewhere on a hyperbola defined by the time delay between activation of the two electrodes. By measuring the time delay to activation of a third electrode, the predicted location of the ectopic focus is at the intersection point of two hyperbolae in an endocardial coordinate system. Once a putative ectopic pacemaker location has been identified, how can we be sure that we have found the correct position before delivering radiofrequency energy? 
In the case of an on-going ectopic rhythm, the resetting response of the ectopic pacemaker to stimuli applied at an outlying electrode can be used to estimate the distance to the pacemaker. By ensuring that the distance to the pacemaker is less
than the ablation-lesion radius (approximately 3 mm), we can confirm that the ablation electrode is close enough to the ectopic focus to help ensure a successful ablation. In spite of the frequent success of ablation therapy, this procedure is often infeasible for the treatment of ventricular arrhythmias. One obstacle is the considerable thickness of the ventricular wall. (Supraventricular ablation is not hindered in this manner because the atrial wall is relatively thin.) Ablation lesions, which are applied to the endocardial tissue, extend only a finite distance beyond the tissue surface. Thus, if the arrhythmia substrate is deep in the heart muscle, ablation may not be successful. Additionally, for ventricular tachycardia resulting from ischemia, ablation is often hindered by the spatially extended and discontinuous nature of the scar tissue. Because of this, there are often multiple large regions of arrhythmogenic substrate, making ablation impractical. In short, while ablation is a powerful technique, it is practical for only a subset of arrhythmias.
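The sound-ranging analogy described above can be illustrated with a small numerical sketch. Note that this is not the algorithm of Ref. [18]: we replace the explicit hyperbola intersection with an equivalent brute-force least-squares search over a grid, and the electrode coordinates, conduction velocity, and focus position are invented illustrative values.

```python
import numpy as np

# Electrode positions on an idealized 2-D endocardial patch (cm); illustrative only.
electrodes = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
v = 50.0  # assumed conduction velocity (cm/s)

true_focus = np.array([1.8, 1.1])                       # the "enemy gun" (unknown in practice)
t_arrival = np.linalg.norm(electrodes - true_focus, axis=1) / v
delays = t_arrival - t_arrival[0]                       # measured delays relative to electrode 0

# Grid search for the point whose predicted delay pattern best matches the measurements.
best, best_err = None, np.inf
for x in np.linspace(0.0, 3.0, 301):
    for y in np.linspace(0.0, 3.0, 301):
        d = np.linalg.norm(electrodes - np.array([x, y]), axis=1) / v
        err = np.sum((d - d[0] - delays) ** 2)          # hyperbola condition in least-squares form
        if err < best_err:
            best, best_err = (x, y), err

print(f"estimated focus: ({best[0]:.2f}, {best[1]:.2f}) cm; true focus: (1.80, 1.10) cm")
```

With only three electrodes, the two hyperbolae can in principle intersect at more than one point; the grid search simply reports the best-matching location within the assumed patch.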
4.3. Implantable cardiac devices

For bradycardias or life-threatening ventricular arrhythmias, recent advancements in implantable cardiac devices offer promising therapeutic options [10,11]. Implantable cardiac devices are implanted above the pectoral muscle in the chest. Stimulation and detection electrode wires (leads) run from the device into the subclavian vein and pass through the right atrium into the right ventricle. The leads are secured to the myocardial wall via a remotely operated screw anchor or a passive anchor similar to a grappling hook. The simplest implantable cardiac device is a pacemaker. A pacemaker paces (i.e., applies electrical stimuli at regular intervals) the heart during bradycardia. Although the most basic pacemakers deliver a regular periodic input to the atria and/or ventricles, current pacemakers are usually more complex and typically include a sensing function to detect and adjust for spontaneous heartbeats. For example, in a demand pacemaker a stimulus is delivered only if the heart does not beat spontaneously within an allotted time window since the last beat. In a dual-chamber pacemaker, stimuli can be delivered to both the atria and the ventricles (via separate leads). One type of dual-chamber pacemaker delivers a stimulus to the atrium, and then after a fixed atrioventricular interval (e.g., 150 ms), the device delivers a stimulus to the ventricle. Both sites might have sensing functions which would inhibit stimulation of one or both chambers of the heart if a normal activation occurs. Devices for the control of tachycardia and/or fibrillation are known as implantable cardiac defibrillators. Implantable cardiac defibrillators monitor the heart's rhythm and typically use some form of rate analysis to detect arrhythmias. If tachycardia is detected, anti-tachycardia devices generally use relatively simple pacing algorithms to attempt to eliminate the tachycardia. One common algorithm paces the heart at a fixed percentage of the tachycardia rate for a preset number of stimuli. If the tachycardia persists, the pacing is repeated at a slightly faster rate. If the tachycardia continues to persist, the device shifts into defibrillation therapy.
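The tiered pacing logic just described can be summarized in a few lines of code. The sketch below is purely illustrative: the pacing fraction, step size, burst length, attempt limit, and the stand-in re-detection test are our assumptions and do not correspond to any particular device's algorithm.

```python
def antitachycardia_therapy(tachy_cycle_ms, burst_terminates, max_attempts=4,
                            start_fraction=0.85, step=0.03, burst_length=8):
    """Illustrative burst-pacing controller: pace at a fraction of the detected
    tachycardia cycle length; if the rhythm persists, pace slightly faster; after
    max_attempts, hand over to defibrillation therapy."""
    fraction = start_fraction
    for attempt in range(max_attempts):
        pacing_interval = fraction * tachy_cycle_ms
        print(f"attempt {attempt + 1}: {burst_length} stimuli at {pacing_interval:.0f} ms")
        if burst_terminates(pacing_interval):     # a real device would re-detect the rhythm here
            return "tachycardia terminated by pacing"
        fraction -= step                          # repeat at a slightly faster rate
    return "pacing failed: escalate to a defibrillation shock"

# Toy stand-in for re-detection: assume bursts faster than 265 ms terminate this rhythm.
print(antitachycardia_therapy(320.0, lambda interval: interval < 265.0))
```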
Defibrillation therapy (for the above scenario or for detected fibrillation) uses a large voltage shock to reset the electrical activity of the entire heart. If the shock fails, a higher-energy shock is delivered. This procedure is continued for a preset number of attempts. Defibrillation shocks are extremely painful, having been compared to being kicked in the chest by a horse. Fortunately, this is usually irrelevant because the arrhythmia causes the patient to faint prior to the shock delivery. Even if the patient is still conscious at the time of the shock, the pain from the shock is better than the usual alternative: death. Implantable cardiac defibrillators are revolutionary medical inventions which have saved thousands of lives. That being said, because defibrillation requires powerful battery-draining shocks that can cause pain if inappropriately applied and because antitachycardia pacing efficacy decreases as tachycardia rate increases [19], there is clearly room for enhancement of device control algorithms. Current algorithms are dynamically simplistic (brute force for defibrillation; simple ramp or burst patterns for antitachycardia) and utilize little, if any, feedback information regarding their beat-to-beat effects on the arrhythmia: the algorithms are not truly adaptive, i.e., the rates, intervals, thresholds, and other parameters do not adapt automatically on a beat-to-beat basis. (They are typically programmed, via radiofrequency communication between the device and a manufacturer-specific programming computer, at the time of the implant or during follow-up visits.) In short, the simplicity of antitachycardia therapy and the brute force of defibrillation therapy suggest that nonlinear dynamics can be exploited to produce more elegant arrhythmia control techniques.
5. Nonlinear dynamical arrhythmia control

Due to the nonlinear dynamical nature of arrhythmias, it is conceivable that control improvement could be achieved by utilizing algorithms that attempt to terminate reentrant cardiac arrhythmias by adaptively exploiting their underlying nonlinear dynamics (via beat-to-beat feedback). Several recent studies have provided evidence that such an approach may be possible via a class of control techniques known as chaos control or model-independent control. Unlike well-established model-based feedback control techniques [20] that utilize a system's governing equations (i.e., an analytical system model) to control the dynamics of a system, chaos control techniques are model-independent, i.e., they do not require explicit knowledge of a system's underlying equations. Model-independent techniques extract necessary quantitative information from system observations and then use this extracted information to exploit the system's inherent dynamics to achieve a desired control result. Thus, these techniques are inherently well suited for the control of physiological systems, for which quantitative models are typically unavailable or incomplete. Chaos control has been used to stabilize a wide range of physical systems [21-27]. Such success has fostered interest in applying model-independent control to cardiac arrhythmias. In the first such application, Garfinkel et al. [4] stabilized in vitro drug-
induced irregular cardiac rhythms in a section of tissue from the interventricular septum of a rabbit heart. Their control technique used precisely timed electrical stimuli to cause the cardiac cells to fire at a specified time in an effort to replace the irregular firing pattern with a lower-order periodic rhythm. In the idealized situation presented in Ref. [4], the stimuli achieve control by placing the system state point (the interbeat interval) onto the stable manifold (a state-space vector from which state points are attracted to the unstable periodic orbit) of a desired unstable periodic orbit. The system's state point should then obey the natural attracting dynamics of the stable manifold and move towards the unstable periodic orbit. Repeating this procedure whenever the state point is repelled from the unstable periodic orbit (as is the nature of unstable periodic orbit dynamics) should hold the state point near the orbit, i.e., regularize the interbeat intervals. Because the stable manifolds in the preparations of Ref. [4] were approximately horizontal, the chaos control algorithm had certain characteristics similar to a demand pacemaker: provided that the preceding interval was in some fixed range, interbeat intervals were truncated (via stimuli) to a value that placed them onto the stable manifold. If the dynamics are governed by a nonlinear map, this is equivalent to inserting a flat portion into the map in a manner that is expected to stabilize periodic orbits and fixed points [28]. While the control results of Ref. [4] were promising, many questions were left unanswered, primarily because period-1 control was not achieved and the mechanism of control was not established [28-33]. However, the study has fostered intense interest in applying model-independent control to cardiac arrhythmias. Much attention, including the aforementioned rabbit heart control trials of Ref. [4], has been given to chaos control of fibrillation. While suppressing fibrillation is the holy grail of cardiac arrhythmia management, there are several formidable technical hurdles standing in its path. First, fibrillation is spatiotemporal and high-dimensional, meaning that its dynamics occur in both space and time and multiple variables are required for full characterization. There exist spatiotemporal [34-37] and high-dimensional [27,38,39] chaos control algorithms, but none that appear to be appropriate for the heart, given the current technologies. For example, some spatiotemporal algorithms require perturbations applied to multiple sites, an approach that would be very difficult to incorporate into an implantable device. Second, fibrillating heart tissue degenerates rapidly because of ischemia. Thus, it is likely that the electrophysiological dynamics of fibrillating tissue are unstable and rapidly evolving. A control technique would either have to recognize and eliminate fibrillation rapidly (the ideal situation), or be able to adapt to the nonstationary dynamics. In part because of the technical difficulties of model-independent control of ventricular fibrillation, we have investigated the control of regular cardiac arrhythmias such as reentry. Termination of reentry is vital given that reentrant arrhythmias can be lethal on their own or can destabilize and degenerate into fibrillation (in which case reentry termination could prevent such fibrillation).
Additionally, because termination of reentry typically requires a lower-energy defibrillatory shock than termination of fibrillation, it may be beneficial in some cases
to use low-energy pacing to stabilize reentry prior to defibrillation. Furthermore, model-independent control of such rhythms would be an important proof of concept (especially given the uncertainties surrounding previous biological control work [28,29]) that nonlinear dynamical control techniques can control real cardiac dynamics.

5.1. Control of reentrant atrioventricular nodal conduction dynamics
To investigate model-independent control of regular cardiac arrhythmias, we have focused on stabilizing reentry associated with the abnormal atrio-ventricular conduction pathway described in Section 3.2 and illustrated in Fig. 2. Specifically, reentry can be simulated via a protocol called fixed-delay stimulation (see Fig. 3) in which the right atrium is stimulated at a fixed time interval (corresponding to the desired abnormal pathway conduction time) following detection of the cardiac impulse at the bundle of His. By reducing the fixed delay (simulating faster reentry), the rhythm becomes destabilized and the conduction time through the AV node
bifurcates and alternates in a fast-slow-fast-slow pattern known as alternans [40-43]. As mentioned in Section 3.2, this kind of alternation or oscillation of the reentry period has been observed prior to destabilization of other reentrant arrhythmias and may occur prior to degeneration into fibrillation. Since such alternans appear to be caused by a period-doubling bifurcation [15,16], there must be an unstable period-1 rhythm (i.e., a constant AV-nodal conduction time) between the alternans branches. Importantly, such an unstable periodic rhythm is the fundamental requirement of model-independent control.

Fig. 3. A schematic showing normal conduction from the sinoatrial (SA) node, through the right and left atria (RA, LA), the atrioventricular (AV) node, and the right and left ventricles (RV, LV). (The bundle of His and Purkinje fibers are omitted from this figure for simplification.) In the absence of the type of abnormal retrograde pathway depicted in Fig. 2, orthodromic reentrant tachycardia can be simulated (as depicted by the loop containing the computer) by fixed-delay stimulation of the right atrium (at time A) at an interval HA following detection of His-bundle activation (at time H). Reprinted with permission from Ref. [8].

With this in mind we attempted, using a mathematical model [15] of AV-nodal conduction, to apply model-independent control to suppress alternans, i.e., to stabilize the underlying unstable period-1 rhythm. We implemented a control scheme based on a one-dimensional map simplification of the original Ott-Grebogi-Yorke model-independent chaos control technique [44]. This control technique is applicable to systems that can be described effectively by a one-dimensional map x_{n+1} = f(x_n, p_n), where x_n is the current value (scalar) of one measurable system variable (for alternans, x is the AH interval, i.e., the time between a given atrial activation and the corresponding His-bundle activation, which represents the AV-nodal conduction time), x_{n+1} is the next value of the same variable, and p_n is the value (scalar) of an accessible system parameter p at index n (for alternans, p is the HA interval, i.e., the time between one His-bundle activation and the next atrial activation, which represents the AV-nodal recovery time). The control technique attempts to stabilize the system's state point ξ_n = [x_n, x_{n+1}]^T about a flip-saddle unstable periodic fixed point ξ* = [x*, x*]^T by perturbing p such that p_n = p̄ + δp_n, where p̄ is the nominal parameter value and δp_n is a perturbation [22,45-48] given by

δp_n = (x_n - x̂*_n)/g,    (1)

where x̂*_n is the current estimate of x* (taken in Ref. [5] as the midpoint of x̂*_{n-1} and the previous value of the state variable, x_{n-1}) and g is the control sensitivity. As described below in conjunction with Fig. 6, the initial perturbations attempt to force ξ_n onto ξ*; subsequent perturbations then hold the state point near that unstable periodic fixed point. To eliminate alternans in Ref. [5], feedback perturbations (Eq. (1)) were applied to the HA interval. Because alternans result from abnormally fast AV-nodal excitation, such as that which occurs during reentry, we only allowed HA to be shortened; otherwise control stimuli would be preempted by the natural reentrant impulses. Fig. 4 shows a control trial where the AH interval bifurcated into alternans and then was successfully moved into the unstable period-1 rhythm by HA-interval perturbations. Once the AH interval had been moved into the period-1 rhythm, small HA-interval perturbations (as shown in the insets in Fig. 4b) maintained the period-1 rhythm. The necessity of these small perturbations is shown by the fact that the system, due to the unstable nature of the period-1 fixed point, returned to the alternans rhythm when control was terminated.
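To show Eq. (1) in action, here is a minimal numerical sketch. A logistic map is used as a generic stand-in for the AV-nodal conduction model of Ref. [15]; this substitution, the gain value g = 0.3, the transient length, and the initial conditions are our own illustrative choices, and the unidirectional restriction used in the experiments is omitted for simplicity (it is treated in the sketch at the end of Section 5.1).

```python
# x plays the role of the monitored interval (AH) and the map parameter plays the
# role of the accessible parameter (HA); everything here is illustrative only.
def run(control_on, n_transient=200, n_control=60, p_nominal=3.3, g=0.3):
    x_prev, x = 0.40, 0.60
    history = []
    for n in range(n_transient + n_control):
        if control_on and n >= n_transient:
            x_star_est = 0.5 * (x + x_prev)   # midpoint fixed-point estimate, as in Ref. [7]
            dp = (x - x_star_est) / g         # Eq. (1)
        else:
            dp = 0.0
        x_prev, x = x, (p_nominal + dp) * x * (1.0 - x)
        history.append(x)
    return [round(v, 3) for v in history[-6:]]

print("uncontrolled (alternans):", run(False))   # alternates between two branches
print("controlled (period-1)   :", run(True))    # settles near the unstable fixed point (~0.70)
```

The uncontrolled map settles onto a period-2 "alternans" rhythm, while the perturbations of Eq. (1) hold it near the unstable period-1 fixed point.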
Fig. 4. (a) AH and (b) HA intervals as functions of beat number n for suppression of alternans in a rabbit heart model [15]. The two insets are magnified plots of the HA intervals at the beginning of control (n = 1000-1100) and the end of control (n = 3990-4010).

Motivated by the computational control results of Ref. [5], we used a similar control technique to suppress alternans in a series of in vitro rabbit heart experiments [7]. In five rabbit heart preparations, alternans were induced using fixed HA-interval pacing and then suppressed with model-independent control. Control results from two different preparations are shown in Fig. 5. Each panel shows AH with and without control along with the corresponding values for the control parameter HA. Fig. 5a shows the control results of one preparation. The top trace of Fig. 5a shows that the underlying unstable period-1 fixed point AH* was stabilized while control was active (beats 266 to 787).

Fig. 5. Control of alternans in two isolated rabbit heart preparations. (a) AV-nodal conduction times AH (filled circles, top trace; left y-axis) and delay times HA (open diamonds, bottom trace; right y-axis) in the first preparation. Control is implemented from beat 266 to 787 (2 min). (b) AH (filled circles, bottom trace; left y-axis) and HA (open diamonds, top trace; right y-axis) in the second preparation. The first control attempt was implemented from beat 79 to 134 with g1 = 0.3. The second control attempt was implemented from beat 219 to 255 with g2 = 0.2. Reprinted with permission from Ref. [8].
Fig. 5b shows the control results of another preparation. For this preparation, the first control attempt failed to eliminate the alternans. Control failed because g was too large (i.e., the perturbations were too small). Thus, while the alternans magnitude was reduced (bottom trace of Fig. 5b), the perturbations made to HA (top trace of Fig. 5b) were not large enough to stabilize the unstable period-1 fixed point. However, the next attempt, with a smaller g, was successful. This observation prompted us to investigate the stability of the controlled system. As previously mentioned, the control algorithm is restricted to shortening of the pacing interval HA. Similarly, all nonlinear dynamical biological control studies to date have been restricted to unidirectional perturbations. As we shall see, this restriction introduces some surprising complexity to the stability properties of the controlled system. In the experimental control of alternans, we estimated the fixed point as the midpoint of the previous two AH intervals (i.e., x̂*_n = 0.5(x_n + x_{n-1}) rather than x̂*_n = 0.5(x̂*_{n-1} + x_{n-1}) as in the numerical simulations of Ref. [5]). This slight modification of the algorithm enabled the following stability analysis of the controlled system.
Linearizing the controlled system about a fixed point at the origin gives

x_{n+1} = A x_n + Θ_n β (y_n - x_n),
y_{n+1} = x_n,    (2)

where A = ∂f/∂x and β is proportional to g^{-1} ∂f/∂p; both derivatives are evaluated at the fixed point. In the case of our cardiac experiments, A < -1 and β < 0. The Θ_n term implements the unidirectional restriction of perturbations:

Θ_n = 1 if (x_n - y_n) > 0, and Θ_n = 0 otherwise.    (3)
Thus, when Θ_n = 1 the control is turned on, and when Θ_n = 0 the control is turned off. Geometrically, the restriction of Eq. (3) means that perturbations will only be applied if ξ_n lies above the line of identity x_{n+1} = x_n (the dotted diagonal line in Fig. 6). For example, Fig. 6a shows how the restricted control algorithm of Eq. (3) stabilizes the unstable fixed point x* in a linear system with A = -4 and β = -3.1. The solid line corresponds to the uncontrolled system and the dashed lines correspond to the perturbed system; x* is at the intersection of the solid and dotted lines. In this case, the first control intervention causes the next iterate to fall below the line of identity, implying that the next iterate will be uncontrolled. Furthermore, because the first controlled iterate was less than the fixed point, the next iterate will be above the identity line. Thus, control is applied to every other iterate so that the sequence of Θ_n is 0101.... In this case, the fixed point is stabilized because control directs the system closer to the fixed point.

Fig. 6. Return maps showing the progression of control sequences as β is decreased for the linear system of Eq. (2). The dotted diagonal line is the identity line and the solid line is the uncontrolled system with slope A = -4. The dashed lines show the system when perturbed by control interventions. (a) β = -3.1 results in a stable 01 control sequence. (b) β = -3.23 results in a stable 001 sequence. (c) β = -3.4 gives an unstable 001 sequence. (d) β = -5.76 gives a stable 011 sequence. (e) β = -5.798 gives a stable 0011 sequence. (f) β = -5.82 gives an unstable 0011 sequence.
For larger values of β the fixed point is not stabilized, but the oscillatory growth of x_n is slowed (not shown). In our cardiac experiments [7], if the gain parameter g is decreased, then β becomes more negative. Fig. 6b shows the situation when β (Eq. (2)) is decreased to -3.23. As in the previous example, the first controlled iterate is below the line of identity (implying that the next iterate is uncontrolled). However, in this case the perturbation is bigger and the controlled iterate is slightly larger than the fixed point. This means that the next iterate is also below the line of identity and the control is applied in a 001001... sequence. This sequence is stable for β = -3.23 because the first controlled iterate directed the system closer to the fixed point. But if β is decreased to -3.4, we see from Fig. 6c that the same control sequence is unstable because the system is directed away from the fixed point. If g is further decreased, then a new control sequence is achieved. Fig. 6d shows a stable 011011... sequence for β = -5.76. In this case, the first perturbation is so large that the controlled iterate is also above the line of identity, implying that the next iterate is also controlled. The second controlled iterate is below the line of identity and below the fixed point, thereby giving the 011011... sequence. In this case, the fixed point is stabilized because the second controlled iterate is directed closer to the fixed point. However, larger values of β give an unstable 011011... sequence (not shown) because the second controlled iterate is directed farther from the fixed point. Just like the transition from the stable 0101... sequence to the stable 001001... sequence depicted in Fig. 6a,b, there is a transition from the stable 011011... sequence to the stable 00110011... sequence as β is further decreased (Fig. 6e shows the case for β = -5.798), and the 00110011... sequence becomes unstable as β is decreased still further (Fig. 6f depicts β = -5.82). For the linear system of Eq. (2), the progression of unstable and stable periodic control sequences continues indefinitely as β is decreased. In other words, the switching parameter Θ_n imposes the following progression of control sequences: unstable 01^k, stable 01^k, stable 001^k, unstable 001^k, unstable 01^(k+1), ..., where 1^k denotes k control perturbations in a row before the sequence repeats and k progresses from one to infinity as β is decreased from zero. Fig. 7 shows the stability zones for the linearized system. While there are an infinite number of stable zones corresponding to an arbitrary number k of consecutive control perturbations, the stability zones are bounded by the curve β = A - 2 - 2√(1 - A).

Fig. 7. Stability zones of restricted control of the linear system of Eq. (2). The k = 1 and k = 2 stability zones are indicated. The grey lines inside the zones mark the transition from 01^k to 001^k. The dots indicate that there is an infinite number of stability zones for larger numbers k of perturbations in a row. The infinite sequence of stability zones is bounded by the curve β = A - 2 - 2√(1 - A).

Although our experimental cardiac system was not linear, application of the restricted control algorithm resulted in several of the control sequences predicted in the above linear system. These control sequences were especially clear at the initiation of the control, where the control perturbations were relatively large and therefore easy to see. For example, Fig. 8a shows AH and the control parameter HA during an unstable 01 sequence. This example corresponds to the initiation of the first control sequence of Fig. 5b (the baseline values are shifted to assist in plotting the data). The first controlled beat is indicated by the arrow and corresponds to a negative perturbation of HA (all control perturbations are negative, as imposed by the switching term Θ_n).
Because the system is nonlinear, oscillatory growth of AH is quenched and the original large-amplitude alternation of AH is reduced in magnitude, but not eliminated. However, as shown in Fig. 8b (corresponding to the second control sequence shown in Fig. 5b), when g is decreased the system shifts to a stable 001 sequence that eliminates the alternation of AH. The first controlled beat is indicated by the arrow. After the fourth perturbation of HA (beat 323), the system shifts to a stable 0101 sequence. This shift is due to the close proximity of these stable zones (Fig. 7), and any noise or drift in the system can cause such transitions. Fig. 8c shows a stable 0011 control sequence that eliminated the alternation of AH in the preparation corresponding to Fig. 5a. Again, the system switches to its adjacent stable 011 control sequence shortly after the control was initiated. In fact, as the control progresses we see that the fixed point becomes transiently destabilized via an unstable 011 sequence before restabilizing in a 0011 pattern. This figure shows that successful control is dependent on proper a priori definition of the proportionality constant g. Such dependence, and the corresponding inability to ensure effective control in the presence of the nonstationarities typical of biological systems, is a critical limitation given that clinical applications cannot afford control failure of the type shown in the first control attempt of Fig. 5b. To eliminate this shortcoming, considerable work, as discussed in the next section, has been done on adaptive estimation of the proportionality constant.

Fig. 8. Experimentally observed control sequences for rabbit-heart alternans suppression trials [7]. The first control perturbation in each trial is indicated by the arrow. (a) An unstable 01 sequence corresponding to the initiation of the first control sequence of Fig. 5b. (b) The same preparation with decreased gain g (corresponding to the second control sequence shown in Fig. 5b). In this case the control begins in a stable 001 sequence and shifts to a stable 01 sequence. (c) Transitions between stable and unstable control sequences in a different preparation (Fig. 5a). In this case, the system evolved from a stable 0011 sequence to a stable 011 sequence to an unstable 011 sequence and back to a stable 0011 sequence.
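The progression of control sequences described above is easy to reproduce numerically, because Eqs. (2) and (3) specify the controlled system completely. The sketch below iterates the restricted linear system for A = -4 and the β values of Fig. 6 and reports the observed pattern of Θ_n; the initial condition, iteration count, and the simple growth/decay test are our own illustrative choices.

```python
def restricted_control(beta, A=-4.0, x0=1e-3, n_steps=60):
    """Iterate Eqs. (2)-(3): x_{n+1} = A*x_n + Theta_n*beta*(y_n - x_n), y_{n+1} = x_n,
    applying control only when the state point lies above the line of identity."""
    x, y = x0, 0.0
    thetas, amps = [], []
    for _ in range(n_steps):
        theta = 1 if (x - y) > 0 else 0
        x, y = A * x + theta * beta * (y - x), x
        thetas.append(str(theta))
        amps.append(abs(x))
    stable = max(amps[-8:]) < max(amps[8:16])          # crude decay test after the transient
    return "".join(thetas[-12:]), "stable" if stable else "unstable"

for beta in (-3.1, -3.23, -3.4, -5.76, -5.798, -5.82):
    pattern, verdict = restricted_control(beta)
    print(f"beta = {beta:6.3f}:  last Theta values ...{pattern}  ({verdict})")
```

The printed patterns reproduce the 01, 001, 011, and 0011 sequences of Fig. 6, together with their stability.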
5.2. Algorithmic improvements necessary for real-world arrhythmia control

Model-independent control of cardiac arrhythmias is hindered by a number of obstacles. Some apply primarily to fibrillation control (such as its high dimensionality and spatiotemporal nature, as mentioned in Section 5), while others, such as the need for adaptive algorithms, apply to the control of all cardiac arrhythmias. In an effort to improve model-independent cardiac arrhythmia control, we have focused our attention on improving adaptability. Specifically, we have developed a real-time, adaptive, model-independent control technique (the RTAMI technique) for chaotic and nonchaotic systems [49,50]. This technique overcomes the impractical requirement of trial-and-error estimation (as was required for the trial shown in Fig. 5b) by adaptively estimating both x* and g on-the-fly (Eq. (1)). Furthermore, this adaptability allows for the control of nonstationary systems, which are characterized by parameters that drift over time. As described in Ref. [7], control can be achieved for non-ideal values of g in the range |g|_min ≤ |g| ≤ |g|_max (for example, the stable k = 1 zone in Fig. 7). (Prior to control, it is not possible to determine |g|_min or |g|_max without an analytical system model or a learning stage.)
When control is initiated, g can be set to an arbitrary value (with the restriction that the sign of g must be correct or control will fail). After each measurement of x, x* is estimated as the average of N previous values of x. At each iteration, after x* is re-estimated, the magnitude of g is adapted in accordance with the expected perturbation dynamics [49]. If g = g_ideal, then the perturbation moves the state point from its current position ξ_n to ξ*. If |g| is too large (i.e., δp is too small), then the state point moves from its current position ξ_n to a position closer to ξ* than would be expected without a perturbation, but not as close to ξ* as for g = g_ideal. If |g| is too small (i.e., δp is too large), then the state point moves from its current position ξ_n to a position on the same side of the line of identity. (This is in contrast to the expected alternation, due to the flip-saddle nature of ξ*, of consecutive state points on either side of the line of identity.) If consecutive state points fall on the same side of the line of identity, then |g| is increased. Otherwise, |g| is decreased if it is determined that ξ_n is not converging to ξ*. Together, these criteria ensure that the RTAMI technique moves g into the range necessary for control [7]. In recent work, we have developed a computationally simple, yet robust, adaptive control technique [51]. Similar to the RTAMI technique's utilization of the state-point line-of-identity alternation, this technique keeps track of the perturbation history defined by Θ_n. If Θ_{n-4} to Θ_{n-1} has the pattern 0101 or 1010, then |g| is decreased; otherwise |g| is increased. This criterion ensures that the system remains in the stable k = 1 zone depicted in Fig. 7. This technique has been used to stabilize the alternans model of Ref. [15], the nonchaotic and chaotic Hénon map, and a drifting quadratic map [51]. Similar adaptive control algorithms will be vital for nonlinear dynamical termination of lethal reentrant arrhythmias. One reason why adaptive techniques will be necessary is that it is not possible, with existing technology, to determine the specific topology (and therefore dynamics) of a particular arrhythmia. Another reason is that arrhythmias may be nonstationary. Adaptive techniques, through their utilization of on-the-fly estimation and re-estimation of dynamical parameters, are well suited for overcoming both of these obstacles. Thus, in addition to proving that model-independent techniques can be used to control real biological systems, the aforementioned AV-nodal conduction control trials served as important means for developing and refining adaptive control algorithms. Such algorithms could serve as the foundations for clinically useful techniques that terminate reentrant arrhythmias by adaptively exploiting their nonlinear dynamics.
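A minimal sketch of the perturbation-history criterion is given below. The coupling β = sensitivity/g, the initial gain, the 1.5% adaptation step, and the iteration count are our own illustrative assumptions; the point is only to show that the 0101/1010 test drives |g| toward the stable k = 1 zone (for A = -4, β settles in the neighborhood of the values shown to be stable in Fig. 6).

```python
def adaptive_gain_demo(A=-4.0, sensitivity=-1.0, g=0.4, n_steps=300, step=0.015):
    """Perturbation-history gain adaptation in the spirit of Ref. [51]: if the last
    four Theta values alternate (0101 or 1010), decrease |g| (stronger perturbations);
    otherwise increase |g|. Here beta = sensitivity/g couples the gain to Eq. (2)."""
    x, y = 1e-3, 0.0
    thetas, gains = [], []
    for n in range(n_steps):
        beta = sensitivity / g
        theta = 1 if (x - y) > 0 else 0
        x, y = A * x + theta * beta * (y - x), x
        thetas.append(theta)
        if n >= 4:
            if thetas[-5:-1] in ([0, 1, 0, 1], [1, 0, 1, 0]):
                g *= 1.0 - step
            else:
                g *= 1.0 + step
        gains.append(g)
    return sum(gains[-50:]) / 50, abs(x)

g_adapted, amp = adaptive_gain_demo()
print(f"adapted gain ~ {g_adapted:.3f} (beta ~ {-1.0 / g_adapted:.2f}), final |x| = {amp:.1e}")
```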
6. Prospects

In this chapter, we have shown how nonlinear dynamics is being used to diagnose, locate, analyze, and/or control arrhythmias in simple theoretical and experimental
systems. To date, nonlinear dynamics have not had a major impact in generating clinically relevant therapies. However, the obvious nonlinear dynamical aspects of the heart and their parallels with the dynamics of well-defined physical systems, the innovative research being carried out on many fronts, and the importance of heart disease to the human condition, should lead to exciting developments in coming years.
Acknowledgements

We thank Jacques Billette for discussion and for assisting with experiments and David Slotwiner for valuable suggestions. This work was supported by the American Heart Association (0030028N) (DJC), the Medical Research Council of Canada (KH and LG), the Canadian Heart and Stroke Foundation (KH and LG), the Natural Sciences and Engineering Research Council (Canada) (KH and LG), and the National Science Foundation (JJC).

References

1. Glass, L., Hunter, P. and McCulloch, A., eds (1991) Theory of Heart: Biomechanics, Biophysics, and Nonlinear Dynamics of Cardiac Function. Springer, New York.
2. Panfilov, A.V. and Holden, A.V., eds (1997) Computational Biology of the Heart. Wiley, Chichester, UK.
3. Winfree, A.T. (1998) CHAOS 8, 1-19.
4. Garfinkel, A., Spano, M.L., Ditto, W.L. and Weiss, J.N. (1992) Science 257, 1230-1235.
5. Christini, D.J. and Collins, J.J. (1996) Phys. Rev. E 53, R49-R52.
6. Brandt, M.E. and Chen, G. (1996) Int. J. Bifur. Chaos 6, 715-723.
7. Hall, K., Christini, D.J., Tremblay, M., Collins, J.J., Glass, L. and Billette, J. (1997) Phys. Rev. Lett. 78, 4518-4521.
8. Christini, D.J., Stein, K.M., Markowitz, S.M., Mittal, S., Slotwiner, D.J. and Lerman, B.B. (1999) Heart Disease 1, 190-200.
9. Berne, R.M. and Levy, M.N. (1993) Physiology, 3rd Edn. Mosby-Year Book, St. Louis, Missouri.
10. Josephson, M.E. (1993) Clinical Cardiac Electrophysiology: Techniques and Interpretations, 2nd Edn. Lea & Febiger, Philadelphia, PA.
11. Prystowsky, E.N. and Klein, G.J. (1994) Cardiac Arrhythmias: An Integrated Approach for the Clinician. McGraw-Hill, New York.
12. Frame, L.H. and Simson, M.B. (1988) Circulation 78, 1277-1287.
13. Ito, H. and Glass, L. (1992) Physica D 56, 84-106.
14. Courtemanche, M., Glass, L. and Keener, J.P. (1993) Phys. Rev. Lett. 70, 2182-2185.
15. Sun, J., Amellal, F., Glass, L. and Billette, J. (1995) J. Theor. Biol. 173, 79-91.
16. Amellal, F., Hall, K., Glass, L. and Billette, J. (1996) J. Cardiovasc. Electrophysiol. 7, 943-951.
17. Hall, K. and Glass, L. (1999) Phys. Rev. Lett. 82, 5164-5167.
18. Hall, K. and Glass, L. (1999) J. Cardiovasc. Electrophysiol. 10, 387-398.
19. Rosenqvist, M. (1995) Pacing Clin. Electrophysiol. 18, 592-598.
20. Nise, N.S. (1992) Control Systems Engineering. The Benjamin/Cummings Publishing Company, Inc., Redwood City, California.
21. Ditto, W.L., Rauseo, S.N. and Spano, M.L. (1990) Phys. Rev. Lett. 65, 3211-3214.
22. Hunt, E.R. (1991) Phys. Rev. Lett. 67, 1953-1955.
23. Roy, R., Murphy, T.W., Jr., Maier, T.D., Gills, Z. and Hunt, E.R. (1992) Phys. Rev. Lett. 68, 1259-1262.
24. Petrov, V., Gáspár, V., Masere, J. and Showalter, K. (1993) Nature 361, 240-243.
25. Hübinger, B., Doerner, R., Martienssen, W., Herdering, W., Pitka, R. and Dressler, U. (1994) Phys. Rev. E 50, 932-948.
26. Jan de Korte, R., Schouten, J.C. and van den Bleek, C.M. (1995) Phys. Rev. E 52, 3358-3365.
27. Christini, D.J., Collins, J.J. and Linsay, P.S. (1996) Phys. Rev. E 54, 4824-4827.
28. Glass, L. and Zeng, W. (1994) Int. J. Bifur. Chaos 4, 1061-1067.
29. Christini, D.J. and Collins, J.J. (1995) Phys. Rev. Lett. 75, 2782-2785.
30. Sauer, T. (1997) Fields Inst. Comm. 11, 63-75.
31. Christini, D.J. and Collins, J.J. (1997) CHAOS 7, 544-549.
32. Kaplan, D.T. (1999) Preprint.
33. Christini, D.J. and Kaplan, D.T. (1999) Los Alamos National Laboratory E-Print Archive chao-dyn/9901023.
34. Auerbach, D. (1994) Phys. Rev. Lett. 72, 1184-1187.
35. Hu, G. and Qu, Z. (1994) Phys. Rev. Lett. 72, 68-71.
36. Petrov, V., Crowley, M.F. and Showalter, K. (1994) J. Chem. Phys. 101, 6606-6614.
37. Biktashev, V.N. and Holden, A.V. (1998) CHAOS 8, 48-56.
38. Auerbach, D., Grebogi, C., Ott, E. and Yorke, J.A. (1992) Phys. Rev. Lett. 69, 3479-3482.
39. Ding, M., Yang, W., In, V., Ditto, W.L., Spano, M.L. and Gluckman, B. (1996) Phys. Rev. E 53, 4334-4344.
40. Curry, P.V.L. and Krikler, D.M. (1976) Br. Heart J. 38, 882.
41. Sung, R.J. and Styperek, J.L. (1979) Circulation 60, 1464-1476.
42. Simson, M.B., Spear, J.F. and Moore, E.N. (1981) Am. J. Physiol. 240, H947-H953.
43. Ross, D.L., Dassen, W.R.M., Vanagt, E.J., Brugada, P., Bär, F.W.H.M. and Wellens, H.J.J. (1982) Circulation 65, 862-868.
44. Ott, E., Grebogi, C. and Yorke, J.A. (1990) Phys. Rev. Lett. 64, 1196-1199.
45. Peng, B., Petrov, V. and Showalter, K. (1991) J. Phys. Chem. 95, 4957-4959.
46. Petrov, V., Peng, B. and Showalter, K. (1992) J. Chem. Phys. 96, 7506-7513.
47. Pyragas, K. (1992) Phys. Lett. A 170, 421-428.
48. Gauthier, D.J., Sukow, D.W., Concannon, H.M. and Socolar, J.E.S. (1994) Phys. Rev. E 50, 2343-2346.
49. Christini, D.J. and Collins, J.J. (1997) IEEE Trans. Circuits and Systems-I 44, 1027-1030.
50. Christini, D.J., In, V., Spano, M.L., Ditto, W.L. and Collins, J.J. (1997) Phys. Rev. E 56, R3749-R3752.
51. Hall, K. (1999) Control of Abnormal Heart Rhythms. Ph.D. thesis, McGill University.
CHAPTER 7
Controlling the Dynamics of Cardiac Muscle Using Small Electrical Stimuli

D.J. GAUTHIER, S. BAHAR* and G.M. HALL**

Department of Physics, Department of Biomedical Engineering, and Center for Nonlinear and Complex Systems, Duke University, Box 90305, Durham, NC 27708, USA

Department of Physics and Center for Nonlinear and Complex Systems, Duke University, Box 90305, Durham, NC 27708, USA

*Center for Neurodynamics, University of Missouri - St. Louis, 8001 Natural Bridge Rd., St. Louis, MO 63121, USA

**The Corporate Executive Board, 2000 Pennsylvania Avenue, N.W., Suite 6000, Washington, DC 20006, USA

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
© 2001 Elsevier Science B.V. All rights reserved
Contents

1. Introduction
2. The heart as a dynamical system
   2.1. Cellular and small-tissue dynamics
   2.2. Whole-heart dynamics
3. Controlling chaos
4. Controlling cardiac dynamics
5. Prevalence of rate-dependent behaviors
6. Inducing transitions between bistable states
7. Controlling alternans
8. Outlook
Acknowledgements
References
1. Introduction
Understanding how systems evolve from one moment to the next and developing quantitative mathematical models to predict an observed dynamical behavior is an active area of research, especially when the systems are nonlinear. In a nonlinear system, the evolution of one variable depends in a nonlinear fashion on the collection of system variables. It is found that nonlinear dynamical systems can display a variety of exceedingly complex behaviors in comparison to their linear counterparts. For example, a nonlinear system can undergo a bifurcation, where one type of dynamical behavior (e.g., a steady temporal evolution of the variables) switches over to a different type of behavior (e.g., a periodic oscillation of the variables) as a system parameter is varied. Also, researchers have found that seemingly simple nonlinear systems characterized by as few as three dynamical variables can display a behavior known as chaos. In a chaotic system, the variables fluctuate in a seemingly erratic, 'noisy' fashion and they display a property known as extreme sensitivity to initial conditions where the slightest perturbation to the system completely changes its long-term state. Research on nonlinear dynamical systems displaying temporal instabilities and chaos has progressed at a truly astounding pace. It is now possible to control chaos, delay the onset of bifurcations and instabilities, and direct a trajectory to a desired target state by applying only minute perturbations to the system. The control strategies are quite general and do not require a detailed mathematical model of the system. The key idea is to design perturbations that direct the system near to or stabilize it about one of the unstable periodic orbits embedded in the system [1,2]. Motivated by successes in applying these methods to a variety of physical systems, recent attention has focused on controlling biological systems such as the heart [3]. For example, Hall et al. [4] used closed-loop feedback to suppress a period-doubling bifurcation in atrio-ventricular nodal conduction of an in vitro rabbit heart model, which is designed to mimic a human cardiac arrhythmia. Another potential application of closed-loop feedback methods is to terminate or prevent the occurrence of fibrillation by applying small electrical stimuli to the heart muscle. Fibrillation, often the cause of sudden cardiac death, is a state in which the average heart rate is elevated and the muscle contractions are spatially uncoordinated, preventing the heart from pumping blood effectively. The most common strategy for terminating fibrillation is to apply a large electric shock designed to stop temporarily all cardiac activity, with the hope that a normal rhythm will start with the next heart beat [5-7]. Such a strategy alters significantly the dynamical sate of the heart, is not always successful, and is often quite painful to the patient. A more subtle approach to this problem is to administer occasional, small electrical stimuli to the heart muscle, where the strength and timing of the shocks are 231
based on a nonlinear dynamics control algorithm. The potential success of this approach relies on the understanding that the fibrillating heart is a complex, yet deterministic, nonlinear dynamical system [8].

Several hurdles have to be overcome before this approach can be successful. One primary issue is that the heart displays complex behavior as a function of time and space [9,10], which is believed to be a manifestation of the deterministic behavior known as spatio-temporal chaos [11]. It is not clear whether the chaos control algorithms, designed originally for suppressing temporal instabilities, can be applied to fibrillation. Controlling spatio-temporal complexity using small perturbations is interesting from a fundamental nonlinear dynamics perspective because it is not yet firmly established whether a generic system can be characterized by unstable periodic orbits, and no general control algorithms have been established. One suggested approach is to place standard chaos controllers designed to suppress temporal instabilities at a few or many spatial locations [12]. A recent promising study suggests that it may be possible to stabilize in vivo human atrial fibrillation by applying control at a single spatial location [13].

In a systematic approach to controlling heart muscle dynamics, we have used an in vitro animal model consisting of a small piece of periodically paced bullfrog (Rana catesbeiana) cardiac muscle to test nonlinear dynamics control methods in a situation where spatial complexity is not an issue. Ultimately, we will explore their application to the whole fibrillating heart. Our experimental test-bed can also be used to investigate other local dynamical behaviors, such as bistability and hysteresis, which may also play a role in the initiation of fibrillation [14].

In this chapter, we present a survey of current progress in the control of cardiac instabilities and discuss some of our recent results. We first discuss general aspects of the heart as a dynamical system (Section 2), the idea of closed-loop feedback control in complex nonlinear systems (Section 3), and its application to cardiac dynamics (Section 4). We review some of our recent results on spatially localized dynamics in bullfrog cardiac muscle (Section 5), and the transitions of this system between bistable states using small perturbations (Section 6). Finally, we discuss the application of chaos control techniques to this system (Section 7) and the implications of these results for applications of control in the whole heart (Section 8).
2. The heart as a dynamical system
The heart is a complex nonlinear system, designed to pump blood effectively. Its mechanical contractions are mediated by waves of electrochemical excitation that sweep through specialized conduction systems, such as the atrio-ventricular and sino-atrial nodes and the Purkinje fibers, and through the heart muscle. These waves of excitation are followed by a refractory period when the tissue cannot respond to further stimulation, a fundamental characteristic of the generic, idealized dynamical system classified as an 'excitable medium'. In a healthy heart, the waves of excitation cause a coordinated contraction of the muscle starting near the top of the right atrium at the sinoatrial
node, progressing to the ventricles through the atrioventricular node and the Purkinje fibers, and terminating near the base of the heart. This type of behavior is known as normal sinus rhythm. In some situations, even in a nominally healthy heart, this orderly procession of waves can evolve into a complex dynamical state known as fibrillation, in which the muscle contractions are uncoordinated [8]. Fibrillation in the atria is usually not life threatening, but it can lead to an increased risk of stroke or ventricular fibrillation and often leaves a person feeling tired or lacking energy. Ventricular fibrillation, however, results in a rapid loss of blood pressure, leading to death within a few minutes if left untreated. To gain an understanding of the dynamical origins of this behavior, we undertake a brief review of cardiac electrophysiology covering the cellular-level dynamics as well as the macroscopic behavior of the whole heart.
2.1. Cellular and small-tissue dynamics

A heartbeat is initiated when specialized heart muscle cells in the sino-atrial node generate an electrical signal, known as the action potential, which then propagates through the rest of the heart by myocardial cells specialized for conduction of the electrical signal. As the wave of activation propagates through the tissue, myocardial fibers contract in response to changes in the intracellular calcium concentration initiated by the action potential [15,16].

A schematic diagram of a cardiac cell plasma membrane is shown in Fig. 1. The intracellular and extracellular solutions contain different concentrations of ions, primarily potassium (K+), sodium (Na+), chloride (Cl-), and to a lesser extent calcium (Ca2+). The interior and exterior regions are separated by a lipid bilayer approximately 3 nm thick that is impermeable to the ions. Proteins in the membrane selectively carry ions across the membrane under certain conditions of voltage and concentration. The concentration gradient of each species of ion across the cell membrane causes a potential difference across the membrane, which drives the cellular activity.
Fig. 1. An illustration of a plasma membrane. The membrane consists of a lipid bilayer that keeps the solution within the cellular structure separate from the solution surrounding the tissue. Ions are transported across the membrane wall by proteins embedded in the membrane that act as ion-selective, voltage-sensitive channels and pumps.
In the cell's non-activated or resting state, this difference is approximately -90 mV. The response of the cardiac tissue to electrical stimulation is a result of the voltage-dependent conductances of the protein channels that govern ion transfer across the membrane. In turn, the voltage across the membrane depends on the interaction between the nonlinear conductances of the proteins and the transmembrane voltage. If the voltage difference across the membrane increases above a critical level, some of the sodium channel proteins open, leading to a slight depolarization of the cell (i.e., an increase of the transmembrane potential), which in turn opens more channels. This leads to a rapid depolarization of the membrane (the upstroke of the action potential), resulting from the inflow of sodium ions through the open channels.

The primary reason that the heart is capable of beating periodically arises from the nonlinear behavior of the membrane dynamics. The nonlinearity arises from the threshold behavior of the sodium channel proteins: the proteins are only activated once the small potential difference across the membrane exceeds a critical value. While such nonlinearity is necessary for normal heart function, it is also responsible for the complex behaviors displayed by the heart, such as fibrillation.

The cardiac cell depolarization ultimately activates potential-dependent potassium channels, and an efflux of potassium follows the sodium influx. The potassium efflux results in the action potential downstroke. In cardiac cells (but not in neurons) the action potential is prolonged by an influx of calcium, which is essential for the contraction of the myocardial fibers. During this plateau, known as the refractory period, the cell is insensitive to further electrical stimulation. On longer time scales, sodium (potassium) is redistributed to the outside (inside) of the cell by another membrane protein, the Na+/K+ ATPase, which uses energy from the hydrolysis of the ATP molecule to pump the ions back to their respective sides of the membrane.

The transmembrane voltage can be measured by inserting a microelectrode through the plasma membrane using a setup shown schematically in Fig. 2. Using this setup, we measure the in vitro temporal evolution of the transmembrane voltage of a small piece of periodically paced bullfrog (Rana catesbeiana) ventricular myocardium, as shown in Fig. 3a. The trace at the bottom of the panel shows the stimulus waveform. In the bullfrog myocardium, the action potential duration is typically between 300 and 900 ms; its height is about 90 mV.

In some experiments on larger pieces of cardiac tissue, it is difficult to maintain the insertion of many microelectrodes in cardiac cells when attempting to record the electrical activity of the heart at many spatial locations because of the mechanical motion of the heart. One recently developed method for making such measurements involves the use of voltage-sensitive dyes that are embedded in the cell membrane. When the dye-impregnated tissue is illuminated with light, the amount of fluorescence emitted by the dye is linearly proportional to the transmembrane voltage. Ultra-high-speed, high-sensitivity cameras then allow the direct visualization of the transmembrane voltage [9,10]. However, the signals are usually small and the dyes can be toxic to the tissue.
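The connection between the ionic concentration gradients mentioned above and the roughly -90 mV resting potential can be made concrete with the Nernst equation for potassium, whose equilibrium potential dominates the resting state. The short sketch below is illustrative only; the concentration values are assumed, typical textbook numbers rather than values measured in this preparation.

```python
import math

# Nernst potential E = (R*T / z*F) * ln([ion]_out / [ion]_in)
R = 8.314       # J / (mol K)
F = 96485.0     # C / mol
T = 293.0       # K, roughly the 20 C bath temperature used later in the chapter

def nernst(c_out_mM, c_in_mM, z=1):
    """Equilibrium (Nernst) potential in millivolts for an ion of valence z."""
    return 1000.0 * (R * T) / (z * F) * math.log(c_out_mM / c_in_mM)

# Assumed, illustrative potassium concentrations (mM); not measured values.
E_K = nernst(c_out_mM=3.0, c_in_mM=140.0)
print(f"E_K = {E_K:.0f} mV")   # about -97 mV, close to the quoted -90 mV resting potential
```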
Fig. 2. Experimental setup to investigate in vitro the dynamics of rapidly paced cardiac muscle, consisting of three subsystems. The tissue is bathed with flowing physiological solution to keep it alive. The computer system controls the generation of brief, constant-current periodic stimuli that are applied to the tissue through two fine tungsten wires. The transmembrane potential is measured with a microelectrode, amplified, and digitized using a computer interface.

A more common approach to measuring cardiac dynamics is to use extracellular electrodes. In this technique, a fine wire is placed on the surface of the tissue and the potential difference between this electrode and another placed at a convenient reference location is measured. The extracellular signal is related to the current flowing from the cells located near the electrode and is roughly proportional to the second derivative of the transmembrane voltage. An extracellular recording from a bullfrog is shown in Fig. 3b and is compared with the transmembrane voltage recorded simultaneously using a microelectrode. While the morphology of the two signals is quite different, the beginning and end of the action potential are clear from the extracellular signal. Unfortunately, the deflection in the extracellular signal signifying the repolarization is not always detectable.

The electrodynamics of the cell membrane, such as that shown in Fig. 3, can be modeled by the equivalent electric circuit shown schematically in Fig. 4. The flow of each ion across the membrane can be treated as a current through a series combination of a battery and a variable conductance. The battery represents the voltage generated by the equilibrium concentration gradient of a particular ion. The variable conductances represent the voltage-dependent, ion-selective proteins in the membrane that move ions across the membrane. The additional current through the resistor and capacitor represents a voltage-independent ion flow across the membrane and the membrane capacitance.

Modeling the particular ion flows and the voltage across a membrane, including the nonlinear conductances, involves a large number of differential equations whose precise details depend on the cell type, the animal species [17] and the specific type of tissue [18,19]. Nevertheless, all cardiac muscles possess some of the general characteristics of excitable media, and simplified models of the tissue dynamics have been developed that give a reasonable description of the observed behaviors [20,21].
Fig. 3. Temporal evolution of the (a) transmembrane potential and (b) extracellular potential recorded from a small piece of bullfrog cardiac tissue that is stimulated periodically. The transient depolarization of the membrane is called an action potential. The stimulus waveform is shown at the bottom of the panel.
Fig. 4. An equivalent circuit model of a cardiac cell. Each ionic species (Na, K, Cl) is represented by a battery E_x in series with a variable conductance g_x, in parallel with a membrane resistance R_m and capacitance C_m, connecting the intracellular and extracellular spaces.
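As a compact way of summarizing the circuit just described, the current balance implied by Fig. 4 can be written as a single differential equation. The notation below simply names the elements in the figure; it is a sketch of one common formulation rather than an equation reproduced from the chapter:

C_m dV_m/dt = - SUM_x g_x(V_m) (V_m - E_x) - V_m/R_m + I_stim,    x in {Na, K, Cl},

where V_m is the transmembrane voltage, each g_x is a voltage-dependent conductance with battery (equilibrium) potential E_x, and I_stim is any externally applied current. Detailed ionic models differ chiefly in how the conductances g_x depend on voltage and time, and a leak reversal potential can be included in the resistive term if desired.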
Experimentally, the duration of the action potential (denoted by APD) is seen to depend approximately on the time interval from the end of one action potential to the beginning of the next, called the diastolic interval (denoted by DI). The APD is assumed to be a nonlinear function f of the DI in a form given by

APD_n = f(DI_n).    (1)

Fig. 5. Experimentally observed restitution relation of an in vitro periodically paced small piece of bullfrog cardiac muscle.
The relationship between the APD and the preceding DI is called a restitution relation, shown in Fig. 5 for a small piece of periodically paced bullfrog cardiac tissue. Notice that the slope of the restitution relation begins to level off for long DI, and there is a maximum APD. From an electrophysiological point of view, this relation makes sense because the ionic pumps take some time to restore the equilibrium ionic balance. If the tissue is stimulated before this balance is restored, the action potential is shortened. This description clearly discards much information about the specific nature of the tissue, but it is useful for understanding the gross features of cardiac dynamics [21,22].

2.2. Whole-heart dynamics
While our understanding of single-cell dynamics has developed to a high level over the past decades, our understanding of the behavior of the whole heart is still a work in progress, from both the experimental and the theoretical perspective. Whole-heart dynamics are more complicated because the heart consists of many millions of interconnected cells. Each cell is coupled to its neighbors through gap junctions, or channel proteins at the end of the cell. These channel proteins connect to the channel proteins in the neighboring cell, providing a path for current to flow from one cell to the next. This path couples the ionic concentration of a cell through diffusion to the cells around it, and allows for the coordinated spread of the action potential through the tissue. The ionic coupling between cells is responsible for the propagation of waves of excitation. It is interesting to note that the refractory period of individual cells results in the unidirectional nature of wave propagation in the heart. Cells in
front of a wave of excitation have not yet been activated, while those behind have recently been excited and consequently are refractory. This also explains why waves annihilate each other when they collide, instead of passing through each other. A detailed review of the mathematics of action potential propagation through the heart is presented, for example, by Henriquez [23].

To gain some understanding of the spatiotemporal aspects of signal propagation in the heart, we have measured the electrical activity of a sheep right atrium in vivo during normal sinus rhythm using a multi-channel electronic mapping system [24-27] in collaboration with researchers in Duke University's Experimental Electrophysiology Laboratory. In this experiment¹, a dense array of 324 fine wires embedded in a flexible substrate is affixed to the epicardium of the right atrial chamber of the heart during open heart surgery. Extracellular electrograms are recorded from the tip of each electrode and processed to detect the occurrence of a muscle contraction beneath each electrode. The temporal pattern of activation during normal sinus rhythm is shown in the series of panels in Fig. 6. Each panel corresponds to a 4.5 ms time-slice recorded during a single experiment. The gray and black dots in each panel represent the spatial locations of the electrodes. A black dot indicates the detection of an activation during the time slice. Arrows have been added to guide the eye through the time course of the wave. The wave of excitation originates in the sinus node (off the plaque in the upper left region) and propagates across the atria. The wave of excitation shown in this figure corresponds to the 'P-wave' part of the cardiac cycle and repeats approximately once a second. The total propagation time across the plaque (33 mm width) is about 31 ms.

This coordinated behavior can give way to a complex, fast rhythm known as fibrillation (even in a nominally healthy heart) for reasons that are not entirely well understood. In the laboratory, we can induce fibrillation by giving a short burst of small electrical shocks to the tissue via one of the plaque electrodes. Fig. 7 shows a series of panels recorded from the same sheep but during atrial fibrillation. These panels are similar to the ones shown in Fig. 6, except that each panel corresponds to a 9.5 ms time slice of a single time series because the waves propagate more slowly during fibrillation. Note that the various portions of the muscle are constantly contracting during fibrillation, which contrasts with normal sinus rhythm where a contraction occurs about once per second.
¹ All procedures are approved by the Duke University Institutional Animal Care and Use Committee (IACUC) and conform to the Research Animal Use Guidelines of the American Heart Association. Female 61 ± 2 kg sheep are initially anesthetized with ketamine hydrochloride (15-22 mg/kg IM). Once anesthesia is achieved, the animal is intubated with a cuffed endotracheal tube and ventilated with a North American Drager model SAV ventilator. Isoflurane gas (1-5%) is administered continuously to maintain adequate anesthesia. A nasogastric tube is passed into the stomach to prevent rumen aspiration. Femoral arterial blood pressure and the lead II electrocardiogram are continuously displayed and monitored. Blood is withdrawn every 30-60 min to determine pH, PO2, PCO2, total CO2, base excess, and the concentrations of Ca2+, K+, Na+, and HCO3-. Normal physiological levels of the above are maintained by adjusting the ventilator and by IV injection of electrolytes. The chest is opened through a median sternotomy and the heart is suspended in a pericardial cradle. A 7.5 x 3.5 cm mapping plaque is affixed to the epicardium of the right atrial wall with sutures to the pericardial cradle and the fatty tissue in the atrioventricular (AV) groove.
Fig. 6. Spatio-temporal evolution of a normal heart beat recorded in vivo from a sheep. The wave originates in the sinus node (off the electrode array near the top middle of the array). Each panel shows the location of a detected activation (black dot) during a 6 ms time-slice and the wave takes approximately 31 ms to travel across the electrode array. The array is approximately 7.5 x 3.5 cm and is affixed to the right atrium. Similar activity is observed to occur approximately once every second.
During atrial fibrillation, there is no single point of origin for the excitation, the observed activity is self-sustaining, the shape of the wave is uneven, and its speed varies widely from location to location.

It is now accepted by most researchers that the uncoordinated, seemingly random muscle contractions characteristic of ventricular or atrial fibrillation possess some degree of order analogous to the spatio-temporal complexity displayed in physical systems, such as certain chemical reactions or wide-aperture optical systems. While this important discovery has changed our view of cardiac dynamics significantly, a detailed quantitative understanding of the mechanisms that trigger the heart to enter an episode of fibrillation, or of the dynamical techniques necessary to control such an episode, remains elusive. A primary question is whether nonlinear dynamics control methods can prevent the occurrence of this behavior or terminate it. While nonlinear dynamics techniques have been used to successfully modulate local dynamics in cardiac tissue [3,4,28,29], it remains an open question whether such control techniques can be extended successfully to the spatiotemporal case of the in vivo beating heart, or whether nonlinear dynamics control methods can successfully stabilize cardiac rhythms. Other unresolved questions include the existence of precursors indicating that an episode of fibrillation is imminent and how large shocks applied to the heart act to terminate fibrillation.
Fig. 7. Spatio-temporal evolution of electrochemical activation in an in vivo fibrillating sheep right atrium. Each panel represents a 12 ms time-slice. The wave of excitation is continuous and takes approximately 120 ms to complete one cycle around the atrium.
3. Controlling chaos

Feedback schemes for controlling chaos using small perturbations to an accessible system parameter, first proposed by Ott, Grebogi and Yorke (OGY) [30], are quite general, are based on real-time measurement of the system dynamics, and do not require a detailed model of the dynamics. The key idea underlying the control schemes is to take advantage of the unstable dynamical states embedded within the system. Rather than making large changes to the system, the feedback technique applies small perturbations designed to stabilize one of these unstable states. As the system approaches the desired state, the strength of the perturbations required to keep it there vanishes, so that the feedback signal can be as small as the noise level in the system.

In the simplest implementation of the OGY idea, a dynamical variable characterizing the state of the system is measured at discrete time intervals, the difference between the measured and desired value (known a priori) is determined, and a control parameter is adjusted accordingly. For example, the dynamics of an electrical circuit can be controlled using a measured voltage as the dynamical variable and injecting current into a node of the circuit as the accessible system parameter (a procedure used in [31]). Variations of this method have been applied successfully to stabilize the dynamics of mechanical, electrical, optical, and chemical systems [1,2].

To illustrate the use of feedback control methods in the context of cardiac dynamics, we consider the seemingly simple situation in which a small piece of cardiac tissue is paced periodically by an external stimulus. As described earlier, the
dynamics of the tissue is characterized approximately in terms of the APD and DI through the nonlinear relation (1). For periodically paced tissue, there is an additional constraint between the dynamical properties of the tissue and the applied stimulus, leading to the relation

DI_n = N · PP - APD_n,    (2)

where PP is the pacing period, N is the smallest integer satisfying the condition N · PP - APD_n > DI_min, and DI_min is the minimum diastolic interval the tissue can sustain [20,21]. Combining Eqs. (1) and (2) results in a mathematical mapping relating the future APD to its past value in a form given by
APD_{n+1} = f(N · PP - APD_n).    (3)
For some tissue preparations, including bullfrog myocardium [32], a restitution relation of the form

f(DI_n) = APD_max - A e^(-DI_n/τ)    (4)
gives good agreement with experimental observations, where APD_max, A, and τ are tissue-dependent constants. For long PP, the behavior of the paced muscle settles down to a regular periodic behavior after an initial transient. In this case, DI >> DI_min and every stimulus produces an action potential whose shape and duration are identical, so that N = 1 (this is often referred to as 1:1 or period-1 behavior). The steady-state action potential duration, known as the fixed point of the map, is denoted by APD* and is given by the solution of the transcendental relation

APD* = f(N · PP - APD*).    (5)
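As a concrete illustration of Eqs. (4) and (5), the short sketch below solves the transcendental relation numerically. The parameter values are the ones quoted later in this section for the numerical example of Fig. 9; the code is illustrative, not the authors' own.

```python
import math

# Restitution relation (4) and fixed-point condition (5) with N = 1.
# Parameter values are those quoted later in this section (Fig. 9 example).
APD_MAX, A, TAU = 680.0, 400.0, 190.0   # ms

def f(di):
    """Restitution relation, Eq. (4)."""
    return APD_MAX - A * math.exp(-di / TAU)

def fixed_point(pp, apd=500.0, tol=1e-9, max_iter=10_000):
    """Solve APD* = f(PP - APD*) by damped fixed-point iteration."""
    for _ in range(max_iter):
        new = 0.5 * (apd + f(pp - apd))   # averaging keeps the iteration convergent
        if abs(new - apd) < tol:
            return new
        apd = new
    return apd

print(round(fixed_point(600.0), 2))   # about 473.96 ms, the value quoted later for PP = 600 ms
```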
It can also be determined graphically by finding the intersection of the restitution (4) and pacing (2) relations, as shown in Fig. 8. Note that the pacing relation line shown in the figure shifts to the left as PP decreases, so that APD* decreases as the pacing becomes faster.

Fig. 8. A mathematical model of the restitution relation (solid line) and the pacing relation (dashed line) describing the constraint between PP, APD, and DI. The intersection of the two curves gives APD*.

The stability of the fixed point is governed by the value of DI_min and the slope of the restitution relation evaluated at the fixed point,

μ = -(∂f/∂DI)|_(APD=APD*) = -(A/τ) e^(-(PP-APD*)/τ),    (6)
where μ is known as the Floquet multiplier. When the actual value of the APD is close to the fixed point, the temporal evolution of the action potential duration is given approximately by

δAPD_{n+1} = μ δAPD_n,    (7)

where δAPD_n = APD_n - APD*. By inspection of Eq. (7), any slight perturbation that moves the system away from the fixed point will change by a factor |μ| on each iterate of the map. Therefore, the fixed point is unstable (the perturbation grows)
whenever |μ| > 1. In addition, the fixed point will become unstable whenever PP - APD* < DI_min. As PP decreases (thereby decreasing APD* and increasing |μ|), the fixed point becomes unstable and the tissue response switches over to a new pattern whose form depends on the precise values of the model parameters. Such a transition in the response pattern as a system parameter is swept (PP in this case) is known as a bifurcation and is one typical characteristic of a nonlinear dynamical system.

For tissue where DI_min is large and the restitution relation is shallow, the 1:1 behavior first becomes unstable when PP - APD* < DI_min and is replaced by a pattern in which an action potential is induced for every other stimulus (N = 2). That is, the pacing is so rapid that the stimulus occurs during the period when the tissue is refractory and hence cannot respond with a new action potential. An action potential is only induced on the next stimulus, after the tissue has recovered. This is known as a subcritical bifurcation and the response pattern is often referred to as 2:1. Experimental observations of this behavior are described in Section 5.

In a situation where DI_min is small (PP - APD* > DI_min for all PP considered) and the tissue is characterized by a steep restitution relation, the 1:1 response pattern becomes unstable when μ < -1 and is replaced by a pattern in which an action potential is induced by every stimulus but the duration alternates between two different values. This pattern is referred to as alternans, period-2 behavior, or 2:2 behavior by various authors. Alternans is sometimes observed in human electrocardiograms recorded from electrodes placed on the body surface, and can be correlated with an increased risk of a future episode of sudden cardiac death. Hence, it is important to understand the mechanisms giving rise to alternans and methods for controlling or suppressing this dynamical state.

The transition to alternans as PP is varied can be visualized using a bifurcation diagram, as shown in Fig. 9, for tissue parameters APD_max = 680 ms, A =
400 ms, τ = 190 ms, and DI_min = 80 ms. To generate the diagram, map (3) is iterated to determine its asymptotic behavior for each value of PP, and the resulting APD for several iterates of the map are plotted. The value of PP is then changed and the procedure repeated. For long PP, only one value of APD appears, indicating a stable 1:1 pattern. For decreasing PP, the APD shortens, consistent with normal cardiac tissue restitution properties. At PP = 631 ms, an abrupt transition from a 1:1 to a 2:2 pattern occurs, where it is seen that the APD alternates between short and long values and μ < -1. This type of transition is called a forward or supercritical bifurcation [33]. Even though the 1:1 pattern is unstable for PP < 631 ms, there still exists a solution to Eq. (5) for APD*. The value of this unstable fixed point as a function of PP is shown as the dashed line in Fig. 9. A properly designed controller can stabilize this state (the 1:1 pattern), thereby suppressing alternans.

Fig. 9. A numerically generated bifurcation diagram for a simple model of periodically paced cardiac muscle. For PP larger than 631 ms, the response pattern is 1:1. A supercritical bifurcation occurs at 631 ms, leading to alternans for faster pacing. The dashed line indicates the location of the unstable 1:1 pattern.

In general, feedback control of dynamical systems involves measuring the state of the system (characterized by APD in our model), generating an appropriate feedback signal, and adjusting an actuator that modifies an accessible system parameter (PP in our model) by an amount ε_n. The size of the feedback signal can be small, and little energy is expended by the controller when the system is stabilized to the desired state because the scheme stabilizes a state that already exists in the system but is unstable in the absence of control.

To illustrate suppression of alternans using our simple mathematical model, we consider closed-loop proportional feedback. Proportional control seeks to minimize the difference between the current state of the system and a reference state, which we take as APD*. The algorithm uses an error signal given by
ε_n = γ(APD_n - APD*),    (8)
where γ is the feedback gain, which is used to adjust the pacing period in real time at each iterate as

PP_n = PP* + ε_n,    (9)
where PP* is the nominal pacing period. Note that ε_n = 0 when control is successful (APD_n = APD*), so that PP_n = PP*. Therefore, the controller does not modify the location of the fixed point, only its stability. The dynamics of the tissue in the presence of control is given by

APD_{n+1} = f(PP_n - APD_n),    (10)
whose steady-state solutions are still equal to APD* since the controller does not affect the location of the fixed point of the system. Its behavior in a neighborhood of the fixed point is governed by

δAPD_{n+1} = μ(1 - γ) δAPD_n.    (11)
To suppress alternans and stabilize the 1:1 pattern, the feedback gain must be chosen so that the condition |μ(1 - γ)| < 1 is satisfied. When γ = 1, any perturbation from the fixed point δAPD_n is driven to zero on the very next iterate of the map. Note that the presence of chaos is not required for this method to be successful.

A simple numerical illustration of a proportional control experiment is shown in Fig. 10. The map (10) is iterated initially with γ = 0 for PP* = 600 ms and APD_n is plotted as a function of n. This tissue model displays alternans with APD* = 473.96 ms and μ = -1.084, so that it should be possible to stabilize the 1:1 response pattern for 0.076 < γ < 1.92. Control is initiated with γ = 0.15, as indicated by the vertical dashed line (Fig. 10a). Initially, large adjustments to the pacing period are made that are designed to direct the system to the 1:1 state. Eventually, the perturbations become vanishingly small as the system approaches the fixed point. Once control is turned off (vertical dotted line), the fixed point is once again unstable and the system returns to alternans. The transient approach to the fixed point is rather long in this situation since the feedback gain is chosen close to the boundary of the domain of control. It can be made significantly shorter by choosing the gain closer to the optimum value, as shown in Fig. 10b for γ = 0.9.

Fig. 10. Temporal evolution of APD as a function of the beat number n, demonstrating the effects of feedback control. Control is turned on during the interval between the vertical dotted lines with (a) γ = 0.15 and (b) γ = 0.9.

Experiments demonstrating control of alternans in small pieces of periodically paced cardiac muscle, using a modified control algorithm that adapts to changes in the tissue parameters, are described in Section 7.

For simplicity, we have considered feedback control of a fixed point of a one-dimensional dynamical system. Successful application of these control methods is not limited to such simple systems. Recent research has investigated the application of other linear and nonlinear control schemes to moderate-dimension systems [34,35]. While the concepts described in this section are intriguing, additional research is needed to address how to apply these methods to the whole heart during fibrillation, where spatial complexity is an issue. In addition, it is difficult to modify the local pacing period in the whole heart, so different methods for modulating the local
Controlling the dynamics of cardiac muscle using small electrical stimuli
(a)
245
700 600
E
500
cl n <
400
f
.e
9'....~.__
,ll %
Control Off
300
:. ..
9
Control On
Control
: 9
Off
**
%
200
(b)
700 600 9
(/)
E a. <
500
.l
|
i
iii
-
-
~
9
400
9
Control Off
300
.
9
9
Control On
.
9
Control Off
9
200 0
,
,
,
1 O0
200
300
cardiac dynamics are needed. In the next section, we review some of the current research directions.
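The following sketch reproduces, in a few lines, the kind of numerical experiment summarized above: it iterates the controlled map of Eq. (10) with the proportional error signal of Eqs. (8) and (9), starting from a small perturbation about the fixed point. It is an illustrative re-implementation using the parameter values quoted above, not the authors' code, and it assumes the diastolic interval stays above DI_min (true near the fixed point for these parameters).

```python
import math

APD_MAX, A, TAU = 680.0, 400.0, 190.0     # ms, parameters quoted for Fig. 9
PP_STAR, APD_STAR = 600.0, 473.96         # nominal pacing period and fixed point
MU = -(A / TAU) * math.exp(-(PP_STAR - APD_STAR) / TAU)   # Floquet multiplier, Eq. (6)

def f(di):
    """Restitution relation, Eq. (4)."""
    return APD_MAX - A * math.exp(-di / TAU)

def run(gamma, n_beats=30, delta0=1.0):
    """Iterate APD_{n+1} = f(PP_n - APD_n) with PP_n = PP* + gamma*(APD_n - APD*)."""
    apd = APD_STAR + delta0
    deltas = []
    for _ in range(n_beats):
        pp = PP_STAR + gamma * (apd - APD_STAR)    # Eqs. (8) and (9)
        apd = f(pp - apd)                          # Eq. (10)
        deltas.append(apd - APD_STAR)
    return deltas

print(f"mu = {MU:.3f}")                     # about -1.084, as quoted in the text
free = run(gamma=0.0)
controlled = run(gamma=0.9)
print(abs(free[1] / free[0]))               # roughly |mu| > 1: perturbation grows without control
print(abs(controlled[1] / controlled[0]))   # roughly |mu*(1-gamma)| ~ 0.11: control damps it, Eq. (11)
```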
4. Controlling cardiac dynamics

The idea that small electrical stimuli can affect the dynamics of the heart is not new. It is well known that arrhythmias such as atrial flutter and ventricular tachycardia can be initiated and terminated by one or more properly timed stimuli [36]. Unfortunately, attempts to interrupt atrial or ventricular fibrillation have been less successful. Allessie et al. [37] have successfully entrained the dynamics of a spatially localized portion of the myocardium during atrial fibrillation using rapid pacing. This procedure did not result in defibrillation, however; complex dynamics reappeared after the pacing was terminated. Similarly, KenKnight et al. [38] captured the local dynamics during ventricular fibrillation but also did not achieve defibrillation.

Approaching the problem from a different perspective, Garfinkel et al. [3] have demonstrated that it is possible to stabilize cardiac arrhythmias in an in vitro, heavily medicated small piece of the intraventricular septum of a rabbit heart by administering small, occasional electrical stimuli. The protocol is referred to as proportional perturbation feedback (PPF) and it is a variation of the feedback scheme described
in the previous section. In terms of the behavior of the heart, the scheme uses feedback to stabilize periodic beating (the unstable dynamical state) and destabilize complex rhythms using small perturbations. Crucial to this strategy is the concept that the heart can display two different dynamical behaviors under essentially identical physiological conditions, such as normal sinus rhythm and fibrillation, or tachycardia and fibrillation, for example.

Note that variations of this method for controlling cardiac dynamics have been suggested [39], some of which are simpler and more robust [40]. The PPF control strategy has also been used by Schiff et al. [41] to stabilize the electrical behavior of the rat hippocampus, suggesting that it may be possible to develop an intervention protocol for epilepsy. While these results are intriguing, they are surrounded by some controversy because of the criterion used to detect chaos in the biological preparations. Pierson and Moss [42] recently investigated the influence of noise on the analysis procedure and found that it is indeed capable of detecting unstable periodic orbits even in the presence of large amounts of noise. Also, Christini and Collins [43] have suggested that PPF can be used even in situations where the dynamics is driven by stochastic, rather than deterministic, influences; they require only the existence of unstable periodic orbits.

More recently, Hall et al. [4] demonstrated that an adaptive controller can be used to suppress temporal instabilities in an atrio-ventricular nodal conduction system known to exhibit alternans. The control protocol is based on a comparison of the most recently observed interbeat interval with the previous interval. This method has been analyzed by Gauthier and Socolar [44] and is similar to the one used to control the dynamics of physical systems [31].

Note that the results of Garfinkel et al. [3] and Hall et al. [4] only address specialized systems of the heart, but not necessarily the muscle of the atrial or ventricular walls, which are the primary substrates for fibrillation. While these results are intriguing, they demonstrate only that temporal complexity of a dynamical system can be controlled. There is not yet a general approach to controlling systems that display spatio-temporal complexity such as that displayed by the heart during fibrillation [8-10]. Toward this goal, Glass and Josephson [45] established criteria for resetting and annihilation of reentrant arrhythmias; Biktashev and Holden [46] demonstrated that proportional feedback control can induce drift of spiral waves in a model of cardiac muscle; Aranson et al. [47] showed that external stimuli can stabilize meandering spiral waves; Watanabe and Gilmour [39] proposed a strategy that uses small stimuli to prevent cardiac rhythm disturbances; and Fenton et al. [12] showed that breakup of spiral wave patterns can be suppressed by applying electrical stimuli at many spatial locations. In addition, preliminary research by Ditto and collaborators [13] suggests that it is possible to capture at least a portion of a fibrillating human atrium using the PPF method, where the heart dynamics is recorded and control stimuli are delivered through a quadrupolar electrode catheter inserted into the right atrium through the femoral vein. Due to the limited number of sensors, it could not be determined whether the entire atrium was controlled. Unfortunately, atrial fibrillation was terminated only rarely using this method.
There are several issues that must be addressed before these recent results can be put into practice. In the work of Garfinkel et al., Biktashev and Holden, Watanabe and Gilmour, and Fenton et al., the use of a spatially uniform forcing function or the application of perturbations at numerous spatial locations is assumed, a requirement that is unlikely to be satisfied in practice. Also, the results of Aranson et al. and Glass and Josephson were obtained using a simplified model of cardiac dynamics whose properties are, in many aspects, different from the properties of cardiac muscle. Finally, experiments using many sensors at different spatial locations [9,10,24-27] are needed to determine whether the methods used in the preliminary work of Ditto and collaborators can successfully capture the entire atrium.

Our approach to addressing these issues involves several steps. We believe that fibrillation can be controlled optimally in a clinical setting using feedback control methods at one or a few spatial locations on or in the heart, or by a combination of control and synchronization methods. To achieve this long-term goal, we believe that it is crucial to develop a quantitative understanding of how control stimuli interact with cardiac muscle that has passed through a bifurcation, since these bifurcations are thought to be responsible for initiating fibrillation [48]. Our first step is to study small pieces of cardiac muscle where spatial complexity is not important. In these studies, precise experiments will be compared quantitatively to ionic-based cardiac mathematical models. The next steps involve extending the work to one, two, and three spatial dimensions, both experimentally and theoretically. The following sections describe our ongoing research investigating the dynamics and control of small pieces of cardiac muscle.

This methodical approach of comparing experiments and mathematical models, starting with 'simple' preparations, has been followed by several research groups with great success. For example, phase resetting and phase locking in rapidly paced chick cell aggregates can be predicted accurately by an ionic model, as demonstrated by Clay et al. [49] and Kowtha et al. [50], respectively; Wenckebach periodicities in rapidly paced, enzymatically dissociated guinea pig ventricular myocytes can be predicted by the Beeler-Reuter model, as demonstrated by Delmar et al. [51]; Roth [52] has shown that the induction of reentry by fast pacing through unipolar electrodes observed by Lin et al. [53] can be explained using a two-dimensional bidomain model with unequal anisotropy ratios and the Beeler-Reuter membrane dynamics; and Roth and Krassowska [54] have obtained quantitative agreement between experiment and theory for the induction of reentry in cardiac tissue by taking into account the manner in which electric fields alter the transmembrane potential. We believe that this approach is also important for research on controlling cardiac dynamics using small perturbations, consistent with the advice of Holmes during his presentation at the Workshop on Mathematical Approaches to Cardiac Arrhythmias: 'Many of these [nonlinear dynamics] methods are in their infancy and we are unaware of their limitations. It is therefore best to proceed with caution and to attempt careful studies of "simple" systems under well-controlled conditions (unless we are more interested in press coverage than scientific progress)' [55].
5. Prevalence of rate-dependent behaviors

We have investigated the dynamics of small pieces of periodically paced cardiac muscle, without the possibility of spatial instabilities, before investigating how the spatially localized behaviors fit into the more complex spatially extended behaviors. We measure the tissue response while varying parameters such as the pacing interval and strength in a systematic way. For example, under various conditions, S stimuli can elicit R responses (S:R behavior). A wide range of these responses has been observed depending on the type of cardiac tissue, animal species, or stimulus parameters, such as frequency, amplitude, and shape. In order to determine the prevalence of different rate-dependent behaviors, we investigated the response of small pieces of bullfrog (Rana catesbeiana) ventricular myocardium to periodic electrical stimulation [22]. We concentrated on rate-dependent behaviors in cardiac muscle because of the wide range of excitation rates that occur in both healthy and pathological cardiac tissue. We explored the range of dynamical behaviors in a large number of animals to determine the relative prevalence of different dynamical states because control protocols must be able to deal with every behavior occurring in a population.

In our experiments, the heart was excised from adult animals of either sex, from 4 to 8 in. in length². After the pacemaker cells were cut away, a small piece (less than about 3 x 3 x 5 mm) of ventricular myocardium was removed, placed in a chamber, and superfused with a recirculated physiological solution. The tissue, quiescent in the absence of an applied stimulus, was paced with 4 ms-long square-shaped current pulses applied through two fine (51 μm) tungsten wires set ~2 mm apart on the surface of the tissue. The amplitude of the applied stimulus was typically ~0.1-0.2 mA, twice the current necessary to elicit a response, so that the experiments were conducted in a parameter regime where only 1:1, 2:2 and 2:1 dynamics should occur [56]. Intracellular (transmembrane) voltages were typically measured within 1-2 mm of the stimulus electrodes using a glass micropipette. Before initiating data collection, the tissue was paced at a PP of ~1000 ms for about 20 min.

² All procedures are approved by the Duke University Institutional Animal Care and Use Committee (IACUC) and conform to the Research Animal Use Guidelines of the American Heart Association. Bullfrogs are anesthetized using 1% wt/vol 3-aminobenzoic acid ethyl ester mixed with cold tap water, double pithed, and dissected by cutting along the ventral side of the body. Physiological solution (100 ml/min flow rate) is maintained at a temperature of 20 ± 2°C and contains 110 mM NaCl, 2.7 mM KCl, 1.5 mM MgCl2, 1.8 mM CaCl2, 5.6 mM glucose, 2.8 mM Na2HPO4 and 1 mM HEPES. It is buffered with NaHCO3 and bubbled continuously with 95% O2 and 5% CO2 to maintain a pH of 7.4 ± 0.1. To reduce motion of the tissue due to contractions, 5-20 mM 2,3-butanedione monoxime (DAM) is sometimes added. We find that DAM does not affect the gross dynamical features of this preparation.

A bifurcation diagram is a convenient way to summarize observations of the long-term dynamical behavior of the tissue as PP is varied slowly. We obtained such a diagram by recording the APD while adjusting PP across a wide range of physiological values, from 1200 to 300 ms in 100 or 50 ms intervals (downsweep), and then from 300 to 1200 ms, again in 100 or 50 ms steps (upsweep). For each PP, the response of the tissue to the first 5-10 stimuli was discarded in order to eliminate
transients, and the subsequent behavior was recorded for up to 10 s. After discarding transients, the width of each action potential was determined at 70% of full repolarization and plotted at each PP.

Fig. 11a shows a bifurcation diagram for an animal that did not display alternans (2:2 pattern). For long PP, the tissue responds in a 1:1 pattern and it is seen that the APD decreases for decreasing PP (open circles). As the PP was decreased from 500 to 400 ms, an abrupt increase in the APD occurred, highlighted by the thin vertical arrow, indicating the bifurcation from a 1:1 to a 2:1 pattern. After the smallest PP of our pacing protocol was achieved, the PP was increased slowly (closed triangles). The tissue remained on the 2:1 branch for a wide range of PP values that elicited a 1:1 response during the downsweep. At a PP between 700 and 800 ms, the tissue response made an abrupt transition back to 1:1 behavior. The coexistence of two stable behaviors (response patterns) for a single system parameter (the PP in this case) is known as bistability.

We found that bistability between 1:1 and 2:1 patterns exists in 17 of 23, or 74%, of the cardiac preparations studied. The window of bistability was situated over a range of PPs that lies near the natural period of the resting heart (~1000 ms), and extends for ~160 ms. We found that 2:2 behavior (alternans) occurs in 8 of 23, or 35%, of cardiac preparations. In all cases where 2:2 behavior was observed, bistability also occurs between 2:2 and 2:1 dynamics. We found that the stimulus that does not elicit a response in the 2:1 state does, in fact, have an effect on the tissue dynamics. Furthermore, we found that the bifurcation to 2:1 behavior is subcritical, and that unstable periodic orbits did not occur for pacing intervals shorter than that at which the tissue undergoes a transition to a 2:1 pattern. Most significantly, our findings indicate that bistability is highly prevalent in bullfrog cardiac tissue, and thus optimal controllers must be designed with sufficient flexibility to deal with bistability as well as unstable periodic orbits.
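The APD measurement described above (the width of each action potential at 70% of full repolarization) can be expressed compactly in code. The sketch below uses a simple threshold definition and a synthetic trace; it is an assumption-laden illustration, not the authors' analysis software.

```python
import numpy as np

def apd_at_repolarization(t_ms, v_mV, repol_fraction=0.7, discard=10):
    """Width of each action potential at the level reached after `repol_fraction`
    of the recovery from peak back to rest (a simple stand-in for the 70% criterion).
    The first `discard` responses can be dropped to eliminate transients."""
    v_rest, v_peak = np.min(v_mV), np.max(v_mV)
    level = v_peak - repol_fraction * (v_peak - v_rest)
    above = (v_mV > level).astype(int)
    rising = np.flatnonzero(np.diff(above) == 1) + 1
    falling = np.flatnonzero(np.diff(above) == -1) + 1
    # Keep only complete action potentials that both start and end inside the record.
    falling = falling[falling > rising[0]] if rising.size else falling[:0]
    n = min(rising.size, falling.size)
    apds = t_ms[falling[:n]] - t_ms[rising[:n]]
    return apds[discard:]

# Purely illustrative synthetic train: 600 ms "action potentials" every 1000 ms.
t = np.arange(0, 20_000, 1.0)                       # 20 s sampled at 1 kHz
v = -85.0 + 100.0 * ((t % 1000) < 600)
print(apd_at_repolarization(t, v, discard=5)[:3])   # ~600 ms each
```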
Fig. 11. (a) Bifurcation diagram showing 2:1 and 1:1 bistability and hysteresis. (b) Plot of APD as a function of DI for the data shown in (a). The vertical line at DI = 285 ms indicates the diastolic interval at which the slope of the N = 1 branch becomes greater than unity. For both plots, open circles (closed triangles) are APD measurements as the PP is decremented (incremented).
We attempted to fit our observations to the simple one-dimensional mapping (3). Note that the 1:1 and 2:2 patterns observed in the experiments correspond to N = 1, and the 2:1 pattern corresponds to N = 2. This model assumes that a stimulus that fails to elicit an action potential has no effect on the subsequent dynamics. The mapping (3) predicts that the 1:1 behavior (N = 1) should be unstable when |μ| > 1 and that, in a plot of APD as a function of DI, all data points should fall on a single curve (defining f). As shown in Fig. 11b, which shows the dependence of f on DI_n for the data displayed in Fig. 11a, one value of DI can give rise to two different values of APD, depending on whether the tissue response is on the N = 1 (solid line) or N = 2 (dotted line) branch. Note that the APD is shorter for the N = 2 branch than for the N = 1 branch at the same value of DI, which we attribute to the effect of the stimulus that does not elicit an action potential. That is, this 'extra' stimulus has a rather significant effect on the subsequent action potential and shortens its duration. Hence, it should be possible to control the dynamics of the tissue by applying stimuli during the period when the tissue is refractory. In Section 6, we show that it is also possible to lengthen the APD by applying the stimulus at a different time in the refractory period. Modulating the APD by stimuli applied during the refractory period may prove to be one cornerstone of methods for controlling whole-heart dynamics.

In addition to the multi-valued nature of the tissue response, we found that the 1:1 behavior (N = 1) was stable even when the slope of the APD vs. DI curve was greater than one, as indicated in Fig. 11b. We found that a difference-differential model proposed by Chialvo et al. [57], incorporating a 'memory effect', could explain this behavior, but it did not correctly predict the transition from the N = 2 to the N = 1 branch.

6. Inducing transitions between bistable states

Since modulation of complex local dynamics in the heart may be critical for suppressing the initiation of fibrillation or for controlling its spatio-temporal complexity, we investigated methods of manipulating experimentally the bistable 1:1 and 2:1 states [58]. In much earlier work, Mines had shown that it was possible to cause a 2:1 to 1:1 transition with a single stimulus injected into the train of applied pulses [59], and Guevara et al. [60] have cited the experimental observation of 2:1 to 1:1 and 1:1 to 2:1 transitions by the injection of single stimuli into an applied train. We have extended the work of Mines and Guevara et al. by determining the range of timings of added stimuli capable of eliciting such transitions. We found that 2:1 to 1:1 and 1:1 to 2:1 transitions can be induced repeatedly by stimuli applied over a wide range of timing intervals. The extra stimuli that induce transitions between bistable states are of the same amplitude and duration as the regular pacing applied to the tissue: only their timing is altered within the periodic pacing. This can be considered a phase-sensitive open-loop controller because inducing a transition between states depends on the relative phase at which the extra stimulus is applied.
In order to achieve transitions between the 1:1 and 2:1 states, we first determined the range of pacing intervals that yield both 1:1 and 2:1 behavior, and then paced the tissue at a PP within this window. The tissue can be biased onto the 1:1 or 2:1 branch within the bistable regime by sweeping the PP down and up until the tissue responds with the desired behavior. Once the tissue was in the 1:1 or 2:1 state, we injected an 'S2' stimulus at an interval T after the preceding normal stimulus. The next normal stimulus was applied at an interval PP - T following the S2 pulse, marking the resumption of the normal stimulus train. We then observed whether or not a transition takes place. This procedure is repeated for various PP values and T values in order to map the domain in the (PP, T) plane over which transitions can be induced. The interval T is varied from 50 ms to PP - 50 ms in 50 ms increments.

Fig. 12 shows one example of a 1:1 to 2:1 transition where the S2 stimulus (marked with an arrow) is applied during the early part of an action potential. This lengthens the action potential's refractory period long enough to cause the next regular stimulus to be ineffective, giving the tissue a time interval equal to 2PP - T over which to recover, extending the next action potential, after which the tissue falls into a 2:1 pattern. This observation demonstrates that stimuli added during the tissue's refractory period can have a dramatic effect on the subsequent dynamics. We observed that there are two different types of 1:1 to 2:1 transitions as well as two distinct types of 2:1 to 1:1 transitions, each distinguished by their transient behavior en route to steady-state dynamics. A qualitative understanding of the nature of these transitions can be motivated using the simple one-dimensional iterative mapping model described in Section 3, which gives good qualitative predictions for three of the four transition types we observe.
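The timing scan described above is easy to express programmatically. The sketch below simply enumerates the (PP, T) pairs of the stated protocol (T from 50 ms to PP - 50 ms in 50 ms steps) together with the recovery interval 2·PP - T mentioned in the text; the example pacing periods are hypothetical, and the decision of whether a transition occurred would come from the experiment itself.

```python
def s2_protocol(pacing_periods_ms, t_step=50):
    """Enumerate the S2 timings tested at each pacing period within the bistable window."""
    trials = []
    for pp in pacing_periods_ms:
        for t in range(t_step, pp - t_step + 1, t_step):   # T = 50 ms ... PP - 50 ms
            trials.append({"PP": pp, "T": t, "recovery": 2 * pp - t})
    return trials

# Example with an assumed, illustrative bistable window of pacing periods.
for trial in s2_protocol([800, 850, 900])[:4]:
    print(trial)
```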
Fig. 12. Temporal evolution of the transmembrane potential showing a 1:1 to 2:1 transition. The stimulus waveform is shown at the bottom of the figure and the extra stimulus is marked with an arrow.
7. Controlling alternans

Using the feedback control methods discussed in Section 4 and our experimental test-bed, we have demonstrated that it is possible to stabilize an unstable 1:1 response pattern for a PP that produces alternans (2:2 pattern) when the control is turned off. The 1:1 state was stabilized using two variations of the feedback scheme known as time-delay auto-synchronization (TDAS) [31]. This protocol has a short memory that allows it to automatically track slow changes in the stability of the dynamics. Note that Hall et al. [4] used a similar scheme for stabilizing alternans in AV-nodal conduction times, and Socolar and Gauthier [61] have theoretically analyzed an extension of the protocol.

Feedback protocols attempt to stabilize an unstable state of the system by making small adjustments to an accessible system parameter based on an estimate of the distance from the current state of the system to a desired state. In our experiment, the state of the muscle tissue was characterized by measuring the n-th action potential duration (APD_n), and the n-th pacing period (PP_n) was used as the accessible system parameter. An error signal ε_n is generated using the TDAS algorithm according to

ε_n = γ(APD_n - APD_{n-1}),    (12)
where the past behavior of the system is used to estimate the location of the unstable state to be stabilized. The n-th pacing period PP_n is perturbed around its nominal value PP* as

PP_n = PP* + ε_{n-1}.    (13)
When control is successful, APD_n = APD_{n-1} = APD*, which is the desired 1:1 behavior, and ε_n goes to zero. Hence, the controller does not affect the state of the system, only its stability.

Fig. 13 shows a complete TDAS control experiment. The temporal evolution of a monophasic action potential recorded from the tissue and the error signal are shown for a PP that produces alternans as control is turned on and off, indicated by the vertical dashed lines. With control off (to the left of the first dashed line), the APD_n alternates between approximately 110 and 180 ms. When the feedback perturbations are applied to the tissue (between the dashed lines), the response pattern converges to 1:1 behavior with APD* = 132 ms after a transient of a few seconds, and ε_n attains a small value (<0.002 PP*). When control is turned off (to the right of the second dashed line), the 1:1 dynamical state is unstable and the system slowly returns to alternans. This behavior is reminiscent of the numerical experiments shown in Fig. 10.

Fig. 13. Temporal evolution of (a) the monophasic action potential signal (MAP) and (b) the error signal ε_n (quantified as a fraction of the average period PP*) during a control experiment.

In contrast to PPF algorithms [3], which use a comparison between the current state of the system and a reference state (the desired unstable state) determined through a time-consuming 'learning phase', TDAS has the advantage of using feedback generated solely from a comparison between current and previous states of the system. Thus TDAS algorithms, which are less sensitive to parameter drift and
Fig. 13. Temporal evolution of the (a) monophasic action potential signal (MAP) and (b) the error signal ε_n (quantified as a fraction of the average period PP*) during a control experiment.
8. Outlook
Our preliminary experiments on controlling alternans in bullfrog tissue are a first step toward realizing a method for controlling complex whole-heart dynamics. We are currently investigating the use of variants of this method to control the dynamics of a fibrillating sheep atrium in conjunction with a high-density mapping system to record the effects of the control perturbations on the entire atrium. This basic research program will provide new insights into the fundamental mechanisms that regulate the dynamics of small pieces of cardiac muscle subjected to small, appropriately timed electrical stimuli. The studies will help to elucidate the important factors that determine the success or failure of nonlinear dynamics control techniques in stabilizing cardiac arrhythmias. The knowledge gained from these studies may improve current clinical methods of cardioversion and will increase the number of treatment options for cardiac patients.
Acknowledgements
We gratefully acknowledge invaluable collaborations with Ellen Dixon-Tulloch, Wanda Krassowska, Robert Oliver, and Patrick Wolf of the Duke Experimental Electrophysiology Laboratory, Henry Greenside and Joshua Socolar of the Duke Physics Department, and David Schaeffer of the Duke Mathematics Department, and financial support of the Whitaker Foundation and the National Science Foundation.
References
1. Shinbrot, T., Grebogi, C., Ott, E. and Yorke, J.A. (1993) Nature 363, 411.
2. Ott, E. and Spano, M. (1995) Phys. Today 48, 34.
3. Garfinkel, A., Spano, M.L., Ditto, W.L. and Weiss, J.N. (1992) Science 257, 1230.
4. Hall, K., Christini, D.J., Tremblay, M., Collins, J.J., Glass, L. and Billette, J. (1997) Phys. Rev. Lett. 78, 4518.
5. Wiggers, C.J. (1940) Am. Heart J. 20, 399.
6. Zipes, D.P., Fischer, J., King, R.M., Nicoll, A. and Jolly, W.W. (1975) Am. J. Cardiol. 36, 37.
7. Ideker, R.E., Chen, P.-S., Shibata, N., Colavita, P.G. and Wharton, J.M. (1987) in: Non Pharmacological Therapy of Tachyarrhythmias, eds G. Breithardt, M. Borggrefe and D. Zipes, pp. 449-464. Futura, Mount Kisco.
8. For a recent review, see Winfree, A.T. (1998) Chaos 8, 1 and the accompanying articles in the special Focus Issue on Fibrillation in Normal Ventricular Myocardium.
9. Gray, R.A., Pertsov, A.M. and Jalife, J. (1998) Nature (London) 392, 75.
10. Witkowski, F.X., Penkoske, L.J., Giles, P.A., Spano, W.R., Ditto, M.U. and Winfree, A.T. (1998) Nature (London) 392, 78.
11. Cross, M.C. and Hohenberg, P.C. (1994) Science 263, 146.
12. Rappel, W.-J., Fenton, F. and Karma, A. (1999) Phys. Rev. Lett. 83, 456.
13. Ditto, W.L., Spano, M.L., In, V., Neff, J., Meadows, B., Langberg, J.J., Bolmann, A. and McTeague, K. (2000) Int. J. Bifurcations Chaos 10, 593.
14. Oliver, R.A., Hall, G.M., Bahar, S., Krassowska, W., Wolf, P.D., Dixon-Tulloch, E.G. and Gauthier, D. (2000) J. Cardiovasc. Electrophysiol. 11, 797.
15. Katz, A.M. (1992) Physiology of the Heart, 2nd Edn, Chapter 2. Raven Press, New York.
16. Plonsey, R. and Barr, R.C. (1991) Bioelectricity: A Quantitative Approach. Plenum Press, New York.
17. Luo, C.-H. and Rudy, Y. (1991) Circ. Res. 68, 1501.
18. Rasmusson, R.L., Clark, J.W., Giles, W.R., Robinson, K., Clark, R.B., Shibata, E.F. and Campbell, D.L. (1990) Am. J. Physiol. (Heart Circ. Physiol. 28) 259, H370.
19. Rasmusson, R.L., Clark, J.W., Giles, W.R., Robinson, K., Clark, R.B. and Campbell, D.L. (1990) Am. J. Physiol. (Heart Circ. Physiol. 28) 259, H352-H369.
20. Guevara, M., Ward, G., Shrier, A. and Glass, L. (1984) in: Computers in Cardiology, IEEE Comp. Soc., p. 167.
21. Glass, L. and Mackey, M.C. (1988) From Clocks to Chaos: The Rhythms of Life. Princeton University Press, Princeton, NJ.
22. Hall, G.M., Bahar, S. and Gauthier, D.J. (1999) Phys. Rev. Lett. 82, 2995.
23. Henriquez, C.S. (1993) Crit. Revs. Biomed. Eng. 21, 1.
24. Gallagher, J.J., Kasell, J.H., Cox, J.L., Smith, W.M. and Ideker, R.E. (1982) Am. J. Cardiol. 49, 221.
25. Witkowski, F.X. and Corr, P.B. (1984) Am. J. Physiol. 247, H661.
26. Ideker, R.E., Smith, W.M., Wolf, P.D., Danieley, N.D. and Bartram, F.R. (1987) PACE 10, 281.
27. Shenasa, M., Borggrefe, M. and Breithardt, G., eds (1993) Cardiac Mapping. Futura Publishing Company, Inc., Mount Kisco, New York.
28. Hall, G.M. and Gauthier, D.J. (2000) Submitted for publication.
29. Christini, D.J., Hall, K., Collins, J.J. and Glass, L. (2000) in: Handbook of Biological Physics: Neuro-Informatics, Neural Modeling, Vol. 4, eds F. Moss and S. Gielen. Elsevier, New York.
30. Ott, E., Grebogi, C. and Yorke, J.A. (1990) Phys. Rev. Lett. 64, 1196.
31. Sukow, D.W., Bleich, M.E., Gauthier, D.J. and Socolar, J.E.S. (1997) Chaos 7, 560.
32. Nolasco, J.B. and Dahlen, R.W. (1968) J. Appl. Physiol. 25, 191.
33. Berge, P., Pomeau, Y. and Vidal, C. (1984) Order within Chaos: Towards a Deterministic Approach to Turbulence, pp. 40-42. Wiley, New York.
34. Romeiras, F.J., Grebogi, C., Ott, E. and Dayawansa, W.P. (1992) Physica D 58, 165.
35. Gluckman, B.J., Spano, M.L., Yang, W., In, V. and Ditto, W.L. (1997) Phys. Rev. E 55, 4935.
36. Chen, P.-S., Wolf, P.D., Dixon, E.G., Danieley, N.D., Frazier, D.W., Smith, W.M. and Ideker, R.E. (1988) Circ. Res. 62, 1191.
37. Kirchhof, C., Chorro, F., Scheffer, G.J., Brugada, J., Konings, K., Zetelaki, Z. and Allessie, M. (1993) Circulation 88, 736.
38. KenKnight, B.H., Bayly, P.V., Gerstle, R.J., Rollins, D.L., Wolf, P.D., Smith, W.M. and Ideker, R.E. (1995) Circ. Res. 77, 849.
39. Watanabe, M. and Gilmour, R.F. Jr. (1996) J. Math. Biol. 35, 73.
40. Christini, D.J. and Collins, J.J. (1996) Phys. Rev. E (Rapid Communication) 53, R49.
41. Schiff, S.J., Jerger, K., Duong, D.H., Chang, T., Spano, M.L. and Ditto, W.L. (1994) Nature (London) 370, 615.
42. Pierson, D. and Moss, F. (1995) Phys. Rev. Lett. 75, 2124.
43. Christini, D.J. and Collins, J.J. (1995) Phys. Rev. Lett. 75, 2782.
44. Gauthier, D.J. and Socolar, J.E.S. (1997) Phys. Rev. Lett. 79, 4938.
45. Glass, L. and Josephson, M.F. (1995) Phys. Rev. Lett. 75, 2059.
46. Biktashev, V.N. and Holden, A.V. (1995) Proc. R. Soc. Lond. B 261, 211.
47. Aranson, I., Levine, H. and Tsimring, L. (1995) Phys. Rev. Lett. 72, 2561.
48. Karma, A. (1994) Chaos 4, 461.
49. Clay, J.R., Brochu, R.M. and Shrier, A. (1990) Biophys. J. 58, 609.
50. Kowtha, V.C., Kunysz, A., Clay, J.R., Glass, L. and Shrier, A. (1994) Prog. Biophys. Molec. Biol. 61, 255.
51. Delmar, M., Glass, L., Michaels, D.C. and Jalife, J. (1989) Circ. Res. 65, 775.
52. Roth, B.J. (1997) Cardiovasc. Electrophysiol. 8, 768.
53. Lin, S.-F., Roth, B.J., Eclat, D.S. and Wikswo, J.P. Jr. (1996) Circulation 94, 1-714.
54. Roth, B.J. and Krassowska, W. (1998) Chaos 8, 204.
55. Holmes, P. (1990) in: Mathematical Approaches to Cardiac Arrhythmias, Ann. N. Y. Acad. Sci., Vol. 591, p. 301, ed. J. Jalife. The New York Academy of Sciences, New York.
56. Chialvo, D.R. (1990) in: Mathematical Approaches to Cardiac Arrhythmias, Ann. N. Y. Acad. Sci., Vol. 591, p. 351, ed. J. Jalife. The New York Academy of Sciences, New York.
57. Chialvo, D.R., Michaels, D.C. and Jalife, J. (1990) Circ. Res. 66, 525.
58. Hall, G.M., Bahar, S. and Gauthier, D.J. (2000) Submitted for publication.
59. Mines, G.R. (1913) J. Physiol. (London) 46, 349.
60. Yehia, A.R., Jeandupeux, D., Alonso, F. and Guevara, M.R. (1999) Chaos 9, 916.
61. Socolar, J.E.S. and Gauthier, D.J. (1998) Phys. Rev. E 75, 6589.
CHAPTER 8
Intrinsic Noise from Voltage-Gated Ion Channels: Effects on Dynamics and Reliability in Intrinsically Oscillatory Neurons
J.A. WHITE* and J.S. HAAS
Department of Biomedical Engineering, Center for BioDynamics, 44 Cummington Street, Boston, MA 02215, USA
*Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents
1. Introduction
2. Background
   2.1. Sources of electrical noise in the nervous system
   2.2. Effects of electrical noise in the postsynaptic neuron
3. Case study: Stellate neurons of the entorhinal cortex
   3.1. Basic electrophysiological properties
   3.2. Quantifying biological noise sources
   3.3. Modeling the effects of channel noise
4. Summary and conclusions
Acknowledgements
References
1. Introduction
Bioelectrical events underlie many of the processes necessary for life, including communication among nerve cells; the peristaltic wave of activity that moves food through the digestive system; secretion of insulin by the pancreas; and contraction of cardiac and skeletal muscle. All of these events are generated by ion channels: protein complexes, inserted in the cell membrane, that act as molecular "gates" of electrical current [1-4]. The opening and closing of these gates can be controlled by one or more variables, including transmembrane voltage or the concentration of a chemical (e.g., a neurotransmitter or hormone) inside or outside the cell. A fundamental property of ion channels and other bioelectrical elements is that they are stochastic: one cannot specify the exact behavior of an ion channel, but only statistical descriptors of its behavior (e.g., the probability that the channel is open; the moments of the single-channel conductance). Similarly, the events underlying synaptic communication between nerve cells are stochastic at many levels. These stochastic phenomena add an element of noise to electrical responses in excitable cells. In this chapter, we begin by describing the major sources of biological noise in neurons (nerve cells), and how each can be characterized experimentally. We review some of the consequences of biological noise, and present a detailed case study of how noise interacts with oscillatory dynamics in a particular population of neurons in the mammalian brain (stellate neurons of the medial entorhinal cortex). We argue that intrinsic noise from voltage-gated Na+ channels is sufficient to alter the electrical dynamics of stellate cells in a number of specific ways.
2. Background
2.1. Sources of electrical noise in the nervous system
Neuronal noise sources fall into many categories, and can be organized differently according to one's perspective. Here, we categorize them with loose correspondence to the neurophysiologist's world view, in which noise sources are grouped by physiological underpinnings rather than some other attribute (e.g., frequency content).

2.1.1. Voltage-gated ion channels
Voltage-dependent conductances underlie action potentials and other intrinsic electrical activity in nerve and muscle cells. These conductances are generated by the concerted action of 10²–10⁶ voltage-gated ion channels, each of which appears to "gate" (open and close) probabilistically. The source of the apparent random behavior is believed to be thermal excitation of a molecule with multiple stable states. Typically, the gating of voltage-dependent ion channels is modeled as a Markov
process with a finite number of states (S_i) and state-transition rates (α_ij) that depend instantaneously on membrane potential [1,2,4,5], although attractive alternatives to the Markov model exist [6-8]. Eq. (1) shows a hypothetical Markov model with four states:
$$S_1 \underset{\alpha_{21}}{\overset{\alpha_{12}}{\rightleftharpoons}} S_2 \underset{\alpha_{32}}{\overset{\alpha_{23}}{\rightleftharpoons}} S_3 \underset{\alpha_{43}}{\overset{\alpha_{34}}{\rightleftharpoons}} S_4 \qquad (1)$$
The Markov process formalism is memoryless (i.e., its transitional probabilities are independent of its history), making it rather simple to derive much about the statistics of the modeled channel. For example, with fixed membrane potential, the autocorrelation function for single-channel conductance is a multi-exponential function [1,2,4,5]. A population of n such channels, assumed independent and each open with probability p, has simple binomial statistics under steady-state conditions. The noisiness of this group of channels can be quantified by the coefficient of variation (CV, the ratio of standard deviation to mean) of membrane current or conductance. For a binomial distribution, CV = [(1 − p)/(np)]^{1/2} [4]. Thus, under assumptions of stationarity, the noisiness of a given population of voltage-gated channels is proportional to n^{−1/2}. Use of Markov models to describe voltage-gated ion channels can best be illustrated through an example. Figure 1A shows a schematic representation of the classical voltage-activated K+ channel from the Hodgkin-Huxley model. In this representation, the conductance state of the channel is determined by the states of four "gates". The gates open and close randomly, with independent and identical distributions. The channel conductance is zero, unless all four gates are open, as in the rightmost state. If each gate opens with rate constant α_n and closes with rate constant β_n, the channel as a whole is described by the 5-state rate scheme [1,4]:
$$n_0 \underset{\beta_n}{\overset{4\alpha_n}{\rightleftharpoons}} n_1 \underset{2\beta_n}{\overset{3\alpha_n}{\rightleftharpoons}} n_2 \underset{3\beta_n}{\overset{2\alpha_n}{\rightleftharpoons}} n_3 \underset{4\beta_n}{\overset{\alpha_n}{\rightleftharpoons}} n_4 \qquad (2)$$
In this scheme, the state n_j has j open gates. Integer coefficients of the rate constants between two states are determined by the number of possible paths between the states. Responses of populations of n such 5-state channels to step changes in membrane potential are shown in Fig. 1B. The steady-state autocorrelation of conductance in a single channel of this form is shown in Fig. 1C (solid line). The Markov description is mathematically elegant, but some voltage-gated channels seem to violate its underlying assumption of memorylessness; instead, transition rate constants can be proportional to t^{−c}, where t is the amount of time already spent in a given state and c is a constant [6,7]. At the single-channel level, this fractal behavior in rate constants prolongs the "tails" of the autocorrelation function (Fig. 1C). At the population level, fractal rate constants make estimates of variance grow systematically with the time window of analysis. Fractal behavior in ion channel state transition rates gives individual channels memory, greatly complicating the task of representing a population of channels in a computational model.
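A minimal stochastic simulation of the memoryless scheme in Eq. (2), in the spirit of Fig. 1B, is sketched below. It uses the standard Hodgkin-Huxley rate functions for the K+ gate (written for a resting potential near −65 mV) and per-time-step transition probabilities; the channel count, time step, and step timing are illustrative choices rather than values from the original figure.

```python
import numpy as np

# Hodgkin-Huxley K+ gate rate functions (modern voltage convention); V in mV, rates in 1/ms.
def alpha_n(V):
    return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))

def beta_n(V):
    return 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate_k_channels(n_channels=1000, dt=0.01, t_total=60.0,
                        v_hold=-70.0, v_step=0.0, t_on=10.0, t_off=40.0, seed=0):
    """Voltage-clamped population of n stochastic K+ channels (Eq. (2)).

    Each channel has four identical, independent two-state gates and conducts
    only when all four are open (cf. Fig. 1A,B).  Returns the time vector and
    the fraction of open channels at each time step.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(t_total / dt)
    gates = np.zeros((n_channels, 4), dtype=bool)    # start with all gates closed
    open_frac = np.empty(n_steps)
    t = np.arange(n_steps) * dt
    for k in range(n_steps):
        V = v_step if t_on <= t[k] < t_off else v_hold
        p_open = alpha_n(V) * dt                     # closed -> open probability this step
        p_close = beta_n(V) * dt                     # open -> closed probability this step
        r = rng.random(gates.shape)
        gates = np.where(gates, r >= p_close, r < p_open)
        open_frac[k] = np.mean(gates.all(axis=1))
    return t, open_frac

t, g = simulate_k_channels()
```

As expected from the binomial argument above, rerunning this with smaller n makes the ratio of standard deviation to mean conductance grow roughly as n^{−1/2}.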
Fig. 1. Stochastic models of ion channels. (A) A pictorial representation of a voltage-gated channel with four gating particles. The channel conducts only when all four gating particles are open. Typically, individual gating particles are modeled as independent, 2-state Markov processes with identical distributions. Under these assumptions, only the number of open gates, rather than the state of each gate, must be tracked in a simulation. (B) The behavior of a population of n Hodgkin-Huxley [60] K+ channels in response to a voltage-clamp step from a holding potential of −70 mV to a clamp potential of 0 mV. Stimulus onset at t = 10 ms; offset at t = 40 ms. Each plot shows the conductance from a single run, scaled by the maximal possible conductance for comparative purposes. Single channels (top) have only two conductance states. With increasing n, the ratio of standard deviation to mean conductance drops proportionally to 1/√n. The deterministic curve represents the behavior of the system in the limit as n → ∞. (C) The autocorrelation of the single-channel conductance under steady-state conditions (V = 0 mV). The Markov channel description (solid line) has a multi-exponential autocorrelation function with correlation time < 20 ms. In pseudo-fractal channel descriptions (dashed line, dotted line), multi-exponential autocorrelation functions approximate power law relationships. As p (the number of states in the pseudo-fractal channel model) increases, the approximation remains valid for longer time intervals [8].
These difficulties can be circumvented by creating pseudo-fractal representations of ion channels [8]. In these representations, memory in rate constants is captured over some user-defined bandwidth by a rate scheme with multiplicatively related rate constants. For the Hodgkin-Huxley K+ channel, the
appropriate pseudo-fractal representation of a model with memory in the closed state is:
$$n_1 \underset{B/K^{p-1}}{\overset{A/K^{p-1}}{\rightleftharpoons}} n_2 \underset{B/K^{p-2}}{\overset{A/K^{p-2}}{\rightleftharpoons}} \cdots \underset{B/K}{\overset{A/K}{\rightleftharpoons}} n_{p-1} \underset{B}{\overset{A}{\rightleftharpoons}} n_p \qquad (3)$$
where A = 4(α_n + β_n)^4 α_n / [(α_n + β_n)^4 − β_n^4], B = 4β_n, and K is a constant [8]. The pseudo-fractal model accounts for memory by allowing closed channels to wander several steps away from the open state (n_p). In the limit as the number of states p → ∞, the expected value of single-channel conductance approaches that of the Markov model (Eq. (2)), with fractally distributed closed-time intervals. The dashed and dotted lines in Fig. 1C show the effects on the single-channel conductance autocorrelation function for two values of p. The parameters of models like those in Eqs. (1)-(3) are determined routinely from physiological data. The procedure typically involves holding transmembrane voltage V constant at a series of physiologically relevant values (−100 mV < V < 50 mV), and measuring the resultant current that passes through a single channel or a presumed homogeneous population of channels. Parameters are determined by matching measured and modeled response attributes (e.g., spectral densities of transmembrane current).
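As one example of a modeled response attribute that can be matched to data, the single-channel conductance autocorrelation of the memoryless scheme (the solid line in Fig. 1C) can be estimated by direct simulation. In the sketch below the per-gate rates alpha and beta are fixed, illustrative numbers standing in for their values at the chosen clamp potential.

```python
import numpy as np

def single_channel_autocorr(alpha=0.5, beta=0.1, dt=0.05, t_total=5_000.0,
                            max_lag=50.0, seed=4):
    """Steady-state autocorrelation of single-channel conductance for the
    5-state K+ scheme of Eq. (2), estimated by simulation (cf. Fig. 1C).
    alpha, beta: per-gate opening/closing rates (1/ms); illustrative values."""
    rng = np.random.default_rng(seed)
    n_steps = int(t_total / dt)
    p_open, p_close = alpha * dt, beta * dt
    gates = rng.random(4) < alpha / (alpha + beta)    # start near steady state
    g = np.empty(n_steps)
    for k in range(n_steps):
        r = rng.random(4)
        gates = np.where(gates, r >= p_close, r < p_open)
        g[k] = float(gates.all())                     # 1 if channel is open
    g -= g.mean()
    n_lags = int(max_lag / dt)
    acf = np.array([np.mean(g[:n_steps - lag] * g[lag:]) for lag in range(n_lags)])
    return np.arange(n_lags) * dt, acf / acf[0]       # lags in ms, normalized ACF

lags_ms, acf = single_channel_autocorr()
```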
2.1.2. Synaptic mechanisms
The primary form of inter-cellular communication in most neural and neuromuscular systems, chemical synaptic transmission involves at least three important random processes. First, the number of quanta (packets) of neurotransmitter released in response to a presynaptic action potential is a discretely distributed random variable. Second, single quanta of neurotransmitter can be released without occurrence of a presynaptic action potential. Third, neurotransmitter-gated ion channels, like their voltage-gated counterparts, gate stochastically.

2.1.2.1. Quantal release. The discovery and characterization of quantal release of neurotransmitter [9] stand as one of the great discoveries in the history of neuroscience. Typically, the number of released quanta is described by a binomial distribution, where in this case n is the number of sites of synaptic release (or of synaptic receptors, whichever is in short supply) and p is the probability of release at each site [2]. The binomial distribution implies that synaptic conductances in postsynaptic cells, which are proportional to the number of quanta released presynaptically, have mean value γnp and variance γ²np(1 − p), where γ is the magnitude of the conductance change induced by each quantum of neurotransmitter. The Poisson distribution is also used commonly to model synaptic release; in this model the mean and variance of induced synaptic events are equal [2]. Parameter estimation based on either the binomial or the Poisson model is based on several implicit assumptions, including those of statistical independence of release sites and uniform quantal size. In parameter estimation experiments, the researcher electrically stimulates the presynaptic nerve fiber(s) and uses intracellular
recording techniques to measure the postsynaptic response. Evoked postsynaptic responses can be measured as changes in membrane potential, or as changes in membrane current. With a sufficiently high signal-to-noise ratio, it is possible to reconstruct the discretely valued distribution function describing the probability of release of k quanta of neurotransmitter (k = 0, 1, 2, ...). With these data in hand, it is relatively straightforward to determine the applicability of the binomial or Poisson models, and to estimate best-fit parameter values [2].

2.1.2.2. Spontaneous release. A second source of synaptic noise comes from the fact that single quanta of neurotransmitter can be released from presynaptic cells at seemingly random times. The time intervals between spontaneous release events are often assumed to be exponentially distributed with a constant mean rate of spontaneous release [2], although the strict validity of this assumption is doubtful [10]. Spontaneous release events can be measured and characterized from intracellular recordings, typically in the presence of pharmacological agents that block presynaptic action potentials. Depending on the biological preparation, measured spontaneous release events (often called "mini" postsynaptic potentials in the neurobiological literature) can reflect spontaneous release from one presynaptic axon, or the superposition of spontaneous release events from a large number of presynaptic cells.

2.1.2.3. Stochastic gating of neurotransmitter-gated ion channels. The gating behavior of neurotransmitter-responsive (or ligand-gated) ion channels is described classically by the same Markov-process constructs used to describe voltage-gated channels (Eq. (1)), with the difference that some state-transition rates are functions of the concentration of ligand (neurotransmitter) in the immediate vicinity of the receptor. Parameter identification procedures for ligand-gated channels typically involve recording responses to step changes of ligand concentration in the extracellular solution.

2.1.2.4. Other potentially important synaptic noise sources. Chemical synaptic transmission is a complex process. Other mechanisms that may add to the noisy character of synaptic communication include the diffusion of a relatively small number of molecules of neurotransmitter within the synaptic cleft, and the likely stochastic nature of the chemical reactions responsible for degradation or re-uptake of neurotransmitter molecules. The success of researchers in detecting quantal synaptic events from experimental data suggests that the contributions of these additional noise sources are not as large as those of quantal synaptic release.
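Returning to the quantal model of Section 2.1.2.1, the binomial predictions for the mean (γnp) and variance (γ²np(1 − p)) of the evoked conductance are easy to check numerically; the site number, release probability, and quantal size below are arbitrary illustrative values.

```python
import numpy as np

N_SITES = 12       # number of release sites, n (illustrative)
P_RELEASE = 0.3    # release probability per site, p (illustrative)
GAMMA = 0.5        # conductance (nS) contributed by one quantum (illustrative)

rng = np.random.default_rng(1)
quanta = rng.binomial(N_SITES, P_RELEASE, size=10_000)   # quanta released per trial
g_syn = GAMMA * quanta                                   # evoked synaptic conductance

# Compare sample statistics with the binomial predictions quoted in the text.
print(g_syn.mean(), GAMMA * N_SITES * P_RELEASE)
print(g_syn.var(), GAMMA**2 * N_SITES * P_RELEASE * (1 - P_RELEASE))
```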
2.1.3. Randomly distributed interspike interval statistics in presynaptic neurons
A third factor that contributes significantly to electrical noise levels in neurons is that of apparent randomness in the time sequence of action potentials in presynaptic neurons. Patterns of activity in presynaptic neurons are most often modeled as homogeneous or inhomogeneous Poisson processes, which offer many numerical
and analytical conveniences and are often reasonably accurate descriptions of neuronal firing patterns (but see [11]). Estimating the probability density functions describing presynaptic interspike interval statistics from experimental data is a difficult problem. Recordings in reduced preparations like the brain slice show abnormally low presynaptic firing rates that do not reflect rates seen in vivo (i.e., in the living animal), and intracellular recordings are notoriously difficult to make in vivo. Even in the few cases that such data have been collected, detection of presynaptic interspike interval statistics is prevented, because these events are so numerous and are masked by the slow kinetics of synaptic conductance changes and the synaptic noise sources listed above. In practice, many researchers [12-15] lump synaptic noise sources and interspike interval statistics together into one noisy current or conductance source that can, in principle, be characterized from intracellular measurements in vivo.

2.1.4. Other sources
Several additional sources may contribute significant electrical noise in neurons. First, the cell membrane, like any element with electrical impedance, has associated with it thermal or Johnson noise. Typically, Johnson noise is believed to have small amplitude and wide bandwidth in neurons [1]. Second, electrical transients from activity in nearby neurons may be coupled capacitatively to the neuron in question, leading to a phenomenon called ephaptic noise. Noise from nearby neurons can be coupled resistively, through inter-cellular channels called gap junctions, as well [16]. Third, variations in intra- and extracellular ionic concentrations [17] may affect processes that depend on either membrane potential (e.g., ion channels) or chemical gradients (e.g., pumps), in a phenomenon sometimes called metabolic noise. Unlike the other noise sources listed here, metabolic noise depending on bulk concentrations would be expected to have a slow time scale. However, consideration of microdomains and other spatial gradients in concentration within the cell may lead to fast components of metabolic noise.
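As a simple numerical counterpart to the synaptic and presynaptic noise sources of Sections 2.1.2-2.1.3, the sketch below draws a homogeneous Poisson presynaptic spike train and sums exponential unitary conductances into a single "lumped" noisy synaptic conductance, in the spirit of the lumped-source approach mentioned above; all rates, amplitudes, and time constants are arbitrary illustrative choices rather than values from the cited studies.

```python
import numpy as np

def poisson_train(rate_hz, t_total_s, rng):
    """Homogeneous Poisson spike train: exponentially distributed intervals."""
    isis = rng.exponential(1.0 / rate_hz, size=int(3 * rate_hz * t_total_s) + 10)
    times = np.cumsum(isis)
    return times[times < t_total_s]

def lumped_synaptic_conductance(rate_hz=1000.0, t_total_s=1.0, dt_s=1e-4,
                                g_quantum_ns=0.1, tau_s=0.005, seed=2):
    """Sum of exponential unitary conductances driven by a Poisson train,
    standing in for many presynaptic inputs plus release noise."""
    rng = np.random.default_rng(seed)
    n = int(t_total_s / dt_s)
    spikes = poisson_train(rate_hz, t_total_s, rng)
    idx = (spikes / dt_s).astype(int)
    drive = np.bincount(idx, minlength=n)[:n] * g_quantum_ns   # quanta arriving per bin
    decay = np.exp(-dt_s / tau_s)
    g = np.zeros(n)
    for k in range(1, n):
        g[k] = g[k - 1] * decay + drive[k]
    return g

g = lumped_synaptic_conductance()
print(g.mean(), g.std())
```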
2.2. Effects of electrical noise in the postsynaptic neuron
The ramifications of biological noise sources have been studied using a number of measures and methods. Here, we discuss briefly three interrelated measures that reflect the presence of biological noise: reliability, stochastic resonance, and noise-driven spontaneous activity.

2.2.1. Reliability and threshold
To a first approximation, neurons can be thought of as devices that fire an all-or-nothing action potential with a distinct threshold. It stands to reason that electrical noise from voltage-gated channels, for example, could make the neuronal threshold less distinct: cells might be expected to generate action potentials in response to brief current pulses with some probability that is a sigmoid function of stimulus amplitude. "Soft" thresholds of this kind have been seen in experimental data for over
60 years [18-20], and have been attributed quantitatively to intrinsic noise from voltage-gated ion channels [21-23]. More recent work has focused on neuronal reliability in response to repeated presentations of broad-band stimuli [12,24-27]. "Reliability" in this context is typically quantified as the average cross-correlation of successive responses to an identical, broad-band stimulus. This approach has the advantage that the frequency content of presented stimuli is more realistic, but the potential disadvantage that it can be difficult to interpret responses to mixed sub- and suprathreshold stimulus sequences [28]. The underlying train of thought behind these experiments is that a cell with important effects of intrinsic noise should be unreliable in response to repeated presentations of broad-band stimuli with levels of fluctuation mimicking those seen in vivo; in contrast, high reliability indicates that intrinsic noise may be unimportant, and that variable responses seen in vivo are caused by different synaptic signals received on successive presentations of the stimulus. In recorded and simulated responses, reliability is maximal for stimuli with large fluctuations and frequency content matching the preferred firing frequency of the cell [12,24-27,29], although exceptions to this rule may exist [30].

2.2.2. Stochastic resonance
The sigmoid response probabilities mentioned above point to the result that intrinsic and other noise sources can enhance neuronal detection of a small stimulus. In this form of stochastic resonance, the neuron itself, and/or the probabilistic input signal it receives, provides the noise that enhances representations of small inputs. In considering the output of a network of neurons receiving a common stimulus, it is interesting to note that the majority of the noise sources mentioned above would naturally operate in a neuronal population as independent noise sources, with consequent advantages in signal representation [31].

2.2.3. Spontaneous firing and interspike interval statistics
Intrinsic noise from voltage-gated ion channels has been shown to be sufficient to cause spontaneous firing in otherwise quiet neuronal models [3,21,22,32-36]. In the classical Hodgkin-Huxley model, in which repetitive action potentials arise via a subcritical Hopf bifurcation, interspike interval statistics of spontaneous activity are well described by a Poisson process with dead time [8,33]. Modification of ion channel kinetics to have long-term correlations in state occupancy gives rise to related long-term trends in the ordering of interspike intervals [8]. Bifurcation structure also has dramatic effects on interspike interval statistics in noisy models of single neurons [37]. Models that give rise to periodic firing via saddle-node bifurcations have long tails in their interspike interval distributions; models that give rise to periodic firing via Hopf bifurcations have more constrained distributions of interspike intervals.

3. Case study: stellate neurons of the entorhinal cortex
The hippocampal region, consisting of the hippocampus proper and the associated entorhinal, perirhinal, and parahippocampal cortices, plays a crucial role in memory
[38-40], and is a site of critical neuropathologies in Alzheimer's disease [41] and temporal lobe epilepsy [42]. During periods of attentiveness or intentional movement, the rat hippocampal EEG is dominated by a 4-12 Hz rhythm called the theta rhythm [43,44], which is the product of synchronous oscillatory activity in several interconnected brain regions, including the hippocampus and entorhinal cortex [43]. Two lines of evidence link the theta rhythm to the memory-related functions of the hippocampal region. First, disabling theta activity leads to severe memory impairment [40,45]. Second, theta-patterned activity is effective in inducing long-term changes in synaptic efficacy thought to be linked to memory formation in the hippocampus [46,47]. Two complementary mechanisms contribute to the hippocampal theta rhythm. First, the hippocampus is "paced" by theta-coherent input from external brain structures, in particular a nearby structure called the medial septum [45]. Second, the hippocampal formation includes intrinsic mechanisms that seem to aid in developing coherence in the 4-12 Hz band, including specialized synaptic kinetics [48] and specialized single-cell interspike interval dynamics [49,50]. In this case study, we will focus on stellate cells (SCs) of the medial entorhinal cortex (MEC). These neurons, which deliver much of the information from the neocortex to hippocampus, exhibit intrinsic oscillations in membrane potential at theta frequencies, even without synaptic input [50,51]. The mechanisms underlying these intrinsic oscillations are inherently noisy, with interesting consequences for spiking dynamics and information representation in the hippocampal formation.
3.1. Basic electrophysiological properties
SCs of the MEC have unmistakable and unusual intrinsic electrophysiological properties [50,51]. Fig. 2A shows responses of an MEC stellate cell at rest (bottom trace), and at two levels of applied DC current. For intermediate levels of DC current (middle trace), SCs generate ~8 Hz, subthreshold oscillations, with occasional action potentials locked to the depolarizing phase of the underlying oscillations. At higher levels (top trace), the frequency of the oscillations and the probability of generating an action potential per oscillatory cycle both increase. For suprathreshold current levels, action potentials do not occur in a memoryless fashion. Instead, action potentials tend to "cluster" on adjacent cycles of the slow oscillations (Fig. 5A). The mechanisms underlying subthreshold oscillations and phase-locked spikes in SCs are relatively well understood (Fig. 2B). Spikes are generated by kinetically typical Na+ and K+ conductances g_Na1 and g_K1 [52,53]. Subthreshold oscillations are independent of synaptic input. Their depolarizing phase is caused by a persistent (noninactivating) Na+ current (g_Na2). Their hyperpolarizing phase is caused by a slow hyperpolarizing conductance (g_s) that may include contributions from activation of a slow K+ current and deactivation of the slow inwardly rectifying cation current I_h [32,54-57]. The basic properties of the oscillations (e.g., amplitude, frequency) do not seem to depend critically on the precise identity of the slow hyperpolarizing current [57].
Fig. 2. Subthreshold oscillations and rhythmic spiking in stellate cells (SCs) of the medial entorhinal cortex (MEC). (A) Electrophysiologically recorded responses of a stellate cell to DC current, applied through the recording electrode, at three levels. At I_app = 0.3 nA, the cell exhibits noisy subthreshold oscillations and the occasional action potential (truncated; the raw action potential is ~100 mV in amplitude). At I_app = 0.6 nA, oscillatory frequency and the probability of spiking increase. (B) A minimal, biophysically based model that accounts for subthreshold oscillations and rhythmic spikes. In this circuit diagram, C represents cellular capacitance per unit area; g_Na1 and g_K1 the voltage-gated Na+ and K+ conductances responsible for action potentials; g_Na2 the persistent (noninactivating) Na+ conductance responsible for the rising phase of the subthreshold oscillations; g_s the slow conductance responsible for the falling phase of the subthreshold oscillations; g_e the ohmic "leak" conductance; and g_syn represents synaptic input. Each conductance is in series with a voltage source representing its reversal potential (i.e., the potential at which current through each conductance changes sign). The current source I_app represents current applied via the recording electrode.
3.2. Quantifying biological noise sources
The electrical behavior of MEC SCs is noticeably noisy in the subthreshold voltage range (Fig. 2A). Because the persistent Na + conductance gNa2 is activated in this voltage range and is mediated by a relatively small number of channels, it seemed to us a likely dominant contributor to this membrane electrical noise. We tested this hypothesis by measuring the noise contributions of the persistent Na + conductance, and comparing these contributions to those of all other sources [32]. Fig. 3A shows
Fig. 3. Noise from persistent channels in SCs. (A) Sample responses under steady-state voltage clamp in control conditions (lower trace), and in the presence of tetrodotoxin (TTX, 1 μM), which blocks Na+ channels selectively. In TTX, the mean current is reduced to zero, and the variance is reduced over threefold (from 144 to 42 pA²); the latter result implies that Na+ channels are the major electrical noise source in these neurons under these experimental conditions (isolated neurons from the MEC; see [52]). The data shown were collected starting 200 ms after switching membrane potential to the clamp potential of −40 mV. Data of this kind can be used to estimate the rate constants, open-channel probabilities, and number of channels in an assumed independent and identically distributed population of stochastic ion channels [4,32]. (B) CV_Na, the coefficient of variation of the persistent Na+ current, plotted vs. steady-state membrane potential. In the subthreshold range (−60 to −50 mV), CV_Na > 0.25, implying that the persistent Na+ current is significantly noisy. Data (mean ± SEM) are lumped from 4 neurons.
sample data of this kind. The lower trace in Fig. 3A shows steady-state membrane current measured while membrane potential is held at −40 mV. The upper trace shows the equivalent measurement made in the presence of tetrodotoxin (TTX), a highly selective blocker of Na+ channels. Application of TTX reduces the mean membrane current under these conditions from −75 to 0 pA, and the variance in membrane current from 144 to 42 pA². This reduction in variance indicates that the persistent, TTX-sensitive Na+ current is the major source of membrane current noise in these cells under these experimental conditions (which do not account for noise from synaptic sources). Fig. 3B shows CV_Na, the coefficient of variation (ratio of standard deviation to mean) of the persistent Na+ current from 4 MEC SCs (mean ± SEM). As predicted by binomial models, CV_Na is relatively high for membrane potentials in the subthreshold range.
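The logic of Fig. 3 (subtracting the statistics of the TTX trace from those of the control trace and applying the binomial relations mean = Npi and variance = Np(1 − p)i²) can be written compactly as below. The routine assumes that the single-channel current i at the clamp potential is known from other measurements, and that the Na+-channel and residual noise sources are independent; the synthetic traces and the value of i_unitary are illustrative assumptions.

```python
import numpy as np

def persistent_na_noise_analysis(i_control, i_ttx, i_unitary):
    """Estimate CV, open probability p, and channel number N for the
    TTX-sensitive (persistent Na+) current from steady-state current traces.

    i_control, i_ttx: current samples (pA) at one clamp potential, before and
    after TTX.  i_unitary: assumed single-channel current (pA) at that potential.
    """
    mean_na = i_control.mean() - i_ttx.mean()      # mean = N * p * i
    var_na = i_control.var() - i_ttx.var()         # var  = N * p * (1 - p) * i**2
    cv = np.sqrt(var_na) / abs(mean_na)            # coefficient of variation
    p = 1.0 - var_na / (i_unitary * mean_na)
    n = mean_na / (i_unitary * p)
    return cv, p, n

# Synthetic traces with statistics similar to Fig. 3A (pA, pA^2):
rng = np.random.default_rng(0)
i_ctrl = -75.0 + np.sqrt(144.0) * rng.standard_normal(5000)
i_ttx = 0.0 + np.sqrt(42.0) * rng.standard_normal(5000)
print(persistent_na_noise_analysis(i_ctrl, i_ttx, i_unitary=-1.5))
```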
3.3. Modeling the effects of channel noise
3.3.1. Effects on excitability and bifurcation structure
Using data like those in Fig. 3, we constructed a model of MEC SCs (Fig. 2B). This model includes standard nonlinear differential equation-based descriptions of the voltage-gated conductances g_Na1, g_K, and g_s. The persistent Na+ conductance g_Na2, on the other hand, is modeled as a collection of independent, probabilistically gating channels [32]. Fig. 4 shows schematic bifurcation diagrams for deterministic (A) and stochastic (B) SC models, in response to DC current. Above each qualitative region in the bifurcation diagram is a schematic of the time-domain response of the model in that region. Deterministic models have a stable fixed point at rest. With increasing applied current (I_app), the stable fixed point becomes more positive, then loses its stability via a subcritical Hopf bifurcation. A stable limit cycle appears, representing rhythmic firing of action potentials at theta frequencies. Around the bifurcation point, there is a small region of bistability. Subthreshold oscillations are rarely seen in deterministic models in practice [32], because parameters must be tuned very precisely for them to occur. As indicated by the schematic bifurcation diagram in Fig. 4B, intrinsic noise from Na+ channels changes the behavior of the model SC considerably. As applied current is increased, subthreshold oscillations arise (Fig. 4B, leftmost bifurcation). These oscillations resemble those that would arise via a supercritical Hopf bifurcation, but in fact they reflect the behavior of a noise-driven system spiraling around a stable critical point with associated complex eigenvalues. At a second critical value of I_app, the model is able to fire with nonzero probability (second bifurcation). With further increases in applied current, the probability of firing in the stochastic SC model changes gradually from near zero to near one spike per cycle, giving the noisy model a "soft," rather than "hard," threshold. The effect of noise is to increase the cell's dynamic range, by increasing sensitivity for small stimuli and decreasing sensitivity for larger stimuli. This effect is robust over a large parameter space [32]. The parameters of the stochastic (but not deterministic) model SC can be tuned to match experimental data quite well, in terms of spectral density of subthreshold oscillations and spiking probabilities as functions of DC current level [32].
Fig. 4. Intrinsic noise alters bifurcation behavior in SC models. Schematic bifurcation diagrams (main panels) and time-domain traces (insets) for deterministic (A) and stochastic (B) models of SCs are shown. V = membrane potential; I_app = DC applied current. (A) Deterministic models switch from a stable fixed point to rhythmic spiking. (B) With increasing I_app, stochastic models shift from a stable fixed point to subthreshold oscillations to rhythmic firing with some probability of firing an action potential on a given oscillatory cycle. This probability varies seemingly continuously from near zero to near one, giving the stochastic model a "soft" threshold.
3.3.2. Interspike interval statistics
Experimental responses of SCs to DC current show a notable phenomenon called "spike clustering" (Fig. 5A, inset), in which action potentials tend to occur on adjacent cycles of the underlying slow oscillations [51]. The main panel of Fig. 5A shows an interspike interval histogram derived from such experimental data (kindly supplied by Angel Alonso). Spike clustering is evident in the high probability of intervals corresponding to one period of the underlying oscillations in experimental data (bars) compared with the expected probability in a memoryless model of the same mean firing probability per cycle (solid line). The stochastic SC model can account to some degree for the spike clustering seen in experimental data. Fig. 5B shows simulated responses at the same mean firing rate (or, equivalently, the same mean probability of firing per slow oscillatory cycle).
Fig. 5. Spike clustering in experimental data and stochastic simulations. (A) and (B) Bars show estimated interspike interval distributions derived from experimental data (A) and stochastic simulations (B). Time has been normalized by the period of the underlying subthreshold oscillation (~8 Hz), derived from the spectral density of extended subthreshold epochs of the data. Solid lines: expected interspike interval distributions for memoryless processes with the same mean probabilities of firing per cycle (0.2 for both the experimental data and simulations). Dashed lines: best-fit models with spiking probabilities that are conditional on the occurrence of an action potential on the previous cycle. Insets: sample time-domain data. Horizontal scale bar: 250 ms. Vertical scale bar: 10 mV. (C) Envelopes of subthreshold oscillations preceding action potentials (solid lines; mean ± SEM) or action potential failures (dashed lines; mean ± SEM), derived from experimental (top) and simulated (bottom) data. Selected data were purely subthreshold epochs >1 s, referenced to the time of occurrence of a spike or peak of a subthreshold oscillation at time = 0. Selected data were processed by subtracting out the mean value, taking the absolute value, and low-pass filtering. Both experimental and simulated results show significant growth in the oscillatory envelope for 1-2 oscillatory periods before action potentials. Initiation of envelope growth (arrows) begins earlier for experimental results than for simulated results. (D) A Markov chain model of conditional spiking probabilities. α represents the probability of spiking on the next cycle after a spike; β represents the probability of spiking on the cycle after not spiking. Best-fit versions of interspike interval distributions based on this model are plotted as dashed lines in A-B.
Again, the probability of having an interspike interval of 1 cycle is elevated (cf. the bars and solid line), but not to the degree seen in experimental data. Although the stochastic MEC model can account partially for spike clustering, details of interspike interval distributions are different in Fig. 5A,B. Simulation results are well fit by a two-state Markov chain model of spiking per oscillatory cycle (Fig. 5D, dashed line in Fig. 5B). This result implies that the memory underlying spike clustering in stochastic simulations can be expressed accurately in terms of a simple conditional probability based on the occurrence (or lack thereof) of an action potential in the previous cycle. Two results indicate that memory in experimental data is somewhat more complex than in stochastic simulations. First, the two-state Markov model (Fig. 5D) does not do as good a job in fitting experimental interspike interval distributions (cf. bars and dashed line in Fig. 5A). In particular, the Markov model necessarily has a monotonically decreasing distribution, P{ISI = j} = (1 − α)(1 − β)^{j−2}β for j ≥ 2. The ISI distribution in experimental data is significantly nonmonotonic. Second, the pre-event envelopes of membrane potential (Fig. 5C) preceding action potentials (solid lines; mean ± standard deviation) look different in experimental data than in simulations. Specifically, the envelopes, obtained by filtering raw signals to obtain a signal representing the magnitude of the subthreshold oscillations over time, seem to "break away" from the internal control (dashed lines; mean ± standard deviation for envelopes that do not precede spikes) earlier in the experimental data than in simulation results (cf. locations of arrows marking the approximate "break-away" point). These results imply that subthreshold oscillations wax significantly for some time before spiking in both experimental data and stochastic simulations, but that this form of self-organization occurs over a longer time scale in experimental data. The multiplicative nature of channel noise, caused by the voltage-dependence of the transition rate constants, contributes to the spike clustering seen in stochastic simulations at low spike rates. This result is seen in Fig. 6, in which we have plotted the estimated probability of occurrence of a spike cluster of length M vs. M at two spike rates. Simulations with multiplicative (voltage-dependent) noise show memory that is consistent with the 2-state Markov model of Fig. 5D at both low (Fig. 6A) and moderate (Fig. 6B) spike rates. Simulations with additive current noise, on the other hand, show no memory at low spike rates (Fig. 6A). Interestingly, current noise simulations show significant spike clustering at higher spike rates (Fig. 6B), indicating that the voltage-dependence of channel noise is a contributing, but not necessary, factor for spike clustering.
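The two-state Markov chain of Fig. 5D is simple enough to simulate directly and to compare against the closed-form ISI distribution quoted above; in the sketch below the parameters a and b play the roles of α and β, and their numerical values are arbitrary illustrative choices.

```python
import numpy as np

def isi_distribution(a, b, n_cycles=200_000, seed=3):
    """Simulate spiking per slow-oscillation cycle with the two-state Markov
    chain of Fig. 5D: P(spike | spike on previous cycle) = a,
    P(spike | no spike on previous cycle) = b.  Returns the estimated ISI
    distribution (in cycles) and the analytic form for j >= 2."""
    rng = np.random.default_rng(seed)
    spiked = False
    last_spike = None
    isis = []
    for cycle in range(n_cycles):
        p = a if spiked else b
        spiked = rng.random() < p
        if spiked:
            if last_spike is not None:
                isis.append(cycle - last_spike)
            last_spike = cycle
    isis = np.asarray(isis)
    est = np.bincount(isis) / len(isis)
    j = np.arange(2, est.size)
    analytic = (1 - a) * (1 - b) ** (j - 2) * b     # P{ISI = j} for j >= 2
    return est, analytic

est, analytic = isi_distribution(a=0.45, b=0.15)
print(est[1:6])       # estimated P{ISI = 1..5}
print(analytic[:4])   # analytic P{ISI = 2..5}
```

With a > b the chain reproduces the elevated probability of one-cycle intervals (spike clustering) relative to a memoryless model with the same mean firing probability.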
3.3.3. Stochastic resonance
In Fig. 4, we showed schematic bifurcation diagrams indicating that intrinsic noise contributed by persistent Na+ channels is sufficient to "soften" the threshold for firing in response to DC stimuli. A similar effect is seen with 8-Hz, half-wave rectified sinusoidal input, mimicking the excitatory input SCs receive from their neighbors during the theta rhythm. In Fig. 7, we have plotted response probabilities (per cycle) vs. synaptic weight for deterministic and stochastic models of SCs. The deterministic model (solid line) shows three robust firing probabilities (p = 0, 0.5, and 1), with somewhat complex transitions between these states. In contrast, the stochastic model exhibits firing probabilities that vary smoothly and largely monotonically between zero and one. The threshold for nonzero spiking probabilities is two to fivefold lower in the stochastic model, exhibiting a form of stochastic resonance in which the neuron itself provides a noise source that boosts sensitivity.
Fig. 6. Multiplicative noise alters cluster length distributions at low, but not high, spike rates. Plots of the probability of having a "spike cluster" (group of action potentials on adjacent slow oscillatory cycles) of length M vs. M, at low (0.05) and moderate (0.2) mean probabilities of firing per cycle. Results are shown for stochastic simulations with multiplicative (voltage-gated) noise in the persistent Na+ conductance (open circles); simulations with additive current noise, scaled to give the same mean firing rate (closed squares); a best-fit 2-state Markov chain model (Fig. 5D), which allows spiking probability to depend on the occurrence (or lack thereof) of a spike in the previous oscillatory cycle (dashed line); and a memoryless model, which has the same spiking probability (0.05 in A, 0.2 in B) every cycle (solid line).
Fig. 7. Intrinsic noise smoothes the dependence of spiking probability on magnitude of conductance input. Responses to half-wave rectified, sinusoidal conductance input at 8 Hz. The probability of an action potential per cycle of input is plotted vs. synaptic conductance magnitude g_syn. The reversal potential associated with the synapse was 0 mV. Solid line: response of the deterministic SC model. Dotted line: response of the stochastic model. Adapted from [32].

3.3.4. Effects on reliability
As discussed in Section 2, the presence of significant intrinsic noise would be expected to lower the reliability (repeatability) of responses to repeated presentations of time-varying stimuli. Figure 8 shows results from electrophysiological experiments designed to test this hypothesis.
Fig. 8. Measurement of reliability in SCs. (A) and (B) Membrane potential responses (lower 10 traces of each panel; action potentials clipped at +10 mV) to repeated presentations of a fluctuating current stimulus (top traces of each panel). Vertical scale = 100 mV/division for voltage traces. The current traces in panels A and B have the same mean value (70 pA) and level of fluctuation (σ_I, the standard deviation, = 100 pA), but different frequency content. Each was generated by a 2-pole, low-pass filter with cutoff frequency f_c. The filter was designed to preserve the overall level of fluctuations (σ_I). (C) Pooled reliability results from 8 putative SCs, identified electrophysiologically by their tendency to generate subthreshold oscillations and fire rhythmically. All cells fired at approximately the same rate (1-3 Hz) for the level of DC current used. Data were collected at two values of f_c for each cell. Points are mean ± standard deviation.
Fig. 8A shows 10 responses (lower 10 traces) of an MEC SC to a repeated, broad-band current stimulus (top trace; current stimuli were generated as pseudo-random, Gaussian white noise of a given variance, then filtered with cutoff frequency f_c). This cell responds reliably (9/10 times) at t ≈ 80 ms after the onset of the stimulus, but subsequent action potential responses are far less reliable. Fig. 8B shows responses of the same cell to repeated presentations of a stimulus of equal variance but with most of its energy concentrated between 0 and 8 Hz. The SC responds to this stimulus more frequently and more reliably. Fig. 8C summarizes results based on recordings from 8 MEC SCs in brain slices. Plotted is reliability (calculated using the method of Hunter et al. [27]) vs. σ_I, the standard deviation in current (or, more accurately, the RMS value of the zero-meaned current waveform). We draw two main conclusions from these results. First, intrinsic noise from voltage-gated channels seems to render SCs unreliable at moderate levels of signal fluctuation (σ_I < 140 pA), implying that these neurons may be more difficult to entrain than, for example, neocortical pyramidal cells [12,25]. Second, SCs are sensitive to changes in the frequency content of the time-varying signal. In particular, they fire more reliably (and, in many cases, at higher rates) when the fluctuating stimulus contains more energy within the theta (4-12 Hz) frequency band. Thus SCs may be "tuned" to fire most reliably in response to input bandwidths associated with a particular behavioral state.
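A numerical sketch of this type of analysis is given below: a low-pass filtered Gaussian current stimulus with a specified RMS amplitude, and a correlation-based reliability measure computed from smoothed spike trains. The single-pole filter and the smoothed-spike-train correlation are common simplifications and are not claimed to reproduce the exact 2-pole filter or the Hunter et al. [27] metric used for Fig. 8.

```python
import numpy as np

def make_stimulus(n, dt, sigma_i, fc, rng):
    """Pseudo-random Gaussian current, low-pass filtered (single pole here for
    simplicity) and rescaled so that its RMS fluctuation equals sigma_i."""
    white = rng.standard_normal(n)
    a = np.exp(-2 * np.pi * fc * dt)
    x = np.empty(n)
    x[0] = white[0]
    for k in range(1, n):
        x[k] = a * x[k - 1] + (1 - a) * white[k]
    x -= x.mean()
    return sigma_i * x / x.std()

def reliability(spike_trains, dt, sigma_smooth=0.005):
    """Correlation-based reliability: mean pairwise normalized inner product of
    Gaussian-smoothed spike trains (one common choice of metric)."""
    t = np.arange(-4 * sigma_smooth, 4 * sigma_smooth + dt, dt)
    kernel = np.exp(-t**2 / (2 * sigma_smooth**2))
    smoothed = [np.convolve(s, kernel, mode="same") for s in spike_trains]
    corrs = []
    for i in range(len(smoothed)):
        for j in range(i + 1, len(smoothed)):
            den = np.linalg.norm(smoothed[i]) * np.linalg.norm(smoothed[j])
            corrs.append(np.dot(smoothed[i], smoothed[j]) / den if den > 0 else 0.0)
    return float(np.mean(corrs))

# Usage: inject the same frozen stimulus from make_stimulus() on repeated trials,
# collect the binary (0/1 per time bin) spike trains, and pass them to reliability().
```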
4. Summary and conclusions
The biophysical mechanisms underlying electrical excitability and communication in the nervous system make electrical noise ubiquitous and seemingly inescapable. Here, we have reviewed the basic characteristics and expected consequences of several biological noise sources. We have presented a case study in which we argue that the probabilistic gating of tetrodotoxin-sensitive Na+ channels may alter neurophysiological function in stellate neurons of the medial entorhinal cortex, a memory-related structure in the mammalian brain. Key effects of intrinsic noise from Na+ channels include enhancement of excitability and sensitivity to small stimuli; alteration of cellular bifurcation behavior and "softening" of the neural threshold; enhancement of robustness of electrical behaviors in the face of perturbations of model parameters; reduction of neuronal reliability in response to broad-band stimuli; and enhanced reliability with inputs that have significant energy content within the theta (4-12 Hz) band. The example we highlight here focuses on one intrinsic noise source in a particular neuronal population, but the phenomena associated with noisy, subthreshold oscillations and phase-locked action potentials are likely to be widespread and capable of revealing general principles of single-neuron computation [58,59,61,62]. The consequences of biological noise at the level of neuronal networks seem almost certain to be a fruitful area of research that has only begun to be explored.
Acknowledgements
We thank A. Alonso for providing the electrophysiological data for Fig. 5, and A.D. Dorval for thoughtful comments on a preliminary version of this chapter. This work was supported by grants from The Whitaker Foundation and the National Institutes of Health (NS34425).
CHAPTER 9
Phase Synchronization: From Theory to Data Analysis

M. ROSENBLUM, A. PIKOVSKY, and J. KURTHS
Department of Physics, University of Potsdam, Am Neuen Palais 10, D-14415 Potsdam, Germany

C. SCHÄFER
Centre for Nonlinear Dynamics, Department of Physiology, McGill University, 3655 Drummond Street, Montreal, Que., Canada H3G 1Y6

P.A. TASS
Institute of Medicine (MEG), Research Centre Jülich, D-52425 Jülich, Germany

© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction ... 281
   1.1. Synchronization in biology ... 281
   1.2. Synchronization and analysis of bivariate data ... 282
2. Phase and frequency locking: a brief review ... 284
   2.1. Periodic oscillators ... 284
   2.2. Noisy oscillators ... 286
   2.3. Chaotic oscillators ... 287
   2.4. An example: two coupled noisy Rössler oscillators ... 287
3. Estimating phases from data ... 289
   3.1. An example: synchronization via parametric action (modulation) ... 292
4. Straightforward analysis of phase difference: application to posture control ... 293
5. Statistical analysis of phase difference: application to brain activity ... 295
   5.1. Human brain activity during pathological tremor ... 298
6. Stroboscopic technique: application to cardiorespiratory interaction ... 302
   6.1. Cardiorespiratory interaction ... 302
   6.2. The experimental data and preprocessing ... 303
   6.3. Cardiorespiratory synchrogram ... 304
7. Discussion ... 307
   7.1. Is it really synchronization? ... 307
   7.2. Synchronization vs. coherence ... 312
Acknowledgements ... 313
Appendix A. Instantaneous phase and frequency of a signal ... 314
References ... 319
1. Introduction
Synchronization is a basic phenomenon in science, discovered at the beginning of the modern scientific age by Huygens [1]. In the classical sense, synchronization means adjustment of the frequencies of periodic self-sustained oscillators due to weak interaction [2-5]. This effect (also referred to as phase locking or frequency entrainment) is well studied and finds many practical applications [3,4]. During the last 15 years the notion of synchronization has been generalized to the case of interacting chaotic oscillators. In this context, different phenomena exist which are usually referred to as "synchronization", so one needs a more precise description. Due to a strong interaction of two (or a large number of) identical chaotic systems, their states can coincide, while the dynamics in time remains chaotic [6,7]. This effect can be denoted as "complete (identical) synchronization" of chaotic oscillators. It can be easily generalized to the case of slightly nonidentical systems [7], or to interacting subsystems [8]. Recently, the effect of phase synchronization of chaotic systems has been described [9]. It is closest to synchronization of periodic oscillations, where only the phase locking is important, while no restriction on the amplitudes is imposed. Correspondingly, the phase synchronization of chaotic systems is defined as the appearance of a certain relation between the phases of the interacting systems (or between the phase of a system and that of an external force), while the amplitudes can remain chaotic and are, in general, noncorrelated. Of course, the very notion of phase and amplitude of chaotic systems is rather nontrivial.

Remarkably, the properties of phase synchronization in chaotic systems are similar to those of synchronization in periodic noisy oscillators [10]. This allows one to describe both effects within a common framework. Moreover, from the experimentalist's point of view, one can use the same methods in order to detect synchronization in both chaotic and noisy systems; we will use this analogy below. Describing particular experiments and searching for phase synchronization, we will not be interested in the question whether the observed oscillations are chaotic or noisy: the approach we present below is equally applicable in both cases.
1.1. Synchronization in biology

Synchronization phenomena are often encountered in living nature. Indeed, the concept of synchronization is widely used in experimental studies and in the modeling of interaction between different physiological (sub)systems demonstrating oscillating behavior. The examples range from the modeling of the heart in the pioneering paper of van der Pol and van der Mark [11] to investigation of the circadian rhythm [12,13], phase locking of respiration with a mechanical ventilator
[14] or with locomotory rhythms [15], coordinated movement [13] and animal gaits [16], phase locking of chicken embryo heart cells with external stimuli and interaction of the sinus node with ectopic pacemakers [13], synchronization of oscillations of human insulin secretion and glucose infusion [17], locking of spiking from electroreceptors of a paddlefish to weak external electromagnetic fields [18], and synchronization of heart rate by external audio or visual stimuli [19]. A very interesting and important example is the interaction of the human cardiovascular and respiratory systems. Although it is well known that these systems do not act independently [20], and in spite of early communications in the medical literature (which often used different terminology) [21-26], in the biological physics community these two systems were often considered to be not synchronized. Thus, an extensive review of previous studies of biological rhythms led to the conclusion that "there is comparatively weak coupling between respiration and the cardiac rhythm, and the resulting rhythms are generally not phase locked" (see [13, p. 136]). Recently, the interaction of these vital systems attracted the attention of several physics groups, and synchronization during paced respiration [27,28] was investigated. Here, as well as in Refs. [21,23,24,27,28], only synchronous states of order n:1 (n heartbeats within 1 respiratory cycle) were found, due to the limitations of the ad hoc methods used for the analysis of the data. In our recent work [29,30] we have reported on cardiorespiratory synchronization under free-running conditions; the proposed analysis technique allows one to find synchronous epochs of different orders n:m. This finding gives some indication for the existence of an unknown form of cardiorespiratory interaction.

The notion of synchronization is also related to several central issues of neuroscience (see, e.g., [31]). For instance, synchronization seems to be a central mechanism for neuronal information processing within a brain area as well as for communication between different brain areas. Results of animal experiments indicate that synchronization of neuronal activity in the visual cortex appears to be responsible for the binding of different but related visual features so that a visual pattern can be recognized as a whole [32-34,31]. There is also evidence that synchronization of the oscillatory activity in the sensorimotor cortex may serve for the integration and coordination of information underlying motor control [35]. Moreover, synchronization between areas of the visual and parietal cortex, and between areas of the parietal and motor cortex, was observed during a visuomotor integration task in an awake cat [36]. However, as yet, little is known about synchronization between different brain areas and its functional role and behavioral correlates in humans. On the other hand, synchronization plays an important role in several neurological diseases like epilepsies [37] and pathological tremors [38,39]. Correspondingly, it is important to analyze such synchronization processes in order to achieve a better understanding of physiological brain functioning as well as of disease mechanisms.

1.2. Synchronization and analysis of bivariate data

As we have argued above, synchronization phenomena are abundant in the real world and in biological systems in particular. Thus, detection of synchronization from
experimental data appears to be an important problem, that can be formulated as follows: Suppose we can obtain several signals coming from different simultaneous measurements (e.g., an electrocardiogram and respiratory movements, multichannel electro- or magnetoencephalography data, records of muscle activity, etc.). Usually it is known how to attribute these signals to different oscillating objects. The question is whether there are states (or epochs) where these objects oscillate in synchrony. Unfortunately, typically observed oscillations are highly irregular, especially in live systems, and therefore possible synchronization phenomena are masked by strong noise and/or chaos, as well as by nonstationarity. This task is similar to a well-known problem in time series analysis: how to reveal the presence of an interdependence between two (or more) signals. The analysis of such bivariate data is traditionally done by means of linear cross-correlation (cross-spectrum) techniques [40] or nonlinear statistical measures like mutual information or maximal correlation [41-43]. Recently, different synchronization concepts of nonlinear dynamics have been used in studies of bivariate data. Schiff et al. [44] used the notion of dynamical interdependence [45] and applied the mutual prediction technique to verify the assumption that measured bivariate data originate from two synchronized systems, where synchronization was understood as the existence of a functional relationship between the states of two systems, called generalized synchronization. In our previous works [46-49,29], we proposed an ansatz based on the notion of phase synchronization; this implies existence of a relationship between phases of two weakly interacting systems, whereas the amplitudes may remain uncorrelated [9,10]. In our approach we assume that the measured bivariate data originate from two interacting self-oscillatory systems which may either be phase locked or oscillate independently. Generally, we try to access the following problem: suppose we observe a system with a complex structure that is not known exactly, and measure two time series at its outputs (Fig. 1). Our goal is not only to find out whether these signals are dependent or not - this can be done by means of traditional statistical techniques -
Fig. 1. Illustration of the synchronization approach to analysis of bivariate data. The goal of the analysis is to reveal the presence of a weak interaction between two subsystems from the signals at their outputs only. The assumption made is that the data are generated by two oscillators having their own rhythms (a). An alternative hypothesis is a mixture of signals generated by two uncoupled systems (b).
but to extract additional information on the interaction of some subsystems within the system. Obviously, we cannot consider the system under study as a "black box", but need some additional knowledge to support the assumption that the content of this "box" is complex and that we indeed encounter several subsystems that generate their own rhythms but are, probably, weakly coupled. An advantage of our approach is that it allows one to address rather weak interaction between the two oscillatory subsystems. Indeed, the notion of phase synchronization implies only some interdependence between phases, whereas the irregular amplitudes may remain uncorrelated. The irregularity of the amplitudes can mask the phase locking, so that traditional techniques treating not the phases but the signals themselves may be less sensitive in detecting the systems' interrelation [46,48]. In the following we briefly review the ideas and results of theoretical studies of synchronization phenomena that are used in our approach to time series analysis. Next, we present techniques of bivariate data analysis and illustrate them with examples of physiological data. These examples are given in ascending order of signal-analysis complexity, and in our presentation we dwell on the analysis itself rather than on the physiological importance of the results.

2. Phase and frequency locking: a brief review
We know that synchronization of weakly coupled oscillators appears as some relation between their phases and frequencies. In the context of data analysis we are going to exploit this fact to tackle the inverse problem: our goal is to reveal the presence of synchronization from the data. To this end we have to estimate the phases and frequencies from the signals and analyse the relations between them. First, we summarize what we know about the interdependence of phases and frequencies of synchronized systems. Certainly, as the experimental data are inevitably noisy, we always have to take fluctuations into account. Therefore, any relation between phases should be treated in a statistical sense.

2.1. Periodic oscillators
Stable periodic self-sustained oscillations are represented by a stable limit cycle in the phase space, and the dynamics of a phase point on this cycle can be described as
\frac{d\phi}{dt} = \omega_0,   (1)
where ω₀ = 2π/T₀, and T₀ is the period of the oscillation. It is important that, starting from any monotonically growing variable θ on the limit cycle, one can introduce the phase satisfying Eq. (1). Indeed, an arbitrary θ obeys dθ/dt = ν(θ) with a 2π-periodic ν(θ + 2π) = ν(θ). A change of variables

\phi = \omega_0 \int_0^{\theta} [\nu(\theta')]^{-1}\, d\theta'
gives the correct phase, where the frequency ω₀ is defined from the condition 2π = ω₀ ∫₀^{2π} [ν(θ)]⁻¹ dθ. A similar approach leads to the correct angle-action variables in Hamiltonian mechanics. From (1) it is evident that the phase corresponds to the zero Lyapunov exponent, while negative exponents correspond to the amplitude variables (not written in (1)). If two oscillators are weakly coupled, then in the first approximation one can neglect variations of the amplitudes to obtain equations describing the phase dynamics. In general, these equations have the form

\frac{d\phi_1}{dt} = \omega_1 + \varepsilon g_1(\phi_1, \phi_2), \qquad \frac{d\phi_2}{dt} = \omega_2 + \varepsilon g_2(\phi_2, \phi_1),   (2)
where the coupling terms g₁,₂ are 2π-periodic in both arguments, and ε is the coupling coefficient. The phase space of Eq. (2) is a 2-torus, and with the usual construction of the Poincaré map this system can be made equivalent to a circle map, with a well-known structure of phase-locking intervals (Arnold tongues) [50]; each of the intervals corresponds to an n:m synchronization region. This picture is universal, and its qualitative features do not depend on the characteristics of the oscillations and of the external force (e.g. nearly sinusoidal or relaxational), nor on the relation between the amplitudes. Analytically, one can proceed as follows. The interaction between the oscillators essentially affects the evolution of their phases if the frequencies ω₁,₂ are in resonance, i.e. if for some integers n, m we have nω₁ ≈ mω₂.
Then, in the first approximation, the Fourier expansion of the functions g₁,₂ contains slowly varying terms ~ nφ₁ − mφ₂. This suggests introducing the generalized phase difference

\varphi_{n,m}(t) = n\phi_1(t) - m\phi_2(t).   (3)
Subtracting Eq. (2) and keeping only the resonance terms, we get

\frac{d\varphi_{n,m}}{dt} = n\omega_1 - m\omega_2 + \varepsilon G(\varphi_{n,m}),   (4)
where G(·) is 2π-periodic. This is a one-dimensional ODE that admits solutions of two kinds: fixed points or periodic rotations of φ_{n,m}. The stable fixed point corresponds to perfect phase locking φ_{n,m} = const.; periodic rotations describe quasiperiodic motion with two incommensurate frequencies in the system (2). In the analytical treatment of Eq. (2) we have neglected nonresonant terms, which is justified for small coupling. With nonresonant terms, the condition of synchronization for periodic oscillators should generally be written as a phase locking condition

|n\phi_1(t) - m\phi_2(t) - \delta| < \text{const.},   (5)
where δ is some (average) phase shift, or as a frequency entrainment condition

n\Omega_1 = m\Omega_2,   (6)

where Ω_{1,2} = ⟨dφ_{1,2}/dt⟩. We emphasize that in the synchronized state the phase difference is generally not constant but oscillates around δ. These oscillations vanish in the limit of very small coupling (correspondingly, the frequency mismatch nω₁ − mω₂ must also be small), or if the coupling depends only on the relative phase: g₁,₂ = g₁,₂(nφ₁ − mφ₂).
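To make the locking condition concrete, the following minimal sketch (not from the chapter; it assumes the simplest 2π-periodic coupling function G(x) = sin x and an arbitrary step size) integrates Eq. (4) numerically and reports whether the phase difference locks or drifts for a given frequency mismatch.

```python
# Hypothetical illustration of Eq. (4) with G(x) = sin(x):
# for |n*w1 - m*w2| < eps the phase difference locks, otherwise it drifts.
import numpy as np

def phase_difference(delta, eps, phi0=0.0, dt=0.01, n_steps=200_000):
    """Integrate d(phi)/dt = delta + eps*sin(phi) with the Euler scheme."""
    phi = np.empty(n_steps)
    phi[0] = phi0
    for k in range(1, n_steps):
        phi[k] = phi[k - 1] + dt * (delta + eps * np.sin(phi[k - 1]))
    return phi

eps = 0.1
for delta in (0.05, 0.2):                       # inside / outside the tongue
    phi = phase_difference(delta, eps)
    drift = (phi[-1] - phi[0]) / (len(phi) * 0.01)
    print(f"delta={delta}: mean d(phi)/dt = {drift:.3f}",
          "(locked)" if abs(drift) < 1e-3 else "(drifting)")
```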
2.2. Noisy oscillators

In general, both properties of phase and frequency locking (Eqs. (5) and (6)) are destroyed in the presence of noise ξ(t), when instead of (4) one has

\frac{d\varphi_{n,m}}{dt} = n\omega_1 - m\omega_2 + \varepsilon G(\varphi_{n,m}) + \xi(t).   (7)
For small noise the stable phase dynamics is only slightly perturbed. Thus the relative phase φ_{n,m} mainly fluctuates around some constant level (the former fixed point). These nearly stationary fluctuations may be interrupted by phase slips, where the relative phase changes relatively rapidly by ±2π. Thus, strictly speaking, the phase difference is unbounded and condition (5) is not valid anymore. Nevertheless, the distribution of the cyclic relative phase

\psi_{n,m} = \varphi_{n,m} \bmod 2\pi   (8)

has a dominating peak around the value corresponding to the stable fixed point [51]. The presence of this peak can be understood as phase locking in a statistical sense.
If the noise is weak and bounded, then phase slips are impossible and there exists a range of frequency mismatch nω₁ − mω₂ where the averaged condition of frequency locking (6) is fulfilled. Near the boundaries of the Arnold tongue the noise causes phase slips, and the transition out of the synchronous regime is now smeared. If the noise is unbounded, e.g. Gaussian, the probability of a slip occurring is nonzero even for nω₁ − mω₂ = 0, so that, strictly speaking, the region of frequency locking shrinks to a point. As this probability is (exponentially) small for weak noise, practically the synchronization region appears as an interval of nω₁ − mω₂ where nΩ₁ ≈ mΩ₂. Within this region, the distribution of the cyclic relative phase is not uniform, so that one can speak of phase locking. In the case of strong noise, phase slips in both directions occur very frequently, so that the segments of nearly constant relative phase are very short and the time course of φ_{n,m} looks like a random walk, which is unbiased in the very center of the synchronization region and biased otherwise. The synchronization transition is
now completely smeared and, hence, synchronization appears only as a weakly seen tendency.

2.3. Chaotic oscillators
For the periodic oscillator the phase was introduced in Eq. (1) as a variable corresponding to the shift along the limit cycle and, hence, to the zero Lyapunov exponent. Any autonomous continuous-time dynamical system with chaotic behavior possesses one zero Lyapunov exponent that corresponds to shifts along the flow; therefore we expect that a phase can be defined in this case as well. Suppose we can define a Poincaré secant surface for our autonomous continuous-time system. Then, for each piece of a trajectory between two cross-sections with this surface, we define the phase as a piecewise linear function of time, so that the phase increment is 2π at each rotation:

\phi(t) = 2\pi \frac{t - t_n}{t_{n+1} - t_n} + 2\pi n, \qquad t_n \le t < t_{n+1}.   (9)
Here t_n is the time of the nth crossing of the secant surface. Obviously, the definition is ambiguous, because it crucially depends on the choice of the Poincaré surface. Nevertheless, defined in this way, the phase has a physically important property: its perturbations neither grow nor decay in time, so it does correspond to the direction with the zero Lyapunov exponent in the phase space. Note that for periodic oscillations corresponding to a fixed point of the Poincaré map, this definition gives the correct phase satisfying Eq. (1). In contrast to the case of periodic oscillations, the growth of the phase of a chaotic system cannot generally be expected to be uniform. Instead, the instantaneous frequency depends in general on the coordinate of the intersection with the Poincaré surface, i.e. on the irregular amplitude. This dependence can be considered as an influence of some effective "noise" (although this irregularity has, of course, a purely deterministic origin). Thus, the synchronization phenomena for chaotic systems are similar to those in noisy periodic oscillations [9,10]; therefore, from an experimentalist's viewpoint, we can treat them in the same way.

2.4. An example: two coupled noisy Rössler oscillators
For illustration we take two coupled chaotic Rössler oscillators subject to noisy perturbations. Namely, we consider the model

\dot{x}_{1,2} = -\omega_{1,2}\, y_{1,2} - z_{1,2} + \xi_{1,2} + \varepsilon (x_{2,1} - x_{1,2}),
\dot{y}_{1,2} = \omega_{1,2}\, x_{1,2} + 0.15\, y_{1,2},   (10)
\dot{z}_{1,2} = 0.2 + z_{1,2}(x_{1,2} - 10),

where the parameters ω₁,₂ = 1 ± Δω and ε govern the frequency mismatch and the strength of coupling, respectively, and ξ₁,₂ are two Gaussian delta-correlated noise terms fulfilling ⟨ξ_i(t) ξ_j(t')⟩ = 2Dδ(t − t')δ_{ij}. The system is simulated by
Euler's technique with the time step Δt = 2π/1000. In the following we fix D = 1.

First we consider the case of two identical oscillators, i.e. Δω = 0. It makes no sense to speak of frequency locking here, as the averaged frequencies are equal even if the oscillators are uncoupled. Nevertheless, we can distinguish the uncoupled (ε = 0) and coupled (ε = 0.04) cases if we look at the distribution of the relative phase (Fig. 2). In both cases the relative phase performs a random-walk-like motion, but its distribution is uniform in the absence of coupling and has a well-expressed peak if the two systems weakly interact. Thus, in the latter case we encounter phase locking understood in a statistical sense. Now we consider detuned oscillators, Δω = 0.015 (Fig. 3). In the absence of noisy perturbations, the phase difference oscillates around some constant level, and its distribution obviously has a sharp peak (not shown). Therefore we can speak of frequency and phase locking here (Fig. 3c). In the presence of noise the relative phase performs a biased random walk, so there is obviously no frequency locking. Nevertheless, the distribution of the phase definitely shows the presence of phase locking (Fig. 3).

To conclude this section, we stress that the appearance of synchronization entails some relations between the phases and frequencies of the oscillators, but the inverse statement is, strictly speaking, not correct. Indeed, if a system is outside the synchronization region but close to its border, then the distribution of the cyclic relative phase is also nonuniform, and the frequencies of the oscillators are closer than those
Fig. 2. Relative phase φ₁,₁ = φ₁ − φ₂ and distribution of ψ₁,₁ = φ₁,₁ mod 2π for the case of uncoupled (a, b) and coupled (c, d) identical chaotic systems perturbed by noise. Although the fluctuations of φ₁,₁ in both cases seem to be quite alike, the distributions (b) and (d) clearly identify the difference between the coupled and uncoupled regimes.
Fig. 3. Relative phase φ₁,₁ = φ₁ − φ₂ and distribution of ψ₁,₁ = φ₁,₁ mod 2π for the case of uncoupled (a, b) and coupled (c, d) nonidentical chaotic systems perturbed by noise. The horizontal line in (c) corresponds to the absence of noise: in this case the phase difference fluctuates around some constant value due to the influence of the chaotic amplitudes. These fluctuations are rather small (barely seen on this scale), and no phase slips are observed; this fact is explained by the high phase coherence of the Rössler attractor. In contrast to the noisy case, here we observe both frequency and phase locking.

for noninteracting systems. Thus, the presence of a peak in the distribution of ψ_{n,m}(t) generally indicates only the presence of some interaction, but does not necessarily mean that the systems are synchronized. As we are not interested in determining the borders of a synchronization region, but are only searching for the presence of coupling, this fact does not influence the interpretation of our results.
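The following sketch (ours, not the authors' code) shows how such a simulation might be set up: Eq. (10) is integrated with the Euler-Maruyama rule, the phase of each oscillator is taken as the angle in the (x, y) plane, and the distribution of the cyclic relative phase ψ₁,₁ is inspected; the run length, initial conditions and random seed are arbitrary choices.

```python
# Minimal sketch: coupled noisy Roessler oscillators, Eq. (10).
import numpy as np

def coupled_roessler(dw=0.015, eps=0.04, D=1.0, dt=2*np.pi/1000, n=400_000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.array([1.0 + dw, 1.0 - dw])                 # natural frequencies
    x = np.array([1.0, 1.0]); y = np.zeros(2); z = np.zeros(2)
    phi = np.zeros((n, 2))
    for k in range(n):
        xi = rng.normal(0.0, np.sqrt(2*D*dt), size=2)  # delta-correlated noise
        coup = eps * (x[::-1] - x)                     # mutual diffusive coupling
        dx = (-w*y - z) * dt + xi + coup * dt
        dy = ( w*x + 0.15*y) * dt
        dz = (0.2 + z*(x - 10.0)) * dt
        x, y, z = x + dx, y + dy, z + dz
        phi[k] = np.arctan2(y, x)                      # phase = angle in (x, y) plane
    return np.unwrap(phi, axis=0)

phi = coupled_roessler()
psi = (phi[:, 0] - phi[:, 1]) % (2*np.pi)              # cyclic relative phase, Eq. (8)
hist, _ = np.histogram(psi, bins=40, range=(0, 2*np.pi), density=True)
print("peak/mean of the distribution:", hist.max() / hist.mean())
```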
3. Estimating phases from data

Before we can analyze the relations between the phases of two oscillators, we have to estimate these phases from scalar signals. We have shown above how to define the phase for a periodic self-sustained system and for chaotic oscillations. Quite often, the phase of an oscillator can be determined if one can find a suitable projection of the phase space ensuring that all the trajectories rotate around some point that is taken as the origin. From this projection, the phase can be identified with the angle (with respect to an arbitrary direction) of the vector drawn from the origin to the corresponding point on the trajectory. Note that in this way we generally obtain a nonuniformly rotating phase, which can essentially complicate the analysis. Another possibility is to construct a Poincaré map (stroboscopic map) and to define the phase according to (9). These two methods can be adapted for the estimation of phases from experimental data. To explain the details, we consider a human electrocardiogram (ECG) and a
Fig. 4. Short segments of an electrocardiogram with the R-peaks marked (a) and of a respiratory signal (b); both signals are in arbitrary units.
respiratory signal (air flow measured at the nose of the subject) as examples. An essential feature of the ECG is that every (normal) cardiocycle contains a well-pronounced sharp peak that can be localized in time with high precision; traditionally it is denoted as the R-peak (Fig. 4a). The series of R-peaks can be considered as a sequence of point events taking place at times t_k, k = 1, 2, .... The phase of such a process can be easily obtained. Indeed, the time interval between two R-peaks corresponds to one complete cardiocycle;¹ therefore the phase increase during this time interval is exactly 2π. Hence, we can assign to the times t_k the values of phase φ(t_k) = 2πk, and for an arbitrary instant of time t_k < t < t_{k+1}

\phi(t) = 2\pi k + 2\pi \frac{t - t_k}{t_{k+1} - t_k}.   (11)
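For instance, Eq. (11) might be implemented as follows (a minimal sketch, not from the chapter; the R-peak times are assumed to have been detected beforehand):

```python
# Phase assigned by linear interpolation between marker events (Eq. (11)).
import numpy as np

def phase_from_events(t, event_times):
    """Return phi(t) = 2*pi*k + 2*pi*(t - t_k)/(t_{k+1} - t_k) for t_k <= t < t_{k+1}."""
    t = np.asarray(t)
    k = np.searchsorted(event_times, t, side="right") - 1   # index of the last event
    k = np.clip(k, 0, len(event_times) - 2)
    t_k, t_k1 = event_times[k], event_times[k + 1]
    return 2*np.pi*k + 2*np.pi*(t - t_k) / (t_k1 - t_k)

# usage with hypothetical R-peak times (seconds) and a 100 Hz time grid
r_peaks = np.array([0.0, 0.81, 1.63, 2.40, 3.22])
t = np.arange(0.0, 3.2, 0.01)
phi = phase_from_events(t, r_peaks)
```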
This method can be applied to any process that contains distinct marker events and can therefore be reduced to a spike train. Determination of the phase via marker events in a time series can be considered as an analogue of the technique of the Poincaré section, although we do not need to assume that the system under study is a dynamical one. Now we consider the respiratory signal (Fig. 4b); it resembles a sine wave with slowly varying frequency and amplitude. The phase of such a signal can be obtained by means of the analytic signal concept originally introduced by Gabor in 1946 [52]. To implement it, one has to construct from the scalar signal s(t) a complex process

\zeta(t) = s(t) + \mathrm{i}\, s_H(t) = A(t)\, e^{\mathrm{i}\phi(t)},   (12)
¹ From the physiologist's viewpoint the cardiocycle starts with the P-wave that reflects the beginning of the excitation in the atria. This does not contradict our procedure: we understand the cycle as the interval between two nearly identical states of the system.
where s_H(t) is the Hilbert transform (HT) of s(t); the instantaneous phase φ(t) and amplitude A(t) of the signal are then uniquely determined from (12). Note that although formally this can be done for an arbitrary s(t), A(t) and φ(t) have a clear physical meaning only if s(t) is a narrow-band signal (see the discussion of the properties and practical implementation of the HT and the analytic signal in Appendix A). We can look at this technique from another viewpoint: it can be considered as a two-dimensional embedding in coordinates (s(t), s_H(t)). Note that in these coordinates a harmonic oscillation is represented by a circle for any ω. This circle can be considered as the analogue of the phase portrait of the harmonic oscillator. The phase obtained from this portrait increases linearly in time, φ(t) = ωt + φ₀, as we expect for this system. Note, however, that the often used coordinates (s(t), ṡ(t)) and delay coordinates (s(t), s(t − τ)) generally produce an ellipse; the phase obtained as an angle from such plots demonstrates periodic deviations from linear growth (i.e., [φ(t) − ωt] oscillates periodically), which is in this case an artifact of the calculation.²

An important practical question is: which method should be chosen for the analysis of particular experimental data? To address this problem we make the following remarks:

1. If we define the phase of a system in order to characterize its frequency locking properties, then different methods (via the Poincaré section, from the two-dimensional projection of the phase space, or from an oscillatory observable by means of the HT) give similar results, at least if the system is a "good" one [10]. Although these phases vary microscopically, i.e. on a time scale less than one (quasi)period, the average frequencies obtained from these phases coincide, and it is exactly the frequencies that are primarily important for the description of synchronization. Therefore, theoretically all the definitions of the phase are equivalent. That is rarely the case in an experimental situation, where we have to estimate the phases from short, noisy and nonstationary records, so that numerical problems become a decisive factor.

2. If the signal has very well-defined marker events, like the ECG, the Poincaré-map technique is the best choice. It could also be applied to an "oscillatory" signal, like the respiratory one: here it is also possible to define the "events" (e.g. as the times of zero crossing) and to compute the phase according to Eq. (11). However, we do not recommend doing this: the drawback is that the determination of an event from a slowly varying signal is strongly influenced by noise and trends in the signal. Besides, we get only one event per quasiperiod, and if the record is short, then the statistics are poor. In such a case the technique based on the HT is much more effective, because it provides the phase for every point of the time series, so that we have many points per quasiperiod and can therefore smooth out the influence of noise and obtain sufficient statistics for the determination of phase relationships.
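In practice the Hilbert transform is readily available in standard numerical libraries; a minimal sketch of the analytic-signal phase estimate of Eq. (12) (our illustration, assuming the signal has already been detrended so that it is narrow-band) might look as follows:

```python
# Instantaneous phase and amplitude via the analytic signal, Eq. (12).
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase_amplitude(s):
    zeta = hilbert(s)                    # analytic signal s(t) + i*s_H(t)
    phase = np.unwrap(np.angle(zeta))    # phi(t), unwrapped to grow continuously
    amplitude = np.abs(zeta)             # A(t)
    return phase, amplitude

# usage with a synthetic, slowly modulated "respiratory-like" signal
fs = 50.0
t = np.arange(0, 60, 1/fs)
s = (1 + 0.3*np.sin(2*np.pi*0.02*t)) * np.sin(2*np.pi*0.25*t + 0.5*np.sin(2*np.pi*0.05*t))
phi, A = instantaneous_phase_amplitude(s - s.mean())
```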
² Obviously, to obtain a circle in the embedding for a harmonic signal one can use the coordinates (s(t), ṡ(t)/ω) or delay coordinates with τ = π/(2ω), but this requires a priori knowledge of ω and cannot be implemented for a signal with slowly varying frequency.
Another important point is that even if we can unambiguously compute the phase of a signal, we cannot avoid the uncertainty in the determination of the phase of an oscillator.³ The latter depends on the observable used; "good" observables provide equivalent phases (i.e. the average frequencies defined from these observables coincide). In an experiment we are rarely free in the choice of an observable. Therefore, one should always be very careful in the formal application of the presented methods and in the interpretation of the results. We emphasize that even if the observable is good enough, the distribution of the estimated cyclic relative phase can essentially differ from that obtained from the correct phase satisfying Eq. (1). Indeed, let us consider a strongly nonlinear limit cycle oscillator and estimate the phase by means of the HT, or take as the phase the angle in the phase plane. If the angular velocity of the phase point is essentially nonuniform in time, then the estimated relative phase strongly oscillates on a time scale less than the period of oscillations. In this case the distribution of the cyclic relative phase does not become unimodal; the presence of noise makes this distribution even more smeared. On the other hand, if the oscillations are nearly harmonic (in this case we call the oscillators quasilinear), then the estimated phase is close to the true one, and the distribution of the cyclic relative phase has a sharp peak. Deviation of the distribution from a unimodal one can also occur if the interaction between the oscillators is not weak. We illustrate this with the following example.

3.1. An example: synchronization via parametric action (modulation)
We consider a noisy van der Pol oscillator with modulated natural frequency,

\ddot{x} - \mu(1 - x^2)\dot{x} + (\omega + A \sin \nu t)^2 x = \xi,   (13)
where μ = 1, the natural frequency is ω = 1, and ξ is a Gaussian delta-correlated noise term. We vary the modulating frequency ν around 1/3 and look for 1:3 locking. First we compute the averaged frequency of the van der Pol oscillator, Ω = ⟨dφ_vdP/dt⟩, for different values of the modulating frequency ν and plot Ω − 3ν vs. ν; these dependencies for the noise-free and noisy cases are shown in Fig. 5. We note that, in contrast to the case of synchronization by additive forcing, locking here occurs only if the amplitude of the modulation A exceeds some threshold value (or at least the width of the synchronization region below this threshold is vanishingly small). Next, we compute the distribution of the relative phase at the center of the frequency locking region, ν = 0.307. These distributions, shown in Fig. 6, are not unimodal. Hence, synchronization via parametric modulation cannot be easily described in terms of phase locking understood as the existence of a statistically preferred value of the relative phase. In this case, in order to reveal synchronization from phases,

³ We remind the reader that although one can compute several phases from different observables of the same oscillator, there exists only one phase of that system corresponding to its zero Lyapunov exponent.
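A rough numerical sketch of this experiment (ours, with arbitrary integration parameters; the phase is taken as the angle in the (x, ẋ) plane, which is adequate only for a crude estimate of Ω):

```python
# Noisy van der Pol oscillator with modulated natural frequency, Eq. (13).
import numpy as np

def modulated_vdp(nu, A=0.6, mu=1.0, w=1.0, D=0.05, dt=0.01, n=500_000, seed=1):
    rng = np.random.default_rng(seed)
    x, v = 1.0, 0.0
    phi_prev, total = np.arctan2(-v, x), 0.0
    for k in range(n):
        xi = rng.normal(0.0, np.sqrt(2*D*dt))
        a = mu*(1 - x*x)*v - (w + A*np.sin(nu*k*dt))**2 * x
        x, v = x + v*dt, v + a*dt + xi
        phi = np.arctan2(-v, x)                            # rotation in the (x, -xdot) plane
        total += (phi - phi_prev + np.pi) % (2*np.pi) - np.pi
        phi_prev = phi
    return total / (n*dt)                                  # averaged frequency Omega

for nu in (0.300, 0.307, 0.314):
    print(f"nu = {nu:.3f}:  Omega - 3*nu = {modulated_vdp(nu) - 3*nu:+.4f}")
```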
Fig. 5. Frequency-detuning plot for the van der Pol oscillator with modulated natural frequency; A = 0.6. The bold line corresponds to the noise-free case; rather small noise (D = 0.05) smears the plateau in the (Ω − 3ν) vs. ν plot (solid line).
Fig. 6. Distribution of the cyclic relative phase for the noise-free (a) and noisy (b) van der Pol oscillator with modulated natural frequency.
one has to use the stroboscopic approach, i.e. to observe the phase of the oscillator at the instants of time when the phase of the drive attains a certain fixed value; we explain this technique below.
4. Straightforward analysis of phase difference: application to posture control

Sometimes synchronization can be detected in a straightforward way: by plotting the generalized phase difference φ_{n,m} (see Eq. (3)) vs. time and looking for horizontal
plateaus in this presentation. This simple method proved to be efficient in the investigation of model systems as well as of some experimental data [48,18,19]. To illustrate this, we present the results of experiments on posture control in humans [48]. During these tests a subject is asked to stand quietly on a special rigid force plate equipped with four tensoelectric transducers. The output of the setup provides the current coordinates (x, y) of the center of pressure under the feet of the standing subject. These bivariate data are called stabilograms; they are known to contain rich information on the state of the central nervous system [53-56]. Every subject was asked to perform three tests of quiet upright standing (3 min) with (a) eyes open and a stationary visual surrounding (EO); (b) eyes closed (EC); (c) eyes open and additional video feedback (AF). 132 bivariate records obtained from 3 groups of subjects (17 healthy persons, 11 subjects with an organic pathology and 17 subjects with a psychogenic pathology) were analyzed by means of cross-spectra and generalized mutual information. It is important that an interrelation between body sway in the anterior-posterior and lateral directions was found in pathological cases only. Another observation is that stabilograms can be qualitatively rated into two groups: noisy and oscillatory patterns. The latter appears considerably less frequently - only a few per cent of the records can be identified as oscillatory - and only in the case of pathology. The appearance of oscillatory regimes in stabilograms suggests the excitation of self-sustained oscillations in the control system responsible for the maintenance of the constant upright posture; this system is known to contain several nonlinear feedback loops with time delay. On the other hand, the independence of body sway in the two perpendicular directions for all healthy subjects and many cases of pathology suggests that two separate subsystems are involved in the regulation of the upright stance. A plausible hypothesis is that when self-sustained oscillations are excited in both these subsystems, synchronization may take place. To test whether the interdependence of the two components of a stabilogram may be due to synchronization, we have performed the analysis of the relative phase. Here we present the results for one trial (female subject, 39 years old, functional ataxia). We can see that in the EO and EC tests the stabilograms are clearly oscillatory (Fig. 7).⁴ The difference between these two records is that with eyes open the oscillations in the two directions are not synchronous during approximately the first 110 s, but are phase locked during the last 50 s. In the EC test, the phases of the oscillations are perfectly entrained all the time. The behavior is essentially different in the AF test; here no phase locking is observed. An important advantage of the phase analysis is that by means of φ_{n,m}(t) plots one can trace transitions between synchronous and nonsynchronous states that are due to nonstationarities in the interacting systems and/or coupling (Fig. 7a). Noteworthy, this is possible even for very short records. Indeed, two different regimes

⁴ In order to eliminate low-frequency trends, a moving average computed over an n-point window was subtracted from the original data. The window length n was chosen by trial to be equal to or slightly larger than the characteristic oscillation period. Its variation up to a factor of 2 does not affect the results.
Fig. 7. Stabilograms of a neurological patient for the EO (a), EC (b), and AF (c) tests. The upper panels show the relative phase between the two signals x and y, which are the deviations of the center of pressure in the anterior-posterior and lateral directions, respectively. During the last 50 s of the first test and during the whole second test the phases are perfectly locked. No phase entrainment is observed in the AF test.

that can be distinguished in Fig. 7a contain only about ten quasiperiods, i.e. these epochs are too short for the reliable application of conventional methods of time series analysis. A disadvantage of the method is that synchronous regimes which correspond to neighboring Arnold tongues, e.g. synchronization of order n:(m + 1), appear in this presentation as nonsynchronous epochs. Besides, there exist no regular methods to pick the integers n and m, so that they are usually found by trial and error. Respectively, in order to reveal all the regimes, one has to analyze a number of plots. In practice, the possible values of n and m can be estimated from the power spectra of the signals and are often restricted due to some additional knowledge of the system under study. Another drawback of this technique is that if the noise is relatively strong, this method becomes ineffective and may even be misleading. Indeed, frequent phase slips mask the presence of plateaus (cf. Fig. 3), and synchronization can be revealed only by a statistical approach, i.e. by analysis of the distribution of the cyclic relative phase ψ_{n,m}.
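As a sketch of this "straightforward" analysis (ours, with hypothetical threshold parameters), the generalized phase difference of Eq. (3) and a crude plateau detector could look as follows; phi1 and phi2 are unwrapped phases estimated, e.g., by the Hilbert transform:

```python
# Generalized phase difference (Eq. (3)) and a rough plateau detector.
import numpy as np

def generalized_phase_difference(phi1, phi2, n=1, m=1):
    return n*np.asarray(phi1) - m*np.asarray(phi2)          # Eq. (3)

def locked_epochs(phi_nm, dt, slope_tol=0.05, min_len=10.0):
    """Report intervals where |d(phi_nm)/dt| < slope_tol rad/s for at least
    min_len seconds as candidate phase-locked epochs."""
    flat = np.abs(np.gradient(phi_nm, dt)) < slope_tol
    epochs, start = [], None
    for i, f in enumerate(flat):
        if f and start is None:
            start = i
        elif not f and start is not None:
            if (i - start)*dt >= min_len:
                epochs.append((start*dt, i*dt))
            start = None
    if start is not None and (len(flat) - start)*dt >= min_len:
        epochs.append((start*dt, len(flat)*dt))
    return epochs
```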
5. Statistical analysis of phase difference: application to brain activity

If the interacting oscillators are quasilinear, then we can estimate the strength of the n:m phase locking by comparing the distribution of the cyclic relative phase ψ_{n,m}(t) with the uniform distribution. For a single record this can be done by visual inspection (cf. the example of coupled noisy Rössler oscillators, Figs. 2 and 3). In order to perform automated analysis for large data sets, or in order to trace the variation of the strength of interaction with the variation of some parameter, we need quantitative criteria of synchronization. Quantitative characterization is also
required for significance tests. To this end we introduce three n:m synchronization indices:
(1) The synchronization index based on the Shannon entropy S of the phase difference distribution [49]. Having an estimate p_k of the distribution of ψ_{n,m}, we define the index ρ as

\rho_{n,m} = \frac{S_{\max} - S}{S_{\max}},   (14)
where S = −∑_{k=1}^{N} p_k ln p_k, and the maximal entropy is given by S_max = ln N; N is the number of bins and p_k is the relative frequency of finding ψ_{n,m} within the kth bin.⁵ Due to the normalization used,

0 \le \rho_{n,m} \le 1,   (15)

whereas ρ_{n,m} = 0 corresponds to a uniform distribution (no synchronization) and ρ_{n,m} = 1 corresponds to a distribution localized in one point (δ-function). Such a distribution can be observed only in the ideal case of phase locking of noise-free quasilinear oscillators.

(2) The intensity of the first Fourier mode of the distribution,

\gamma_{n,m}^2 = \langle\cos\psi_{n,m}(t)\rangle^2 + \langle\sin\psi_{n,m}(t)\rangle^2,   (16)
where the brackets denote the average over time, can serve as another measure of the synchronization strength; it also varies from 0 to 1. The advantage of this index is that its computation involves no parameters: we do not need to choose the number of bins, as we do not calculate the distribution itself.

(3) If the oscillators are strongly nonlinear, then the distribution of ψ_{n,m}(t) is nonuniform even in the absence of noise; this is essential if synchronization occurs via parametric action. In this case we need some other measure to characterize the strength of synchronization. For this purpose we recall the stroboscopic approach: we know that in the synchronous state the distribution of the stroboscopically observed phase is a δ-function; it is smeared in the presence of noise. Thus, for n:m synchronization we have to characterize the distribution of

\eta = \phi_2 \bmod 2\pi n \,\big|_{\phi_1 \bmod 2\pi m = \theta}.   (17)
This means that we observe the phase of the second oscillator at the instants of time when the phase of the first one attains a fixed value θ (phase stroboscope). To account for the n:m locking, the phases are wrapped into the intervals [0, 2πm] and [0, 2πn], respectively. Repeating this procedure for all 0 ≤ θ < 2π and averaging, we get a statistically significant synchronization index [49]. Practically, if we deal with a time series, we can introduce binning for the phase of the first oscillator, i.e. divide the interval [0, 2πm] into N bins. Next, we denote the
⁵ The optimal number of bins can be estimated as N = exp[0.626 + 0.4 ln(M − 1)], where M is the number of samples [57].
values of φ₁ mod 2πm falling into the lth bin as θ_l and the number of points within this bin as M_l. Then, with the help of Eq. (17), we compute the M_l corresponding values η_{i,l}, where i = 1, ..., M_l. If the oscillators are not synchronized, then we expect η_{i,l} to be uniformly distributed on the interval [0, 2πn]; otherwise these quantities group around some value and their distribution is unimodal (Fig. 8). To quantify this, we compute

A_l = M_l^{-1} \sum_{i=1}^{M_l} \exp[\mathrm{i}(\eta_{i,l}/n)].   (18)
The case of complete dependence between the two phases corresponds to |A_l| = 1, whereas |A_l| vanishes if there is no dependence at all. To improve the statistics, we average over all N bins and get the synchronization index

\lambda_{n,m} = N^{-1} \sum_{l=1}^{N} |A_l|.   (19)
According to the definition above, λ_{n,m} measures the conditional probability for φ₂ to have a certain value provided φ₁ is in a certain bin.

We now compare these three indices using simulated data. As a first model example we take two coupled nonidentical Rössler oscillators (Eq. (10)); now we consider the noise-free case. Increasing the coupling strength ε, we observe a transition to the synchronous state. This can be seen from Fig. 9a, where the difference of the frequencies of the two oscillators is plotted as a function of the coupling strength. At ε ≈ 0.03 this difference vanishes, i.e. synchronization occurs. The lower panel (Fig. 9b) shows the indices as functions of ε. We see that the indices are nonzero outside the synchronization region. This is not surprising: we have already noted that the distribution of the cyclic phase outside the region also has a peak. Thus, we can reveal the presence of interaction even if it is
Fig. 8. Synchronization index based on the conditional probability. The phase of the second oscillator φ₂, wrapped into the interval [0, 2πn], is observed stroboscopically, i.e. when the phase of the first oscillator φ₁ is found in a certain bin θ_l of the interval [0, 2πm]. If there is no synchronization, then the stroboscopically observed φ₂ is scattered over the circle; otherwise it groups around some value. The sum of the vectors pointing to the position of the phase on the circle provides a quantitative measure of synchronization.
Fig. 9. Comparison of quantitative measures of synchronization using the simulated data from two coupled Rössler oscillators (Eq. (10)). Transition to the synchronous state takes place when the difference of the frequencies of the two oscillators vanishes with increase of the coupling coefficient ε (a). Three 1:1 synchronization indices are shown as functions of ε (b).

too weak to induce synchronization. Comparing the indices, we can see that λ₁,₁ and γ₁,₁ are almost 1 inside the Arnold tongue, while the index ρ₁,₁ attains an essentially lower value; it also has another drawback to be mentioned: the estimate of this index strongly depends on the number of bins used for the computation of the histogram. The transition to the synchronous state appears to be sharper if described by the indices γ₁,₁ and ρ₁,₁. We conclude that if the goal of the analysis is to reveal very weak interaction, then the conditional probability index λ_{n,m} seems to be more appropriate. Alternatively, if we have a number of time series and try to sort them into groups - synchronized and not synchronized with respect to some reference signal - then the indices γ_{n,m} and ρ_{n,m} are more suitable.

As the second example we consider the van der Pol oscillator with modulated natural frequency (Eq. (13)). The distribution of the cyclic relative phase ψ₁,₃ is broad (cf. Fig. 6), and thus synchronization cannot be understood in terms of statistical phase locking. Therefore, the indices ρ₁,₃ and γ₁,₃ cannot reveal synchronization, while the conditional probability index λ₁,₃ does indicate it (Fig. 10).
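A compact sketch of all three indices of Eqs. (14)-(19) (ours, not the authors' implementation; the conditional-probability average is taken over the occupied bins only, and phi1, phi2 are unwrapped phase series):

```python
# The n:m synchronization indices rho, gamma^2 and lambda of Section 5.
import numpy as np

def indices_nm(phi1, phi2, n=1, m=1, n_bins=None):
    phi1, phi2 = np.asarray(phi1), np.asarray(phi2)
    psi = (n*phi1 - m*phi2) % (2*np.pi)                    # cyclic relative phase, Eq. (8)
    M = psi.size
    if n_bins is None:                                     # rule of thumb from footnote 5
        n_bins = int(np.exp(0.626 + 0.4*np.log(M - 1)))
    # (1) entropy index rho, Eqs. (14)-(15)
    p, _ = np.histogram(psi, bins=n_bins, range=(0, 2*np.pi))
    p = p / M
    S = -np.sum(p[p > 0]*np.log(p[p > 0]))
    rho = (np.log(n_bins) - S) / np.log(n_bins)
    # (2) intensity of the first Fourier mode, Eq. (16)
    gamma2 = np.mean(np.cos(psi))**2 + np.mean(np.sin(psi))**2
    # (3) conditional-probability (stroboscopic) index lambda, Eqs. (17)-(19)
    theta = phi1 % (2*np.pi*m)
    eta = phi2 % (2*np.pi*n)
    bins = np.linspace(0, 2*np.pi*m, n_bins + 1)
    lam, used = 0.0, 0
    for l in range(n_bins):
        sel = (theta >= bins[l]) & (theta < bins[l + 1])
        if sel.any():
            lam += np.abs(np.mean(np.exp(1j*eta[sel]/n)))  # |A_l|, Eq. (18)
            used += 1
    lam = lam / used if used else np.nan
    return rho, gamma2, lam
```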
5.1. Human brain activity during pathological tremor

5.1.1. Motivation and experimental data
Here we briefly present the results of the investigation of phase synchronization between different brain areas, as well as between brain and muscle activity, in Parkinsonian patients by means of noninvasive measurements [49,58]. These patients may exhibit involuntary shaking that is called tremor and predominantly affects the distal portion of the upper limb. Resting tremor in Parkinsonian patients has a principal frequency in the band 3-6 Hz. The goal of the study was to find out whether synchronization between different cortical areas is involved in the generation of pathological tremor.
Fig. 10. Comparison of quantitative measures of synchronization using the simulated data from the van der Pol oscillator with modulated natural frequency (Eq. (13)). Transition to the synchronous state takes place when the difference of frequencies Ω − 3ν vanishes with decrease of the detuning (a). Three 1:3 synchronization indices are shown as functions of the frequency ν of the modulating force (b).

As is known, the neuronal activity of the human brain can be noninvasively assessed by registering the electric field with electroencephalography (EEG) [59], and the magnetic field with magnetoencephalography (MEG) [60-62]. The measurement of the magnetic field is realized by means of superconducting quantum interference devices (SQUIDs). An important advantage of MEG is that the skull and scalp are transparent to the magnetic field. Therefore, in contrast to the electric field, the externally measured magnetic field is not distorted. As a result of this, MEG allows one to achieve a high spatial resolution and to distinguish between cortical activity originating from distinct areas, provided the latter are sufficiently remote. The data were obtained by means of Neuromag-122, a whole-head MEG system consisting of 122 SQUIDs arranged in a helmet-shaped array [61]. In addition to the MEG, the electromyogram (EMG) from two antagonistic muscles exhibiting tremor activity, namely the right flexor digitorum superficialis muscle (RFM) and the right extensor indicis muscle (REM), was registered by standard techniques. The data are introduced in Fig. 11; from the power spectra one can see that they are not narrow-band signals. Therefore, filtering is required in order to separate the frequency band of interest from the background brain activity. The visual inspection of the data after preprocessing⁶ shows that they are nonstationary.

⁶ The EMG data were preprocessed in a standard way (cf. [63]) so that the resulting signal represents the time course of the muscular contraction. Then both EMG and MEG are band-pass filtered; the filter parameters are chosen in such a way that only the power around the main frequency, or around the main frequency and its second harmonic, is preserved. The presented results correspond to the following parameters: bandpass of EMG signals: 5-7 Hz, bandpass of MEG signals: 5-7 and 10-14 Hz (for quantification of 1:1 and 1:2 locking, respectively). The results are robust with respect to variations of the band edges of all filters used; the usage of two-band filters (e.g., 5-7 Hz plus 10-14 Hz) produces consistent results.
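As an illustration of this preprocessing step, a zero-phase band-pass filter of the kind described above might be implemented as follows (a sketch with a hypothetical sampling rate; it is not the authors' pipeline):

```python
# Zero-phase band-pass filtering before Hilbert-transform phase estimation.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs, f_lo, f_hi, order=3):
    b, a = butter(order, [f_lo/(fs/2), f_hi/(fs/2)], btype="bandpass")
    return filtfilt(b, a, x)          # filtfilt avoids phase distortion

# e.g. isolate the tremor band (5-7 Hz) and its second harmonic (10-14 Hz)
# meg_57   = bandpass(meg_channel, fs=500.0, f_lo=5.0, f_hi=7.0)
# meg_1014 = bandpass(meg_channel, fs=500.0, f_lo=10.0, f_hi=14.0)
```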
Fig. 11. An original and filtered MEG signal (from a channel over the left sensorimotor cortex) (a) and its power spectrum (c). The original and preprocessed EMG signal of the right flexor digitorum muscle (b) and its power spectrum (d).
5.1.2. Data analysis and results
The phase analysis was performed in the following way. First, the instantaneous phases of the signals were obtained by means of the Hilbert transform. Next, one signal was taken as the reference, and phase locking between this channel and all others was studied in pairs. To cope with nonstationarity, a sliding window analysis was done: the distribution of ψ_{n,m} was computed for every time point t within the window [t − T/2, t + T/2] and characterized by means of the synchronization indices ρ and λ.⁷ The window length T was varied between 2 and 20 s; the results are robust with respect to this variation. In the search for corticomuscular synchronization (CMS), an EMG signal was taken as the reference signal. Investigation of corticocortical synchronization (CCS) was done by choosing as reference one of the MEG channels over the left sensorimotor cortex.

The main results are shown in Figs. 12 and 13. Pronounced tremor activity starts after ≈50 s (Fig. 12a). It involves the coordinated activation and deactivation of flexor and extensor muscles, which is reflected by antiphase locking of their EMG activity (Fig. 12b). This peripheral coordination may be a consequence of CMS or/and CCS - that is actually the hypothesis we are testing. Therefore, it is convenient to use the synchronization measures for the characterization of the time course of the tremor activity. Then, one can verify whether this coordination reflects the time course of synchronization. Indeed, during the epoch of significant tremor activity, corticomuscular as well as cortico-cortical synchronization were also observed. Namely, the activity of both the sensorimotor cortex and premotor areas is 1:2 phase locked with the EMG activity of both flexor and extensor muscles (Figs. 12c
⁷ Computation of both indices gives consistent results.
Fig. 12. (a) EMG of the right flexor muscle (RFM, upper trace) and an MEG signal over the left sensorimotor cortex (LSC) (lower trace). (b) 1:1 coordination between the right flexor and extensor muscles: the distribution of the cyclic phase difference Ψ_{1,1} computed in the running window [t − 5, t + 5] is shown as a gray-scale plot, where white and black correspond to minimal and maximal values, respectively (upper plot); the lower plot shows the corresponding synchronization index ρ_{1,1}. (c) 1:2 corticomuscular synchronization: time course of the distribution of the cyclic phase difference Ψ_{1,2} between the MEG signal from the LSC and the EMG of the RFM (uppermost plot) and of the corresponding indices ρ_{1,2} and λ_{1,2}; for comparison, the 1:1 synchronization index ρ_{1,1} between LSC and RFM is shown below. (d) 1:1 cortico-cortical synchronization between the LSC and a premotor MEG channel. The dashed line indicates the value of ρ_{1,1} corresponding to the 99.9th percentile of the surrogates (see Section 7).
It is important that, when the strength of the peripheral coordination decreases during the last roughly 50 s, the strength of CMS and CCS is also reduced. Another important observation is that the onset of CCS precedes the initiation of the tremor. These results confirm the sequential activation of motor cortex and tremor bursts revealed by Volkmann et al. [63] and thus support the hypothesis of a central oscillator responsible for the generation of the Parkinsonian tremor. Moreover, the phase analysis makes it possible to localize, from noninvasive measurements, the brain areas whose MEG activity is phase locked to the tremor activity (Fig. 13). The main focus of the 1:2 synchronization is located over the contralateral sensorimotor cortex. Additionally, this type of locking is observed over premotor, frontal, contralateral parietal and contralateral temporal areas. In contrast to the 1:2 locking, the 1:1 synchronization is much weaker and is observed over contralateral sensorimotor, parieto-occipital and frontal areas. All areas which are 1:2 locked with the tremor are also 1:1 locked among each other.
8 This conclusion is based on their MEG study, animal experiments and recordings during neurosurgery in Parkinsonian patients.
Fig. 13. Time dependence of the synchronization index ρ_{1,2} characterizing 1:2 locking between the EMG of the right flexor muscle (reference channel, plotted in the lower right corner) and all MEG channels. Each rectangle corresponds to an MEG sensor; the time axis spans 310 s and the y-axis scales from 0 to 0.25. The head is viewed from above; 'L' and 'R' mean left and right (see the "head" in the upper right corner). The upper and lower gray regions correspond to premotor and contralateral sensorimotor areas, respectively. The results are similar for the extensor muscle.
6. Stroboscopic technique: application to cardiorespiratory interaction

In this section we present the synchronization analysis of cardiorespiratory interaction in humans. The data we analyze, namely the ECG and the respiratory signal, were already introduced in Fig. 4. The complexity of this case is related to the following features:
• the time series have essentially different forms (respiration is a narrow-band signal, while the ECG can be reduced to a spike train);
• the characteristic time scales of the two signals are different (there are always several heartbeats per respiratory cycle) and vary considerably within one experimental record; therefore we expect (and indeed observe) synchronization of some high order n:m and transitions between different synchronous states;
• synchronization is probably related to modulation of the heart rate by respiration, so that stroboscopic methods suitable for the detection of n:m locking from nonstationary data are required.
These features make the problem a very useful example for the comparison of different analysis techniques.

6.1. Cardiorespiratory interaction
The human cardiovascular and respiratory systems do not act independently; their interrelation is rather complex and still remains a subject of physiological research
(see, e.g., [20,64] and references therein). As a result of this interaction, in healthy subjects the heart rate normally increases during inspiration and decreases during expiration, i.e., the heart rate is modulated by a respiratory-related rhythm. This frequency modulation of the heart rhythm (see Fig. 14) has been known for at least a century and is commonly referred to as "respiratory sinus arrhythmia" (RSA). It is a well-studied phenomenon, see, e.g., [65-67]. The interaction between the cardiovascular system and respiration involves a large number of feedback and feed-forward mechanisms with different characteristic time scales. It definitely cannot be viewed simply as a unidirectional modulating action of respiration on the cardiovascular signal. Nevertheless, this modulation (RSA) is almost always observed in the data and therefore should be accounted for.
6.2. The experimental data and preprocessing
Here we report the results by Schäfer et al. [29,30], where examinations of 8 healthy volunteers (14-17 years, high-performance swimmers, 4 male, 4 female) were performed. The subjects were lying at rest, and no constraints like paced respiration or mental exercises were imposed. The ECG was registered by standard leads, and respiration was measured simultaneously by a thermistor at the nose, while respiratory abdominal movements were registered for control. The duration of each record is 30 min. All signals were digitized with a 1000 Hz sampling rate and 12 bit resolution. For the analysis of the heart rate, the times of the R-peaks in the ECG (Fig. 4a) were extracted by a semiautomatic algorithm with manual correction. Only data sets without extrasystoles were used for the subsequent analysis. The respiratory signals were visually inspected and, if required, preprocessed. After low-frequency trend elimination, a second-order Savitzky-Golay filter [68] was applied to remove high-frequency noise.
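A minimal sketch of the preprocessing steps described above (R-peak extraction and smoothing of the respiratory signal). The simple threshold-based peak detector stands in for the semiautomatic algorithm with manual correction mentioned in the text, and all numerical parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter, detrend

fs = 1000.0                      # sampling rate quoted in the text (1000 Hz)

def r_peak_times(ecg, fs, min_rr=0.4):
    """Crude R-peak detector: threshold at a high quantile and enforce a
    refractory distance; real records would still need manual correction."""
    peaks, _ = find_peaks(ecg,
                          height=np.quantile(ecg, 0.99),
                          distance=int(min_rr * fs))
    return peaks / fs            # R-peak times in seconds

def preprocess_respiration(resp, fs, sg_window_s=0.5, sg_order=2):
    """Remove the slow trend, then smooth with a second-order
    Savitzky-Golay filter as described in the text."""
    resp = detrend(resp)
    win = int(sg_window_s * fs) | 1          # window length must be odd
    return savgol_filter(resp, win, sg_order)
```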
Fig. 14. An example of pronounced respiratory sinus arrhythmia: the heart rate (a) is modulated by a respiratory related rhythm. The respiratory signal (arbitrary units) is shown in (b).
Here we describe two representative data sets (subjects A and B); the data used for the analysis are shown in Fig. 15. We note that subject B exhibits considerably more pronounced RSA: the average amplitude of the modulation is about 4 times larger than that for subject A.
6.3. Cardiorespiratory synchrogram
First we present a graphic tool based on the stroboscopic technique. Here we use the phase stroboscope and observe the phase of one oscillator at the instants of time when the phase of the second one attains a certain value. For the study of cardiorespiratory interaction the natural way is to make use of the fact that the ECG can be reduced to a point process. Hence, we observe the phase of the respiratory signal, φ_r, at the times t_k of appearance of the kth R-peak, and plot this phase vs. t_k.
Fig. 15. The data for subject A: time course of R-R (interbeat) intervals (a) and of the instantaneous frequency f(t) of respiration (b) clearly demonstrate the nonstationarity of the time series. Similar data for subject B are shown in (c) and (d).
In the noise-free case of n:1 synchronization, we would observe n distinct values of the respiratory phase, so that such a plot would exhibit n horizontal lines. Noise smears out these lines, and bands are expected to be observed instead. To look for n:m locking we wrap the respiratory phase into the [0, 2πm] interval, i.e., consider m adjacent oscillations as one cycle, and plot

ψ_m(t_k) = (1/2π) [φ_r(t_k) mod 2πm]    (20)
vs. t_k; we refer to this plot as the cardiorespiratory synchrogram [29], see Fig. 16. An important feature of this graphic tool is that, in contrast to phase difference plots, only one integer parameter m has to be chosen by trial. Moreover, several synchronous regimes can be revealed within one plot, and the transitions between them can be traced. Indeed, if due to nonstationarity the coupled systems exhibit a transition from, e.g., 3:1 to 5:2 locking, then this is reflected in the proposed presentation with m = 2 as a transition from a 6-line to a 5-line structure. A very important property of the synchrogram is that it is equally effective in both cases of synchronization, either by external or by parametric forcing [30], see Fig. 17.
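A sketch of the synchrogram construction of Eq. (20): the respiratory phase (unwrapped, e.g. from the Hilbert transform) is sampled at the R-peak times and wrapped over m respiratory cycles. Variable names and the interpolation step are assumptions made for illustration.

```python
import numpy as np

def synchrogram(t, phi_resp, r_times, m):
    """Cardiorespiratory synchrogram, Eq. (20): the respiratory phase observed
    at the R-peak times t_k, wrapped over m adjacent respiratory cycles and
    normalized by 2*pi, so that n:m locking shows up as n horizontal lines."""
    phi_k = np.interp(r_times, t, phi_resp)        # unwrapped phase at times t_k
    return np.mod(phi_k, 2 * np.pi * m) / (2 * np.pi)

# Example use: psi = synchrogram(t, phi_resp, r_times, m=2); plotting psi
# against r_times reproduces the presentation of Figs. 16, 20 and 22.
```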
6.3.1. Example of phase locking: subject A
The sequence of R-R intervals and the frequency of respiration for subject A are presented in Fig. 15; nonstationarity is the essential feature of the data. First we analyze the generalized phase difference φ_{n,m} for different values of n and m (Fig. 18), as well as the instantaneous frequency ratio (not shown). The latter indicates the possibility of 5:2 locking within the first roughly 300 s and of 3:1 locking appearing after about 750 s. From the analysis of the relative phase only, we cannot reliably confirm the occurrence of synchronized epochs. Indeed, φ_{3,1} exhibits some plateaus interrupted by phase slips only for the last 400 s (see inset in Fig. 18); φ_{5,2}, as well as the relative phase for other locking ratios, displays no plateaus in this presentation. The presence of 3:1 locking becomes more evident if we analyze the phase difference statistically, i.e., consider the distribution of the cyclic relative phase Ψ_{3,1} (cf. Eq. (8)). As the data are nonstationary, we compute this distribution in a running window (Fig. 19); the preference of a certain value of Ψ_{3,1} within the last roughly 900 s is clearly seen. Computation of the synchronization indices γ_{n,m} and λ_{n,m} confirms the phase locking. We note that for the 3:1 locking both indices give similar results, whereas the 5:2 locking within the first part of the record is indicated by the conditional probability index only.
The next step is to perform the stroboscopic analysis of the respiratory phases as described above. The cardiorespiratory synchrogram (CRS) clearly exhibits six horizontal lines within the last roughly 1000 s (Fig. 20); this is confirmed by the respective distribution (phase density histogram), which shows six well-expressed peaks. This presentation makes the presence of the 3:1 phase locking in the data quite evident.
9 As the precision of the computation of instantaneous frequencies is rather poor in the case of noisy data, this method can be considered only as an auxiliary one. Its advantage is that there is no need to search for appropriate values of n and m; moreover, an approximately constant value of the ratio can be used for the estimation of these integers.
Fig. 16. Principle of the phase stroboscope, or synchrogram. Here a slow signal (a) is observed in accordance with the phase of a fast signal (c). Measured at these instants, the phase of the slow signal wrapped modulo 2πm (i.e., m adjacent cycles are taken as one longer cycle) is plotted in (d); here m = 2. In this presentation n:m phase synchronization shows up as n nearly horizontal lines in (d).

6.3.2. Example of frequency locking: subject B
Within the first roughly 300 s the CRS for subject A (Fig. 20) has a clear five-band structure. These bands are not horizontal, hence the distribution of ψ_2 is practically uniform, so that we cannot speak of phase locking. Nevertheless, the occurrence of these bands shows that, on average, two adjacent respiratory cycles contain 5 heartbeats; this may be considered an indication of frequency locking. This indication is also supported by the conditional probability index λ_{5,2} (Fig. 18d).
Another illustrative example can be found in the data of subject B; these data were already introduced in Fig. 15c,d. The analysis of the relative phase and of the synchronization indices (Fig. 21) indicates epochs of 3:1 and 5:2 synchronization. The CRS plot confirms that we encounter statistical 3:1 phase locking within the time interval 400-1200 s (Fig. 22). The interval 1200-1800 s represents frequency locking: the relative phase φ_{5,2} fluctuates around a constant value, so that on average the frequency ratio f_h/f_r = 5:2. Although we can find some short epochs with five distinct bands (e.g., around t ≈ 1400 s), and the distribution of ψ_2 is not uniform, we cannot speak with confidence of phase locking in this case. On the other hand, a long-lasting coincidence of frequencies by pure chance seems to be very unlikely. This regime probably arises due to the dominating role of the modulating influence of respiration on the heart rate (cf. the model example, Eq. (13)). Recall that subject B has considerably higher RSA than subject A.
Fig. 17. Different efficiency of the straightforward analysis of the relative phase and of the synchrogram technique in the case of synchronization via external forcing (a,b,c) and via modulation (d,e,f); 3:1 locking is taken here as an example. In the first case point events ("heartbeats") occur at three equally spaced values of the ("respiratory") phase. These values are shown by black points on the circle (a) and by the corresponding radii. The noise smears these values, as illustrated by the gray band around the radii. In this case, the distribution of the cyclic relative phase shows a single maximum (b). In the case of modulation, the events are not equally distributed on the circle (d), and the respective distribution (e) has three maxima and is essentially broader than the one shown in (b). As a result, the synchronization appears not to be well expressed. Nevertheless, the synchrograms (c) and (f) efficiently reveal synchronization in both cases. The difference between synchronization via external forcing and via modulation shows up in the different distances between the horizontal bands in these plots.
7. Discussion

7.1. Is it really synchronization?
An important issue is the interpretation of the results of the phase analysis. Here we have to be aware of two problems:
• Can we be sure that the patterns of the relative phase, described in the sections above, indeed indicate synchronization and, respectively, underlying nonlinear dynamics?
• How reliable is this indication?
Before we address these questions, recall that the synchronization transition in noisy systems is smeared. Next, as we already stressed, a relation between phases indicates, strictly speaking, the presence of an interaction between the systems, but does not necessarily mean that they are synchronized. Finally, our synchronization approach to data analysis is based on certain assumptions that might not always be fulfilled.
Fig. 18. Generalized phase difference and frequency ratio for subject A. The relative phase φ_{3,1} (a) shows some indication of 3:1 phase locking. For a comparatively short period of time one can see plateaus in the plot of φ_{3,1} vs. time, interrupted by phase slips (see inset). The indication of locking is confirmed by the synchronization indices γ_{3,1} and λ_{3,1}, shown in (b) by the bold and solid lines, respectively. The time course of φ_{5,2} (c) remains approximately constant during the first 300 s but displays no distinct plateaus, as can be seen from the zoomed plot (inset). Nevertheless, the index λ_{5,2} indicates some level of synchronization ((d), solid line), while the index γ_{5,2} (bold line) is practically zero.
Fig. 19. Distribution of the cyclic relative phase Ψ_{3,1}/2π calculated in a running window (400 heartbeats) and coded by gray scales; it also gives some indication of synchronization in the time interval 600-1400 s. Black corresponds to the maximal values.

All in all, we can never unambiguously state that we have observed synchronization; nevertheless, strong indications in favor of such a conclusion can sometimes be found. As synchronization is not a state, but a process of adjustment of rhythms due to interaction, we cannot validate its existence if we do not have access to the system parameters and cannot check experimentally that the synchronous state is stable towards variation of the parameter mismatch within a certain range (i.e., if we cannot plot the frequency vs. detuning curve).
Fig. 20. Cardiorespiratory synchrogram (CRS) of subject A, showing the transition from a five-band structure (5:2 locking) to a six-band structure (3:1 locking).
If we are not able to do such experiments, but just have some data sets registered under free-running conditions, the only way to get some confirmation (but certainly not a proof) of the existence of synchronization is to make use of the fact that the data are nonstationary. Indeed, we can trace the variation of the instantaneous frequencies of both signals and of their relation with time. If we find some epochs, as in the case of the cardiac and respiratory data, where both frequencies vary but their relation remains stable (Fig. 23), this can be considered a strong indication in favor of our conclusion. Another indication that can also be obtained by exploiting the nonstationarity of the data is the presence of several different n:m epochs within one record. Indeed, one can argue that the observed phase or frequency locking of, e.g., order 3:1 could be due to a coincidence of the frequencies of the uncoupled systems. Nevertheless, an occasional coincidence of frequencies having ratios exactly corresponding to neighboring Arnold tongues seems to be very unlikely.
If the data are rather stationary and we are not able to find such epochs, the situation is more difficult. Suppose that the distribution of the relative phase for such a bivariate record is nonuniform.
Fig. 21. Subject B. Relative phases φ_{3,1} (a) and φ_{5,2} (c) show some indication of 3:1 and 5:2 synchronization, respectively. This is consistent with the values of the synchronization indices (b) and (d). Although the plateaus in the time course of the relative phase are not very distinct (see insets in (a) and (c)), statistically understood 3:1 phase locking can be confirmed by means of the CRS (see Fig. 22). Note that within the last 600 s the generalized phase difference fluctuates around a constant level, indicating frequency locking on average; this indication is supported by a high value of the conditional probability index λ_{5,2}.
Can this just happen due to an occasional coincidence of frequencies? From the theory and from the simulation of the model example (Eq. (10)) we know that even if the frequencies of the uncoupled oscillators are equal, the distribution of Ψ_{n,m}, computed on a sufficiently long time scale, has to be nearly uniform due to the diffusion of the phase. Certainly, occasionally one can find short epochs where the phases seem to be locked. How can we estimate what is "short" and "long" in this context? At first sight, a natural way to address this problem is to use surrogate data techniques [69,28]. However, we see some serious problems in this approach. The usual formulation of the null hypothesis used for nonlinearity tests is to consider a Gaussian linear process [70,71] with a power spectrum identical to that of the tested signal; more sophisticated methods [72] also imply preservation of the probability distribution. A modification of this null hypothesis for tests for synchronization, namely the consideration of two surrogate signals that preserve the linear cross-correlation between the original data, seems to be insufficient. Indeed, due to the definition of synchronization, we are interested in the relation between
Fig. 22. Cardiorespiratory synchrogram (CRS) of subject B demonstrates a six-band structure in the range 400-1200 s, confirming 3:1 phase locking. Note that there is no phase locking in the statistical sense in those intervals where the generalized phase difference (Fig. 21c) indicates 5:2 phase locking.
instantaneous phases, whereas the variations of the amplitudes and their interrelation are of no importance. The usual way to construct surrogates (randomization of the Fourier phases) mixes the phase and amplitude properties, transforming variations of the instantaneous phase into variations of the instantaneous amplitude and vice versa. Moreover, the signals generated by self-sustained oscillators possess certain properties of the distribution of instantaneous amplitudes (see [73] and references therein), and this distribution is destroyed by the Fourier phase randomization.
Although we cannot give a general recipe for estimating the reliability of the phase analysis, some empirical methods can be used in particular experiments. Thus, in the above-described MEG study [49] the surrogates were constructed by taking either white noise or empty-room measurements (instrumental noise) and filtering them with the same band-pass filter as the data. The 95th percentile of the distribution of a synchronization index for the surrogates was taken as the significance level. Afterwards, the synchronization indices were re-calculated in accordance with this level, e.g., ρ_{n,m} → max{ρ_{n,m} − ρ_s, 0}, where ρ_s is the significance level.
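The surrogate-based significance procedure just described can be written generically as follows. The function names, the number of surrogates and the way surrogate pairs are generated are assumptions; the text only specifies filtered white noise or empty-room recordings and the 95th percentile.

```python
import numpy as np

def significance_level(index_of_pair, make_surrogate_pair, n_surr=100, q=0.95):
    """Empirical significance level: apply the same synchronization analysis to
    n_surr surrogate pairs (e.g. band-pass filtered noise) and take a high
    quantile of the resulting index values."""
    vals = [index_of_pair(*make_surrogate_pair()) for _ in range(n_surr)]
    return np.quantile(vals, q)

def corrected_index(rho, rho_sig):
    """Re-scale an index against the significance level: rho -> max(rho - rho_sig, 0)."""
    return np.maximum(rho - rho_sig, 0.0)
```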
Fig. 23. A transient epoch within the data of subject A confirms the existence of synchronization. The periods of the cardiac (R-R) and respiratory (T) cycles are shown in (a) and (b), respectively. After a short epoch of nonsynchronous behavior (1150-1200 s) the frequencies of heart rate and respiration change, probably due to the influence of a certain control mechanism, and become locked, i.e., f_r/f_h ≈ 1/3. In the next 50 s we observe that, although both frequencies decrease, this ratio remains almost constant (c). This means that one of the systems follows the other one, i.e., synchronization takes place. The 3:1 phase locking is also clearly seen from the CRS (d).
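The frequency-ratio check illustrated in Fig. 23 and discussed above can be sketched as follows: estimate slowly varying instantaneous frequencies for both signals and inspect their ratio over time. The window length and variable names are assumptions for illustration.

```python
import numpy as np

def windowed_frequency(phi, fs, win_s):
    """Average instantaneous frequency (Hz): the phase increment over a window
    of win_s seconds divided by the window duration."""
    win = int(win_s * fs)
    f = np.full(phi.size, np.nan)
    for k in range(phi.size - win):
        f[k + win // 2] = (phi[k + win] - phi[k]) / (2 * np.pi * win_s)
    return f

# With phi_h and phi_r the unwrapped cardiac and respiratory phases:
# ratio = windowed_frequency(phi_h, fs, 10.0) / windowed_frequency(phi_r, fs, 10.0)
# Epochs where both frequencies drift while the ratio stays near a rational n/m
# (here 3) support the synchronization interpretation, cf. Fig. 23.
```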
7.2. Synchronization vs. coherence
A very important question is the relation of the synchronization analysis to the cross-spectrum techniques (coherence estimates) and to the mutual information approach that are widely used, e.g., for the analysis of brain activity [74-76]. This problem requires a systematic study for different types of coupling; here we present only preliminary results. First, we mention that synchronization is not equivalent to correlation (coherence). Indeed, if two systems synchronize, their signals are correlated. On the contrary, coherence does not necessarily indicate the presence of synchronization; it may, for example, be caused by mixing of signals (see Fig. 1). This scheme imitates the real situation: e.g., each MEG sensor measures signals originating from more than one area of neuronal activity. To simulate this, we construct artificial signals u = (1 − μ)x₁ + μx₂ and w = μx₁ + (1 − μ)x₂, where x₁ and x₂ are solutions of Eq. (10) for the parameters D = 0.2, p = 0.02 and ε = 0, i.e., they are the outputs of two uncoupled Rössler oscillators. This mixture of signals does not lead to a spurious detection of synchronization, i.e., the relative phase is not bounded, although u and w are correlated.
Fig. 24. Results of the cross-spectrum analysis between the EMG of the right flexor muscle (reference channel) and all MEG channels. The coherence function γ² is plotted vs. frequency. Each rectangle corresponds to an MEG sensor; the x-axis spans 2.5-25 Hz and the y-axis scales from 0 to 0.75.
Indeed, the cross-spectrum analysis by means of the Welch technique with the Bartlett window reveals a significant coherence γ² ≈ 0.43. The difference between the results of the phase analysis and the coherence can also be demonstrated with real data. In Fig. 24 we present the results of the cross-spectrum analysis of all the MEG channels and the reference channel, which was the EMG of the flexor muscle. Each rectangle in the plot shows the coherence between the respective MEG channel and the reference one. Contralateral sensorimotor MEG signals are coherent with the EMG, in accordance with the concepts known in neuroscience [63]. Nevertheless, one can also see tremor-coherent MEG activity extended over the right hemisphere, in contradiction to this concept. The insufficiency of the coherence technique can additionally be seen from the fact that the MEG channels overlying sensorimotor and premotor areas are coherent with practically all other MEG channels. Comparing Fig. 24 with the results of the synchronization analysis (Fig. 13), we conclude that the latter technique allows a better localization of the tremor-related brain activity.
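The point that coherence does not imply synchronization can be reproduced with a small numerical experiment in the spirit of the mixing scheme above. The sketch below uses two independent, slightly detuned noisy sine oscillators as stand-ins for the uncoupled Rössler systems of the text (whose equations and parameters are not restated here); all numerical values are illustrative.

```python
import numpy as np
from scipy.signal import coherence, hilbert

fs = 100.0
t = np.arange(0, 600, 1 / fs)
rng = np.random.default_rng(0)

# Two independent noisy oscillators (stand-ins for the uncoupled oscillators).
x1 = np.sin(2 * np.pi * 1.00 * t) + 0.1 * rng.standard_normal(t.size)
x2 = np.sin(2 * np.pi * 1.02 * t) + 0.1 * rng.standard_normal(t.size)

mu = 0.3                                 # mixing parameter (assumed value)
u = (1 - mu) * x1 + mu * x2              # "sensor" signals: linear mixtures
w = mu * x1 + (1 - mu) * x2

f, C = coherence(u, w, fs=fs, nperseg=4096)
print("peak coherence:", C.max())        # large, although x1 and x2 are uncoupled

phi_u = np.unwrap(np.angle(hilbert(u)))
phi_w = np.unwrap(np.angle(hilbert(w)))
# The relative phase drifts without bound (detuning of 0.02 Hz): no phase locking.
print("phase difference drift [rad]:", phi_u[-1] - phi_w[-1])
```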
Acknowledgements
We are very grateful to the co-authors of the original experimental publications whose results have been used for this review article: H.-H. Abel, G. Firsov, H.-J. Freund, R. Kuuz, B. Pompe, A. Schnitzler, J. Volkmann, and J. Weule. We acknowledge fruitful discussions with V.S. Anishchenko, P. Grassberger, H. Kantz, F. Moss, A. Neiman, U. Parlitz, J. Timmer, and M. Zaks.
Appendix A Instantaneous phase and frequency of a signal
A.1. Analytic signal and the HT
A consistent way to define the phase of an arbitrary signal is known in signal processing as the analytic signal concept [77,40,78,79]. This general approach, based on the HT and originally introduced by Gabor in 1946 [52], unambiguously gives the instantaneous phase and amplitude for a signal s(t) via construction of the analytic signal ζ(t), which is a complex function of time defined as
ζ(t) = s(t) + i s_H(t) = A(t) e^{iφ(t)},    (A.1)
where the function s_H(t) is the HT of s(t),

s_H(t) = π^{-1} P.V. ∫_{-∞}^{+∞} s(τ)/(t − τ) dτ,    (A.2)
and P.V. means that the integral is taken in the sense of the Cauchy principal value. The instantaneous amplitude A(t) and the instantaneous phase φ(t) of the signal s(t) are thus uniquely defined from (A.1). We note that the HT is parameter-free. As one can see from (A.2), the HT can be considered as the convolution of the functions s(t) and 1/(πt). Due to the properties of convolution, the Fourier transform S_H(ω) of s_H(t) is the product of the Fourier transforms of s(t) and 1/(πt). For physically relevant frequencies ω > 0, S_H(ω) = −iS(ω). This means that the HT can be realized by an ideal filter whose amplitude response is unity and whose phase response is a constant π/2 lag at all frequencies.
Although formally A(t) and φ(t) can be obtained for an arbitrary s(t), they have a clear physical meaning only if s(t) is a narrow-band signal; see the detailed discussion in [79]. For narrow-band signals the amplitude A(t) coincides with the envelope of s(t), and the instantaneous frequency corresponds to the frequency of the maximum in the instantaneous spectrum. We illustrate the properties of the HT by the following examples.
Example 1: Damped oscillations. Let us take as the measured signals the free oscillations of the linear oscillator

ẍ + 0.05ẋ + x = 0    (A.3)

and of the Duffing oscillator

ẍ + 0.05ẋ + x + x³ = 0,    (A.4)

and calculate from x(t) the instantaneous amplitudes A(t) and frequencies dφ/dt (Fig. 25). The amplitudes, shown as thick lines, are indeed the envelopes of the decaying oscillations. The frequency of the linear oscillator is constant, while the frequency of the Duffing oscillator is amplitude-dependent, as expected.
Fig. 25. Free vibrations x(t) of the linear (a) and nonlinear (Duffing) (c) oscillators. The instantaneous amplitudes A(t) calculated via the HT are shown by thick lines. The corresponding instantaneous frequencies dφ/dt are shown in (b) and (d).

Note that although only about 20 periods of oscillation have been used, the nonlinear properties of the system can easily be seen from the time series, because frequency and amplitude are estimated at every point of the signal. This method is used in mechanical engineering for the identification of the elastic and damping properties of a vibrating system [80]. This example illustrates an important property of the HT: it can be applied to nonstationary data.
Example 2: A chaotic signal. The HT can be considered as a two-dimensional embedding in the coordinates s, s_H. Let us choose as an observable the x coordinate of the Rössler system. The phase portrait of this system in the coordinates x, x_H is shown in Fig. 26; one can see that it is very similar to the "true" portrait of the Rössler oscillator in the coordinates x, y. The instantaneous amplitude and phase are shown in Fig. 27. The phase φ grows practically linearly; nevertheless, small irregular fluctuations of that growth are seen. This agrees with the known fact that the oscillations of the system are chaotic, although the power spectrum of x(t) contains a very sharp peak.
Example 3: Human electrocardiogram. As an example of a complex signal we take a human ECG record (Fig. 28). We see that the point in the (s, s_H)-plane makes two rotations per heartbeat, corresponding to the so-called R- and T-waves, respectively. (Small loops corresponding to the P-waves are not seen at this magnification.) Importantly, the trajectories in the (s, s_H)-plane pass through the origin, and therefore the phase is not always defined. We did not encounter this problem in the previous examples because of the simple structure of the signals there. Indeed, the normal procedure before computing the HT is to subtract the mean value from the signal. Often this ensures that the trajectories go around the origin; we implicitly used this fact before. In order to compute the phase for the cardiogram, we have to shift the origin to a point (s*, s_H*) and compute the phase as
Fig. 26. Phase portrait of the Rössler system in the coordinates x, x_H.
Fig. 27. Solution x(t) of the Rössler system and its instantaneous amplitude A(t) (thick line) (a). The instantaneous phase φ grows practically linearly (b); nevertheless, small irregular fluctuations are seen (c).
φ = arctan[(s_H − s_H*)/(s − s*)].    (A.5)
Of course, in this way we lose the unambiguity in the determination of the phase: now it depends on the choice of the origin. Two reasonable choices are shown in Fig. 28c by two arrows. Obviously, depending on the new origin in this plane, one cardiocycle (the interval between two heartbeats) corresponds to a phase increase of either 2π or 4π. This reflects the fact that our understanding of what constitutes "one oscillation" depends on the particular problem and on our physical intuition.
Fig. 28. Human ECG (a), its HT (b) and the ECG vs. its HT (c). The determination of the phase depends on the choice of the origin in the (s, s_H) plane; two reasonable choices are shown in (c) by two arrows.
A.2. Numerics: hints and know-how
An important advantage of the analytic signal approach is that the instantaneous phase and amplitude can easily be obtained from an experimentally measured scalar time series.
Computing the HT in the frequency domain. The easiest way to compute the HT is to perform the FFT of the original time series, shift the phase of every frequency component by −π/2 and apply the inverse FFT. Zero padding should be used to make the length of the time series suitable for the FFT. To reduce the boundary effects, it is recommended to eliminate at least 10 quasiperiods at the beginning and at the end of the signal. Such a computation with double precision allows one to obtain the HT with a precision of about 1%. (The precision was estimated by computing the variance of s(t) + H²(s(t)), where H² means that the HT was performed twice; theoretically s(t) + H²(s(t)) ≡ 0.)
10 This can be conveniently implemented by swapping the imaginary and real parts of the Fourier coefficients: Re(S_i) → tmp, Im(S_i) → Re(S_i), −tmp → Im(S_i), where tmp is some dummy variable.
Computing the HT in the time domain. Numerically, this can be done via convolution of the experimental data with a pre-computed characteristic of the filter (Hilbert transformer) [40,78,81]. Such filters are implemented, e.g., in the software packages MATLAB [81] and RLAB (public domain, URL: http://www.eskimo.com/~ians/rlab.html). Although the HT requires computation on an infinite time scale, i.e., the Hilbert transformer is an infinite impulse response filter, an acceptable precision of about 1% can be obtained with a 256-point filter characteristic.
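The frequency-domain recipe described above can be sketched as follows in NumPy. Rather than computing s_H alone, the sketch builds the full analytic signal (equivalent in effect to the −π/2 phase shift, or to the real/imaginary swap of footnote 10) and cross-checks it against scipy.signal.hilbert; the test signal is an assumption.

```python
import numpy as np
from scipy.signal import hilbert

def analytic_signal_fft(s):
    """Analytic signal via the frequency domain: zero the negative frequencies,
    double the positive ones, and apply the inverse FFT. For real input this is
    the same construction as scipy.signal.hilbert(s)."""
    n = s.size
    S = np.fft.fft(s)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(S * h)

s = np.cos(2 * np.pi * 0.05 * np.arange(2000))    # test signal
zeta = analytic_signal_fft(s)
A = np.abs(zeta)                                   # instantaneous amplitude A(t)
phi = np.unwrap(np.angle(zeta))                    # instantaneous phase phi(t)
assert np.allclose(zeta, hilbert(s))               # cross-check against SciPy
# As recommended in the text, discard about 10 quasiperiods at each end of
# A(t) and phi(t) to reduce boundary effects.
```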
The sampling rate must be chosen so as to have at least 20 points per average period of oscillation. In the process of computing the convolution, L/2 points are lost at both ends of the time series, where L is the length of the transformer.
Computing and unwrapping the phase. A convenient way to compute the phase is to use the functions DATAN2(SH,S) (FORTRAN) or atan2(sH,s) (C), which give the cyclic phase in the [−π, π] interval. The relative phase, or phase difference, of two signals s₁(t) and s₂(t) can be obtained via the HT as
φ₁(t) − φ₂(t) = arctan[(s_{H,1}(t) s₂(t) − s₁(t) s_{H,2}(t)) / (s₁(t) s₂(t) + s_{H,1}(t) s_{H,2}(t))].    (26)
For the detection of synchronization it is usually necessary to use a phase that is defined not on the circle, but on the whole real line. For this purpose the phase (or relative phase) can be unwrapped by tracing the ±2π jumps in its time course.
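In code, the cyclic relative phase of Eq. (26) and its unwrapping can be obtained directly with atan2 and the standard unwrapping routine; a minimal sketch with assumed variable names:

```python
import numpy as np

def relative_phase(s1, sH1, s2, sH2):
    """Cyclic relative phase of two signals from their Hilbert transforms,
    Eq. (26); the result lies in (-pi, pi]."""
    return np.arctan2(sH1 * s2 - s1 * sH2,
                      s1 * s2 + sH1 * sH2)

def unwrap_phase(phi_cyclic):
    """Unwrap the cyclic (relative) phase onto the real line by removing
    the +/- 2*pi jumps (numpy implements exactly this bookkeeping)."""
    return np.unwrap(phi_cyclic)
```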
Sensitivity to low-frequency trends. We have already discussed that the phase is well defined only if the trajectories in the (s, s_H)-plane always go around the origin. This may be violated if the signal contains low-frequency trends, e.g., due to a drift of the zero level of the measuring equipment. If, as a result, some trajectory misses the origin, this oscillation will not be counted as a cycle, and 2π will be lost in the overall increase of the phase. To illustrate this, we add an artificial trend to the ECG signal; the embedding of this signal in the coordinates s, s_H is shown in Fig. 29, to be compared with the same presentation for the original data in Fig. 28. Obviously, the origin denoted in Fig. 28 by the first arrow would in this case be a wrong choice. To avoid these problems, we recommend always plotting the signal vs. its HT and checking whether the origin is chosen correctly.
Instantaneous frequency. We note that the estimation of the instantaneous frequency f(t) of a signal is rather cumbersome. The direct approach, i.e., numerical differentiation of φ(t), naturally results in very large fluctuations in the estimate of f(t). Moreover, one may encounter f(t) < 0. This happens not only due to the influence of noise, but can also result from a complicated form of the signal. For example, some characteristic patterns in the ECG (e.g., the T-wave) result in negative values of the instantaneous frequency.
Fig. 29. An illustration of the sensitivity of the HT to low-frequency trends.
From a physical point of view, we expect the instantaneous frequency to be a positive function of time that varies slowly with respect to the characteristic period of the oscillations and has the meaning of a number of oscillations per time unit. This is especially important for the problem of synchronization, where we are not interested in the behavior of the phase on time scales smaller than the characteristic oscillation period. There exist several methods to obtain estimates of f(t) in accordance with this viewpoint; for a discussion and comparison see [79,30].
References
1. Hugenii (Huygens), C.H. (1673) Horologium Oscillatorium. Apud F. Muguet, Parisiis, France; English translation: 1986, Iowa State University Press, Ames.
2. Hayashi, C. (1964) Nonlinear Oscillations in Physical Systems. McGraw-Hill, New York.
3. Blekhman, I.I. (1971) Synchronization of Dynamical Systems. Nauka, Moscow (in Russian).
4. Blekhman, I.I. (1981) Synchronization in Science and Technology. Nauka, Moscow (in Russian); English translation: 1988, ASME Press, New York.
5. Landa, P. (1996) Nonlinear Oscillations and Waves in Dynamical Systems. Kluwer Academic Publishers, Dordrecht, Boston, London.
6. Fujisaka, H. and Yamada, T. (1983) Prog. Theor. Phys. 69, 32-47.
7. Pikovsky, A.S. (1984) Z. Physik B 55, 149-154.
8. Pecora, L.M. and Carroll, T.L. (1990) Phys. Rev. Lett. 64, 821-824.
9. Rosenblum, M., Pikovsky, A. and Kurths, J. (1996) Phys. Rev. Lett. 76, 1804-1807.
10. Pikovsky, A., Rosenblum, M., Osipov, G. and Kurths, J. (1997) Physica D 104, 219-238.
11. van der Pol, B. and van der Mark, J. (1928) Philos. Mag. 6, 763-775.
12. Aschoff, J., Daan, S. and Groos, G. (1982) Vertebrate Circadian Systems. Structure and Physiology. Springer, Berlin.
13. Glass, L. and Mackey, M.C. (1988) From Clocks to Chaos: The Rhythms of Life. Princeton University Press, Princeton, NJ.
14. Petrillo, G.A. and Glass, L. (1984) Am. J. Physiol. 246, 311-320.
15. Bramble, D. and Carrier, D. (1983) Science 219, 251-256.
16. Collins, J. and Stewart, I. (1993) J. Nonlinear Sci. 3, 349-392.
17. Sturis, J., Knudsen, C., O'Meara, N.M., Thomsen, J.S., Mosekilde, E., VanCauter, E. and Polonsky, K.S. (1995) CHAOS 5, 193-199.
18. Neiman, A., Ping, X., Russel, D., Wojtenek, W., Wilkens, L., Moss, F., Braun, H., Huber, M. and Voigt, K. (1999) Phys. Rev. Lett. 82, 660-663.
19. Anishchenko, V.S., Balanov, A.G., Janson, N.B., Igosheva, N.B. and Bordyugov, G.V. (2000) Int. J. Bifurc. and Chaos 10, 2339-2348.
20. Koepchen, H. (1991) in: Rhythms in Physiological Systems, pp. 3-20, eds H. Haken and H. Koepchen. Springer Series in Synergetics, Vol. 55. Springer, Berlin, Heidelberg.
21. Stutte, K. and Hildebrandt, G. (1966) Pflügers Archiv 289, R47.
22. Engel, P., Hildebrandt, G. and Scholz, H.-G. (1968) Pflügers Archiv 298, 258-270.
23. Pessenhofer, H. and Kenner, T. (1975) Pflügers Archiv 355, 77-83.
24. Kenner, T., Pessenhofer, H. and Schwaberger, G. (1976) Pflügers Archiv 363, 263-265.
25. Raschke, F. (1987) in: Temporal Disorder in Human Oscillatory Systems, pp. 152-158, eds L. Rensing, U. an der Heiden and M. Mackey. Springer Series in Synergetics, Vol. 36. Springer, Berlin, Heidelberg.
26. Raschke, F. (1991) in: Rhythms in Physiological Systems, pp. 155-164, eds H. Haken and H.P. Koepchen. Springer Series in Synergetics, Vol. 55. Springer, Berlin, Heidelberg.
27. Schiek, M., Drepper, F., Engbert, R., Abel, H.-H. and Suder, K. (1998) in: Nonlinear Analysis of Physiological Data, pp. 191-209, eds H. Kantz, J. Kurths and G. Mayer-Kress. Springer, Berlin.
28. Seidel, H. and Herzel, H. (1998) IEEE Eng. Medicine Biol. 17, 54-57.
29. Schäfer, C., Rosenblum, M.G., Kurths, J. and Abel, H.-H. (1998) Nature 392, 239-240.
30. Schäfer, C., Rosenblum, M., Abel, H.-H. and Kurths, J. (1999) Phys. Rev. E 60, 857-870.
31. Singer, W. and Gray, C. (1995) Annu. Rev. Neurosci. 18, 555-586.
32. Gray, C. and Singer, W. (1987) Soc. Neurosci. 404, 3.
33. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M. and Reitboeck, H. (1988) Biol. Cybern. 60, 121-130.
34. Gray, C. and Singer, W. (1989) Proc. Natl. Acad. Sci. USA 86, 1698-1702.
35. MacKay, W. (1997) Trends Cog. Sci. 1, 176-183.
36. Roelfsema, P., Engel, A., König, P. and Singer, W. (1997) Nature 385, 157-161.
37. Engel, J. and Pedley, T. (1975) Epilepsy: A Comprehensive Textbook. Lippincott-Raven, Philadelphia.
38. Freund, H.-J. (1983) Physiol. Rev. 63, 387-436.
39. Elble, R. and Koller, W. (1990) Tremor. Johns Hopkins University, Baltimore.
40. Rabiner, L. and Gold, B. (1975) Theory and Application of Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ.
41. Rényi, A. (1970) Probability Theory. Akadémiai Kiadó, Budapest.
42. Pompe, B. (1993) J. Stat. Phys. 73, 587-610.
43. Voss, H. and Kurths, J. (1997) Phys. Lett. A 234, 336-344.
44. Schiff, S., So, P., Chang, T., Burke, R. and Sauer, T. (1996) Phys. Rev. E 54, 6708-6724.
45. Pecora, L.M., Carroll, T.L. and Heagy, J.F. (1997) in: Nonlinear Dynamics and Time Series, pp. 49-62, eds C.D. Cutler and D.T. Kaplan. Fields Inst. Communications, Vol. 11. American Mathematical Society, Providence, Rhode Island.
46. Rosenblum, M., Pikovsky, A. and Kurths, J. (1997) IEEE Trans. CAS-I 44, 874-881.
47. Rosenblum, M.G. and Kurths, J. (1998) in: Nonlinear Analysis of Physiological Data, pp. 91-99, eds H. Kantz, J. Kurths and G. Mayer-Kress. Springer, Berlin.
48. Rosenblum, M.G., Firsov, G.I., Kuuz, R. and Pompe, B. (1998) in: Nonlinear Analysis of Physiological Data, pp. 283-306, eds H. Kantz, J. Kurths and G. Mayer-Kress. Springer, Berlin.
49. Tass, P., Rosenblum, M., Weule, J., Kurths, J., Pikovsky, A., Volkmann, J., Schnitzler, A. and Freund, H.-J. (1998) Phys. Rev. Lett. 81, 3291-3294.
50. Ott, E. (1992) Chaos in Dynamical Systems. Cambridge University Press, Cambridge.
51. Stratonovich, R. (1963) Topics in the Theory of Random Noise. Gordon and Breach, New York.
52. Gabor, D. (1946) J. IEE London 93, 429-457.
53. Gurfinkel, V., Kots, Y. and Shik, M. (1965) Regulation of Posture in Humans. Nauka, Moscow.
54. Cernacek, J. (1980) Agressologie 21D, 25-29.
55. Furman, J. (1994) in: Baillière's Clinical Neurology, Vol. 3, pp. 501-513. Baillière Tindall.
56. Lipp, M. and Longridge, N. (1994) J. Otolaryngology 23, 177-183.
57. Otnes, R. and Enochson, L. (1972) Digital Time Series Analysis. Wiley, New York.
58. Tass, P., Kurths, J., Rosenblum, M., Weule, J., Pikovsky, A., Volkmann, J., Schnitzler, A. and Freund, H.-J. (1999) in: Analysis of Neurophysiological Brain Functioning, pp. 252-273, ed C. Uhl. Springer Series in Synergetics. Springer, Berlin.
59. Niedermeyer, E. and da Silva, F.L. (1972) Electroencephalography: Basic Principles, Clinical Applications and Related Fields, 2nd Edn. Urban & Schwarzenberg, Baltimore.
60. Cohen, D. (1972) Science 175, 664-666.
61. Ahonen, A., Hämäläinen, M., Kajola, M., Knuutila, J., Lounasmaa, O., Simola, J., Tesche, C. and Vilkman, V. (1991) IEEE Trans. Magn. 27, 2786-2792.
62. Hämäläinen, M., Hari, R., Ilmoniemi, R., Knuutila, J. and Lounasmaa, O. (1993) Rev. Mod. Phys. 65, 413-497.
63. Volkmann, J., Joliot, M., Mogilner, A., Ioannides, A., Lado, F., Fazzini, E., Ribary, U. and Llinás, R. (1996) Neurology 46, 1359-1370.
64. Saul, J. (1991) in: Rhythms in Physiological Systems, pp. 115-126, eds H. Haken and H. Koepchen. Springer Series in Synergetics, Vol. 55. Springer, Berlin, Heidelberg.
65. Anrep, G., Pascual, W. and Rössler, R. (1936) Proc. Roy. Soc. (London) Ser. B 119, 191-217.
66. Anrep, G., Pascual, W. and Rössler, R. (1936) Proc. Roy. Soc. (London) Ser. B 119, 218-230.
67. Schmidt, R. and Thews, G. (1983) Human Physiology. Springer, New York.
68. Press, W.H., Teukolsky, S.T., Vetterling, W.T. and Flannery, B.P. (1992) Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge.
69. Palus, M. (1997) Phys. Lett. A 227, 301-308.
70. Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. and Farmer, J. (1992) Physica D 58, 77-94.
71. Kurths, J. and Herzel, H. (1987) Physica D 25, 165.
72. Schreiber, T. and Schmitz, A. (1997) Phys. Rev. Lett. 77, 635-638.
73. Landa, P. and Zaikin, A. (1996) Phys. Rev. E 54, 3535-3544.
74. Schack, B. (1999) in: Analysis of Neurophysiological Brain Functioning, pp. 230-251, ed C. Uhl. Springer Series in Synergetics. Springer, Berlin.
75. Miltner, W., Braun, C., Arnold, M., Witte, H. and Taub, E. (1999) Nature 397, 434-436.
76. Kwapien, J., Drozdz, S., Liu, L. and Ioannides, A. (1998) Phys. Rev. E 58, 6359-6367.
77. Panter, P. (1965) Modulation, Noise, and Spectral Analysis. McGraw-Hill, New York.
78. Smith, M. and Mersereau, R. (1992) Introduction to Digital Signal Processing: A Computer Laboratory Textbook. Wiley, New York.
79. Boashash, B. (1992) Proc. IEEE 80, 520-568.
80. Feldman, M. (1994) Mech. Systems Signal Process. 8, 119-127.
81. Little, J. and Shure, L. (1992) Signal Processing Toolbox for Use with MATLAB. User's Guide. Mathworks, Natick, MA.
CHAPTER 10
Statistical Analysis and Modeling of Calcium Waves in Healthy and Pathological Astrocyte Syncytia

P. JUNG
Department of Physics and Astronomy and Program for Neuroscience, Ohio University, Athens, OH 45701, USA

A.H. CORNELL-BELL and M. DREHER
Viatech Imaging/Cognetix, 58 Main Street, Ivoryton, CT 06442, USA

A. DEGRAUW and R. STRAWSBURG
Division of Neurology, Children's Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, USA

V. TRINKAUS-RANDALL
Department of Ophthalmology, Boston University School of Medicine, 80 E Concord St., Boston, MA 02118, USA
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents
1. Introduction and background 325
2. Intracellular and intercellular calcium waves 328
3. Calcium waves and nonlinear waves on a stochastic support 332
4. Space-time cluster analysis 337
5. Calcium waves analysis 342
5.1. Healthy rat hippocampal astrocytes 342
5.2. Cultures of human nervous tissue 343
6. Conclusions 350
Acknowledgements 350
References 350
1. Introduction and background

In recent years the theory of spatially extended excitable systems in contact with a fluctuating environment has revealed remarkable new results that may have important implications for the understanding of cellular communication in cell cultures and brain slices [1]. Spatially extended excitable or oscillatory systems are known as pattern-forming systems [2]. In two dimensions, they produce rotating spiral waves, target waves [3] and chaotic waves [4]. Calcium waves of this nature have been observed in oocytes [5], in slices of hippocampal tissue [6] and in cultures of glial cells [7,8]. Waves in slices and cultures are recorded by loading the cells with calcium-sensitive dyes.
The exact function of these calcium waves in brain tissue (intracellular and intercellular) is not known. Possibly the waves enable communication within a cell or between cells, and they may facilitate a collective response to a stimulus or even synchronization between cells. The message encoded in the calcium waves, however, remains unknown. There is controversy about the nature and the mechanism of these waves, and most likely the mechanism varies from cell type to cell type. It is, however, certain that besides the release of calcium from internal reservoirs, many channels and exchangers are involved in this process. Release of additional neurotransmitter, including ATP, may occur from astrocytes that have previously been stimulated to produce Ca²⁺ waves [9].
Our first hypothesis is that calcium waves can be viewed as a fingerprint of the underlying complex physiological processes that create these waves and of their chemical environment. In other words, the intercellular calcium waves probe the ionic environment of the cell culture, and their macroscopic features give clues about it. One of the remarkable features of the intercellular calcium waves is their large amount of "fuzziness". This feature and their apparently random generation in cultures of rat hippocampal astrocytes motivate the second hypothesis, that noise plays a role in the generation as well as the propagation of intercellular calcium waves.
The role of fluctuations in the generation and propagation of patterns in spatially extended excitable media has been studied quite recently [1,10]. The role of fluctuations depends on the excitability of the cells. For high excitability, fluctuations will primarily generate random target waves that travel unbounded through the excitable medium. Such a situation, often encountered in unstirred chemical reactions where the wave is triggered by an inhomogeneity, is unrealistic for intercellular calcium waves. Intercellular calcium waves in airway epithelial cells, e.g., have been observed to decay after a lifetime of about a minute [11]. Similarly, calcium waves in the rat astrocyte syncytium have finite lifetimes of up to 10 min [12]. In view of the potential role of calcium to organize the response to local stimulation or even
synchronize the response, it is clear that long-distance propagation of the waves is not desirable in healthy tissue. Such considerations, in connection with recent results in the theory of excitable media coupled to a noisy environment [13], suggested the hypothesis that the excitability of the tissue that gives rise to intercellular propagation of calcium waves should be low. This is the domain where noise strongly dominates the features of the waves. A recent comparison of the qualitative features of intercellular calcium waves in rat astrocytes indicated that the proper regime might be the sub-threshold regime. In the mathematical model, below a critical threshold of excitability, initially stimulated cells cannot give rise to a propagating wave front [1]. Nevertheless, in the presence of fluctuations, an excitable system even below the critical threshold of excitability forms local patterns with finite lifetimes and thus a finite radius of spread. These patterns, impossible in the absence of fluctuations, are fueled by the fluctuations and therefore have stochastic features. In spite of the random fueling of the system, the patterns have spatiotemporal coherence.
Phenomena like this, where fluctuations play a constructive role in a dynamic process, have been studied extensively in the physical and mathematical literature over the last decade under the term stochastic resonance [14]. Stochastic resonance has been demonstrated in living systems such as sensory hair cells [15,16] and recently also in mammalian neuronal networks [17]. It describes the enhancement of a weak periodic signal by adding an extra dose of noise. For sensory neurons it has been shown that the information encoding of a weak signal in the form of a spike train can actually be enhanced by an extra amount of noise [15,16]. Most remarkably, there is a resonant value of the strength of the fluctuations at which this enhancement is maximized, hence the term stochastic resonance. Although the mathematical details of this effect are now well understood, the major question remains whether nature has learned to take advantage of this effect by tuning its parameters (such as the excitability of a neuron) to a level where fluctuations maximize the responsiveness and the performance of neurons.
In the context of intercellular calcium waves, the suggested hypothesis is that environmental fluctuations fuel the processes that drive the calcium waves; thus they trigger intercellular communication, which facilitates a collective response of the tissue. In this picture, the intercellular calcium waves are the fingerprints of a state of communication between cells in the tissue, similar to earthquakes being the fingerprints of the buildup of stress between tectonic plates. Some pieces of evidence for this point of view are provided by a recent quantitative analysis of excitable waves in a noisy environment and of intercellular calcium waves. A space-time cluster decomposition technique [13] that decomposes recorded calcium waves (or any other wave pattern) into coherent space-time clusters has revealed power-law scaling of the distribution of the cluster sizes of calcium waves [12,18].
Ca²⁺ signals captured in time-lapse videos are exquisitely complex. Videos reveal intracellular Ca²⁺ oscillations, waves of Ca²⁺ traveling within the confines of a cell, and slower waves which travel intercellularly over long distances.
Activation of different neurotransmitter receptors produces only oscillations (metabotropic glutamate receptor), a mixture of oscillations and wave patterns (glutamate receptor) or
only waves (ionotropic glutamate receptors). If ionotropic receptors, such as kainate receptors, are activated at the optimal concentration of neurotransmitter, long-lasting waves can be seen traveling over as much as 100-200 cell lengths within a culture.
To date there has been no successful measure of communication between cells in the astrocyte syncytium. Several imaging studies measured the fluorescence intensity for every cell in a field throughout time-lapse records [19,20], with little success at determining whether the oscillation rate of one cell influenced a neighbor. The passage of a wave through a group of cells appeared to influence the oscillation rate of other astrocytes in the field; however, a means of measuring how great this influence was did not exist. Traditionally, authors reported the findings for one or two cells in a field and then verbally described what happened to the majority of the other cells. With the space-time cluster decomposition, cellular Ca²⁺ wave signaling can be quantified with respect to the strength of these waves, both in terms of their spatial and their temporal components.
Physiological properties of glial cells cultured from resected medial temporal lobe epilepsy patients retain a hyperexcitable response to neurotransmitters [21]. Hyperexcitable glutamate-induced Ca²⁺ responses were produced by cells derived from tissue regions that exhibited hyperexcitable EEGs in the operating room. Astrocytes from epileptic loci exhibited altered physiology, such as increased coupling [22], as well as altered morphology and changed channel expression [23]. The underlying causes of seizure development remain unclear. Excitatory neurotransmitters such as glutamate, aspartate and glycine and the inhibitory neurotransmitter gamma-aminobutyric acid (GABA) have been implicated in the pathophysiology of seizures [24]. A number of studies in animal models and in humans have shown that the glutamate concentration is increased extracellularly and GABA may be decreased in the epileptic area during and after seizures (see [25] for a review). The involvement of glial cells in epilepsy remains an uncharted area. It is possible that the astroglial glutamate transporters are impaired in epileptic foci [26]. In another study these authors showed that astrocytes from epileptic foci have a much higher density of Na⁺ channels and, as a consequence, are capable of generating much larger currents than astrocytes from normal tissue [27]. Additionally, there is a dramatic alteration in the K⁺ channel properties, which may also indicate destruction of the potassium spatial buffering that normally protects tissue from an excessive buildup of K⁺ in the extracellular space [23]. Genetic studies of familial epilepsies, on the other hand, have shown mutations in ion channel genes. A newly described epilepsy, generalized epilepsy with febrile seizures plus [28], was linked to the beta1 auxiliary subunit of the voltage-gated sodium channel (SCN1B) on chromosome 19 [29]. "Benign neonatal familial convulsions" (BNFC) [30] have been linked to chromosomes 8 and 20. These two genes encode highly homologous potassium channel subunits, KCNQ2 and KCNQ3 [31,32]. It is fair to say that epilepsy derives from multiple causes, yet shared properties of excitotoxicity and cellular excitability are manifested in all of these conditions. In this paper, we demonstrate that calcium signaling in cultures of epileptic astrocytes differs significantly from that in cultures of healthy astrocytes.
This difference can be quantified using the space-time cluster decomposition method.
2. Intracellular and intercellular calcium waves
Calcium plays an important role in many cellular functions. It is essential for muscle contraction, cardiac electrophysiology and long-term potentiation in neurons, to name only a few. The typical concentration of calcium in a cell is 0.1 μM. Since higher concentrations are toxic if maintained for a longer time, the cell has several mechanisms to remove calcium from the cytoplasm. Calcium can be removed from the cytoplasm by active pumps or by concentrating it in internal, membrane-bound compartments (internal stores). Calcium can enter the cytoplasm in two different ways: from extracellular space through calcium channels, and from internal stores through IP3 binding to IP3 receptors on the membrane of internal calcium stores. Binding of extracellular neurotransmitter (such as glutamate) can cause the intracellular release of IP3. The mechanisms outlined above are the basis for the calcium oscillations observed in many (voltage-clamped) cells (e.g. astrocytes) as a response to neurotransmitters [33,34]. Binding of neurotransmitter to a receptor causes the release of IP3 into the cytoplasm. IP3 binds to the membrane of an internal calcium store and calcium is released into the cytoplasm. Calcium releases more calcium from internal stores through calcium-induced calcium release. This second mechanism describes an instability leading to a quick increase of calcium in the cytoplasm. The calcium is removed from the cytoplasm by active pumps at a rate proportional to the concentration of calcium in the cytoplasm, thus quickly relieving the high intracellular calcium concentration. Assuming that the intracellular IP3 concentration remains constant, the cycle can begin again, leading to stationary oscillations of the concentration of intracellular calcium. The oscillations of the calcium concentration often do not occur uniformly across the cell, but rather are organized in the form of intracellular waves [35]. The speed of these intracellular calcium waves is of the order of several micrometers per second and thus very slow in comparison to neuronal signal propagation. Intercellular calcium waves propagating from cell to cell have been observed in hippocampal brain slices [6] and glial cell cultures [7,8]. As mentioned already in the introduction, there is controversy about the mechanism of intercellular calcium waves. There is probably no universal mechanism for all cell types and thus each case has to be studied separately. Sanderson et al. [36] report that intercellular calcium waves spread through a culture of airway epithelial cells by slowing down when they cross cell membranes and subsequently speeding up as they propagate as intracellular calcium waves. Such a slowing down has not been reported for intercellular calcium waves in cultures of glial cells produced by glutamate exposure [7]. It has been suggested that the sodium-calcium exchanger plays a key role for kainate-induced intercellular calcium waves [37]. Also, the appearance of the intercellular calcium waves varies from culture to culture, indicating different mechanisms. Moreover, the relation of intracellular calcium oscillations and intercellular calcium waves is poorly understood at present. A common model for intercellular calcium waves has been put forward by Boitano et al. [38]. According to this model, IP3 released in one cell by agonist binding diffuses through the cell culture via the gap junctions.
The IP3 binds to the internal calcium stores and calcium is released into the cytoplasm, fueling the propagation of the calcium wave via calcium-induced calcium release from internal stores. That this cannot be the only mechanism for intercellular calcium waves has been shown by a recent study where the calcium wave propagates from cell to cell even when the gap junctions are inhibited. An additional, hypothetical mechanism for kainate-induced intercellular calcium wave propagation can be proposed. This mechanism derives from the observations that inhibiting the Na+/Ca2+ exchanger slows down the intracellular oscillations of Ca2+ in response to neurotransmitter [37] and totally eliminates kainate-induced intercellular spiral waves (Fig. 1). Extensive neuronal activity results in an increase in stimulus-evoked release of neurotransmitter within a region, which further produces increased astrocytic responses to extracellular neurotransmitter. Increased excitability of the ionotropic glutamate receptors results in an increased inward flux of Na+ associated with receptor activation [39-41]. This increased receptor activity will produce a rapid increase in intracellular Na+, which will elicit regulation by the Na+/Ca2+ exchanger in the following manner. Larger reductions in the extracellular Na+ electrochemical gradient will promote a substantial net gain of Ca2+ that will not appear as free Ca2+ until the Ca2+ buffering systems are overwhelmed [42]. Reduction of Na+ results in a rapid reversal of the Na+/Ca2+ exchanger with a large increase in cytoplasmic Ca2+. The Na+/Ca2+ exchanger at the plasma membrane affects the magnitude of Ca2+ release from IP3-mediated and Ca2+ stores in the endoplasmic reticulum [43]. Heterogeneous stores are activated by spatially and temporally distinct Ca2+ signals, which control individual Ca2+-dependent processes.
[Fig. 1, panels: (A) Kainate wave in complete saline; (B) Kainate in benzamil (100 μM). Axes: fluorescence vs. time (s).]
Fig. 1. Inhibition of the Na+/Ca2+ exchanger with benzamil abolishes spiral Ca2+ waves. When kainate (100 μM) is bath applied (arrowheads) to a field of astrocytes loaded with a Ca2+ dye, Fluo-3AM, a gradual increase in intracellular Ca2+ occurs which can take 50-100 s to reach a peak. Long-lasting intercellular waves of Ca2+ propagate throughout the culture after the initial peak is reached. These Ca2+ elevations are seen as the broad oscillations in Fig. 1A. If the Na+/Ca2+ exchanger is inhibited with the addition of benzamil in the perfusion system, no long-lasting spiral waves are seen upon the addition of kainate (arrowheads).
When Ca2+ homeostasis within the astrocyte is interrupted with thapsigargin (see Fig. 2), kainate addition produces an initial response; however, propagation of the kainate spiral waves is totally inhibited. These studies suggest that the regulated uptake and release of Ca2+ from ER stores contributes to lowering the barrier to wave propagation. Interestingly, IP3-sensitive stores that were stimulated by binding of the metabotropic glutamate receptor (i.e. by t-ACPD) and inhibited by MCPG (Fig. 3) appeared to have no interaction with the propagation of the kainate spiral wave, implying that these IP3-mediated stores may be under local regulation of Ca2+ homeostasis that is distinct from the stores affected by thapsigargin [43]. Propagation of the kainate spiral waves most likely requires contributions from multiple Ca2+ regulatory sites. Bath perfusion of astrocytes with kainate, or excessive release of neurotransmitter from epileptic astrocytes, would send entire regions of the astrocyte surface into a state of imbalance. Excessive stimulation of ionotropic receptors results in reversal of the Na+/Ca2+ exchanger, dramatic changes in Ca2+ spatial buffering and propagation of this Ca2+ imbalance in the form of a spiral wave across the surface of the astrocyte. Release of additional neurotransmitters, including ATP [9], would increase this state of imbalance even further, hence reducing the barrier to propagation even more (see Fig. 3).
Fig. 2. Thapsigargin inhibition of Ca2+ re-uptake into IP3-mediated stores prevents propagation of the kainate spiral waves but does not inhibit the initial intracellular Ca2+ increase following kainate. Thapsigargin (T) added to Fluo-3AM loaded astrocytes results in an increase of Ca2+ from IP3-mediated stores and prevention of re-uptake of the cytoplasmic Ca2+ load back into the endoplasmic reticulum. Eventually there is clearance of the cytoplasmic Ca2+ to a level below the beginning baseline. Once kainate is added to the system (325 s) there is a fairly steep increase in Ca2+ levels, but there is no significant intercellular spiral wave production. Instead, the cells exhibit very short-lived oscillations (1-15 s) on top of the longer-lasting, nearly 200 s, cycle of Ca2+. These data imply that the normal regulation of this IP3-sensitive store is needed to support propagation of the spiral waves.
[Fig. 3, panels: (A) MCPG (500 μM) and KA (100 μM); (B) MCPG (500 μM) and t-ACPD (100 μM). Axes: fluorescence vs. time (s).]
Fig. 3. Methyl-4-carboxyphenylglycine (MCPG), an inhibitor of the IP3-mediated metabotropic glutamate receptor, has no effect on the initiation or propagation of the kainate-induced spiral waves. In contrast, MCPG completely inhibits the t-ACPD response, which specifically activates the metabotropic glutamate receptor. Inhibiting the metabotropic glutamate receptor contribution to the IP3-mediated Ca2+ stores appears to have no effect on the ionotropically induced kainate waves. When kainate is placed on astrocytes that have previously been treated with MCPG (Fig. 3A), the expected gradual increase in Ca2+ occurs over approximately 50 s, followed by long-lasting Ca2+ increases. In the time-lapse video generated from these cells, spiral waves form and then propagate in a normal manner throughout the culture. MCPG is an effective inhibitor of t-ACPD, which is the specific metabotropic glutamate receptor agonist (Fig. 3B).
3. Calcium waves and nonlinear waves on a stochastic support

The model for intercellular calcium waves by Boitano et al. [38] assumes that the propagation of calcium waves is facilitated by diffusing IP3. Mathematically, this is described by a diffusion equation with an absorbing term that takes into account leakage of IP3 into other processes. However, only for a large number of Brownian walkers (the IP3 molecules) will the actual concentration of IP3 follow the radially symmetric solution of the isotropic diffusion equation. For a finite number of walkers, internal fluctuations can become important, providing a stochastic support for the calcium waves. This could be one of the reasons why the calcium waves in cultures of airway epithelial cells [36] and glial cells [12] appear noisy. Astrocytic calcium waves are revealed using the calcium-sensitive fluoroprobe Fluo-3 and time-lapse imaging using a laser confocal microscope [7]. With adequate video equipment, space- and time-resolved information on the shape and propagation of calcium waves can be extracted in the form of time-lapse frames. In Fig. 4, a sequence of difference frames between consecutive frames of imaged calcium activity in an astrocyte syncytium exposed to a low dose of kainate (50 μM) is shown. The gray scale codes the intensity of calcium activity (i.e. more calcium means darker gray). Time advances from the upper left to the lower right (8 s between the frames). The sequence of difference pictures between consecutive frames shows a revolving core of a rotating spiral wave. Characteristic features of the calcium waves are their wrinkly shapes, their apparently stochastic occurrence at apparently random sites, and their disappearance after finite, statistically distributed lifetimes.
Fig. 4. Difference frames of time-lapse images are shown. Time proceeds from the upper left to the lower right image. The gray scale codes the intensity of calcium activity. The frames show small-scale spontaneous calcium activity as well as one large-scale (spiral-core type) wave. These pictures already indicate wave activity on many time and length scales, resulting in the power-law behaviour mentioned in the text below.
In contrast to pointwise stimulation with neurotransmitter, neurotransmitter has here been perfused over the entire cell culture, mimicking neuronal activity over the entire system. Thus, within the theory of IP3-mediated calcium waves, the concentration profile of IP3 is non-Gaussian (it would be Gaussian for a pointwise stimulation) and therefore provides a fluctuating environment for the propagation of intercellular calcium waves. Given the complexity of the cellular processes connected with the calcium waves, our approach to model these waves is to take a simple excitable model that generates nonlinear waves and modify it such that those waves propagate on a stochastic support, i.e. the wave can only propagate if enough Brownian-walker IP3 molecules are present. An example of an excitable system is the FitzHugh-Nagumo model (FitzHugh, 1961) for the generation of neuronal action potentials, a dynamical system with two degrees of freedom,

\[ \epsilon\,\dot{v} = v(a - v)(v - 1) - w, \qquad \dot{w} = v - d\,w - b. \tag{1} \]
The variable v describes the membrane voltage, while w describes the slow (for ε ≪ 1) recovery process. The nullclines, i.e. the curves where v̇ and ẇ vanish, are shown in Fig. 5 for b = 0.0, a = 0.5 and d = 1. The stationary states are given by the intersection of the nullclines. The two nullclines intersect once, at v = 0, w = 0 (S), where the intersection point represents a stable node. A small perturbation, which does not bring the system beyond P, leads to a relaxation back towards the stationary state S. If a perturbation is large (beyond P), the system switches to the excited state T1 (it fires) and returns to the stationary state S only after a large excursion (P → T1 → T2 → S). During the time interval when the system approaches the stationary state, it is in the recovery state.
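As a concrete illustration, the following minimal sketch integrates Eq. (1), as reconstructed above, with the parameter values quoted for Fig. 5 (a = 0.5, b = 0, d = 1). The time-scale separation ε, the Euler step and the perturbation amplitudes are illustrative choices, not values given in the text.

```python
import numpy as np

# Minimal sketch of the FitzHugh-Nagumo dynamics of Eq. (1):
#     eps * dv/dt = v (a - v)(v - 1) - w
#           dw/dt = v - d*w - b
# Parameters follow Fig. 5 (a = 0.5, b = 0, d = 1); eps, dt and the
# perturbation sizes are illustrative assumptions.

a, b, d, eps = 0.5, 0.0, 1.0, 0.01

def f(v, w):                 # f(v, w) = 0 is the cubic nullcline of Fig. 5
    return v * (a - v) * (v - 1.0) - w

def g(v, w):                 # g(v, w) = 0 is the straight nullcline of Fig. 5
    return v - d * w - b

def integrate(v0, w0, dt=1e-4, steps=200_000):
    """Forward-Euler integration of Eq. (1) from a perturbed initial state."""
    v, w = v0, w0
    v_max = v
    for _ in range(steps):
        v += dt / eps * f(v, w)
        w += dt * g(v, w)
        v_max = max(v_max, v)
    return v_max

# A perturbation below the threshold P relaxes back to the stable node S;
# one beyond P triggers the large excursion P -> T1 -> T2 -> S.
print("max v, sub-threshold kick (v0 = 0.2):   %.2f" % integrate(0.2, 0.0))
print("max v, supra-threshold kick (v0 = 0.7): %.2f" % integrate(0.7, 0.0))
```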
Fig. 5. The nullclines of Eq. (1) are shown for b = 0, a = 0.5 and d = 1. The intersection (S) of the nullclines at v = 0, w = 0 represents a stable node. The arrows indicate the course of a large excursion if the threshold of excitation (P) is crossed.
Based on the qualitative features of excitable systems, one can construct a simplified version of excitable dynamics modeled by a three-state system. The three states are the quiescent state Q (corresponding to the stable fixed point S), the excited state E (corresponding in Fig. 5 to the escape from P to T1) and the recovery state R (corresponding to the state of slowly moving back to S). The state of the system is determined by an input variable v(t). Whenever the variable v(t) crosses a threshold (corresponding to P in Fig. 5), the system switches from the quiescent to the excited state. After some recovery time, the system eventually returns back into the quiescent state. Such a simplified version of excitable dynamics becomes useful for modeling excitable media, since it greatly reduces computational demands by allowing large time steps in comparison to the continuous models. A discrete version of an excitable medium can be modeled by a square array of excitable three-state cells e_ij at the positions x_ij = iax̂ + jaŷ, with the unit vectors x̂ and ŷ in the x and y directions, respectively, and the lattice spacing a [1]. Approaches like this have also been used for modeling the visual cortex [44,45]. The quiescent state corresponds to an astrocyte with a normal calcium concentration, i.e. no calcium wave. The excited state corresponds to an astrocyte that contains enough IP3 that a wave can propagate. Within the model of calcium waves involving the sodium-calcium exchanger, the excited state corresponds to the state of sodium-calcium exchangers in reverse. The excitable dynamics is driven by the fluctuating concentration of IP3, v_ij, modeled by the Langevin equations

\[ \dot{v}_{ij} = -\gamma\, v_{ij} + \xi_{ij}(t), \tag{2} \]

with Gaussian, white noise, i.e.

\[ \langle \xi_{ij}(t)\, \xi_{kl}(t') \rangle = 2\sigma^2 \gamma\, \delta(t - t')\, \delta_{(ij),(kl)}. \tag{3} \]
The threshold for excitation for each cell is denoted by b. The constant γ describes the rate at which fluctuations decay and σ² denotes the spatially homogeneous variance of the fluctuations. Integration of (2) over the finite time step Δt yields the exact map

\[ v_{ij}(t + \Delta t) = v_{ij}(t)\, \exp(-\gamma \Delta t) + R_{ij}, \tag{4} \]
where R_ij are Gaussian-distributed random numbers with variance σ²(1 − exp(−2γΔt)). When the cells (kl) are excited, they interact with their neighbors within a finite neighborhood by pulse interaction: all excited cells (kl) send out signals to the cells in a neighborhood that increase their concentrations v_ij and thus increase their chances to become excited. The signaling amplitude between the cells depends exponentially on the distance r_(ij)(kl) between the cells. Integrated over the time interval Δt, the total change of v_ij is given by
\[ v_{ij}(t + \Delta t) = v_{ij}(t)\, \exp(-\gamma \Delta t) + R_{ij} + \sum_{kl} G_{ij,kl}\, \rho_{kl}(t), \tag{5} \]
with the Green's function

\[ G_{ij,kl} = K \exp\!\left( - \frac{\lambda\, r^2_{(ij)(kl)}}{a^2} \right). \tag{6} \]
Here λ describes the interaction range and K a coupling constant. The indicator function ρ_kl(t) is unity at sites where the cells are excited at time t and zero elsewhere, i.e.

\[ \rho_{kl}(t) = \Theta\big(v_{kl}(t) - b\big). \tag{7} \]
Each cell undergoes a recovery period after the excitatory phase. The proper scaling of this model is given by v_ij → v_ij/b, t → γt, σ² → σ²/b², γ → γΔt, K → K/b. The value of the threshold is thus normalized to unity. It has been demonstrated [46] that for large coupling K this model shows (in the absence of noise) the typical excitation patterns of excitable media, i.e. rotating spiral waves or target waves, usually described in terms of reaction-diffusion equations with two species. In the presence of noise, the typical excitation patterns can still be observed, but they exhibit rough and wrinkly wave fronts and - depending on the noise level - more serious imperfections such as break-up of wave fronts and collisions with noise-nucleated waves. The overall picture in the large-coupling regime is the coexistence of multiple finite-size cells with coherent patterns. In Fig. 6, a sequence of snapshots is shown for K = 0.14, λ = 0.1, σ² = 0.016. Initially a strip of cells was excited. On one side of the strip the cells were refractory. Such an initial condition gives rise to a rotating spiral wave. The noise, however, makes the wave fronts wrinkly and nucleates waves at other sites that collide with the spiral wave. On a long time scale the spiral wave is organized and dominates the array. For weak coupling K, however, different phenomena can be observed.
Fig. 6. Snapshots of evolving waves obtained from the model for K = 0.14, λ = 0.1, σ² = 0.1 and γ = 10. The array has a size of 101 × 101 cells. The snapshots are taken at multiples of the time step Δt, i.e. at t = 0, t = 10Δt, t = 20Δt, ... Initially (t = 0), a strip of excited cells (black) has a refractory layer (gray) attached. In all snapshots, the black dots denote firing cells and the gray dots refractory cells.
To maintain a firing pattern, the coupling K has to exceed a critical value K0, which is approximated for small λ by [13]

\[ K_0(\lambda) \approx \exp(-\lambda) + \exp(\lambda) \sum_{n=2}^{\infty} \exp(-\lambda n^2 - n\lambda). \tag{8} \]
In the subexcitable regime, where K < K0, waves cannot be maintained in the absence of fluctuations. For K < K0, one can observe a novel kind of pattern that is best described as a hybrid of avalanches and nonlinear waves. Initially each cell of the array is in the quiescent state and only fluctuations drive the system. The cells first become excited in an unorganized fashion. At later times (and this time depends on the initial conditions) a remarkable reorganization of the firing events sets in across the array, which manifests itself in the formation of spatiotemporally synchronized activity. A sequence of snapshots of an array of size 101 × 101 is shown in Fig. 7 for γ = 1 and K = 0.08 < K0 ≈ 0.09. The upper left panel of Fig. 7 shows the initial condition we have been using, i.e. the all-quiescent array. In the lower left panel of Fig. 7, one can observe the spontaneous formation of a curved wave front, which resembles the core of a spiral wave. The waves appear spontaneously. They collide with each other to form a network of wave fronts, or disappear spontaneously. Of importance is the observation that it takes some transient time before the unorganized random activity is replaced by the organized spatiotemporal behavior.
Fig. 7. Snapshots of thermal waves are shown in chronological sequence (from upper left to lower right). The size of the array is 401 × 401. The parameters are K = 0.08 < K0 ≈ 0.09, λ = 0.1 and σ² = 0.16. The first frame (t = 0) shows the initial condition. In the subsequent frames snapshots of the array at times t = 100Δt, t = 200Δt, t = 300Δt, t = 400Δt and t = 500Δt are shown. Black dots correspond to cells in the excited state. Gray dots indicate refractory cells.
During this transient time, a critical background of v_ij(t) is building up. This is demonstrated in Fig. 8, where we show the concentration v_ij(t) corresponding to the snapshots in Fig. 7, encoded in gray scale. For the thermal waves to appear, a certain critical gray-scale distribution has to be reached. As a matter of fact, the first thermal waves appear where the concentration v is the largest. If compared to the calcium waves, this would correspond to a buildup of IP3 all over the culture. In the glial cell cultures this is done externally by perfusing neurotransmitter. These waves have been coined thermal waves [13] since they are sustained only by a thermal environment. They occur when the variance of the noise exceeds a critical level depending on the dissipation rate γ. In Fig. 9, a comparison between subsequent snapshots of calcium waves and a selected sequence of thermal waves is shown.
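Building on the lattice sketch given above, a rough scan over the noise variance illustrates this regime numerically: with the subexcitable coupling K = 0.08 < K0 and an all-quiescent initial condition, activity is sustained only once σ² is large enough. The scanned values and the run length are illustrative choices, and the sketch relies on the functions and globals (make_kernel, step, N, tau_e) defined earlier.

```python
import numpy as np

# Noise-variance scan in the subexcitable regime, using make_kernel() and
# step() from the lattice sketch above.  Scan values and duration are
# illustrative assumptions.

G_sub = make_kernel(0.08)
for s2 in (0.02, 0.08, 0.16, 0.32):
    v = np.zeros((N, N))
    state = np.zeros((N, N), dtype=int)      # all cells quiescent, as in Fig. 7
    for _ in range(300):
        v, state = step(v, state, G_sub, sigma2=s2)
    n_excited = int(((state >= 1) & (state <= tau_e)).sum())
    print(f"sigma^2 = {s2:.2f}: {n_excited} excited cells after 300 steps")
```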
4. Space-time cluster analysis

Spatiotemporal patterns with noise, especially the thermal patterns discussed in the previous section, cannot easily be classified as a spiral wave or a target wave. These waves have a statistical component that calls for a statistical method in describing them.
Fig. 8. The background activities v_ij corresponding to the snapshots of the thermal waves in Fig. 7 are shown in chronological sequence. The parameters are the same as in Fig. 7, and the snapshots correspond directly. The gray scale encodes the background activity in the following manner: at the threshold, the gray level is set to 225, corresponding to black; for negative background v, the gray level is set to 0, corresponding to white; the background v between zero and threshold (i.e. unity) is transformed to a gray scale according to 225·v⁴. The power of four emphasizes close-to-threshold activity.
Fig. 9. Snapshots of evolving patterns produced by the numerical simulation (upper panel) are compared with snapshots of calcium activity in a culture of hippocampal astrocytes of rat (lower panel). The simulations are performed with an array of 101 cells with parameter values K = 0.08, γ = 1, λ = 0.1 and σ² = 0.16. The snapshots of the simulations have been taken every 8 time steps. The snapshots of the culture have been taken every 10 s.

Comparison between theory and experiment, for the very same reason, has to be done based on a statistical method. The development of a statistical description of patterns is an active field of research, particularly in the context of spatiotemporal chaos [47,48]. We utilize an event-based method [13,18] that decomposes the spatiotemporal evolution into space-time clusters. The method works as follows: first, we stack a temporal sequence of Nt snapshots of the array, taken at times t_n = nΔt, to obtain a large space-time cube, which carries all the spatiotemporal information within the time interval NtΔt. In the second step, we draw a small cube around each excited cell with a spatial side length ds and a temporal side length dt. The temporal and spatial side lengths can be varied to analyze the system on different length and time scales. In this paper, we use for the temporal side length dt the elementary time step Δt and for the spatial side length ds the lattice constant a, i.e. only the nearest neighbors produce overlapping small cubes. Overlapping small cubes in the time-forward direction form objects which we have termed coherent space-time clusters. Collisions of waves need to be treated with a set of rules:
• When two waves merge, they can propagate after the collision as one wave. In this case, the younger cluster is terminated at the merging point and the outgoing wave is counted to the cluster of the older incoming wave. This rule separates off small wave fragments that merge into a larger wave.
• When two waves merge, they can branch off following the collision and propagate as separate waves. In this case, the younger cluster is terminated at the merging point and both outgoing waves count to the cluster of the older incoming wave. Two branching structures occur after the collision of two open waves (wave fragments). Thus the reason for two instead of one outgoing wave lies in earlier cluster events.
• Local propagation failure can cause the break-up of a wave into two fragments. Since both fragments originate from the same mother wave, they count as one cluster. This rule avoids, e.g., artificial additional clusters when a wave collides with a boundary of the system. Although the wave fractures, all the propagating pieces count as one cluster.
The lifetime of a cluster is limited by collision with another cluster or spontaneous disappearance. The size s of a coherent cluster is defined as its space-time volume (in units of the small cubes) throughout its entire lifetime. A first characterization of the patterns is the histogram of cluster sizes n(s), i.e. the number of clusters of size s. The cluster size distribution is obtained after normalization as

\[ p(s) = \frac{1}{Z}\, n(s), \tag{9} \]

with the normalization factor

\[ Z = \sum_{s=1}^{\infty} n(s). \tag{10} \]
In Fig. 10a, the space-time plot of the time evolution of an array of size 101, with all clusters included, is shown for K = 0.14, λ = 0.1, σ² = 0.016 and γ = 1. A sequence of snapshots of this simulation is shown in Fig. 11. In Fig. 10b, the largest extracted cluster is shown. This cluster is essentially the rotating spiral structure visible in the snapshots in Fig. 11. The space-time plots of two smaller target waves, extracted by the cluster method, are shown in Fig. 10c,d. In Fig. 12, the (not normalized) cluster size distribution of an extended simulation (larger system size, larger time interval) is shown. It shows a power-law behavior for small clusters but a slower decay for larger cluster sizes. For small cluster sizes (1-10 pixels), the cluster size distribution follows a power law. This is indicative of randomness in the structure due to noise-triggered cluster-lets or due to spatiotemporal chaos [49]. There is one dominating huge cluster that forms the backbone of the temporal evolution of the array. Such a mother cluster is typical for systems above the propagation threshold at small to moderate noise. For thermal waves below the propagation threshold, i.e. K < K0, the situation is different. A typical cluster size distribution is shown in Fig. 13. The regime where the cluster size distribution follows a power law extends to larger cluster sizes and the exponent of the power law is smaller. This indicates that the life and death of even larger clusters are controlled by stochastic events. The cluster size distribution does not weigh the clusters with respect to their relevance for the entire spatiotemporal dynamics. For systems above the propagation threshold, the mother cluster is clearly dominating and should be weighted according to its size. One way to accomplish this is to weigh the clusters by their relative contribution to the entire pattern. The group of all clusters of size s covers the volume n(s)s. Normalizing this coverage to the entire volume of the pattern, i.e. V_tot = n(1) + 2n(2) + 3n(3) + ..., we obtain the relative coverage distribution
Fig. 10. Space-time plots of a simulation for K = 0.14, λ = 0.1, σ² = 0.3 and γ = 10 are shown. In (a), all clusters are included. In (b) the space-time plot of the largest cluster is shown. In (c) and (d) space-time plots of two extracted target waves are shown.
\[ \nu(s) = \frac{s\, n(s)}{\sum_{s'} s'\, n(s')}. \tag{11} \]
The coverage distribution describes the probability that an excited cell, arbitrarily picked at any time during the time interval where clusters are recorded, belongs to a space-time cluster of size s. A measure for the uncertainty of membership in a cluster of a certain length is thus the cluster entropy [49]

\[ S = -\sum_{s} \nu_s \ln \nu_s = \ln\langle s\rangle - \frac{1}{\langle s\rangle} \sum_{s} s\, p_s \ln(s\, p_s). \tag{12} \]
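The coverage distribution of Eq. (11) and the cluster entropy of Eq. (12) follow directly from the list of cluster sizes; a short sketch, using for instance the sizes produced by the decomposition sketch above, is:

```python
import numpy as np

# Coverage distribution, Eq. (11), and cluster entropy, Eq. (12), computed
# from a list of cluster sizes (e.g. the `sizes` array of the previous sketch).

def cluster_entropy(sizes):
    sizes = np.asarray(sizes, dtype=float)
    s_values, n_s = np.unique(sizes, return_counts=True)   # histogram n(s)
    coverage = s_values * n_s                               # volume covered by size class s
    nu = coverage / coverage.sum()                          # Eq. (11)
    return nu, float(-np.sum(nu * np.log(nu)))              # Eq. (12)

# One dominating 'mother' cluster gives a small entropy; a broad distribution
# of cluster sizes gives a large one.
_, S_single = cluster_entropy([10_000] + [1] * 50)
_, S_broad = cluster_entropy(np.arange(1, 200))
print(f"one dominant cluster: S = {S_single:.2f}")
print(f"broad distribution:   S = {S_broad:.2f}")
```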
Fig. 11. Snapshots of a simulation of an array of size 101 × 101 are shown for K = 0.14, λ = 0.1, σ² = 0.3 and γ = 10. The initial condition (a) is an excited strip (black) with a refractory layer attached. The snapshots are taken at t = 10Δt, t = 20Δt, t = 25Δt, t = 30Δt, t = 35Δt, t = 40Δt and t = 50Δt. Excited cells are marked black, and refractory cells are marked gray.
Fig. 12. Cluster size distribution (a) of the temporal evolution of an array of size 201 × 201. The parameters are K = 0.14, λ = 0.1, σ² = 0.3 and γ = 1. The initial condition is an excited strip (black) with a refractory layer attached. The clusters have been collected over a time interval of 1000Δt. The coverage distribution (b) is low at small clusters and dominated by one huge cluster. Correspondingly, the cluster entropy is small, S = 0.31.
The entropy vanishes if the excited sites belong to only one cluster class C_s (ν_s = 1), e.g. if the spatiotemporal pattern consists of either a single cluster or of clusters of only one size. A vanishing entropy does not necessarily require a 'simple' initial state but can also arise from complex initial conditions, such as many spiral waves with randomly selected cores, if an inherent process for generating new cluster sources is lacking.
Fig. 13. Cluster size distribution (a) obtained from simulations of an array of size 201 × 201 at the parameters K = 0.08 < K0 ≈ 0.09, γ = 1, λ = 0.1 and σ² = 0.18. The clusters are tracked for approximately 1000 time steps Δt. The initial condition consists of an excited strip of cells with a refractory strip attached. The coverage distribution (b) is dominated by a wider range of clusters in comparison to Fig. 12. The cluster entropy is accordingly larger (S = 4.1).

The collision and merging of uncorrelated clusters then occurs over a transient time to yield one surviving cluster. The artificial fracturing of waves at boundaries does not affect the entropy, since the fractured waves belong to the same coherent parent structure. The entropy reflects the generation of new clusters due to incoherent processes such as fluctuations or spatiotemporal chaos. In Fig. 14, we show the entropy as a function of the coupling K. At the propagation threshold, the entropy assumes a maximum. It falls off rapidly for increasing K since eventually the mother cluster dominates the pattern. It falls off for decreasing K since the probability of formation and survival of large clusters becomes smaller. In the next section, we perform the space-time cluster analysis of calcium waves in cultures of astrocytes of rat brain and epileptic human tissue.

5. Calcium waves analysis

5.1. Healthy rat hippocampal astrocytes
In this section, we first analyze the calcium wave patterns obtained from cultures of hippocampal astrocytes of rat brain. The culture under investigation includes about 500 astrocytes. First the cells are loaded with the calcium-sensitive fluoroprobe Fluo-3AM. Minutes after kainate is perfused homogeneously over the culture, calcium waves can be observed all across the specimen. The fluorescent activity reflects the calcium concentration in the astrocytes. Snapshots of the calcium activity have been taken at intervals of 1.7 s (see Fig. 15a). In Fig. 15b, the calcium activity is shown after subtraction of the first snapshot (background).
Fig. 14. The cluster entropy is shown in (a) as a function of the coupling parameter K for γ = 1, λ = 0.1 and σ² = 0.16. It shows a maximum at the critical value of the coupling K0, where propagation failure occurs in the absence of noise. In (b) the cluster entropy is shown as a function of the variance of the noise σ². The maximum occurs around values where the patterns appear most coherent by visual inspection.
Although the figures indicate a wave-like propagation of calcium activity, they do not reveal the wave front clearly, since the overall fluorescence increases as the wave propagates. In order to visualize the wave more clearly, we choose to show the positive gradient of the calcium activity (Fig. 15c): the difference between the calcium activity of a frame and its previous frame, i.e. the increase of calcium activity, is shown. The wave fronts (dark) are clearly visible. In later graphs, we have generalized this way of visualizing the calcium waves by subtracting earlier reference frames in order to see a larger part of the entire calcium wave. Applying the cluster method described in the previous section to these subtracted waves yields the cluster size distribution shown in Fig. 16. In order to focus on the gross features of the calcium wave, we reconstruct the waves by recombining the clusters after applying a spatiotemporal size filter. The filter function is a Heaviside band H(s − s0)H(s1 − s), which limits the bandwidth of clusters taken into account to those with sizes in the interval [s0, s1]. Chopping off the small clusters (s0 = 100) yields a reconstruction of the wave, overlaid on the original frames (in red), shown in Fig. 17. The reconstruction clearly identifies the wave fronts and their partially disconnected pieces. The wave fronts cover many cells that are interacting cooperatively across the cell culture.
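A sketch of this preprocessing and size filtering is given below. The frame gap, the 4 × 3 coarse-graining blocks, the 50% threshold and s0 = 100 are taken from the text and figure captions; how the reference average is computed and the input array layout (a gray-scale stack `movie` of shape time × height × width) are assumptions.

```python
import numpy as np
from scipy import ndimage

# Sketch of the frame preprocessing and the spatiotemporal size filter.
# `movie` is an assumed gray-scale stack of shape (time, height, width).

def preprocess(movie, gap=3, block=(4, 3), rel_threshold=0.5):
    """Sliding-reference subtraction, block averaging and binary thresholding."""
    movie = movie.astype(float)
    diff = np.clip(movie[gap:] - movie[:-gap], 0, None)   # positive gradient only
    nt, ny, nx = diff.shape
    by, bx = block
    ny, nx = ny - ny % by, nx - nx % bx                    # crop so blocks tile exactly
    coarse = diff[:, :ny, :nx].reshape(nt, ny // by, by, nx // bx, bx).mean(axis=(2, 4))
    return coarse > rel_threshold * coarse.mean()          # True = calcium active

def size_filter(binary, s0=100, s1=None):
    """Keep only space-time clusters with sizes in [s0, s1] (Heaviside band filter)."""
    labels, num = ndimage.label(binary, ndimage.generate_binary_structure(3, 1))
    sizes = ndimage.sum(binary, labels, index=np.arange(1, num + 1))
    keep = (sizes >= s0) if s1 is None else (sizes >= s0) & (sizes <= s1)
    return binary & np.isin(labels, np.arange(1, num + 1)[keep])
```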
5.2. Cultures of human nervous tissue

Tissue culture methods for human neurosurgical specimens have evolved in the Viatech Imaging laboratory [21,22]. Inclusion of protease inhibitors in the transfer media has provided an effective preservation solution for the transport of epileptic specimens to the laboratory for culturing. Specimens are obtained from patients at Children's Medical Center, Department of Pediatric Neurology, Cincinnati, Ohio, during the course of neurosurgical resection for intractable epilepsy.
Fig. 15. Two consecutive snapshots of the fluorescent activity (upper panel), the corresponding background-subtracted frames (middle panel) and the sliding-reference subtracted frames are shown. Darker color indicates higher calcium concentration.

Electroencephalographic (EEG) recordings of the tissue regions are established in the neurosurgery operating room.
Fig. 16. The cluster size distribution (a) is shown for a sequence of 100 sliding-reference subtracted frames. The original frames (resolution 640 × 480) have been coarse-grained to frames of resolution 160 × 160 by averaging the gray scales over areas of size 4 × 3 pixels. Subsequently the frames are subjected to binary filtering: calcium activity of less than 50% of the average is considered not calcium active (0); the other sites are considered calcium active (1). Then the frames are stacked in time and the cluster decomposition is applied. The histogram of cluster sizes exhibits, for small clusters, a power law with an exponential cut-off, and a slower decay for larger clusters that form the backbone of the calcium pattern. The coverage distribution ν(s) has a peak at small clusters and another one at large clusters. This broad distribution of relevant cluster sizes causes the cluster entropy to be relatively large (S = 4.1).

Hyperexcitable vs. normal EEG activities are recorded for the tissues that had been characterized in the imaging experiments. Viable cultures which exhibit behaviors characteristic of the diseased tissues can be produced even after two days in transit. Cells cultured from regions of hyperexcitable EEG activity show abnormal responses to neurotransmitter. Cultures have been established for several forms of epilepsy, including medial temporal lobe disease, cortical tumors and hyperexcitable regions of the cortex, and for the purpose of these studies we have also cultured a Tuberous Sclerosis (TS) case. The Children's Medical Center in Cincinnati supports a clinic dedicated to the treatment of this inherited disorder [50], which is characterized by mental retardation, epilepsy and skin manifestations. So-called giant cell astrocytomas may develop within the brain. A tumorous lesion within the cortex of a 17-month-old patient was cultured following surgical resection to alleviate epileptiform seizures. Subependymal giant cells are apparent in these cultures (see Fig. 18). The giant cells generally reside on top of a normal-appearing astrocyte layer. One such giant cell is marked with an asterisk in the left panel of Fig. 18. Extensive process formation and a large nucleolus, which has stained heavily with the Ca2+ dye, characterize this cell. These cells are common in these cultures. A second giant cell (asterisk in Fig. 18B) has well-developed projections to neighboring cells that are reminiscent of the end feet of normal astrocytes. This cell, however, is not integrated into the astrocyte syncytium that it sits on top of.
Fig. 17. The reconstructed waves (red), after applying a space-time filter that ignores space-time clusters of size less than 100, are overlaid on the original frames (both coarse-grained to a resolution of 160 × 160). Time proceeds from (a) to (i). The reconstruction based on the cluster decomposition allows one to identify coherently connected calcium activity and to eliminate background activity without affecting the larger structures.
Neurons show extensive changes in morphology (dysmorphism), including extensive small processes as seen in the neuron below the giant cell in Fig. 18B. The neuron to the right of the giant cell in this panel exhibits an unusual cell body as well. In contrast, the neuron at the top of this panel has a normal appearance, with a rounded cell body and a long projection off to the right. Numerous immunohistochemical studies of the giant cells have shown that they contain astrocyte as well as neuronal markers [51].
Fig. 18. A two-year-old patient underwent neurosurgery for treatment of intractable seizures associated with tuberous sclerosis. Surgical specimens from the posterior temporal lobe were cultured [22] and imaged using confocal microscopy. Fig. A shows relatively normal astrocytes (a) with a reactive astrocyte (*) on the upper surface of the astrocyte syncytium. In Fig. B several neurons and a second reactive astrocyte (*) are also superficial in the culture. The reactive astrocytes share the morphology of both neurons and astrocytes (Taraszewska et al. 1997). Extensive process formation (see Fig. A) or elaborate end feet (see Fig. B) are characteristic of reactive astrocytes. The frequent polymorphic astrocytes in the culture are laden with eosinophilic granules. In addition, neurons can exhibit normal morphology (upper left, Fig. B) or varying degrees of abnormal development (B).
Fig. 19. A sequence of subtracted snapshots of calcium activity of a culture of astrocytes from TS-tissue is shown from left to right. The time interval between two snapshots is 1.7 s. Higher calcium activity is coded by a larger gray value. One can observe local islands of calcium activity but no large-scale organization.
Often giant cells respond to neurotransmitter exposure with an increase in intracellular Ca2+; however, a significant number of these cells do not respond at all to neurotransmitter.
Fig. 20. A sequence of subtracted snapshots of calcium activity of a culture of astrocytes from TS-tissue is shown from left top to right bottom. The time interval between two snapshots is 1.7 s. Higher calcium activity is coded by a larger gray value. One can observe two dramatic events of large-scale organization of calcium activity in the form of cylindrical waves.
Fig. 21. The cluster size distribution (a) obtained from the TS data at 40 μM kainate is shown. All the clusters larger than s = 1000 belong to the rare dramatic wave events. In comparison with Fig. 16, far fewer large clusters are observed, indicating mostly local astrocyte signaling. The coverage distribution ν(s) (b) differs dramatically from that obtained from the rat-brain tissue. The relative coverage by smaller clusters is almost twice as high. The coverage by large clusters is smaller. If one ignores the rare dramatic wave events, these results imply that the incoherent activity in the epileptic case is higher than in the normal case.
Multiple identical coverslips (n = 32) were cultured with the TS-case cells described above. All coverslips were treated identically, i.e. they were rinsed with the same saline, stained with the same Ca2+ dye (2 μM Fluo-3AM), imaged within a 48 h period of each other using an identical imaging paradigm (25 frames in saline, 400 frames in neurotransmitter, followed by 50 frames washout) and exposed to equivalent neurotransmitter solutions. Images were collected to the hard drive of our confocal scanning laser microscope (Nikon PCM2000) every 1.7 s. These images were then exported to a writable CD drive. In Fig. 19 a sequence of snapshots is shown. We have applied subtraction of a sliding reference frame (3 frames gap). For most of the time, calcium activity includes only a few cells. This pattern, however, is interrupted by drastic events where one or two waves involving virtually all cells on the coverslip are elicited (see Fig. 20). We have not observed this behavior in cultures of the healthy rat hippocampal astrocytes. This observation indicates at least temporarily strong coupling between the astrocytes in the culture that leads to a fast and strong wave. During these episodes, the cells in the entire culture appear synchronized in their calcium activity. After these events the synchrony is lost and the calcium activity is local. Applying the cluster decomposition, we correspondingly see a difference in the cluster size distribution compared to that of healthy rat brain cells (see Fig. 21). There are only very few large clusters, corresponding to the dramatic events when a calcium wave occupies the entire culture. The speed of the wave well exceeds the speed of the calcium wave fronts observed in the rat brain tissue in the previous section.
This indicates a more rapid response of the diseased astrocytes in comparison with astrocytes from normal tissue. The cluster entropy can be calculated from the normalized cluster size distribution using Eq. (12). During typical time intervals with no dramatic wave events, the cluster entropy of the epileptic culture was about S = 2.1, well below the cluster entropy of the healthy rat brain cultures (S = 4.5). Including the rare wave events, the entropy of the epileptic culture increased to about S = 3.3, which is still well below the cluster entropy of the rat brain culture.

6. Conclusions
Calcium waves in glial-cell cultures exhibit many features of excitable waves in a noisy environment. We have reviewed previous work on waves in excitable media coupled to a noisy environment. A cluster decomposition method to quantitatively characterize noisy spatiotemporal patterns has been reviewed and applied to computer models as well as to calcium waves in cultured astrocytes stemming from normal and epileptic tissue. The cluster decomposition allows one to assign thermodynamic properties, such as a cluster entropy, to the patterns. Preliminary results indicate that the cluster entropy is lower in epileptic than in healthy cell cultures.
Acknowledgements

This work has been supported by the National Science Foundation (grant # 0078055) and by the National Institute of Environmental Health (grant # IS08470).
References

1. Jung, P. and Mayer-Kress, G. (1995) Phys. Rev. Lett. 74, 2130-2133.
2. Cross, M.C. and Hohenberg, P.C. (1993) Rev. Mod. Phys. 65, 851-1112.
3. Winfree, A.T. (1974) Sci. Amer. 230, 82-95.
4. Bär, M. and Eiswirth, M. (1993) Phys. Rev. E 48, R1635.
5. Lechleiter, J., Girard, S., Clapham, D. and Peralta, E. (1991) Nature 350, 505-508.
6. Charles, A.C., Merrill, J.E., Dirkson, E.R. and Sanderson, M.J. (1991) Neuron 6, 983-992.
7. Cornell-Bell, A.H., Finkbeiner, S.M., Cooper, M.S. and Smith, S.J. (1990) Science 247, 470-473.
8. Sanderson, M.J., Charles, A.C. and Dirkson, E.R. (1990) Cell Regulation 1, 585-596.
9. Guthrie, P.B., Knappenberger, J., Segal, M., Bennett, M.V.L., Charles, A.C. and Kater, S.B. (1999) J. Neurosci. 19, 520-528.
10. Kadar, S., Wang, J. and Showalter, K. (1998) Nature 391, 770-772.
11. Sneyd, J., Keizer, J. and Anderson, M.J. (1995) FASEB J. 9, 1463-1472.
12. Jung, P., Cornell-Bell, A.H., Madden, K. and Moss, F. (1998) J. Neurophysiol. 79, 1098-1101.
13. Jung, P. (1997) Phys. Rev. Lett. 78, 1723-1726.
14. Gammaitoni, L., Hanggi, P., Jung, P. and Marchesoni, F. (1998) Rev. Mod. Phys. 70, 223-287.
15. Douglass, J.K., Wilkens, L., Pantazelou, E. and Moss, F. (1993) Nature 365, 337-340.
16. Levin, J.E. and Miller, J.P. (1996) Nature 380, 165-168.
17. Gluckman, B.J., So, P., Netoff, T.I., Spano, M.L. and Schiff, S.J. (1998) Chaos 8, 588-598.
18. Jung, P., Cornell-Bell, A.H., Kadar, S., Wang, J., Showalter, K. and Moss, F. (1998) Chaos 8, 567-575.
19. Cornell-Bell, A.H. and Finkbeiner, S.M. (1991) Ca2+ waves in astrocytes, Cell Calcium 12, 185-195.
20. Finkbeiner, S.M. (1993) Glia 9, 83-104.
21. Cornell-Bell, A.H. and Williamson, A. (1993) in: Biology and Pathology of Astrocyte-Neuron Interactions, eds Federoff et al., pp. 51-65, Plenum Press, New York.
22. Lee, S.H., Magge, S., Spencer, D.D., Sontheimer, H. and Cornell-Bell, A.H. (1995) Glia 15, 195-202.
23. Bordey, A. and Sontheimer, H. (1998) Epilepsy Res. 32, 286-303.
24. During, M.J. and Spencer, D.D. (1993) Lancet 341, 1607-1610.
25. Scheyer, R.D. (1998) in: Progress in Brain Research, eds O.P. Ottersen, I.A. Langmeon and L. Gjerstad, Vol. 116, pp. 359-369, Elsevier, New York.
26. O'Connor, E.R., Pizzonia, J.H., Spencer, D.D. and de Lanerolle, N.C. (1996) Epilepsia 37 (Suppl. 5), 51.
27. O'Connor, E.R., Sontheimer, H., Spencer, D.D. and de Lanerolle, N.C. (1998) Epilepsia 39, 347-354.
28. Scheffer, I.E. and Berkovich, S.F. (1997) Brain 120, 479-490.
29. Wallace, R.H., Wang, D.W., Singh, R., Scheffer, I.E., et al. (1998) Nat. Genet. 19, 366-370.
30. Ronen, G.M., Rosales, T.O., Connolly, M., et al. (1993) Neurology 43, 1355-1360.
31. Singh, N.A., Charlier, C., Staufer, D., et al. (1998) Nat. Genet. 18, 25-29.
32. Biervert, C., Schroeder, B., Kubisch, C., et al. (1998) Science 279, 403-406.
33. Goldbeter, A., Dupont, G. and Berridge, M.J. (1990) Proc. Natl. Acad. Sci. USA 87, 1461-1465.
34. Goldbeter, A. (1996) Biochemical Oscillations and Cellular Rhythms: The Molecular Bases of Periodic and Chaotic Behaviour. Cambridge University Press, Cambridge.
35. Lechleiter, J. and Clapham, D. (1992) Cell 69, 283-294.
36. Sanderson, M.J., Charles, A.C., Boitano, S. and Dirksen, E.R. (1994) Molec. Cell Endocrinol. 98, 173-187.
37. Kim, W.T., Rioult, M.G. and Cornell-Bell, A.H. (1994) Glia 11, 173-184.
38. Boitano, S., Dirksen, E.R. and Sanderson, M.J. (1992) Science 258, 292-295.
39. Muller, T., Moller, T., Berger, T., Schnitzer, H. and Kettermann, H. (1992) Science 256, 1563-1566.
40. Burnashev, N., Khodorova, A., Jonas, J., Helm, J., Wisden, W., Monyer, H., Seeburg, P.H. and Skemann, B. (1992) Science 256, 1556-1570.
41. Egebjerg, J. and Heinemann, S.F. (1993) Proc. Natl. Acad. Sci. 90, 755-759.
42. Goldman, W.F., Yarowsky, P.J., Juhaszova, M., Krueger, B.K. and Blaustein, M.P. (1994) J. Neurosci. 14, 5834-5843.
43. Golovina, V.A., Bambrick, L.L., Tarowsky, P.J., Krueger, B.K. and Blaustein, M.P. (1996) Glia 16, 296-305.
44. Fohlmeister, C., Gerstner, W., Ritz, R. and Van Hemmen, J.L. (1995) Neural Computation 7, 905-914.
45. Kistler, W.M., Seitz, R. and Van Hemmen, J.L. (1997) Physica D 114, 273-295.
46. Jung, P. and Mayer-Kress, G. (1995) Chaos 5, 458.
47. Zoldi, S.M. and Greenside, H.S. (1997) Phys. Rev. Lett. 78, 1687.
48. Zoldi, S.M., Liu, J., Bajaj, K.M.S., Greenside, H.S. and Ahlers, G. (1998) Phys. Rev. E 58, R6903.
49. Jung, P., Wang, J. and Wackerbauer, R., Preprint.
50. Franz, D.N. (1998) Seminars in Pediatric Neurology 5, 253-268.
51. Hirose, T., Scheithauer, B.W., Lopes, M.B.S., Gerber, H.A., Altermatt, H.J., Hukee, M.J., Vandenberg, S.R. and Charlesworth, J.C. (1995) Acta Neuropathologica 90, 387-399.
CHAPTER 11

Neurones as Physical Objects: Structure, Dynamics and Function

C. MEUNIER
Laboratoire de Neurophysique et Physiologie du Système moteur (EP 1848 CNRS), Université René Descartes, 75270 Paris cedex 06, France

I. SEGEV
Department of Neurobiology, Institute of Life Sciences, and Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel

© 2001 Elsevier Science B.V. All rights reserved

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction
   1.1. What experiments tell us about neurones
   1.2. An overview of theoretical approaches
   1.3. Organization of the present chapter
2. Neuronal excitability: dealing with time scales
   2.1. The elucidation of neuronal excitability
   2.2. Neurones as non-linear systems
   2.3. Discussion
3. Conduction and distribution of action potentials
   3.1. Action potential conduction
   3.2. Spike conduction failures
   3.3. Presynaptic inhibition
   3.4. Discussion
4. Dendrites and synaptic integration
   4.1. How do neurones utilize dendrites? A longstanding question
   4.2. Passive cable theory
   4.3. The synaptic shunt
   4.4. Non-linear membrane properties
   4.5. Dendritic spines: a microcosm
   4.6. Discussion
5. Conclusion
Acknowledgements
Appendix: The Hodgkin-Huxley model
References
1. Introduction

1.1. What experiments tell us about neurones

1.1.1. The classical view of the neurone
If one looks at pictures of neurones in classical textbooks, one always encounters the same stereotyped view of an exquisite branching structure composed of three morphologically distinct regions: (i) the dendritic tree, (ii) the soma, thinning down into the axone hillock, and (iii) the extended axonal tree (see Fig. 1). Most textbooks also assign to each of these three regions a specific role. The dendritic tree is described as a collector of the massive synaptic input that the neurone receives from many other neurones. Synaptic currents are "integrated" in this dendritic tree and in the soma, which leads to the depolarization of the initial segment of the axone. There, if this depolarization is large enough, isolated action potentials or spike trains are initiated. The pattern of action potentials in a train is controlled by a set of voltage-dependent conductances located at the somatic membrane and activated already at potentials below the firing threshold. These spike trains then propagate faithfully along the axone and provoke transmitter release at terminal and en passant synaptic boutons, which leads to post-synaptic events in target cells. The neurone is thus described as a deterministic and "dynamically polarized" device [1,2] that reliably converts the synaptic inputs it receives into a spike train delivered to all the neurones it contacts. This vision, which originated in Ramon y Cajal's histological observations and impressive intuition of the workings of neurones (see Fig. 2), was progressively substantiated by electrophysiology experiments. These experiments unravelled the voltage-dependent mechanisms that underlie non-decremental propagation in myelinated and unmyelinated axones [3,4], and elucidated, first in lumbar motoneurones [5,6] and later in different classes of central neurones - in particular hippocampal and neocortical pyramids [7], thalamo-cortical and Purkinje cells - what the respective roles of the axone and of the soma are in the firing of action potentials and the patterning of the discharge [8].

1.1.2. Alternative ideas
Over the years many interesting ideas were nonetheless proposed that departed from the simple and attractive view of the neurone presented above, or enriched it. For example, Llinás and his collaborators suggested that calcium spikes could be fired locally in the dendrites of Purkinje cells, and emphasized the excitable properties of these dendrites [9]. Eccles and coworkers [10] discussed the possibility that action potentials might invade the dendritic tree of motoneurones. Henneman speculated that action potentials might suffer conduction failures, due to the electrotonic architecture of axones [11]. However, all these ideas remained for a long time unsubstantiated by clear direct experimental evidence. In particular it was impossible to record routinely from dendrites, and information on their properties and electrical behaviour was inferred from recordings in the soma. As a consequence the nature of "synaptic integration", a concept introduced by Charles Sherrington [12,13], and its biophysical substrate remained obscure, and the most widely accepted view was still that dendrites were passive structures onto which summation of synaptic potentials took place.
Fig. 1. Neurones are the basic building blocks of the nervous system. A pyramidal neurone from the cat visual cortex was stained intra-cellularly with a dye and its three-dimensional shape reconstructed. The axone (red) has a dense local arborization as well as lateral projections, and displays a total of 4105 synaptic boutons (yellow), most of which are known to form excitatory synapses on the dendritic trees of other pyramidal cells. The dendritic tree of this cell is shown in green. Courtesy J. Andersen and K. Martin. Calibration: 100 µm.
1.1.3. New experimental techniques

This situation has gradually changed over the last 20 years. The development of new techniques - in particular infra-red DIC video microscopy [14] and immunolabelling -
made it progressively possible to measure in vitro the passive membrane properties and the density and kinetics of voltage-dependent channels at various locations on the neurone membrane, and not only at the soma. It is now becoming feasible to perform such measurements also in vivo [15,16]. One can also record in vitro and in vivo the local variations of the intra-cellular calcium concentration and the membrane potential of neurones at various sites, thanks to calcium- and voltage-sensitive dyes [17] and two-photon microscopy [18,19,20]. The dynamic clamp technique allows experimentalists to investigate the effects of a given voltage-dependent current at the soma on the firing pattern of a neurone [21]. Recording simultaneously from a presynaptic cell and one or two of its target cells in hippocampal and neocortical slices is now possible, yielding information about changes in synaptic dynamics during activity [22,23].

Fig. 2. Ramón y Cajal's law of dynamic polarization. This "law" states that the nerve impulse flows from the dendritic branches and the soma to the axone (arrows). Shown are the pattern of axo-dendritic connections between cells in the cerebral cortex and the detailed morphology of the dendritic branches (thicker lines) and dendritic spines (the many thin thorns emerging from the apical tufts - on top), the soma and the axonal tree (thinner smooth lines). Dendrites and soma receive input from many cells via terminal arborizations of axone collaterals.
The results obtained with these new techniques show:
• that the dendrites of many central neurones are electrically heterogeneous and endowed with a set of voltage-dependent ion channels (see [24,25] for recent reviews), and that these channels can shape the response of neurones to synaptic stimuli. For instance, it was directly demonstrated, both in vitro and in vivo, that full blown calcium spikes can be generated in the dendritic tree of central neurones - such as neocortical pyramidal cells [16,135] - and that they may strongly affect the discharge pattern of these neurones. It was also shown in vitro that action potentials may invade the apical dendritic tree of pyramidal neurones not only because of passive propagation but also thanks to voltage-dependent membrane conductances [26];
• that the typical membrane properties, both passive and active, of the soma and dendrites are quite similar. The hypothesis of a large somatic shunt [27] was rejected on the basis of new conductance estimates relying on patch electrodes rather than sharp electrodes [28], and no evidence was found for a particularly high density of sodium channels on the soma and axone hillock [29];
• that the stochastic openings and closings of ion channels may impart some variability to the discharge of pyramidal neurones in vitro and affect their response to stimuli [30,101,102];
• that the axone does not necessarily convey the same output signal to all its neuronal targets;
• that neurones are prone to slow modulations of their excitability (neuromodulation, activity-dependent regulations [31,32], circadian rhythms [33], etc.).
1.1.4. Experimental limitations
Still the way neurones behave in physiological (or pathophysiological) conditions in situ in a network remains elusive. How do they handle synaptic inputs in such conditions and respond to them? What is their pattern of activity and how does it depend on the task performed? Can we characterize in some physiologically relevant way the input-output properties of neurones in such conditions? Performing in vivo experiments to answer such questions is difficult but feasible to an extent. Spinal physiology mostly progressed through intra-cellular recordings of neurones in anaesthetized cats that enabled both to largely understand the membrane properties of motoneurones [6], and to unravel the functional connectivity of important spinal circuits. However, anaesthesia strongly reduced background synaptic activity and suppressed neuromodulation (so that motoneurones do not display plateau properties in such an in vivo preparation). In the cortex, intra-cellular recordings of neurones in vivo submitted to an intense synaptic bombardment can now be performed [34,35]. Some in vitro preparations, such as slices of cerebellar cortex [36], display spontaneous activity and can be used to study network activity but most in vitro experiments are mainly useful for investigating cellular properties: nature, kinetics and distribution of ionic channels on the neuronal membrane, voltage transients elicited by stimulation, etc. This can shed some light on possible mechanisms at
work at the cellular level in physiological conditions, and helps to identify some basic operating principles. But the functional identification of cells is difficult and extrapolating from in vitro preparations to physiological conditions is unwarranted. For instance, inferring from the response to a single test input what would happen in a physiological situation where numerous synapses are activated is made extremely difficult by the non-linearities of the neuronal membrane (mutual shunt of synaptic inputs, activation of voltage-dependent currents). This set of fundamental problems deserves our attention. Moreover one must address the issue of universality. Many different classes of neurones have been defined on the basis of their characteristic morphological properties or their discharge pattern. Neurones of a given class are often specific to a given nervous structure and they are endowed with the same set of voltage-dependent currents. To what extent can we identify operating principles shared by all these classes of neurones? May we rightfully speak of canonical neurones [37]? We know that single spike generation mechanisms are essentially the same in all neurones that emit spikes. But can we understand synaptic integration on the basis of a few general principles? Moreover the detailed morphology and the intrinsic properties of neurones vary widely even within a given class [38,39]. How should we then model a neurone? Is it not the case that the behaviour of a model reflects the choice of a specific set of parameters rather than the general operation of neurones in a given class? In view of all these conceptual and technical difficulties, we doubt that unravelling the operating principles of neurones can be achieved by purely experimental approaches. As this may seem an unwarranted statement, let us then examine what lessons we may learn from the past.
1.2. An overview of theoretical approaches

1.2.1. Analytic theories

Our present understanding of neurones is not grounded only on morphological and physiological experiments, but also on theoretical approaches. A genuine understanding of spike generation was reached only when Hodgkin and Huxley could write down differential equations that made the non-linear nature of action potentials explicit [3]. This "Hodgkin-Huxley model" does not constitute the first attempt at a mathematical description of neuronal excitability. Indeed, a phenomenological model of neural firing was designed by Lapicque as early as 1907 [40]. More recently, important progress was made in understanding the role of subthreshold currents in such phenomena as bursting and post-inhibitory rebounds by relying on theoretical concepts (bifurcations, singular perturbation theory, etc.) and techniques initially developed for understanding dynamical systems with a small number of degrees of freedom [41]. Similarly, methods introduced to deal with extended dynamical systems may be used to better understand spike conduction in axones [42], while the impact of the random openings and closings of channels on cell excitability could be studied on stochastic variants of Hodgkin-Huxley-like models [43].
The other important feature of neurones, their capability to "integrate" synaptic inputs, also benefited a lot from theoretical approaches. The first genuine progress on that issue was due to W. Rall, who developed the well-known "cable theory of the neurone" [44]. One may debate how much of synaptic integration is explained by this linear theory of dendrites, especially now that it has become clear that many voltage-dependent channels are embedded in the dendritic membrane. One might even consider that developing cable theory was more work than it was worth. We are not of that opinion. Cable theory provided the first solid ground for understanding how post-synaptic potentials interact and spread in dendritic structures. It introduced extremely useful notions such as the space and time constants, emphasized the notion of electrotonic architecture, gave important insights on what could happen in dendrites, and made it possible to derive information on dendrites from intra-cellular recordings at the soma.
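To make these notions concrete, the short sketch below computes the space and time constants of a uniform passive dendritic cylinder and the steady-state voltage attenuation predicted by cable theory when a potential is imposed at one end of a sealed-end cable. The geometrical and membrane parameters are illustrative assumptions, not values taken from this chapter.

```python
import numpy as np

# Illustrative parameters (assumed, not taken from the chapter):
# a uniform cylindrical dendrite with a sealed distal end.
Rm = 20000.0   # specific membrane resistance, ohm.cm^2
Ri = 150.0     # axial resistivity, ohm.cm
Cm = 1.0       # specific membrane capacitance, uF/cm^2
d = 2e-4       # diameter, cm (2 micrometres)
l = 0.05       # physical length, cm (500 micrometres)

# Space and time constants of passive cable theory.
lam = np.sqrt((Rm * d) / (4.0 * Ri))   # space constant, cm
tau = Rm * Cm * 1e-6                   # time constant, s
L = l / lam                            # electrotonic length (dimensionless)

# Steady-state attenuation of a voltage V0 imposed at x = 0
# along a finite sealed-end cable: V(X) = V0 cosh(L - X)/cosh(L), X = x/lambda.
X = np.linspace(0.0, L, 6)
attenuation = np.cosh(L - X) / np.cosh(L)

print(f"lambda = {lam*1e4:.0f} um, tau = {tau*1e3:.1f} ms, L = {L:.2f}")
for Xi, a in zip(X, attenuation):
    print(f"X = {Xi:.2f} -> V/V0 = {a:.3f}")
```

With these (assumed) values the cable is electrotonically short (L well below 1), so the steady-state attenuation along its length is modest; thinner or leakier dendrites would give a much stronger decay.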
1.2.2. Neurophysics

Much theoretical work on neurones was performed by researchers with a strong background in Physics, using concepts and methods from this scientific discipline. May we, for all that, define a physics of neurones? Neurochemistry is recognized as a relatively autonomous branch of Neurosciences. Can we also legitimately speak of Neurophysics? It is widely accepted that the electrical behaviour of neurones is grounded in a specific form of microphysics, the biophysics of excitable membranes. However, Hodgkin-Huxley theory [3] or cable theory [44] does not directly deal with such microscopic physical properties of the neuronal membrane as the properties of individual channels. They consider a more macroscopic level of description and address cooperative phenomena that involve the opening of many such passive or voltage-dependent channels: generation and conduction of the action potential, spread of synaptic potentials. They also use a rigorous and operative mathematical description of the phenomena taking place in neurones. They might therefore be considered as applications of mathematics. Still, cable theory relies on hypotheses and equations that were introduced almost one century before to analyse the physical problem of electric conduction along telegraphic cables, and the propagation of a solitary wave in a non-linear medium is a general problem encountered in many forms in Physics. Therefore we think that we can genuinely address dendritic integration or spike conduction as physical problems, inasmuch as concepts and methods derived from Physics as well as Mathematics are used to investigate these issues.

The multi-faceted relationship between Physics and Neurophysiology is better appreciated in a historical perspective. Neurophysiology arose as a science from the first attempts to describe nervous system activity in physical terms and to unravel the underlying mechanisms. The first important step in this direction was the progressive elucidation of the electric nature of the nerve impulse, starting with Galvani's works at the end of the 18th century [45] and parallel to the development of electromagnetism: existence of "animal electricity", conduction velocity along nerves [46], biophysics of membranes [47]. Physics also provided the appropriate experimental tools for studying neural activity and supported in particular the development of electrophysiology. For instance, the nerve impulse could be definitively
assigned to the propagation of action potentials, the nature of which became amenable to investigation, only when high temporal resolution recordings enabled researchers to visualize the shape of these fast events. In France, this occurred only in the 1940s, when Alfred Fessard introduced the oscilloscope in neurophysiology laboratories.

Metaphors borrowed from Physics were also highly influential in Neurophysiology. In his "Treatise of Man" [48] René Descartes introduced a number of notions that still shape our way of thinking. Descartes' idea that both perception and action relied on the exchange of physical signals between the central nervous system and the periphery (see Fig. 3) had a decisive influence on the development of Neurophysiology, though the electric phenomena taking place in nerves were far remote from his conceptions. Most importantly, his dualistic approach led him to consider the nervous system as a physical machine. This led to the idea that the mental operations of the brain were nothing else than the direct product of its mechanistic activity (see Fig. 4), emphasized one century later by the materialist Julien Offray de La Mettrie [49]. Descartes' speculations marked neural mechanisms as legitimate objects of scientific investigation, thus signing the birth certificate of Neurophysiology.

Much more recently Hopfield [50], capitalizing on an analogy with disordered magnetic materials known as spin glasses, introduced an enlightening and analytically tractable model that demonstrated that many activity patterns could be stored in a large network in a robust and distributed manner by Hebbian modifications of connection weights. This seminal work opened the door to the rigorous study of attractor neural networks and, more generally, of collective phenomena in large systems of neurones.
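As a minimal illustration of the scheme Hopfield analysed, the sketch below stores a few random patterns in a network with Hebbian weights and retrieves one of them from a corrupted cue. The network size, the number of patterns and the update schedule are arbitrary choices made for the example, not details taken from [50].

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 10                       # network size and number of stored patterns (arbitrary)
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian connection matrix: W_ij proportional to the sum over patterns of xi_i * xi_j.
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0.0)             # no self-connections

def recall(state, sweeps=10):
    """Asynchronous threshold updates; the network relaxes towards a stored pattern."""
    s = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Start from a corrupted version of the first pattern (20% of units flipped).
probe = patterns[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
probe[flip] *= -1

retrieved = recall(probe)
overlap = retrieved @ patterns[0] / N
print(f"overlap with the stored pattern after retrieval: {overlap:.2f}")
```

The stored patterns act as attractors of the dynamics: as long as the number of patterns stays well below the network size, the corrupted cue relaxes back onto the memorized configuration.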
",
N~]
I
Fig. 3. Descartes, the founder of reflex theory? Descartes proposed in his first work on mind and body, De homine, a mechanism of automatic reaction to external events. This treatise was completed in Amsterdam in 1633, but Descartes renounced publishing it after Marin Mersenne wrote to him of Galileo's problems with the Inquisition. Consequently this treatise appeared only years after Descartes' death.
Fig. 4. Vaucanson's automata. Automata exerted a great fascination throughout the 18th century. The most famous automata maker of this time is certainly Jacques de Vaucanson (1709-1782), who built the tabor player and the transverse flute player shown on this postcard (from the Conservatoire National des Arts et Métiers (Paris), http://www.cnam.fr/museum/musica_mecanica/a/glossaire/vaucanson.html). La Mettrie wrote his "Treatise on the Man-machine" in this historical context.
One may still be rightfully reluctant to consider neurones and systems of neurones as genuine physical objects. It is clear that biological systems present striking differences with most inanimate physical systems. For instance they present a historical dimension, since they slowly evolve over time, develop from the embryo and undergo plastic changes, all this under the pressure of various environmental constraints. Still, such a historical perspective can also be encountered in Physics, in connection with Astrophysics or Geophysics, or when investigating the evolution in time of the mechanical properties of materials, for instance. The major difference with Physics is probably that biological systems are not the mere material substrate of phenomena but perform specific and highly organized functions. Physics may contribute to the design of technological devices but it is badly equipped to deal with the functional properties of neuronal systems. Nonetheless the approaches, concepts and methods developed in Physics can be capitalized on to study the dynamics of neurones and neuronal systems, that is, the nature and evolution in time of the spatial patterns of activity of these structures. The purpose of Neurophysiology is to elucidate how neuronal systems perform their functions. A better understanding of how the dynamical states of activity, which are widely thought to constitute the immediate substrate of functions, are determined by the underlying structural properties may help to solve this structure/function problem.
1.3. Organization of the present chapter

The present chapter is not a comprehensive review of theoretical works on the neurone. We shall focus on a limited range of Neurophysical problems, and refer little to Computational Neuroscience. As a consequence, such topics as, for instance, information theory will not be dealt with. Also we shall adopt a particular viewpoint on the nervous system where the emphasis is put on action potentials. We are perfectly aware that neurones do not communicate by the sole mediation of spikes and that addressing physiological functions in terms of the spiking activity of neuronal populations is not the only way to go. Our current understanding of sleep, for instance, probably owes as much to Neurochemistry as to the study of activity patterns in the brain. Accordingly we shall analyse in what follows some exemplary situations where analytical theory proved eminently useful in the past (Hodgkin-Huxley theory of the action potential, classification of bursters, cable theory, etc.) and discuss open issues that should in our opinion benefit from a rigorous, well-managed, physically oriented treatment. We shall respect the categories introduced by Ramón y Cajal [1]: collection of synaptic inputs, firing of action potentials, distribution of action potentials to the target cells, but shall present them in an order reflecting the nature of the non-linear dynamics involved and not the signal flow proposed by the "law of dynamic polarization". We shall discuss first in Section 2 the generation of spike trains, as this constitutes the best established and most dramatic consequence of membrane non-linearities. In this section we shall put into a historical perspective the contributions of Hodgkin and Huxley [3] - to which the Appendix is also fully devoted - and show how the excitability of axones and neurones can be studied in the framework of dynamical systems with few degrees of freedom (that is, without taking into account the fact that neurones are spatially extended structures), using such approaches as bifurcation theory and (multiparameter) singular perturbation theory. In so doing we shall emphasize how the existence of several time scales underlies and constrains the behaviour of the neurone. In contrast we shall stress the importance of space scales in Section 3, which discusses the propagation of action potentials on the axone, an extended medium with highly non-linear membrane properties. We shall first discuss this problem in terms of diffusion-driven propagation of an excitability front in a homogeneous or heterogeneous metastable medium. We shall largely dedicate the rest of this section to the physiologically important issue of whether axones distribute the same spike trains to all their neuronal targets or whether selective filtering of these spike trains occurs due to passive membrane properties (electrotonic conduction failures [11]), activation of voltage-dependent currents [51], or activation of axo-axonic synapses [52] (presynaptic inhibition). Section 4 will address the difficult issue of the role of dendrites: What does synaptic integration mean? Are dendrites mere collectors of inputs or do neurones utilize these branched structures for local processing of inputs? How can distal inputs have a significant effect at the soma? What may be the role of active dendritic
properties? We have as yet no satisfying answer to these questions, in contrast to action potential generation and propagation on axones, which were elucidated in the 1950s, and the role of somatic conductances, unravelled in the 1980s. This situation largely stems from the fact that dendrites display intermediate features. They constitute neither an electrotonically compact compartment like the soma nor a quasi-infinite medium like axones. Both their passive and active properties very likely play an important role in their function. Moreover dendritic membrane properties are themselves modified by the ongoing synaptic activity. Many misconceptions about dendrites were clarified by passive cable theory, which set the basis for our current understanding of dendrites. Hopefully, the progressive development of an "active cable theory" will pursue this clarification process. Not all topics regarding dendrites could be reviewed in depth in the present chapter. In particular, we did not dwell on frequency domain analysis of the dendritic filtering of periodic signals (impedance, resonance in non-linear dendrites, etc.) or stochastic signals.

Throughout this chapter we shall try to keep physiological conditions and functional implications as close as possible to the foreground. Accordingly each of the three main sections ends with a discussion of selected questions largely referring to the physiological relevance of the material discussed in the section. Also we shall privilege dynamical phenomena occurring at a macroscopic scale, and analytical studies of idealized models. Much less emphasis will be put on the channel level, and on numerical simulations of detailed models of neurones [53]. For instance, we shall describe analytical methods but shall not dwell on numerical methods [54] and software packages (GENESIS [55], NEURON [56], SWIM [57], etc.) for simulating the dynamics of neurones. To justify this bias the conclusion of the chapter will be devoted to an unfair comparison of simple mathematically tractable models and detailed, supposedly "realistic" models, and to a discussion of their relevance for understanding the experimental "reality" of the neurone.

We hope that this chapter will convince the reader that the physiology of the neurone has already benefited greatly from metaphors, theories and models which originated from other scientific fields, and particularly from Physics. We also hope that the reader will share our conviction that genuine progress may still be expected from theoretical studies, and mostly from the further development of analytic theories, provided they keep in close touch with physiological issues. We shall be greatly satisfied if these aims are fulfilled. If not, we still hope that the reader will be interested in this chapter, whether he enjoyed it or was irritated by it. At the least we wish that this chapter will help to correct some misconceptions or prejudices with respect to theoretical work on neurones.
2. Neuronal excitability: dealing with time scales
2.1. The elucidation of neuronal excitability

2.1.1. Spikes

Large amplitude events, during which the membrane depolarises by about 100 mV and which last about 1 ms, constitute the hallmark of neuronal excitability
(see Fig. 5). These action potentials are used pervasively throughout the nervous systems of both invertebrates and vertebrates for signalling between neurones, and in particular for long distance communication. All neurones do not display the same firing pattern under current clamp experiments (see Fig. 6), and only a minority of them respond to a current step with a perfectly periodic discharge characterized by a single frequency. Some of them fire only a few spikes at the beginning of current injection (phasic response). Others fire continuously throughout current injection (tonic firing) but display a steep frequency decrease over the first few interspike intervals (adaptation). Even neurones that fire tonically and without significant adaptation cannot be grouped in the same category. While some neurones start firing at high rate as soon as the current threshold is reached, many others display a gradual increase in the frequency with the injected current from some low initial value near the current threshold. In addition many neurones from invertebrates and vertebrates recorded in vitro display periodic bursts of activity, at least in some regimes of activity or when submitted to neuromodulation (conditional bursters). Bursting results from a slow depolarization of the membrane due to an inward current, followed by a repolarization when some threshold is reached where the inward current inactivates or some outward current becomes activated. However, a variety of currents can be involved. The inward current may, for instance, be activated by the hyperpolarization and have a reversal potential close to the resting membrane potential (Ih) or it may be a calcium current de-inactivated by the hyperpolarization (IT). The detailed pattern of bursting also
Fig. 5. Action potential recordings by Hodgkin and Huxley. (A) A micro-pipette was inserted axially into a giant axone of the squid (clear space, 0.5 mm in diameter), with small nerve fibres on either side. (B) An action potential, 90 mV in amplitude and lasting 1 ms, was initiated following membrane depolarisation. The time scale is marked by a sine wave at the bottom, with 2 ms between peaks. Reprinted from [58] by permission of Macmillan Magazines.

Fig. 6. Firing repertoire of a cortical pyramidal neurone in vitro. Firing was evoked by ejecting the excitatory transmitter glutamate close to the apical dendrite, at 407 µm from the soma. Top panel: repetitive bursts of action potentials evoked by minimal glutamate ejection. Middle and lower panels: increasing the glutamate quantity elicited shorter periods of burst firing, followed by longer periods of regular firing [59].
varies from one class of neurones to another. A slow wave, sine-like or sawtooth-like (triangular bursters), may or may not underlie the periodic bursting. Spikes inside each burst may progressively decrease in amplitude or remain of the same height. Their instantaneous frequency may decrease throughout the burst, increase, or display a symmetrical pattern (parabolic bursters). How to make sense of this diversity?

Understanding the nature of action potentials and neural firing was a long process. At the turn of the century it had been recognized that the electric excitability properties of the membrane were due to ionic gradients between the intracellular and extracellular media - in particular potassium ion gradients - arising from the limited permeability of the membrane. This is known as Bernstein's "membrane hypothesis" [47]. It was also established that spike initiation was associated with a huge increase in the permeability of the membrane ("membrane
breakdown"). It had been noticed that nerves submitted to steady current injection behaved as relaxation oscillators [40]. However, the mechanisms underlying neuronal firing remained elusive, because it was not recognized that the membrane displayed non-linear properties. Understanding this fundamental point constitutes the most impressive contribution of Hodgkin and Huxley to Neurosciences. In a series of works, started in the 1930s and interrupted by World War II, they showed that both the firing of action potentials [58], and their non-decremental conduction in the axone, relied on the opening of voltage-dependent sodium channels [3], followed by the slower repolarization of the membrane. A few years later Coombs et al. [10] suggested that the same mechanisms underlay the firing of central neurones, which was subsequently confirmed. 2.1.2. Neurones as relaxation oscillators
The first attempt to provide an abstract mathematical description of tonic neural firing is much older than Hodgkin and Huxley's works. It dates back to Lapicque, who recognized in 1907 that the membrane response of nerves to small perturbations was linear [40] and proposed on that basis the "integrate-and-fire" model (see [60] for more on this model). During tonic firing the membrane potential of neurones slowly rises until a threshold is reached; the voltage ramp is then interrupted by the abrupt firing of an action potential followed by a resetting of the voltage to its initial value. This is typical of relaxation oscillations arising from a threshold crossing. Lapicque's integrate-and-fire model is grounded on this observation and displays a sharp voltage threshold Vth. In the subthreshold voltage range (V < Vth) the evolution of the voltage is governed by the linear differential equation

$$C_{\rm m}\,\frac{{\rm d}V}{{\rm d}t} = G_{\rm m}(V_{\rm rest} - V) + I, \qquad (1)$$

where I is the injected current, Cm the membrane capacitance and Gm the membrane conductance. No dynamical equation is specified in the suprathreshold voltage range: the neurone fires a fast action potential whenever Vth is reached (see Fig. 7), after which the voltage is reset to some fixed value Vreset (often taken to be the resting membrane potential Vrest). This is an extremely non-linear behaviour. In this way the integrate-and-fire model elegantly by-passes the issue of membrane non-linearity and remains analytically tractable. The current threshold Ith at which the neurone starts to fire is easily calculated as the current value for which the voltage tends to Vth at large times: Ith = Gm(Vth - Vrest).
The firing frequency, f, is also easily computed for I > Ith by integrating Eq. (1) over the range from Vreset to Vth spanned by the membrane voltage during the interspike interval:

$$f = \frac{1}{\tau\,\log\left(1 + \dfrac{G_{\rm m}(V_{\rm th} - V_{\rm reset})}{I - I_{\rm th}}\right)}, \qquad (2)$$

where τ = Cm/Gm is the membrane time constant.
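The following sketch integrates Eq. (1) with the threshold-and-reset rule and compares the measured firing rate with the closed-form expression (2); the membrane parameters are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Integrate-and-fire model of Eq. (1) with threshold-and-reset.
# All parameter values are illustrative assumptions (units: nF, uS, mV, nA, ms).
Cm, Gm = 1.0, 0.05                      # tau = Cm / Gm = 20 ms
V_rest, V_th, V_reset = -70.0, -55.0, -70.0
tau = Cm / Gm
I_th = Gm * (V_th - V_rest)             # current threshold (rheobase)

def rate_from_eq2(I):
    """Firing rate predicted by Eq. (2), in spikes per second."""
    return 1e3 / (tau * np.log(1.0 + Gm * (V_th - V_reset) / (I - I_th)))

def rate_from_simulation(I, T=1000.0, dt=0.05):
    """Forward-Euler integration of Eq. (1); count threshold crossings."""
    V, n_spikes = V_reset, 0
    for _ in range(int(T / dt)):
        V += dt * (Gm * (V_rest - V) + I) / Cm
        if V >= V_th:
            V, n_spikes = V_reset, n_spikes + 1
    return 1e3 * n_spikes / T

for I in (0.8, 1.0, 1.5, 2.0):          # all above I_th = 0.75 nA
    print(f"I = {I:.2f} nA: Eq. (2) {rate_from_eq2(I):6.1f} Hz, "
          f"simulation {rate_from_simulation(I):6.1f} Hz")
```

The logarithmic divergence of the interspike interval as I approaches Ith, and the unbounded growth of f at large currents, are both visible in these numbers and are discussed below.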
Fig. 7. Lapicque's integrate-and-fire model. Voltage trajectory of Lapicque's model for constant current injection above the current threshold. This simple model accounts for many aspects of tonic firing: rheobase, voltage threshold, membrane repolarisation, smooth increase of the firing rate with the injected current, etc.

Similarly the phase resetting curve can be readily computed. The amenability of the integrate-and-fire model to both analytical and numerical studies (though the presence of discontinuities poses some numerical problems [61]) explains the popularity of this model in studies of neural network dynamics. However, Lapicque's approach provides no explanation for the firing of action potentials by neurones. The action potential time course (instantaneous in Lapicque's original model), the voltage threshold and the reset value are incorporated "by hand" in the model. Also none of the non-linear effects (membrane rectification, refractoriness) observed experimentally can be correctly reproduced. Moreover, only the oscillatory aspect of the discharge is accounted for. The behaviour of Lapicque's model, which is not a smooth dynamical system, near the current threshold departs significantly from what is expected for neurones firing at low frequency (see below): the firing frequency does not grow from 0 as the square root of the bifurcation parameter I - Ith, and the voltage lingers near Vth, which makes the model very sensitive to perturbations over a large part of the interspike interval. The high frequency behaviour is no more correct. Due to the resetting of the voltage to a fixed value and the inability of the model to account for spike amplitude reduction, the firing rate increases without bound with the injected current.

2.1.3. The Hodgkin-Huxley model revisited
In contrast to the integrate-and-fire model, Hodgkin and Huxley unravelled the non-linear properties of the neuronal membrane. Reading their 1952 paper [3] remains extremely enlightening. Hodgkin and Huxley used a very appropriate preparation, the giant axone of the squid, which allowed them to perform in vitro experiments, during which the ionic composition of the extracellular medium could be easily manipulated, and to insert a longitudinal electrode into the axoplasm, thus space clamping a whole section of the axone. This enabled them to study action potential
Neurones as physical objects: structure, dynamics and function
369
generation per se without having to deal (in a first stage) with the issue of propagation along the axone. In this way Hodgkin and Huxley reduced the study of axonal excitability to a purely temporal problem. We also note that Hodgkin and Huxley were fortunate in the choice of the squid giant axone, as neural excitability involves only two voltage-dependent currents in this preparation: a transient sodium current and a persistent potassium current [62]. It was later found, starting with the work of Connor and Stevens [63,64], that other axones display additional ionic currents, in particular transient A-type potassium currents. Very importantly, Hodgkin and Huxley did not restrict themselves to a heuristic explanation of action potential regeneration on the basis of the experimental data they obtained. They proposed a quantitative model that correctly predicted the time evolution of the membrane potential in current clamp experiments, both in the subthreshold regime where constant current injections evoke graded depolarizations and in the supra-threshold regime where periodic spiking occurs. This model consists of a system of four ordinary differential equations (see the Appendix for how they were derived by Hodgkin and Huxley):
$$C_{\rm m}\,\frac{{\rm d}V}{{\rm d}t} = G_{\rm leak}(V_{\rm leak} - V) + G_{\rm Na}\,m^3 h\,(V_{\rm Na} - V) + G_{\rm K}\,n^4(V_{\rm K} - V), \qquad (3)$$

$$\tau_m\,\frac{{\rm d}m}{{\rm d}t} = m_\infty(V) - m, \qquad (4)$$

$$\tau_h\,\frac{{\rm d}h}{{\rm d}t} = h_\infty(V) - h, \qquad (5)$$

$$\tau_n\,\frac{{\rm d}n}{{\rm d}t} = n_\infty(V) - n. \qquad (6)$$
Here Cm is the capacitance of the patch of membrane clamped; Gleak, GNa and GK are the maximal conductances of the passive leak current, transient sodium current and delayed rectifier potassium current in this membrane patch; and Vleak, VNa and VK are the reversal potentials of these currents. The voltage dependence of the potassium current is described by the activation variable n: the actual conductance of the potassium current is equal to GK n^4. The variable n is governed by the first-order kinetic equation (6), where both the time constant, τn, and the equilibrium value of n, n∞(V), are functions of V, which accounts for the voltage-dependence of the potassium current; n∞(V) is an increasing sigmoid function of V. The behaviour of the transient sodium current is more complex. It is governed not only by the activation variable, m, which plays a role similar to n, but also by an inactivation variable, h, with slower kinetics than m. h also follows first-order kinetics but its equilibrium value, h∞(V), is a decreasing function of V. Accordingly, h is responsible for the slow decay of the sodium current that follows its fast activation. Finally we emphasize that Hodgkin and Huxley relied on a macroscopic description of the axonal membrane that was not derived from some microscopic description of the membrane in terms of ion channels. On the contrary they proposed an appealing microscopic interpretation of their results in terms of putative voltage-
gated ion channels, which is presented in the Appendix. Although many researchers readily accepted the notion of specific ionic channels, experimental confirmation required more than a decade, until patch clamp techniques were developed. The existence of these discrete ionic channels in the membrane brings up an interesting issue, at least from the conceptual viewpoint. Patch clamp experiments, and in particular single channel recordings, show that individual channels indeed open and close randomly. How then can deterministic equations, such as the Hodgkin-Huxley system, correctly describe the actual behaviour of the neurone, inasmuch as spike firing is grounded on the non-linear amplification of voltage fluctuations? This issue is discussed in Section 2.3.

2.2. Neurones as non-linear systems

2.2.1. Dynamical systems theory

The Hodgkin-Huxley Eqs. (3)-(6) lie at the heart of our understanding of neural excitability. They did not only provide an elegant explanation of the action potential but set the ground for all subsequent work on the excitability properties of non-myelinated axones, myelinated axones, and neurones. Three major aspects of this model must be emphasized.

Firstly, it is highly non-linear because of the voltage-dependence of ionic currents. Hodgkin and Huxley did not only demonstrate that spike generation was due to a massive inward sodium current. Still more importantly, they showed that spike initiation resulted from a non-linear property of the membrane, namely the voltage-dependent activation of this sodium current. The steep voltage dependence and fast reaction time of this current, that is, its non-linear features, are actually as important as the fact that it is carried by sodium ions. This point is strikingly illustrated by the existence of calcium dendritic spikes, similar to axonal spikes but slower. Despite the different ions involved, they are initiated through the same non-linear mechanism as the sodium spike. More generally it is now clear that many different panoplies of voltage-dependent currents can lead to the same basic pattern of activity in a neurone (see Fig. 6): tonic firing, bursting, subthreshold oscillations, etc. Identifying the abstract non-linear mechanisms that give rise to these dynamical behaviours can help us to understand how similar behaviours may emerge from different combinations of ionic conductances, and to organize all this diversity. Hodgkin and Huxley were the first to identify neural excitability as a problem in non-linear dynamics.

Secondly, Hodgkin and Huxley showed that the existence of well separated time scales was crucial for spike generation. The action potential exists only because the fast activation of the sodium current occurs before slower recovery processes (sodium current inactivation, potassium current activation) may take place. This point was made perfectly clear by subsequent theoretical studies of spiking (see Section 2.2.3). Theoretical studies of neuronal excitability put a great importance on the identification of the time scales involved and capitalize on their separation into several groups. For instance, separation of time scales allows theoreticians to use singular perturbation methods to investigate bursting mechanisms (see Section 2.2.4). Hodgkin and Huxley were the first to show that neuronal excitability was a multiple time scales problem.
Thirdly, Hodgkin and Huxley analysed a space clamped situation, and used a simple macroscopic description of the membrane that involves few variables (membrane potential, gating variables, etc., see Section 2.3.1). Most later theoretical works have also relied on space clamped models, that lumped together the axonal spike initiation zone and the soma in a single compartment, and accounted for current leak to dendrites by an increase in the passive membrane conductance. Spatio-temporal aspects were thus dismissed and spiking was more easily investigated as a purely time-dependent phenomenon. Whether firing may or may not be fully understood within the simplified framework of space clamped neurones will be one of the issues discussed in the next subsection (see Section 2.3.5). However it is clear that much of our present understanding of neuronal excitability was gained thanks to this simplification. It enabled theoreticians to analyse excitability phenomena in terms of abstract concepts (control parameters, bifurcations, codimension, genericity, limit cycles, etc.) previously defined in dynamical systems theory (see next section), and to apply in their study of excitability the analytical and numerical techniques developed for systems of non-linear differential equations [65]. In this latter respect it must also be emphasized that Hodgkin and Huxley laid with their model the appropriate ground for the quantitative modelling of neurones (not only of axones), on which compartmental modelling later developed. To summarize, Hodgkin and Huxley were the first to show that the theory of dynamical systems with a small number of degrees of freedom constituted the natural framework for studying neuronal excitability.
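As a concrete illustration of Eqs. (3)-(6) treated as a dynamical system with few degrees of freedom, the sketch below integrates a space clamped Hodgkin-Huxley membrane patch under constant current clamp, using the standard squid-axon rate functions and maximal conductances (voltages measured in mV from rest, depolarization positive). The forward-Euler scheme and the chosen current step are our own illustrative choices; a sketch of this kind is of course not a full-featured simulator.

```python
import numpy as np

# Hodgkin-Huxley membrane patch, Eqs. (3)-(6), with the standard squid-axon parameters.
C_m = 1.0                               # uF/cm^2
G_Na, G_K, G_leak = 120.0, 36.0, 0.3    # mS/cm^2
V_Na, V_K, V_leak = 115.0, -12.0, 10.6  # mV, relative to rest

def a_m(V): return 0.1 * (25.0 - V) / (np.exp((25.0 - V) / 10.0) - 1.0)
def b_m(V): return 4.0 * np.exp(-V / 18.0)
def a_h(V): return 0.07 * np.exp(-V / 20.0)
def b_h(V): return 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)
def a_n(V): return 0.01 * (10.0 - V) / (np.exp((10.0 - V) / 10.0) - 1.0)
def b_n(V): return 0.125 * np.exp(-V / 80.0)

def simulate(I, T=100.0, dt=0.01):
    """Forward-Euler integration under constant current clamp; returns V(t)."""
    V = 0.0
    m, h, n = 0.053, 0.596, 0.318       # resting values of the gating variables
    trace = []
    for _ in range(int(T / dt)):
        I_ion = (G_leak * (V_leak - V) + G_Na * m**3 * h * (V_Na - V)
                 + G_K * n**4 * (V_K - V))
        V += dt * (I_ion + I) / C_m
        # Eqs. (4)-(6): first-order relaxation of each gating variable, written
        # here in the equivalent alpha/beta rate form.
        m += dt * (a_m(V) * (1.0 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1.0 - h) - b_h(V) * h)
        n += dt * (a_n(V) * (1.0 - n) - b_n(V) * n)
        trace.append(V)
    return np.array(trace)

V = simulate(I=10.0)                    # uA/cm^2, above the firing threshold
spikes = int(np.sum((V[1:] > 50.0) & (V[:-1] <= 50.0)))
print(f"peak depolarization {V.max():.1f} mV above rest, {spikes} spikes in 100 ms")
```

Varying the injected current in such a sketch already exhibits the behaviour discussed in the next subsection: below a critical current the voltage relaxes to a fixed point, above it the trajectory converges to a periodic spiking solution.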
2.2.2. Bifurcation theory applied to neurones

The first concept defined for dissipative dynamical systems and obviously relevant for neurones is that of an attractor. For constant current injection below the current threshold where tonic firing starts to take place, the solutions of the Hodgkin-Huxley model all go to a fixed point, which is accordingly called a global attractor of the dynamics. The corresponding value of the voltage Vss(I) is then uniquely given by the I-V curve, which is monotonous. The gating variables take their steady-state values: m = m∞(Vss), etc. In contrast, at higher injected current, when the fixed point has become unstable, solutions converge to a stable periodic trajectory, accordingly called a limit cycle. Note that the last category of bounded attractors, the so-called strange attractors which present non-trivial topological properties, is barely encountered in neuronal models: as a rule Hodgkin-Huxley-like deterministic models display only strictly periodic spiking or bursting under current clamp (see Section 2.3.6 for a discussion of this issue).

The experimental notions of rheobase or current threshold correspond to the topological concept of bifurcation [65], which characterizes qualitative changes in the nature of trajectories. Outside bifurcation points in parameter space, differential systems are structurally stable: their solutions deform smoothly as parameters are slightly changed. This is the situation encountered both in the subthreshold current range (the value of the potential depends smoothly on the injected current as well as on conductances and kinetic parameters) and in the regime where tonic firing is the sole dynamical state. In contrast, trajectories of the differential system are
drastically altered when one crosses a bifurcation point: this happens, for instance, at the onset of tonic spiking in the Hodgkin-Huxley model. However bifurcation is not a mere pedantic rephrasing of the more concrete notion of current threshold. Bifurcations are rigorously defined and can be put to good use to classify the firing mechanisms of neurones. The idea that neurones should be regrouped in a few classes according to their firing pattern actually dates back to the 1930s, when Arvanitaki classified the firing patterns of crustacean axones into phasic, tonic and bursting [66].

Different bifurcations exist that involve fixed points. One of the most important is the Hopf bifurcation [67,68], where a stable fixed point changes stability when merging with a limit cycle. This happens when a pair of complex conjugate eigenvalues cross the imaginary axis while the other eigenvalues describing the linear stability of the fixed point all remain real negative or complex with a negative real part. This bifurcation exists in two different flavours, known as subcritical and supercritical Hopf bifurcations. When the bifurcation is subcritical the stable fixed point becomes unstable when merging with an unstable limit cycle that exists up to the bifurcation point. Above the bifurcation point, the solutions of the differential system converge to some attractor which is far from the fixed point and unrelated to it. This is what occurs in the Hodgkin-Huxley model [69,70] (see Fig. 8).

The Hodgkin-Huxley model [3], grounded on the experimental study of the squid's giant axone, cannot account for the ability of most invertebrate and central neurones to fire at very low rate (down to a few hertz). Indeed, even at low temperature (6.3°C) firing starts in the squid axone at about 60 Hz. This raises several issues: what is the biophysical substrate of the low frequency firing of neurones? Can this different behaviour also be accounted for by a simple Hodgkin-Huxley-like model? Is this difference a mere quantitative problem or does it stem from some underlying qualitative difference, either biophysical or mathematical? Long interspike intervals were observed in invertebrate preparations by Connor et al. [63,64], who could ascribe them to a fast transient potassium current (as compared to the slow delayed rectifier current present in these preparations). This A current - many different forms of which were later identified in invertebrate and vertebrate neurones - de-inactivates under hyperpolarization, activates in a lower voltage range than the delayed rectifier current, and inactivates much more slowly than it activates. These kinetic properties make A currents particularly well suited to increase the latency period before the firing of the first spike and to increase the duration of the slow voltage ramp between successive spikes [64].

Although they are not the only currents involved in low frequency firing, it is enlightening to follow Rush and Rinzel's analysis of A-currents [71] and to go deeper into the operating mechanisms of this class of potassium current. As a rule the activation and deactivation kinetics of A-currents are fast, and inactivation (which is typically one order of magnitude slower) constitutes the rate-limiting process. Still the inactivation properties greatly vary from one preparation to the other, and the voltage-independent inactivation time constant lies in the range 1-30 ms in central neurones at physiological temperature.
Fig. 8. Schematic bifurcation scheme of the Hodgkin-Huxley model. For the sake of simplicity a simpler two-dimensional situation is described (variables are the voltage, V, and some recovery variable, W) and secondary bifurcations [69] are not shown. Stable (full lines) and unstable (dashed lines) states are shown. The bifurcation parameter is the injected current, I. The onset of spiking at I1 corresponds to a saddle-node bifurcation of periodic orbits [65], which gives rise to a stable limit cycle (in red) and to its unstable counterpart (in blue). At I2 the membrane resting state FP becomes unstable through a subcritical Hopf bifurcation, when it merges with the unstable limit cycle. Between these two values bistability is observed: the membrane can stay in its resting state or display oscillations depending on the initial condition. Spiking ceases at I3, and the membrane again adopts a stable steady state. In contrast with the onset of spiking, vanishingly small oscillations are observed just before this transition due to a high level of sodium current inactivation. Consequently the stable limit cycle is very close to the fixed point near the bifurcation and the Hopf bifurcation is this time supercritical: the unstable fixed point becomes stable when merging with the stable limit cycle.
The hyperpolarizing effects of a potassium current that relaxes with a time constant in the tens of milliseconds range may be expected to last up to 100 ms, so that these currents might be able to regulate neuronal firing down to frequencies of 10 Hz. But what about firing at still lower frequencies, when the time constants of the potassium currents become much smaller than the interspike interval? To understand how fast currents (at this time scale) may still play a role in regulating the discharge one must distinguish between their steady-state properties
and their kinetic properties. Consider a current with fast kinetics (at the time scale of the interspike interval). This current will soon relax to its steady-state value and then follow adiabatically the evolution of the membrane voltage. It will be able to regulate the duration of the interspike interval only if several conditions are satisfied. It must still be operative at steady state (it is then called a window current, as it can be observed only in the limited voltage window where it is at the same time significantly activated and de-inactivated). Its steady-state voltage window must overlap with the voltage range swept during the interspike interval. Finally it must be strong enough to hinder membrane depolarization and maintain voltage long enough in its voltage window. On the opposite a current that slowly inactivates does not require a steady-state window current to be operative. It can take values high enough to oppose membrane depolarization just because it stays in a dynamical state where it is both activated (due to the actual voltage value) and deinactivated (due to the slow de-inactivation kinetics). The potassium A-current may act both ways. When the firing frequency is not too low the current does not fully relax during the interspike interval and the A-current may operate via a kinetic effect. This is no longer possible closer to the current threshold as the A-current has fully relaxed early in the interspike interval. Then it can operate only as a window current. We note also that if the maximal effect on the duration of the interspike interval is generally expected from a current acting all along the interspike interval, a current that acts only transiently may still significantly shorten or lengthen the interspike interval by a phase resetting effect (see for instance the chapter by D. Golomb, D. Hansel and G. Mato in the present book). The existence of a window current can be seen on the steady-state I - V curve of the neurone. The I - V curve of the usual Hodgkin-Huxley model is monotonous. On the opposite it is N-shaped in the model incorporating an A current proposed by Connor et al. [64]. This opens the possibility that the stable resting state disappears via a saddle-node bifurcation, when it coalesces with an unstable fixed point at the first knee of the I - V curve. Such a scenario could not occur in the usual Hodgkin-Huxley model. Still the shape of the I - V curve does not guarantee that spiking will start via a saddle-node bifurcation. The current threshold can be lower than the saddle node bifurcation point. This may happen when the homoclinization affects the unstable fixed point and is unrelated to the saddle-node bifurcation that occurs at a higher current level. In this case the bifurcation is called a saddle-loop. The f - I curve still starts at zero frequency at the bifurcation point but f then grows linearly with I near the bifurcation. In addition, a bistability region then exists between the saddle-loop and saddle-node bifurcation points. The stable fixed point may also bifurcate via a subcritical Hopf bifurcation before the saddle-node bifurcation is reached, in which case the saddle-node will involve two unstable fixed points and will not be associated to a homoclinization. Which scenario will occur depends on all the problem parameters, and in particular on the activation and inactivation time constants of the A current, although changing these kinetic parameters does not affect the steadystate I - V curve [71].
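To illustrate how a window current can reshape the steady-state I-V curve, the sketch below combines a passive leak with an A-type conductance whose steady-state activation and inactivation are described by Boltzmann functions. All numerical values (half-activation voltages, slopes, maximal conductances) are invented for the illustration and are not the parameters of the models discussed in the text.

```python
import numpy as np

# Boltzmann steady-state curves (all parameters below are illustrative assumptions).
def boltz(V, V_half, k):
    return 1.0 / (1.0 + np.exp(-(V - V_half) / k))

V = np.linspace(-90.0, -20.0, 701)      # subthreshold voltage range, mV

G_leak, V_leak = 0.1, -70.0             # leak: mS/cm^2, mV
G_A, V_K = 2.0, -90.0                   # A-type potassium current

# Steady-state activation and inactivation of the A current. Their product is
# non-zero only in a narrow voltage window: the "window current".
a_inf = boltz(V, -55.0, 6.0)            # activation (increasing with V)
b_inf = 1.0 - boltz(V, -60.0, 6.0)      # inactivation (decreasing with V)
window = a_inf**3 * b_inf

# Steady-state I-V curve: current that must be injected to hold the membrane at V.
I_ss_without_A = -G_leak * (V_leak - V)
I_ss_with_A = I_ss_without_A - G_A * window * (V_K - V)

def is_monotonous(I):
    return bool(np.all(np.diff(I) >= 0.0))

print("leak only       : monotonous =", is_monotonous(I_ss_without_A))
print("leak + A window : monotonous =", is_monotonous(I_ss_with_A))
print(f"peak window conductance fraction: {window.max():.3f}")
```

With these made-up numbers the outward window current steepens the I-V curve in the window range and then drops out at more depolarized potentials, so the curve becomes N-shaped, which is precisely the geometric ingredient that opens the possibility of a saddle-node bifurcation of the resting state.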
The inactivating A current affects the resting membrane potential but its effect progressively subsides as the injected current is increased. It is the subsequent disappearance with increasing injected current of the rectification caused by the A current which makes the I - V curve N-shaped. Non-monotonous I - V curves displaying anomalous rectification may also originate from the steady-state activation of an inward current (persistent, or transient with some window current effect). Correspondingly low frequency firing associated to a saddle-node bifurcation, can be obtained when persistent sodium current rather than potassium A current is added to the Hodgkin-Huxley model. It may even happen within the framework of the Hodgkin-Huxley model provided that the parameters are modified so that the I - V curve displays a strong anomalous rectification [71]. Conversely changing the kinetic parameters of the A-current can make the I - V curve monotonous, in which case the membrane steady state will become unstable through a Hopf bifurcation and firing will start at finite frequency [71]. All these examples illustrate in the simple case of tonic firing an important general fact: the firing pattern of a neurone cannot be readily associated to the presence of some given voltage-dependent current. Similar stationary firing features can be obtained for completely different repertoires of ionic currents (though their transient effects may still be different as exemplified by the brief discussion of phase resetting curves below) while merely changing the relative maximal conductances of two currents or shifting some kinetic parameters can strongly affect the firing pattern of a neurone. On the contrary the abstract concept of bifurcation enables one to regroup many different tonic neurones in a few well-defined classes, each corresponding to a given firing pattern near the current threshold. Moreover analysing the non-linear dynamics of neuronal models through phase plane techniques and bifurcation analysis allows us to progress in understanding the effects of the various ionic currents on the steady-state discharge of neurones [72].
2.2.3. Phase plane analysis

Bifurcation analysis enables us to understand through which mechanisms the quiescent state of the membrane becomes unstable and neurones start to fire. However it is, by definition, restricted to a neighbourhood of the current threshold in parameter space. Moreover the analysis is mostly local, and essentially limited to a neighbourhood of the resting state in phase space. The only global feature of the dynamics that is considered is the possible existence of heteroclinic or homoclinic orbits. Therefore many questions regarding neuronal firing cannot be answered in that framework. A good example is the issue of the (approximate) linearity of f-I curves (see Fig. 9).

Fig. 9. Gain function (f-I curve) of lumbar motoneurones. Intra-cellular recordings from anaesthetised cats. (A) Motoneurone without a secondary firing range. (B) Neurone with primary and secondary firing ranges. Note the higher gain in the secondary range. Reprinted from [5] (Figs. 2(C) and 1(C)) by permission of the American Physiological Society. The approximate linearity of the f-I curve in the primary firing range is due to the slow AHP conductance. Near the current threshold, this conductance plays no role and the f-I curve is not linear (A). Therefore bifurcation theory can shed no light on the issue of the (approximate) linearity of f-I curves.

Another approach to neuronal excitability, pioneered by FitzHugh [73] and Nagumo [74], gives us a global description of the phase space of neurones, even far from bifurcation points. It is grounded on a description of neurones as two-dimensional relaxation oscillator models [73-75]. Models of this type are smooth dynamical systems, at variance with Lapicque's model, and thus better behaved. They are amenable to phase plane analysis, in contrast with the more
complicated Hodgkin-Huxley model, so that the geometric aspects of neuronal excitability can be readily investigated. In view of their simplicity they are also widely used (as well as Lapicque's model) to investigate such problems as the synchronization of strongly coupled neurones [76-78]. The first model of that type was proposed by FitzHugh [73,79] and independently by Nagumo et al. [74]. It is a variant of the simple undamped oscillator with a non-linear friction term introduced by Van der Pol in the 1920s [65,80] to provide a phenomenological description of the heart beat:

$$\frac{{\rm d}^2x}{{\rm d}t^2} + c\,(x^2 - 1)\,\frac{{\rm d}x}{{\rm d}t} + x = 0. \qquad (7)$$

Van der Pol's equation can be recast into the form
dx/dt = c(x + y − x³/3),   (7)
dy/dt = −x/c,   (8)

where

y = (1/c) dx/dt + x³/3 − x.
The FitzHugh-Nagumo equations are barely more complicated:
dV/dt = c(V − V³/3 + W + I),   (9)
dW/dt = −(V − a + bW)/c,   (10)
and the original Van der Pol equations are even recovered when all three constants a, b and I are set to 0. In these equations V can be interpreted as the membrane voltage and W as some slower "recovery variable" responsible for membrane repolarization (the ratio of the time scales of the two equations being given by c²), while I represents the injected current. The phase plane of this system can be readily analysed (see Fig. 10). The V nullcline (the locus of points where dV/dt vanishes) is an N-shaped cubic while the W nullcline is just a straight line. Under appropriate conditions on the constants a, b and c, these nullclines intersect at a single fixed point, which can be thought of as the resting membrane state, so that the I-V curve is monotonous as in the Hodgkin-Huxley model despite the cubic shape of the first nullcline. When I is increased this fixed point becomes linearly unstable at some bifurcation value. The system then performs periodic oscillations, interpreted as tonic firing. These oscillations take a particularly simple form in the limit of infinite c: the system then performs a periodic oscillation on a singular closed trajectory consisting of two slow evolutions on the V nullcline interspersed by two instantaneous transitions at constant W. This asymptotic analysis must then be extended to finite values of c. This is a singular perturbation problem. However, because the differential system is two-dimensional one can use the Poincaré-Bendixson theorem [81] to show that for large but finite c the system displays a smooth stable limit cycle that tends to the singular closed trajectory defined above when c goes to infinity. Phase plane analysis explains why the repolarization of the membrane after the peak of the action potential proceeds in two steps. Just after the peak the membrane voltage decreases as the system operation point follows the depolarized branch of the V nullcline. The subsequent rapid evolution to the hyperpolarized branch of this nullcline is associated with a much steeper voltage drop. Therefore the two slopes observed during the decay phase of the action potential are not the signature of two different active repolarization processes being successively at work, such as sodium current inactivation and potassium current activation, but the natural consequence of the N-shape of the V nullcline. We also note that a voltage threshold naturally appears in this analysis, as a consequence of the N-shape of the V nullcline. In the limit where c vanishes it is merely given by the first knee of this nullcline. This voltage threshold increases with the injected current. However, at variance with the injected current itself, it is a geometric feature of the trajectory rather than a natural control parameter of the differential system. But is there not some hidden but important difference between this simple two-dimensional model and the four-dimensional Hodgkin-Huxley model? The relaxation oscillation of the FitzHugh-Nagumo model presents clear differences with the tonic spiking of neurones.
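A minimal numerical sketch of Eqs. (9)-(10) makes this phase plane picture concrete. The parameter values below (a = 0.7, b = 0.8, c = 3, I = −0.4) are FitzHugh's classical choices together with the stimulus value quoted in Fig. 10; they are illustrative assumptions, not values prescribed by this chapter.

# Sketch: integrate the FitzHugh-Nagumo equations (9)-(10) and compute
# their nullclines.  All parameter values are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

a, b, c, I = 0.7, 0.8, 3.0, -0.4

def fhn(t, y):
    V, W = y
    dV = c * (V - V**3 / 3.0 + W + I)     # Eq. (9)
    dW = -(V - a + b * W) / c             # Eq. (10)
    return [dV, dW]

sol = solve_ivp(fhn, (0.0, 100.0), [-1.0, 1.0], max_step=0.01)

V = np.linspace(-2.5, 2.5, 400)
W_Vnull = V**3 / 3.0 - V - I              # dV/dt = 0: N-shaped cubic
W_Wnull = (a - V) / b                     # dW/dt = 0: straight line
i = np.argmin(np.abs(W_Vnull - W_Wnull))
print("nullcline intersection (fixed point) near V = %.2f" % V[i])
print("V range visited: %.2f to %.2f" % (sol.y[0].min(), sol.y[0].max()))

With these values the fixed point sits on the middle branch of the cubic nullcline and the trajectory should settle on a limit cycle, i.e. tonic firing.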
Fig. 10. Phase planes of two-dimensional models. (A) FitzHugh-Nagumo model. Parameters are a = 0.7, b = 0.8, c = 0, and I = −0.4. Nullclines are displayed as bold lines. In the singular limit, shown here, where c = 0, the operation point of the system moves along the V nullcline, the variable V just following adiabatically the slow "recovery variable" W. This is possible only as long as a knee of the curve is not reached. Then the operation point instantaneously jumps to the other branch of the V nullcline (dashed line), which, in turn, is followed till its knee is reached. (B) Reduction of the Hodgkin-Huxley model, using Abbott and Kepler's method [86]. Sodium activation is assumed to be instantaneous. Gating variables h and n are replaced by voltage-like variables Uh and Un, respectively, defined by h = h∞(Uh) and n = n∞(Un). These two variables are then linearly combined into a single slow variable U. The resulting dynamics is shown in the phase plane (V, U) for I = 0. Note that the nullcline dU/dt = 0 is just the straight line V = U. A more systematic treatment can be found in [84]. Reprinted from [86] (Fig. 3).
The spike duration, for instance, is comparable to the interspike interval in the FitzHugh-Nagumo model, whereas it is negligible with respect to the interspike interval in neurones firing at low rates. However this can be fixed by making the recovery time constant voltage-dependent. Oscillations appear at the current threshold through a supercritical Hopf bifurcation [82] in the original FitzHugh-Nagumo model [73]. This means that the amplitude of the oscillations steadily grows from 0 as the current is increased past the threshold, which does not match the subcritical behaviour of the Hodgkin-Huxley model. Still one can adopt other parameters for which spiking onset occurs through a subcritical Hopf bifurcation, with little impact on the global geometry of the phase plane. One may then wonder whether it is not possible to reduce the Hodgkin-Huxley model to a simpler two-dimensional FitzHugh-like model. One way to do this is to try to eliminate variables from the Hodgkin-Huxley model with as little impact as possible on the qualitative and hopefully quantitative behaviour of the model. All such reduction schemes rely on the same approach: eliminating the fast activation variable m by assuming that its variations are instantaneous or occur at the time scale of the membrane time constant, and merging the two recovery variables h and n into a single slow variable. The first such reduction was done by Kokoz and Krinskii [83], who noticed that the sum h + n remained nearly constant throughout the oscillation. Much later, Kepler et al. showed how to perform more accurate reductions of the Hodgkin-Huxley model and similar differential systems [84]. These two reduction schemes do lead to two-dimensional systems with the desired bifurcation scheme [85]. In addition they approximate the shape of the action potentials much better than the FitzHugh equations do [85,86] (see Fig. 10). This is due to the fact that the depolarised branch of the V nullcline barely changes with the injected current in these models [86]. Can we do better and propose a two-dimensional system really equivalent to the Hodgkin-Huxley model? In this model, the resting state of the membrane becomes linearly unstable when a pair of complex eigenvalues of the linearized vector field cross the imaginary axis. Accordingly the unstable manifold of the fixed point, in which the stable limit cycle is embedded, is two-dimensional. This manifold is globally invariant under the dynamics and is transversally stable with respect to the other degrees of freedom, which relax exponentially toward it. A perfect approximation of the Hodgkin-Huxley model would require writing down explicitly the equations governing the dynamics on the two-dimensional unstable manifold, near the fixed point (local invariant manifold) and farther in phase space (global invariant manifold, first determined in the vicinity of the Hopf bifurcation, and then extended for larger values of the injected current). Unfortunately this cannot be done analytically, although the Hodgkin-Huxley system is fundamentally two-dimensional.
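The observation underlying the first of these reductions is easy to check numerically. The sketch below integrates the standard space-clamped Hodgkin-Huxley equations (squid-axon parameters, with an illustrative injected current of 10 µA/cm²) and reports the range spanned by h + n during tonic firing; according to Kokoz and Krinskii this sum should stay roughly constant.

# Sketch: monitor h + n in the standard Hodgkin-Huxley model.
import numpy as np
from scipy.integrate import solve_ivp

Cm, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3      # uF/cm^2, mS/cm^2
ENa, EK, EL, I = 50.0, -77.0, -54.4, 10.0    # mV, uA/cm^2

def am(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def bm(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def ah(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def bh(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def an(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def bn(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)

def hh(t, y):
    V, m, h, n = y
    dV = (I - gNa * m**3 * h * (V - ENa) - gK * n**4 * (V - EK)
          - gL * (V - EL)) / Cm
    return [dV,
            am(V) * (1 - m) - bm(V) * m,
            ah(V) * (1 - h) - bh(V) * h,
            an(V) * (1 - n) - bn(V) * n]

sol = solve_ivp(hh, (0.0, 100.0), [-65.0, 0.05, 0.6, 0.32], max_step=0.01)
s = sol.y[2] + sol.y[3]                      # h + n along the trajectory
half = s[s.size // 2:]                       # discard the initial transient
print("h + n varies between %.2f and %.2f" % (half.min(), half.max()))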
Although it is not directly related to bifurcation theory and can be used far from bifurcation points, phase plane analysis can still shed light on the bifurcations of neurones. This is best illustrated by the clever way Hindmarsh and Rose [75] addressed the problem of low firing rates. These authors combined the analysis of experimental data on invertebrate neurones (the pond snail Lymnaea stagnalis and the crab Cancer magister) with a mathematical approach to the problem. The model they analysed is a variant of the FitzHugh-Nagumo model,

dV/dt = a(−f(V) + W + I),
dW/dt = b(g(V) − W).
The functions f(V) = cV³ + dV² + eV + h and g(V) = f(V) − q e^(rV) + s were determined by fitting voltage-clamp data (a simpler quadratic function for g(V) gives similar results [87]). Just above the current threshold, the two nullclines determine a narrow channel where the trajectory spends a very long time (see Fig. 11). This time increases without bound when I goes to Ith from above. Accordingly the firing rate vanishes like √(I − Ith) at the current threshold Ith. At this specific value the stable limit cycle becomes a homoclinic orbit, that is, a trajectory which tends to the fixed point when time goes to ±∞. This behaviour corresponds to a saddle-node bifurcation on an invariant cycle [65] (see Fig. 12). Here again this phase plane analysis of the problem is made possible by the two-dimensional nature of the dynamics of tonically spiking neurones.
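The square-root law can be illustrated on the normal form of a saddle-node bifurcation on an invariant circle - the canonical "theta neurone", whose period is exactly π/√µ - rather than on the Hindmarsh-Rose equations themselves. The sketch below is only meant to exhibit the scaling; it is not a reconstruction of the model discussed in the text.

# Sketch: firing period of the canonical theta model,
#   dtheta/dt = 1 - cos(theta) + (1 + cos(theta)) * mu,
# where mu plays the role of I - Ith.
import numpy as np

def period(mu, dt=1e-4):
    theta, t = -np.pi, 0.0
    while theta < np.pi:                 # one passage = one spike
        theta += dt * (1.0 - np.cos(theta) + (1.0 + np.cos(theta)) * mu)
        t += dt
    return t

for mu in (0.1, 0.01, 0.001):
    T = period(mu)
    print("mu = %6.3f   T = %7.2f   T * sqrt(mu) = %.3f" % (mu, T, T * np.sqrt(mu)))
# T * sqrt(mu) stays close to pi, so the firing rate 1/T vanishes like
# sqrt(I - Ith) at the threshold.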
Fig. 11. Phase plane of the Hindmarsh-Rose model. The two nullclines and the limit cycle are displayed for I = 0.033 nA. Other parameters are a = 5,400 mV s⁻¹, b = 30 s⁻¹, c = 0.00017, d = 0.001, e = 0.01, h = 0.1, q = 0.024, r = 0.088 and s = 0.046. The two variables are denoted by x and y in this original figure by Hindmarsh and Rose [75]; they correspond, respectively, to V and W in the text of the chapter. At variance with the FitzHugh-Nagumo model, where the second nullcline is a straight line, the two nullclines of the Hindmarsh-Rose model do not intersect transversely at the current threshold but display a tangential contact. Just above the threshold they remain close together, separated only by a narrow channel (from C to A). As a consequence, the vector field takes small values all over this region and the evolution of both variables proceeds extremely slowly. This entails a very long voltage ramp before a spike may be fired. Reprinted from [75] (Fig. 3).

Fig. 12. Schematic bifurcation scheme of a neurone undergoing a saddle-node bifurcation at the current threshold. The phase space displayed here is two-dimensional (the variables are the voltage, V, and some recovery variable, W), which is enough to capture the essence of the bifurcation scheme. The bifurcation parameter is the injected current I. Stable (full lines) and unstable (dashed lines) states are shown. The differential system exhibits a stable limit cycle (in red) and three fixed points: FP1 (light blue), FP2 (green) and FP3 (dark blue). At I1 the pair of unstable fixed points FP2 and FP3 appears through a first saddle-node bifurcation. FP3 is fully unstable whereas FP2 is unstable in only one direction. The one-dimensional unstable manifold of FP2 goes to the globally stable fixed point FP1 (resting state of the membrane), thus defining an invariant circle (heteroclinic connection). At I2, FP3 and FP2 coalesce through a second saddle-node bifurcation; the heteroclinic connection between the fixed points that existed between I1 and I2 then becomes the stable limit cycle on which the neurone fires tonically.

2.2.4. Multiparameter singular perturbation theory
Singular perturbation theory, such as it is used for studying the FitzHugh-Nagumo or Hindmarsh-Rose models, can to some extent be generalized to a theory of
multiparameter singularly perturbed dynamical systems [88], in particular to investigate bursting neurones. This was done first on Plant's model of bursting in Aplysia R-15 cell [89] and in the study of thalamo-cortical cell bursting by Hindmarsh and Rose [90-92]. Several important steps in this direction were made more recently by J. Rinzel and collaborators [41,93-96] who applied this approach to classify and understand from a mathematical viewpoint the bursting mechanisms of neurones. The basic idea is as follows. One identifies a group of fast evolving variables (membrane voltage, gating variables of fast currents, etc.), collectively denoted by X,
and a group of slowly evolving variables (gating variables of slow currents, intracellular calcium concentration, etc.), denoted by Y. The dynamics of the burster is then described, at the time scale of the slow evolution, by the two coupled subsystems

ε dX/dt = F(X, Y),
dY/dt = G(X, Y),

where the small parameter ε quantifies the separation between the fast and slow time scales. On the fast time scale, the slow variables Y barely evolve and can be considered as control parameters of the fast subsystem. Accordingly the dynamics on the fast manifold, that is, the manifold spanned by the fast variables, relaxes rapidly toward a quasi-stationary state, Xss(Y), which is not necessarily a fixed point and is implicitly given by the system of equations F(Xss, Y) = 0. In turn, the variables Y evolve on the slow manifold of the problem according to the differential system

dY/dt = G(⟨Xss(Y)⟩, Y),

where the brackets denote averaging [97] over the fast variables in the dynamical state Xss. The analysis then proceeds along the same general lines as for simple relaxation oscillations. The fast variables adiabatically follow the slow variables whenever this is possible. When this is no longer possible they jump to another branch of solutions of the equations F(Xss, Y) = 0. Several basic bursting mechanisms can be distinguished in this way, as illustrated in Fig. 13. We would like to conclude this paragraph with several remarks. Firstly, multiparameter singular perturbation theory is not restricted to bursters but applies as well to spikers, when the dynamics involves several distinct time constants (after-hyperpolarization, subthreshold oscillations, etc.). The effect of the slowly inactivating A-current, for instance, can be analysed by applying singular perturbation theory to a three-dimensional model involving three time scales [86]: a fast time scale corresponding to the membrane passive response, a slow time scale associated with membrane recovery processes and A-current activation, and the still slower scale of A-current inactivation. Secondly, a clear separation of time scales is not sufficient to apply singular perturbation theory. The exponential nature of the relaxation to a quasi-steady state on the fast time scale, for instance, is an important point in problems of singular perturbations. Fortunately, the kinetics of gating variables and intracellular ionic concentrations display such an exponential behaviour. Thirdly, singular perturbation theory is not the only possible way to study neuronal dynamics with multiple time scales. In the example quoted above singular perturbation theory clearly revealed the increase in spike latency due to the A-current. However studying the effect of this current on the bifurcation scheme of neurones sets the problem in a more general context and may be more enlightening.
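A concrete example of this fast/slow splitting is the three-variable Hindmarsh-Rose burster (mentioned again in Section 2.3.6). In the dimensionless form commonly quoted for it - an assumption here, since the chapter does not reproduce these equations - the variables (x, y) form the fast subsystem X and the slow variable z plays the role of Y:

# Sketch: the commonly used three-variable Hindmarsh-Rose burster as a
# fast subsystem (x, y) driven by a slow variable z.  Parameter values
# are the usual dimensionless ones and are illustrative assumptions.
from scipy.integrate import solve_ivp

a, b, c, d = 1.0, 3.0, 1.0, 5.0
r, s, xR, I = 0.006, 4.0, -1.6, 2.0          # r << 1: z evolves slowly

def hindmarsh_rose(t, u):
    x, y, z = u
    dx = y - a * x**3 + b * x**2 - z + I     # fast: membrane potential
    dy = c - d * x**2 - y                    # fast: recovery variable
    dz = r * (s * (x - xR) - z)              # slow: burst-controlling variable
    return [dx, dy, dz]

sol = solve_ivp(hindmarsh_rose, (0.0, 2000.0), [-1.6, -10.0, 2.0], max_step=0.05)
x, z = sol.y[0], sol.y[2]
print("x range: %.2f to %.2f   z range: %.2f to %.2f"
      % (x.min(), x.max(), z.min(), z.max()))
# Expected behaviour: during the active phase x spikes rapidly while z
# creeps upward; z then decays slowly during the silent phase, so the
# trajectory alternates between the two branches of F(Xss, Y) = 0.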
Fig. 13. Fast- and slow-phase plots of bursting dynamics. Membrane potential V is the only fast variable displayed. In each case, the bifurcation diagram is computed for the fast subsystem, with the slow variables treated as parameters, and the membrane potential behaviour is displayed as a function of these slow variables. The solid curve shows stable and the dashed curve unstable branches. Maximum and minimum voltages are displayed for states of repetitive firing. The heavy curves with arrows are stable periodic trajectories of the system in phase space corresponding to burst firing. (A) Square-wave bursting. It is based on the bistability of the fast subsystem (displaying a stable fixed point together with a stable limit cycle) and periodic switching between the two attractors of the fast subsystem, induced by the oscillatory dynamics of the slow variable Y. (B) Parabolic bursting. The two-dimensional slow subsystem (Y1, Y2) displays a stable limit cycle. Motion of the slow variables along this cycle induces smooth periodic switching between a stable fixed point (SS) and a repetitive firing state (OSC) of the fast subsystem. At variance with the former case (A), the fast subsystem exhibits no bistability. Reprinted from [95] (Fig. 3(A) and (C)).
These two approaches correspond to different limits and are complementary. Bifurcation theory is particularly useful to study the onset of firing and changes in the firing pattern. However it is restricted by definition to narrow parameter ranges (near the current threshold, for instance). On the other hand, singular perturbation theory can be used to analyse the structure of phase space far from bifurcation points but it requires a clear separation of time scales, and dealing with singular perturbations may be tricky. Lastly, we would like to point out a mathematical difficulty encountered in multiparameter problems, and related to the number of control parameters. Hodgkin-Huxley-like models depend on numerous parameters: capacitance, maximal conductances of the currents involved, parameters specifying the kinetics of the various gating variables, injected current. Setting all the membrane parameters and
varying only the injected current, as in current clamp experiments, amounts to moving along a line in this high-dimensional parameter space. The only bifurcations that may arise then are those that can be displayed by a one-parameter family of differential systems. They are known as codimension one bifurcations [65] and have been extensively studied, especially when they are local and involve only fixed points. These codimension one bifurcations then occur when one real eigenvalue or two complex conjugate eigenvalues of the linearized vector field around the fixed point cross the imaginary axis. On the contrary, when establishing the full phase diagram of a neurone, or when studying the behaviour of spikers or bursters using singular perturbation theory, one may encounter bifurcations of codimension two or higher, every time that several control parameters are involved. Such bifurcations occur, for instance, when different eigenvalues of the linearized vector field at a fixed point simultaneously cross the imaginary axis or when eigenvalues are degenerate. Unfortunately our mathematical understanding of such bifurcations is much less complete than for codimension one bifurcations.

2.3. Discussion

2.3.1. Is neuronal dynamics deterministic?
Patch clamp experiments, and in particular single channel recordings, show that individual channels open and close randomly, as predicted by Hodgkin and Huxley (see Appendix). How then can deterministic equations, such as the Hodgkin-Huxley system, correctly describe the actual behaviour of the neurone, inasmuch as the non-linear properties of the membrane enable neurones to fire a full-blown spike starting from a voltage fluctuation? Gating variables, such as m, h and n, only give the average number of open ionic channels in a membrane patch. Hodgkin and Huxley argued that in view of the large number of channels N, the actual number of open channels Nopen would seldom depart significantly from these average values. This assumption is a priori not valid in all spiking regimes. Indeed, if typical fluctuations δNopen in the number of open channels are always small with respect to the total number of available channels (of the order of 1/√N), relative fluctuations δNopen/Nopen in the number of open channels may still be quite important if the fraction Nopen/N of open channels is small. This may happen if ionic currents are weakly activated or strongly inactivated. Still it does not seem that neuronal discharge can be strongly affected by channel stochasticity. Indeed important fluctuations of gating variables are expected only below the rheobase (activation variables), where the resting potential is linearly stable and the active conductances are small with respect to the leak conductance, or at high depolarizations (inactivation variables). This is supported by recordings of pyramidal cells in slices of rat neocortex [30], which showed little variability of the discharge in current clamp experiments (CV ≈ 0.1), and a reliable entrainment of the neurone by fluctuating stimuli. In physiological conditions channel noise is likely to play a lesser role than synaptic fluctuations [98], and the high variability (CV ≈ 0.5-1) of cortical neurones in vivo results more probably from network effects [99,100] than from the intrinsic stochasticity of neurones.
Still the stochastic dynamics of neurones is an interesting issue from the conceptual viewpoint and - following the work of Strassberg and DeFelice [43] - several authors [101-105] have recently investigated stochastic versions of Hodgkin-Huxley-like models. These studies, essentially numerical, considered isopotential patches of membrane such as the soma, the initial segment of the axone, or a Ranvier node, and studied numerically the firing pattern in current clamp situations (constant or fluctuating current). Considering only an isopotential patch greatly simplifies the problem, as one needs only to compute how the number of channels in each possible kinetic state evolves in time through a simple Markov chain. It must also be noticed that the conductance fluctuations of an isopotential patch of membrane are not related to the density of channels but to their actual number. Consequently fluctuations can be less important at the initial segment than at the soma of a neurone, in spite of the high local concentration in voltage-dependent channels. The firing pattern of a stochastic Hodgkin-Huxley model in response to a constant current is readily interpreted, on the basis of the deterministic phase portrait, once it is recognized that the average fraction of open sodium and potassium channels is surprisingly small in the subthreshold voltage range. Two types of stochastic effects can be clearly distinguished: small Gaussian fluctuations and rare large deviations (see Fig. 14). The f-I curve relating the injected current I to the average firing frequency f of the stochastic model displays no singularity near the current threshold of the deterministic model: it becomes smooth on the whole injected current range and vanishes only in the limit of an infinite hyperpolarizing current. This is consistent with the fact that the topological notion of bifurcation no longer makes sense for a stochastic system [106]. However several interesting points related to channel stochasticity have so far received too little attention. A clear understanding of which gating variables play the most important role, of which kinetic features (activation voltage, time constant, etc.) are important, and of the interplay between channel stochasticity and the bifurcation scheme of the deterministic dynamics is still missing. Clearly this issue would benefit from a rigorous mathematical treatment. A step in that direction was taken by Chow and White, who showed that the subthreshold discharge could be addressed as a problem of random barrier escape in a bistable potential. However it is well known from physics, where significant progress on the related problem of random motion in a bistable potential was obtained only through the use of rather formidable functional integration techniques [107], that understanding the effect of multiplicative coloured noise on non-linear systems is a very difficult problem. A second issue, little addressed so far, is the impact of channel stochasticity on extended systems. G. Renversez [103] investigated the issue of the isopotentiality of the soma, while Manwani et al. [98] discussed the different sources of stochasticity on dendrites. The impact of channel stochasticity at Ranvier nodes on spike conduction along myelinated axones, and in particular random differential conduction, was also investigated [101]. More studies along these two lines are needed to justify our feeling that channel stochasticity has no significant impact on the firing of neurones in physiological conditions.
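The Markov-chain bookkeeping mentioned above can be caricatured with a single population of two-state channels, updated with one binomial draw per time step. The rates below are arbitrary (no voltage dependence); the sketch only shows that the relative conductance fluctuations scale roughly like 1/√Nopen, which is why weakly activated currents are the noisy ones.

# Sketch: stochastic gating of N identical two-state channels.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, dt, T = 0.02, 0.18, 0.1, 5000.0     # ms^-1, ms^-1, ms, ms
p_open = 1.0 - np.exp(-alpha * dt)               # closed -> open per step
p_close = 1.0 - np.exp(-beta * dt)               # open -> closed per step

def simulate(N):
    n_open = int(N * alpha / (alpha + beta))     # start at the mean
    trace = np.empty(int(T / dt))
    for k in range(trace.size):
        n_open += rng.binomial(N - n_open, p_open) - rng.binomial(n_open, p_close)
        trace[k] = n_open
    return trace.mean(), trace.std()

for N in (100, 1000, 10000):
    mean, std = simulate(N)
    print("N = %6d   <N_open> = %7.1f   std/mean = %.3f   1/sqrt(<N_open>) = %.3f"
          % (N, mean, std / mean, 1.0 / np.sqrt(mean)))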
Fig. 14. Spike firing in the stochastic Hodgkin and Huxley model. A membrane patch of 600 µm² with 10,000 potassium channels and 36,000 sodium channels was modelled. (A) Membrane voltage fluctuations without input current. (B) With 4 µA/cm² DC current, subthreshold membrane oscillations can be seen with spontaneous spike firing (with statistics close to Poisson) due to large deviations. (C) With I = 7 µA/cm² (which is the current threshold for spike firing in the corresponding deterministic Hodgkin-Huxley model) tonic spiking is observed. The bistability of the deterministic Hodgkin-Huxley model near the current threshold imparts some burstiness to the discharge. (D) More regular firing, similar to the deterministic model, is observed with I = 10 µA/cm². The high frequency discharge is disrupted at times due to large deviations in the number of open channels. Reprinted from [105] (Fig. 5) by permission of MIT Press.

2.3.2. Passive membrane properties and neuronal excitability
We have focused on the non-linear properties of the membrane as they are at the origin of the excitability properties of neurones and explain such phenomena as the firing of action potentials or the existence of complex firing patterns such as those exhibited by bursters. What then is the role of passive membrane properties in steady-state spiking? How does an increase of the membrane conductance, such as occurs when synaptic activity is intense, affect neuronal firing? The first consequence is a decreased input resistance. This, together with the fact that the relative importance of active currents is diminished, increases the current threshold of the neurone, shifting rightwards the steady-state f-I curve of tonic neurones. In addition, the
nature of the bifurcation itself may change. Indeed increasing the membrane conductance decreases the amount of membrane rectification, which may, for instance, change the bifurcation from saddle-node to Hopf. This provides a simple example of a problem with two control parameters (I and Gleak) where codimension two bifurcations may appear (see Section 2.2.4). May an increase in the membrane conductance also affect in a non-trivial way the f-I curve of a neurone far from the current threshold? This question must be raised as the synaptic shunt reduces the membrane time constant, which defines an important time scale of neuronal dynamics. There has been a longstanding debate over this issue, some authors claiming that the synaptic shunt reduces the gain of neurones, that is, the slope of its f-I curve at a given frequency. This supposed divisive effect of shunting has been used in various contexts, from the control of the sensitivity of sensory neurones in the electric fish [108] to orientation selectivity in the primary visual cortex [109], where shunting inhibition was recently shown to be important by direct measurements of the membrane conductance of neurones in vivo [110]. This is rather surprising, as it was proved as early as 1966, during in vivo experiments on anaesthetized cats, that a synaptic stimulation superimposed on a steady current injection merely shifted the f-I curves of lumbar motoneurones firing in their primary range [111-113]. Still one may argue, as Granit et al. themselves did, that the synapses were preferentially activated on dendrites and that little synaptic shunt was felt by the soma. This explanation is not satisfying for two reasons. Firstly it can be shown that, whatever the magnitude of the shunt they elicit, the firing frequency (but not the voltage trajectory) is the same as if the soma of motoneurones just received from the dendrites an effective current (see, for instance, [113]). Secondly it cannot apply to somatic synapses, which are expected to have the strongest effect on the spiking mechanism. It was shown by Capaday and Stein [114], in a work on gain control in motoneurone pools, that changing the input resistance of a single compartment model of the lumbar motoneurone did not affect the slope of the f-I curve (see also [115]). This result is not particular to spinal motoneurones [116]. Even the Lapicque model, which is purely linear in the subthreshold voltage range, displays a similar behaviour [116]. The membrane time constant is not a simple scaling parameter in the equation of its f-I curve (see Eq. 2), and at high injected current (when the injected current is much larger than the leak current G(Vth − Vreset) at the voltage threshold) the firing frequency becomes independent of the input conductance. This corresponds to the regime where the f-I curve is approximately linear. It seems therefore very unlikely that the synaptic shunt might ever have a divisive effect on steady-state f-I curves. Still a complete theoretical analysis of the shunt effect was done only for a few simple models and additional theoretical and experimental evidence of the subtractive effect of the synaptic shunt would be welcome.
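The subtractive nature of the shunt is easy to verify on the Lapicque model itself, using the textbook f-I relation of a leaky integrate-and-fire neurone with reset to rest and no refractory period (which may differ in detail from Eq. 2 of this chapter); all parameter values below are illustrative.

# Sketch: f-I curve of the leaky integrate-and-fire model for two leak
# conductances G.  f(I) = 1 / (tau * ln(I / (I - G*dV))), tau = C/G.
import math

C = 1.0                      # nF
dV = 15.0                    # Vth - Vrest, mV

def firing_rate(I, G):
    Ith = G * dV             # rheobase, nA (G in uS)
    if I <= Ith:
        return 0.0
    tau = C / G              # membrane time constant, ms
    return 1.0 / (tau * math.log(I / (I - Ith)))

for G in (0.05, 0.10):       # uS: doubling the leak mimics a synaptic shunt
    gain = firing_rate(10.0, G) - firing_rate(9.0, G)    # kHz per nA near 10 nA
    print("G = %.2f uS: rheobase = %.2f nA, f(10 nA) = %.3f kHz, gain ~ %.4f kHz/nA"
          % (G, G * dV, firing_rate(10.0, G), gain))
# Doubling G doubles the rheobase (a rightward, subtractive shift) but
# leaves the high-current gain essentially unchanged: the shunt is not divisive.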
2.3.3. Are conductance-based models reducible to integrate-and-fire models?
Many tonic central neurones can generate low frequency spike trains. For lumbar motoneurones of anaesthetized cats, for instance, the ratio of the interspike interval to the spike duration ranges between 10 (near the upper end of the firing range) and
100 (for spikes of 1 ms fired at less than 10 Hz). As a consequence many authors have modelled the dynamics of these neurones by integrate-and-fire models [117,118]: voltage evolution in the interspike interval is governed by a panoply of subthreshold currents and a stereotyped spike is assumed to be fired whenever the voltage reaches some threshold value. Can this approximation be rigorously justified? Is it possible to capitalize on the presence of these two time scales, spike duration and firing period, to reduce a single compartment conductance-based model of a tonic neurone to some simpler integrate-and-fire model, the approximation becoming perfect in the limit of vanishing frequency? The answer is in the negative, but reduction of Hodgkin-Huxley-like models to a non-linear integrate-and-fire model can be performed in another limit [86]. Consider a two-dimensional relaxation oscillator, just like the FitzHugh-Nagumo or Hindmarsh-Rose models. In the limit of singular relaxation oscillations (i.e., when the recovery variable evolves on a very slow time scale as compared to the membrane voltage), the voltage evolution during the interspike interval is given by the hyperpolarised branch of the V nullcline. This evolution takes place between two well-defined values of the voltage. The lower corresponds to the resetting potential, Vreset, and is implicitly specified by the value of the recovery variable at the second knee of the V nullcline, whereas the higher value, Vth, is the voltage at the first knee of the nullcline. This shows that in this limit the neurone can be approximated by a non-linear integrate-and-fire model
C dV/dt = F(V, H(V)) + I,

where F(V, W) denotes the sum of the leak current and of the various voltage-dependent ionic currents and W = H(V) is the equation of the first branch of the V nullcline. Note also that both the voltage threshold and the resetting voltage depend on the injected current. When performing such a reduction, one never assumes that the neurone is spiking at low frequency. Spikes in the resulting integrate-and-fire model have some finite duration on the slow time scale defined by the period of the neurone, and become instantaneous with respect to the interspike interval only when we consider, in a second step, the limit of vanishing firing frequency. In other terms one does not work in the vicinity of the bifurcation but uses instead the limit of a small membrane time constant (with respect to all the other time scales, including the interspike interval). This limit may possibly make sense for axones, especially myelinated axones where the efficient axial current flow to neighbouring regions drastically decreases the input resistance. But it is hard to imagine a situation where the intrinsic membrane time constant of a neuronal soma (generally estimated to be of the order of several tens of milliseconds) would be smaller than the characteristic repolarisation times, although an extremely intense synaptic bombardment might decrease the effective membrane time constant by an order of magnitude. In much the same way real neurones do not display genuine bifurcations because bifurcations are defined as structural instability points where the system becomes extremely sensitive to any perturbation (such as channel noise and synaptic fluctuations). Still in both cases considering limiting situations enables us to understand the foundations of
neural excitability. Exploiting a priori unrealistic limits because the system is then amenable to a theoretical study, and gaining in this way unforeseen knowledge on what happens in real situations, is part of the art of the physicist. Still one must not expect too much from investigating such limits. Let us go back to the example of the lumbar motoneurone. Its steady-state f-I curve displays several zones (see Fig. 9). Near the current threshold Ith, the average firing frequency increases steeply with the driving current (this zone is hard to see on experimental f-I curves due to the proximity to the threshold). At higher currents the f-I curve becomes approximately linear (primary range; the f-I curve may also exhibit a steeper secondary range). The low frequency firing near the threshold may be investigated semi-analytically or numerically on conductance-based models. The onset of firing will then be interpreted in terms of bifurcation and correct predictions on f-I curves might be obtained in this way. However, the slow calcium-dependent potassium current of motoneurones plays no role in this initial range of the f-I curve, as the firing frequency is too low for after-hyperpolarization to build up and cause frequency adaptation. On the contrary this current plays the key role in the primary firing range, where it controls the firing frequency and linearises the steady-state f-I curve. The cross-over between these two regimes occurs very close to the current threshold. Therefore bifurcation analysis gives us information only on a small current range of no physiological interest. In the primary range, on the opposite, firing can be largely investigated analytically and numerically using integrate-and-fire models with slow currents, despite the fact that such models are not rigorously derived from conductance-based models.
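How a slow AHP-like current linearises the discharge can be seen on a deliberately crude caricature: a quadratic integrate-and-fire unit (type I, square-root-like f-I curve) with a slow spike-triggered adaptation current standing in for the AHP conductance. This is not a motoneurone model; parameters are dimensionless and arbitrary.

# Sketch: steady-state rate with and without a slow adaptation current w.
def steady_rate(I, b, tau_w=50.0, dt=0.002, T=600.0):
    v, w, spikes = -1.0, 0.0, []
    for step in range(int(T / dt)):
        v += dt * (v * v + I - w)          # quadratic integrate-and-fire
        w += dt * (-w / tau_w)             # slow decay of the AHP-like current
        if v >= 10.0:                      # spike: reset and increment w
            v = -10.0
            w += b
            spikes.append(step * dt)
    late = [t for t in spikes if t > T / 2]      # discard the transient
    return len(late) / (T / 2)

for I in (1.0, 2.0, 3.0, 4.0):
    print("I = %.1f: f = %.3f (no AHP), f = %.3f (slow AHP)"
          % (I, steady_rate(I, b=0.0), steady_rate(I, b=0.4)))
# Without adaptation the rate grows roughly like sqrt(I); with the slow
# adaptation current the increments between successive values of I become
# nearly equal, i.e. the steady-state f-I curve is approximately linear.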
2.3.4. Can we classify neurones according to their firing patterns?
It is convenient and now traditional to divide neurones into several categories according to their firing pattern. Such classifying schemes originated with Arvanitaki [66] and are commonly used, as an alternative to functional classification, for samples of neurones recorded in vitro. For instance, cortical neurones were divided between fast spikers, regular spikers, and bursters on the basis of their firing pattern [119]. Neuronal populations are heterogeneous and even if the neurones of a given type share the same set of voltage-dependent currents, their membrane properties widely differ quantitatively. As a consequence the firing properties of tonically firing neurones within a population may display clear qualitative and quantitative differences. This issue has been thoroughly investigated by Gustafsson and Pinter [38] on spinal α motoneurones of anaesthetized cats. This study showed that the different subclasses of motoneurones innervating the different physiological types of motor units (slow, fast and resistant to fatigue, fast and fatigable) do not display on average the same membrane properties. Slow motoneurones tend to be more excitable, due to a higher specific passive resistance of their membrane and a smaller size. Moreover all morphological and electric properties investigated still vary by no less than 20-25% among motoneurones of a given physiological type. As a consequence, the f-I curves of these neurones may present different behaviours at high injected current: continuation of the primary firing range, secondary firing
range, saturation, etc. A similar study on tonically firing layer V pyramidal neurones revealed a similar situation. Two subclasses could be identified, corresponding to low and high input resistance cells, but the different transient behaviours of large low resistance neurones (frequency adaptation, low and high threshold bursts) arose from a continuum of membrane properties and firing patterns. In addition the steady-state firing pattern of a neurone may drastically change with the input it receives, as different voltage-dependent conductances are brought into play. Neurones commonly display bifurcations between spiking and bursting regimes. Thalamo-cortical cells provide a perfect example of that: tonic excitation leads to spiking activity, but tonic inhibition provokes the de-inactivation of the low threshold calcium current IT and the activation of a non-specific sag current [120]. The post-inhibitory rebound mechanisms thus uncovered lead to bursting activity of the neurone. These two firing patterns are encountered in different physiological states (wakefulness and sleep) and are thought to subserve different functions: active processing of visual information versus rhythmogenesis in the thalamo-cortical system. The membrane properties themselves may dynamically change, leading to different firing patterns, as a result of neuromodulation, which exerts its pervasive influence in the nervous systems of both invertebrates and vertebrates. Spinal motoneurones provide one good example of this phenomenon in vertebrates. All the early data on these neurones were obtained on anaesthetized cats. There is now clear evidence, from both slices of turtle spinal cord [121] and decerebrated cats, that the active membrane properties of motoneurones are modulated by monoamines (serotonin, norepinephrine). This neuromodulation strongly affects the firing pattern of motoneurones, suppressing the after-hyperpolarization and leading to calcium plateau potentials and firing at high rate. Some motoneurones may even become bistable in such conditions and keep on firing after the excitatory stimulation is removed. The physiological role of motoneurone plateau potentials and bistability is still debated [122,123]. The interesting idea that the number of channels expressed by a neurone may also undergo slow plastic changes, dependent on the activity of the neurone, was investigated by Abbott and coworkers [124] on single compartment models of neurones. These authors postulated that the maximal conductances of depolarising currents were down-regulated by activity whereas hyperpolarizing conductances were up-regulated. The slow dynamics thus defined in parameter space may then present a stable fixed point, which corresponds to a fixed level of activity. This plastic mechanism, which makes neurones "functionally" stable with respect to perturbations of their normal operating environment, can explain why stomatogastric cells in primary culture progressively recover their normal firing pattern, although dendritic and axonal processes have not necessarily grown back to their original extent [31]. Since then similar results have been obtained on neocortical pyramidal cells [125]. The detailed cellular mechanisms underlying this activity-dependent modulation of channel expression or phosphorylation are still little known, but it is likely mediated by variations of the intra-cellular calcium concentration.
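The regulation rule postulated in [124] can be caricatured by a first-order scheme in which an activity sensor drives two maximal conductances in opposite directions. The sketch below is a toy illustration of that idea only - the activity function, target and rates are arbitrary and it is not the model of ref. [124].

# Toy sketch: activity-dependent regulation of two maximal conductances.
target = 1.0                  # desired activity level
tau_g = 200.0                 # slow regulation time constant
dt, T = 0.1, 4000.0

def activity(g_dep, g_hyp, drive):
    # crude stand-in for a firing-rate / calcium sensor
    return max(0.0, 2.0 * g_dep - 1.5 * g_hyp + drive)

g_dep, g_hyp = 1.0, 1.0
for step in range(int(T / dt)):
    drive = 0.5 if step * dt < T / 2 else 1.5               # step change in input
    a = activity(g_dep, g_hyp, drive)
    g_dep = max(0.0, g_dep + dt * (target - a) / tau_g)     # down-regulated by activity
    g_hyp = max(0.0, g_hyp + dt * (a - target) / tau_g)     # up-regulated by activity
    if step % int(1000 / dt) == 0:
        print("t = %6.0f  drive = %.1f  activity = %.2f  g_dep = %.2f  g_hyp = %.2f"
              % (step * dt, drive, a, g_dep, g_hyp))
# After each change in the drive the conductances slowly drift until the
# activity returns to its target: the neurone is "functionally" stable even
# though its conductances have changed.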
2.3.5. Can the spatial structure of the neurone be neglected?
All the analysis of the spiking mechanism presented above was based on lumped models of the perisomatic spike-initiation region. Is the spatial structure of the neurone then mostly irrelevant? Are single compartment models sufficient to understand neuronal firing patterns? What single compartment model should be used to best approximate the complexity of a real neurone? The actual spiking mechanism is complex and involves the interaction between different subunits with contrasted geometric and electric properties. Araki and Terzuolo [126] showed on anaesthetized cats that the action potentials of lumbar motoneurones are initiated at the initial segment of the axone. These action potentials then propagate both orthodromically down the axone and antidromically, which leads to an IS spike of reduced amplitude in the soma. This reduction is due to a strong increase in diameter from the axone to the soma. The remainder of the IS component of the spike is then actively regenerated at the soma, giving rise to a full blown action potential, known as the somato-dendritic (SD) spike. Barrett and Crill [127] showed that somatic regeneration is due to a transient sodium conductance, which also boosts synaptic inputs. This conductance is not inactivated at the resting membrane potential, whereas the axonal sodium conductance displays about 30% inactivation in the same conditions. The same author, together with P. Schwindt, also showed that the somatic membrane is endowed with a full set of voltage-dependent currents: fast and slow persistent potassium currents [128,129], low and high threshold calcium currents, and a hyperpolarization activated current [128,130]. One of the main roles of these currents is to pattern the discharge. In particular, the initial adaptation of the discharge [117], the low frequency firing in the primary range of the f-I curve and the low variability of the discharge (of the order of 15-25% in humans performing weak sustained muscle contractions) appear to be controlled by the slow calcium-dependent potassium current responsible for the after-hyperpolarization of the membrane. This current is indirectly triggered by the high threshold calcium current, which is activated by each successive SD spike. Therefore this patterning mechanism requires that full blown spikes be regenerated at the soma. This separation between an "axonal" subunit responsible for spike generation and endowed with fast membrane properties (small membrane time constant, fast transient sodium current), and a "somatic" subunit responsible for discharge patterning (adaptation, gain control, bursting) and exhibiting both fast regenerative membrane properties and slower subthreshold regulatory currents, has now been shown on different classes of central neurones, in particular neocortical pyramidal cells. However the currents involved depend on the class of neurones. For instance, neither A-type potassium nor persistent sodium current seem to be present in spinal motoneurones, at variance with hippocampal and neocortical pyramids [7,8]. A similar separation between "axonal" and "somatic" subunits has also been demonstrated on invertebrate neurones [131]. Still, assigning well-defined roles to these two subunits is perhaps somewhat artificial. Firstly, there is no clearly defined spike initiation zone. A recent in vitro study shows that the axone hillock and initial segment of hippocampal pyramidal cells display the same density of sodium
channels as the soma [29], and thus constitute neither a privileged hot spot for spike initiation, nor a booster zone that would help the IS spike to invade the soma. Secondly, there is some evidence that spikes may be initiated at some Ranvier node or at the soma, as well as at the initial segment, and that the initiation zone may shift depending on the input received by the neurone [132]. In addition to the soma, dendrites also play a role in the regulation of the firing pattern, through both their passive and active membrane properties. It is well known that the current leak to the dendrites constitutes a major membrane recovery process. In lumbar motoneurones of anaesthetized cats, for instance, passive membrane properties (and essentially the load constituted by the dendrites) are sufficient to repolarize the somatic membrane when an action potential is triggered. When the delayed rectifier potassium current is blocked, spike duration is drastically increased (from about 1 ms to 5-10 ms) but membrane repolarization still occurs [133]. Active dendritic properties are known to play an important role in the complex spikes of Purkinje cells and olivary neurones (see Section 4). Moreover a neurone may respond differently to a transient synaptic input, firing a single spike or a burst of spikes, through complex mechanisms involving both the dendrites and the axo-somatic region. This is best illustrated by mitral and tufted cells [134]. Weak excitatory stimulation of the distal tuft of primary dendrites elicits a single sodium spike that propagates back into the primary dendrite. On the other hand, strong excitation of the distal tuft provokes the firing of a sodium spike that elicits a burst of activity in the axo-somatic region. Similarly the interaction between synaptic input on dendrites and the back-propagating axo-somatic spike may initiate a burst response at the soma of pyramidal cells [135]. All these issues regarding the interaction between axone, soma and dendrites cannot be addressed in the framework of a lumped single compartment model of the neurone. For instance, the re-excitation of the axone of brainstem motoneurones [136] requires modelling the axone hillock, since the firing of a double impulse may be due to the reflection of the IS spike in this region [137]. Shifts in the spike initiation zone cannot be investigated without a fine description of the axo-somatic region from the soma to the first Ranvier nodes. Accordingly all these issues were mostly investigated using multicompartmental models [138]. Still it is not necessary to model the whole morphological and electric complexity of the neurone, and simple two-compartment models, for instance, can be quite effective. The most convincing theoretical study showing that the spatial segregation of excitable channels between the soma/axone region and the dendritic region can play an important role in the firing dynamics used such a model [139]. This work was grounded on a detailed model of a hippocampal CA3 pyramidal cell incorporating 19 compartments [53]. The numerous parameters of this model (up to six active ionic conductances per compartment, each controlled by 10 parameters specifying the kinetics of the voltage-dependent channel gating variables) were inspired by available experimental data. A network incorporating such model neurones was then shown to replicate several important aspects of the repertoire of rhythmogenesis observed in the hippocampus [140]. Traub et al.
did recognize that the successful replication of these experimentally observed behaviours depended on
specifying different ion channel types and densities for the soma and for the dendrites. Pinsky and Rinzel [139] obtained essentially the same behavioural repertoire of the network with their two-compartment model of the pyramidal cell. One compartment represented the soma and proximal dendrites and was equipped with the fast ion channels responsible for spiking (inward sodium and delayed rectifier currents), whereas the other compartment represented the distal dendrites and contained slower calcium and calcium-modulated currents. Depending on the electric coupling between the two ("soma" and "dendritic") compartments, a large repertoire of firing patterns, from very low frequency bursting to complex periodic orbits, was obtained. This work was later extended by Mainen and Sejnowski [141] to a variety of neocortical neurones. These authors showed that the firing pattern of these heterogeneous models critically depends on the extension of the dendritic tree (i.e., on the ratio of dendritic and somatic membrane areas). In response to a steady current input, the firing repertoire ranges from low frequency bursting in the large layer V pyramidal neurone to high frequency repetitive firing in the small layer III aspiny stellate cell. Similarly a two-compartment model was used to investigate the hypothesis of a dendritic origin for the bistability of lumbar motoneurones [142]. The results of all these studies suggest that at least two compartments per neurone may be needed for replicating the firing behaviours found experimentally. A single lumped compartment, with all of the ion channels in parallel, might not produce the same behaviour. Still this issue deserves further clarification.
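The skeleton of such two-compartment models is simply a pair of membrane equations coupled by an axial conductance that enters them with opposite signs. The sketch below is not the Pinsky-Rinzel model: the "soma" is given FitzHugh-Nagumo-type excitability and the "dendrite" is purely passive, with arbitrary dimensionless parameters, just to exhibit the coupling structure.

# Sketch: minimal two-compartment skeleton (excitable soma, passive dendrite).
from scipy.integrate import solve_ivp

g_c, g_leak_d, I_soma = 0.1, 0.1, 0.5

def two_compartment(t, u):
    Vs, Ws, Vd = u
    dVs = Vs - Vs**3 / 3.0 - Ws + g_c * (Vd - Vs) + I_soma   # excitable soma
    dWs = 0.08 * (Vs + 0.7 - 0.8 * Ws)                        # somatic recovery
    dVd = -g_leak_d * Vd + g_c * (Vs - Vd)                    # passive dendritic load
    return [dVs, dWs, dVd]

sol = solve_ivp(two_compartment, (0.0, 400.0), [-1.0, -0.5, 0.0], max_step=0.05)
print("soma V range: %.2f to %.2f" % (sol.y[0].min(), sol.y[0].max()))
print("dendrite V range: %.2f to %.2f" % (sol.y[2].min(), sol.y[2].max()))
# The passive compartment acts as a current sink that low-pass filters the
# somatic oscillation; giving it active calcium currents, as in [139], is
# what generates the bursting repertoire discussed in the text.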
2.3.6. Why do neurones fire so regularly?
Neurones recorded in vitro or in vivo in
anaesthetized animals generally display regular spiking or bursting patterns in current clamp experiments, once transients have died out. These behaviours can be described by Hodgkin-Huxley-like non-linear differential systems. Still one knows that first-order non-linear differential systems of dimension three or higher may be intrinsically stochastic: although deterministic, they can display an aperiodic behaviour with a sensitive dependence on initial conditions (characterized by a positive Lyapunov exponent). Why then is neuronal dynamics, which typically involves from one to ten voltage-dependent currents, so simple? Is it a general feature of dynamical systems with a small number of degrees of freedom, which exhibit robust aperiodic behaviour less frequently than one might expect? Are there some biological constraints that underlie the periodic firing behaviour of neurones? The most famous example of a low-dimensional system displaying intrinsic stochasticity is probably the Lorenz model [65,143]

dx/dt = σ(y − x),
dy/dt = ρx − y − xz,
dz/dt = −βz + xy.
This system of three non-linear differential equations, obtained by truncating the Oberbeck-Boussinesq equations for fluid convection in a two-dimensional layer, depends on three control parameters: the Prandtl number σ, the Rayleigh number ρ and the aspect ratio β. It displays a transition to an aperiodic behaviour for ρ ≈ 24.74 (when σ = 10 and β = 8/3, as in Lorenz's original paper). On the basis of such examples many people are tempted to believe that non-linear systems - even low-dimensional ones - generically display an aperiodic behaviour over a large part of their parameter space. The Lorenz model constitutes a valid approximation of the Oberbeck-Boussinesq equations only near the transition from conduction to stationary convection (for ρ close to 1). For higher values of the Rayleigh number (when Lorenz's equations exhibit an aperiodic behaviour) finite-dimensional approximations of the Oberbeck-Boussinesq equations require many more modes. Therefore Lorenz's model, in spite of its conceptual importance, does not enlighten us much on the physical determinants of aperiodic behaviour. To shed some light on this question, one must go back to real convective systems. Convection experiments in small boxes consistently display a transition from conduction (stable fixed point) to regular stationary or oscillatory convection (stable limit cycle on which one- or two-frequency oscillations are performed) and then to turbulent convection (strange attractors) when the Rayleigh number is increased. Their behaviour is governed by partial differential equations but the actual dynamics takes place on a finite-dimensional manifold, the dimension of which increases with the Rayleigh number. Accordingly the periodic behaviour exhibited at high values of the Rayleigh number by Lorenz's model - which becomes integrable in the limit where ρ → ∞ - is not observed in convective systems. Spatio-temporal convective patterns also depend on the size and aspect ratio of the box. In large boxes, stationary convection patterns exhibit topological defects and slow aperiodic motions of these structures (phase turbulence) are easily excited. The actual dynamics of such extended systems no longer involves only a finite number of degrees of freedom. We can sketchily summarize all these experimental results as follows. When the stabilizing effects of diffusion and viscous boundary conditions dominate the dynamics, the system settles in a steady state of conduction. At the opposite extreme, when the dynamics is governed by the non-linearities, fully developed turbulent convection sets in. Between these two extreme regimes a sequence of bifurcations leads to more and more complex dynamical states as the control parameter is increased. At some point a transition from periodic to aperiodic behaviour occurs. The effective dimensionality of the dynamics increases with the actual amount of non-linearity, decreases when boundary conditions more efficiently constrain the dynamics, and can also be reduced by the existence of invariants. What specificity of neurones may preclude such a systematic transition to aperiodicity when the injected current is increased? An aperiodic behaviour is not expected at the onset of spiking, as the disappearance or destabilisation of a stable fixed point generically involves only one or two degrees of freedom. But why do we not observe frequently, for instance, transitions from quiescence to tonic spiking, regular bursting, and then aperiodic firing?
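The sensitive dependence invoked here is easy to exhibit numerically: two Lorenz trajectories started 10⁻⁸ apart diverge by many orders of magnitude in the aperiodic regime, but not below the transition. The sketch uses the standard parameter values σ = 10, β = 8/3 and compares ρ = 10 (stable convective fixed points) with ρ = 28 (aperiodic regime).

# Sketch: divergence of two nearby Lorenz trajectories.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, u, sigma, rho, beta):
    x, y, z = u
    return [sigma * (y - x), rho * x - y - x * z, -beta * z + x * y]

def final_separation(rho, t_end=30.0):
    p = (10.0, rho, 8.0 / 3.0)               # sigma, rho, beta
    u0 = np.array([1.0, 1.0, 20.0])
    a = solve_ivp(lorenz, (0.0, t_end), u0, args=p, max_step=0.01)
    b = solve_ivp(lorenz, (0.0, t_end), u0 + 1e-8, args=p, max_step=0.01)
    return float(np.linalg.norm(a.y[:, -1] - b.y[:, -1]))

for rho in (10.0, 28.0):
    print("rho = %4.1f: separation after t = 30 of trajectories started 1e-8 apart: %.2e"
          % (rho, final_separation(rho)))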
Although neurones are extended and display a complex morphology, their firing dynamics is low-dimensional. Spiking
can be well described by a finite set of differential equations, arising from the voltage-dependent kinetics of the ionic currents present in the spike initiation zone. The rest of the neurone, and particularly the dendrites, acts largely as a linear current sink that dampens voltage fluctuations in the spike initiation zone (see Section 2.3.5). Moreover the non-linearity associated with any given ionic current operates only in a restricted voltage range. Ionic currents that are inactivated or de-activated play no role in the dynamics, and the sigmoid shape of steady-state activation (and inactivation) curves entails that at large voltages the activation of any depolarization-activated ionic current saturates at its maximal level. The voltage-dependent current then acts on the dynamical behaviour of the neurone as the passive leak current does - that is, as a linear current term. Consequently, the non-linear character of the neuronal dynamics is expressed only in a limited voltage range. For more hyperpolarized or depolarized membrane potentials neurones behave as linear systems. Another feature of neuronal dynamics is worth being commented upon: the existence of homoclinic orbits. Spikes of large amplitude and vanishing frequency at the current threshold are observed when the neuronal dynamics exhibits a homoclinic orbit (see Section 2.2.2). Homoclinic and heteroclinic orbits are rare in physical systems, as they are not structurally stable features of the dynamics. They generally disappear as soon as the system is perturbed. When they are present they are often associated with aperiodic behaviour [144-146], for instance, with spiral type strange attractors [147]. How can homoclinic orbits be at the same time so common in neuronal dynamics and not associated with an aperiodic behaviour? The existence of a heteroclinic orbit in the saddle-node bifurcation scenario stems from the global stability of the resting state. If fixed points other than the resting state exist and the system displays no bistability, these fixed points must be unstable and perturbations away from them must lead to the stable resting state. An invariant circle then exists, which contains the stable resting state, and on which the saddle-node bifurcation must take place. This leads to a two-dimensional dynamics, once irrelevant stable degrees of freedom are dismissed (see Section 2.2.3), which precludes aperiodic dynamical states. We also note that, although small perturbations of the associated dynamical system generally preserve the bifurcation scenario of a neurone, homoclinic orbits exist only for bifurcation values of the control parameter (at which one eigenvalue of the linearized vector field around the fixed point vanishes). We never encounter a situation where a homoclinic orbit is biasymptotic to a hyperbolic fixed point. In the saddle-node scenario, the fixed point disappears as soon as homoclinicity is achieved. In the saddle-loop scenario the limit cycle and the unstable fixed point move apart as soon as the injected current is increased away from the bifurcation value. Therefore the applicability conditions of Shilnikov's theorem [144] are not fulfilled. The above discussion explains why aperiodic behaviour is not observed at the spiking transition, why it is unlikely to be widely achieved in neuronal dynamics, and why it is therefore reasonable to think primarily of neurones as periodic non-linear oscillators. However aperiodic behaviour is by no means impossible.
Transitions from tonic firing to aperiodic bursting have been, for instance, investigated in models of thalamo-cortical cells (see also [96,148] for discussions of this issue,
respectively, in the context of pancreatic β cells and neurones). In particular the three-dimensional burster model proposed by Hindmarsh and Rose [87] displays intermittent bursting at this transition [149], due to the existence of orbits homoclinic to an invariant but unstable state of aperiodic spiking [150]. Other types of bifurcations between periodic and aperiodic firing states certainly exist.
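As a concrete, purely illustrative sketch of this regime (not part of the original text), the short Python script below integrates the three-variable Hindmarsh-Rose model with forward Euler, using the commonly quoted parameter values and an injected current in the range where irregular bursting is usually reported; the crude interspike-interval statistics at the end give a quick signature of the aperiodicity. The spike-detection threshold of x = 1 and the integration step are arbitrary choices.

```python
import numpy as np

def hindmarsh_rose(I=3.25, t_max=2000.0, dt=0.01):
    """Forward-Euler integration of the Hindmarsh-Rose burster; returns x(t)."""
    a, b, c, d, s, x_r, r = 1.0, 3.0, 1.0, 5.0, 4.0, -1.6, 0.006
    x, y, z = -1.6, -10.0, 2.0          # arbitrary initial condition
    xs = np.empty(int(t_max / dt))
    for i in range(xs.size):
        dx = y - a * x**3 + b * x**2 - z + I
        dy = c - d * x**2 - y
        dz = r * (s * (x - x_r) - z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[i] = x
    return xs

x = hindmarsh_rose()
spikes = np.where((x[:-1] < 1.0) & (x[1:] >= 1.0))[0]   # upward crossings of x = 1
isi = np.diff(spikes) * 0.01
print(f"{spikes.size} spikes, mean ISI = {isi.mean():.1f}, CV = {isi.std() / isi.mean():.2f}")
```

A coefficient of variation of the interspike intervals well above one is the crude signature of the clustered, irregular firing produced in this parameter range.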
3. Conduction and distribution of action potentials

3.1. Action potential conduction

3.1.1. Linear and non-linear effects
Action potential conduction on a homogeneous squid axone is described by the full Hodgkin-Huxley system

\[ c_m \frac{\partial V}{\partial t} = \frac{d}{4\rho}\,\frac{\partial^2 V}{\partial x^2} + g_{leak}(V_{leak} - V) + g_{Na}\,m^3 h\,(V_{Na} - V) + g_K\,n^4\,(V_K - V), \tag{11} \]
\[ \tau_m \frac{dm}{dt} = m_\infty(V) - m, \tag{12} \]
\[ \tau_h \frac{dh}{dt} = h_\infty(V) - h, \tag{13} \]
\[ \tau_n \frac{dn}{dt} = n_\infty(V) - n, \tag{14} \]
where d is the diameter of the axone, ρ the axoplasmic resistivity, c_m the specific membrane capacitance, and g_leak, g_Na and g_K are the specific conductances of the passive and active currents involved. The additional term with respect to the space-clamped version of the Hodgkin-Huxley Eqs. (3)-(6) arises from the axial current that tends to equalize voltage gradients along the axone (see Section 4.2). Numerical integration of these reaction-diffusion equations using the shooting method enabled Hodgkin and Huxley [3] to account for both the shape and the conduction velocity of action potentials travelling down the squid giant axone. Action potentials are stable solutions of this reaction-diffusion system that preserve their shape and amplitude while propagating along the axone. This results from the competition between a regenerative non-linear process (sodium current activation) and linear dampening effects: relaxation to the resting membrane voltage (leak current) and passive diffusion (the axial current). Accordingly, perturbations away from the stationary profile are quickly damped as the action potential travels along the axone. However action potentials are just solitary waves, and not genuine solitons [151]. Because of the temporary refractoriness of the membrane in the wake of action potentials, the collision between an orthodromic spike and an antidromic spike travelling backwards does not result in a mere phase shift but provokes the mutual annihilation of both spikes. Non-linearity is not balanced by linear dispersion in the Hodgkin-Huxley system, which is not fully integrable (by the inverse scattering transform):
the Hodgkin-Huxley equations are dissipative, and perturbations from the spike solution are not radiated away but linearly damped. Action potentials accelerate near axonal terminals, due to the sealed end that reflects the axial current, but they travel at a constant speed along an ideal infinite axone. The fact that spikes propagate with a unique velocity along a given axone may be shown as follows. Let us focus on the leading edge of the spike. Its propagation is similar to the propagation of a front [152] that replaces the quiescent solution (V = V_rest, all gating variables at their corresponding steady-state values) by a depolarized solution (V = V_depol) where the voltage comes close to the Nernst potential of sodium ions [153,154]. For simplicity we shall not take into account the slower membrane repolarization that follows the action potential (i.e., we shall assume that h and n retain their resting values through the leading edge) and shall assume that the fast sodium current activation is instantaneous. These assumptions, which lead to an overestimation of the conduction velocity, were made in most studies of axonal conduction [154]. The Hodgkin-Huxley system (11)-(14) then reduces to the single partial differential equation

\[ c_m \frac{\partial V}{\partial t} = \frac{d}{4\rho}\,\frac{\partial^2 V}{\partial x^2} + g_{leak}(V_{leak} - V) + g_{Na}\,m_\infty^3(V)\,h_\infty(V_{rest})\,(V_{Na} - V) + g_K\,n_\infty^4(V_{rest})\,(V_K - V). \tag{15} \]
After renormalization of the leak parameters V_leak and g_leak to V*_leak and g*_leak, so as to incorporate the potassium current, it can be rewritten as

\[ c_m \frac{\partial V}{\partial t} = \frac{d}{4\rho}\,\frac{\partial^2 V}{\partial x^2} - g^*_{leak}\,\Phi_L(V) - g^*_{Na}\,\Phi_{NL}(V), \tag{16} \]
where g*_Na = g_Na h_∞(V_rest) is the effective sodium conductance at the resting potential (i.e., the total conductance of de-inactivated channels), Φ_L(V) = (V − V*_leak) and Φ_NL(V) = m_∞³(V)(V − V_Na). Introducing the time constant τ = c_m/g*_leak, the space constant λ = [d/(4ρ g*_leak)]^{1/2} that characterizes the spatial decay of the potential, and the dimensionless parameter β = g*_Na/g*_leak that quantifies the non-linearity, and going to the dimensionless electrotonic variables X = x/λ and T = t/τ, we have
\[ \frac{\partial V}{\partial T} = \frac{\partial^2 V}{\partial X^2} - \Phi_L(V) - \beta\,\Phi_{NL}(V). \tag{17} \]
A stable front propagating at constant velocity v must be stationary in its comoving frame. Therefore it is a solution of the steady-state equation

\[ \frac{d^2 V}{d\xi^2} = -\gamma\,\frac{dV}{d\xi} + \Phi_L(V) + \beta\,\Phi_{NL}(V), \tag{18} \]

where the dimensionless quantity γ is the velocity v expressed in units of λ/τ (γ = vτ/λ), and the new "space variable" ξ = X − γT is also dimensionless. In addition the front must satisfy the boundary conditions V(−∞) = V_depol and V(+∞) = V_rest. Introducing the potential
\[ W(V) = -\int_{V_{rest}}^{V} \bigl[\Phi_L(u) + \beta\,\Phi_{NL}(u)\bigr]\,du, \]

we see that Eq. (18) is identical to the equation of motion of a unit mass particle moving in the potential well W(V) and submitted to a fluid friction of coefficient γ (see Fig. 15). When β = 0 there is no non-linearity in the problem and W(V) is a parabola. When β is finite but small, W(V) still displays a single maximum near the resting potential. This is the weakly non-linear regime, where no front propagation is possible and the effect of non-linearities can be computed perturbatively. On the contrary, beyond a critical value β_c of β the function W(V) develops two local maxima. In this strongly non-linear regime, where non-linear effects are no longer perturbative, the propagation of a depolarising front is possible. The voltage profile V(ξ) of the front in the comoving frame corresponds to the trajectory starting at "time" ξ = −∞ from the higher maximum V = V_depol and reaching at "time" ξ = +∞ the lower maximum V = V_rest. Such a heteroclinic solution connecting the two unstable fixed points exists only for a specific value of the friction coefficient γ, which depends only on the shape of W, that is, on the non-linearity parameter β and the steady-state activation properties of the sodium current. In turn γ uniquely determines the conduction velocity v. This simple analysis shows that the conduction velocity is proportional to the quantity λ/τ, which could have been anticipated as spike propagation is driven by
Fig. 15. Potential well W(V) for the Hodgkin-Huxley model. Parameters are for T = 6.3°C. W(V) is displayed on the voltage range from V_K = −77 mV to V_Na = 50 mV. W presents two local maxima (circles), one at the resting membrane potential (V = −65 mV, see inset), the second near V_Na (V = 48 mV). The moving depolarisation front (see text) corresponds to the trajectory connecting these two maxima.
voltage diffusion. Therefore the spike conduction velocity can be written in the form v = K (d g_m/4ρ)^{1/2}/c_m, where g_m is the specific passive conductance of the membrane, c_m its specific capacitance, and K a dimensionless factor. This formula indicates that the conduction velocity should increase with the square root of the axone diameter. It is difficult to go much beyond the derivation of such scaling laws and to predict quantitatively the actual value of the conduction velocity on a given axone. Indeed, the above analysis must then be extended to include the finite activation time of the sodium current and the recovery processes. Moreover the dimensionless factor K depends in a complicated way on the active properties of the membrane. That is why most studies of axonal conduction have further assumed that steady-state activation was described by a Heaviside function (null below a voltage threshold V_th and equal to unity above it) or was a polynomial function of the membrane voltage, in order to make the problem analytically tractable and to establish the explicit dependence of the conduction velocity on the non-linear membrane properties (see [154] for a review). Still, the schematic analysis presented above makes clear that the stable propagation of the spike leading edge at a unique velocity results from the existence of two metastable states of the axonal membrane. Propagation in the limiting case of a non-linearly unstable medium would present very different features [155]: growth and spread of small perturbations leading to "pulled fronts" with a continuum of velocities. We also note that the linear stability of the front can easily be studied [156]; one is then led to a Schrödinger eigenvalue problem. Finally we note that this simplified model does not account for all propagation phenomena on the axone. Indeed the numerical integration of the original Hodgkin-Huxley equations reveals a second, subthreshold travelling wave of low velocity [42,157,160], which is unstable and has therefore probably no physiological relevance. Still it is not explained in the framework of our simplified model. In particular it has no relationship with the high-velocity trajectories which reach the local minimum of W(V).
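To make the scaling law concrete, the sketch below (ours, not part of the original text) integrates the cable Eqs. (11)-(14) with a simple explicit finite-difference scheme, using the standard squid-axon Hodgkin-Huxley kinetics and generic parameter values (ρ = 35.4 Ω·cm, c_m = 1 μF/cm², 6.3°C rate functions), and measures the conduction velocity between two probes for two diameters; the ratio v/√d should come out roughly constant, as predicted above. The stimulus amplitude, grid spacing and probe positions are arbitrary illustrative choices.

```python
import numpy as np

# Standard squid-axon Hodgkin-Huxley rate functions (V in mV, t in ms, 6.3 C).
def a_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def b_m(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def a_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def b_h(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def a_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def b_n(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)

def conduction_velocity(d_cm, length=8.0, dx=0.02, dt=4e-4, t_max=8.0):
    """Integrate Eqs. (11)-(14) (forward Euler in time, centred differences in
    space, sealed ends) and return the conduction velocity in m/s."""
    cm, rho = 1.0, 0.0354                    # uF/cm^2, kOhm*cm
    gNa, gK, gL = 120.0, 36.0, 0.3           # mS/cm^2
    ENa, EK, EL = 50.0, -77.0, -54.4         # mV
    D = d_cm / (4.0 * rho * cm)              # axial diffusion coefficient, cm^2/ms

    n_x = int(length / dx)
    V = np.full(n_x, -65.0)
    m = a_m(V) / (a_m(V) + b_m(V))
    h = a_h(V) / (a_h(V) + b_h(V))
    n = a_n(V) / (a_n(V) + b_n(V))
    probes = {"near": int(0.25 * n_x), "far": int(0.75 * n_x)}
    arrival = {}

    for step in range(int(t_max / dt)):
        t = step * dt
        lap = np.empty_like(V)
        lap[1:-1] = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dx**2
        lap[0] = 2.0 * (V[1] - V[0]) / dx**2     # sealed (zero-flux) ends
        lap[-1] = 2.0 * (V[-2] - V[-1]) / dx**2
        I_ion = gL * (EL - V) + gNa * m**3 * h * (ENa - V) + gK * n**4 * (EK - V)
        I_stim = np.zeros(n_x)
        if t < 1.0:
            I_stim[:10] = 100.0                  # uA/cm^2, brief stimulus at one end
        V = V + dt * (D * lap + (I_ion + I_stim) / cm)
        m = m + dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h = h + dt * (a_h(V) * (1 - h) - b_h(V) * h)
        n = n + dt * (a_n(V) * (1 - n) - b_n(V) * n)
        for name, idx in probes.items():
            if name not in arrival and V[idx] > 0.0:
                arrival[name] = t

    dist = (probes["far"] - probes["near"]) * dx               # cm
    return 10.0 * dist / (arrival["far"] - arrival["near"])    # cm/ms -> m/s

for d_um in (250.0, 500.0):
    v = conduction_velocity(d_um * 1e-4)
    print(f"d = {d_um:5.0f} um: v = {v:5.1f} m/s, v/sqrt(d) = {v / np.sqrt(d_um):.2f}")
```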
3.1.2. Localized and distributed non-linearities: dealing with space scales

Myelination of the axone, which is common in vertebrates but rarely occurs in invertebrates, results in an increase of the conduction velocity by one or two orders of magnitude. Indeed the insulating myelin sheath drastically decreases the membrane capacitance, increasing accordingly the passive diffusion constant of the axone, D = d/(4ρ c_m), and the conduction velocity [161,162]. In vertebrates spike regeneration generally occurs at hot spots known as Ranvier nodes, which display a high density of sodium channels [163-165]. They seem essentially deprived of potassium channels, except in the paranodal zones, and fast membrane repolarization is achieved by the efficient flow of axial current along the axone. Still, Ranvier nodes are not the only form of hot spot found. In shrimps of the genus Penaeus, for example, high conduction velocities (of the order of 200 m/s according to [166]) are ensured by another form of fenestration of the myelin sheath of the axones. In the axone of Mauthner cells regeneration seems to occur on the short unmyelinated "thorny" collaterals that contact spinal neurones [167]. Spike conduction in such heterogeneous media is said to be saltatory, as regeneration occurs at well-separated spots. Many space
scales are brought into play during saltatory conduction: the length of a node (about 3 μm) and the spacing of nodes, the local space constant (smaller at the leaky Ranvier nodes than in the insulated internodal zones), the average space constant, the length of the depolarized region, and the total length of the axone (up to several tens of centimetres). Unfortunately there is still a dearth of theoretical studies capitalizing on the ordering of these space scales and on the approximately periodic structure of the myelinated axone to build a rigorous theory of conduction on myelinated axones. To a large extent we are still relying on Rushton's scaling arguments [168] and on multicompartmental numerical simulations. We also note that there is some experimental evidence that non-myelinated axones are heterogeneous in invertebrate species. Clusters of sodium channels can be detected, for instance, on Aplysia axones [169]. Conduction velocity is probably enhanced by such heterogeneities. However the space scales involved (length and spacing of clusters) differ by less than one order of magnitude, at variance with myelinated axones. Can we still speak of saltatory conduction in such cases? This is an open question, as no precise definition of saltatory conduction was ever given, and no theoretical study has ever been devoted to spike conduction on cables with smoothly modulated membrane properties. Finally we note that ultrastructural studies of axones show varicosities of varying diameter and length. Moreover these varicosities are not located at regular intervals. Similarly, the internodal zones of a myelinated axone do not generally all have the same length. As a consequence, axones are not only heterogeneous: they constitute spatially disordered media. A clear spatial periodicity is the exception rather than the rule.

3.2. Spike conduction failures

3.2.1. Are axones reliable transmission lines?

Axones as well as dendrites display complicated geometries (see Fig. 1): branchings into several principal branches, profuse terminal fields in the projection zones, strings of en passant boutons, etc. Nonetheless, whereas many researchers believe in a complex, organized and physiologically relevant processing of the synaptic inputs impinging on dendrites, the idea - originating from Ramón y Cajal's work [1] - that the information elaborated by a neurone propagates along the different branches of its axonal arborization and reaches all its neuronal targets, contacted at terminals or en passant boutons, has long received wide acceptance. Still, more and more evidence shows that the action potential trains emitted by a neurone do not systematically elicit the same post-synaptic effects in all the targets of its axone [22,23]. This may arise from a combination of post-synaptic, synaptic and pre-synaptic effects (see Fig. 16). At the post-synaptic level, the response elicited depends drastically on the nature of the receptors involved and is also shaped by the passive and voltage-dependent properties of the neuronal membrane. At the synapses themselves, depression and potentiation phenomena have now been shown to be at work at various time scales. They may be due to transmitter depletion, to the intrinsic processes leading to transmitter release, or involve metabotropic autoreceptors located on synaptic boutons.
Fig. 16. The same spike train may elicit various effects depending on the post-synaptic neurone contacted. This stems from: (i) the combined operation of diverse linear or non-linear presynaptic mechanisms linked to the geometric or electric properties of the axone, intrinsic to it or resulting from the activation of axo-axonic synapses (in blue on the figure): presynaptic inhibition (reduction of spike amplitude), active membrane properties (channel noise, subthreshold conductances), and impedance mismatch at branching points and varicosities (delays, conduction failures, reflections); (ii) the intrinsic dynamics of synapses (in red): transmission failures, depression, facilitation; and (iii) differences in the membrane properties of the post-synaptic neurones (in green): the nature and density of ligand-gated, voltage-gated and leak channels.

Moreover it is now clear that many synapses are poorly reliable: action potentials do not consistently elicit post-synaptic potentials in the target neurones (no more than a few percent of the spikes generated by some hippocampal neurones have post-synaptic effects). But differential effects on post-synaptic neurones might also arise from presynaptic mechanisms occurring upstream from the synapses. Due to conduction failures at branch points, action potential trains might altogether fail to propagate into certain branches (conduction blocks). In an intermediate regime between reliable conduction and full conduction block, spike trains might undergo frequency changes, only a fraction of the spikes successfully invading certain branches. Even if conduction does not fail at branch points or at varicosities, individual spikes might still be reduced in amplitude. This raises the important issue, at least from the
conceptual viewpoint, of whether spike conduction is as reliable as it is generally believed [170]. The first demonstration that conduction blocks could occur in the peripheral nervous system was provided by Krnjevic and Miledi [171] for the intramuscular arborization of rat motor axones, where synaptic transmission itself is fully reliable. They demonstrated that repetitive stimulation of a motor axone in the phrenic nerve failed to excite all the muscle fibres constituting a motor unit of the diaphragm. Later on, it was suggested by Henneman that similar conduction blocks also took place in the central nervous system of mammals. The variability in the size of the EPSPs elicited in α motoneurones by the stimulation of a single Ia fibre (these myelinated fibres innervate neuromuscular spindles) was consistent with the existence of conduction failures [11], although alternative explanations could not be readily dismissed. More conclusive experimental evidence for failures was obtained recently on cultures of embryonic dorsal root ganglion cells [172].

3.2.2. Electrotonic structure, cable theory and impedance mismatch

Conduction failures have generally been considered to result from an impedance mismatch at points where the geometric properties of the axone abruptly change: branch points [173] where a parent branch divides into daughter branches, bottlenecks, varicosities (associated, for instance, with en passant boutons), etc. The underlying idea is that the axone may not be sufficiently depolarized beyond the mismatch point to sustain spike propagation through the activation of the regenerative sodium conductance. Let us first examine the electrotonic properties of an ideal uniform and semi-infinite axone. If voltage-dependent conductances are not taken into account in Eq. (11), the membrane voltage evolves according to the passive cable equation [174]
\[ c_m \frac{\partial V}{\partial t} - \frac{d}{4\rho}\,\frac{\partial^2 V}{\partial x^2} + g_m V = 0, \]
which provides a good approximation of the axone's behaviour in the subthreshold voltage range (see Section 2 for more on the cable equation, in the context of dendrites [175]). If we then assume that a constant current I is injected at the open end of this cable, the steady-state membrane voltage is given by the linear equation

\[ \frac{d}{4\rho g_m}\,\frac{\partial^2 V}{\partial x^2} - V = 0, \]
together with the boundary condition

\[ \frac{dV}{dx}(0) = -\frac{4\rho}{\pi d^2}\,I, \]

so that V is proportional to the injected current I.
In that simple steady-state situation the cable impedance Z is just the input resistance R_input, that is, the inverse of the input conductance

\[ G_{input} = g_m\,\pi d\,\lambda = \frac{\pi}{2}\,\sqrt{\frac{g_m}{\rho}}\; d^{3/2}. \tag{19} \]
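For orientation, the following small sketch (ours; the parameter values g_m = 10⁻⁴ S/cm² and ρ = 100 Ω·cm are generic textbook-style numbers, not taken from the chapter) evaluates the space constant and the input resistance corresponding to Eq. (19) for a few diameters, making the d^{3/2} dependence of the input conductance explicit.

```python
import math

def semi_infinite_cable(d_cm, g_m=1e-4, rho=100.0):
    """Space constant (cm) and input conductance (S) of a semi-infinite cable,
    Eq. (19), for specific membrane conductance g_m (S/cm^2) and axoplasmic
    resistivity rho (Ohm*cm)."""
    lam = math.sqrt(d_cm / (4.0 * rho * g_m))
    return lam, g_m * math.pi * d_cm * lam

for d_um in (1.0, 4.0, 16.0):
    lam, G = semi_infinite_cable(d_um * 1e-4)
    print(f"d = {d_um:4.0f} um: lambda = {lam * 1e4:5.0f} um, R_input = {1.0 / G / 1e6:6.0f} MOhm")
```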
Let us now investigate the case where two semi-infinite cylinders with different diameters d_1 and d_2 abut at x = 0. They are characterized by different space constants λ_1 and λ_2. Along the first cylinder (that is, for x < 0) the voltage reads

\[ V(x) = A\,\exp\!\left(-\frac{x}{\lambda_1}\right) + R\,A\,\exp\!\left(\frac{x}{\lambda_1}\right). \]

The second term can be interpreted as a reflected solution spreading backward from the discontinuity point x = 0. The factor R is the corresponding reflection coefficient (on amplitudes). On the second cylinder (that is, for x > 0) the voltage can be written

\[ V(x) = T\,A\,\exp\!\left(-\frac{x}{\lambda_2}\right), \]

where we have introduced the transmission coefficient T. Using the continuity of voltage and current at x = 0, one easily establishes that

\[ R = \frac{Z_2 - Z_1}{Z_1 + Z_2} = \frac{\eta^{3/2} - 1}{\eta^{3/2} + 1}, \qquad T = \frac{2 Z_2}{Z_1 + Z_2} = \frac{2\,\eta^{3/2}}{\eta^{3/2} + 1}, \qquad \eta = d_1/d_2, \]
where η is the ratio of the diameters. When the second cylinder is thinner than the first one (η > 1), transmission is always possible; R and T both increase with η and tend, respectively, to 1 and 2 when η goes to infinity. The situation is different when the second cylinder is thicker than the first. Then R decreases from R = 0 when the two cylinders are identical (η = 1) and the impedance match is perfect (no reflection, full transmission: T = 1), down to −1 when the second cylinder becomes extremely thick (η → 0) and no transmission is possible (complete reflection, T = 0). This suggests that a large increase in diameter will lead to a conduction failure, as the current spreading axially will then not be able to depolarize the axone beyond the discontinuity enough to activate the transient sodium current. Still, this prediction must be checked on the full non-linear Eqs. (11)-(14). Non-linear cable equations for a medium displaying geometric heterogeneities are not readily amenable to analytical studies. Accordingly, conclusions on conduction failures were derived from the numerical integration of the equations underlying spike propagation on piecewise homogeneous media. For the extended Hodgkin-Huxley model, single spikes fail to propagate when the diameter ratio between two abutting sections of the axone exceeds 5 [60,158]. Consequently, only very thick varicosities may hinder reliable spike conduction along an unbranched portion of the axone.
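The reflection and transmission coefficients derived above are easy to tabulate; the short sketch below (illustrative, not from the text) does so for a few diameter ratios. Note how strongly T drops once the second cylinder is several times thicker than the first, in qualitative agreement with the failure of single spikes beyond a diameter ratio of about 5 reported for the full Hodgkin-Huxley model.

```python
def step_coefficients(d1, d2):
    """Steady-state reflection and transmission coefficients at an abrupt
    diameter change between two semi-infinite cylinders (eta = d1/d2)."""
    q = (d1 / d2) ** 1.5
    return (q - 1.0) / (q + 1.0), 2.0 * q / (q + 1.0)

for ratio in (0.2, 0.5, 1.0, 2.0, 5.0):   # d2/d1: ratios > 1 mean a thicker second cylinder
    R, T = step_coefficients(1.0, ratio)
    print(f"d2/d1 = {ratio:3.1f}: R = {R:+.2f}, T = {T:.2f}")
```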
We note that transmitted spikes are nonetheless delayed by their passage through varicosities [158,159]. We can now understand what happens at branch points by using the concept of equivalent cylinder introduced by Rall. Suppose one semi-infinite cylinder of uniform diameter d_1 splits into two semi-infinite branches of diameters d_{2,a} and d_{2,b}, with corresponding space constants λ_{2,a} and λ_{2,b}. One can then replace these two branches by a single semi-infinite cylinder of diameter d_2 and space constant λ_2 (note that in the finite geometry such a reduction would also require that the two daughter branches have the same electrotonic length). The (local) condition for the equivalence of the two structures is that their impedances be the same, the two daughter branches being in parallel: 1/Z_2 = 1/Z_{2,a} + 1/Z_{2,b}. This requires that

\[ d_2^{3/2} = d_{2,a}^{3/2} + d_{2,b}^{3/2}. \tag{20} \]
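A small numerical illustration of this equivalence (ours, with hypothetical diameters): lump the two daughter branches into an equivalent cylinder via Eq. (20) and apply the two-cylinder step formulas of the previous paragraphs to the resulting diameter change.

```python
def branch_point_coefficients(d1, d2a, d2b):
    """Reflection/transmission at a branch point, obtained by lumping the two
    daughter branches into an equivalent cylinder (Eq. (20)) and treating the
    junction as an abrupt diameter change from d1 to that equivalent diameter."""
    d2 = (d2a ** 1.5 + d2b ** 1.5) ** (2.0 / 3.0)
    q = (d1 / d2) ** 1.5
    return (q - 1.0) / (q + 1.0), 2.0 * q / (q + 1.0)

print(branch_point_coefficients(1.0, 1.0, 1.0))    # daughters as thick as the parent: R < 0, T < 1
print(branch_point_coefficients(1.0, 0.63, 0.63))  # daughters obeying the 3/2 power rule: R ~ 0, T ~ 1
```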
Then the voltage on a daughter branch at a given electrotonic distance from the branch point will be the same as the voltage on the equivalent cylinder at the same electrotonic distance
and the above analysis of impedance mismatch at geometric discontinuities will still be applicable. This simplified analysis of conduction blocks must now be qualified. Firstly, we note that the conduction of spike trains is less reliable than single spike conduction. Impedance mismatches that cannot produce a full conduction block may lead to a rhythm transformation where, for instance, only every other spike in a train is transmitted down the axone (see, for instance, [160,176]). Secondly, we remark that conduction blocks are favoured by abrupt changes in the geometry of the axone. On the contrary, impedance adaptation may be achieved, in spite of large diameter increases, when the axone progressively flares or tapers. A good example is provided by the invasion of the soma (which can be as big as 50-70 μm in the case of α motoneurones) by the IS spike, generated on the much thinner initial segment of the axone. This back-propagation is made possible by the flare of the axone hillock and the electrotonic compactness of the soma. Such smooth changes in the geometric properties can still be addressed in the framework of cable theory (see Section 2). One last aspect that must be considered is the finite geometry: for example, short daughter branches lead to enhanced depolarizations (due to the sealed-end boundary condition), which makes conduction failures at branch points more difficult. We have focused on the geometric properties of the axone, but its electrotonic architecture is also determined by the specific conductance of the membrane. Moreover, varicosities associated with en passant boutons are likely to display membrane properties different from the rest of the axone. Therefore one may wonder whether conduction failures might arise from electric rather than geometric heterogeneities, and be controlled by the electric state of the membrane. It is obvious from Eq. (19) that heterogeneities in the specific membrane conductance lead to
impedance mismatches: a conductance increase, as well as an increase in diameter, may hinder voltage spread along the axone. Impedance is less sensitive to conductance changes (by a cubic factor), but increasing the conductance also acts by modifying the balance of linear dampening terms and non-linear regenerative terms in the full non-linear problem. It has recently been suggested [51] that the de-inactivation of an A-type potassium current present on the axonal membrane might provoke conduction failures through such a shunt effect, which would explain the differential conduction observed in pyramidal cells in rat hippocampal slice cultures following somatic hyperpolarization [51]. This hypothesis was investigated by the numerical simulation of a detailed model of the axonal arbour, where clusters of A-type channels near branching points indeed led to differential conduction [177]. However a high density of A-type channels at hot spots was required, to compensate for the voltage decrement between the soma and the hot spot, and the results obtained were very sensitive to the exact value of the channel density. A fine tuning of model parameters seems to be required to account for the experimental observations. All this casts some doubt on the generality of this filtering mechanism, which is unlikely to operate if axones have a small diameter, are poorly myelinated, or branch far from the soma. Nonetheless this study raises the interesting issue of whether spike trains propagating on axones are affected by voltage-dependent currents that are not directly involved in non-decremental conduction, very much like somatic subthreshold currents pattern the discharge of neurones. Few authors have yet tackled this problem [178], probably due to the dearth of experimental data on such axonal currents and to a relative disinterest in axonal information processing over the past 20 years.
3.3. Presynaptic inhibition

3.3.1. Inhibition mediated by axo-axonic synapses

In 1957 Frank and Fuortes [179] demonstrated on anaesthetized cats that the EPSPs induced in spinal motoneurones by the stimulation of proprioceptive Ia fibres could be reduced, without any evidence of direct inhibition of the motoneurones. They suggested that this reduction was due to presynaptic effects taking place on the Ia fibres themselves and accordingly coined the term presynaptic inhibition to describe this phenomenon. Presynaptic inhibition has now been extensively studied in the spinal cord of mammals and was shown to affect not only Ia fibres but more generally all myelinated sensory fibres that provide peripheral inputs to the spinal circuits involved in motor control (see [180,181] for a review): group Ib fibres innervating Golgi tendon organs [182,183], group II [184] muscle fibres and group A cutaneous fibres (innervating, respectively, the Golgi tendon organs and the secondary endings of spindles). Presynaptic inhibition has also been widely studied in invertebrate species [185-187], where it also affects sensory fibres. Eccles et al. showed [188], still on group I afferents, that presynaptic inhibition was accompanied by a depolarization of the fibres, which they called primary afferent depolarization (PAD). On the basis of electrophysiological and pharmacological experiments, they claimed that presynaptic inhibition of Ia fibres was due to the
activation of axo-axonic synapses [189] - which provoked both a local increase in the conductance of the axone and the observed PAD - and that it was mediated by GABA [189,190]. There is now accumulated ultrastructural evidence of such axo-axonic synapses on proprioceptive fibres [191-194] (see [195] for a review) and on group II fibres. The hypothesis that axo-axonic synapses are GABAergic is supported by recent immunolabelling studies [196,197] (see Fig. 17). The sensitivity of presynaptic inhibition to GABA_A antagonists indicates that the activation of axo-axonic synapses opens post-synaptic chloride channels [198]. Strangely, the activation of axo-axonic synapses depolarises the axone, which points to an inverted Nernst potential of chloride ions (see [186] for a study of this issue on the crayfish), perhaps due to the operation of a Na⁺/K⁺/Cl⁻ pump [199].
3.3.2. Selectivity of presynaptic inhibition

For a long time presynaptic inhibition was thought to regulate the gain of sensory pathways [180,181], which cannot be done easily through direct post-synaptic inhibition of the target neurones [114] (see Section 3.3), and to gate sensory information so that only the most relevant information in a given physiological context is transmitted to the spinal circuits. For instance, presynaptic inhibition was shown to gate the transmission of proprioceptive information generated by Golgi tendon organs during muscle contraction [183,200]. Consequently, only the onset of contraction and large increases of the force are signalled to motoneurones by inhibitory post-synaptic potentials. The adaptation to the specific requirements of the motor task and to the conditions in which it is executed is made possible by the wide convergence of information - of peripheral, spinal and supraspinal origin - on the interneurones that mediate presynaptic inhibition [180,181,201-204]. However it is only in the last decade that presynaptic inhibition came to be considered as a means to achieve a selective distribution of sensory information to the different targets of sensory fibres [205-210], a control that might operate down to the level of the individual fibre [208,211,212]. The first example of selective effects at the segmental level came from experiments in man: presynaptic inhibition was shown to decrease in the branches of Ia afferents that project to a contracting muscle, and to increase in branches projecting to non-contracting muscles [206,213].

3.3.3. Mechanisms: shunt and PAD

How can the activation of axo-axonic synapses result in a graded and flexible funnelling of information in the axonal arborization of sensory neurones? Eccles and coworkers suggested that the depolarization of terminals reduced the amplitude of action potentials and therefore transmitter release [189] (see reviews in [180,181]). They also pointed out that the opening of synaptic conductances might contribute to presynaptic inhibition by shunting other ionic currents. Experimental evidence that these mechanisms can indeed explain presynaptic inhibition of afferent fibres cannot be readily obtained in vertebrate preparations, and the study of this issue has greatly benefited from computer studies. Segev [176] showed that the synaptic shunt could reduce the amplitude of action potentials. More recently it was shown that PAD could further reduce the peak voltage of action potentials by enhancing the
Fig. 17. Tridimensional reconstruction of a Ib collateral. The collateral (red) bears 12 presumed GABA immunoreactive contacts (yellow spheres), which constitutes only one fifth of the total complement. They are located in terminal fields. D, R, and L, respectively, indicate dorsal, rostral and lateral directions. Length of each axis, 100 μm. Reprinted from [197] (Fig. 1).
inactivation of the sodium current [214,215]. The fact that membrane depolarization through current injection does reduce the size of action potentials travelling on sciatic group I fibres brings some indirect experimental support to this numerical result [197]. We note a fundamental difference between these two possible modes of operation. The synaptic shunt increases the linear passive membrane conductance and thus opposes the regenerative processes at work during the upstroke of the action potential. On the contrary, the PAD directly interferes with the non-linear kinetics of voltage-dependent currents and decreases (by enhancing inactivation) the effective sodium conductance without affecting the passive membrane conductance. This mechanism, which directly exploits the non-linearity of the membrane, can be very efficient, provided that steady-state sodium channel inactivation already occurs at the resting membrane potential. These two effects of presynaptic inhibition are easily seen on the simple front model presented above (Section 3.1.1), where a depolarization front rather than a spike travels at constant velocity along a uniform and infinite non-myelinated axone. Let us add a uniform synaptic term g_syn(E_syn − V) to the right-hand side of Eq. (15). It depolarises the membrane by an amount ΔV, which depends on both g_syn and E_syn, so that the resting membrane potential now becomes V_rest + ΔV, at which value the steady-state sodium current inactivation, h_∞, and the steady-state potassium current activation, n_∞, must now be evaluated. Renormalizing the leak parameters to incorporate both the potassium current term and the synaptic current term, we still obtain Eq. (17), but now the non-linearity parameter β reads

\[ \beta = \frac{g_{Na}\,h_\infty(V_{rest} + \Delta V)}{g_{leak} + g_{syn} + g_K\,n_\infty^4(V_{rest} + \Delta V)} \]

instead of

\[ \beta = \frac{g_{Na}\,h_\infty(V_{rest})}{g_{leak} + g_K\,n_\infty^4(V_{rest})} \]

previously. If synapses are purely shunting [176] (E_syn = 0, i.e., a reversal potential at the resting level) no depolarization is generated (ΔV = 0), and the non-linearity parameter β decreases with increasing synaptic conductance. The dependence of β on g_syn is sigmoid. A small synaptic shunt (with respect to the intrinsic passive properties) has no significant effect on spike conduction, whereas a large shunt strongly reduces β, which becomes smaller than β_c. Spike regeneration is then impossible. When the synapses are depolarizing - as is the case in actual presynaptic inhibition, where E_syn is thought to be of the order of 30 mV above rest - the depolarization ΔV they create enhances sodium inactivation and, to a lesser extent, potassium current activation, which further reduces β. For small synaptic conductances (g_syn much smaller than g_leak), synapses almost function as current injectors and it is easily shown that the effect of the depolarization on the non-linearity parameter is larger than the effect of the shunt by the factor (E_syn − V_rest)|h′_∞(V_rest)|/h_∞(V_rest). Still, small PADs obviously have a very limited impact on action potentials [214]: a drastic effect is expected only for depolarizations of the order of 10 mV [52,197].
Two experimental facts suggest that such large depolarizations do occur on proprioceptive fibres. Firstly, PAD can reach values high enough to trigger antidromic spikes [202,216]. Secondly, depolarizations of the order of 1 mV are frequently observed during intra-axonal recordings of group I fibres in anaesthetized cats. The amplitude of the PAD will be one order of magnitude larger near the terminals if they are located at no more than 3 space constants from the recording site. This condition is not very stringent, as the intra-spinal portions are almost fully myelinated [193].
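The combined influence of the shunt and of the PAD on the non-linearity parameter β can be explored numerically. The sketch below (ours) uses the standard squid-axon steady-state functions h_∞ and n_∞ and generic conductance values (g_Na = 120, g_K = 36, g_leak = 0.3 mS/cm², E_syn 30 mV above a rest of −65 mV, none of them taken from the chapter); the depolarization ΔV is obtained from a simple linearized current balance rather than from a self-consistent solution, so the numbers are only indicative.

```python
import math

def h_inf(V):
    a = 0.07 * math.exp(-(V + 65.0) / 20.0)
    b = 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
    return a / (a + b)

def n_inf(V):
    a = 0.01 * (V + 55.0) / (1.0 - math.exp(-(V + 55.0) / 10.0))
    b = 0.125 * math.exp(-(V + 65.0) / 80.0)
    return a / (a + b)

def beta(g_syn, E_syn=-35.0, V_rest=-65.0, g_Na=120.0, g_K=36.0, g_leak=0.3):
    """Non-linearity parameter in the presence of a tonic synaptic conductance
    g_syn (mS/cm^2). The depolarization dV is a crude linearized estimate."""
    dV = g_syn * (E_syn - V_rest) / (g_leak + g_syn + g_K * n_inf(V_rest) ** 4)
    V = V_rest + dV
    return g_Na * h_inf(V) / (g_leak + g_syn + g_K * n_inf(V) ** 4)

for g_syn in (0.0, 0.1, 0.3, 1.0, 3.0):
    print(f"g_syn = {g_syn:3.1f} mS/cm^2: beta = {beta(g_syn):6.1f}")
```

With these illustrative numbers β collapses by roughly two orders of magnitude once g_syn and the associated PAD become comparable to the resting conductance, in line with the qualitative discussion above.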
3.3.4. Functional compartmentalization of the axone

A single axo-axonic synapse should have a conductance of the order of the input conductance of the axone to produce a large depolarization, equal to half its reversal potential. Morphological studies have revealed a substantial synaptic equipment on group Ia fibres [192] and Ib fibres [215]. It is therefore likely that PADs of the order of 10 mV result from the coactivation of axo-axonic synapses. However the selectivity of presynaptic inhibition cannot be consistent with a wide spread of PAD over the axonal arborization. A powerful and selective presynaptic inhibition requires that the axonal arborization be divided into almost independent subdomains and that axo-axonic synapses located within any given subdomain cooperate efficiently to produce a large depolarization locally. This issue, similar to the functional compartmentalization of dendrites (see Section 4.2), was recently investigated analytically [52], using cable theory. Applying cable theory to sensory fibres required overcoming two difficulties. Firstly, these fibres exhibit highly non-linear properties at Ranvier nodes. It could be shown that only the average non-linearity, which is small because of the small size of Ranvier nodes, had to be taken into account in the time-independent regime where no antidromic spikes are generated by the depolarization. Secondly, the passive properties of the axones are highly heterogeneous. The main branches are myelinated and display an alternation of leaky Ranvier nodes and insulated internodal zones. The terminal fields are not myelinated, and the activation of the discrete set of synapses they bear provokes a local increase of their membrane conductance. Using homogenization methods, originally developed in physics [217], both the myelinated and the non-myelinated regions could be replaced by a homogeneous effective medium, the passive properties of which were obtained by spatially averaging the heterogeneous passive properties (see also [218]). This study concluded that the requirements for independent compartments are satisfied on sensory fibres. As synapses are grouped in terminal fields [197], the depolarization they create locally in a given terminal field has little effect on the other terminal fields, despite the myelination of the main branches, which favours voltage spread. Terminal fields are electrotonically compact, so that the coactivation of axo-axonic synapses may easily depolarize them. Within a terminal field the amplitude of the PAD depends on the number of active synapses but little on their location. This study suggests that, owing to a match between the electrotonic architecture, which displays two separate space scales, and the distribution of synapses, the intraspinal arborization of sensory fibres might be divided into independent functional domains, each contacting a specific target population, and each independently
controlled by presynaptic inhibition according to the requirements of the task performed [52]. Accordingly, the view of Chung et al. [219], who suggested that the axone transforms the temporal pattern of activity of the presynaptic neurone into a spatial pattern of excitation or inhibition of its post-synaptic targets, seems more and more relevant.
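As a toy illustration of the spatial-averaging step mentioned above (ours; the actual homogenization carried out in [52] is more elaborate, and the node and internode parameters used here are rough orders of magnitude rather than the values of that study), one can replace the alternation of leaky nodes and insulated internodes by a single length-weighted average membrane conductance and compute the resulting effective space constant.

```python
import math

def effective_space_constant(d_cm=3e-4, rho=100.0,
                             l_node=3e-4, g_node=0.1,      # node: 3 um long, leaky (S/cm^2)
                             l_inter=0.1, g_inter=1e-6):   # internode: 1 mm long, well insulated
    """Length-weighted average of the membrane conductance over one
    node/internode period, and the corresponding effective space constant (cm)."""
    g_eff = (l_node * g_node + l_inter * g_inter) / (l_node + l_inter)
    return math.sqrt(d_cm / (4.0 * rho * g_eff))

print(f"effective space constant ~ {effective_space_constant() * 10:.1f} mm")
```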
3.4. Discussion

3.4.1. How wide is an action potential?
The saltatory conduction of action potentials on myelinated axones does not entail that, at any given time, the membrane is at rest except in the immediate neighbourhood of some Ranvier node. Conduction velocities as high as 120 m/s are reached on certain fibres of the cat's sciatic nerve (as compared to 1 m/s on non-myelinated axones). Since the spike duration at a given place is of the order of a millisecond, this entails that, in the wake of the spike, the membrane potential remains perturbed away from its resting value over a region of about 10 cm. In comparison, the distance between successive Ranvier nodes, which cannot be much larger than the axone's space constant, is no more than 2 mm. If the depolarization were limited to a zone of such a length, action potentials would last only 10 μs, 100 times less than what is observed on myelinated axones. This demonstrates that the action potential cannot be considered as a local excitation of the myelinated axone. Action potentials are no more local excitations on homogeneous non-myelinated axones. An action potential travelling at 1 m/s, for instance, and lasting a few milliseconds, depolarises the axone over several millimetres, typically 100 times the space constant. This corresponds to the distance travelled by the action potential during the time it takes for the recovery processes (sodium current inactivation, delayed rectifier current activation) to play their role (a few milliseconds). If these voltage-dependent recovery processes were not taken into account, action potentials would become excitation fronts (see the previous section) that would leave the whole axone depolarized in their wake.
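The estimate is simple enough to be spelled out (using the round numbers quoted above): the stretch of axone perturbed at any instant is roughly the conduction velocity times the spike duration.

```python
# Back-of-the-envelope spatial extent of an action potential.
for label, v_m_per_s, duration_ms in (("myelinated, 120 m/s, 1 ms spike", 120.0, 1.0),
                                      ("non-myelinated, 1 m/s, 2 ms spike", 1.0, 2.0)):
    extent_mm = v_m_per_s * duration_ms          # (m/s) * (ms) = mm
    print(f"{label:35s}: ~{extent_mm:5.0f} mm of axone away from rest")
```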
3.4.2. Spike reflection on axones
Spikes travelling on an axone may be reflected by geometric heterogeneities where the diameter abruptly increases (from the value d_1 to the larger value d_2). Regeneration of a full-blown antidromic spike is hindered by the relative refractoriness of the axone in the wake of the orthodromic spike, so that a conduction block is not necessarily accompanied by a reflection. However in a range of diameters [220] (when 3d_1 ~
We also note that the depolarization of the axone due to the activation of axo-axonic synapses in the terminal fields of sensory fibres may be high enough to activate the transient sodium current. At its initiation site this spike has a low amplitude, due to the presynaptic inhibition caused by the axo-axonic synapses. It regenerates to its full amplitude while travelling antidromically along the axone. Such antidromic spikes are observed in in vitro brainstem-spinal cord preparations of the neonatal rat at room temperature [216], during fictive locomotion experiments on decerebrated cats [202], and in in vitro experiments on the crayfish chordotonal organ [186]. Their occurrence under fully physiological conditions remains unestablished and their functional role, if any, mysterious.

3.4.3. Do conduction failures play a physiological role?

Reliable spike propagation and conduction blocks are contradictory requirements. Conduction failures occur only when stringent conditions on the geometric and electric heterogeneities of the axone are fulfilled. Typically only large changes in the electrotonic properties, by one order of magnitude, can completely block conduction. Therefore in most situations heterogeneities will not induce failures. Moreover differential conduction of individual spikes or differential rhythm transformation of spike trains requires sufficient asymmetry between branches. Altogether the large-scale electrotonic structure of axones seems more adapted to the reliable transmission of action potentials than to their differential distribution in the main collaterals. Still, conduction is certainly strongly affected by the accumulation of geometric and electric heterogeneities that any spike travelling along an axone encounters. Conduction through tens - if not hundreds - of en passant boutons (see Fig. 1), where the impedance locally increases several-fold compared to the average impedance of the axone, slows down action potentials [159]. The very heterogeneous geometry of the terminal fields of axones, where diameter changes accumulate over short distances due to successive branchings and to en passant and terminal synaptic boutons, strongly reduces the safety factor for action potential propagation and is therefore favourable to conduction failures, as shown by numerical studies [176,224,225]. This heterogeneity also makes spike conduction more vulnerable to the action of axo-axonic synapses [176]. Altogether it now seems clear that axones filter the input signals they receive. Spike trains emitted at the axo-somatic junction already undergo velocity (and therefore frequency) changes [223] during their propagation along the axone, before encountering any branch point. Spike trains may also be differentially filtered in the different branches of the axonal arborization, at least in certain frequency ranges. Finally, conduction failures in terminal fields increase the variability of synaptic transmission at the axone terminals [224,225]. However, numerical studies [177,224,225] of spike conduction show that conduction failures are very sensitive to the details of the electrotonic properties of the axone. In view of the large variations in the electrotonic architecture of axones from neurone to neurone, the same processing of spike trains is unlikely to consistently occur within a given neuronal population. It is still harder to believe that conduction
failures endow each neurone with the capability to finely process and distribute spike trains in a physiologically relevant way, down to the small-scale structure of the axone. Perhaps the complexity of the electrotonic architecture of the axone of many neurones necessarily entails some rate of conduction failures in spike trains, which reduce the average efficacy of synaptic transmission but have no other physiological implications. The nervous system would bear with these failures but would not capitalize on them to differentially distribute signals to post-synaptic neurones in a consistent and physiologically meaningful manner. Still, the last word has probably not yet been said on this issue.
3.4.4. Is presynaptic inhibition specific to proprioceptive systems?

It is now well established that presynaptic inhibition plays an important role in the operation of spinal circuits involved in motor control. Its implication in the processing of nociceptive information, in the framework of which P. Wall proposed the long-debated gate control theory, has not been established. Evidence is still missing for presynaptic inhibition in supra-spinal centres, though axo-axonic synapses have been demonstrated in the thalamus. Therefore we cannot yet estimate to what extent the central nervous system relies on presynaptic inhibition to funnel information in axonal arbours.

3.4.5. Alternative mechanisms of presynaptic inhibition

Most axo-axonic synapses on group I afferent fibres were found on preterminal and terminal branches. There is also strong evidence for axo-axonic synapses clustering on terminal boutons [192]. In view of this location a more direct action on transmitter release mechanisms cannot be excluded. Moreover, axo-axonic synapses located outside terminal fields might induce differential conduction failures if they are strategically located near branching points. This idea was put forward by P. Wall [207] to explain the selective conduction block observed in rat myelinated afferents and its disappearance under the action of bicuculline. Concrete evidence for such an operation is still missing in the central nervous system, but it was shown that presynaptic inhibition of the sensory information generated by the chordotonal organ of the crayfish was due to the activation of a cluster of axo-axonic synapses located at the main branching point of the axones innervating this sensor [186]. In that specific case axo-axonic synapses mainly act through their shunting effect on the membrane.
4. Dendrites and synaptic integration

4.1. How do neurones utilize dendrites? A longstanding question

4.1.1. Synaptic integration

The most striking and characteristic feature of neurones is the branching structure of their dendritic trees (see Fig. 18). The membrane surface of dendrites may range between 20,000 and 750,000 μm², and the total length of a single dendritic tree may reach 10 mm (in lumbar α motoneurones, for instance).
Fig. 18. Different types of neurones characterised by the unique shape of their dendritic tree. (A) Layer V cortical pyramidal neurone from the cat visual cortex (blue, reconstructed by J. Andersen, R. Douglas and K. Martin). Smaller basal dendritic trees coexist with a long and thick apical dendrite terminated by a ramified distal tuft. (B) Purkinje cell from the guinea pig cerebellar cortex (green, reconstructed by M. Rapp). The dendritic arborization, studded with spines, is planar and profuse. (C) α motoneurone from the cat spinal cord (brown, reconstructed by R.E. Burke). Dendritic trees are profuse and well balanced. The length of dendrites may reach 1 mm.

Most importantly, dendrites are the major receptive region of neurones in the mammalian central nervous system. Several thousands (up to 100,000 in the case of the cerebellar Purkinje cell) of converging excitatory and inhibitory synapses contact a single dendritic tree. When a given synapse is activated, a local conductance change is produced directly (ionotropic synapses) or indirectly (metabotropic synapses) in the dendritic membrane, through which an ionic current transiently flows. A certain portion of the total synaptic current reaches the axo-somatic region, where an action potential (or a train of them) may be fired if the local depolarization there is sufficiently large. There is wide agreement on the idea, originating in the histological works of Ramón y Cajal, that the main role of dendrites is to collect afferent inputs. However, understanding how the spatio-temporal activation of the many somato-dendritic synapses determines the axonal firing pattern remains a major challenge. This collective effect of synapses on the neuronal discharge was termed "synaptic integration" by Sherrington [12], who was himself dissatisfied with the fuzziness of this notion (see review in [13]). Today, and in spite of recent experimental advances, the nature of the integrative processes that shape the input-output function of neurones and underlie their role in networks is still elusive. There are good reasons for that:
• We do not yet have a clear understanding of the response of dendrites to synaptic inputs, and of the morphological and electric determinants of that response. In particular, we do not know whether neurones in physiological conditions should be considered rather as "integrators" dealing equally with all their inputs, or as "coincidence detectors" highly responsive to specific events.
• We also have a limited notion of what synaptic inputs are received by a neurone in physiological conditions. Still, some estimates of the average synaptic activity in anaesthetized animals could recently be obtained [35,110].
• We know too little about the functional properties of most neurones to guess what processing of synaptic inputs takes place on dendritic trees. The only case where it was really possible to capitalize on such properties corresponds to neurones belonging to sensory pathways and acting as coincidence detectors highly sensitive to the timing of their inputs: auditory neurones in the bat or the owl [226], phase-detecting neurones of electric fishes [227], etc.
• Finally, it is not clear what impact the dendritic properties of neurones may have on the integrated behaviour of a network, which is the level at which functional issues should be addressed. On the experimental side, developmental studies are probably best suited to answer that question. It has been shown, for instance, that the qualitative pattern of locomotor activity in the spinal cord of Xenopus laevis does not change drastically during the vigorous growth of the dendritic arborizations of neurones at the larval stage. Moreover, the quantitative changes observed in the rhythmic bursts of activity seem to be due to the expression of voltage-dependent neuronal conductances rather than to morphological changes [228]. Few theoretical studies have tackled this issue. It has been established that shunting inhibition limits firing rates in a network [229-231]. More speculatively, it has also been suggested that spatial pattern formation in a network could arise through a Turing instability if the location of synapses on the dendrites reflected the distance between presynaptic and postsynaptic neurones [232], synapses from distant neurones impinging more distally on the dendritic tree.

In this context it makes sense to frame as few hypotheses as possible and to analyse first synaptic integration in a simple but still comprehensive framework. This is precisely what W. Rall did when applying passive cable theory to dendrites [175]. This approach takes into account only the passive properties of dendrites, which leads to analytically tractable linear parabolic partial differential equations [233], at least when conductance changes due to synaptic activation are neglected. Accordingly, synaptic integration is considered as the mere summation of synaptic potentials, mitigated by diffusion. This enabled Rall and his followers to understand the basic consequences of the spatial extent of dendrites and to show that the interplay between the morphology and the electric properties of neurones can be critical for determining the input-output functions of neurones. In addition, cable theory provided an appropriate framework to interpret recordings at the soma and to infer from them information on the dendritic structure of neurones.

4.1.2. Dendrites as an extended medium
Now that we are used to modelling the soma of a neurone as a single compartment, it is hard to believe that Lorente de Nó insisted on the fact that spatial summation could occur only between synapses impinging close to each other on the soma [234]. One of Rall's first contributions to neuroscience was to demonstrate that the soma, despite its large size, was isopotential [235]. This introduced the correct framework for investigating all electric events taking place on the neurone: electrotonic distances,
measured in units of the electric diffusion length, should be considered rather than physical distances. The three key papers of Wilfrid Rall [175,236,237] on the biophysical properties of the branching dendritic trees of lumbar α motoneurones were based on this same idea. Until these pioneering works dendrites had been largely ignored, functionally. They had been the focus of intense anatomical studies for many years since the classical studies of Ramón y Cajal and his contemporaries in the 1890s, and their extended and complex morphology was well appreciated (see [2] for a review). That dendrites constitute an extended electric medium had also been known for many years, simply from the fact that extra-cellular electric field potentials, such as the electro-encephalogram, can be recorded in the brain. But these considerations were overridden by the need for simplifying assumptions in facing the complexity of the nervous system. More than 40 years have now passed since Rall's seminal contributions and the number of experimental, theoretical and modelling works dedicated to dendrites has exploded. Still, most of the problems that have been addressed were already considered by Rall and fall into a few categories dictated by the extended nature of dendrites.

The first such category concerns the filtering of synaptic inputs by dendrites. Indeed the electrotonic length of dendrites is estimated to be of the order of one space constant, and this figure may be increased several times by ongoing synaptic activity. As a consequence, voltage transients at the soma are liable to be strongly attenuated compared to the input location. One may even wonder how distal synapses may have any effect at all at the soma, in spite of the electrotonic attenuation along the way. In its more primitive form passive cable theory focused largely on the spread of a single synaptic potential, which constitutes the most elementary form of electrotonic filtering [238]. The investigation of this simple case showed that the total transfer of charge to the soma could be quite efficient [238]. As the spike initiation zone is driven by the current coming from the dendrites, this result showed that distal dendrites were liable to have an impact on spike firing. However, the waveforms of distal signals are low-pass filtered by dendrites, so that sharp voltage transients at distal sites are poorly transmitted to the soma [239]. Therefore many authors have investigated how signal transfer to the soma might be improved to increase the total charge delivered to the soma and best preserve the original form of synaptic potentials. In this category fall all the works on the impact of dendritic taper and electric heterogeneities on signal transfer, the boosting of distal inputs by non-linear conductances, etc. The basic idea underlying all these works is that some gradient of linear or non-linear properties from proximal to distal locations compensates for the electrotonic attenuation between the synaptic location and the soma and makes all synapses equal. We may put in the same category the possibility of non-decremental signal transfer along dendrites via calcium or sodium spikes fired at distal or intermediate locations on dendrites.
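A quick feel for these attenuation figures can be obtained from the textbook steady-state formula for a passive cable sealed at its proximal end: a steady depolarization imposed at electrotonic distance X from that end is attenuated there by a factor 1/cosh(X). The sketch below (ours) evaluates it for a few distances; note that it ignores the conductance load of the soma and of the other dendrites, as well as the much stronger attenuation of fast transients, so real attenuations are considerably larger.

```python
import math

# Steady-state attenuation from a dendritic input site to a sealed proximal end:
# V(end) / V(site) = 1 / cosh(X), with X the electrotonic distance (length / lambda).
for X in (0.25, 0.5, 1.0, 2.0):
    print(f"input at X = {X:4.2f}: V(end)/V(site) = {1.0 / math.cosh(X):.2f}")
```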
All these ideas are implicitly grounded on the assumption that dendrites represent a necessary evil for neurones: they are extended so as to collect the many inputs that the neurone must process, but their geometric and electric properties are adjusted to minimise as far as possible their adverse impact on neuronal dynamics. A second category of problems concerns the local processing of synaptic inputs: how do excitatory and inhibitory inputs shunt each other and sum up locally, how is
their interaction affected by non-linear membrane properties, does the tree-like morphology of dendrites naturally lead to a "functional compartmentalization" of dendrites in relatively independent regions? Works along this line are largely based on the study of the electrotonic architecture of dendritic trees. A last category of problems concerns the constructive or destructive interaction between distant synaptic inputs. Do proximal inhibitory synapses screen more distal excitation? Do synapses add their effects maximally if they are activated in some specific order dependent on their location? This last possibility, demonstrated by Rall on simple models [240], underlies the idea that dendrites might perform such functions as motion detection [269]. This raises the more general issues of the activation patterns of synapses, and of the input-output function of neurones. How are synaptic inputs on the dendritic tree correlated with the ongoing physiological activity? Do neurones mainly respond to the average synaptic activity or to highly coincident inputs? Can they switch from one firing pattern to another depending on the spatio-temporal activation pattern of synapses? We also note that the extended nature of dendrites just means that their physical length, ℓ, is a priori comparable to the diffusion length λ (the dendritic space constant), so that they cannot be treated as an isopotential medium. Whether they should be considered as a genuine extended medium (ℓ much larger than λ) or as a medium with a confined geometry (ℓ of the order of λ) will be amply discussed in the following sections.
4.2. Passive cable theory
4.2.1. Fundamentals of cable theory
The passive cable equation [175] describes the evolution of the voltage V (interior voltage minus exterior voltage, defined relative to the resting potential) across the dendritic membrane when transient synaptic currents flow from the dendritic input site to other dendritic regions, including the cell body. The dendritic tree is assumed to be composed of a connected set of passive cylinders (see Fig. 19). Current flows
Fig. 19. Main insights from Rall's passive cable theory for dendrites. (A) Current injection from a microelectrode into the soma of a neurone. Rall showed that, because most of the conductance is at the dendrites, a large percentage of the injected current flows into the dendrites rather than via the soma membrane. This implies that the voltage response at the soma is faster than expected in an isopotential compartment [262]. (B) The time course of the excitatory synaptic potential (EPSP) at the soma can be used for estimating the synaptic input site in the dendritic tree. Inset at lower right shows that a certain class of dendrites is mathematically equivalent to a single cylinder, which can be represented by a chain of isopotential RC compartments (spheres numbered from 1 to 10). Inset at top shows that distal synaptic input (at compartment 8) appears broad and delayed at the soma whereas somatic input (compartment 1) is briefer and less delayed (for the sake of comparison, the amplitude of EPSPs was normalized). The change in rise-time versus half-width (width at half amplitude)
of the somatic EPSP for different input sites is depicted by the shape index loci curve. (C) Soma voltage depends on the spatio-temporal sequence of synaptic activation. With excitatory synapses on a dendritic cylinder (filled circles), the soma depolarisation is larger and more delayed when synaptic activation starts distally and progresses proximally (D → C → B → A). The y-axis could be interpreted as a firing probability. Thus, this neurone can detect the direction of motion of its synaptic inputs [240]. (D) Large and asymmetric attenuation is expected in the dendritic tree. DC current input is delivered to the distal dendritic arbour (I); the voltage profile from the input site is depicted (continuous line). Note the steep attenuation in the centripetal (dendrite-to-soma) direction (leaky boundary conditions) and the very shallow attenuation in the centrifugal direction. Nevertheless, the soma voltage is still more than half of what is obtained when the same DC current is applied directly to the soma (dotted line). This indicates that most input current delivered to the dendrites reaches the soma [331].
either longitudinally (x-axis) along the cable (core conductor) or through the membrane. The longitudinal current (proportional to ∂V/∂x) encounters the cytoplasmic resistance, ρ (per unit length, in Ω/cm). The membrane is electrically modelled by an equivalent R-C circuit; thus current can either cross the membrane via the passive (resting) membrane channels, represented by the resistance r_m per unit length (in Ω cm), or charge the membrane capacitance c_m (per unit length, in F/cm). Charge conservation imposes that the change per unit length of the longitudinal current, (1/ρ) ∂²V/∂x², is equal to the membrane current per unit length, V/r_m + c_m ∂V/∂t. This leads to the (dimensionless) passive cable equation

∂²V/∂X² = ∂V/∂T + V,  (21)
where X = x/λ and T = t/τ (see Section 3.2 for the definitions of λ and τ). This linear equation can be solved analytically for arbitrary passive dendritic trees [175]. The solution depends on the electric properties of the membrane and cytoplasm as well as on the boundary condition at the end of the segment towards which the current flows. Indeed, the tree attached at the end of each dendritic segment acts as a sink for the longitudinal current (i.e., a "leaky end" boundary condition), and the leakier the end, the steeper the voltage attenuation along this segment. One extreme case is that of "killed end" (short circuit) boundary conditions; the other extreme is the "sealed end" (open circuit) boundary condition, for which voltage is least attenuated along the cylinder. This latter condition is the appropriate approximation for the conditions at the ends of the distal dendritic arbours [154]. Separating space and time variables, and writing the Green function of Eq. (21) as an orthonormal expansion, one shows that the general solution of the passive cable equation is the sum of a series of exponentially decaying functions,

V(x, t) = Σ_{i=0}^{∞} C_i exp(−t/τ_i),  (22)
with τ_i > τ_{i+1} for any i. For a given tree, the C_i are constants that depend on the location (x) and on the initial distribution of voltage in the tree, whereas the time constants τ_i are independent of location. For uniform membrane over the whole dendritic surface (with no short circuits or voltage clamp) and with sealed-end boundary conditions at the dendritic terminations, the slowest time constant, τ_0, equals the membrane time constant τ. The smaller (faster) time constants, τ_1, τ_2, etc., are called the "equalizing" time constants; they correspond to the more rapid equalizing spatial spread of current between the various dendritic regions. When synapses are assumed to deliver current to the dendrites and the changes in membrane conductance they induce are neglected, passive cable theory provides a fully linear framework to investigate the spatio-temporal summation of inputs. The simple case where a single test pulse is delivered at a given location can be readily investigated, and all situations may a priori be investigated on this basis through the linear combination of elementary solutions. Still it makes sense to directly characterize voltage transients to get a better insight into voltage summation
and diffusion on an arbitrary passive tree. Agmon-Snir and Segev [241] proposed to analyse the moments of voltage transients rather than their waveform. The nth moment of the voltage transient V(x, t) is defined at any given location x as
M_{V,n}(x) = ∫_0^∞ t^n V(x, t) dt,
where M_{V,0} is just the area (time integral) of V(t); the ratio between M_{V,1} and M_{V,0} is the "center of gravity", or centroid, of V(t), which is a measure of the characteristic time of the transient. The second moment M_{V,2} gives a measure of the "width" of the signal, etc. The evolution of the moments M_{V,n}(x) is governed by the hierarchy of differential equations

λ² d²M_{V,n}/dx² − M_{V,n}(x) + n τ M_{V,n−1}(x) = 0.
Using this method one can compute the net dendritic delay (defined with respect to the centroid) introduced when the synaptic potential propagates between a given synapse and the soma. In cortical pyramidal neurones, this delay ranges between 0 (for somatic inputs) and τ (for distal synapses on the apical dendrite). One can also compute the local delay, that is, the time difference between the centroid of the input current and the centroid of the resultant voltage transient at the input site. This local delay can serve as a measure of the time-window in which local synaptic potentials can efficiently sum. It can be shown that, for any point in a given tree, both the net delay and the local delay are independent of the shape of the transient current input. For an isopotential soma the local delay is equal to the membrane time constant τ. On the contrary, the local delay at distal sites in an extended dendritic arbour may be one order of magnitude smaller. This implies that, compared to the soma, a more precise temporal synchronization of inputs is required for local summation of EPSPs in the dendrites.
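The moment-based delay measures above lend themselves to a simple numerical check. The following sketch is purely illustrative (it is not part of the original analysis, and all parameter values are hypothetical choices): it integrates a passive isopotential compartment driven by a brief current pulse and verifies that the local delay, i.e. the difference between the centroids of the voltage response and of the input current, comes out equal to the membrane time constant τ.

```python
import numpy as np

# Passive isopotential compartment: C dV/dt = -V/R + I(t)
# Illustrative parameters (hypothetical values, not taken from the text)
tau = 10e-3          # membrane time constant R*C (s)
R   = 100e6          # input resistance (ohm)
C   = tau / R        # membrane capacitance (F)

dt = 1e-5
t  = np.arange(0.0, 0.2, dt)

# Brief rectangular current pulse: 50 pA between 5 and 6 ms
I = np.where((t >= 5e-3) & (t < 6e-3), 50e-12, 0.0)

# Forward-Euler integration of the RC equation
V = np.zeros_like(t)
for k in range(1, len(t)):
    dV = (-V[k-1] / R + I[k-1]) / C
    V[k] = V[k-1] + dt * dV

def centroid(signal):
    """First moment divided by zeroth moment (centre of gravity in time)."""
    m0 = np.trapz(signal, t)
    m1 = np.trapz(t * signal, t)
    return m1 / m0

local_delay = centroid(V) - centroid(I)
print(f"local delay = {local_delay*1e3:.2f} ms (membrane tau = {tau*1e3:.1f} ms)")
```

As expected from the convolution with the exponential membrane kernel, the printed local delay matches τ to within the discretisation error.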
4.2.2. Cable theory for heterogeneous dendrites
For simplicity, early implementations of the model assumed certain morphological idealisations (such as the electrotonic equivalence to a cylinder) as well as uniform membrane electric properties. The consequences of relaxing these assumptions of homogeneity were considered in later analytical studies (see, for instance, [154,242-245] and more recently [246]). The diameter of dendrites tends to progressively decrease as one moves away from the soma (taper). The notion that this could considerably boost the efficiency at the soma of distal synaptic inputs was pushed forward by Schwindt and Crill [247]. Indeed the input conductance is then reduced at distal dendritic sites (remember that the input conductance scales as d^{3/2} on a semi-infinite cylinder), which increases the amplitude of voltage transients in response to current injection. On the other hand, synaptic saturation is reached sooner. In addition, taper makes dendrites electrotonically less compact [246] (compared to the homogeneous situation with the same average diameter). These various local and global effects of taper
can be investigated by extending the passive cable equation to the heterogeneous case [248]. It then reads

c_m(x) ∂V/∂t = ∂/∂x[(1/ρ(x)) ∂V/∂x] − g_m(x) V.  (23)

In the electrotonic variables T = t/τ and X = ∫_0^x du/λ(u), this equation takes the dimensionless form

∂V/∂T = ∂²V/∂X² + Q(X) ∂V/∂X − V,  (24)
which shows that in these electrotonic variables taper (or flare) amounts to adding an axial current to the homogeneous cable equation. This is the well-known drift effect due to diffusion in a heterogeneous medium. It is then possible to solve the problem fully analytically for an exponentially tapering cylinder and some other simple but representative cases [249]. Recent experimental evidence on apical dendrites of pyramidal cells suggests that the neuronal membrane is "leakier" in distal dendritic regions than in proximal regions [250]. Cable theory can be readily extended to incorporate this other, electric, form of heterogeneity. As for geometric heterogeneities, the steady-state diffusion equation, which in that case exhibits no drift term, can be solved analytically only for certain simple forms of conductance gradients (linear, polynomial, exponential, etc.). Indeed the equation then takes the form of an Airy equation (linear gradient) or of some other standard equation, the solutions of which are special functions [246]. One can prove in these cases that the conductance gradient increases voltage transfer to the soma, albeit slightly. This comes from the decrease in transfer resistance with respect to the homogeneous case, partly due to the fact that dendrites are electrotonically more compact [246]. An important difference compared to geometric heterogeneities is that electric heterogeneities modify the passive membrane time constant. It can be shown, by computing the Green function of the problem (which implies solving a Schroedinger eigenvalue problem for quantum motion in a potential well shaped as the conductance profile), that a conductance gradient increases the system time constant [246] (a twofold increase is expected for a linear gradient on a dendrite of electrotonic length L = 5). Therefore the decay of synaptic potentials should be somewhat slower on electrically heterogeneous dendrites, and all the more so as they are electrotonically less compact. A more general framework for dealing with any kind of geometric or electric heterogeneity was also proposed recently [251,252] (see also [253] for a discrete version involving matrices rather than path integrals). The Green function of the problem is computed as a functional integral over all possible Brownian paths that a particle diffusing in the heterogeneous medium might follow. Heterogeneities are incorporated through the weighting coefficients of the paths. Although comprehensive, this approach requires the numerical evaluation of the Green function by summation over paths up to a given length. Precise estimates are speedily obtained
for short-time diffusion (only short paths then contribute) but the evaluation of the long-term behaviour of the Green function is time consuming.
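Returning to the drift term of Eq. (24), its effect on steady-state attenuation can be illustrated with a short finite-difference sketch. This is only a rough illustration under stated assumptions: Q(X) is taken constant (the case the text associates with an exponentially tapering cylinder [249]), the proximal end is voltage-clamped, the distal end is sealed, and the electrotonic length and grid size are arbitrary choices. The sign convention relating Q to taper versus flare depends on the direction of the change of variables, so both signs are shown alongside the uniform cable.

```python
import numpy as np

def steady_state_profile(Q, L=2.0, n=200):
    """Steady-state solution of  V'' + Q V' - V = 0  on [0, L] (electrotonic units),
    with V(0) = 1 (clamped proximal end) and a sealed distal end, V'(L) = 0."""
    h = L / n
    X = np.linspace(0.0, L, n + 1)
    A = np.zeros((n, n))          # unknowns are V_1 ... V_n
    b = np.zeros(n)
    for i in range(1, n + 1):
        row   = i - 1
        diag  = -2.0 / h**2 - 1.0
        lower = 1.0 / h**2 - Q / (2 * h)
        upper = 1.0 / h**2 + Q / (2 * h)
        A[row, row] = diag
        if i == n:                 # sealed end: ghost node V_{n+1} = V_{n-1}
            A[row, row - 1] = lower + upper
        else:
            if row > 0:
                A[row, row - 1] = lower
            A[row, row + 1] = upper
        if i == 1:                 # clamped boundary V_0 = 1 moved to the rhs
            b[row] -= lower * 1.0
    V = np.concatenate(([1.0], np.linalg.solve(A, b)))
    return X, V

for Q in (0.0, -1.0, +1.0):
    X, V = steady_state_profile(Q)
    print(f"Q = {Q:+.1f}: attenuation V(L)/V(0) = {V[-1]:.3f}")
```

For Q = 0 the result reproduces the classical 1/cosh(L) attenuation of a uniform sealed-end cylinder, and the two non-zero values of Q show how the drift term either steepens or flattens the profile.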
4.2.3. Electrotonic architecture and functional compartments
Many neurones display several dendritic trees with strikingly different morphological features and receiving inputs from different sources. For instance, sensory inputs from glomeruli impinge on the terminal tuft of the long primary dendrites of mitral cells, while the shorter lateral dendrites mediate dendro-dendritic interactions with granule cells. A similar organization is found in neocortical pyramidal cells. In such cases one may rightfully consider that the dendritic arborization is organized in distinct functional units. May a similar functional architecture exist within a single dendritic tree? Dendrites constitute an extended electric medium, on which nearby synapses may cooperate to create large voltage transients that quickly subside as they spread, due to the large load constituted by the rest of the dendritic tree. Moreover the interaction between synapses is local, as the range of the depolarizing effect of a synapse and the range of its shunt effect are, respectively, λ and λ/2 (see Section 4.3.2). On this basis, and in view of the complex morphology of dendritic trees, it has often been said that a dendritic tree may operate as a system of semi-independent functional subunits. In each of these units, strong local interactions between synapses would take place almost independently from the operations occurring in other subunits. The axo-somatic region would be provided only with the net result of this local processing of synaptic inputs. Rall was the first to suggest that non-linear interactions between dendritic synapses could endow the neurone with a rich repertoire of local operations [254]. Several authors have claimed that local interactions on dendrites, involving spines [255] or not [256], readily endowed dendrites with the ability to perform logical AND, NOT, XOR, and AND-NOT operations on their inputs. A particular emphasis was put on the AND-NOT operations based on shunting inhibition [256]. This vision of synaptic integration fits within the metaphor of the neurone as a logical element, first introduced by McCulloch and Pitts [257] (see [258] for a critical discussion of these views). An extension of the idea of functional subunits performing local non-linear "computations", in which local "multiplicative" operations are done by clusters of synapses while a global summation of inputs is achieved at the soma, was proposed by Mel [258,259]. Dendritic subunits are generally defined on the basis of the degree of voltage attenuation between different dendritic locations [256,260]. Accordingly, a functional unit is defined as a dendritic region over which voltage decay is relatively small (the region is close to being isopotential) in comparison to the voltage attenuation between different units, and between each unit and the soma (see Fig. 20). This approach relies on the computation, at every location x_rec on the tree, of the voltage response to a single steady-state current input at point x_input. On an unbranched structure, voltage decays like exp(−∫_{x_input}^{x_rec} du/λ(u)), so that its log-attenuation, −log(V_rec/V_input), is directly given by the electrotonic distance of the recording point from the input. This is not so on a branched structure, where voltage drops more steeply because of
current flow into the intervening branches. The idea of splitting the dendritic tree into functional subunits based on the degree of electric interaction or communication between different dendritic regions was further developed by Zador et al. [261], who rigorously defined the morpho-electrotonic transform. They showed that a "distance" from some reference point to any other point could be defined on the dendritic tree by either the log-attenuation of voltage, which applies to steady-state current inputs, or by the signal propagation delay, which applies to transient current inputs (these are not genuine distances, as the asymmetry of current flow in the dendrites prohibits that d(x, y) = d(y, x)). The dendritic tree is then displayed using one of these metrics, which enables one to visualize vividly its electrotonic architecture as seen from the soma or from any given point in the dendritic tree itself.
Fig. 20. Are there functional subunits in dendrites? The region of synaptic influence is computed from the voltage attenuation following a local injection of DC current at different dendritic regions (arrows). When the input was delivered to the soma (top left), the "territory" of synaptic effect extends to all basal dendrites and the proximal part of the apical tree (red). In contrast, for inputs at distal dendritic arbours (top middle and right), the region of influence is very restricted. For inputs to individual basal dendrites (lower right), the region of effect is limited to the input dendrite. Thus, the basal tree is a favourable region for having several electrically independent functional subunits. The colour scale at right denotes the proportion of voltage attenuation, red indicating near isopotentiality.
4.3. The synaptic shunt
4.3.1. The shunt as a saturating non-linearity
The initial cable theory was restricted to the simpler case of current inputs on dendrites. In that case voltage transients scale linearly with the injected current. This is not so when inputs are genuine synaptic conductances [262]. Indeed the current
injected through a synapse is not a linear function of the synaptic conductance. It grows almost linearly with the synaptic conductance as long as this conductance is much smaller than the input conductance, but in the opposite case the local membrane depolarization tends to the reversal potential of the synapse and the synaptic current saturates. In all cases the dependence of the membrane voltage on the synaptic conductance, anywhere on the dendritic tree, is less than linear: V(x; aG_syn) < aV(x; G_syn). Similarly, if several synapses are simultaneously activated at different locations, the voltage will everywhere be smaller than the sum of the voltages obtained for each synapse separately: V(x; G_syn,1, G_syn,2) < V(x; G_syn,1) + V(x; G_syn,2). This subadditivity arises from the mutual shunt between the synapses and reflects the fact that the voltage response is not a linear functional of the synaptic conductances. Thus the passive cable equation, which incorporates no voltage-dependent current, still exhibits non-linear properties when the shunting effect of synapses is taken into account. This non-linearity has a saturating effect on the membrane voltage, as opposed to the voltage-dependent activation of the sodium conductance, which underlies neuronal excitability by making the membrane unstable. This has far-reaching consequences, in particular for in vitro electrophysiological experiments. In experiments, direct stimulation of dendrites is performed by injecting current at a given location [25,26], or by stimulating some afferent synaptic pathway [25]. In the latter case local excitation of dendrites may result if the synapses associated with different pathways are rather well segregated on the dendritic tree. Such protocols enable experimentalists to assess the local voltage response to the test pulse and, if an intra-somatic recording electrode is used at the same time, to examine the resulting voltage transient at the soma. They may shed some light on how local properties of the dendritic membrane, in particular the voltage-dependent currents, shape the membrane response. However, the effect of physiological synaptic stimulation on the neurone cannot be readily inferred from the elementary response to test pulses because of the non-linearity due to the synaptic shunt. Different synapses will interact destructively through their mutual shunt, and the somatic response will be strongly affected by the filtering of voltage transients, all along their way to the soma, induced by the interposed synapses.
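The saturating character of conductance inputs described above is easy to make explicit on a minimal single-compartment sketch (hypothetical parameter values; the spatial aspect of the interaction is deliberately ignored by lumping everything into one compartment). At steady state the depolarisation is V = g_syn E_syn / (g_L + g_syn), so it grows sublinearly with the synaptic conductance, and two co-activated synapses produce less than the sum of their individual responses.

```python
# Steady-state depolarisation of a single passive compartment receiving a
# synaptic conductance g_syn with reversal potential E_syn (relative to rest):
#     0 = -g_L * V + g_syn * (E_syn - V)   =>   V = g_syn * E_syn / (g_L + g_syn)
# Illustrative (hypothetical) values: g_L = 10 nS, E_syn = 70 mV above rest.

g_L   = 10e-9      # resting (leak) conductance, S
E_syn = 70e-3      # synaptic driving force at rest, V

def v_steady(g_syn):
    return g_syn * E_syn / (g_L + g_syn)

g1 = 1e-9          # a 1 nS synapse
a  = 5.0           # scaling factor

print(f"V(g1)       = {v_steady(g1)*1e3:6.2f} mV")
print(f"V(a*g1)     = {v_steady(a*g1)*1e3:6.2f} mV")
print(f"a*V(g1)     = {a*v_steady(g1)*1e3:6.2f} mV   (larger: sublinearity)")
print(f"V(2*g1)     = {v_steady(2*g1)*1e3:6.2f} mV   (two co-activated synapses)")
print(f"2*V(g1)     = {2*v_steady(g1)*1e3:6.2f} mV   (larger: subadditivity)")
```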
4.3.2. Background synaptic activity
The shunt effect of a single synapse can be easily analysed (see also [263]) by solving the steady-state passive cable equation for an infinite cable of diameter d and specific membrane conductance g_m, submitted to a current injection, I, at x = x_in and bearing one "silent" synapse (its reversal potential is equal to the resting membrane potential so that its activation elicits no direct voltage response) of conductance G_syn located, by convention, at x = 0:

(d/(4R_i)) d²V/dx² − g_m V − (G_syn/(πd)) V δ(x) + (I/(πd)) δ(x − x_in) = 0,

where R_i is the cytoplasmic resistivity.
Introducing the input conductance G_∞ = 2πdλg_m (equal to twice the input conductance of a semi-infinite cable, see Section 3.2.2) and the electrotonic distance X = x/λ, this equation can be rewritten as

d²V/dX² − V − 2(G_syn/G_∞) V δ(X) + 2(I/G_∞) δ(X − X_in) = 0.
The depolarization V(x) generated by the injected current is the unique solution of this equation that vanishes at infinity. On the half-space x ≥ x_in it is equal to

V(x) = (I/G_∞) exp(−(x − x_in)/λ) [1 − (γ/(1+γ)) exp(−2x_in/λ)],

where γ is the dimensionless quantity G_syn/G_∞. The amplitude of this depolarization is reduced by the factor

A = 1 − (γ/(1+γ)) exp(−2x_in/λ)

with respect to the case where no synaptic shunt occurs (γ = 0). Therefore the magnitude of the shunt effect at the input location x_in can be characterized by the quantity

S = 1 − A = (γ/(1+γ)) exp(−2x_in/λ).
We note that the shunt effect decays exponentially with increasing x_in. It is maximal (and equal to γ/(1+γ)) for x_in = 0. The input current is then completely shunted out (S = 1) in the limit of an infinite synaptic conductance (γ → ∞). If the synapse is located at a finite distance from the injection site the shunt effect will always be partial: S tends to exp(−2x_in/λ) when γ goes to infinity. The characteristic length of the shunt effect is equal to λ/2. In contrast, the depolarization elicited at a distance x from a single synapse with reversal potential V_syn is

V(x) = (γ/(1+γ)) V_syn exp(−x/λ).
The corresponding decay length is the membrane space constant λ. This entails that the shunt effect of a depolarizing synapse is much more local than the voltage response it elicits. For instance, at a distance x from a depolarizing synapse where the depolarization still retains 10% of its peak value (exp(−x/λ) = 0.1), the shunt effect of the synapse will have decayed to a mere 1% of its maximum (exp(−2x/λ) = (exp(−x/λ))² = 0.01). On the contrary, the shunt effect and the depolarization depend in the same way on the synaptic conductance G_syn, through the dimensionless quantity γ. This ratio γ of the synaptic conductance to the intrinsic input conductance of the cable is generally small on dendritic branches (typical unitary conductances are of the order of 1 nS), although it may become larger than unity on spine heads (see Section 5).
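Before turning to the physiological consequences, the formulas above can be evaluated numerically. The short sketch below uses arbitrary illustrative values of γ and of the input location; it simply tabulates the shunt factor S, whose spatial decay is governed by λ/2, against the decay of the depolarisation produced by a non-silent synapse, which is governed by λ.

```python
import numpy as np

# Shunt effect of a "silent" synapse at x = 0 on an infinite passive cable,
# as derived above.  gamma = G_syn / G_inf; distances are in units of lambda.
def shunt_factor(gamma, x_in):
    """Relative reduction S of the depolarisation produced by a current
    injected at x_in (in units of lambda) by a silent synapse at x = 0."""
    return gamma / (1.0 + gamma) * np.exp(-2.0 * x_in)

def epsp_profile(gamma, x, v_syn=1.0):
    """Steady depolarisation at distance x (units of lambda) from a
    depolarising synapse of normalised strength gamma and reversal v_syn."""
    return gamma / (1.0 + gamma) * v_syn * np.exp(-np.abs(x))

gamma = 0.2                       # illustrative value: G_syn = 0.2 * G_inf
for x in (0.0, 0.5, 1.0, 2.0):
    print(f"x = {x:3.1f} lambda : S = {shunt_factor(gamma, x):.4f}, "
          f"V/Vmax = {epsp_profile(gamma, x)/epsp_profile(gamma, 0.0):.4f}")
```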
As a consequence, the shunting effect of a single synapse located on a smooth dendrite or on the shaft of a spiny dendrite is small. However, many synapses are simultaneously and repetitively activated in physiological conditions (see Fig. 21), which may strongly modify the integrative capabilities of the neurone [264-266]. Indeed the local increase in membrane conductance following synaptic activation decreases both the local space constant λ and the local time constant τ, and increases the input conductance at the injection site. It has been estimated that, due to the mere spontaneous activity of the cerebello-olivary system, in which regime synapses probably fire at a few hertz, the synaptic conductance is several times larger than the resting membrane conductance of dendrites [266]. Stronger effects have been demonstrated on cortical neurones during episodes of strong activity in animals anaesthetized with ketamine [35]. Large effects are also expected in neural structures actively engaged in an ongoing physiological task. This intense background activity effectively renormalizes the passive cable properties of dendrites [230], possibly increasing the electrotonic length and input conductance by up to an order of magnitude. In such conditions the synaptic
Fig. 21. Spontaneous activity of neurones. In vivo patch recording from neurone in the visual cortex of an anaesthetised cat. Spontaneous synaptic activity (without visual input) of about 5 mV in amplitude is observed with an occasional action potential firing (large signal; peak is graphically cut). Courtesy of Lyle Borg-Graham.
shunt makes dendrites less effective at driving the soma. The reason for this can be investigated even on a simple model where the dendrites are lumped into a single isopotential compartment, resistively coupled to the soma. The conductance equivalent to this isopotential dendritic compartment, as seen from the soma, is equal to g_ax g_m/(g_ax + g_m), where g_m represents the total dendritic membrane conductance (including the synaptic conductance) and g_ax the coupling conductance with the soma. For small g_m the equivalent conductance is almost equal to g_m, whereas it tends to the limiting value g_ax as the dendritic conductance goes to infinity. However the isopotential model is no longer valid in that case, because the space constant decreases like 1/√g_m. If we take this effect into account, the electrotonic length of dendrites actually increases as √g_m and their input conductance grows asymptotically as √g_m, in agreement with Eq. (19) for a semi-infinite cable. The non-linear dependence of the input conductance on g_m entails that the net current delivered to the soma by the activation of dendritic excitatory synapses grows less than linearly with the total synaptic conductance, due to the synaptic shunt. This depolarizing current is further reduced by the activation of "silent" inhibitory synapses [251] that by themselves affect the resting membrane potential little but increase the membrane conductance. This shunting inhibition is particularly important if the inhibitory synapses are located on or near the soma and interposed between it and more distal excitatory synapses. Note also that in the case of synapses impinging on the electrotonically compact soma, the shunting effect is not mitigated by a decreased local space constant. The background synaptic activity is not constant but fluctuates in time, as it results from the repetitive activation of a large but finite number of synapses per unit length of dendrites. The effect of fluctuations, described as a sum of independent stochastic processes correlated in time, was investigated by Bressloff [231] on a passive membrane. He showed that the effective time constant was less reduced by a fluctuating activity than it would be by a constant synaptic background with the same average amplitude. Moreover the fluctuating synaptic background modifies the low-pass filtering of incoming signals by dendrites and makes it more acutely tuned. The response to sinusoidal inputs superimposed on the synaptic background is enhanced at low frequencies. Moreover, its bandwidth decreases when the correlation time of the coloured noise is increased. On the contrary, no modification is elicited by an incoherent noise (white noise), where no frequency is privileged. Many authors have recently investigated how the response of neurones to a periodic stimulus might be enhanced in the presence of noise by a stochastic resonance mechanism (see, for instance, [267]) and have discussed its functional relevance. For instance, it has been suggested that the detection capabilities of the cricket cercal sensory system were enhanced by this mechanism [268]. The existence of a fluctuating background of synaptic activity has important implications for the detection and integration of synaptic inputs. It stands to reason that the effect of a single synapse will be hidden in the fluctuating synaptic background. As a consequence, only the average synaptic activity and the coherent coactivation of large groups of synapses will be signalled to the spike initiation zone in physiological conditions.
Secondly, even these coherent events will be filtered
on the way to the soma, in a manner that largely depends on the average synaptic activity itself. This entails that the response of the soma of an in vitro neurone to the coherent stimulation of dendrites does not tell us faithfully what effect the same stimulation would elicit in a behaving animal. Thirdly, we note that strong coherent events may also be due to large deviations of the random synaptic activity. This limits the capability of the neurone to detect coincidences in a reliable way (see [30] and [105] for a discussion of a similar issue in the context of channel stochasticity).
4.3.3. Shunting excitation out
Functional compartmentalization of dendrites might arise in physiological conditions, not so much from the intrinsic electrotonic architecture of the dendritic tree as from the synaptic activity in its relationship to the electrotonic architecture. Rall showed that inhibitory synapses are most efficient in "vetoing" excitatory synapses when they are placed on the path between the excitatory synapses and the soma [240] (see also [154,256]). This "on the path" condition implies that strong inhibitory activity may screen all excitatory inputs distal to the inhibitory synapses. Proximal inhibitory synapses may screen all excitatory activity on the dendritic tree, whereas more distal inhibition will veto only excitatory synapses in some distal regions of the tree. Depending on the spatial distribution and the temporal order of activation of excitatory and inhibitory inputs, the excitatory activity of different regions of the tree can be "permitted" to reach the soma or not, in much the same way as presynaptic inhibition may specifically impede action potentials from propagating into a subtree of the axone (see Section 3.3). It has been proposed that this mechanism might enhance the selective response of visual neurones to the direction and speed of motion [269,270], in particular in the retina. If motion provokes an orderly activation of sequences of excitatory synapses along an amacrine cell dendrite, the strongest response at the tip of the dendrite (and the weakest response at the soma) will be elicited when the excitation wave travels toward the tip, and vice versa. The asymmetry between the tips of dendrites, which constitute the output regions of the non-spiking amacrine cells, and the soma can be strongly enhanced if a slower shunting inhibition is superimposed on the synaptic excitation. Indeed, it will lag behind the excitation wave, and will not hinder the build-up of voltage transients at the leading edge of the wave. On the contrary, it will shunt out voltage transients in the wake of the excitation wave ("on the path" condition). This example, where the key output is the dendritic tip, already shows that shunting inhibition must not be considered only from a somatic viewpoint. Shunting inhibition may play another role in synaptic integration by preventing proximal excitation from interfering too much with the processing of distal inputs. In Purkinje cells, for instance, inhibitory inputs arising at the same time as climbing fibre activation inhibit the spread of the large climbing fibre EPSPs in the distal dendrites [271]. More generally, shunting inhibition may contribute to isolating different functional subunits of the dendritic tree from one another. For instance, local inhibitory input can functionally de-couple the electric communication
between soma and distal apical dendrites, thereby avoiding the initiation of a dendritic calcium spike.
4.4. Non-linear membrane properties
4.4.1. Voltage-dependent conductances on dendrites
The presence of excitable ion channels in dendrites was predicted in the early studies of axotomised spinal motoneurones by Eccles [272], on the basis of the observation of so-called "partial responses" at the soma. Still, direct demonstration of the excitable nature of dendrites came only in the 1980s from intra-dendritic recordings with sharp electrodes, most notably in cerebellar Purkinje cells [9]. With the recent development of infra-red differential interference contrast (IR-DIC) video microscopy [14], it became possible to clearly view individual (1 μm in diameter) processes (dendrites and axones) in brain slices. This visual control enables experimentalists to record, using patch-pipettes, the local electric activity at distal dendritic arbours and to excise small membrane patches of dendrites, so as to characterize their active membrane properties (see Fig. 22) [26,273]. The development of Ca2+-dependent dyes, the use of confocal microscopy and, more recently, two-photon microscopy added the capability to optically image, both in vivo and in vitro, Ca2+ dynamics in dendritic branches and individual dendritic spines in response to synaptic inputs [15,274,275]. A fascinating picture emerged from these new experimental techniques. It became clear that dendrites and their spines are decorated with a plethora of different
Fig. 22. Dendrites are endowed with excitable channels. Infra-red DIC video microscopy image of a portion of the apical dendrite of a CA1 pyramidal neurone (left panel, with the recording patch-pipette seen to the right of the dendrite) and single-channel recordings of T-type Ca2+ channels (right panel). The recordings are from a cell-attached patch in the apical dendrite, about 150 μm from the cell body. A voltage step (from -80 to -60 mV) via this patch results in the opening of individual ion channels (transient inward deflections). Courtesy of Jeffrey Magee and Daniel Johnston, Baylor College, Houston.
Fig. 23. Ion channels distribution in apical dendrites of cortical pyramidal cells. Dotted lines: density not measured. Shaded: direct experimental measurements. Open: inferred from combined modeling and experiments. Number on the right indicate maximal conductance for each channel [24]. types of excitable channels, generally at relatively low density (see Fig. 23). These include voltage-dependent transient and persistent Na + channels [276], low and high threshold Ca 2+ channels [277], A-type transient potassium channels [250] and htype channels [278], to mention only a few. It seems that, at variance with the axonal membrane, most known types of voltage-gated ion channels can be found on the dendritic membrane (see [279] and the recent reviews by Mainen and Sejnowski [24] and by Magee [25]). Still we must keep in mind that data are restricted to a few neuronal types (hippocampal and neocortical pyramidal cells, Purkinje cells, thalamo-cortical cells) and to part of the dendritic arborisation. For instance, investigations of active dendritic properties of neocortical pyramids concern much more the primary apical dendrite than the apical tuft and the basal dendrites. Let us also recall that many of the voltage-dependent channels of the dendritic membranes are also ligand-gated (NMDA receptors) or indirectly controlled by metabotropic receptors. Consequently synaptic activity modifies not only passive membrane properties but also active voltage-dependent properties in physiological conditions. Evidence from in vitro experiments accumulates for a heterogeneous spatial distribution of all these active channels. Potassium A-type conductance increases (perhaps linearly) when moving distally along the primary apical dendrite of pyr-
pyramidal cells [250] and on the dendritic arborization of Purkinje cells [271,280]. Similar gradients of the I_h conductance have been observed on hippocampal pyramidal cells [278] and thalamo-cortical cells. On the contrary, sodium depolarizing conductances do not seem to display such gradients. They are more uniformly distributed (pyramidal cells) or preferentially located in the perisomatic regions (Purkinje cells [280]). Still, all these recent results need to be confirmed.
4.4.2. Dendritic spikes
Can synaptic inputs trigger spike firing on excitable dendrites? This question has long been debated. Neurophysiologists wondered very early whether the transfer of electric signals along dendrites and axones followed the same rules. If so, the behaviour of the neurone would rely on a simple operating principle, the generation of impulses and their non-decremental conduction. This would have readily explained how distal synapses could communicate with the soma, but it would have made it harder to understand how the numerous synaptic inputs could be integrated on dendrites to deliver a single synthetic signal to the soma [281]. The electrophysiological study of spinal motoneurones brought a negative answer to that question. Thus the idea of non-decremental conduction on dendrites was replaced by the notion of passive dendrites. This changed when Llinás and Sugimori [9] suggested, on the basis of somatic and dendritic recordings of Purkinje cells in vitro, that calcium spikes could be initiated on dendrites and elicit a "complex spike" with sodium and calcium components at the soma. Very recently it was demonstrated on mitral cells that sodium spikes could also be initiated locally in the dendrites by a sufficient excitation of the distal dendritic tuft [134]. Using imaging, direct evidence was also given for the initiation of calcium spikes in the distal dendrites of neocortical pyramidal cells in vitro [282]. Thus it appears that the calcium spikes and sodium spikes observed on dendrites can be directly initiated by local input on the dendrites or result from the back-propagation of axo-somatic sodium spikes. A priori, dendritic spikes, which travel at a velocity of the order of 0.1 m/s, seem to provide a very efficient way for the fast signalling to the soma of synaptic excitation occurring on a distal tuft. The problem of the attenuation of voltage transients along the almost 0.5-1 mm long primary dendrite of mitral or pyramidal cells, in which dendritic spikes are recorded, would thus be solved.
Models of dendritic spike initiation and propagation [283,284] mitigate this simple view and provide important insights regarding the interplay between dendritic morphology, membrane excitability and synaptic inputs. Firstly, the threshold for initiating an action potential in distal arbours critically depends on the spatial distribution of the excitatory inputs [285]. A spatially distributed input (whereby many synapses simultaneously depolarize a large area of the dendrites and, consequently, the tree becomes effectively more isopotential) improves the conditions for spike initiation as compared to a spatially restricted input. This is just the concept of liminal length for spike initiation on axones [286] applied to dendrites. Secondly, the propagation of the action potential in a homogeneously excitable dendritic tree is more secure toward distal branches; dendritic spikes usually block proximally. Indeed, in the distal direction (from soma to dendrites), the action potential typically propagates from thicker to thinner branches, and towards the favourable (sealed end) boundary conditions at the distal terminal tips. In the proximal direction, however, the increasing diameter, the sister branches and cousin arbours, and the leaky boundary condition provided by the soma and the other dendritic trees radiating from it all impose an important load on thin distal branches. Consequently, most of the current entering through voltage-dependent channels is dissipated in this current sink, and the resulting depolarizing membrane current density is typically insufficient to fire the thicker, more proximal, branches. Hence, a local input to a distal excitable dendritic arbour is likely to generate a regenerative response in only a limited distal portion of the tree [285]. Thirdly, spike initiation requires that voltage-dependent channels be sufficiently activated to trigger a regenerative response. In view of the low average density of these channels on dendrites and of the non-uniform distribution of repolarising conductances (such as the A-type potassium conductance), this entails that spike initiation cannot occur as easily everywhere. It will take place preferentially at sites where synaptic input is strong and membrane excitability is above average, due to a high local density of depolarizing channels, a local depletion in hyperpolarizing channels, or at places where local excitability is enhanced by the presence of numerous active spines (see Section 4.5). These various determinants of dendritic spike firing are all found in cerebellar Purkinje cells, which thus provide a particularly illustrative example. These cells display an extremely profuse planar dendritic arborisation. Except in proximal regions, these dendrites are studded with excitable spines. Sodium channels are restricted to proximal regions, whereas the calcium conductance seems to be distributed over the whole dendritic surface. Potassium hyperpolarizing conductances are preferentially located in more distal regions [280]. Climbing fibres make numerous and strong en passant synapses along the main trunks of dendrites, whereas the other excitatory input, the parallel fibre synapses, impinges on dendritic spines located on the thin tertiary branches. Two types of responses may be distinguished in Purkinje cells. Firstly, somatic depolarization (with an intra-cellular electrode) triggers a burst of sodium spikes interspersed with strong calcium-dependent depolarizations in different regions of the dendritic tree. A similar response (the "complex spike") is observed following the activation of the climbing fibre input. Secondly, the activation of spiny dendrites by individual parallel fibres leads to a subthreshold depolarization at the soma, whereas locally, at the spine head, voltage-gated calcium transients are observed [287]. The synchronous activation of many parallel fibres may lead to a strong calcium spike in the dendritic tree and, consequently, to a response of the cell similar to the response elicited by climbing fibre activation.
4.4.3. Spike back-propagation It has been considered for a long time that the axo-somatic spike spreads into the proximal dendrites by passive diffusion. The strong leak of current from the soma to the dendrites favours this invasion. However, passive current loss through the dendritic membrane, and current division at branch points, is likely to cause a strong attenuation of the spike amplitude as it moves away from the soma.
Therefore the demonstration, by simultaneous recordings at the soma and in the apical dendrite of pyramidal neurones in vitro [26], that axo-somatic spikes could invade the dendrites in depth with little decrement along the way, aided by a low density of sodium channels in the dendritic membrane (see Fig. 24), came to many as a surprise. It showed that spikes recorded in dendrites were not necessarily genuine dendritic spikes signalling strong synaptic excitation to the soma. The back-propagating fast sodium spike provided a "hand-shaking" mechanism by which dendrites could be aware of the ongoing spiking activity of the axo-somatic region. These results have now been extended to different cell types (see the review in [288] and recent work on thalamo-cortical neurones [289]), which has led to a more nuanced appreciation of back-propagating spikes [283]. A gradient of potassium A-type channels from the proximal to the distal regions was demonstrated on apical dendrites of hippocampal pyramidal cells, which may regulate the invasion of these dendrites by back-propagating sodium spikes [250]. It was also shown that the density of sodium channels on Purkinje cells strongly decreases when moving distally, at variance with the apical dendrite of pyramidal cells, so that most of the dendritic membrane is essentially deprived of sodium channels. This, together with the large effective membrane area of spiny dendrites (see Section 4.5), hinders spike re-
Fig. 24. Back-propagation of the sodium spike from the axone into the dendrites. A reconstructed layer V pyramidal neurone is shown at left. Synaptic input was delivered to the apical dendrite; this initiated a sodium action potential first in the axone (lower frame at right). The action potential then propagated backwards into the soma and, consequently, into the dendrite (top frame at right). Axonal and dendritic recordings were 17 and 270 μm from the soma, respectively. The vertical line indicates the time of the action potential peak at the soma. Reprinted from [288].
generation. Spike back-propagation is then passive and decremental, and distal regions of the dendritic tree are not invaded [273]. In this context, background synaptic activity [266] is likely to play an important role [284]. As the density of fast sodium channels on dendrites is low, a large increase in passive membrane conductance due to synaptic shunt may simply preclude any genuine regeneration of the action potentials. Back-propagation would then be much closer to a passive diffusion (mitigated by a small boosting effect of sodium conductances) than to a non-decremental conduction with a definite velocity. Therefore phenomena clearly observed in vitro might have no significant counterpart in neurones in vivo.
4.4.4. Amplification, regularisation and linearisation of synaptic inputs
What are the functional consequences of dendritic excitability? Are there some important effects of voltage-dependent dendritic conductances unrelated to the firing and propagation of spikes? Excitable channels on dendritic branches and spines, in particular sodium and calcium channels, provide a means for boosting local synaptic inputs. An appropriate distribution of depolarizing channels might compensate for the significant attenuation of distal synaptic voltage transients due to electrotonic spread, and preserve the waveform of these transients, so that proximal and distal inputs elicit the same effect at the soma [25]. This general idea, that active conductances correct for the intrinsic deficiencies of passive dendrites and enable synaptic inputs to add their effects linearly and independently of their location, was investigated by many authors over the years on model dendrites [25,290-292]. The predictions of these models recently gained some experimental support [275,293-295]. Most of these experimental and theoretical studies refer to in vitro conditions. The main problem encountered in all these studies is that a heterogeneous distribution of conductances, with an excitability gradient from proximal to distal dendrites, does not agree with the observed distribution of channels, where hyperpolarizing conductances are often found to be preferentially located in distal regions (see the previous section). This problem is well illustrated by the work of Siegel and coworkers [296], who investigated, on multicompartmental models of neurones, slow activity-dependent changes in the density of channels (see Section 2.3.4). Neurones spontaneously developed over time a spatially heterogeneous distribution of active channels, which tended to make the net effect at the soma of synaptic inputs independent of their location on the dendrites. A uniform distribution of synaptic inputs created opposite gradients of depolarizing and hyperpolarizing conductances. Depolarising channels became localized more distally on the dendrites and hyperpolarizing channels more proximally, just the opposite of many experimental observations on the distribution of channels. Can these contradictory results be reconciled? One suggestion is that hyperpolarizing conductances (or the conductance increase due to NMDA receptors) might reduce the saturation of the synaptic current and enable synapses to work in their linear range [250,297], and therefore more efficiently. Weak active properties might also have a completely different role: modifying the filtering of time-dependent inputs by
dendrites (generating NMDA oscillations [298,299] and causing a resonant response to particular inputs [300]), isolating synaptic inputs via the decrease of the effective local space and time constants [297], and controlling the back-propagation of axo-somatic spikes and the initiation and spread of dendritic spikes (see below). Clearly their role in synaptic integration is not understood.
4.5. Dendritic spines: a microcosm
4.5.1. Spiny dendrites
Dendritic spines have already been evoked a few times in this section. These short appendages (see Fig. 25) densely decorate the dendrites of many types of neurones, in particular pyramidal neurones in the neocortex and in the hippocampus; the dendritic tree of these neurones is covered with several thousands of spines. Still, the most striking example is the cerebellar Purkinje cell, the dendrites of which are studded with about 100,000 spines (on the order of 10 spines per 1 μm). Dendritic spines were first depicted in these cells at the end of the 19th century by Ramón y Cajal. Since that time, anatomists tend to divide neurones into two general classes, the spiny and the aspiny (or smooth) neurones. The development of the electron microscope enabled one to measure the dimensions of dendritic spines. These little dendritic branches have a typical shape consisting of a short neck (about 1 μm long and 0.1 μm in diameter) which terminates in a bulbous head (with a diameter of 0.5 μm or less and a minute volume of 0.005-0.3 μm³). The typical area of a single dendritic spine is 1 μm². In cortical pyramids, spines may constitute up to 25% of the total dendritic area; in Purkinje cells, 75% of the dendritic area is in spines. Clearly spines cannot be ignored, but their role has long remained mysterious, all the more as they are not found ubiquitously: spinal α motoneurones, for instance, have smooth dendrites. Many ideas have been proposed on their involvement in dendritic excitability and synaptic plasticity, which for a long time could not be experimentally confirmed.
4.5.2. Passive spines and excitable spines
Long before the concept of synapse was even suggested, Ramón y Cajal proposed that spines serve for connecting axones to dendrites. This was confirmed by electron microscopy, which showed that essentially all spines receive an excitatory (asymmetric) synapse, typically on the membrane of their head. In some cases, a second synapse contacts the same spine; in such cases this synapse is inhibitory [301]. This view of spines as mere contact zones prevailed until it was realised that the small diameter of the spine neck implies a very high resistance (several hundred MΩ) between the spine head membrane and the spine base. The electric implications of the small dimensions of the spine neck were first analysed systematically by Rall [302] for passive spines. Rall was puzzled by the results from ultrastructural studies showing that, statistically, spines with thin necks tend to be located on distal thin dendrites whereas stubby "mushroom"-like spines tend to emerge from thick proximal dendrites. Using a simple electric model for the spine, Rall showed that when the resistance of the spine neck matches the input resistance at the spine base (the
Fig. 25. Dendritic spines. (A) A segment of spiny dendrite from the hippocampal CA1 region, reconstructed from 71 serial-section electron micrographs (from Synapse Web, Boston University, http://synapses.bu.edu/). (B) Schematic representation of a single spine receiving an excitatory synapse at its head membrane. (C) Compartmental model of the structure in A. The synapse is modelled as a transient conductance change, g_syn(t), with reversal potential E_syn, in parallel with the passive conductance, g_rest, and capacitance, C, of the spine membrane. The spine neck is modelled by an axial resistance, R_neck, connecting the spine head to the dendritic shaft.
input resistance of the dendrite), small changes in spine neck diameter - therefore in its resistance - significantly affect the efficacy of the excitatory synapse impinging on this spine. He suggested that spine neck resistance might serve as a biophysical mechanism for plastic changes in the nervous system. It was later found that the spine neck consists of actin filaments; this raised the intriguing possibility that the
spines might "twitch" like muscle fibres and change their neck diameter in response to the excitatory input they receive [303,304]. In recent years it was indeed shown that spines may undergo activity-dependent morphological changes [305]. In some cases spines may serve not only as post-synaptic input elements, but also as presynaptic output elements. The dendro-dendritic synapses in the mammalian olfactory bulb is one clear example. In this case, two synaptic contacts with opposite polarities coexist within the same dendritic spine of a granule cell. One is an output synapse whereas the other is an input synapse. This reciprocal dendro-dendritic synaptic connection is responsible for the mitral-to-granule excitation and the granule-to-mitral inhibition; this negative-feedback loop underlies the rhythmic oscillatory behaviour found in the olfactory bulb [306]. Very similar reciprocal synaptic arrangement was found in the retina [307]. In such cases the interaction between dendrites of nearby neurones is very local and does not involve the soma: individual dendritic spines constitute the functional input/output units. The existence of excitable channels in dendrites raised the possibility that spine heads might also display active membrane properties. Synaptic input might activate these channels, perhaps even eliciting a full spike in the spine head [307-309]. This local boosting mechanism of the synaptic input by the excitable channels in spines may spread via a chain reaction to other adjacent excitable spines thereby firing a whole region of the dendritic arbour. The strong amplification of synaptic inputs by this chain reaction mechanism was addressed in numerical studies [310,311], that highlighted the important dependence of the spread of excitable waves in dendritic structures on the interplay between membrane excitability, spatial distribution (and density) of excitable channels and synaptic input [285]. 4.5.3. Spines as chemical compartments
4.5.3. Spines as chemical compartments
In recent years the spotlight of attention has shifted to the chemical, rather than electric, implications of the presence of spines on dendrites. The very small volume of the spine head implies that synaptic activity or back-propagation of the axo-somatic action potential may elicit large local changes in ion concentrations [312-314]. With the development of two-photon microscopy and ion-sensitive dyes, it became possible to optically record changes in calcium [315] and sodium [316] ion concentrations in individual dendritic spines. Chemical compartmentalization of the dendritic tree could be very important for local plastic processes (modulation of specific synapses, for instance). The activity-dependent increase in Ca2+ and Na+ concentration in spines may start a cascade of intra-cellular events responsible for long-lasting changes in the efficacy of the activated synapses. The question of whether such plastic changes affect only individual spines or concern more extended regions of the dendrites was explored in a recent study [317] in organotypic slice cultures of rat hippocampus. Using a local superfusion technique, Engert and Bonhoeffer were able to localise and individually activate neighbouring groups of synapses made on spines of pyramidal cells by Schaffer collaterals, with a spatial resolution better than 30 µm. Their results indicate that there is no input specificity at a distance of less than 70 µm. Synapses near a site of potentiation are also potentiated regardless of their own history of activation, whereas far away synapses
show no potentiation. These results suggest that the strict concept of a very local "Hebbian" synapse has to be modified to encompass the notion of enhancement spreading over several tens of micrometres within the dendritic tree. This suggests that groups of synapses, rather than single synapses, constitute the smallest functional scale of dendrites, in keeping with the idea that individual synapses have a negligible effect on the firing of a neurone.

4.5.4. Modelling spiny dendrites
Modelling spiny dendrites is not a trivial matter, because spines introduce a small-scale geometry in dendrites. Accordingly they must be studied both at the very local level of electric and chemical events occurring in a single spine (or in a cluster of adjacent spines), and at the global level of the dendritic tree, where their net effect on the electric behaviour must be understood. For questions addressed at the finest grain, each spine is individually modelled as a spherical head compartment connected to the spine base by a thin cylindrical neck. If chemical processes in spines are of interest - for instance, when analysing the electrochemical response to a synaptic input at the spine head - chemical or, more appropriately, electro-chemical diffusion equations are used to describe the flow of ions, in particular Ca2+ ions, in the spine head and between the spine head and the dendritic shaft [312-314]. If the interest focuses on purely electric phenomena at the spine, the spine head membrane may be modelled as an isopotential R-C circuit, connected by a short cylindrical cable to the dendrite (see [302,309] for details). For questions regarding the global effect of thousands of spines on the integrative properties of dendrites, the approaches suggested are based on insights gained from cable theory. Even in the passive case one cannot generally "add" the spines to the dendrites in a simple way. Still this can be done when current flows from the dendrite into the spine [154,309], which is of relevance to study, for instance, the propagation of an axo-somatic spike back into spiny dendrites [284]. Indeed it can be shown in that case that, for biologically plausible values of the specific membrane resistance Rm, capacitance Cm, and of the cytoplasmic resistivity Ri, the spine base and the spine head membrane are nearly isopotential. The membrane of spines can then be incorporated into the membrane of the dendrite in two equivalent ways. One is by increasing the physical dimensions (length, $\ell$, and diameter, $d$) of the dendritic shaft [318]: $\ell' = \ell F^{2/3}$ and $d' = d F^{1/3}$. The renormalizing factor $F > 1$ is just the ratio of the total membrane area (including spines) to the membrane area of the sole dendritic shaft. The other method is to renormalise the electric properties of the dendritic shaft rather than its geometric properties [249,266]: $R_m' = R_m/F$ and $C_m' = C_m F$, where F is defined as above. In both methods, the specific resistance, input resistance, transfer resistance, membrane time constant and effective electrotonic length of the spiny segment (which is larger than the electrotonic length of the dendritic shaft [266]) are preserved. Both methods can be used when the spines and the parent dendrite have the same specific membrane properties (Rm and Cm), but the second method can also be extended to the case where they have different membrane properties [266]. In this case, one first computes an effective specific membrane resistance Rm for the spiny dendrite using the relative membrane areas of the spines (that may have
a low Rm value due, for instance, to tonic synaptic activity) and of the dendritic shafts. One then uses either of the methods mentioned above to model the spiny dendrite as an "equivalent" cylinder. However spines cannot be incorporated into dendrites as just described when the current is generated at the spine head membrane by synaptic input, because there is a large voltage drop between the spine head membrane and the spine base in this condition. To deal with this problem Baer and Rinzel [311] formulated a new cable theory to investigate the electric interactions between many passive or excitable dendritic spines. In their study the spines, that interact only indirectly by voltage spread along the dendritic shaft, are considered as a continuum. This assumption requires that a spine density can be defined at the scale of the effective dendritic space constant, which is supported by experimental results on the characteristic spatial range of excitability [317]. Spiny dendrites can then be described by two partial differential equations, one including a voltage diffusion term and governing voltage evolution along the dendritic shaft, the other without a diffusion term governing voltage evolution in the spines.
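The two equivalent renormalizations described above for folding the spine membrane into the parent shaft are easy to state in code. The sketch below assumes that spines and shaft share the same specific membrane properties; the numbers in the example are purely illustrative.

```python
# Minimal sketch of the two equivalent ways of folding the spine membrane area
# into the parent dendritic shaft, assuming spines and shaft share the same
# specific membrane properties (Rm, Cm). F is the ratio of the total membrane
# area (shaft plus spines) to the area of the shaft alone.
import math

def incorporate_spines_geometry(length, diam, area_shaft, area_spines):
    """Return rescaled shaft dimensions: l' = l * F**(2/3), d' = d * F**(1/3)."""
    F = (area_shaft + area_spines) / area_shaft
    return length * F ** (2.0 / 3.0), diam * F ** (1.0 / 3.0)

def incorporate_spines_membrane(r_m, c_m, area_shaft, area_spines):
    """Return rescaled specific membrane properties: Rm' = Rm / F, Cm' = Cm * F."""
    F = (area_shaft + area_spines) / area_shaft
    return r_m / F, c_m * F

if __name__ == "__main__":
    # Illustrative numbers only: a 100 um x 1 um shaft carrying 100 spines of 1 um^2 each.
    area_shaft = math.pi * 1.0 * 100.0       # um^2
    area_spines = 100 * 1.0                  # um^2
    print(incorporate_spines_geometry(100.0, 1.0, area_shaft, area_spines))
    print(incorporate_spines_membrane(20000.0, 1.0, area_shaft, area_spines))
```

Either rescaling leaves the input resistance, the membrane time constant and the effective electrotonic length of the spiny segment unchanged, which is why the two are interchangeable in this regime.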
4.6. Discussion

4.6.1. Conceptual importance of passive cable theory
Cable theory for dendrites penetrated the scientific community slowly but deeply. Among cellular neuroanatomists and neurophysiologists, the possible functions of dendrites gradually began to attract serious attention. Old experimental results were re-interpreted and new experimental and theoretical studies were designed to explore the input-output function of dendrites [319,320] (see reviews in [154,258,321-323]). It is now well accepted that most of the synaptic information transmitted between nerve cells is processed in the dendrites and that it is there that many of the plastic changes underlying learning and memory take place. Thus, we speak naturally today, although too often loosely, of "dendritic integration", "spatio-temporal summation of synaptic inputs", "dendritic non-linearities", "dendritic plasticity", etc. Much of this new vocabulary has emerged from Rall's cable theory. Passive cable theory for dendrites provided a tentative comprehensive theoretical framework for understanding the processing of electric signals in dendrites. It defined the key biophysical parameters governing the flow of electric current in dendritic trees. One of them is the membrane time constant which, to a first approximation, sets the relevant time-window for synaptic integration. Synaptic inputs that arrive within this time-window will interact (see [324] for a fuller discussion of the role of the membrane time constant). The second key parameter is the electrotonic length of dendrites, determined by both the morphology and the diffusion length λ; this parameter controls both the amount of local interaction between synapses and the signal transfer to the soma. The impedance matching at branch points, which depends on the ratio between the diameters of parent and daughter branches as well as on their electric properties, also plays an important role in determining the profile of voltage attenuation in dendrites.
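For a uniform cylindrical cable, the quantities just listed reduce to a handful of textbook formulas, sketched below in Python. The branch-point check uses Rall's 3/2 power rule as one concrete instance of the impedance-matching condition; all numerical values are illustrative.

```python
# Minimal sketch of the passive cable quantities discussed above, for a uniform
# cylindrical dendrite. Parameter values are illustrative only.
import math

def space_constant(r_m, r_i, diam):
    """Space constant lambda = sqrt((Rm/Ri) * d/4), with Rm in Ohm cm^2,
    Ri in Ohm cm and d in cm."""
    return math.sqrt((r_m / r_i) * diam / 4.0)

def electrotonic_length(length, lam):
    """Electrotonic length L = physical length / lambda (dimensionless)."""
    return length / lam

def rall_branch_match(d_parent, d_daughters):
    """Impedance matching at a branch point (Rall's 3/2 power rule): the branch
    is matched when d_parent**(3/2) equals the sum of the d_daughter**(3/2)."""
    return d_parent ** 1.5, sum(d ** 1.5 for d in d_daughters)

if __name__ == "__main__":
    r_m, r_i = 20000.0, 200.0          # Ohm cm^2, Ohm cm
    diam, length = 2e-4, 0.05          # 2 um diameter, 500 um long (in cm)
    lam = space_constant(r_m, r_i, diam)
    print(f"lambda = {lam*1e4:.0f} um, L = {electrotonic_length(length, lam):.2f}")
    print(rall_branch_match(2.0, [1.26, 1.26]))   # 2**1.5 ~ 2 * 1.26**1.5
```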
4.6.2. What did we learn from passive cable theory?
Passive cable theory suggested new experiments that enabled experimentalists to extract these key parameters from single electrode recordings at the soma. Eq. (22) implies that the time-course of the voltage response to a step current injection (and the voltage decay at the end of the current input) is not governed by a single exponential [326]. Voltage build-up as well as decay is faster than expected in an isopotential R-C circuit, in which the voltage time course is governed solely by the membrane time constant, τ. It is only at sufficiently long times that the voltage over the whole tree equalizes, that current flows only across the membrane (provided that the tree has homogeneous membrane properties [246]), and that the voltage decay rate is governed by τ. Rall showed that the values of the equalizing time constants τi, i ≥ 1, depend on the electrotonic length, L, of the dendritic tree, and that the voltage equalizes faster for electrotonically compact trees. He showed how L could be directly estimated from τ0 and τ1 through the formula (valid for a uniform cylinder)

$$L = \frac{\pi}{\sqrt{\tau_0/\tau_1 - 1}}.$$

Both τ0 and τ1 can be recovered experimentally from the transient voltage response to current injection at the soma by "peeling" exponentials [325,1000]. Using this "peeling" method, the passive membrane time constant of many central neurone types was shown to range between 5 and 100 ms (which implies that the specific membrane resistance, Rm, is between 5000 and 100,000 Ω cm², assuming that the specific capacitance is of the order of 1 µF/cm²). The typical electrotonic length, L, of dendrites was estimated to be on the order of 0.2-2 (see reviews in [44]). In particular it was shown that the average intrinsic electrotonic length of the profuse dendritic trees of spinal motoneurones (up to 1 mm) was of the order of 1-2λ [175]. We remark once more that these estimates of the intrinsic electrotonic length were obtained from preparations with little background synaptic activity, in particular from motoneurones of cats anaesthetized with Nembutal. Rall predicted the specific membrane resistance and the membrane time constant of spinal motoneurones by applying cable theory to a simplified model of the dendritic tree as an equivalent cylinder [262]. They were several times higher than the previous estimates of Eccles [327,328] who, relying on the experimental measurement of the input resistance at the soma and assuming that the length of dendrites was at least 3λ, thought that dendritic synapses had little impact on the soma. Rall's predictions were validated when detailed morphological studies of motoneurones by Burke [329] provided the data required to compute the input resistance at the soma as a function of the specific membrane conductance. Still, obtaining precise estimates of the specific membrane resistance using cable theory proved harder than initially believed. The above estimates were derived from recordings with sharp electrodes. This raised the issue of the leak around the electrode, which should be subtracted [321,330], and led to the suggestion that the specific conductance of the soma was perhaps higher than the specific conductance of dendrites [27]. Recent experiments on hippocampal neurones using perforated patch-clamp electrodes led to a revision of the estimates of the time constant and input
resistance of neurones [28]. Both quantities were found to be several times higher than previously thought. This led to the important conclusion that the passive membrane properties of the soma and dendrites were quite similar (a conclusion also reached for active membrane properties). This further emphasized the electrotonic dominance of dendrites with respect to the soma, which follows from the fact that the total dendritic membrane area is typically one order of magnitude larger than the area of the soma. Using the equivalent cylinder model of the dendritic tree, Rall showed that dendrites were electrotonically compact. On such a compact cylinder a single current input at a distal site will elicit a voltage response at the soma that is not strongly attenuated compared to the local response at the input location. However, on a tree, even on a relatively compact one, voltage is expected to decrease very steeply between a distal excitatory dendritic synapse and the soma, because of the "leaky" boundary conditions imposed on thin distal dendritic arbours by the thicker parent branches. The peak synaptic potential near the synaptic input site may then be two orders of magnitude larger than that observed at the soma. Still, a large portion of the electric charge injected by the distal synapse will reach the axo-somatic spike initiation region [238,331]. Clarifying this issue was one of the major successes of passive cable theory. Many of these early theoretical predictions can now be tested directly with paired electrode recordings from the soma and dendrites of the same cell, using infra-red DIC video microscopy [14]. The peak voltage attenuation and the degree of charge transfer from dendrites to soma can now be directly measured in in vitro preparations. Cable theory also introduced a method for estimating at which electrotonic distance, Xin, from the soma synapses were located. The rise time and half-width of a synaptic potential at the soma increase with Xin [319] (see also [154]). This change in the "shape indices" of the somatic EPSP can be used for estimating both Xin and the time course of the synaptic input. A brief rise time and a relatively narrow EPSP indicate a perisomatic origin, whereas a slow rise time and a broad EPSP point to distal input sites. This theoretical tool was first used in spinal α motoneurones, to determine the location of excitatory Ia synapses. They were estimated to be widely distributed from the soma to rather distal locations at 1.5λ from the soma. The time course of the Ia synaptic current was estimated to be very brief, on the order of a millisecond. These predictions were later confirmed by experiments combining the morphological determination of synaptic location and the electrophysiological recording of the corresponding EPSP at the soma [332]. We note that such comparisons require detailed morphological models of dendrites.
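The "peeling" of exponentials and Rall's formula for L, described at the beginning of this subsection, can be illustrated on synthetic data with a few lines of Python. The two-exponential decay below is a toy stand-in for a somatic voltage transient, and all numbers are illustrative only.

```python
# Rough sketch of exponential "peeling" applied to a simulated somatic voltage
# decay V(t) = A0*exp(-t/tau0) + A1*exp(-t/tau1). The late phase gives tau0;
# subtracting it and refitting the early residual gives tau1; Rall's formula
# L = pi / sqrt(tau0/tau1 - 1) then yields the electrotonic length
# (here pi/2 ~ 1.6 by construction).
import numpy as np

t = np.linspace(0.0, 100.0, 2001)                    # ms
tau0, tau1 = 20.0, 4.0                               # ms
v = 10.0 * np.exp(-t / tau0) + 5.0 * np.exp(-t / tau1)

# 1) fit the slow component on the late part of the decay (log-linear fit)
late = t > 40.0
slope0, intercept0 = np.polyfit(t[late], np.log(v[late]), 1)
tau0_est = -1.0 / slope0

# 2) "peel" it off and fit the remaining fast component on the early part
residual = v - np.exp(intercept0) * np.exp(-t / tau0_est)
early = (t < 10.0) & (residual > 0)
slope1, _ = np.polyfit(t[early], np.log(residual[early]), 1)
tau1_est = -1.0 / slope1

L_est = np.pi / np.sqrt(tau0_est / tau1_est - 1.0)
print(f"tau0 ~ {tau0_est:.1f} ms, tau1 ~ {tau1_est:.1f} ms, L ~ {L_est:.2f}")
```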
4.6.3. Are dendrites really divided into functional compartments?
The appealing idea of functional dendritic subunits must be considered with some caution:
• Sub-units are defined on quantitative criteria. There is no sharp limit at which subunits can be rigorously defined as electrically uncoupled regions;
• There are two important space scales in a dendritic tree: the local space constant λ, that characterizes the range of synaptic effects (voltage spread and shunt effect),
and the electrotonic length L, that defines the electrotonic distance between distal dendrites and soma, and between different distal dendritic branches. The intrinsic electrotonic length of dendrites is estimated to be in the range 0.2λ-2λ. Therefore the two space scales may really differ only in thin distal dendrites where the local space constant is small;
• Background synaptic activity may drastically change the electrotonic architecture of dendrites (see Fig. 26). A priori it does not affect the dendritic tree uniformly. The notion of functional subunit can make sense only if strong synaptic activity on a compartment does not change its electrotonic length so much that it can no longer operate as an integrated unit, and provided that the quasi-isolation from the other subunits is preserved, whatever the synaptic activity. One may argue that functional compartmentalization is flexible and adapts itself to the physiological conditions, but this remains unsubstantiated.
• The distribution of synaptic inputs on the dendritic trees should match the electrotonic compartmentalization, so that each functional unit is involved in the processing of synaptic inputs associated with given pathways and local operations make sense physiologically. Synaptic inputs from different pathways often impinge on dendrites at different distances from the soma (see [307] for a review). Still there are examples of widely distributed synaptic inputs (for instance, Ia synapses on
Fig. 26. Effect of spontaneous background synaptic activity on the electrotonic structure of dendrites. In this graph, the morpho-electrotonic transform, MET [261], is used to depict the effective electrotonic structure of the dendritic tree as seen from the soma (centrifugal direction). In the absence of background synaptic activity the neurone is electrically compact. However, when each of the 4000 excitatory and 1000 inhibitory modelled synapses is activated randomly at an average frequency of 3 Hz, the neurone becomes electrically extended. The scale bar corresponds to the distance over which steady voltage attenuates by a factor of e while spreading from the soma towards the dendritic tips. Modified from [265].
spinal α motoneurones [333]), and evidence of segregation of inputs on different subtrees is poor (still, mitral cells, which receive sensory information in glomeruli, provide an example). Therefore a strict segregation of inputs on dendritic trees cannot be considered the general rule;
• The usual concept of functional compartments can be easily extended to incorporate weak active membrane properties. However a redefinition of functional compartments is necessary to take into account strongly non-linear events, such as dendritic spikes. Indeed, dendritic spikes may be easily transmitted through a branch point but fail to be transmitted in the reverse direction through the same branch point if impedances are mismatched. Such an asymmetry is not displayed by passive functional compartments, the transfer resistance being invariant under the exchange of the input and recording locations. A possible extension could be to define a functional compartment as the subtree invaded by the dendritic spikes generated at a given location.
Considering all these caveats, the idea of a functional compartmentalization within a dendritic tree certainly makes sense when the distal arbours consist of separate morphological subunits such as glomeruli, each of them electrotonically compact and receiving inputs from a specific source (see Section 3.3 for a similar situation on axones; note however that in this latter case the physiological input consisted of the afferent action potentials, and axo-axonic synapses controlled its distribution to terminals). Imaging of calcium ion accumulation in the apical dendrites of neocortical pyramidal cells also hints at the existence of two separate functional compartments: the main apical dendrite, and the distal apical tuft [334]. This distinction makes sense as the primary dendrite certainly has a dedicated function, such as efficiently transmitting to the soma the net current resulting from the integration of synaptic inputs in the distal tuft. However it remains unwarranted that most dendritic trees are organized in functional compartments actually utilized by the neurone for a sophisticated processing of synaptic inputs in physiological conditions. Spinal motoneurones, for instance, seem to provide a counter-example: several profuse and balanced dendritic trees with similar morphology radiate from their soma, on which synaptic inputs seem to be poorly segregated.
4.6.4. Active membrane properties and synaptic integration
In view of the low estimates for the density of voltage-dependent channels on dendrites, it seems reasonable to think that weak synaptic inputs might "sum" according to the principles first brought to light by Rall, although passive integration is certainly modified by the membrane non-linearities. In contrast, strong synaptic input may trigger dendritic spikes. These spikes might, for instance, signal to the soma the synchronous activation of synaptic inputs, an idea that underlies B. Mel's recent work on input classification [259,335]. Pushing to the limit the speculations found in recent experimental and numerical papers, one might then propose that non-linear dendrites display two complementary modes of synaptic integration. In the first mode, best suited to integrating moderate and asynchronous inputs, synaptic excitations would add their effects linearly and independently of their location to deliver a net depolarizing
current with little high-frequency content to the axo-somatic region (weakly non-linear mode), membrane non-linearities just compensating for the passive attenuation of electric signals. In the second mode strong synchronous events would be detected and signalled to the soma by dendritic spikes, eliciting in the spike generation zone the sharp voltage transients known as fast prepotentials (strongly non-linear mode). Still, such speculations are unwarranted and might be mere fantasies. How could a linear behaviour of dendrites arise from their non-linear properties? What are these "weak non-linearities", nonetheless able to have a regenerative effect on back-propagating spikes and even to elicit fast or slow dendritic spikes? All these questions urgently need to be put back on firmer mathematical and physical grounds. Moreover the experiments presently feasible in vitro on dendrites might mislead us on the actual operating principles of neurones in physiological conditions. Still, with modern refined imaging techniques [15,336] combined with theoretical studies of model dendrites, the dream of understanding synaptic integration is hopefully closer to coming true than we imagine.

4.6.5. Is passive cable theory still relevant for active dendrites?
Passive cable theory provides the skeleton upon which the more complicated non-linear case, in which the membrane conductance is voltage-dependent and synaptic inputs are transient conductance changes rather than linear current sources, is built. It is often erroneously believed that active membrane properties completely change the behaviour of the neuronal membrane. However active properties may have a mere quantitative impact, without giving rise to qualitatively new phenomena on dendrites. This may well be the case in many physiological situations, if we keep in mind that dendrites do not seem endowed with a high density of voltage-dependent channels and that background synaptic activity increases the relative importance of linear membrane properties compared to non-linear properties. In such cases one may hope to understand how active properties modify the behaviour of passive dendrites by a perturbative approach. Moreover passive properties are still relevant, even if strongly non-linear events occur. The best example of that is supplied by spike conduction on axones, the velocity of which is largely determined by the passive properties of the membrane.

4.6.6. How can we speak of spike propagation on compact dendrites?
The very notion of spike conduction is relevant only for non-compact dendrites, all the more as spikes depolarize a wide region, the width of which exceeds one space constant (see the discussion of axonal spike width in Section 3.4). If the length of the cable is of the order of its space constant (which is particularly true for the basal tree of cortical pyramids when only the intrinsic passive membrane properties are considered), then relatively little passive attenuation will occur between the proximal and distal ends, boundary conditions will have a strong impact on the voltage transients, and the distinction between passive diffusion and regenerative conduction will be minute. This is not so if the electrotonic length becomes one order of magnitude larger than the space constant, due to intense background synaptic activity. In this case, strong passive attenuation will occur from proximal to distal
ends, boundary effects will be limited to relatively short proximal and distal regions of the cable, spikes will be spatially narrower, and there will be a clear difference between passive spread and non-decremental conduction. In the former case dendrites will behave as a limited medium largely dominated by finite-length effects; in the latter they will really behave as an extended medium. This distinction made no sense on the long and highly excitable axones. Moreover, as the density of fast sodium channels on dendrites is low, a large increase in passive membrane conductance due to synaptic shunt may simply preclude any genuine regeneration of the action potentials. Back-propagation would then be much closer to a passive diffusion (mitigated by a small boosting effect of sodium conductances) than to a non-decremental conduction with a definite velocity. Therefore phenomena clearly observed in vitro might have no significant counterpart in neurones in vivo.
4.6.7. What is the function of spike back-propagation into the dendrites?
The functional implications of spike back-propagation remain obscure. How do sodium spikes, and the fast-kinetics transient and persistent sodium and potassium channels necessary for their regeneration, affect synaptic integration? Firstly, we note that the low density of dendritic sodium channels ensures that the dendritic tree will continue to integrate the many synaptic inputs it receives, without the activation of local sodium action potentials that would spread over the whole dendritic tree and reset its membrane potential. At the same time fast voltage-dependent currents may help to boost local synaptic inputs and compensate for passive attenuation (see Section 4). Secondly, the invasion of sodium spikes into the dendritic tree lasts only a few milliseconds, so that most of the time dendrites will not "feel" the direct effects of axo-somatic spikes. Temporal coincidence of synaptic inputs and the back-propagating spike within this brief time-window is required to give rise to a supra-linear voltage response in dendrites [135] (see Fig. 27). At a longer time scale, the back-propagating spikes were shown to act indirectly by the activation of slow (in the seconds range) calcium dynamics in dendrites and spines. This may provide a signal for triggering Hebbian synaptic plasticity mechanisms in the dendrites [315,337,338]. Still, one may wonder how distal synapses, located outside the region invaded by the back-propagating spike, can be affected.

4.6.8. Active cable theory
In view of the diversity of dendritic ionic currents in terms of reversal potential, steady-state activation and inactivation, time constants, and spatial distribution, the very idea of a non-linear cable theory may seem meaningless. Non-linear reaction-diffusion equations cannot be analysed as easily as linear diffusion equations, and, at variance with the passive case, all the non-linear properties probably cannot be incorporated into one comprehensive framework. Accordingly very few general results have yet been established on non-linear cables. We can just mention the generalisation of the concept of the equivalent cylinder to the non-linear case [248].
Fig. 27. A calcium spike in the dendrites elicits a burst of sodium action potentials in the axone. (A) Reconstructed pyramidal neurone with schematics showing the sites of the electrode recordings (soma in black; blue, 400 µm and red, 770 µm from the soma, respectively). Scale bar = 200 µm. (B) Current injection (Istim) via the distal electrode on its own results in a subthreshold depolarisation at both the input location (red trace, Vm) and the soma. (C) Step current injection into the soma gives rise to an action potential at the soma (black trace) which then propagates with decreased amplitude into the dendrite (blue and red traces). (D) Combining the injections in B and C, separated by 5 ms, evoked a burst of three Na+ action potentials following the onset of a broad Ca2+ spike in the distal dendrite (red). (E) A distal dendritic Ca2+ action potential can also be initiated by a stronger current input alone via the distal electrode. From [135], by permission of Macmillan Magazines.

In addition, non-linear cable equations can rarely be solved analytically and numerical methods must be applied. Still, we know well from Physics that large classes of non-linear partial differential equations may display the same behaviour: just think of fully integrable equations that exhibit solitons. Moreover some important general issues that have been raised can probably be best addressed from a theoretical viewpoint: Can a heterogeneous distribution of membrane non-linearities exactly compensate for passive attenuation, so that individual inputs have the same effect at the soma independently of their location? Can they add their effects linearly? Do calcium spikes and back-propagating sodium spikes travel along dendrites without interfering at all with the passive integration of weak inputs? The answer to all these questions is obviously no, as non-linear equations cannot exhibit a linear behaviour. But understanding if these ideas
make any sense at all, to some degree of approximation and within some physiologically relevant operating regime, will certainly be best achieved by analysing theoretically the properties and the behaviour of simple non-linear cable equations.
5. Conclusion
The canonical neurone described in most textbooks is grounded in the histological observations of Ramón y Cajal: inputs are added in the dendritic arborization, action potentials are fired by the perisomatic region of the neurone, and they are transmitted and distributed by the axone. Sections 2, 3 and 4 made clear that bifurcation theory, singular perturbation theory, and passive cable theory have unravelled basic physical mechanisms underlying the operation of these three regions. Is it enough to piece together three subunits with distinct geometric and electric properties to understand the operating principles of neurones? The last two sections cast some doubt on that. Even if they deliver output spike trains to their targets in response to synaptic inputs, neurones are not simple polarised devices where the input of each region is determined by the output of the region immediately upstream. The back-propagation of axonal spikes into the soma and dendrites, where they might interfere with synaptic integration, is good evidence of that. In addition, the elaboration of the output signal of the neurone may involve all the morphological subunits of the neurone: somato-dendritic integration of synaptic inputs through linear and non-linear processes, generation and patterning of action potential trains in the axo-somatic region, and selective distribution of these action potentials to the post-synaptic neurones in the axonal arborization. Therefore it appears that the old view that the central nervous system processes information in a feed-forward fashion must be revised, not only at the level of networks, where recurrent connections inside neural centres and feed-back between areas were progressively shown to play an important role, but also at the elementary level of the single cell. Furthermore the behaviour of neurones may change depending on their working environment, that is, on the physiological state of the animal and the task performed, due to variations of the average synaptic input or to neuromodulation. In the first case, the balance between the different voltage-dependent currents is modified by shifting the operating point of the neurone, while in the second case the very properties of the voltage-dependent currents are affected. Conversely, neurones may be stabilized against perturbations of this environment by slow plastic processes that modify the density of voltage-dependent channels. All this points to the fact that we must now address the neurone as an integrated unit, understand what its dynamical states are, and relate them to the operations performed by the neurone in physiological conditions. How should this issue be tackled? As shown in this chapter, many conceptual advances on the operation of the neurone made in the past have relied on mathematically simple models, not on detailed models of neurones trying to incorporate all the geometric and electric complexity of neurones revealed by experiments. There are good reasons
for that. Firstly, detailed models depend on a huge number of parameters, specifying the morphology of the neurone, the density of ion channels, and their kinetic properties, which must be correctly set. This difficult process was well described by L. Borg-Graham for hippocampal pyramidal cells [339]. Most parameters are not readily available from experiments; not infrequently even the order of magnitude of some quantities is unknown. One understands why so many experimentalists still think that a detailed model may display any desired behaviour provided that the parameters are carefully set for that. Secondly, detailed models are not analytically tractable, and it is very hard to identify on the basis of a multi-parameter numerical study which parameters are the most relevant and how they affect the behaviour of the model. Thirdly, detailed models implicitly rest on the notion of a paradigmatic neurone that would embody the main features of a given cell type. Frequently parameters are finely tuned to replicate the behaviour of one cell as revealed by experiments, and it becomes difficult with such a starting point to analyse the qualitative and quantitative variations observed in the behaviour of neurones throughout a population. In contrast, analytical models depend on few parameters. They enable us to pinpoint the key parameters of a problem, to analyse how they control the qualitative behaviour of the system studied, and to understand the impact of heterogeneities in populations. One is freed from the problem of inferring operating principles of some generality from particular cases. Still, many experimentalists feel much more at ease with detailed models, for the understandable reason that they seem more realistic. Detailed models look like real neurones in figures, and this familiarity is comforting. The fine grain of the description satisfies the feeling that everything in neurones was exquisitely designed by evolution and tuned by plastic processes to optimise performance. But a sensitive dependence of the model on the exact values of parameters, which just reflects irrelevant individual variations inside a population of neurones, is then too often mistaken for fine information processing capabilities of the neurone. Or the capability of the model to account for some striking figure is considered as a proof of its explanatory and predictive power, whereas it is a trivial consequence of the fine tuning of parameters. The operating principles of neurones, and in particular the nature of synaptic integration, are still puzzling us. Many issues have not been deeply addressed yet: non-linear cable theory, the functional compartmentalization of dendrites, interactions between the dendrites and the axo-somatic region, etc. They cannot be studied in the framework of lumped models of the neurone, and confront us with transient behaviours, beyond the scope of most analytic methods. Still we believe that the most interesting theoretical results on these questions will be derived from relatively simple models, amenable to semi-analytic studies or to numerical simulations with a small number of control parameters. If the optimistic prediction that such models will give us good insights into the dynamical states of neurones, and into their relationship to the context within which neurones are operating, is fulfilled, it will still leave wide open the issue of how the dynamical states of neurones are used by the nervous system to perform physiological functions.
This question is beyond the scope of Neurophysics, which mainly
aims at understanding what dynamic phenomena may arise in neuronal structures. Bridging the gap between dynamics and function is a priori the ambition of so-called Computational Neuroscience. This other theoretical approach tackles the structure/function problem in another, top-down rather than bottom-up, way. It considers neuronal systems not as physical systems but as information processing devices, and accordingly it relies on other concepts (logical operations, algorithms, information, etc.) than Neurophysics does. Unsurprisingly it developed, starting from the 1940s, together with cybernetics, computer science and Artificial Intelligence. In his book Vision [340], David Marr distinguished three levels at which neuronal systems could and should be studied: the computations performed by the system studied, the algorithms underlying those computations, and their concrete biophysical implementations. Still, at variance with the systemic level, very few clear examples of computations can be given at the cellular level, beyond the mere transduction of physical signals by sensory neurones and some cases where neurones clearly work as feature or coincidence detectors. In the framework of visual perception, nice ideas have been proposed on the possible role of single neurones in directional selectivity [269], but they remain poorly substantiated by experimental evidence [341]. In fact it seems difficult to assign well-defined computational capabilities to most neurones, and it is not granted that the very notion of function makes much sense at the single cell level. As a consequence many studies at the cellular level that explicitly refer to Computational Neuroscience have focused on the transfer function of neurones, determining general "logical" or "computational primitives" that might be widely used by neurones [323], or characterizing quantitatively the input-output properties of neurones, for instance with particular information-theoretic measures [342]. Quite frequently also they have investigated the dynamical behaviour of neurones on the basis of the biophysical substrate constituted by passive and active properties, with little emphasis on function and computations. Then the essence of the computational approach is largely forgotten and distinguishing it from Neurophysics becomes a byzantine debate.

Acknowledgements
We would like to thank Drs. Lyle Borg-Graham, David Hansel, Léna Jami and Daniel Zytnicki for careful reading of the manuscript and many useful comments. We are also indebted to Mr. Michael London and to Prof. Yosef Yarom for many insightful discussions.
Appendix: The Hodgkin-Huxley model

A.1. Derivation of the Hodgkin-Huxley equations
The evolution of the membrane potential, V, in the space clamp situation studied by Hodgkin and Huxley is governed by the balance of the different currents flowing through the membrane, that is, the capacitive and leak currents that describe the
passive voltage-independent membrane properties and the two voltage-dependent currents. Namely, writing down that the different currents flowing through the unit surface of membrane should compensate to fulfil the requirement of charge conservation:

$$C_m \frac{dV}{dt} + I_{leak} + I_{Na} + I_K = I_{electrode}.$$

Hodgkin and Huxley showed that this equation held by comparing the results of voltage-clamp experiments, where V is kept constant and accordingly the capacitive current Cm(dV/dt) required to charge the membrane is absent, and the results from current-clamp experiments, where Ielectrode is held constant [3]. The expression of the leak current is simple as it just accounts for ohmic ion flow through the membrane:

$$I_{leak} = G_{leak}(V - V_{leak}),$$
where Gleak is the leak specific conductance (per unit area) and, by definition, Vleak is the reversal potential of the current. This leak current of poorly known origin is responsible for the relaxation (with time constant τ = Cm/Gleak) of small voltage perturbations with respect to the resting membrane potential Vrest. Hodgkin and Huxley could study the potassium current separately from the sodium current by substituting choline for sodium ions in the extracellular medium (specific blockers of the sodium channels like TTX were not available at the time). This enabled Hodgkin and Huxley to unravel the kinetics of the two voltage-dependent currents involved through a series of voltage-clamp experiments where the membrane potential is increased from Vrest to some other holding value V. The persistent potassium current is not ohmic: its specific conductance is but a fraction of the maximal conductance, GK, and this fraction increases with membrane depolarization in steady-state situations. Moreover, when the axone is submitted to a voltage step it takes some time for the potassium current to relax to its new steady-state value. To account for the properties of this delayed rectifier current Hodgkin and Huxley introduced a phenomenological activation variable, n, that follows first-order kinetics:

$$\tau_n(V)\,\frac{dn}{dt} = n_\infty(V) - n.$$
As the relaxation of the conductance to its steady-state value was not exponential in time, they had to introduce a non-linear dependence of the conductance on the activation variable n,

$$I_K = G_K\, n^4\, (V - V_K),$$

to fit the experimental data and determine the maximal conductance, GK, the steady-state activation, n∞(V), and the activation time constant, τn(V). They analysed in a similar way the inward sodium current. However, the transient nature of this current led them to introduce an inactivation variable, h, in addition to the activation variable, m:
$$I_{Na} = G_{Na}\, m^3 h\, (V - V_{Na}),$$
$$\tau_m(V)\,\frac{dm}{dt} = m_\infty(V) - m,$$
$$\tau_h(V)\,\frac{dh}{dt} = h_\infty(V) - h.$$
If the voltage is increased from Vrest to some higher value V0, the sodium current first increases, due to the faster activation process, and then decreases due to the slower inactivation. To determine both activation and inactivation kinetics, Hodgkin and Huxley performed voltage-clamp experiments with two successive holding potentials. For instance, clamping the axone first, for a sufficient time, at a hyperpolarized voltage V0 before imposing a depolarized voltage V1 ensured that full de-inactivation occurred before the second step, and enabled Hodgkin and Huxley to study the fast activation kinetics at voltage V1 in isolation, before the slower inactivation interfered with it. The later decay of the sodium current then gave them access to the inactivation kinetics at voltage V1. After all conductances, reversal potentials and kinetic parameters were determined, Hodgkin and Huxley integrated numerically the system of four non-linear equations they had established, demonstrated that it exhibited solutions similar to an action potential, and performed successful quantitative comparisons of these solutions with the voltage data obtained in current-clamp experiments. Many variants of the Hodgkin-Huxley model were later proposed, with more complex kinetic schemes, to reproduce experimental data still more accurately. These improved models of spike regeneration on the squid axone all share the same basic features, already present in the original Hodgkin-Huxley equations, namely a fast activating transient sodium current responsible for the upstroke of the action potential, and a slower persistent potassium current that repolarizes the membrane with a delay.
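As an illustration of the numerical integration mentioned above, the following sketch integrates the four Hodgkin-Huxley equations with a simple forward Euler scheme. The rate functions and parameters are the standard squid-axon values expressed with a resting potential near -65 mV; this is a didactic reconstruction, not Hodgkin and Huxley's original computation.

```python
# Minimal numerical integration of the Hodgkin-Huxley equations above
# (space-clamped membrane, standard squid-axon parameters shifted so that the
# resting potential is near -65 mV). Forward Euler with a small time step.
import numpy as np

C_m = 1.0                                   # uF/cm^2
G_Na, G_K, G_leak = 120.0, 36.0, 0.3        # mS/cm^2
V_Na, V_K, V_leak = 50.0, -77.0, -54.4      # mV

def a_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def b_n(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)
def a_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def b_m(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def a_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def b_h(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))

dt, T = 0.01, 50.0                          # ms
V = -65.0
m, h, n = 0.05, 0.6, 0.32                   # near the resting steady state
for step in range(int(T / dt)):
    I_e = 10.0 if step * dt > 5.0 else 0.0  # step current, uA/cm^2
    I_Na = G_Na * m**3 * h * (V - V_Na)
    I_K = G_K * n**4 * (V - V_K)
    I_L = G_leak * (V - V_leak)
    V += dt * (I_e - I_Na - I_K - I_L) / C_m
    m += dt * (a_m(V) * (1.0 - m) - b_m(V) * m)
    h += dt * (a_h(V) * (1.0 - h) - b_h(V) * h)
    n += dt * (a_n(V) * (1.0 - n) - b_n(V) * n)
    if step % 500 == 0:
        print(f"t = {step*dt:5.1f} ms   V = {V:7.2f} mV")
```

With the step current switched on at 5 ms the model fires repetitively, reproducing the qualitative behaviour described in the text: a fast sodium-driven upstroke followed by a delayed potassium-driven repolarization.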
A.2. Gating theory

Hodgkin and Huxley's gating theory proposes a microscopic explanation for the non-linear membrane properties underlying the action potential. It may be summarized as follows (see also [343]). Let us consider first the delayed rectifier current. The first-order kinetic equation (6) describing its activation process can be rewritten as:
$$\frac{dn}{dt} = a_n(V)\,(1 - n) - b_n(V)\,n, \qquad (1)$$

where the so-called rate functions an(V) and bn(V) are defined by

$$n_\infty(V) = \frac{a_n(V)}{a_n(V) + b_n(V)}, \qquad \tau_n(V) = \frac{1}{a_n(V) + b_n(V)}.$$
Eq. (1) can be interpreted as describing a two-state channel randomly oscillating between a closed state and an open state. The activation variable n is then the probability of finding the channel in the open state at a given time. The rate functions an and bn are readily interpreted as the mean transition frequencies from the closed to the open state and vice versa. Hodgkin and Huxley suggested that the proteins constituting the ionic channel were prone to spontaneous conformational changes due to the motion of some charged region of these molecules. This part of the protein would act as a "gating particle" that would open or close the channel. Moreover, since the gating particle is charged, the transition frequencies will depend on the electric field in the channel, that is, on the voltage drop across the membrane. The transitions of the gating particle between its two stable states correspond to crossings of a free energy barrier under the effect of thermal fluctuations. This can be described by Boltzmann statistics and, if one assumes that the electric field is uniform all along the channel, this readily leads to an exponential dependence of the rate functions on the membrane potential (see [343] for a fuller account) and to a sigmoid shape of the steady-state activation and inactivation functions, in agreement with experimental data. Hodgkin and Huxley postulated that the potassium channel was controlled by four independent such gating particles. The probability of finding the channel in the open state is then n⁴, and the average conductance of the (isopotential) unit surface of membrane is GK n⁴, where the maximal specific conductance GK is the unitary channel conductance times the density of channels. This readily explained why n appeared with an integer power in the Hodgkin-Huxley equations. This gating theory may be extended to inactivating currents, such as the transient sodium current. One then needs to postulate the existence of two independent types of gating particles. Three independent identical particles of the first type display fast transitions and describe the activation process. A single gating particle of the second type controls the slower inactivation process. In this gating particle formalism a non-inactivating channel may present only two elementary states, open and closed. This occurs when a single particle controls the channel, and the probability of finding the channel in the open state is merely equal to the value of the activation variable. In contrast, an inactivating channel cannot present fewer than four elementary states, two of them corresponding to inactivated states. Opening the channel requires not only that it is activated but also that it has recovered from inactivation. This slower de-inactivation is the rate-limiting factor in the repetitive discharge of the squid giant axone. The broad lines of Hodgkin and Huxley's gating theory were substantiated over the years by single channel recordings, which give access to many channel kinetic parameters (unitary currents, latency, mean open time, transitions between states, etc.), by recordings of gating currents, and by the elucidation of the molecular structure of ionic channels, starting from the mid-1960s. These new techniques also pointed out some discrepancies between the Hodgkin-Huxley gating theory and the actual behaviour of ionic channels. In particular they showed that the deactivation of sodium channels is slower than predicted by the Hodgkin-Huxley model and that the actual
gating properties of these channels cannot be accurately described by four independent gating particles [344,345]. However the impact of these studies at the molecular level on our understanding of neural excitability must not be overrated. Most sodium, potassium and calcium currents at the origin of the non-linear properties of the neuronal membrane were identified on the basis of intra-cellular recordings (current and voltage clamps) in combination with the use of specific channel blockers and appropriate modifications of the ionic concentrations. The transient A-type potassium current was identified in this way on crustacean axones by Connor et al. [64]. The non-inactivating (persistent) sodium current, the slow calcium-dependent potassium currents, and low and high threshold calcium currents were similarly brought to light by recordings in the soma with sharp electrodes. Whole-cell clamp electrodes enabled experimentalists more recently to estimate membrane conductances accurately [28], and patch clamp experiments now give us some access to the local properties of the dendritic membrane (at least in the apical trunk of pyramidal cells). In comparison, we have not yet reached a full understanding of the molecular mechanisms underlying the operation of the sodium and delayed rectifier channels. While studies at the molecular level are fascinating in their own right, they still tell us little about what happens at the more integrated level of the neuronal membrane.
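To make the two-state channel picture of Section A.2 concrete, the toy Monte Carlo simulation below lets a population of independent channels flip between closed and open states with voltage-dependent rates of Boltzmann (exponential) form; these rate functions are illustrative choices, not the measured squid-axon rates. The ensemble-averaged open fraction relaxes towards n∞ = an/(an + bn) with time constant τn = 1/(an + bn), as Eq. (1) predicts.

```python
# Toy Monte Carlo simulation of N independent two-state channels, each flipping
# between a closed and an open state with voltage-dependent rates a_n(V), b_n(V).
# The exponential rate functions are illustrative, not fitted to any data.
import numpy as np

rng = np.random.default_rng(0)

def a_n(V): return 0.1 * np.exp(V / 40.0)     # closed -> open rate (1/ms)
def b_n(V): return 0.1 * np.exp(-V / 40.0)    # open -> closed rate (1/ms)

N, dt, T, V = 1000, 0.05, 50.0, 20.0          # channels, ms, ms, mV (voltage step)
open_state = np.zeros(N, dtype=bool)          # all channels start closed
steps = int(T / dt)
p_open = np.empty(steps)
for k in range(steps):
    r = rng.random(N)
    # closed channels open with probability a_n*dt, open ones close with b_n*dt
    open_state = np.where(open_state, r >= b_n(V) * dt, r < a_n(V) * dt)
    p_open[k] = open_state.mean()

# The ensemble average should settle near n_inf = a_n/(a_n+b_n).
print(f"simulated steady state ~ {p_open[-500:].mean():.3f}, "
      f"n_inf = {a_n(V)/(a_n(V)+b_n(V)):.3f}")
```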
References 1. Ramon y Cajal, S., Pasik, P. (translator) and Pasik, T. (translator) (1998) Texture of the Nervous System of Man and the Vertebrates: An Annotated and Edited Translation of the Original Spanish Text (1899-1904). Springer, Vienna, New York. 2. Shepherd, G.M. (1991) Foundations of the Neuron Doctrine. Oxford University Press, New York, Oxford. 3. Hodgkin, A.L. and Huxley, A.F. (1952) A Quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. (London) 117, 500-544. 4. Tasaki, I. (1959) Conduction of the nerve impulse, in: Handbook of Physiology, Vol. 1: Neurophysiology, ed J. Field. pp. 75-121. American Physiological Society, Bethesda. 5. Schwindt, P.C. and Crill, W.E. (1982) Factors influencing motoneuron rhythmic firing: results from a voltage clamp studies. J. Neurophysiol. 48(4), 875-890. 6. Schwindt, P.C. and Crill, W.E. (1984) Membrane properties of cat spinal motoneurones, in: Handbook of the Spinal cord, Chapter 6, ed R. Davidoff. pp. 199-242. Marcel Dekker, New York, Basel. 7. Schwindt, P.C. (1992) Ionic currents governing input-output relations of Betz cells, in: Single Neuron Computation, eds T. McKenna, J. Davis and S.F. Zornetzer. pp. 235-258. Academic Press, New York. 8. Crill, W.E. and Schwindt, P.C. (1983) Active currents in mammalian central neurones. Trends Neurosci. 6, 236-240. 9. Llin~s, R. and Sugimori, M. (1980) Electrophysiology properties of in vitro Purkinje cell dendrites in mammalian cerebellar slices. J. Physiol. (London) 305, 197-213. 10. Coombs, J.S., Curtis, D.R. and Eccles, J.C. (1957) The interpretation of spike potentials of motoneurones. J. Physiol. (London) 139, 198-231.
11. Henneman, E., Luescher, H.-R. and Mathis, J. (1984) Simultaneously active and inactive synapses of single Ia fibres on cat spinal motoneurones. J. Physiol. (London) 352, 147-161. 12. Sherrington, C.S. (1961) The Integrative Action of the Nervous System. 1961 Edn., Yale University Press, New Haven. 13. Redman, S.J. (1976) A quantitative approach to integrative function of dendrites, in: International Review in Physiology: Neurophysiology II, Vol. 10, ed R. Porter. University Park Press, Baltimore, MD. 14. Stuart, G.M., Dodt, H.U. and Sakmann, B. (1993) Patch-clamp recordings from the soma and dendrites of neurons in brain slices using infrared video microscopy. Pfluegers Arch. 423, 511518. 15. Svoboda, K., Denk, W., Kleinfeld, D. and Tank, D.W. (1997) In vivo dendritic calcium dynamics in neocortical pyramidal neurons. Nature 385(6612), 161-165. 16. Helmchen, F., Svoboda, K., Denk, W. and Tank, D.W. (1999) In vivo dendritic calcium dynamics in deep-layer cortical pyramidal neurons. Nat. Neurosci. 2(11), 989-996. 17. Tsien, R.Y. (1989) Fluorescent probes of cell signaling. Ann. Rev. Neurosci. 12, 227-253. 18. Denk, W., Strickler, J.H. and Webb, W.W. (1990) Two-photon laser scanning fluorescence microscopy. Science 248(4951), 73-76. 19. Denk, W., Delaney, K.R., Gelperin, A., Kleinfeld, D., Strowbridge, B.W., Tank, D.W. and Yuste, R. (1994) Anatomical and functional imaging of neurons using 2-photon laser scanning microscopy. J. Neurosci. Methods 54(2), 151-162. 20. Mainen, Z.F., Maletic-Savatic, M., Shi, S.H., Hayashi, Y., Malinow, R. and Svoboda, K. (1999) Two-photon imaging in living brain slices. J. Neurosci. Methods 18(2), 231-239. 21. Sharp, A.A., O'Neil, M.B., Abbott, L.F. and Marder, E. (1993) Dynamic clamp: computer-generated conductances in real neurons. J. Neurophysiol. 69(3), 992-995. 22. Deuchars, J., West, D.C. and Thomson, A.M. (1994) Relationships between morphology and physiology of pyramid-pyramid single axon connections in rat neocortex in vitro. J. Physiol. (London) 478(3), 423-435. 23. Markram, H., Lubke, J., Frotscher, M., Roth, A. and Sakmann, B. (1997) Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex. J. Physiol. (London) 500(2), 409-440. 24. Mainen, Z. and Sejnowski, T. (1998) Modeling active dendritic processes in pyramidal neurons, in: Methods in Neuronal Modeling: From Ions to Networks, 2nd Edn., Chapter 5, eds C. Koch and I. Segev. pp. 171-210. MIT Press, Cambridge, MA. 25. Magee, J.C. (1999) Voltage gated ion channels in dendrites, in: Dendrites, eds G. Stuart, N. Spruston and M. Hausser, pp. 139-160. Oxford University Press, Oxford. 26. Stuart, G.M. and Sakmann, B. (1994) Active propagation of somatic action potential into neocortical pyramidal cell dendrites. Nature 367, 69-72. 27. Durand, D. (1984) The somatic shunt cable model for neurones. Biophys. J. 46, 645-653. 28. Spruston, N. and Johnston, D. (1992) Perforated patch-clamp analysis of the passive membrane properties of three classes of hippocampal neurons. J. Neurophysiol. 67(3), 508-529. 29. Colbert, C.M. and Johnston, D. (1996) Axonal action-potential initiation and Na + channel densities in the soma and axon initial segment of subicular pyramidal neurons. J. Neurosci. 16(21), 6676-6686. 30. Mainen, Z.F. and Sejnowski, T.J. (1995) Reliability of spike timing in neocortical neurones. Science 268, 1503-1508. 31. Turrigiano, G.G., Abbott, L.F. and Marder, E. 
(1994) Activity-dependent changes in the intrinsic properties of cultured neurones. Science 264, 974-977. 32. Colbert, C.M. and Johnston, D. (1998) Protein kinase C activation decreases activity-dependent attenuation of dendritic Na + current in hippocampal CA1 pyramidal neurons. J. Neurophysiol. 79(1), 491--495. 33. Wagner, S., Castel, M., Gainer, H. and Yarom, Y. (1997) GABA in the mammalian suprachiasmatic nucleus and its role in diurnal rhythmicity. Nature 387(6633), 598-603.
34. Borg-Graham, L.J., Monier, C. and Fr6gnac, Y. (1996) Voltage-clamp measurement of visuallyevoked conductances with whole-cell patch recordings in primary visual cortex. J. Physiol. (Paris) 90, 185-188. 35. Destexhe, A. and Park, D. (1999) Impact of network activity on the integrative properties of neocortical pyramidal neurones. J. Neurophysiol. 81(4), 1531-1547. 36. Hausser, M. and Clark, B. (1997) Tonic synaptic inhibition modulates neuronal output pattern and spatiotemporal synaptic integration. Neuron 19(3), 665-78. 37. Shepherd, G.M. (1992) Canonical neurons and their computational organization, in: Single Neuron Computation, Chapter 2, eds T. McKenna, J. Davis and S.F. Zornetzer. pp. 27-60. Academic Press, San Diego. 38. Gustafsson, B. and Pinter, M.J. (1984) Relations among passive electrical properties of lumbar ~-motoneurones of the cat. J. Physiol. (London) 356, 401-434. 39. Schwindt, P.C., O'Brien, J. and Crill, W.E. (1997) Quantitative analysis of firing properties of pyramidal neurons from layer 5 of rat sensorimotor cortex. J. Neurophysiol. 77(5), 2484-2498. 40. Lapicque, L. (1907) Recherches quantitatives sur l'excitation ~lectrique des nerfs trait6e comme une polarization. J. Physiol. (Paris) 9, 620-635. 41. Rinzel, J. (1986) On different mechanisms for membrane potential bursting, in: Non-linear Oscillations in Biology and Chemistry, ed H.G. Othmer. pp. 19-83. Lecture Notes in Biomathematics, Vol. 66, Springer, Berlin, Heidelberg, New York. 42. Rinzel, J. (1975) Spatial stability of traveling wave solutions of a nerve conduction equation. Biophys. J. 15(10), 975-988. 43. Strassberg, A.F. and DeFelice, L.J. (1993) Limitations of the Hodgkin-Huxley formalism: Effects of single channel kinetics on transmembrane voltage dynamics. Neural Comput. 5, 843-856. 44. Segev, I., Rinzel, J. and Shepherd, G.M. (1995) The theoretical foundation of dendritic function: Selected Papers of Wilfrid Rall with Commentaries. MIT Press, Cambridge, MA, London. 45. Galvani, L. (1791) De viribus electricitatis in motu musculari commentarius. Ex Typographia Instituti Scientarium, Bologna. 46. Von Helmholtz, H. (1850) Vorlauefiger Bericht ueber die Fortpflanzungsgeschwindigkeit der Nervereizung. Arch. Anat. Physiol. (Anat. Abt., Supplement-Bd.), pp. 71-73. Translation in Dennis, W. (1948) Readings in Psychology. pp. 197-198. Appleton-Century-Corfts, New York. 47. Bernstein, J. (1902) Untersuchungen zur Thermodynamik der bioelektrischen Stroeme. Pfluegers Arch. 92, 521-562. 48. Descartes, R. (1972) Treatise of Man (De homine, 1662), French text with translation and commentary by T. Steele Hall. Harvard University Press, Cambridge, MA. 49. Offrieu de La Mettrie, J. (1994), Man a Machine (L'Homme-machine, 1748), translated by R.A. Watson and M. Rybalka. Hackett Pub. Co., Indianapolis. 50. Hopfield, J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554. 51. Debanne, D., Gu6rineau, N.C., Gaehwiller, B.H. and Thompson, S.M. (1997) Action-potential propagation gated by an axonal I(A)-like K + conductance in hippocampus. Nature 389(6648), 286-289. 52. Lamotte d'Incamps, B., Meunier, C., Zytnicki, D. and Jami, L. (1999) Flexible processing of sensory information induced by axo-axonic synapses on afferent fibers. J. Physiol. (Paris) 93(4), 369-377. 53. Traub, R.D., Wong, R.K., Miles, R. and Michelson, H. (1991) A model of a CA3 hippocampal pyramidal neuron incorporating voltage-clamp data on intrinsic conductances. J. 
Neurophysiol. 66(2), 635-650. 54. Mascagni, M.V. and Sherman, A.S. (1998) Numerical methods for neuronal modeling, in: Methods in Neuronal Modeling: From Ions to Networks, Chapter 14, eds C. Koch and I. Segev. pp. 569-606. MIT Press, Cambridge, MA. 55. Bower, J.M. and Beeman, D. (eds) (1995) The Book of GENESIS: Exploring Realistic Neural Models with the General Neural Simulation System. Telos/Springer Verlag, Santa Clara, CA.
Neurones as physical objects: structure, dynamics and function
455
56. Hines, M.L. and Carnevale, N.T. (1997) The neuron Simulation Environment. Neural Comput. 9, 1179-1209. 57. Ekeberg, 0., Wall6n, P., Lansner, A., Tr~tv6n, H., Brodin, L. and Grillner, S. (199 l) A computerbased model for realistic simulations of neural networks - I. The single neuron and synaptic interaction. Biol. Cybern. 65, 81-90. 58. Hodgkin, A.L. and Huxley, A.F. (1939) Action potentials recorded from inside a nerve fiber. Nature 144, 710-711. 59. Schwindt, P.C. and Crill, W.E. (1999) Mechanisms underlying burst and regular spiking evoked by dendritic depolarization in layer 5 cortical pyramidal neurons. J. Physiol. (London) 81, 13411354. 60. Tuckwell, H.C. (1988) Introduction to Theoretical Neurobiology. Cambridge University Press, New York. 61. Hansel, D., Mato, G., Meunier, C. and Neltner, L. (1998) On numerical simulations of integrateand-fire neural networks. Neural Comput. 10, 467-483. 62. Hodgkin, A.L. (1976) Chance and design in electrophysiology: an informal account of certain experiments on nerve carried out between 1934 and 1952. J. Physiol. (London) 263(1), 21. 63. Connor, J.A. and Stevens, C.F. (1971) Inward and delayed outward membrane currents in isolated neural somata under voltage clamp. J. Physiol. 213, 81-19. 64. Connor, J.A., Walter, D. and McKown, R. (1977) Neural repetitive firing. Modifications of the Hodgkin-Huxley axon suggested by experimental results from crustacean axons. Biophys. J. 18, 81-102. 65. Guckenheimer, J. and Holmes, P. (1997) Non-linear oscillations, dynamical systems, and bifurcations of vector fields. 5th Edn. Applied Mathematical Sciences series, Vol. 42. Springer, New York. 66. Arvanitaki, A. (1939) Les variations gradu6es de la polarization des syst6mes excitables. Hermann, Paris. 67. Hopf, E. (1942) Abzweigung einer periodischen Loesung von einer stationaeren Loesung eines differential-System. Ber. Math.-Phys. K1. Saechs Acad. Wiss. Leipzig 94, 1-22. 68. Marsden, J.E. and McCracken, M. (1976) The Hopf Bifurcation and its Application. Springer, New York, Heidelberg and Berlin. 69. Hassard, B. (1978) Bifurcation of periodic solutions of the Hodgkin-Huxley model for the squid giant axon. J. Theor. Biol. 71(3), 401-420. 70. Troy, W.C. (1978) The bifurcation of periodic solutions in the Hodgkin-Huxley equations. Quat. Appl. Math. (April issue) 73-83. 71. Rush, M.E. and Rinzel, J. (1995) The potassium A-current, low firing rates, and rebound excitation in Hodgkin-Huxley models. Bull. Math. Biol. 57(6), 899-929. 72. Ermentrout, G.B. (1998) Linearization of F - I curves by adaptation. Neural Comput. 10(7), 1721-1729. 73. Fitzhugh, R. (1961) Impulses and physiological states in theoretical models of nerve membrane. Biophys. J. 1, 445-466. 74. Nagumo, J.S., Arimoto, S. and Yoshizawa, S. (1962) Proc. IRE 50, 2061-2074. 75. Hindmarsh, J.L. and Rose, R.M. (1982) A model of the nerve impulse using two first-order differential equations. Nature 296, 162-164. 76. Kepler, T.B., Marder, E. and Abbott, L.F. (1990) The effect of electrical coupling on the frequency of model neuronal oscillators. Science 248(4951), 83-85. 77. Meunier, C. (1992) The electric coupling of two simple oscillators: load and acceleration effects. Biol. Cybern. 67, 155. 78. White, J.A., Chow, C.C., Soto-Trevin6, C. and Kopell, N. (1998) Synchronization and oscillatory dynamics in heterogeneous, mutually inhibited neurons. J. Comp. Neuro. 5, 5-16. 79. Fitzhugh, R. (1955) Mathematical models of threshold phenomena in the nerve membrane. Bull. Math. Biophysics 17, 257. 80. 
van der Pol, B. (1926) On relaxation oscillations. Phil. Mag. 2, 978.
456
C. Meunier and L Segev
81. Hirsch, M.W. and Smale, S. (1974) Differential Equations and Linear Algebra. Springer, New York, Heidelberg, Berlin. 82. Troy, W.C. (1976) Bifurcation phenomena in FitzHugh's nerve conduction equation. J. Math. Anal. Appl. 54, 678-690. 83. Kokoz, Y.M. and Krinskii, V.I. (1973) Analysis of equations of excitable membranes-II. Method of analysing the electrophysiology characteristics of the Hodgkin-Huxley membrane from the graphs of the zero-isoclines of a second-order system. Biofizika 5, 878-885. 84. Kepler, T.B., Abbott, L.F. and Marder, E. (1992) Reduction of conductance-based neuron models. Biol. Cybern. 66, 381-387. 85. Meunier, C. (1992) Two- and three-dimensional reductions of the Hodgkin-Huxley equations: separation of time scales and bifurcation scheme. Biol. Cybern. 67, 461. 86. Abbott, L.F. and Kepler, T.B. (1990) Model neurons: from Hodgkin-Huxley to Hopfield. in: Statistical Mechanics of Neural Networks, ed L. Garrido. pp. 5-18. Springer, Berlin. 87. Hindmarsh, J.L. and Rose, R.M. (1984) A model of neuronal bursting using three coupled first order differential equations. Proc. R. Soc. Lond. B 221, 87-102. 88. Guckenheimer, J. (1996) Towards a global theory of singularly perturbed dynamical systems, in: Progress in Non-linear Differential Equations and Their Applications, Vol. 19. Birkhaueser Verlag, Basel. 89. Plant, R.E. and Kim, M. (1976) Mathematical description of a bursting pacemaker neuron by a modification of the Hodgkin-Huxley equations. Biophys. J. 16, 227-244. 90. Rose, R.M. and Hindmarsh, J.L. (1985) A model of a thalamic neuron. Proc. R. Soc. Lond. B 225, 161-193. 91. Rose, R.M. and Hindmarsh, J.L. (1989) The assembly of ionic currents in a thalamic n e u r o n - I. The three-dimensional model, Proc. R. Soc. Lond. B 237, 267-288. 92. Rose, R.M. and Hindmarsh, J.L. (1989) The assembly of ionic currents in a thalamic n e u r o n - II. The stability and state diagrams. Proc. R. Soc. Lond. B 237, 289-312. 93. Rinzel, J. and Lee, Y.S. (1987a) Dissection of a model for neuronal parabolic bursting. J. Math. Biol. 25, 653-675. 94. Rinzel, J. (1987b) A formal classification of bursting mechanisms in excitable systems, in: Proceedings of the International Congress of Mathematicians, ed A.M. Gleason. pp. 578-594. Providence, RI, USA. 95. Wang, X-J. and Rinzel, J. (1995) Oscillatory and bursting properties of neurons, in: The Handbook of Brain Theory and Neuronal Networks, ed M.A. Arbib. pp. 689-691. MIT Press, Cambridge, MA. 96. Rinzel, J. and Ermentrout, G.B. (1998) Analysis of neural excitability and oscillations, in: Methods in Neuronal Modeling: From Ions to Networks, 2nd Edn., eds C. Koch and I. Segev. MIT Press, cambridge, MA, London. 97. Lochak, P. and Meunier, C. (1988) Multiphase averaging for classical systems with applications to the adiabatic theorems. Applied Mathematical Sciences Series, Vol. 72. Springer, New York. 98. Manwani, A. and Koch, C. (1999) Detecting and estimating signals in noisy cable structures: I. Neuronal noise sources. Neural Comput. 11(8), 1797-1829. 99. Hansel, D. and Sompolinsky, H. (1996) Chaos and synchrony in a model of a hypercolumn in visual cortex. J. Comp. Neuro. 3, 7-34. 100. van Vreeswijk, C. and Sompolinsky, H. (1998) Chaotic balanced state in a model of cortical circuits. Neural Comput. 10(6), 1321-1372. 101. Horikawa, Y. (1993) Simulation study on effects of channel noise on differential conduction at an axon branch. Biophys. J. 65, 680-686. 102. Chow, C.C. and White, J.A. 
(1996) Spontaneous action potentials due to channel fluctuations. Biophys. J. 71, 3013-3021. 103. Renversez, G. and Parodi, O. (1996) Potential distribution on a neuronal somatic membrane during an action potential. Europhys. Lett. 36(4), 313-318. 104. Rubinstein, J.T.(1995) Threshold fluctuations in an N sodium channel model of the node of Ranvier. Biophys. J. 68, 779-785.
Neurones as physical objects." structure, dynamics and function
457
105. Schneidman, E., Freedman, B. and Segev, I. (1998) Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comput. 10(7), 1679-1703. 106. Meunier, C. and Verga, A. (1988) Noise and bifurcations. J. Stat. Phys. 50(1/2), 345-375. 107. Luciani, J.-F. and Verga, A. (1987) Functional integral approach to bistability in the presence of correlated noise. Europhys. Lett. 4, 255-261. 108. Nelson, M.E. (1994) A mechanism for neuronal gain control by descending pathways. Neural Comput. 6, 242-254. 109. Carandini, M. Heeger, D.J. and Movshon, J.A. (1997) Linearity and gain control in V1 simple cells, in: Cortical Models, eds E.G. Jones and P.S. Ulinski, Cerebral Cortex, Vol. 12. Plenum Press, New York. 110. Borg-Graham, L.J., Monier, C. and Fr~gnac, Y. (1998) Visual input evokes transient and strong shunting inhibition in visual cortical neurones. Nature 393, 369-373. 111. Granit, R., Kernell, D. and Lamarre, Y. (1966) Algebraic summation in synaptic activation of motoneurones firing within the 'primary range' to injected currents. J. Physiol. (London) 187, 379-399. 112. Schwindt, P.C. and Calvin, W.H. (1973) Equivalence of synaptic and injected current in determining the membrane potential trajectory during motoneuron rhythmic firing. Brain Res. 59, 389-394. 113. Powers, R.K.D.B. and Binder, M.D. (1995) Effective synaptic current and motoneuron firing rate modulation. J. Neurophysiol. 74(2), 793-801. 114. Capaday, C. and Stein, R.B. (1987) A method for simulating the reflex output of a motoneuron pool. J Neurosci. Methods 21(2-4), 91-104. 115. Kernell, D. and Hultborn, H. (1990) Synaptic effects on recruitment gain: a mechanism of importance for the input-output relations of motoneurone pools? Brain Res. 507, 176-179. 116. Holt, G.R. and Koch, C. (1997) Shunting inhibition does not have a divisive effect on firing rates. Neural Comput. 9(5), 1001-1013. 117. Baldissera, F. and Gustafsson, B. (1974) Firing behaviour of a neurone model based on the afterhyperpolarization conductance time course and algebraical summation. Adaptation and steady-state firing. Acta Physiol. Scan. 92, 27-47. 118. Powers, R. (1993) A variable-threshold motoneurone model that incorporates time- and voltagedependent potassium and calcium conductances. J. Neurophysiol. 70(1), 246-262. 119. McCormick, D.A., Connors, B.W., Lighthall, J.W. and Prince, D.A. (1985) Comparative electrophysiology of pyramidal and sparsely spiny stellate neurons of the neocortex. J. Neurophysiol. 54(4), 782-806. 120. McCormick, D.A., Huguenard, J.R. and Strowbridge, B. (1992) Determination of state dependent processing in thalamus by single neuron properties and neuromodulators, in: Single Neuron Computation, Chapter 10, eds T. McKenna, J. Davis and S.F. Zornetzer. pp. 259-290. Academic Press, New York. 121. Hounsgaard, J. and Kiehn, O. (1989) Serotonin-induced bistability of turtle motoneurones caused by a nifedipine-sensitive calcium plateau potential. J. Physiol. 414, 265-282. 122. Eken, T., Hultborn, H. and Kiehn, O. (1989) Possible functions of transmitter-controlled plateau potentials in alpha motoneurones. Prog. Brain Res. 80, 257-267. 123. Baldissera, F., Cavallari, P. and Dworzak, F. (1991) Cramps: a sign of motoneurone 'bistability' in a human patient. Neurosci. Lett. 133, 303-306. 124. Abbott, L.F. and Le Masson, G. (1993) Analysis of neuron models with dynamically regulated conductances. Neural Comput. 5, 823-842. 125. Desai, N.S., Rutherford, L.C. and Turrigiano, G.G. 
(1999) Plasticity in the intrinsic excitability of cortical pyramidal neurones. Nature Neurosci. 2(6), 515-520. 126. Araki, T. and Terzuolo, C.A. (1962) Membrane currents in spinal motoneurones associated with the action potential and synaptic activity. J. Neurophysiol. 25, 772-789. 127. Barrett, J.N. and Crill, W.E. (1980) Voltage clamp of cat motoneurone somata: properties of the fast inward current. J. Physiol. (London) 304, 231-249.
458
C. Meunier and L Segev
128. Barrett, E.F., Barrett, J.N. and Crill, W.E. (1980) Voltage-sensitive outward currents in cat motoneurones. J. Physiol. (London) 304, 251-276. 129. Schwindt, P.C. and Crill, W.E. (1981) Differential effects of TEA and cations on outward ionic currents of cat motoneurones. J. Neurophysiol. 1, 1-16. 130. Nelson, P. and Frank, K. (1967) Anomalous rectification in cat spinal motoneurons and effect of polarizing currents on excitatory postsynaptic potential. J. Neurophysiol. 30, 1097-1112. 131. Tazaki, K. and Cooke, I.M. (1983) Separation of neuronal sites of driver potential and impulse generation by ligaturing in the cardiac ganglion of the lobster, Homarus americanus. J. Comp. Physiol. 151, 329-346. 132. Gogan, P., Gueritaud, J.-P. and Tyc-Dumont, S. (1983) Comparison of antidromic and orthodromic action potentials of identified motor axons in cat's brain stem. J. Physiol. (London) 335, 205-220. 133. Schwindt, P.C. and Crill, W.E. (1980) Effects of barium on cat spinal motoneurones studied by voltage clamp. J. Neurophysiol. 44(4), 827-846. 134. Chen, W.R., Midtgaard, J. and Shepherd, G.M. (1997) Forward and backward propagation of dendritic impulses and their synaptic control in rnitral cells. Science 278, 463-467. 135. Larkum, M.E., Zhu, J.J. and Sakmann, B. (1999) A new cellular mechanism for coupling inputs arriving at different cortical layers. Nature 398(6725), 338-341. 136. Gogan, P., Gustafsson, B., Jankowska, E. and Tyc-Dumont, S. (1984) On re-excitation of feline motoneurones: its mechanism and consequences. J. Physiol. (London) 350, 81-91. 137. Melinek, R. and Muller, K.J. (1996) Action potential initiation site depends on neuronal excitation. J. Neurosci. 16(8), 2585-2591. 138. Segev, I. and Burke, R.E. (1998) Compartmental models of complex neurons (with appendix by Hines, M.), in: Methods in Neuronal Modeling: From Ions to Networks, Chapter 3, eds C. Koch and I. Segev. pp. 93-136. MIT Press, Cambridge, MA. 139. Pinsky, P.F. and Rinzel, J. (1994) Intrinsic and network rhythmogenesis in a reduced Traub model for CA3 neurons, J. Comput. Neurosci. 1(1-2), 39-60; Erratum published in J. Comput. Neurosci. 1995, 2(3), 275. 140. Traub, R.D., Miles, R. and Buzsaki, G. (1992) Computer simulation of carbachol-driven rhythmic population oscillations in the CA3 region of the in vitro rat hippocampus. J. Physiol. (London) 451, 653-672. 141. Mainen, Z.F. and Sejnowski, T.J. (1996) Influence of dendritic structure on firing pattern in model neocortical neurons. Nature 382(6589), 363-366. 142. Booth, V. and Rinzel, J. (1995) A minimal, compartmental model for a dendritic origin of bistability of motoneuron firing patterns. J. Comput. Neurosci. 2(4), 299-312. 143. Lorenz, E.N. (1963) Deterministic non-periodic flow. J. Atmos. Sci. 20, 130-141. 144. Shilnilov, L.P. (1965) A case of the existence of a denumerable set of periodic motions. Sov. Math. Dokl. 6, 163-166. 145. Pomeau, Y. and Manneville, P. (1980) Intermittent transition to turbulence in dissipative dynamical systems. Comm. Math. Phys. 74, 189-197. 146. Meunier, C. (1984) Continuity of type I intermittency from a measure-theoretical point of view. J. Stat. Phys. 36(3/4), 321. 147. Arn6odo, A., Coullet, P. and Tresser, C. (1981) Possible new strange attractors with spiral structure. Comm. Math. Phys. 79, 573-579. 148. Chay, T.R. and Rinzel, J. (1985) Bursting, beating, and chaos in an excitable membrane model. Biophys. J. 47(3), 357-366. 149. Wang, X.-J. 
(1994) Multiple dynamical modes of thalamic relay neurons: rhythmic bursting and intermittent phase-locking. Neuroscience 59(1), 21-31. 150. Wang, X.-J. (1993) Genesis of bursting oscillations in the Hindmarsh-Rose model and homoclinicity to a chaotic saddle. Physica D, Special Issue on Homoclinic Chaos, 62, 263-274. 151. Remoissenet, M. (1996) Waves Called Solitons: Concepts and Experiments. Springer, Berlin. 152. Fife, P.C. (1980) Mathematical aspects of reacting and diffusing systems. Lect. Notes in Biom., 28.
Neurones as physical objects." structure, dynamics and function
459
153. Rinzel, J. and Keller, J.B. (1973) Traveling wave solutions of a nerve conduction equation. Biophys. J. 13(12), 1313-1337. 154. Jack, J.J.B., Noble, D. and Tsien, R.W. (1975) Electrical Current Flow in Excitable Cells. 2nd Edn., Clarendon Press, Oxford, UK. 155. van Saarlos, W. (1989) Front propagation into unstable states: Linear versus nonlinear marginal stability and rate of convergence. Phys. Rev. A 39, 6367. 156. Ben-Jacob, E., Brand, H.R., Dee, G., Kramer, L. and Langer, J.S. (1985) Pattern propagation in non-linear dissipative systems. Physica D 14, 348. 157. Huxley, A.F. (1959) Can a nerve propagate a subthreshold disturbance? J. Physiol. (London) 148, 80--81. 158. Khodorov, B.I. and Timin, E.N. (1975) Nerve impulse propagation along nonuniform fibres. Prog. Biophys. Molec. Biol. 30(2/3), 145-184. 159. Manor, Y., Koch, C. and Segev, I. (1991) Effect of geometrical irregularities on propagation delay in axonal trees. Biophys. J. 60, 1424-1437. 160. Miller, R.N. and Rinzel, J. (1981) The dependence of impulse propagation speed on firing frequency, dispersion, for the Hodgkin-Huxley model. Biophys. J. 34(2), 227-259. 161. Scott, A.C. (1975) The electrophysics of a nerve fiber. Rev. Modern Phys. 47(2), 487-533. 162. Tasaki, I. (1982) Physiology and electrochemistry of nerve fibers, in: Physiology and Electrochemistry of Nerve Fibers, ed A. Noordergraaf. pp. 1-348. Academic Press, New York. 163. Dodge, F.A. and Frankenhauser, B. (1958) Membrane currents in isolated frog nerve fibre under voltage clamp conditions. J. Physiol. (London) 143, 76-90. 164. Chiu, S.Y., Ritchie, J.M., Rogart, R.B. and Stagg, D. (1979) A quantitative description of membrane currents in rabbit myelinated nerve. J. Physiol. (London) 292, 149-166. 165. Schwarz, J.R. and Eikhof, G. (1987) Na and action potentials in rat myelinated nerve fibres at 20 and 37C, Pfluegers Arch. 409(6), 569-577. 166. Terakawa, S. and Hsu, K. (1991) Ionic currents of the nodal membrane underlying the fastest saltatory conduction in myelinated giant nerve fibers of the shrimp Penaeus ]aponicus. J. Neurobiol. 22(4), 342-352. 167. Yasargil, G.M., Greeff, N.G., Luescher, H.-R., Akert, K. and Sandri, C. (1982) The structural correlate of saltatory conduction along the Mauthner axon in the tench (Tinca tinca L.), identification of nodal equivalent at the axon collaterals. J. Comp. Neurol. 212(4), 417-424. 168. Rushton, W.A.H. (1951) A theory of the effects of fibre size in medullated nerve. J. Physiol. (London) 115, 101-122. 169. Johnston, W.L., Dyer, J.R., Castellucci, V.F. and Dunn, R.J. (1996) Clustered voltage-gated Na + channels in Aplysia axons. J. Neurosci. 16(5), 1730-1739. 170. Rinzel, J. (1990) Mechanisms for nonuniform propagation along excitable cables. Ann. N.Y. Acad. Sci. 591, 51-61. 171. Krnjevic, K. and Miledi, R. (1958) Failure of neuromuscular propagation in rats. J. Physiol. (London) 148, 56-57P. 172. Luescher, C., Streit, J., Quadroni, R. and Luescher, H.-R. (1994) Action potential propagation through embryonic dorsal root ganglion cells in culture - I. Influence of cell morphology on propagation properties. J. Neurophysiol. 72, 634-643. 173. Parnas, I. (1972) Differential block at high frequency of branches of a single axon innervating two muscles. J. Neurophysiol. 35, 903-914. 174. Hodgkin, A.L. and Rushton, W.A.H. (1946) The electrical constants of a crustacean nerve fibre. Proc. Roy. Soc. London B 133, 444. 175. Rail, W. (1959) Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491-527. 
176. Segev, I. (1990) Computer study of presynaptic inhibition controlling the spread of action potentials into axon terminals. J. Neurophysiol. 63(5), 987-998. 177. Kopysova, I.L. and Debanne, D. (1998) Critical role of axonal A-type K + channels and axonal geometry in the gating of action potential propagation along CA3 pyramidal cell axons: a simulation study. J. Neurosci. 18, 7436-7451.
460
C. Meunier and L Segev
178. Kepler, T.B. and Marder, E. (1993) Spike initiation and propagation on axons with slow inward currents. Biol. Cybern. 68(3), 209-214. 179. Frank, K. and Fuortes, M.G.F. (1957) Presynaptic and postsynaptic inhibition of monosynaptic reflexes. Fed. Proc. 16, 39-40. 180. Schmidt, R.F. (1971) Presynaptic inhibition in the vertebrate central nervous system. Ergeb. Physiol. 63, 20-101. 181. Schmidt, R.F. (1973) Control of the access to somatosensory pathways, in: Handbook of Sensory Physiology. Vol. II Somatosensory System, ed A. Iggo. pp. 151-206. Springer, Berlin. 182. Eccles, J.C., Schmidt, R.F. and Willis, W.D. (1963) Depolarization of central terminals of group Ib afferent fibers of muscle. J. Neurophysiol. 26, 1-27. 183. Zytnicki, D., Lafleur, J., Horcholle-Bossavit, G., Lamy, F. and Jami, L. (1990) Reduction of Ib autogenetic inhibition in motoneurons during contraction of an ankle extensor muscle in the cat. J. Neurophysiol. 64(5), 1380--1389. 184. Jankowska, E. and Riddell, J.S. (1998) Neuronal systems involved in modulating synaptic transmission from group II muscle afferents, in: Presynaptic Inhibition and Neural Control, eds P. Rudomin, R. Romo and L.M. Mendell. pp. 315-328. Oxford University Press, New York. 185. Dudel, J.S. and Kuffler, S.W. (1961) Presynaptic inhibition at the crayfish neuromuscular junction. J. Physiol. (London) 55, 543-562. 186. Cattaert, D., E1 Manira, A. and Clarac, F. (1992) Direct evidence for presynaptic inhibitory mechanisms in crayfish sensory afferents. J. Neurophysiol. 67, 610-624. 187. Clarac, F. and Cattaert, D. (1999) Functional multimodality of axonal tree in invertebrate neurons. J. Physiol. (Paris) 93(4), 319-328. 188. Eccles, J.C., Magni, F. and Willis, W.D. (1962) Depolarization of central terminals of group I afferent fibres from muscle. J. Physiol. (London) 160, 62-93. 189. Eccles, J.C., Schmidt, R.F. and Willis, W.D. (1963) Pharmacological studies on presynaptic inhibition. J. Physiol. (London) 168, 500-530. 190. Curtis, D.R. and Lodge, D.R. (1982) The depolarization of feline ventral horn group Ia spinal afferent terminations by GABA. Exp. Brain Res. 46(2), 215-233. 191. Conradi, S. (1968) Axo-axonic synapses on cat spinal motoneurons. Acta Soc. Med.,Ups. 73(5-6), 239-242. 192. Pierce, J.P. and Mendell, L.M. (1993) Quantitative ultrastructure of Ia boutons in the ventral horn: scaling and positional relationships. J. Neurosci. 13, 4748-4763. 193. Walrnsley, B., Graham, B.P. and Nicol, M.J. (1995) Serial E-M and simulation study of presynaptic inhibition along a group Ia collateral in the spinal cord. J. Neurophysiol. 74(2), 616-623. 194. Maxwell, D.J., Kerr, R., Jankowska, E. and Riddell, J.S. (1997) Synaptic connections of dorsal horn group II spinal interneurons: synapses formed with the interneurons and by their axon collaterals. J. Comp. Neurol. 380(1), 51-69. 195. Alvarez, F.J. (1998) Anatomical basis for presynaptic inhibition of primary sensory fibers, in: Presynaptic Inhibition and Neural Control, eds P. Rudomin, R. Romo and L.M. Mendell. pp. 13-49. Oxford University Press, New York, Oxford. 196. Maxwell, D.J., Christie, W.M., Short, A.D. and Brown, A.G. (1990) Direct observations of synapses between GABA-immunoreactive boutons and muscle afferent terminals in lamina VI of the cat's spinal cord. Brain Res. 530(2), 215-222. 197. Lamotte d'Incamps, B., Destombes, J., Thiesson, D., Hellio, R., Lasserre, X., Devanne-Kouchtir, N., Jami, L. and Zytnicki, D. 
(1998) Indications for GABA-immunoreactive axo-axonic contacts on the intraspinal arborization of a Ib fiber in cat: a confocal microscope study. J. Neurosci. 18, 10030-10036. 198. Stuart, G.J. and Redman, S.J. (1992) The role of GABAA and GABA~ receptors in presynaptic inhibition of Ia EPSPs in cat spinal motoneurones. J. Physiol. (London) 447, 675-692. 199. Alvarez-Leefmans, F.J., Gamino, S.M., Giraldez, F. and Nogueron, I. (1988) Intracellular chloride regulation in amphibian dorsal root ganglion neurones studied with ion-selective microelectrodes. J. Physiol. (London) 406, 225-246.
Neurones as physical objects: structure, dynamics and function
461
200. Lafleur, J., Zytnicki, D., Horcholle-Bossavit, G. and Jami, L. (1992) Depolarization of Ib afferent axons in the cat spinal cord during homonymous muscle contraction. J. Physiol. (London), 445, 345-354. 201. Lundberg, A. and Vyklicky, L. (1966) Inhibition of transmission to primary afferents by electrical stimulation of the brain stem. Arch. Ital. Biol. 104, 86-97. 202. Gossard, J.-P. (1996) Control of transmission in muscle group Ia afferents during fictive locomotion in the cat. J. Neurophysiol. 76(6), 4104-4112. 203. Meunier, S. and Pierrot-Desilligny, E. (1998) Cortical control of Ia afferents in humans. Exp. Brain Res. 119, 415-426. 204. Aimonetti, J.-M., Schmied, A., Vedel, J.-P. and Pagni, S. (1999) Ia presynaptic inhibition in human wrist extensor muscles: effects of motor task and cutaneous afferent activity. J. Physiol. (Paris) 93(4), 395-401. 205. Harrison, P.J. and Jankowska, E. (1984) Do interneurones in lower lumbar segments contribute to the presynaptic depolarization of group I muscle afferents in Clarke's column? Brain Res. 295(2), 203-210. 206. Hultborn, H., Meunier, S., Pierrot-Deseilligny, E. and Shindo, M. (1987), Changes in presynaptic inhibition of Ia fibres at the onset of voluntary contraction in man. J. Physiol. (London) 389, 757-772. 207. Wall, P.D. (1994), Control of impulse conduction in long range branches of afferents by increases and decreases of primary afferent depolarization in the rat. Eur. J. Neurosci. 6, 1136-1142. 208. Lomeli, J., Quevedo, J., Linares, P. and Rudomin, P. (1998) Local control of information flow in segmental and ascending collaterals of single afferents. Nature 395(6702), 600-604. 209. Zytnicki, D., Lafleur, J., Kouchtir, N. and Perrier, J.-F. (1995) Heterogeneity of contractioninduced effects in neurons of the cat dorsal spinocerebellar tract. J. Physiol. (London) 487(3), 761772. 210. Zytnicki, D. and Jami, L. (1998) Presynaptic inhibition can act as a filter of input from tendon organ during muscle contraction, in: Presynaptic inhibition and neural control, eds P. Rudomin, R. Romo and L.M. Mendell. pp. 303-314. Oxford University Press, New York, Oxford. 211. Eguibar, J.R., Quevedo, J., Jim~nez, I. and Rudomin, P. (1994) Selective cortical control of information flow through different intraspinal collaterals of the same afferent fiber. Brain Res. 643, 328-333. 212. Quevedo, J. and Eguibar, J.R. et al. (1997) Patterns of connectivity of spinal interneurons with single muscle afferents. Exp. Brain Res. 115(3), 387-402. 213. Pierrot-Deseilligny, E. and Meunier, S. (1998) Differential control of presynaptic inhibition of Ia terminals during movement in humans, in: Presynaptic inhibition and neural control, eds P. Rudomin, R. Romo, and L.M. Mendell. pp. 351-365. Oxford University Press, New York, Oxford. 214. Graham, B.P. and Redman, S.J. (1994) A simulation of action potentials in synaptic boutons during presynaptic inhibition. J. Neurophysiol. 71(2), 538-549. 215. Lamotte d'Incamps, B., Meunier, C., Monnet, M.-L., Jami, L. and Zytnicki, D. (1998) Reduction of presynaptic action potentials by PAD: Model and experimental study. J. Comp. Neuro. 5, 141-156. 216. Vinay, L. and Clarac, F. (1999), Antidromic discharges of dorsal root afferents and Inhibition of the lumbar monosynaptic reflex in the neonatal rat. Neuroscience 90(1), 165-176. 217. Bensoussan, A., Lions, J.-L. and Papanicolaou, G.C. (eds) (1978) Asymptotic analysis for periodic structures, studies in mathematics and their applications. North-Holland, Amsterdam. 218. 
Basser, P.J. (1993), Cable equation for a myelinated axon derived from its microstructure. Med. and Biol. Eng. Comput. 31, $87-$92. 219. Chung, S.H., Raymond, S.A. and Lettvin, J.Y. (1970) Multiple meaning in single visual units. Brain Behav. Evol. 3, 72-101. 220. Goldstein, S.S. and Rail, W. (1974) Changes of action potential shape and velocity for changing core conductor geometry. Biophys. J. 14(10), 731-757.
462
C. Meunier and L Segev
221. Ramon, F., Joyner, R.W. and Moore, J.W. (1975) Propagation of action potentials in inhomogeneous axon regions. Fed. Proc. 34, 1357-1363. 222. Baccus, S.A. (1998) Synaptic facilitation by reflected potentials: enhancement of transmission when nerve impulses reverse direction at branch points. Proc. Natl. Acad. Sci. USA 95(14), 8345-8350. 223. Horcholle-Bossavit, G., Jami, L., Petit, J. and Scott, J.J.A. (1987) Activation of motor units by paired stimuli at short intervals. J. Physiol. (London) 387, 385-399. 224. Luescher, H.-R. and Shiner, J.S. (1990) Computation of action potential propagation and presynaptic bouton activation in terminal arborizations of different geometries. Biophys. J. 58, 13771388. 225. Luescher, H.-R. and Shiner, J.S. (1990) Simulation of action potential propagation in complex terminal arborizations. Biophys. J. 58, 1389-1399. 226. Agmon-Snir, H., Carr, C.E. and Rinzel, J. (1998) The role of dendrites in auditory coincidence detection. Nature 393(6682), 268-272. 227. Lytton, W.W. (1991) Simulations of a phase comparing neuron of the electric fish Eigenmannia. J. Comp. Physiol. A 169, 117-125. 228. Saint-Mleu, B. and Moore, L.E. (2000) Firing properties and electrotonic structure of Xenopus larval spinal neurons. J. Neurophysiol. 83(3), 1366-1380. 229. Abbott, L.F. (1991) Realistic synaptic inputs for model neural networks. Network: Comp. Neural Sys. 2, 245-258. 230. Amit, D. and Tsodyks, M. (1992) Effective neurons and attractor neural networks in cortical environment. Network 3, 121-137. 231. Bressloff, P.C. (1994) A Green's function approach to analysing the effects of random synaptic background activity in a model neural network. J. Phys. A 27, 4097-4113. 232. Bressloff, P.C. (1996) New mechanism for neural pattern formation. Phys. Rev. Lett. 76(24), 46444647. 233. Rail, W. and Agmon-Snir, H. (1998, Cable theory for dendritic neurons, in: Methods in Neuronal Modeling: From Ions to Networks, Chapter 2, eds C. Koch, and I. Segev. pp. 27-92. MIT Press, Cambridge, MA. 234. Lorente de N6, R. (1938) Synaptic stimulation as a local process. J. Neurophysiol. 1 194-207. 235. Rail, W. (1953) Electrotonic theory for a spherical neurone. Proc. Univ. Otago. Med. School 31, 14-15. 236. Rail, W. (1957) Membrane time constant of motoneurons. Science 126, 454 237. Rall, W. (1960) Membrane potential transients and membrane time constant of motoneurons. Exp. Neurol. 2, 503-532. 238. Rinzel, J. and Rail, W. (1974) Transient response in a dendritic neuron model for current injected in one branch. Biophys. J. 14(10), 759-790. 239. Mainen, Z.F., Carnevale, N.T., Zador, A.M., Claiborne, B.J. and Brown, T.H. (1996) Electrotonic architecture of hippocampal CA1 pyramidal neurons based on three-dimensional reconstructions. J. Neurophysiol. 76(3) 1904-1923. 240. Rail, W. (1964) Theoretical significance of dendritic trees for neuronal input-output relations, in: Neural Theory and Modeling, ed R.F. Reiss. pp. 73-97. Stanford University Press, Stanford, CA. 241. Agmon-Snir, H. and Segev, I. (1993) Signal delay and input synchronization in passive dendritic structures. J. Neurophysiol. 70(5), 2066-2085. 242. Butz, E.G. and Cowan, J.D. (1974) Transient potentials in dendritic systems of arbitrary geometry. Biophys. J. 14(9), 661-689. 243. Horwitz, B. (1981) Unequal diameters and their effects on time-varying voltages in branched neurons. Biophys. J. 36(1), 155-192. 244. Horwitz, B. (1983) An analytical method for investigating transient potentials in neurons with branching dendritic trees. 
Biophys. J. 41, 51-66. 245. Kawato, M. (1984) Cable properties of a neuron model with nonuniform membrane resistivity. J. Theor. Biol. 111(1), 149-169.
Neurones as physical objects." structure, dynamics and function
463
246. London, M., Meunier, C. and Segev, I. (1999) Signal transfer in passive dendrites with nonuniform membrane conductance. J. Neurosci. 19(19), 8219-8233. 247. Schwindt, P.C. and Crill, W.E. (1984) Transformation of synaptic input into spike trains in central mammalian neurons, in: Handbook of Physiology, Chapter 12, pp. 234-284. 248. Ohme, M. and Schierwagen, A.K. (1998) An equivalent cable model for neuronal trees with active membrane. Biol. Cybern. 78, 227-243. 249. Holmes, W.R. and Rall, W. (1992) Electrotonic length estimates in neurons with dendritic tapering or somatic shunt. J. Neurophysiol. 68(4), 1421-1437. 250. Hoffman, D.A., Magee, J.C., Colbert, C.M. and Johnston, D. (1997) K + channels regulation of signal propagation in dendrites of hippocampal pyramidal neurons. Nature 387, 869-875. 251. Abbott, L.F., Farhi, E. and Gutmann, S. (1991) The path integral for dendritic trees. Biol. Cybern. 66(1), 49-60. 252. Abbott, L.F. (1992) Simple diagrammatic rules for solving dendritic cable problems. Physica A 185, 343-356. 253. Bressloff, P.C. and Taylor, J.G. (1993) Compartmental-model response function for dendritic trees. Biol. Cybern. 70, 199-207. 254. Rall, W. (1970) Cable properties of dendrites and effect of synaptic location, in: Excitatory synaptic mechanisms, eds P. Andersen and J.K.S. Jansen. pp. 175-187. Universitetsforlaget, Oslo. 255. Shepherd, G.M. and Brayton, R.K. (1987) Logic operations are properties of computer-simulated interactions between excitable dendritic spines. Neuroscience 21, 151-166. 256. Koch, C., Poggio, T. and Torre, V. (1982) Retinal ganglion cells: a functional interpretation of dendritic morphology. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 298(1090), 227-263. 257. McCulloch, W. and Pitts, W. (1943) A logical calculation of ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115-133. 258. Mel, B.W. (1994) Information processing in dendritic trees. Neural Comput. 6, 1427-1439. 259. Mel, B.W. (1993) Synaptic integration in an excitable dendritic tree. J. Neurophysiol. 70(3), 10861101. 260. Gogan, P. and Tyc-Dumont, S. (1989) How do dendrites process neural information? News Physiol. Sci. 4, 127-130. 261. Zador, A.M., Agmon-Snir, H. and Segev, I. (1995) The morphoelectrotonic transform: a graphical approach to dendritic function. J. Neurosci. 15(3 Pt 1), 1669-1682. 262. Rail, W. (1962) Theory of physiological properties of dendrites. Ann. N.Y. Acad. Sci. 96, 10711092. 263. Koch, C., Douglas, R.J. and Wehmeier, U. (1990) Visibility of synaptically induced conductance changes: theory and simulations of anatomically characterized cortical pyramidal cells. J. Neurosci. 10(6), 1728-1744. 264. Holmes, W.R. and Woody, C.D. (1989) Effect of uniform and non-uniform synaptic "activation-distribution" on the cable properties of modeled cortical pyramidal cells. Brain Res. 505, 12-22. 265. Bernander, 0., Douglas, R.J., Martin, K.A.C. and Koch, C. (1991) Synaptic background activity determines spatio-temporal integration in single pyramidal cells. Proc. Natl. Acad. Sci. USA 88, 11569-11573. 266. Rapp, M., Yarom, Y. and Segev, I. (1992) The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells. Neural Comp. 4, 518-533. 267. Longtin, A., Bulsara, A., Pierson, D. and Moss, F. (1994) Bistability and the dynamics of periodically forced sensory neurons. Biol. Cybern. 70(6), 569-578. 268. Levin, J.E. and Miller; J.P. (1996) Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. 
Nature 380, 165-168. 269. Torre, V. and Poggio, T. (1978) A synaptic mechanism possibly underlying directional selectivity to motion. Proc. R. Soc. Lond. (Biol.) 202, 409-416. 270. Borg-Graham, L.J. and Grzywacz, N.M. (1992, A model of the directional selectivity circuit in retina: transformations by neurons singly and in concert, in: Single Neuron Computation,
464
271. 272. 273. 274. 275. 276. 277. 278. 279. 280. 281. 282.
283. 284.
285. 286. 287. 288. 289. 290. 291.
292. 293.
294.
C. Meunier and L Segev
Chapter 13, eds T. McKenna, J. Davis, and S.F. Zornetzer. pp. 347-375. Academic Press, New York. Spruston, N., Stuart, G. and Hausser, M. (1999) Dendritic integration, in: Dendrites, eds G. Stuart, N. Spruston, and M. Hausser. pp. 231-270. Oxford University Press, Oxford. Eccles, J.C., Libet, B. and Young, R.R. (1958) The behavior of chromatolyzed motor neurones studied by intracellular recording. J. Physiol. (London) 143, 11-40. Stuart, G.M. and Hausser, M. (1994) Initiation and spread of sodium action potentials in cerebellar Purkinje cells. Neuron 13(3), 703-712. Denk, W., Svoboda, R. and Tank, D.W. (1996) Imaging calcium dynamics in dendritic spines. Curr. Opin. Neurobiol. 6(3), 372-378. Svoboda, K., Helmchen, F., Denk, W. and Tank, D.W. (1999) Spread of dendritic excitation in layer 2/3 pyramidal neurons in rat barrel cortex in vivo. Nat. Neurosci. 2(1), 65-73. Schwindt, P.C. and Crill, W.E. (1995) Amplification of synaptic current by persistent sodium conductance in apical dendrite of neocortical neurons. J. Neurophysiol. 74(5), 2220-2224. Magee, J.C. and Johnston, D. (1995) Characterization of single voltage-gated Na + and Ca 2+ channels in apical dendrites of rat CA1 pyramidal neurons. J. Physiol. (London) 487(1), 67-90. Magee, J.C. (1998) Dendritic hyperpolarization-activated currents modify the integrative properties of hippocampal CAI pyramidal neurons. J. Neurosci. 18(19), 7613-7624. Hille, B. (1992) Ionic channels of excitable membranes. 2nd Edn., Sinauer Associates, Sunderland, MA. Midtgaard, J. (1994) Processing of information from different sources: spatial synaptic integration in the dendrites of vertebrate CNS neurons. TINS 17(4), 166-173. Lorente de N6, R. and Condouris, G.A. (1959) Decremental conduction in peripheral nerve. Integration of stimuli in the neuron. Proc. Natl. Acad. Sci. USA 45, 592-617. Schiller, J., Schiller, Y., Stuart, G. and Sakmann, B. (1997) Calcium action potentials restricted to distal apical dendrites of rat neocortical pyramidal neurons. J. Physiol. (London), 505(3), 605-616. Mainen, Z.F., Joerges, J., Huguenard, J.R. and Sejnowski, T.J. (1995) A model of spike initiation in neocortical pyramidal neurons. Nature 15, 1427-1439. Rapp, M., Yarom, Y. and Segev, I. (1996) Modeling back propagating action potential in weakly excitable dendrites of neocortical pyramidal cells. Proc. Natl. Acad. Sci. USA 93(21), 11985-11990. Segev, I. and Rall, W. (1998) Excitable dendrites and spines: earlier theoretical insights elucidate recent direct observations, Trends Neurosci. 21(11), 453-460. Rushton, W.A.H. (1937) Initiation of the propagated disturbance. Proc. R. Soc. B 124, 210. Denk, W., Sugimori, M. and Llimis, R. (1995) Two types of calcium response limited to single spines in cerebellar Purkinje cells. Proc. Natl. Acad. Sci. USA 92(18), 8279-8282. Stuart, G., Spruston, N., Sakmann, B. and Hausser, M. (1997) Action potential initiation and backpropagation in neurons of the mammalian CNS. Trends Neurosci. 20(3), 125-131. Williams, S.R. and Stuart, G.J. (2000) Action potential backpropagation and somato-dendritic distribution of ion channels in thalamocortical neurons. J. Neurosci. 20(4), 1307-1317. Bernander, O., Koch, C. and Douglas, R.J. (1994) Amplification and linearization of distal synaptic input to cortical pyramidal cells. J. Neurophysiol. 72(6), 2743-2753. De Schutter, E. and Bower, J.M. (1994) Simulated responses of cerebellar Purkinje cells are independent of the dendritic location of granule cell synaptic inputs. Proc. Natl. Acad. Sci. 
USA 91(11), 4736-4740. Cook, E.P. and Johnston, D. (1999) Voltage-dependent properties of dendrites that eliminate location-dependent variability of synaptic input. J. Neurophysiol. 81(2), 535-543. Nicoll, A., Larkman, A. and Blakemore, C. (1993) Modulation of EPSP shape and efficacy by intrinsic membrane conductances in rat neocortical pyramidal neurons in vitro. J. Physiol. (London) 468, 693-710. Cash, S. and Yuste, R. (1998) Input summation by cultured pyramidal neurons is linear and position-independent. J. Neurosci. 18(1), 10-15.
Neurones as physical objects." structure, dynamics and function
465
295. Cash, S. and Yuste, R. (1999) Linear summation of excitatory inputs by CA1 pyramidal neurons. Neuron 22(2), 383-394. 296. Siegel, M., Marder, E. and Abbott, L.F. (1994) Activity-dependent current distributions in model neurones. Proc. Natl. Acad. Sci., USA 91, 11308-11312. 297. Wilson, C.J. (1995) Dynamic modification of dendritic cable properties and synaptic transmission by voltage-gated potassium channels. J. Comput. Neuro. 2, 91-115. 298. Korogod, S.M., Kopysova, I.L., Bras, H., Gogan, P. and Tyc-Dumont, S. (1996) Differential backinvasion of a single complex dendrite of an abducens motoneuron by N-methyl-D-aspartate-induced oscillations: a simulation study. Neuroscience 75(4), 1153-1163. 299. Moore, L.E., Chub, N., Tabak, J. and O'Donovan, M. (1999) NMDA-induced dendritic oscillations during a soma voltage clamp of chick spinal neurons. J. Neurosci. 19 300. Koch, C. (1984) Cable theory in neurons with active, linearized membranes. Biol. Cybern. 50(1), 15-33. 301. Shepherd, G.M. (1996) The dendritic spine: a multifunctional integrative unit. J. Neurophysiol. 75(6), 2197-2210. 302. Rall, W. (1974) Dendritic spines, synaptic potency and neuronal plasticity, in: Cellular Mechanisms Subserving Changes in Neuronal Activity, eds C.D. Woody, K.A. Brown, T.J. Crow and J.D. Knispel. pp. 13-21. Brain Information Service Research Report No. 3, UCLA, Los Angeles. 303. Crick, F. (1982) Do dendritic spines twitch?. Trends Neurosci. 5, 44-46. 304. Fischer, M., Kaech, S., Knutti, D. and Matus, A. (1998) Rapid actin-based plasticity in dendritic spines. Neuron 20(5), 847-854. 305. Maletic-Savatic, M., Malinow, R. and Svoboda, K. (1999) Rapid dendritic morphogenesis in CA1 hippocampal dendrites induced by synaptic activity. Science 283(5409), 1923-1927. 306. Rall, W., Shepherd, G.M., Reese, T.S. and Brightman, M.W. (1966) Dendrodendritic synaptic pathway for inhibition in the olfactory bulb. Exp. Neurol. 14(1), 44-56. 307. Shepherd, G.M. (1998) The Synaptic Organization of the Brain. 4th Edn. Oxford University Press, New York, Oxford. 308. Shepherd, G.M., Brayton, R.K., Miller, J.P., Segev, I., Rinzel, J. and Rall, W. (1985) Signal enhancement in distal cortical dendrites by means of interactions between active dendritic spines. Neuroscience 82(7), 2192-2195. 309. Segev, I. and Rall, W. (1988) Computational study of an excitable dendritic spine. J. Neurophysiol. 60(2), 499-523. 310. Rall, W. and Segev, I. (1987) Functional possibilities for synapses on dendrites and dendritic spines, in: Synaptic Function, eds G.M. Edelman, W.F. Gall, and W.M. Cowan. Neurosci. Res. Foundation, pp. 605-636. Wiley, New York. 311. Baer, S.M. and Rinzel, J. (1991) Propagation of dendritic spikes mediated by excitable spines: A continuum theory. J. Neurophysiol. 65(4), 874-890. 312. Gamble, E. and Koch, C. (1987) The dynamics of free calcium in dendritic spines in response to repetitive synaptic input. Science 236(4806), 1311-1315. 313. Holmes, W.R. (1990) Is the function of dendritic spines to concentrate calcium?. Brain Res. 519 (1-2), 338-342. 314. Qian, N. and Sejnowski, T.J. (1990) When is an inhibitory synapse effective?. Proc. Natl. Acad. Sci. USA. 87, 8145-8149. 315. Yuste, R. and Denk, W. (1995) Dendritic spines as basic functional units of neuronal integration. Nature 375(6533), 682-684. 316. Rose, C.R., Kovalchuk, Y., Eilers, J. and Konnerth, A. (1999, Two-photon Na + imaging in spines and fine dendrites of central neurons. Pfluegers Arch. 439(1-2), 201-207. 317. Engert, F. and Bonhoeffer, T. 
(1997) Synapse specificity of long-term potentiation breaks down at short distances. Nature 388(6639), 279-284; Erratum published in Nature 388(6643), 698. 318. Stratford, R.D., Mason, A.J.R., Larkman, A.U., Major, G. and Jack, J.J.B. (1989) The modeling of pyramidal neurons in the visual cortex, in: The Computing Neuron, eds R. Durbin, C. Miall, and C. Mitchison. Addison-Wesley, Reading, MA.
466
C. Meunier and L Segev
319. Rail, W. (1967) Distinguishing theoretical synaptic potentials computed for different soma-dendritic distributions of synaptic input. J. Neurophysiol. 30(5), 1138-1168. 320. Rail, W., Burke, R.E., Smith, T.R., Nelson, P.G. and Frank, K. (1967) Dendritic location of synapses and possible mechanisms for the monosynaptic EPSP in motoneurons. J. Neurophysiol. 30(5), 884-915. 321. Rail, W. (1977) Core conductor theory and cable properties of neurons, in: The Handbook of Physiology: The Nervous System, Vol. 1 (Cellular Biology of Neurons), eds E.R. Kandel, J.M. Brookhart, and V.B. Mountcastle. American Physiological Society, Bethesda, MD. 322. Rail, W., Burke, R.E., Holmes, W.R., Jack, J.J., Redman, S.J. and Segev, I. (1992) Matching dendritic neuron models to experimental data. Physiol. Rev. 72(4), S159-S186. 323. Koch, C. (1999) Biophysics of computation: Information processing in Single Neurons. Oxford University Press, Oxford. 324. Koch, C., Rapp, M. and Segev, I. (1996) A Brief History of Time (constants). Cereb. Cortex 6(2), 93-101. 325. Rail, W. (1969), Time constants and electrotonic length of membrane cylinders and neurons. Biophys. J. 9(12), 1483-1508. 326. Rail, W. (1969) Distributions of potential in cylindrical coordinates and time constants for a membrane cylinder. Biophys. J. 9, 1509-1541. 327. Eccles, J.C. (1957) The Physiology of Nerve Cells. Johns Hopkins Press, Baltimore, MD. 328. Eccles, J.C. (1960) The properties of the dendrites, in: Structure and Function of the Cerebral Cortex, eds D.B. Tower and J.P. Shade. Elsevier, Amsterdam. 329. Burke, R.E. and Bruggencate, G.T. (1971) Electrotonic characteristics of alpha motoneurones of varying size. J. Physiol. (London) 212, 1-20. 330. Major, G., Larkman, A.U., Jonas, P., Sakmann, B. and Jack, J.J. (1994) Detailed passive cable models of whole-cell recorded CA3 pyramidal neurons in rat hippocampal slices. J. Neurosci. 14(8), 4613-4638. 331. Rail, W. and Rinzel, J. (1973) Branch input resistance and steady attenuation for input to one branch of a dendritic neuron model. Biophys. J. 13(7), 648-688. 332. Redman, S.J. and Walmsley, B. (1983) The time course of synaptic potentials evoked in cat spinal motoneurons at identified group Ia synapses. J. Physiol. (London) 343, 117-133. 333. Mendell, L.M. and Henneman, E. (1971) Terminals of single Ia fibers: location, density, and distribution within a pool of 300 homonymous motoneurons. J. Neurophysiol. 34(1), 171-187. 334. Yuste, R. (1994) Ca 2+ accumulations in dendrites of neocortical pyramidal neurons: An apical band and evidence for two functional compartments. Neuron 13(1), 23-43. 335. Archie, K.A. and Mel, B.W. (2000) A model for intradendritic computation of binocular disparity. Nat. Neurosci. 3(1), 54-63. 336. Single, S. and Borst, A. (1998) Dendritic integration and its role in computing image velocity. Science 281(5384), 1848-1850. 337. Markram, H., Helm, P.J. and Sakmann, B. (1995) Dendritic calcium transients evoked by single back-propagating action potentials in rat neocortical pyramidal neurons. J. Physiol. (London) 485(1), 1-20. 338. Magee, J.C. and Johnston, D. (1997) A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. Science 275(5297), 209-213. 339. Borg-Graham, L.J. (1997) Interpretations of data and mechanisms for hippocampal pyramidal cell models, in: Cerebral Cortex, Vol. 12 (Cortical Models), eds E.G. Jones and P.S. Ulinski. Plenum Press, New York. 340. Marr, D. (1982) Vision. W.H. Freeman, San Francisco. 341. 
Anderson, J.C., Binzegger, T., Kahana, O., Martin, K.A. and Segev, I. (1999) Dendritic asymmetry cannot account for directional responses of neurons in visual cortex. Nat. Neurosci. 2(9), 820-824. 342. Borst, A. and Theunissen, F. (1999) Information theory and neural coding. Nature Neurosci. 2, 947-957.
Neurones as physical objects." structure, dynamics and function
467
343. Borg-Graham, L.J. (1991) Modelling the non-linear conductances of excitable membranes, in: Cellular and Molecular Neurobiology: A Practical Approach, eds H. Wheal and J. Chad. Oxford University Press, Oxford. 344. Armstrong, C.M. (1981) Sodium channels and gating currents. Physiol. Rev. 61, 644-683. 345. Kuo, C.-C. and Bean, B.P. (1994) Na + channels must deactivate to recover from inactivation. Neuron 12, 819-829.
This Page Intentionally Left Blank
CHAPTER 12

A Framework for Spiking Neuron Models: The Spike Response Model

W. GERSTNER
Center for Neuro-mimetic Systems, Computer Science Department, EPFL-DL, Swiss Federal Institute of Technology, CH-1015 Lausanne EPFL, Switzerland

© 2001 Elsevier Science B.V. All rights reserved

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction ................................................... 471
2. Hodgkin-Huxley model ........................................... 472
   2.1. Definition of the model ................................... 472
   2.2. Reduced model: spike response method (1) .................. 478
3. Spike response model ........................................... 484
   3.1. Definition of the SRM ..................................... 484
   3.2. Background ................................................ 491
4. Integrate-and-fire model ....................................... 491
   4.1. Definition of the basic model ............................. 492
   4.2. Stimulation by synaptic currents .......................... 495
   4.3. Spike response method (2): reset as current pulse ......... 497
   4.4. Spike response method (3): reset as initial condition ..... 500
   4.5. Discussion ................................................ 504
5. Multi-compartment model ........................................ 504
   5.1. Definition of the model ................................... 504
   5.2. Spike response method (4) ................................. 506
6. Extensions and discussion ...................................... 509
   6.1. Threshold process ......................................... 510
   6.2. Adaptation ................................................ 511
   6.3. Nonlinearities ............................................ 512
   6.4. Conclusions ............................................... 513
Abbreviations ..................................................... 514
References ........................................................ 514
1. Introduction
The successful mathematical description of action potentials in the giant axon of the squid by Hodgkin and Huxley in 1952 has led to a whole series of modeling papers which try to describe in detail the dynamics of various ion channels on the soma and dendrites during spike reception and spike emission. With modern computers it is now possible to numerically integrate models with 10-50 types of ion channel and hundreds of spatial compartments [1-3] and reproduce experimental findings to a high degree of accuracy. On the other hand, it is often difficult to grasp intuitively the essential phenomena of neuronal dynamics from these models. In particular, it is out of reach to understand these models analytically. Moreover, in a network setting the question arises whether all the details described in compartmental models are necessary to understand the computation in large populations of neurons. For an analytical understanding of networks of spiking neurons, a simplified description of neuronal dynamics is therefore desirable [4,5]. For this reason integrate-and-fire models [6-8] have become increasingly popular for the investigation of principles of cortical dynamics and function, e.g., [9-13].

The reduction of detailed neuron models to a standard integrate-and-fire unit requires simplifications in at least two respects. First, the nonlinear dynamics of spike generation [14,15] must be reduced to a leaky integrator with threshold firing [4]. Second, effects of the spatial structure of the neuron [16,8,17,18] must be reduced to some effective input [5]. In this paper, we address both issues from the systematic point of view of a response kernel expansion. It is shown that spike generation in the Hodgkin-Huxley model can be reproduced to a high degree of accuracy by a single-variable threshold model [19]. The problem of spatial structure is studied for a multi-compartmental integrate-and-fire model with a passive dendritic tree [17,18] and active currents at the soma. In this case, the model dynamics can be solved and systematically reduced to a single-variable model with response kernels.

After the reduction of the intricate neuronal dynamics to a threshold model, it is then possible to study analytically the dynamics of networks of neurons. It has been shown previously that in a large network of model neurons with homogeneous couplings, the stability of coherent, incoherent, or partially coherent states can be understood in a transparent manner [10,20-22,13]. Moreover, the collective response of a population of spiking neurons to a common time-dependent input can be analyzed [23]. The mathematical considerations that are necessary for a reduction of the highly nonlinear Hodgkin-Huxley equations to a single-variable threshold model are therefore worth the effort.

The chapter is organized as follows. We start in Section 2 with a review of the standard Hodgkin-Huxley model. The four differential equations of Hodgkin and Huxley give an accurate description of neuronal spiking in the giant axon of the squid. The drawback is that they are highly nonlinear and therefore difficult to analyze mathematically. We therefore aim for a simpler phenomenological description. The method we propose is based on spike response kernels and provides a biologically transparent description of the essential effects during spiking. In Section 2.2, we will see that the Spike Response Model (SRM), derived from the Hodgkin-Huxley model by the spike response method, can reproduce up to 90% of the spike times of the Hodgkin-Huxley model correctly. A short summary of the mathematics of the SRM is presented in Section 3. Another well-known model of neuronal spiking is the integrate-and-fire model, which is reviewed in Section 4. We show that the integrate-and-fire model is in fact a special case of the SRM. The mapping from the integrate-and-fire model to the SRM is discussed in some detail in Sections 4.3 and 4.4. In Section 5 we address the question of spatial structure. We show that in the case of a linear dendritic tree the dynamics can be well captured by spike response kernels. Finally, in Section 6 we discuss weakly nonlinear effects. Throughout the text, the general arguments are interrupted by examples intended to illustrate the main results.
2. Hodgkin-Huxley model

The classic description of neuronal spiking dates back to Hodgkin and Huxley [14] who summarized their extensive experimental studies on the giant axon of the squid with four differential equations. A first and fundamental equation describes the conservation of electric currents. Then there are three further differential equations which describe the dynamics of sodium and potassium ion channels. Modern models of neuronal dynamics make use of the same type of equations, but often involve many more ion channel types. The ion channels may be located on different compartments of a spatially extended neuron model. A single neuron may then be described by hundreds of coupled nonlinear differential equations. In this section we stick to the standard Hodgkin-Huxley model without spatial structure and use it as a reference model to study the dynamics of spike generation. In Section 2.1 we review the Hodgkin-Huxley equations. In Section 2.2 we reduce the nonlinear dynamics of the Hodgkin-Huxley model to a threshold model with a single variable u(t). This reduction will be the basis for a discussion of the SRM in Sections 3-6.
2.1. Definition of the model

The Hodgkin-Huxley model can be understood with the help of Fig. 1. The semipermeable cell membrane separates the interior of the cell from the extracellular liquid. Due to the membrane's selective permeability and also because of active ion transport through the cell membrane, the ion concentration inside the cell is different from the one in the extracellular liquid. The difference in concentration generates an electrical potential between the interior and the exterior of the cell. The cell membrane acts like a capacitor which has been charged by a battery. If an input current I(t) is injected into the cell, it may add further charge on the capacitor, or leak through the channels in the cell membrane.
473
A framework for spiking neuron models: the spike response model
K
inside -
-
+
+
-
/
+
C1
+
Na
outside
Fig. 1.
-
cT
T
L
T,, T
Schematic diagram for the Hodgkin-Huxley model. Taken from [24].
Let us now translate the above considerations into mathematical equations. The conservation of electric charge on a piece of membrane implies that the applied current I(t) may be split in a capacitive current Ic which charges the capacitor C and further components Ik which diffuse through the ion channels. Thus
I(t) = Ic + Z I k ,
(1)
k
where the sum runs over all ion channels. In the standard Hodgkin-Huxley model there are only three types of channel: a sodium channel with index Na, a potassium channel with index K and an unspecific leakage channel with resistance R; cf. Fig. 1. From the definition of a capacity C = Q/u, where Q is a charge and u the voltage across the capacitor, we find the charging current Ic = Cdu/dt. Hence from (1) du C dt = - Z
Ik + I(t).
(2)
k
In biological terms, u is the voltage across the membrane and ~-~kIk is the sum of the ionic currents which pass through the cell membrane. As mentioned above, the Hodgkin-Huxley model describes three types of channel. All channels may be characterized by their resistance or, equivalently, by the conductance. The leakage channel is described by a voltage-independent conductance gL = 1/R; the conductance of the other ion channels is voltage dependent. If the channels are fully open, they transmit currents with a maximum conductance gNa o r gK, respectively. Normally, however, the channels are partially blocked. The removal of the block is voltage dependent and is described by additional variables m, n, and h. The combined action of m and h controls the Na channels. The K gates are controlled by n. Specifically, Hodgkin and Huxley formulated the three current components as ZIk
-- tJNam3h(u - VNa) -+- gK/'/4(U -- VK) -if- gL(U -- VL).
(3)
k The parameters VNa, VK, and VL are called reversal potentials since the direction of a current h changes when u crosses Vk. Reversal potentials and conductances are empirical parameters and summarized in Table 1.
474
W. Gerstner Table 1 The parameters of the Hodgkin-Huxley equations. The membrane capacity is C = 1 laF/cm2 V,. (mV)
9x (ms/cm2)
I 15 -12 10.6
120 36 0.3
0tx (u/mV)
13x (u/mV)
(0.1 -0.01 u)/[exp(1 -0.1 u) - 1] (2.5- 0.1 u)/[exp(2.5- 0.1 u) - 1] 0.07 exp(-u/20)
o. 125 exp(-u/80) 4 exp(-u/18) 1/[exp(3 - 0.1 u) + 1]
Na K L
T h e three variables m, n, a n d h evolve a c c o r d i n g to the differential equations: rh - ~m(U)(1 -- m) -- ~m(U)m, h -- ~,,(u)(1 - n) - ~n(u)n,
(4)
/z -- o~h(u)(1 - h) - 13h(u)h with rh = d m / d t , a n d so on. T h e 0t a n d 13, given in Table l, are empirical functions o f u t h a t have been adjusted by H o d g k i n a n d Huxley to fit the d a t a of the giant a x o n o f the squid. Eqs. (2)-(4) define the H o d g k i n - H u x l e y model. E a c h of the three e q u a t i o n s (4) m a y also be written in the f o r m 1 -
-
Ix
-
xo(,)],
(5)
w h e r e x stands for m, n, or h. F o r fixed voltage u, the variable x a p p r o a c h e s the value
xo(u) with a time c o n s t a n t ~x(U). T h e a s y m p t o t i c value xo(u) and the time c o n s t a n t ~x(U) are given by the t r a n s f o r m a t i o n xo(u) = Otx(U)/[atx(U) + 13x(U)] a n d ~x(U) = [ ~ ( u ) + 13x(u)]-1 . U s i n g the p a r a m e t e r s given by H o d g k i n a n d Huxley [14], we have p l o t t e d in Fig. 2 the functions xo(u) a n d ~x(U).
(a)l.O
-
(b)
lO.O
~
. . . . . . . . . . . .
.. 0.5
h\!,,""?m
J
n, '!
t
i I '~
0.0
-1~.0
Fig. 2.
.... : : C / , "'..;,.......... J
-50.0
0.0
u[mV]
50.0
100.0
!
/'", h s0
n.-"7-'"-", ..'""
0.0 .... ".... ' . -100.0 -50.0
.
.
m ""..... "--'::--------~.
. . 0.0 u[mV]
.
.
50.0
100.0
Equilibrium function (a) and time constant (b) for the three variables m, n, h in the Hodgkin-Huxley model. Taken from [24].
A framework for spiking neuron models." the spike response model
475
2.1.1. Example." spike generation We see from Fig. 2a that m0 and no increase with u whereas h0 decreases. Thus, if some external input causes the membrane voltage to rise, the ion conductance of sodium (Na) increases due to increasing m and positive sodium ions flow into the cell. This raises the membrane potential even further. If this positive feedback is large enough, an action potential is initiated. At high values of u the sodium conductance is shut off due to the factor h. Note from Fig. 2b that Th is always larger than Tm- Thus the variable h, which closes the channels, reacts more slowly to the voltage increase than the variable m which opens the channel. On the same slower time scale, the potassium (K) current sets in. Since it is a current in outward direction, it lowers the potential. The overall effect of the sodium and potassium currents is a short action potential followed by a negative overshoot. In Fig. 3a we show the time course of the membrane voltage u(t) during an action potential. The spike has been initiated by a short current pulse of 1 ms duration applied at t < 0. Note that the amplitude of the spike is about 100 mV. If the size of the stimulating current pulse is reduced below some critical value, the membrane potential returns to the rest value without a large spike-like excursion; cf. Fig. 3b. Thus we have a threshold-type behavior. 2.1.2. Example." constant input and mean firing rates The Hodgkin-Huxley equations (2)-(4) may also be studied for constant input I(t) = Io for t > 0. (The input is zero for t < 0.) If the value I0 of the stimulation is larger than a critical value I0, we find a regular spiking behavior. We may define a firing rate v - 1IT, where T is the interspike interval. The firing rate as a function of
(a) ~oo >
80
(b) 10 15
6O
S
40 ~"
m ~" t J " "i)}--
2O 0
-10
0
5
I0 t/ms
15
20
. . . .
10
15
2"0 " " 25-
" 30
t/ms
Fig. 3. (a) Action potential. The Hodgkin-Huxley model has been stimulated by a short, but strong, current pulse before t --0. The time course of the membrane potential u(t) for t > 0 shows the action potential (positive peak) followed by a relative refractory period where the potential is below the resting potential. The resting potential has been set to zero. In the spike response framework, the time course u(t) of the action potential for t > 0 defines the kernel rl(t ). (b) Threshold effect in the initiation of an action potential. A current pulse of 1 ms duration has been applied at t = 10 ms. For a current amplitude of 7.0 laA/cm2, an action potential with an amplitude of about 100 mV as in (a) is initiated (solid line, the peak of the action potential is out of bounds). If the stimulating current pulse is slightly weaker (6.9 laA/cm2) no action potential is emitted (dashed line) and the voltage v stays always below 10 mV. Note that the voltage scale in (b) is different from the one in (a). Taken from [19].
476
W. Gerstner
the constant input I0 is plotted in Fig. 4b. Spike trains with intervals T - 1/v occur if the input current I0 is larger than a threshold value I0 ~ 6~t A/cm 2.
2.1.3. Example." step current input In the previous example we have seen that a constant input current I0 > I0 generates regular firing. In this paragraph we generalize this approach and study the response of the Hodgkin-Huxley model to a step current of the form
I(t) = Ii + A/~c~(t).
(6)
Here ~r denotes the Heaviside step function. At t - - 0 the input changes from a constant value I1 to a new constant value I2 - I1 + A/; see Fig. 5a. We may now ask whether spiking for t > 0 depends only on the final value 12 or also on the step size A/. The answer to this question is given by Fig. 5b. A large step A/facilitates the spike initiation. Even for h = 0 a spike is possible, provided that the step size is large enough. This is an example of inhibitory rebound: A single spike is fired, if an inhibitory current I~ < 0 is released. The letter S in Fig. 5b denotes the regime where only a single spike is initiated. Repetitive firing (regime R) is possible for I2 > 6 ~tA/ cm 2, but must be triggered by sufficiently large current steps. We may conclude from Fig. 5b that there is no unique current threshold for spike initiation: The trigger mechanism for action potentials depends not only on I2 but also on the size of the current step A/. More generally, it can be shown that the concept of a threshold itself is questionable from a mathematical point of view [15,25]. In a mathematical sense, the transition in Fig. 3b, that 'looks' like a threshold is, in fact, smooth. At a higher resolution of the input current in the regime between 6.9 and 7.0 l~A/cm 2, we would find a family of response curves in between the curves shown in Fig. 3b. For practical purposes, however, the transition can be treated as a threshold effect as we will see below.
2.1.4. Example." stimulation by time-dependent input As a final example, we stimulate the Hodgkin-Huxley model by a time-dependent input current I(t). In the numerical implementation, the current is generated by the following procedure. Every 2 ms, a random number is drawn from a Gaussian distribution with zero mean and standard deviation a - - 3 pA/cm 2. To get a con(a)
150.0
-
9
..
( b ) lOO.O
7
100.0 37 >
0.0
~%.o Fig. 4.
~
J
~.o) 4o.0 ~io-=e0.o t[ms]
50.0
8o.o
loo.o
0.0 0.0
5.0
lO.O
1~.o 2 o . o
Io
(a) Spike train of the Hodgkin-Huxley model for constant input current Io. The mean firing rate as a function of Io is shown in (b).
A framework for spik&g neuron models." the spike response model
477
(a)
I i I 0
(1•0)0 ~
6
-..
,
50 t ~ I n~s
.
i00
,
,
...........
~.o'6
.
"[:::~
...""
.....'~,r
o .~
iii ~.
-
0
, .... ........ ...... 50
~,m~
100
v
-
0
.
6i
.
~
.
.........
-
~
.
.
~
'
~
0
\
2
~00
O".'.
50
i00
. . . ~ . o 6 ~
............::.i time/ms
50
t~[ I lllillil ..."'"
'14
0
8
-2 12/pA cm
",,s
.....
o
i
current
(e) ~
50
S
~-" -
50
,........ ~
o 0
~ 50
100
4 -2 current 12/ pA cm
Fig. 5. Phasediagram for stimulation with a step current. (a) The input current I(t), shown in the top graph, changes at t = 0 from I1 to I2. (b) Hodgkin-Huxley model and (c) SRM with optimal kernels. Three regimes denoted by S, R, and I may be distinguished. In I no action potential is initiated (inactive regime). In S, a single spike is initiated by the current step (single spike regime). In R, continuing spike trains are triggered by the current step (repetitive firing). Examples of voltage traces in the different regimes are presented in the smaller graphs to the left and right of the phase diagram in the center. Note that the SRM shows qualitatively the same behavior as the Hodgkin-Huxley model but phase boundaries are not at exactly the same location. Taken from [19]. tinuous input current, a linear interpolation was used between the target values. The resulting time-dependent input current was then applied to the H o d g k i n - H u x l e y model (2). The response to the current is the voltage trace shown in Fig. 6. Note the action potentials which occur at irregular intervals.
W. Gerstner
478
i00
>
80
~
60
~
40
,--, o
).0 o
~'1' 0
200
400
+rl 600
800
z000
time / ms Fig. 6. Spike train of the Hodgkin-Huxley model driven by a time-dependent input current. The action potentials occur irregularly. The figure shows the voltage u as a function of time. Taken from [19].
2.1.5. Extensions Using the above equations and an appropriate set of parameters, Hodgkin and Huxley were able to describe an enormous amount of data from experiments on the giant axon of the squid. Due to its success in this special system, there have subsequently been several attempts to generalize the model in order to describe other experimental situations as well (for a review see, e.g., [26,3]). Whereas the model had originally been designed to describe the form and temporal change of an action potential during axonal transmission, a set of equations completely analogous to Eqs. (2)-(4) has been used to describe spike generation at the soma of the neuron [27-30,2,31,1]. The main difference is that additional ion channels have to be included, in particular those that account for Ca 2+ and the slow components of the potassium current. For each type of ion channel i, a current Ii --gixTg(u- Vii) is added. Here x; is yet another variable with dynamics (5). The conductance parameters gi, the exponents n;, the reversal potential Vi, as well as the functions xo(u) and r(u) are adjusted to fit experimental data. Nonlinear effects on dendrites are described analogously. 2.2. Reduced model." spike response method (1) The system of equations proposed by Hodgkin and Huxley is rather complicated. It consists of four coupled nonlinear differential equations and as such is difficult to analyze mathematically. For this reason, several simplifications of the HodgkinHuxley equations have been proposed. The most common approach reduces the set of four differential equations to a system of two equations [32-34,15,4]. Two important approximations are made. First, the m dynamics, which has a faster time course than the other variables (see the plot for T m in Fig. 2b), is considered to be instantaneous, so that m can be replaced by its equilibrium value mo(u). Second, the equations for n and h, which have according to Fig. 2b roughly the same time constants, are replaced by a single effective variable. Rinzel [34] and Abbott and Kepler [4] have shown how to make such a reduction systematically. The resulting two-dimensional model is often called the Morris LeCar model or the FitzHugh-
A framework for spiking neuron models: the spike response model
479
Nagumo model. The advantage of a two-dimensional set of equations is that it allows a systematic phase plane analysis. For a review of the methods and results see the excellent article of Rinzel and Ermentrout [15]. For a further reduction of the two-dimensional model to an integrate-and-fire model, see the article of Abbott and Kepler [4]. In this Section, we will take a somewhat different approach [19]. We would like to reduce the four Hodgkin-Huxley equations to a single variable u(t). We identify u with the membrane potential of the neuron. As we have seen in Fig. 3b, the Hodgkin-Huxley model shows a sharp, threshold-like transition between an action potential (spike) for a strong stimulus and graded response (no spike) for a slightly weaker stimulus. This suggests the idea that emission of an action potential can be described by a threshold process. In the simplified model, an action potential will be fired if the voltage u(t) approaches a formal threshold 8 from below. Let us suppose that the threshold is reached at a time t (f) defined by
()
u t 0c) = 8
and
~u
t (f)
>0.
(7)
We call t (f) the firing time of the neuron. If there are several neurons we add a lower index to identify the neuron so that t~ ) is one of the firing times of neuron i. Let us write ti "- max{tff)]t} f) < t} for the last firing time of neuron i. In the following we only have a single neuron and we suppress the subscript i. The notation t stands for the last firing time of this neuron. Action potentials in the Hodgkin-Huxley model have the stereotyped time course shown in Fig. 3a. Whatever the stimulating current that has triggered the spike, the form of the action potential is always roughly the same (as long as the current stays in a biologically realistic regime). This is the major observation that we will exploit in the following. Let us consider the spike triggered at time t. If no further input is applied for t > i, the voltage trajectory will have a pulse-like excursion before it eventually returns to the resting potential. For t > i, we may therefore set u(t) = r l ( t - t) + U r e s t , where 11 is the standard shape of the pulse and Urest is the resting potential that the neuron assumes in the absence of any input. Since, without further input, the voltage will eventually approach the resting value, we have v l ( t - t) ~ 0 for t - ~ ~ e~. Let us now consider an additional small input current pulse I which is applied at t' > ~. Due to the input, the membrane potential will be slightly perturbed from its trajectory. If the input current is sufficiently small, we may describe the perturbation by a linear impulse response function ~z. Since the voltage u depends on the last firing time ~, the response kernel ~: does so as well. For an input with arbitrary time course I(t ~) for t~ > ~ we therefore set
t-i u(t) = q(t - ~) + fo
~(t - ~,s)I(t - s ) d s + Urest.
(8)
Eq. (8) will be called the SRM. Note that after an appropriate shift of the voltage scale, the resting potential can always be set to zero, U r e s t --- 0.
480
W. Gerstner
To construct an approximative mapping between the S R M (8) and the H o d g k i n Huxley equations, we have to determine the following three terms: (i) the kernel r I which describes the response to spike emission; (ii) the kernel ~: which describes the response to incoming current; and (iii) the value of the threshold ,9 in Eq. (7).
2.2.1. The q-kernel In the absence of input the membrane potential u is at some resting value Urest. If we apply a strong current pulse, an action potential may be excited. The time course of the action potential determines the kernel 11. To get the kernel rl we use the following procedure. We take a square current pulse of the form
I(t)--c--~q0
for0
(9)
and zero otherwise, q0 is a unit charge and c a parameter chosen large enough to evoke a spike. The principle is indicated in Fig. 3b. We consider a series of current pulses of increasing c but the same duration of A - - 1 ms. At a critical value of c the voltage response u(t) shows an abrupt change from a response amplitude of about 10 mV to an amplitude of nearly 100 mV. If c is increased even further, the form of the pulse remains nearly the same. The kernel rl allows us to describe the standard form of the spike and the spike after potential. In order to define the kernel q, we set q(t-
i) = u(t) - Urest for t >
(10)
and q(t - i) = 0 for t < i. u(t) is the voltage trajectory caused by the supra-threshold current pulse. The firing time i is defined by the m o m e n t when u crosses the formal threshold ,~ from below. The kernel q(s) is shown in Fig. 3a.
2.2.2. The ~-kernel To find the kernel ~: we perform a simulation with a short current as in Eq. (9), but with a duration A << 1 ms and c sufficiently small. (Formally, we consider the limits A ~ 0 and c ~ 0.) The voltage response of the H o d g k i n - H u x l e y model to this subthreshold current pulse defines the kernel ~:, ~z(cx~,t) -- 1 [u(t) - Urest]. c
(11)
t > 0 is the time since the initiation of the pulse. The first argument of ~ has been set to infinity in order to indicate that the neuron did not fire an action potential in the recent past. In order to calculate ~ for finite t - i, we use a first strong pulse to initiate a spike at a time i < 0 and then apply a second weak pulse with amplitude c at t - 0. The result is a m e m b r a n e potential with time course u(t). Without the second pulse the time course of the potential would be uo(t) = q ( t - ~) + Urest for t > ~. The response to the second pulse is u ( t ) - uo(t), hence
481
A framework for spiking neuron models." the spike response model
~2(t- t, t) -- l [U(/) -- r l ( t C
t) -- Urest].
(12)
for t > 0. We repeat the above procedure for various spike times ~. The result is shown in Fig. 7. Since the input current pulse delivers a unit charge during a very short a m o u n t of time A < 0.1 ms, the ~:-kernel jumps almost instantaneously at time t = 0 to a value of 1 mV. Afterwards it decays, with a slight oscillation, back to zero. The decay is faster if there has been a spike in the recent past. This is easy to understand intuitively. During and immediately after an action potential many ion channels are open. The resistance of the cell membrane is therefore reduced and the effective membrane time constant is shorter.
2.2.3. The threshold The third term to be determined is the threshold ,9. Even though Fig. 3b suggests, that the H o d g k i n - H u x l e y equations exhibit some type of threshold behavior, the threshold is not well defined [15,25] and it is fairly difficult to estimate a voltage threshold directly from a single series of simulations. We therefore take the threshold as a free parameter that will be adjusted by a procedure discussed below.
2.2.4. Example: stimulation by time-dependent input To test the quality of the S R M approximation we compare the performance of the SRM (8) with that of the full H o d g k i n - H u x l e y model (2)-(4). We study the case of a time-dependent input current I(t). The input is generated by the procedure discussed in Section 2.1.4. The same current is applied to both the H o d g k i n - H u x l e y and the SRM. In Fig. 8 the voltage trace of the H o d g k i n - H u x l e y model is compared to that of the S R M with the kernels rl and ~: derived above. We see that the
1 ~_.
0.8 !
"L~
0
+
0
6
2 0
-0.2
0
5
I0
t/ms
15
20
Fig. 7. The voltage response of the Hodgkin-Huxley model to a short subthreshold current pulse defines the kernel ~:. The input pulse has been applied at t = 0. The last output spike occured at i = -At. We plot the time course ~:(At + t, t) (the tilde has been suppressed in the figure legend). For At -~ oo we get the response shown by the solid line. For finite At, the duration of the response is reduced due to refractoriness (dashed line, ouput spike At = 10.5 ms before the input spike; dotted line At = 6.5 ms) taken from [19].
W. G e r s t n e r
482
7
,
,
I00
>N
7s
~
50
,--,
25
i,, I9
9" 9 9
9
~
I;o
~
~l/
2;0
i
9
,,' i !
s 99
i !
te
9s
240
]l
e
e
! 9.
e
s
! I
e e
9 9 9
i
e
9s
! !
i
e
5 I t
0
',, / \,...,/
-5 180
190
200
210
220
2~-0
~.~o
2~o
,,,.--
time /
ms
Fig. 8. A segment of the spike train of Fig. 6. The inset in the lower left corner shows the voltage of the Hodgkin-Huxley model (solid) together with the approximation of the SRM defined by (8) (dashed line) during a period where no spike occurs. The approximation is excellent. The inset on the lower right shows the situation during and after a spike. Again the approximation by the dashed line is excellent. For comparison, we also show the approximation by the SRM0 model which is significantly worse (dotted line). Taken from [19]. approximation is excellent both in the absence of spikes and during spiking. As an aside we note that it is indeed important to include the dependence of the kernel ~: upon the last output spike time ~. If we neglected that dependence and used ~ ( ~ , s ) instead of ~ ( t - ~,s), then the approximation during and immediately after a spike would be significantly worse; see the dotted line, referred to as SRM0, in the lower right graph of Fig. 8. We have used the above scenario with time-dependent input current to optimize the threshold ~ by the following procedure. The same input was applied to the Hodgkin-Huxley model and the SRM (8) with kernels derived by the procedure described above. The threshold has been adjusted so that the total number of spikes was about the same in the two models; see [19] for details. To check whether both models generated spikes at the same time, we compared the firing times of the two models. About 90% of the spikes of the SRM occurred within -t-2 ms of the action potentials of the Hodgkin-Huxley model [19]. Thus the SRM (8) reproduces the firing times and the voltage of the Hodgkin-Huxley model to a high degree of accuracy.
A framework for spiking neuron models." the spike response model
483
2.2.5. Example: constant input and mean firing rates We study the response of the S R M to constant stimulation using the kernels derived by the procedure described above. The result is shown in Fig. 9. As mentioned above, we take the threshold 8 as a free parameter. If 8 is optimized for stationary input, the frequency plots of the Hodgkin-Huxley model and the S R M are rather similar. On the other hand, if we took the value of the threshold that was found for time-dependent input, the current threshold for the S R M would be quite different as shown by the dashed line in Fig. 9.
2.2.6. Example: step current input Finally, we test the S R M for the case of step current input. For 8 we take the value found for the scenario with time-dependent input. The result is shown in Fig. 5c. The S R M shows the same three regimes as the H o d g k i n - H u x l e y model. In particular, the effect of inhibitory rebound is present in the SRM. The location of the phase boundaries depends on the choice of ~ and would move if we changed I).
2.2.7. Example: spike input In the H o d g k i n - H u x l e y model (2), input is formulated as an explicit driving current I(t). In networks of neurons, input typically consists of the spikes of other, presynaptic, neurons. Let us, for the sake of simplicity, assume that a spike of a presynaptic n e u r o n / , which was emitted at time t/), generates for t > t/) a current input I(t) - ~(t -tJ: ~) to a postsynaptic neuron i. Here ~ is some arbitrary function which describes the time course of the postsynaptic current. The voltage of the postsynaptic neuron
i
changes,
according
to
(8)
by
an
amount
Aui(t) -- fo --~ ~ ( t - ti,s)e~(t- ( ) - s)ds, where ti is the last output spike of neuron i. For reasons of causality, the voltage response Au vanishes for t < t / ) . For t > t/) we define (note that there is no tilde on the left-hand side) 125: ioo
,.~
50
(
,
25
J
i
0 0
5
I0
15
20
I / #Acm -2 Fig. 9. The firing rate of the Hodgkin-Huxley model (solid line) is compared to that of the SRM. Two cases are shown. If the threshold 8 is optimized for the constant-input scenario, we get the dotted line. If we take the same value of 8 as in the dynamic-input scenario of the previous figure, we find the dashed line. Input current has a constant value I. Taken from [19].
484
W. Gerstner
i
-
t-i; (13)
,/o
What is the meaning of the definition (13)? Let us assume that the last output spike of the postsynaptic neuron was a long time back in the past. The voltage response Aui(t) - ~(oo, t - tj(f) ) is the postsynaptic potential of neuron i caused by the firing of the presynaptic neuron j. The time course of the postsynaptic potential can be measured in experiments and has a clear biological interpretation. For excitatory synapses the response of the postsynaptic neuron is positive and called an excitatory postsynaptic potential (EPSP). For inhibitory synapses it is negative (inhibitory postsynaptic potential (IPSP)). The function (13) will play a major role in the formal definition of the SRM in Section 3.
3. Spike response model In this section we collect the results of the previous discussion of the HodgkinHuxley model. We start with a formal presentation of the SRM in Eqs. (14) and (15). We then try to give each of the terms in (15) a biological meaning. To do this we make heavy use of the intuitions and results developed during the discussion of the Hodgkin-Huxley model. Finally, we present some examples and simplifications of the SRM which prepare the transition to a discussion of the integrate-and-fire model in Section 4. 3.1. Definition o f the S R M In the framework of the SRM [19,35,20-24,36-38], the state of a neuron i is described by a single variable u;. Neuron i fires, if ui approaches a threshold ~) from below. The moment of threshold crossing defines the firing time t}f), ui(t) = 9
and
d -;:ui(t) > 0 = ~ t - t } f).
(14)
13/--
In the absence of any spikes, the variable U i would have a value of 0. Each incoming spike will perturb u; and it takes some time before ui returns to zero. The function e describes the time course of the response to an incoming spike. If, after the summation of the effects of several incoming spikes, ui reaches the threshold ,9, an output spike is triggered. The form of the action potential followed by a return to a low value after the pulse is described by a function 11. Let us suppose neuron i has fired its last spike at time ii. After firing the temporal evolution of ui is given by
jE Fi
-t-
# t)(.[) E,~-j
~(t -- ti, S) Iext (t -- S) ds,
(15)
485
A framework for spiking neuron models." the spike response model
where ti is the last spike of neuron i, t~ ) the spikes of presynaptic neurons j and wij is the synaptic efficacy. The last term accounts for an external driving current/ext. The sum runs over all presynaptic neurons j E Fi where (16)
Fi = { j l j presynaptic to i}
and ~-j is the set of all firing times t~ ) < t of neuron j. So far Eqs. (14) and (15) define a formal model. Can we give a biological interpretation of the terms? Let us identify the variable u; with the m e m b r a n e potential of neuron i. The functions qi and eij are response kernels which describe the effect of spike emission and spike reception on the variable ui. This interpretation has motivated the term SRM. Let us discuss the meaning of the response kernels; see Fig. 10. As we have seen in Section 2.2.1., the kernel qi describes the standard form of an action potential of neuron i including the negative overshoot which typically follows a spike. Graphically speaking, a contribution qi is 'pasted in' each time the membrane potential reaches the threshold ~; Fig. 10. Since the form of the spike is always the same, the exact time course of the action potential carries no information. W h a t matters is whether there is the event 'spike' or not. The event is fully characterized by the firing time ti(f). In a simplified model, the f o r m of the action potential may
ui(t)
1'1
A
(t- ti)
, .........
,~'.. I~
4 .................................................................................................
t_•
!
,,.e
i
^
~
~
ti
t'
t"
,~
" "-
-
-- . . . . .
t
Fig. 10. Schematic interpretation of the SRM. The figure shows the time course of the membrane potential of neuron i as a function of time t. A spike of neuron i has been initiated at ~/. The kernel q ( t - ~/) for t > i/describes the form of the action potential (positive pulse) and the (negative) spike after potential that follows the pulse (solid line). If an input current pulse is applied at a time t" a long time after the firing at ~g, it evokes a standard response described by the function gz(e~,t - t") and indicated by the dashed line starting at t" (arrow). An input current pulse at t' which arrives shortly after the postsynaptic spike at ti evokes, due to refractoriness of the neuron, a response of significantly shorter duration. Its time course is described by the response kernel ~:(t- ~g,t - tt); see the dashed line after t'.
486
W. Gerstner
therefore be neglected as long as we keep track of the firing times t/(f). The kernel ]]i describes then simply the 'reset' of the potential to a lower value after the spike at ii. This idea will be exploited later on in Section 4 in the context of the intergrate-andfire model. The kernel ~,(t - ii, s) is the linear response of the membrane potential to an input current. We have already seen in Section 2 that the response depends, in general, on the time that has passed since the last output spike at ii. Immediately after ii many ion channels are open. The resistance of the membrane is reduced and the voltage response to an input current pulse of unit amplitude is therefore reduced compared to the response of an inactive neuron. A reduced response is one of the signatures of neuronal refractoriness. Formally, this form of refractory effect is included by making the kernel ~: depend, in its first argument, on the time difference t - ii. In Fig. 10 we compare the effect of an input current pulse at t' shortly after ii to that of a pulse at t" some time later. The response to the first input pulse is shorter and less pronounced than that to the second one. The kernel sij(t- ti,s) as a function of s = t - t~r) can be interpreted as the time course of a postsynaptic potential evoked by the firing of a presynaptic neuron j at time t~:/). If the synapse from j to i is excitatory, sij is called the EPSP. If it is inhibitory, it is called the IPSP. Similarly as for the kernel ~:, the exact shape of the postsynaptic potential depends on the time t - ii that has passed since the last spike of the postsynaptic neuron i. In particular, if neuron i has been active immediately before presynaptic spike arrival, the postsynaptic neuron is in a state of refractoriness. In this case, the response to an input spike is smaller than that of an 'unprimed' neuron. The first argument of sij(t- ii, s) accounts for the dependence upon the last firing time of the postsynaptic neuron. In order to simplify the notation later on, it is convenient to introduce the total
postsynaptic potential
.jE Fi
E.~-j
t(:f)
f0 C
(17)
Eq. (15) can then be written in the form ui(t) = n i ( t -
i;) +
h sp(tli;).
(18)
3.1.1. Example: refractoriness Refractoriness may be described qualitatively by the observation that immediately after a first action potential it is much more difficult to excite a second spike. In our description two factors contribute to refractoriness; see Fig. 10. Firstly q contributes because, during the spike, the voltage is above threshold. Thus it is excluded that the membrane potential is crossed from below so that emission of another spike is by definition impossible. Moreover, after the spike the membrane potential passes through a regime of hyperpolarization (negative overshoot) where it is below the
A framework jbr spiking neuron models: the spike response model
487
resting potential. During this phase, more stimulation than usual is needed to drive the membrane potential above threshold. Secondly, ~ and ~ contribute because, immediately after an action potential, the response to incoming spikes is shorter and, possibly, of reduced amplitude. Thus more input spikes are needed to evoke the same depolarization of the membrane potential as in an 'unprimed' neuron. The first argument of the ~ function (or function) allows us to incorporate this effect. If tJ/) - ti -----+ o o , then the response of neuron i to a presynaptic spike of neuron j is the standard EPSP. If t~ ) is close to ti, then the postsynaptic potential ~(t- ~, t - tJf )) has a different time course.
3.1.2. Example." experimental results In recent experiments, Stevens and Zador [39] have stimulated a cortical neuron with time-dependent current and measured the response of the membrane potential during repetitive firing. In order to fit their measurements to integrate-and-fire type dynamics, they found that it was important to work with a time-varying time 'constant' r ( t ' - ~). Given that the last output spike was at ~ < 0, the response to input at t - 0 is (for t > 0) approximated by { ~0 t dr' } ~(t'- ~) ' g:(t- ~, t) -- ao exp -
(19)
where a0 is a parameter and ~ ( t ' - ~) is the instantaneous membrane time constant. Immediately after the output spike at ~ the membrane time constant is only about 2 ms; for t ' - ~ - + oc the membrane time constant increases and approaches the standard value "Cm ~ 10--15 ms.
3.1.3. Example: SRMo A simpler version of the SRM can be constructed, if we neglect the dependence of upon the first argument. We simply set =
=
and use (151) in the form "i(t) -- rli(t - ti) + Z wij Z F~O(t-- ( ) ) jEFi tif) Egj
+
f0 ~
~.O(S)IeXt(t- s)ds.
(20)
J
Thus each presynaptic spike evokes a postsynaptic potential with the same time course, independent of the index j of the presynaptic neuron and independent of the last firing time ~; of the postsynaptic neuron. Only the amplitude of the response is scaled with the synaptic efficact wij. This simple version of the SRM has been termed SRM0 [23] and has been used for the analysis of computation with spiking neurons [36,37] and for that of network synchronization [22].
488
W. Gerstner
3.1.4. Example." from action potentials to formal events The shape of an action potential is described by the function r l ( t - t). Since it has a stereotyped time course, the form of the action potential does not transmit any information. What counts is the event 'spike' as such. In formal model, the form of the pulse is therefore often replaced by a 8 function. The negative overshoot after the spike is modeled as a reset to a lower value. One of several possible descriptions is r l ( t - t) - 8 ( t - i) - rl0 e x p ( with a parameter rl0 > 0. The negative overshoot (second term on the right-handside of (21)) decays back to zero with a time constant t. The simpification from a nicely shaped action potential to a formal q-function is illustrated in Fig. 11.
3.1.5. Example: graphical construction of firing times Let us now summarize the considerations of the two preceding examples and proceed to a graphical illustration of the model; cf. Fig. 12. The neuron under consideration receives input from several presynaptic cells. Each input spike evokes a postsynaptic potential of some standard form s0(s). We assume excitatory synapses, so that s is positive. The excitatory postsynaptic potentials are summed until the firing threshold 9 is reached. Each output spike is approximated by a 8 pulse, followed by a reset as in (21). Then the summation of inputs restarts.
3.1.6. Example." coherent versus incoherent input What can we do with the simplified neuron model defined by (20) and (21)? The SRM can provide an intuitive understanding of questions of neuronal coding and signal transmission. For example, we may easily understand why coherent input is more efficient than incoherent input in driving a postsynaptic neuron.
6(t-f)
\ ,, i i
i i
j:'i........................................... -11o. ?~ Fig. 11. In formal models of spiking neurons, the shape of an action potential (dashed line) is replaced by a 8 pulse (thick vertical line). The negative overshoot (spike after potential) after the pulse is included in the kernel r l ( t - t) (thick line). The pulse is triggered by the threshold crossing at ~.
489
A frameworkfor spiking neuron models: the spike response model ' I ~voltage response
,,./
input spikes I
1
ii/
,.~
!/
"spike
u
input spikes Fig. 12. Spike Response Model SRM0. Each input pulse causes an excitatory postsynaptic potential (EPSP) e(s). All EPSPs are added. If the threshold is reached the voltage is reset. The reset corresponds to adding a negative kernel q(s).
To illustrate this point, let us consider an e kernel of the f o r m ~~
jSexp(-~)z
for s > 0
(22)
and zero otherwise. We set J = 1 m V and z = 1 0 ms. The function (22) has a m a x i m u m value of J/e at s = z. The integral over s is normalized to Jz. Let us consider a neuron i which receives input f r o m 100 presynaptic neurons j. Each presynaptic n e u r o n fires at a rate of 10 Hz. All synapses have the same efficacy w0 = 1. Let us first study the case of a s y n c h r o n o u s input. Different neurons fire at different times so that, on average, spikes arrive at intervals of At = 1 ms. Each spike evokes a postsynaptic potential defined by (22). The total m e m b r a n e potential of neuron i is
bli(t) -- l ] ( t - ti) + ~
Z
Wo~.o(t- ( ) )
OG
rl(t- ti) + wo Z eo(t- nat)
(23)
n=0
If neuron i has been quiescent in the recent past ( t - ~; ~ ~ ) , then the first term on the right-hand side of (23) can be neglected. The second term can be a p p r o x i m a t e d by an integral over s, hence
ui(t) ~ -wo ~ ]o"~ e 0 ( s ) d s - woJz At
_- 10 mV.
(24)
If the firing threshold of the neuron is at 8 = 20 m V the neuron stays quiescient.
W. Gerstner
490
N o w let us consider the same number of inputs, but fired coherently at t~) - 0, 100,200,... ms. Thus each presynaptic neuron fires as before at 10 Hz but all presynaptic neurons emit their spikes synchronously. Let us study what happens after the first volley of spikes has arrived at t = 0. The m e m b r a n e potential of the postsynaptic neuron is
ui(t) = r l ( t - ti) + Nwoao(t),
(25)
where N -- 100 is the number of presynaptic neurons. The m a x i m u m of (25) occurs at t = z = 10 ms and has a value of woNJ/e ,~ 37 mV which is above threshold. Thus the postsynaptic neuron fires before t = 10 ms. We conclude that the same numer of input spikes can have different effects depending on their level of coherence. In Fig. 13 we illustrate this effect for a simplified scenario of two groups of presynaptic neurons. Neurons within each group fire synchronously. In a there is a phase shift between the spikes of the two groups, whereas in b the two groups are synchronous.
3.1.7. Example: sliding threshold interpretation The simplified model SRM0 defined in (20) with the rl kernel defined in (21) allows us to give a reinterpretation of refractoriness as an increase of the firing threshold. To see how this works, let us introduce the input potential
hi(t ) ~ Zwij ~ ,)
jGFi
.~f)
F~o(t - tj~.)) q- ~o~
(26)
-c.~.j
We emphasize that hi depends on the input only. In particular, there is no dependence upon ti. With the above definition of hi, Eq. (20) is simply u i ( t ) r l 0 ( t - ti) + hi(t). The next spike occurs if ui(t) = ~) or
h,(t) = a-
n0(t-
(27)
i,).
We may consider ,9 - rl0(t - ti) as a dynamic threshold which is increased after each firing. Eq. (27) has a simple interpretation: The next firing occurs if the input po-
(b)
I
I ..
I
., I
I I
I
I
Fig. 13. Potential u of a postsynaptic neuron which receives input from two groups of presynaptic neurons: (a) Spike trains of the two groups are phase shifted with respect to each other. The total potential u does not reach the threshold. There are no output spikes: (b) Spikes from two presynaptic groups arrive synchronously. The summed EPSPs reach the threshold ~) and cause the generation of an output spike.
A framework for spiking neuron models." the spike response model
tential hi(t) reaches the dynamic threshold ~ - q 0 ( t - t i ) illustration.
491
[40]. See Fig. 14 for an
3.2. Background In contrast to the standard integrate-and-fire model which is usually stated in terms of differential equations, Eq. (15) is based on an 'integral' representation with response kernels. Eq. (! 5) is linear in the spikes and can be considered a starting point of a systematic expansion [21,19]. As we will see later in Section 6, nonlinear effects between pairs of spikes can be included by second-order kernels of the form ~,ijk(t- ti, t - ( I , t - t~)). Higher-order nonlinearities are treated similarly. Effects of earlier spikes of postsynaptic neurons can be treated by kernels q i ( t - ti, t - t~2)), rli(t- "ii,... , t - t}k)) where ti = t}1) is the last spike of n e u r o n / a n d t}k) is the kth spike counting backward in time. The approach by spike response kernels provides a link between simplified neuron models of the form (15) and multi-compartmental models [1-3] and presents an alternative to earlier approaches towards a reduction of Hodgkin-Huxley equations [4,34,32,41]. The remainder of the chapter is organized as follows. In the following section, the integrate-and-fire neuron without spatial structure (point neuron) is reviewed. It is shown that integration of the model leads to (15) or (20). Thus, the integrate-andfire model is a special case of the SRM. In Section 5, a spatially extended version of the integrate-and-fire model with linear dendritic tree is considered. It is shown that integration of the model leads again back to (15). In Section 6 the problem of nonlinearities during synaptic and dendritic transmission is discussed.
4. Integrate-and-fire model We start in the first section with a review of the integrate-and-fire neuron. In the following sections we will show that the integrate-and-fire neuron is a special case of the SRM defined in Section 3.
1.0 el-
0.0
0'
,
,
zoo
t [ms]
Fig. 14. Sliding threshold interpretation. The input potential h(t) (solid line) is generated by the superposition of the EPSPs (solid line) caused by presynaptic spikes. Each spike arrival is denoted by an arrow. An output spike occurs, if h(t) hits the dynamic threshold l~(t) (dashed line). At the moment of spiking the value of the threshold is increased by one. After the spike, the threshold decays exponentially back to its resting value ~1= 1.
492
W. Gerstner
4.1. Definition o f the basic model The basic circuit of an integrate-and-fire model consists of a capacitor C in parallel with a resistor R driven by a current I(t); see Fig. 15, dashed circle. The driving current may be split into two components, I(t) = IR + Ic. The first component is the resistive current IR, which passes through the linear resistor R. F r o m Ohm's law we obtain I R - - u / R , where u is the voltage applied at the resistor. The second component Ic charges the capacitor C. F r o m the definition of the capacity as C = q/u (where q is the charge and u is the voltage), we find a capacitive current Ic = Cdu/dt. Thus
u(t) du I(t) - ~ + C d---t"
(28)
We multiply (28) by R and introduce the time constant ~m = RC of the 'leaky integrator'. This yields the standard form du
T'm dt -- - u ( t ) + RI(t).
(29)
We refer to u as the membrane potential and to ~m as the membrane time constant of the neuron. In integrate-and-fire models the form of an action potential is not described explicitly. Spikes are reduced to formal events and fully characterized by a 'firing time' tg/. The firing time is defined by a threshold process
u(t) -- ~) ~
t = t (f).
(30)
from neuron j _~_
axon ~
8 (t-t (f)) J
I
~ ,'% ,,
synapse
: ~.
' ~(t-t~f)) .- . . . . . . . ..
soma
i ,,"~~_~(t) "",, "'-..
..'""
g (t-t !~ 1
Fig. 15. Schematic diagram of the integrate-and-fire model. The basic circuit is the module inside the dashed circle on the right-hand side. A current I(t) charges the RC circuit. The voltage u(t) across the capacitance (points) is compared to a threshold 0. If u(t) = 0 at time t~ ) an output pulse 8 ( t - t i (f)) is generated. Left part: a presynaptic spike 8 ( t - ( ) ) i s low-pass filtered at the synapse and generates an input current pulse a ( t - t]f)).
A framework for spiking neuron models: the spike response model
493
Immediately after t (f), the potential is reset to a new value Ur < 9, lim u(t (f) + 8) -- Ur.
840+
(31)
For t > t (f) the dynamics is again given by (29) until the next threshold crossing occurs. The combination of leaky integration (29) and reset (31) defines the basic integrate-and-fire model.
4.1.1. Example: constant stimulation and firing rates Before we continue with the definition of the integrate-and-fire model and its variants, let us study a simple example. Let us suppose that the integrate-and-fire neuron defined by (29)-(31) is stimulated by a constant input current I(t) = Io. To keep the mathematical steps as simple as possible we take the reset potential to be Urn0.
As a first step, let us calculate the time course of the membrane potential. We assume that a first spike has occurred at t = t (~ The trajectory of the membrane potential can be found by integrating (29) with the initial condition u(t (~ = Ur = O. The solution is
_ 0[1 exp(
,32)
The membrane potential approaches for t ---, oc the asymptotic value u(ec) = R/0. For R/0 < l) no further spike can occur. For R/0 > 8, the membrane potential reaches the threshold ,9 at time t (~), which can be found from the threshold condition u(t (1)) = 8 or
,33) Solving (33) for the time interval T = t (1) - t (0) yields T - ~mln R/0 R/0 - 8"
(34)
After the spike at t (1) the membrane potential is again reset to Ur = 0 and the integration process starts again. If the stimulus I0 remains constant, the following spike will occur after another interval of duration T. We conclude that for a constant input current I0, the integrate-and-fire neuron fires regularly with period T given by (34). We may define the mean firing rate of a neuron as v = 1/T. The firing rate of the integrate-and-fire model with stimulation I0 is therefore
v-
R/0 I)].1-1 [z,,lnRio_
In Fig. 16b the firing rate is plotted as a function of the constant input I0.
(35)
494
W. Gerstner
(a)
(b) 0.4 1.0
9
...
,
.
,
.
,
N m,,,--=
0
20 t[ms]
0.2
0.0
40
o
0.0
2.0
4.0
I,,
6.0
8.0
Fig. 16. (a) Time course of the membrane potential of an integrate-and-fire neuron driven by constant input current I0 = 1.5. The voltage u(t) is normalized by the value of the threshold ~) = 1. (Resistance R = 1 and membrane time constant rm = 10 ms.) (b) The firing rate v of an integrate-and-fire neuron without (solid) and with absolute refractoriness of ~abs -- 4 ms (dashed) as a function of a constant driving current I0. Current units normalized so that the current threshold is I0 -- 1. (Reset to Ur = 0.)
4.1.2. Example: time-dependent stimulus I(t) The results of the preceding example conditions. Let us suppose that a first lating current is I(t). We allow for an treated as an initial condition for the integration is
u(t)
=
u,.exp( ---~-m t-i ) + Clf'-i
can be generalized to arbitrary stimulation spike has occurred at i. F o r t > ~ the stimuarbitrary reset value Ur. The value Ur will be integration of (29). The formal result of the
exp
(r-~) I(t -
s)ds.
(36)
Eq. (36) describes the m e m b r a n e potential for t > i and is valid up to the m o m e n t of the next threshold crossing. If u(t) = ,9, the m e m b r a n e potential is reset to Ur and integration may restart; see Fig. 17.
4.1.3. Example." absolute refractoriness It is straightforward to include an absolute refractory period. After a spike at t (f), we force the m e m b r a n e potential to a value u = Ur and keep it there during a time ~abs. C u r r e n t I(t) which arrives during the interval [t~f),t (y) + 6 abs] has no effect and is disregarded. At t (/) + ~abs the integration of (29) is restarted with the initial value u = ur. The time interval 6 abs during which the neuron is insensitive to input is called the 'absolute refractory period'. The inclusion of an absolute refractory period does not cause any problems for the integration of the model equations. F o r example, we can solve the dynamics for a constant input current I0. If a first spike has occurred at t = t (~ then u(t) =_ Ur for t (~ < t < t (~ + ~abs and
,,,,
ol, -
for t > t (~ + ~abs.
-
+Urexp(t--t'O'-- abs)--
(37)
495
A framework for spiking neuron models: the spike response model
1.0
r 3
A . 4 . . ~
o 9
.
.
0
9
20
.
.
.
.
.
40
.
.
.
.
.
.
60
t [ms]
J
80
.
.
.
1 O0
Fig. 17. Voltage u(t) of an integrate-and-fire model (top) driven by the input current l(t) shown at the bottom. The input I(t) consists of a superposition of four sinusoidal components at randomly chosen frequencies plus a positive bias current I0 = 1.2 which drives the membrane potential towards the threshold. If R/0 > 9, the neuron will fire regularly. Due to the absolute refractory period the interval between firings is now longer by an a m o u n t ~abs compared to the value in (34). The mean firing rate v = 1/T is
R/0 -
"
(38)
The firing rate of the integrate-and-fire neuron as a function of the constant input current is plotted in Fig. 16b.
4.2. Stimulation by synaptic currents So far we have considered an isolated neuron which is stimulated by an applied current I(t). In a more realistic situation, the integrate-and-fire model would be part of a larger network. The input current I(t) is then generated by the activity of presynaptic neurons. In the framework of the integrate-and-fire model, we may assume that each presynaptic spike generates a synaptic current pulse of finite width. If the presynaptic neuron j has fired at t~f5, spike arrival at the synapse will evoke a current a ( t - tj(f)) f o r t > ( ) . Since several presynaptic neurons contribute to driving the neuron, the total input current to neuron i is
Ii(t)- Z cij ,jEFi
~ a(t- ()). o(f)E~j
(39)
W. Gerstner
496
The factor cij is a measure of the efficacy of the synapse with units of a charge. 1 (39) is a reasonable model of synaptic interaction. Indeed, each input spike arriving at a synapse opens some ion channels and evokes a current through the membrane of the postsynaptic neuron i. Reality is somewhat more complicated, however, since the amplitude of the synaptic input current may itself depend on the membrane voltage ui. In detailed models, each presynaptic action potential evokes a change in the synaptic conductance with standard time course g ( t - t(~)), where tCr) is the arrival time of the presynaptic pulse. The synaptic input current is modeled as
l i ( t - t (f)) = 9 ( t - t(f))[ui(t) -- Urev]-
(40)
The parameter Urev is called the reversal potential of the synapse. The level of the reversal potential depends on the type of synapse. For excitatory synapses, Urev is much larger than the resting potential. The synaptic current then shows saturation. The higher the voltage u;, the smaller the amplitude of the input current. The total input current is therefore not simply the sum of independent contributions. Nevertheless, since the reversal potential of excitatory synapses is usually significantly above the firing threshold, the factor [ui- Urev] is always some large number which varies only by a few percent. Systematic corrections to the current equation (39) will be derived in Section 6. For inhibitory synapses, the reversal potential is close to the resting potential. An action potential arriving at an inhibitory synapse pulls the membrane potential towards the reversal potential Urev which is close to Urest. Thus, if the neuron is at rest, inhibitory input hardly has any effect. If the membrane potential is instead considerably above the resting potential, then the same input has a strong inhibitory effect. This is sometimes described as the 'shunting' phenomenon of inhibition. The limitations of the current Eq. (39) will be discussed in Section 6. In the following we will always work with (39).
4.2.1. Example." pulse-coupling and a-function In this section we will give some examples of the synaptic current a(s) in (39). We start with the simplest choice. Spikes of a presynaptic neuron j are described as Dirac k-pulses which are fed directly into the postsynaptic neuron i. Thus a(s) = 8(s). The total input current to unit i is then
I i ( t ) - Z cij Z jEFi
8(t-(
))"
(41)
t(.l) j c.yT i
As before, the factor cij is a measure of the strength of the connection from j to i. In case of (41), c;j can be identified with the charge deposited on the capacitor C by a single presynaptic pulse of neuron j.
1 c/7 is, of course, proportional to the synaptic efficacy wij as we will see later on.
497
A frameworkfor spiking neuron models: the spike response model
More realistically, the synaptic current 0~ should have some finite width. In Fig. 15 we have sketched the situation where ~(s) consists of a simple exponentially decaying pulse
'exp( )
ors 0
and zero otherwise. Eq. (42) is a first approximation to the low-pass characteristics of a synapse. The exponential pulse (42) can be considered to be the result of some synaptic dynamics described by a first-order linear differential equation. Let us set d
(43)
jeri
t}f) c~j
Integration of the differential equation (43) yields (39) with a(s) given by (42). In (42) the synaptic current has a vanishing rise time which is not very realistic. More generally, we may assume a double exponential which sets in after a transmission delay Aax. For s < Aax we therefore have a(s) = 0. For s > A ax we set ~(s) = 1
"1;s -- -Or
[
exp
(
-
s - A ax) ~ "Cs
- exp
( --~AaX~]s "Cr ,]
.
(44)
Here Zs is a synaptic time constant in the millisecond range and 1;r with "Cr _~ Ts is a further time constant which describes the rise time of the synaptic current pulse. In the limit of Tr --+ "Cs, (44) yields (for s > A ax)
~(s)-s-Aax
----v--- exp
(
-
s-Aax~
.
(45)
"Cs ,/ In the literature, a function of the form x e x p ( - x ) such as (45) is often called an afunction. While this has motivated our choice of the symbol at for the synaptic input current, ~(.) in (39) may stand for any form of an input current pulse. As mentioned before, an yet more realistic description of the synaptic input current would include a reversal potential for the synapse as defined in (40). 4.3. Spike response method (2)." reset as current pulse The basic equation of the integrate-and-fire model, Eq. (29), is a linear differential equation. It can therefore be integrated in a straightforward manner. Due to the threshold and reset conditions, Eqs. (30) and (31), the integration process is not completely trivial. In fact, there are two different ways of integrating (29). The first one treats the reset as a current pulse, the second one as an initial condition. We discuss both methods in turn. In this subsection we focus on the first method and describe the reset as an additional current. Let us consider for the m o m e n t a short current pulse/out = -qS(t) applied to the RC circuit of Fig. 15. It removes a charge q from the capacitor C and lowers the
498
W. Gerstner
-q/C.
potential by an amount Au Thus a reset of the membrane potential from a value of u - 8 to a new value u = Ur corresponds to a negative current pulse which removes a charge q = C ( 8 - Ur). Such a reset takes place at the firing time t/(f). The total reset current is therefore
I?Ut(t)---C(~-Ur) Z t(.f) i EJr
~(t-ti(f)) '
(46)
i
where the sum runs over all firing times. Formally, we may add the output current (46) on the right-hand side of (29) T'm
dui
dt =
(47)
- u i ( t ) .qt_RIi(l) + R/OUt(t).
Here (48)
Ii(t) -- ~ Ciff y ~ O~(t- ~fl))-]-/text(t) jEFi t(f j )E,~ i
is the total input current to neuron i, generated either by presynaptic spike arrival or by external stimulation Iext(t). We note that (47) is a linear differential equation. Integration with initial condition u(-c~) = 0 yields
( s ) [I~
ui(t) -~l
exp -~mm
= Z tlf) E.~
+Z
--(~- Ur)exp - s 8 ( t -
/o Z
exp -~mm a ( t -
jGF~ t~f)E #, 9
+~
lfo~
s) + Ii(t- s)]ds ti~)- s)ds -
/7-.
( S)Iext(t-s)ds.
exp --~mm
(49)
Let us define for s > 0
(so)
q0(S) --" --(0- Ur)exp ( - r-~)
so(s)-
Y)dY
, (s) -~mm
~z0(s)-~exp
(sl) (52)
and r l 0 ( s ) - s 0 ( s ) - ~;0(s)- 0 for s < 0. With the above definitions, (49) may be rewritten in the form
A framework for spiking neuron models." the spike response model
u;(,)= Z
~lo('-t~f))+Zwij
t? )C~i
jcri
-Jr-
~_~ ~~
499
f))
o(f)cd,~j
~,O(s)Iext(t-s)ds.
(53)
with weights wii- cij/C. Eq. (53) is the main result of this subsection. The kernel q0(s) defined by (50) is shown in Fig. 18. We emphasize that the firing times t~ ) which appear implicitly in (49) have to be calculated as before from the threshold condition u~(t) = ,~. In [22], the 'SRM' has been defined by (53). Note that, in contrast to (15), we still have a sum over past spikes of neuron i on the right-hand side of (53).
4.3.1. Examples of eo-kernels If cx(s) is given by (42), then the integral on the right-hand side of (51) can be done and yields (s > 0) 1 -
(54)
[exp ( - ~-~) - exp ( - ~ / 1
1
This is the ~ kernel shown in Fig. 19a. If a(s) is the Dirac 8-function, then we find simply ~0(s) = exp(-s/~m) as shown in Fig. 19b. Note that 0~(s) is the synaptic current. Integration of the synaptic current yields the postsynaptic potential ~o(s). If the synapse is excitatory, e0 is called the EPSP. For an inhibitory synapse, ~0 describes the IPSP.
4.3.2. Short-term memory approximation To keep the discussion transparent, let us set
ui t -
t~i~i'~i
iext
._
0. Eq. (53) is then
(ss)
+
j~Fi (IE~;
On the right-hand side of (55), there is a sum over all past firings of neuron i which does not appear in (15), the equation we are aiming for. According to (50) the effect of the rl0-kernel decays with a time constant ~m. In realistic spike trains, the interval between two spikes is typically much longer than 0.0 /
/
-1.0 0
20 t [ms]
40
60
Fig. 18. The kernel 1"10 of the integrate-and-fire model with membrane time constant "1~m ~
10 ms.
500
W. Gerstner
(a)
(b)
0.1
@
0.1
@ 0.0 0.4
0 ,
.
20 .
.
.
40 .
.
60
.
0.0
0
20
40
60
0
20
40
60
0.4
0.2
0.0
0
~ 20 t [ms]
40
60
0.0
t [ms]
Fig. 19. (a) ~-kernel of the integrate-and-fire model (top) with exponential synaptic input current 0~ (bottom). (b) If the synaptic input is pulse-like (bottom), then the ~-kernel is a simple exponential. the membrane time constant ~m. Hence the sum over the q0 terms is usually dominated by the most recent firing time ti(f) < t of neuron i. We therefore make a truncation and neglect the effect of earlier spikes Z t i(11 E 9
r l o ( t - ti(f)) ~
r l o ( t - ti),
(56)
i
where ti is the last firing time of neuron i. The approximation (56) is good if the mean firing rate of the neuron is low, i.e., if the intervals between two spikes are much longer than Zm. Loosely speaking, we may say that the neuron remembers only its most recent firing. Equation (56) may therefore be called a 'short-term memory approximation' [22]. The final equation is (57) j E Fi
t (/) E.~j )
This is exactly the equation for the model SRM0, defined in (20). Note that we have kept, on the right-hand side of (57), the sum over all presynaptic firing times tj(f). Only the sum over the rl0's has been truncated. Equation (57) can be seen as an approximation to the integrate-and-fire model or else as a model in its own rights. The advantage of SRM0 is that many network results can be derived in a rather transparent manner [21-23]. Moreover questions of computation and coding with spiking neurons can be analyzed in the context of SRM0 [36,37,24]. 4.4. Spike response method (3)" reset as initial condition
In this section, we discuss a method of integration which gives a direct mapping of the integrate-and-fire model
501
A framework for spiking neuron models." the spike response model
du i
"Cm 8--7 z --bli(t) -~-R ~-~cij Z ~(t - t~f )) -Jr-Iext(t) jEFi t~.)E~j
(58)
to the SRM (15). As in (36) we integrate (58) from/i to t with u(/i) - Ur as an initial condition. The result is
u(t)
= Ur exp ( --t-ti).cm +Z ~ jEFi t~f)Eo~j
exp -~mm ~ ( t -
+~1 f0 t-ii exp (-S')lext(t-J)ds'-~m
-
)ds'
(59)
We may now define kernels
rl(t - ti) -- urexp(- t - ti) "Cm ,I
(60)
/0
(61)
a(t-
ti, s) -
exp
--~m a(s - s')ds'
s)~(t-ii-s) ~(t- ti, s) - -~1 exp ( -~mm
(62)
and the synaptic efficacy wij- cij/C. As usual, ~ ( x ) denotes the Heaviside step function which vanishes for x _< 0 and has a value of one for x > 0. The kernels (60)(61) allow us to rewrite (59) in the form
jEFi nt-
jo
~;(t
-
()E~j
t/, s-"ext (t )6
-
s)ds,
(63)
is identical to (15) except for some minor changes of notation. We emphasize that the rl-kernel defined in (60) is not the same as the one defined in (50). In particular, the rl-kernel (60) vanishes if Ur - 0 . which
4.4.1. Examples of e kernels In order to calculate the ~ kernels (61) and (62) explicitly, it is convenient to distinguish two cases. First we consider the case, that the last output spike occurred before presynaptic spike arrival (ti < ( ) ) . Therefore t - t i > t - ( ) - - s . Since a(s - s ~) vanishes for s - s' < 0 we may extend the upper boundary in (61) to infinity without introducing an error. Hence, for tj(f)> ti, we have ~(t-ti, t-t~f )) = ~o(t- t}f)) where t0 has been defined in (51).
w. Gerstner
502
The situation is different, if ~; > tJit, i.e., if the last output spike has occurred after presynaptic spike arrival. In this case only that part of the synaptic current which arrives after ti contributes to the present postsynaptic potential and exp
8(t -- ti, t -- tj(f)) --
- t- r
a({ - tj(f))d{.
(64)
"17m
To be specific, we take a(s) as defined in (42), viz.,
a(s) = "c-~l exp(-s/r,s)~(s). Let us set x -
(65)
t - ~;. The integration of (61) yields [21]
~(x,s) = 1 -' '~
([
exp
-
- exp
~(s)~,Ug(x - s)
-
Tm
+ exp ( _ s - ~ x ) [exp ( - - ~ ) -
exp(-~)]~'~(x)Jg(s-x)).
(66)
The Heaviside functions ~'~(x - s) in the first line of (66) picks out the case ( ) > ti or x > s. The second line contains the factor ~ , ~ ( s - x) and applies to the case t~r / < ti or x < s. See Fig. 20 for an illustration of the result.
4.4.2. Transformation of the ~ kernel W h a t is the relation between the ~ kernel derived in (61) and the e0 introduced in (51)? We will show in this p a r a g r a p h that =
-
exp( ) -
e0 (s - x)
(67)
holds. To see how this comes about we start from (61). We set x - t y = s - s' and find
i; and
0.2 O.1
O.O
~(o [ ~(040 .
t [ms]
80
Fig. 20. The kernel e ( t - ~, t - ( t ) as a function oft for two different situations. If tj(.f) > i, then e ( t - ~, t - t)r/) = e0(t- t~ )) is the standard EPSP (thick solid line). If t~ / < ~, the amplitude of the EPSP for t > ~is much smaller (thin solid line) since the time course of the EPSP is 'reset' to zero at t = i (marked by the long arrow). The time course for t < i is indicated by the dashed line.
A framework for spiking neuron models: the spike response model
G(X, s) --
503
fs ( )
exp -- s -- y a(y)dy
S--X
-
~m
exp -
a(y)dy ~m
exp ~
a(y)dy.
(68)
Tm
In the first term on the right-hand side of (68) we may transform back to the variable s' = s - y, in the second term we set s' = s - x - y. This yields G(x, s) -
/0
exp
~(s - s')ds'
-exp (-rf~)f0~ exp (- ~m)=(~- ~- s')d~'
(x)
= G0(s) - exp - ;c--s G0(s - x).
(69)
The last equality follows from the definition of G0 in (51). By a completely analogous sequence of transformations it is possible to show that
(x)
~z(x, s) - ~:o(s) - exp - ~m ~Zo(s - x).
(70)
The total postsynaptic potential hpsp defined in (17) can therefore be expressed via the input potential hi [23] hpsp(t[ti) - hi(t) - exp ( - t - ti)
(71)
As it should be expected, the reset at ti has an influence on the total postsynaptic potential. We emphasize that the expressions (69)-(71) hold for the integrate-andfire model only. For a general Hodgkin-Huxley type dynamics the transformations discussed in this paragraph would not be possible. 4.4.3. Relation between the two integration methods In order to better understand the relation between the two integration methods outlined in Sections 4.3 and 4.4, we compare the q-kernel in (60) with the q0-kernel defined in (50): q ( S ) - Ur exp (-- T-~) = r l 0 ( s ) + 9exp ( - - ~ ) . Hence with (71), the potential is
(72)
W. Gerstner
504
ui(t) : q ( t - ti) + hpsp(t ti) = qo(t_ ~i) + h(t) _ [h(~i) _ ~)]exp (
--
t--,lTm ti)
.
(73)
The truncation in (56) is therefore equivalent to neglecting the last term in (73).
4.5. Discussion The second of the two integration methods shows that it is possible to map the integrate-and-fire model exactly to the spike response equation (15). The disadvantage of that method is that the ~ kernels look somewhat more complicated. This is, however, no real drawback since the dynamics of a population of spiking neurons can be discussed for arbitrary response kernels 11 and e [21,23]. The integrate-and-fire model is therefore a special case in the general framework of the spike response model. With the first method of integration, the mapping of the integrate-and-fire model to (15) is only approximate. The approximation is good if the typical interspike interval is long compared to the membrane time constant ~m. The main advantage of the approximation is that the ~ kernels do not depend on the state of the postsynaptic neuron. Therefore, the input potential hi(t) can be nicely separated from the effects of reset and refractoriness; cf. (26) and (27). The resulting model SRM0 allows us to discuss dynamic effects in a transparent graphical manner; see, e.g., [22,24,36,37]. The basic integrate-and-fire model is, of course, a rather simple description of neuronal firing. In particular, the neuron has no spatial structure and firing is given by an explicit threshold condition. In the following section we will extend the framework (15) to neuron models with spatial structure.
5. Multi-compartment model 5.1. Definition of the model In this section, the integrate-and-fire model introduced in Section 4 is generalized in two respects. First, we allow for some spatial structure and consider a neuron consisting of several compartments. Second, we refine the reset procedure and include, at the somatic compartment, additional spike currents which generate an action potential. See the chapter of Segev and Meunier in this book for more detailed information about information processing on neuronal dendrites.
5.1.1. Linear dendritic tree We consider a model with n - 1 dendritic compartments 2 _< ~t _< n and a threshold unit at the soma (~t - 1); cf. Fig. 21. Membrane resistance and capacity are denoted by R~ and C ~, respectively. The longitudinal core resistance between compartment ~t and a neighboring compartment v is r uv. We assume a common time constant R~C~--% for all compartments 1 _< ~t _< n. The above specifications define the standard model of a linear dendrite [42].
A framework for spiking neuron models." the spike response model
505
rv~
R"
~r I~
__TC" .k
,,.<_i l ",a
y i
,
.1_
%
I l
2
3
9
Fig. 21. Compartmental neuron model. Dendritic compartments with membrane capacitance C o and resistance R~ are coupled by a longitudinal resistance r vo. Each compartment receives an input P'. The soma (g = 1) emits an output current pulse f~(t), if the membrane potential reaches the threshold 8. Each c o m p a r t m e n t 1 _< la _< n receives input I'(t) from some presynaptic neurons. At the soma, there is an additional current f~(t) due to action potential generation. The change of the membrane potential V~ of c o m p a r t m e n t la is d Vg
.
. V.g .
. V~ - Vv + I~(t) - 8~l~(t)
(74)
V
where the sum runs over all neighbors of c o m p a r t m e n t ~.
5.1.2. Synaptic input The input IO(t) in (74) is due to spikes of those presynaptic neurons with synapses on c o m p a r t m e n t la of neuron i. The set of these neurons is denoted by F ~. As before in (39), we assume that each spike evokes a current pulse of standard form o~(t- t~f)) with a(s) = 0 for s < 0. The amplitude of the current pulse is scaled by cij where i is the index of the postsynaptic neuron. The total input to compartment la of neuron i is
Ir
-
cij Z J~:-F~
(11
(75)
Y. )E~j
Choices for ~(s) have already been discussed in Section 4.2.
5.1.3. Spike currents Neuron i fires, if the somatic membrane potential V/.1(t) reaches a threshold 9. More precisely, a firing time t/(f) is defined by the conditions ~l(t,.(f))- 8 and ddt Vii(t/(f)) > 0. Each firing consists of a short current pulse y ( t - t~f)) at the soma [43,22,12]. The total 'output' current of neuron i is
506
W. Gerstner
Oi(t)-- ~ 7(t-t/(f)).
(76)
t~i ') E~i
Due to causality, ~/(s) vanishes for s < 0. The pulse ~,(s) describes the typical time course of sodium and potassium currents (and possibly calcium currents) during and after an action potential. Typically, ~,(s) is large and positive during the rise time of the spike and 3,(s) is nonpositive thereafter. In principle, 7(s) may also contain an additional phase of late depolarizing calcium currents. In general, the time course of sodium, potassium, and calcium currents depends on the stimulation before and after action potential generation. In the approximation of (76), this dependence is neglected and we assume a standard current pulse with identical time course for each firing. This is the central assumption of our approach. As we have seen in (46), the reset in integrateand-fire units is equivalent to the emission of a current pulse 3,(s) = -qS(s), where 8(s) is the Dirac 8-function.
5.1.4. Example of spike currents Let us study a specific example. We consider the case of two current sources (77)
7(s) = ~Na (s) + 1K (s), which contribute to the action potential. We take INa (s) = qN~ ~
17Na
exp
--
IK(S) = q K - - exp -17K
(78)
[1 -- exp(--Ts)]
(79)
with rNa = 0.1 ms, rK = 1 ms and 3' = 5/ms, qNa -- 100, qK = - - 1 3 3 . For the sake of simplicity, we consider the somatic compartment only. Integration of rmdu/dt = - u + RT(t) with initial condition u(0) = 8 with R -- 1, 8 --- 1, rm -- 10 ms yields the action potential shown in Fig. 22. Note that the current time constant rNa --0.1 ms and rK = 1 ms are extremely short. These are heuristic values which have been chosen so as to yield upon integration a nice shape similar to a real action potential; see [43,12] for related examples.
5.2. Spike response method (4) Equation (74) is a system of linear differential equations. It can be integrated either for a finite number of compartments [8,18] or in the continuum limit [8,17]. As initial conditions we take V~(-c~) - 0 for all compartments 1 _< bt < n. The result of the integration is of the form
v~(t) =
~
d s ' G ~ ( s ') [I~(t - s') - 8 ~ n ( t - s')].
(80)
A framework for spiking neuron models: the spike response model
507
10.0 5.0 0.0 0
Fig. 22.
5
10 t [ms]
15
20
Action potential. Integration of the integrate-and-fire model with a spike current 7(s) given by (77) yields the time course of an action potential.
An explicit expression for the Greens function G ~v(s) for arbitrary geometry can be found in [17,18]. We use Eqs. (76) and (75) and find
(81) t}f) E ~ i
V
jcF v
t!f) C~j J
with egv (s) - --~
lf0~
(82)
G ~v (s')ct(s - s ' ) d s '
if ~ G ~ ( s ' ) y ( s
q ~ ( s ) - --C--d
(83)
- s')ds'.
The kernel a,v (s) describes the effect of an input spike to compartment v as seen at compartment la. Similarly, q"(s) describes the response of compartment ~t to an output spike at the soma. To proceed further, we note that firing depends on the somatic membrane potential only. We define u i - V 1, rio(S)- vll(s) and, for j E F v, we set ~ i j - ~lv. This yields
.//,/- Z
+ Zw/ Z
t (f) C Y i /
J
(84)
t (f) C ~ j j
As in (56), we now make a short-term memory approximation and truncate the sum over the q-terms. The result is
(85) J
t(jf) E~j
w h e r e / / i s the last firing time of neuron i. Thus, the multi-compartment model has been reduced to the single-variable model of Eq. (20). The approximation is good, if the typical interspike interval is long compared to the neuronal time constants.
508
W. Gerstner
5.2.1. Improvement of the approximation Note that the steps taken in the previous paragraph correspond to the first of the two integration methods discussed in Section 4. The second method, which yields an improved mapping to the SRM, can be used as an alternative. In order to apply the second method, we must take care to correctly include the initial conditions V~(~i) for all compartments. Only at the soma, an explicit initial condition V I ( ~ i ) is available. In analogy to (69), we aim for an expression for the kernels ~(t- ti, t - t~[)) in terms of the kernel ~0. Let us start the integration of the voltage at the somatic compartment (ui = V 1) at time t - ti
ui(t) --
f'
j,;'
G 1! ( t - t')y(t' - ti)dt' + Z t CrJ< i i
/
G 1! ( t - {)T t'-t~
) dt'
i
jGI-,.
+ V' (ti)G 1' ( t - ti) + Z V~(ti)G'~(t- ti).
(86)
~>2
The last two terms on the right-hand side of (86) are the initial conditions for the compartment voltages. For the somatic compartment, we use V l (ti) - ,~. For la _> 2 we may formally use (81) evaluated at t - ti. The sum in the first line on the righthand side of (86) vanishes if the spike currents have stopped before the next spike o c c u r s - as it is trivially the case for a reset current y(s) - -qS(s). In the following we will therefore neglect these terms. We now define r l ( t - ti) - r l 0 ( t - ti) +
~GI1(t- li)
(87)
and, for j 6 Fv _
_
(88/
We use GlV(x + y) = ~-~ Gl~(x)G~V(y) in the Greens function in the second line of (86). With (87) and (88) we find after some calculation:
J
~>2
t (f) G.~j J
t tr)
In order to get a mapping to (15) we need to suppress the terms in the second line of (89). These are terms which describe the effect of previous output spikes of neuron i on the dendritic c o m p a r t m e n t v and which cause now for t > ti some feedback onto the soma.
509
A f r a m e w o r k f o r spiking neuron models: the spike response model
Note the close analogy between (88) and (69). The Greens function G 11 is the generalization of the exponential term in (69). Similarly, the q-kernel (87) is the generalization of (72). We emphasize that for a single-compartment model, the sum in the last line of (89) vanishes. The mapping between the integrate-and-fire model and the SRM (15) is then exact, as we have seen in Section 4.4. For a multicompartment model the mapping to (15) is not exact. The approximation derived in this paragraph is, however, better than the truncation that is necessary to get (85).
5.2.2. Example: two-compartment &tegrate-and-fire model We illustrate the SRM by a simple model with two compartments and a reset mechanism at the soma. The two compartments are characterized by a somatic capacitance C 1 and a dendritic capacitance C 2 = aC 1. The membrane time constant is z0 -- R 1 C 1 - R Z c 2 and the longitudinal time constant T12-- r 12cC1C2 1+c2. The neuron fires, if V 1 = ~). After each firing the somatic potential is reset to V 1 = Ur. This is equivalent to a current pulse 7(s) = -qS(s),
(90)
where q = C 1[ 8 - Ur] is the charge lost during the spike. The dendrite receives spike trains from other neurons j and we assume that each spike evokes a current pulse ~(t - t}f)) with time course ~ ( s ) - 1 exp(--Zs - ~ s ) "
(91)
For the two-compartment model it is straightforward to integrate the equations and derive the response kernels q(s) and ~(s); cf. [8,18,44]. We find q0(s)-
(~--Ur l + a ) eXp (~00)
1
[ l + a e x p ( - T-~2)I
( 0)E1~ Z s ~ ,
t0(s) - (1 + a) exp -
1 - e -82s] -- exp --
Zs~2 J
(92)
with 81 - T s 1 - TO1 and 82 = Zs 1 - Zo 1 - zi-21. In Fig. 23 we show the two response kernels for the parameters z0 = 10 ms, z12 = 2 ms, and a = 10. The synaptic time constant is Zs = 1 ms. The kernel t0(s) describes the voltage response of the soma to an input at the dendrite. It shows the typical time course of an excitatory or inhibitory postsynaptic potential. The time course of the kernel q(s) is a double exponential and reflects the dynamics of the reset in a two-compartment model. In Fig. 23a, the moment of spike firing at t = 0 has been marked by a vertical bar for the sake of better visibility. 6. E x t e n s i o n s a n d d i s c u s s i o n
The strict application of the spike response method requires a system of linear differential equations combined with a threshold p r o c e s s - such as in the integrate-
510
W. Gerstner
(a)
(b) 0.5
0.5 I,d
0.0
o.o
oo
,
.
.
i
.
.
.
.
i
.
200
tiros]
.
.
.
i
.
30.0
.
.
.
,,o.o
.
.
i
o.o
,
,
2o:o
t[msl
30.0
40.0
Fig. 23. Two-compartment integrate-and-fire model. (a) Response kernel Tl(s) of a neuron with two compartments and a fire-and-reset threshold dynamics. The response kernel is a double exponential with time constants ~ 2 - 2 ms and c 0 - 10 ms. The spike at s - 0 is indicated by a vertical dash. (b) Response kernel e(s) for excitatory synaptic input at the dendritic compartment with a synaptic time constant Xs - 1 ms. The response kernel exhibits the typical time course of an excitatory postsynaptic potential. (y-axis: voltage in arbitrary units). and-fire model in Section 4. Naturally the question arises how well real neurons fit into this framework. As an example of a more complicated neuron model we have discussed the effects of a linear dendritic tree. We have also seen in Section 2 that, for the Hodgkin-Huxley model, spike generation can be replaced approximatively by a threshold process. In this section we want to continue our discussion and hint to possible extensions and modifications. To check the validity of the approach, we discuss the two basic assumptions, viz., threshold process and linearity.
6.1. Threshold process The dynamics of spike generation can be described by nonlinear differential equations of the type proposed by Hodgkin and Huxley [14]. Spikes are generated by a voltage-instability of the conductivity. Since the opening and closing of Na and K channels are described by three variables with three different time constants, the threshold depends not only on the present voltage, but also on the voltage in the recent past. In other words, there is no sharp voltage threshold [15,25]. This is most easily seen in a scenario with arbitrary time-dependent input. Let us suppose that, for some ion-based neuron model, there exists a voltage threshold ,~. Even if the potential were already slightly above the formal threshold, there could arrive, in the next moment, a very strong inhibitory current which pulls the potential back below threshold. Thus spiking could still be stopped even though the action potential was already initiated. This consideration points to the general limitations of the threshold concept in the context of time-dependent stimulation. Strictly speaking there can be neither a voltage nor a current threshold if we allow for arbitrary input. Nevertheless, an improvement over the simple voltage threshold is possible. The spike response method does not rely on a specific interpretation of the variable u(t). In principle, it can be any relevant variable, e.g., a current [45,25], a voltage, or some combination of current and voltage variables. To be specific, we may take
A framework for spiking neuron models: the spike response model
u(t) --
f (s) V 1 (t - s ) d s
~0~176
--
f
511
(93)
9 V 1,
where V 1 is the voltage at the soma and f some linear filter with normalization f o f ( s ) d s - 1. Since everything is linear, the response kernels derived in the preceding sections can be transformed ~ ~ f 9a and 11 ~ f * 11 and we are back to the standard form (15). Application of the linear operator f on the voltage u in (8) before it is passed through a threshold would, for example, allow us to match the boundaries in the phase diagram of Fig. 5c more closely to that of the HodgkinHuxley model in Fig. 5b. We emphasize that the formal threshold is constant in our approach. A dynamic threshold ~)(t) which is increased after each spike [40] may always be treated as an additional contribution to the response kernel rl(s) as discussed in (27).
6.2. Adaptation In all the discussion above we have assumed that only the last output spike of the neuron is relevant. This is, of course, an over-simplification of reality. For most neurons, adaptation plays an important role. If a constant input is switched on at to, the interspike interval between the first and second spikes is usually shorter than the one between the 10th and 1 lth. How can adaptation be included in the above framework? One possibility is to make a systematic expansion so as to include the effect of earlier spikes 1
(1)
jGFi
_t~l)
,, _ ( ) )
()E•j
JEFi
~f) Gff j
(94) Here t}l) is the most recent firing of neuron i, t}2) the second last firing, and so forth. If too many terms are necessary, then the approach outlined in (94) is not very handy. On the other hand, we may assume that the major contribution comes from the term rl(Z), 11(3),... and neglect the terms g (2) . . . . Moreover, we may assume, for the sake of simplicity, that rl~/l) = r~i(2) = r l i (3) , . . TI. In fact, adaptation and even bursting can quickly be incorporated if we use a description of the form (1) ( t - t ~ l ) t - ( t~) E~'i
SEI'i
)
()Eo~j
For the kernels 11 we may choose a time-course with a long-lasting contribution which could arise due to, e.g., slow calcium-dynamics [22]. An example is shown in Fig. 24. The neuron is driven with a constant input current. The rl(s)-kernel has a phase of after-depolarization. As a result, a first spike at s = 0 makes a second spike around s ~ 5 ms more likely. A late phase of hyperpolarization in the rl-kernel turns
512
W. Gerstner (a)
(b)
0,5
0.5
g-
> 0.0
0,0 0.0
Fig. 24.
10.0
20.0
rims]
30.0
40.0
0.0
100.0
t[ms]
200.0
300.0
Bursting neuron. Constant stimulation of a neuron model with the q-kernel shown in (a) generates the spike train in (b). Taken from [22].
firing off after a couple of spikes. The results is, for constant input, a bursting behavior as in Fig. 24b.
6.3. Nonlinearities We can distinguish at least three types on nonlinearities of neuronal dynamics. First, there is the nonlinear dynamics of spike generation. These nonlinearities are replaced by the output current ~/(s) which is trigerred by a threshold process as explained above. Second there are shunting effects on the dendrite due to the ion reversal potential, and finally there are potential sources of active currents on the dendrite. The last issue has been a subject of intensive discussion recently. There are indications for dendritic spikes [46], but it is unclear whether this is a generic feature of all neurons. In our approach, all active dendritic currents are neglected. In the following we concentrate on the influence of the reversal potential. In Sections 4 and 5 we have assumed that each input induces a standard current pulse 0~(t- t~)). In more detailed models, however, the input current is due to a conductivity change g(t - t / ) ) at the synapse and the amplitude of the current depends on the present value of the m e m b r a n e potential; see Section 4.2. Specifically, in the context of a compartmental neuron model, the input to c o m p a r t m e n t ~t is
Ila(t)- ~
Z [UrevJ EF~ t(f)E._~j i
Vla(t)]g(t- ( ) ) ,
(96)
where Urev is the reversal potential and wij - 1 for the sake of simplicity. For a further analysis of (96) we write U~ev- V~ - (U~ev -- 12) -- (V ~ - l?) and set ~(s) - (U~ev - V)g(s). For the potential 12 we take some appropriate value between the equilibrium potential V0 and the threshold 8, e.g., V - ( 8 - V0)/2. This yields
jEP
f
Urev- V
]
As long as IV~ - V[ << [Urev- V[, the second term can be treated as a small perturbation.
A frameworkfor spiking neuron models."the spike responsemodel
513
Except for periods of very strong excitation or inhibition, the compartmental voltages stay roughly in the range between V0 and ~1. Let us look at some values. The threshold 11 is about 10-30 mV above resting potential. The reversal potential of excitatory synapses is more than 50 mV above threshold. The second term in (97) yields therefore a small correction only. For inhibitory synapses, the reversal potential is approximately 20 mV below resting potential, and the prefactor of the second term in (97) is again small. Thus we are allowed to make a perturbation expansion. 2 To proceed with the perturbation expansion, we replace V~ on the right-hand side of (97) by the expression given in (53) and introduce u = V 1(t). This yields
u(t) - Z 11( t - t~f)) + Zwij~ij( t - t~f)) f j,f -4-~ [~j
-~-ZWijWikEijk(t-t~f)j,k
,
,
.
(98) The second-order kernels are for inputs j E F" and k E F a
Eijk(t __ t(jf)' t __ t~f')) _ CoC,(E1 _ (I) / as G'"(s)cx(t - t~f) - s)
• [/
t 't -
1
rlJ( t - t~f ) ' t - ti(f')) -- CoC.(E - (z)
/
-
- P]
(99)
dsGlV(s) cz(t- t~f) - s )
Nonlinear effects at the synapses, e.g., due to calcium influx or Magnesium block removal, can be treated similarly. The role of nonlinearities for dendritic computation is discussed in [47,48].
6.4. Conclusions We have presented a framework for a systematic analysis of spiking neuron models in terms of spike response kernels. As long as there are no active currents in the dendrite, the linear kernels dominate the expansion. We have demonstrated that an approach with linear response kernels is flexible and allows a phenomenological description of various types of neuron behavior including Hodgkin-Huxley dynamics. Previously, collective network states have been analyzed and stability criteria in terms of the response kernels have been derived [20-22,38,23]. Moreover the
2
For shunting inhibition, the reversal potential would be close to the resting potential and the expansion (98) is not helpful.
514
w. Gerstner
c o m p u t a t i o n a l complexity o f n e t w o r k s o f spiking n e u r o n s has been analyzed in the f r a m e w o r k o f the S R M [36,37]. The present p a p e r provides a link between detailed models o f n e u r o n a l dynamics a n d the simplified models a p p r o p r i a t e for analytical n e t w o r k studies. Abbreviations
C, C a p a c i t o r C1, Chlorine cm, centimeter gF, micro F a r a d EPSP, Excitatory Post Synaptic Potential G, G r e e n s function Hz, Herz IPSP, Inhibitory Post Synaptic Potential K, P o t a s s i u m m, meter ms, millisecond mS, (milli O h m ) -1 mV, milli Volt Na, S o d i u m PSP, Post Synaptic Potential R, Resistance S R M , Spike R e s p o n s e M o d e l SRM0, Simplified version o f S R M t, time V, voltage
References 1. Yamada, W.M., Koch, C. and Adams, P.R. (1989) Multiple channels and calcium dynamics, in: Methods in Neuronal Modeling, From Synapses to Networks, eds C. Koch and I. Segev. MIT Press, Cambridge. 2. Traub, R.D., Wong, R.K.S., Miles, R. and Michelson, H. (1991) A model of a CA3 hippocampal pyramidal neuron incorporating voltage-clamp data on intrinsic conductances. J. Neurophysiol. 66, 635-650. 3. Bower, J.M. and Beeman, D. (1995) The Book of Genesis. Springer, New York. 4. Abbott, L.F. and Kepler, T.B. (1990) Model neurons: from Hodgkin-Huxley to Hopfield. in: Statistical Mechanics of Neural Networks, ed L. Garrido. Springer, Berlin. 5. Abbott, L.F. (1991) Realistic synaptic inputs for model neural networks. Network, 2, 245-258. 6. Lapicque, L. (1907) Recherches quantitatives sur l'excitation electrique des nerfs trait& comme une polarization. J. Physiol. Pathol. Gen. 9, 620-635; in: Tuckwell, H.C. (1988) Introduction to Theoretic Neurobiology. Cambridge University Press, Cambridge. 7. Stein, R.B. (1967) The information capacity of nerve cells using a frequency code. Biophys. J. 7, 797-826. 8. Tuckwell, H.C. (1988) Introduction to Theoretic Neurobiology, Vol. 1. Cambridge University Press, Cambridge.
A framework for spik&g neuron models: the spike response model
515
9. Mirollo, R.E. and Strogatz, S.H. (1990) Synchronization of pulse coupled biological oscillators. SIAM J. Appl. Math. 50, 1645-1662. 10. Abbott, L.F. and van Vreeswijk, C. (1993) Asynchronous states in a network of pulse-coupled oscillators. Phys. Rev. E 48, 1483-1490. 11. Treves, A. (1993) Mean-field analysis of neuronal spike dynamics. Network 4, 259-284. 12. Somers, D.C., Nelson, S.B. and Sur, M. (1995) An emergent model of orientation selectivity in cat visual cortical simple cells. J. Neurosci. 15, 5448-5465. 13. Brunel, N. and Hakim, V. (1999) Fast global oscillations in networks of integrate-and-fire neurons with low firing rates. Neural Comput. 11, 1621-1671. 14. Hodgkin, A.L. and Huxley, A.F. (1952) A quantitative description of ion currents and its applications to conduction and excitation in nerve membranes. J. Physiol. (London) 117, 500-544. 15. Rinzel, J. and Bart Ermentrout, G. (1989) Analysis of neural excitability and oscillations, in: Methods in Neuronal Modeling, eds C. Koch and I. Segev. pp. 135-169, MIT Press, Cambridge. 16. Rall, W. (1964) Theoretical significance of dendritic trees for neuronal input-output relations, in: Neural Theory and Modeling, ed R.F. Reiss. pp. 73-97, Stanford University Press, Stanford CA. 17. Abbott, L.F., Fahri, E. and Gutmann, S. (1991) The path integral for dendritic trees. Biol. Cybern. 66, 49-60. 18. Paul C. Bressloff and John G. Taylor. (1994) Dynamics of compartmental model neurons. Neural Networks 7, 1153-1165. 19. Kistler, W.M., Gerstner, W. and Leo van Hemmen, J. (1997) Reduction of Hodgkin-Huxley equations to a single-variable threshold model. Neural Comput. 9, 1015-1045. 20. Gerstner, W. and van Hemmen, J.L. (1993) Coherence and incoherence in a globally coupled ensemble of pulse emitting units. Phys. Rev. Lett. 71(3), 312-315. 21. Gerstner, W. (1995) Time structure of the activity in neural network models. Phys. Rev. E 51(1), 738-758. 22. Gerstner, W., van Hemmen, J.L. and Cowan, J.D. (1996) What matters in neuronal locking. Neural Comput. 8, 1689-1712. 23. Gerstner, W. (2000) Population dynamics of spiking neurons: fast transients, asynchronous states and locking. Neural Comput. 12, 43-89. 24. Gerstner, W. (1998) Spiking neurons, in: Pulsed Neural Networks, Chapter 1, eds W. Maass and C.M. Bishop. pp. 3-53, MIT Press, Cambridge. 25. Koch, C., Bernander, O. and Douglas, R.J. (1995) Do neurons have a voltage or a current threshold for action potential initiation? J. Comput. Neurosci. 2, 63-82. 26. Jack, J.J.B., Noble, D. and Tsien, R.W. (1975) Electric Current Flow in Excitable Cells. Clarendon Press, Oxford. 27. Bernander, 0., Douglas, R.J., Martin, K.A.C. and Koch, C. (1991) Synaptic background activity influences spatiotemporal integration in single pyramidal cells. Proc. Natl. Acad. Sci. USA, 88, 11569-11573. 28. Bush, P.C. and Douglas, R.J. (1991) Synchronization of bursting action potential discharge in a model network of neocortical neurons. Neural Comput. 3, 19-30. 29. Ekeberg, O., Wallen, P., Lansner, A., Traven, H., Brodin, L. and Grillner, S. (1991) A computer based model for realistic simulations of neural networks. Biol. Cybern. 65, 81-90. 30. Rapp, M., Yarom, Y. and Segev, I. (1992) The impact of parallel fiber background activity pn the cable properties of cerebellar purkinje cells. Neural Comput. 4, 518-533. 31. Wilson, M.A., Bhalla, U.S., Uhley, J.D. and Bower, J.M. (1989) Genesis: A system for simulating neural networks, in: Advances in Neural Information Processing Systems, ed D.Touretzky, pp. 
485492, Morgan Kaufmann Publishers, San Mateo CA. 32. FitzHugh, R. (1961) Impulses and physiological states in models of nerve membrane. Biophys. J. 1, 445-466. 33. Nagumo, J., Arimoto, S. and Yoshizawa, S. (1962) An active pulse transmission line simulating nerve axon. Proc. IRE 50, 2061-2070. 34. Rinzel, J. (1985) Excitation dynamics: insights from simplified membrane models. Theor. Trends Neurosci. Federation Proc. 44(15), 2944-2946.
516
W. Gerstner
35. Gerstner, W. and van Hemmen, J.L. (1992) Associative memory in a network of 'spiking' neurons. Network 3, 139-164. 36. Wolfgang Maass. (1996) Lower bounds for the computational power of spiking neurons. Neural Comput. 8, 1-40. 37. Maass, W. (1988) Computing with spiking neurons, in: Pulsed Neural Networks, Chapter2, eds W. Maass and C.M. Bishop. pp. 55-85, MIT Press, Cambridge. 38. Chow, C.C. (1998) Phase-locking in weakly heterogeneous neuronal networks. Physica D 118, 343-370. 39. Stevens, C.F. and Zador, A.M. (1998) Novel integrate-and-fire like model of repetitive firing in cortical neurons, in: Proceedings of the fifth Joint Symposium on Neural Computation. http:// www.sloan.salle.edu/nzados 40. MacGregor, R.J. and Oliver, R.M. (1974) A model for repetitive firing in neurons. Kybernetik 16, 53-64. 41. Thomas B. Kepler, Abbott, L.F. and Marder, E. (1992) Reduction of conductance-based neuron models. Biol. Cybern. 66, 381-387. 42. Rail, W. (1989) Cable theory for dendritic neurons, in: Methods in Neuronal Modeling, eds C. Koch and I. Segev. pp. 9-62, MIT Press, Cambridge. 43. Abeles, M. (1991) Corticonics. Cambridge University Press, Cambridge. 44. Rospars, J.P. and Lansky, P. (1993) Stochastic model neuron without resetting of dendritic potential: application to the olfactory system. Biol. Cybern. 69, 283-294. 45. Rotter, S. (1994) Wechselwirkende stochastische Punktprozesse als Modell ffir neuronale Aktivit~it im Neocortex der S/iugetiere, Reihe Physik. Vol. 21. Harri Deutsch, Frankfurt. 46. Stuart, G.J. and Sakmann, B. (1994) Active propagation of somatic action potentials into neocortical pyramidal cell dendrites. Nature 367, 69-72. 47. Bartlett W. Mel. (1994) Information processing in dendritic trees. Neural Cornput. 6, 1031-1085. 48. Koch, C. (1997) Biophysics of Computation. Oxford University Press, New York, Oxford.
C H A P T E R 13
An Introduction to Stochastic Neural Networks
H.J. K A P P E N SNN University of Nijmegen, Geert Grooteplein Noord 21, 6525 E Z Nijmegen, The Netherlands
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
9 2001 Elsevier Science B.V. All rights reserved
517
Contents 1.
Introduction
2.
Stochastic b i n a r y n e u r o n s
3.
Stochastic n e t w o r k d y n a m i c s
4.
5.
.................................................
519
.........................................
521
.......................................
3.1.
Parallel d y n a m i c s : Little m o d e l
3.2.
Sequential d y n a m i c s
.
.
.
.
.
.
.
.
.
.
.
.
.
.
525
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.........................................
.
.
.
.
.
525 526
4.1.
Eigenvalue spectrum of T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
526
4.2.
Ergodicity a n d ergodicity b r e a k i n g
527
................................
Boltzmann-Gibbs distributions ......................................
531
5.1. T h e s t a t i o n a r y d i s t r i b u t i o n
531
C o m p u t i n g statistics
5.3. T h e cavity m e t h o d
.....................................
.........................................
532
..........................................
533
5.4. Q u e n c h e d a v e r a g e solution for the S K m o d e l . . . . . . . . . . . . . . . . . . . . . . . . . . Asymmetric networks
...........................................
6.2.
M e a n field t h e o r y in the a b s e n c e o f detailed b a l a n c e
L e a r n i n g in neural n e t w o r k s
534 536
6.1. T h e differences b e t w e e n s y m m e t r i c a n d a s y m m e t r i c n e t w o r k s 7.
.
525
S o m e p r o p e r t i e s o f M a r k o v processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.
6.
.
................
.....................
.......................................
536 538 542
7.1.
Attractor neural networks ......................................
542
7.2.
Boltzmann machines
544
7.3.
Classification o f digits
Abbreviations
.........................................
Acknowledgements
.............................................
A p p e n d i x A. T A P e q u a t i o n s References
........................................
................................................. .......................................
.....................................................
518
547 550 550 550 551
I. Introduction
How does the brain compute? Particularly in the last 100 years we have gathered an enormous amount of experimental findings that shed some light on this question. The picture that has emerged is that the neuron is the central computing element of the brain which performs a nonlinear input to output mapping between its synaptic inputs and its spiky output. The neurons are connected by synaptic junctions, thus forming a neural network. A central question is how such a neural network implements brain functions such as vision, audition and motor control. These questions are to a certain extent premature, because our knowledge of the functioning of the neuron and the synaptic process itself is only partial and much remains to be discovered. Nevertheless, it is interesting to see what emergent behavior arises in a network of very simple neurons. The pioneering work in this direction was done by McCulloch and Pitts [1] in the 1940s. Taking the thresholding property of neurons to the extreme, they proposed that neurons perform logical operations on their inputs, such as A N D and OR. One can show that a network of such neurons, when properly wired, can perform any logical function and is equivalent to a Turing machine. When considering neural networks, an important distinction is between feedforward networks and recurrent networks. In feed-forward networks, the neurons can be labeled such that each neuron only receives input from neurons with lower label. Thus, one can identify input neurons, which receive no input from other neurons and whose activity depends only on the sensory stimulus, and output neurons whose output does not affect other neurons. When in addition the neurons themselves are assumed to have no internal dynamics, the dynamics of feed-forward networks is trivial in the sense that the output is a time-independent function of the input: y(t) = F(x(t)), where F is a concatenation of the individual neuron transfer functions and x and y are input and output activities, respectively. Examples of such networks are the perceptron [2] and the multi-layered perceptron [3,4]. In recurrent networks one typically defines a subset of neurons as input neurons and another subset as output neurons. Even when individual neurons have no internal dynamics, the network as a whole does, and the input-output mapping depends explicitly on time: y(t) - F(x(t), t), Examples of such networks are attractor neural networks [5], topological maps [6] (see chapter by Flanagan in this book), sequence generators [7] and Boltzmann Machines [8]. Unlike the logical McCullock-Pitts neurons, real neurons are noisy and the output of the neuron is a probabilistic function of its input. The dynamics of a network of such neurons is characterized by transient and stationary behavior. The stationary behavior of the network is obtained for large time when the input to the 519
520
H.J. Kappen
network is time independent (or when it is described by a time-independent probability distribution). This behavior is then described in terms of a time-independent probability distribution over the states of the network. The transient behavior is described by the characteristic time(s) to approach stationarity and by its dependence on initial values. In Section 2 we begin with a very brief description of the behavior of the biological neuron and some properties of the synapses and discuss under which assumptions the description by a probabilistic binary threshold device is appropriate. In Section 3 we discuss stochastic neural networks with parallel and sequential dynamics. This dynamics is given by a Markov process, and in Section 4 we discuss some of the properties of Markov processes, such as ergodicity and periodicity. An exact description of transient and stationary behavior for stochastic neural networks is not possible in general. In some special cases, however, one can compute the generic behavior of stochastic networks using mean field (MF) theory. One averages over many random instances of the network (quenched average) and describes the properties of the network with a small number of order parameters. The classical example is the attractor neural network, as proposed by Hopfield [5]. The MF analysis was presented in a series of papers by Amit et al. [9,10]. Due to the symmetric connectivity of the Hebb rule, the asymptotic behavior of the network can be computed in closed form. The patterns that are stored with the Hebb rule become stable attractors of the dynamics when the number of patterns is sufficiently small and the noise in the dynamics is sufficiently low. Thus the network operates as a distributed memory. When the noise is too high, all attractors become unstable and the firing of the neurons becomes more or less uncorrelated (paramagnetic phase). When the number of patterns is too large, the network behaves as a spin glass whose minima are uncorrelated with the stored patterns. In Section 5 we will introduce the quenched average approach for a simpler problem, the SherringtonKirkpatrick (SK) model. It will show us the generic behavior that can be expected from symmetrically connected neural networks. For a more thorough treatment of this topic see the chapters by Coolen in this volume. Clearly, biological neural networks do not have symmetric connectivity. For nonsymmetric networks the theoretical analysis is much harder and fewer results are known. Most of the results have been obtained with numerical simulations. It appears that when a sufficient amount of asymmetry is introduced, the network dynamics is dominated by periodic orbits of exponential length. Thus asymmetric networks are radically different from symmetric networks. The differences between symmetric and asymmetric networks are discussed in Section 6.1. In many instances we are not satisfied with the generic behavior of networks as given by the quenched average approach, but we would like to say something about one individual network. An example is when we consider learning. It has been wellestablished experimentally, that synapses change their strength as a function of the firing of the pre- and postsynaptic neuron. In order to compute these changes, one needs estimates of the mean firing rates and the correlations of the pre- and postsynaptic neuron. In the case of symmetric connectivity, this approach was pioneered by Hinton with the introduction of Boltzmann Machines [8]. Due to the
An &troduction to stochastic neural networks
521
&tractability of the Boltzmann Machine learning rule, it has not been used widely. In Section 6.2 we therefore consider a form of M F theory that does not involve the quenched average. We derive M F approximations for the mean firing rates and the correlations for stochastic networks with arbitrary connectivity. A drawback of this approach is that it is only valid for small values of the weights. However, as we will see in Section 2, due to their noisiness synapses are expected to be small. Subsequently, we will discuss learning in stochastic networks in Section 7. We briefly discuss Hebbian learning in the attractor neural network, as proposed by Hopfield. Then, we discuss the Boltzmann Machine proposed by Hinton. We show that learning in Boltzmann Machines is intractable and how MF theory can be applied to obtain fast learning algorithms. We illustrate Boltzmann Machine learning on a digit classification task. 2. Stochastic binary neurons
The effect of a presynaptic spike on the postsynaptic neuron is a local change in the membrane potential. This change can be either positive or negative and is called the excitatory or inhibitory postsynaptic potential (PSP). The PSP is of the order of 0.05-2 mV [11] and is a stochastic event [12]: it either happens or it does not. The probability of the PSP is experimentally observed anywhere between 0.1 and 0.9 (see [13] and references there) and depends also on recent pre- and postsynaptic cell activity [14,15]. How these local changes in the membrane potential at synaptic junctions contribute to the spike generation process can be computed by compartmental modeling of the geometry of the cell. The dynamics is a complex spatio-temporal process involving many thousand synaptic inputs which are distributed over the dendrites and soma. A main complicating factor is that such simulations require the setting of many parameter values, many of which are not experimentally accessible. The general picture that emerges is, however, that the local PSP at the synapse propagates to the cell body with a delay of 1-2 ms and shows a temporal dispersion of about 10 ms. In addition, the dendrite acts as a low pass filter. It attenuates the frequency components of the PSP below 50 Hz by a factor of 2-4 depending on the frequency of stimulation and location of the synapse on the dendrite and effectively blocks all high frequency components [16]. In order to study the behavior of networks of neurons we may try to find a more compact description of a neuron which ignores its internal details but retains some of its input-output behavior. Let us define the synaptic response function Wij(t) as the temporal response of a presynaptic spike of neuron j on the membrane potential of the soma of neuron i. This function incorporates the effects of delay, attenuation and dispersion mentioned above. This response occurs with probability pij. We describe the activity of neuron j as a train of spikes with each spike a delta peak:
k
H.J. Kappen
522
where t/, k = 1,... are the times at which neuron j fires. We assume that the PSPs from different synapses combine linearly and therefore the soma potential is given by
S'
vi(t) -- ~j.
~ dt'Wij(t - t')xj(t')
= Z
Wij(t - tJk).
(1)
j,k
This potential is to be compared with the threshold | If vi(t) exceeds the threshold, neuron i emits a spike and is forced to remain quiet during the subsequent refractory period zr (2-4 ms). We approximate the neuron dynamics described above, by assuming that the maximal firing rate of the neurons is lower than 1/z, with z ~ 10 ms the characteristic width of W0-(t). In this case, the presynaptic neuron j is likely to fire zero or one time and unlikely to fire more than one time in the period [t - z, t]. Indeed, when the spikes from the presynaptic neuron are given by a Poisson process with mean firing rate f , the probability that it fires exactly k times in the period [ t - z, t] is given by Pt,(*) -- (f~)k e --f~ When f ~ << 1, it is easy to verify that the probability of the neuron to fire more than one time Y~'~=2pk(~) -O(f2~2) and will be ignored. We associate the binary variable yj(t) = 0, 1 with the firing of neuron j in the following way:
yj(t) = 1 ~ neuron j fires in [ t - z, t]. We discretize time in chunks of length , and thus at any time t, the state of a network of n neurons is described by the vector y(t) = (yl (t),... ,y,(t)). In addition, we assume that Wij(t) is block shaped: W,7(t) = W,7 for 0 < t < 9 and zero otherwise. Thus, the potential becomes
vi(t) -- Z
Wijyj(t). J
It is well known experimentally, that the PSPs Wij do not give the same response every time the presynaptic neuron fires. In fact, the synaptic processes are very noisy and give largely varying postsynaptic responses [12]. We will therefore consider the W/j as independent stochastic variables. Let ~# denote their mean value and eye.- ((1 -pij)/pij)W~.~ their variance. Since the membrane potential consists of a typically large sum of PSPs, it becomes a Gaussian variable with mean and variance given by
vi(t) - Z
Wijyj(t), J
(2) J
523
An &troduction to stochastic neural networks
The neuron fires, when the postsynaptic potential exceeds a threshold Oi. Therefore, the probability of a postsynaptic spike is given by 1
p(yi(t + z) - l y(t)) -
1
dvi ~ ~ i e x p
-
2~
i
(1 +erf\ cyiv J)"
(3)
In this equation, erf is the error function, defined as
arK
erf(x) - - ~
dyexp(-y2).
In our derivation of this stochastic neuron model, we have assumed that each of the synapses participates with a contribution ~ j and that the membrane potential is an instantaneous function of the total input (see Eq. (2)). In reality, the dynamics is much more complex. The membrane integrates incoming stochastic activity and the time needed to reach the threshold is known as a first passage problem. The analytical solution of the first passage time problem is not known in general. This problem is well approximated by the above treatment when the membrane time constant is small compared to the rate of change of the presynaptic input signal. Note, that the probability to generate a spike depends on the input activity y(t) both in the numerator and in the denominator of Eq. (3). The former dependence is well known and states that the probability of firing of the cell is a function of the overlap between the input pattern y and the vector of synapses, i.e. the mean membrane potential. This dependence is the basis of coincidence detection: if between t - "r and t a large enough number of afferent cells fire, each of which has an excitatory connection to cell i, cell i will fire. The dependence in the denominator is weaker and is usually ignored. ~ is a sum of random positive quantities and therefore its mean value is of O(n) and its fluctuations of order O(v~). For large n we can therefore ignore the fluctuations in cy 2 so that (3.2 ~, /,/~2F. Here
(4)
(y2 = ~1 y~j cy2. denotes the mean noise in the synapses and r - ~ 1 ~ j )~j denotes the mean firing rate. erf is an increasing function of its argument, and cyi affects the slope of this function. We see that this slope decreases with increasing overall firing rate. This effect can be easily understood as follows. When the firing rate increases, the mean membrane potential will not be affected because of the balance of excitatory and inhibitory synapses. However, total noise in the input as given by Eq. (2) increases. This will broaden the distribution of vi and thus increase (decrease) the probability to fire when ui is lesser (larger) than (~)i, respectively. Thus, because the mean membrane potential is usually lower than the threshold, an increase in the overall input firing rate will increase the probability of the cell to fire (without affecting the mean membrane potential).
524
H.J. Kappen
0.8 /,j'/
I -
r=0.9
0.6 0.4 0.2 0
-50
Fig. 1.
.
.
.
.
.
.
.
.
.
.
.
0
.
.
.
.
.
.
.
.
.
.
.
.
50
100
Spike probability as a function of mean membrane potential for different values of overall firing rate. See main text for details.
The effect is illustrated in Fig. 1. We consider a model neuron with n - 10,000 synaptic inputs. The synaptic strength is uniformly distributed between - 1 and 1. The synaptic probability is uniformly distributed between 0 and 1. The threshold is set to zero. For firing rates 0.1, 0.5 and 0.9, respectively, we generated 500 binary input vectors and plot the probability of firing, as given by Eq. (3) versus the mean membrane potential ~i. In the case that the neuron only receives excitatory input, the neuron is virtually deterministic and the above effects are absent: The membrane potential is a sum of n positive quantities and therefore of O(n). The membrane potential will display large fluctuations also of O(n) due to the stochastic nature of the synapses and the variable input. Therefore, the threshold must also be of O(n) because otherwise the neuron will be either always firing or always quiet. Therefore, v i - | is of O(n), whereas the denominator is of O(x/-fi). The erf will always be driven to saturation, which makes its output either zero or one and it is insensitive to the particular value in the denominator. The error function is numerically very similar to the hyperbolic tangent, in the following way: 1 erf(x) ~ tanh ( 2 ~ ) . In addition we define si - 2yi - 1 - + 1 to denote whether a neuron is firing or not. The state of the whole network will be simply denoted by s - (Sl,... ,s,). Thus, we can rewrite Eq. (3) in the following way"
The choice of the factor 2/v/-~ is such that the derivatives of both functions in x = 0 are equal and their maximal difference is 0.0352. One can optimize the prefactor such that the maximal difference is minimized. The resulting prefactor is slightly higher and the maximal difference reduces to 0.0189.
An introduction to stochastic neural networks
p(sl, t + zls , t) -- 89(1 + tanh(hi(s)sl) )
525
(5)
with hi -- y ~ wijsj -k- Oi, j•i
wij = v/2 rcnr~ ' Oi = ~-24 Ffiij - 2 | i x/'2 rtnrcy
3. Stochastic network dynamics 3.1. Parallel dynamics." Little model
Eq. (5) describes the probability for a single neuron to emit a spike between t and t + z, given an input activity s. In a network of neurons, this equation must be updated in parallel for all neurons. Thus, the transition probability from a state s at time t to a state s' at time f - t + z is given by T(s', {Is, t) - ]-IP(sl, t + zls , t)
(6)
i
with p(sl, t + rls, t) given by Eq. (5). T denotes the probability to observe the network in state s', given the fact that it was in state s at the previous time step. Since the dynamics is stochastic, the network will in general not be found in any one state but instead in a superposition of states. Therefore, the fundamental quantity to consider is pt (s), denoting the probability that the network is in state s at time t. The dynamics of the network is therefore defined as pt+~(s') - ~
V(s'ls)pt(s).
(7)
s
Eq. (7) is known as a first-order homogeneous M a r k o v process. The first-order refers to the fact that the probability of the new state only depends on the current state and not on any past history. Homogeneous means that the transition probability is not an explicit function of time, as can be verified by Eq. (5). This M a r k o v process was first considered by Little [17]. 3.2. Sequential dynamics
One of the drawbacks of parallel dynamics is that due to the strict discretization of time in intervals of length z, an external clock is implicitly assumed which dictates the updating of all the neurons. There exists another stochastic dynamics which has been used extensively in the neural network literature which is called sequential Glauber dynamics. Instead of updating all neuron in parallel, one neuron is selected
H.J. Kappen
526
at random and is updated. The neurobiological motivation that is sometimes given for this dynamics is that neurons are connected with random delays and that the membrane integration time is negligible [18] or that integration times have random duration [19]. The main reason for the popularity of sequential dynamics is that the stationary distribution is a Boltzmann-Gibbs distribution when the connectivity in the network is symmetric. This makes the connection to statistical physics immediate and allows for all the powerful machinery of M F theory to be applied. Also, the parameters (weights and thresholds) in the Boltzmann-Gibbs distribution can be adapted with a learning algorithm which is known as the Boltzmann Machine [8]. The sequential dynamics is defined as follows. At every iteration t, choose a neuron i at random. Update the state of neuron i using Eq. (5). Let s denote the current state of the network and let Fi denote a flip operator that flips the value of the ith neuron" s' = Fis r s~ - - s i and s~. - sj for all j r i. Thus, the network can make a transition to state s' = E s with probability T(s', t'ls , t) - ! p ( s I, t + xls, t)
(8)
with p(s'i,t + vls, t) again given by Eq. (5). The factor 1/n is a consequence of the random choice of the neurons at each iteration. The probability to remain in state s is given by the equality ~ s , T(s'ls) = 1, so that T(s, t'ls , t) - 1 - -1 Z p ( s l , n
t + zls, t).
(9)
i
Eqs. (8) and (9) together with Eq. (7) define the sequential dynamics. Note, that this dynamics allows only transitions between states s and s' that differ at most at one location, whereas the Little model allows transitions between all states.
4. Some properties of Markov processes In this section, we review some of the basic properties of first-order M a r k o v processes. For a more thorough treatment see [20].
4.1. Eigenvalue spectrum of T Let 5P denote the set of all state vectors s. s E 5e is a binary vector of length n and thus s can take on 2" different values. Therefore, pt(s) in Eq. (7) is a vector of length 2" and T(s'[s) is a 2" • 2" matrix. Since pt(s) denotes a probability vector, it must satisfy ~-]spt(s) - 1. In addition, T(s'ls) is a probability vector in s' for each value of s and therefore each column must add up to one:
Z r(s' s) - 1. St
Matrices with this property are called stochastic matrices.
(10)
An introduction to stochastic neural networks Let us denote the eigenvalues and left and right eigenvectors 2~, l~, r~, a = 1 , . . . , 2 n, respectively. 2 In matrix n o t a t i o n we have
527 of T by
Tr~ = )~r~,
Since T is a n o n s y m m e t r i c matrix, the left and right eigenvectors are different, n o n o r t h o g o n a l and complex valued, t denotes complex conjugation and transpose. The eigenvalues are complex valued. U n d e r rather general conditions each set of eigenvectors spans a n o n o r t h o g o n a l basis of C 2". These two bases are dual in the sense that:
l~rl3 - 8~13.
(11)
8ab denotes the K r o n e c k e r delta: 8ab = 1 if a = b and 0 otherwise, a and b can be real scalars or vectors. We can therefore expand T on the basis of its eigenvectors: 2" ~--- 1
If at t = 0 the n e t w o r k is in a state s o then p0(s) = pt=0(s) -- ~s,s0. Let us set r = 1 for convenience. The probability vector pt at some later time t is obtained by repeated application of Eq. (7): pt - Ttpo - Z
t t k~r~(l~po).
(12)
The stationary probability distribution of the stochastic dynamics T is given by p ~ which is invariant under the operation of T and therefore satisfies rp~ =p~.
(13)
Thus, the stationary distribution is a right eigenvector of T with eigenvalue 1. 4.2. Ergodicity and ergodicity breaking A M a r k o v process is called irreducible, or ergodic, on a subset of states 0. A subset of states
2
In general, the number of eigenvalues of T can be less than 2". However, for our purposes we can ignore this case.
H.J. Kappen
528
For an irreducible Markov process of periodicity d the Perron-Frobenius theorem states that T has d eigenvalues given by ~m - -
exp(2 rcim/d),
m - 0,..., d-
1,
and all remaining eigenvalues of T are inside the unit circle in the complex plane: [~[ < 1.3 In particular, T has exactly one eigenvalue 1. Its corresponding right eigenvector is equal to the (unique) stationary distribution. Note, that the left eigenvector with eigenvalue 1 is o( ( 1 , . . . , 1) as is immediately seen from Eq. (10). The right eigenvector, in contrast, is in general difficult to compute, as will be seen later. A nonirreducible or nonergodic Markov process has more than one eigenvalue 1 and therefore more than one left and right eigenvector with eigenvalue 1. Let us denote these eigenvectors by l l , . . . , lk and r l , . . . , rk, respectively. Any linear combination of the right eigenvectors k
p~--Zp~r
~
(14)
~=1
is therefore a stationary distribution, assuming proper normalization" p ~ (s)~> 0 for all s and ~ s p ~ ( s ) - 1. Thus, there exists a manifold of dimension k - 1 of stationary distributions. In addition, the k left eigenvectors with eigenvalue 1 encode invariants of the Markov process in the following way. Let the state of the network at time t be given by pt. Define L~(pt)--lto~pt, o~- 1,... ,k. Then L~ is invariant under the Markov dynamics:
L ~ ( p , + , ) - l~pt+, - l~Tpt = l~pt = L~(pt). One of these invariants is the left eigenvector I I o( ( 1 , . . . , 1) which ensures that the normalization of the probability vector is conserved under the Markov process. The value of the remaining k - 1 invariants are determined by the initial distribution p0. Since their value is unchanged during the dynamics they parametrize the stationary manifold and determine uniquely the stationary distribution. We can thus compute The fact that all eigenvalues are within the unit circle in the complex plane can be easily demonstrated in the following way. Let ~. be an eigenvalue of T and 1 its corresponding left eigenvector. Then for all s,
sips
Choose s such that [l(s)[ is maximal. Then 1
This statement is known as Gershgoren's theorem. Thus, X is within a circle of radius 1 - T(s[s) centered at T(s s). We do not know which s maximizes [/(s)[ and therefore we do not know the value of T(s s). However, since circles with smaller T(s[s) contain circles with larger T(s[s), ~ is in the largest circle: [X < 1. This completes the proof.
An introduction to stochastic neural networks
529
the dependence of the stationary distribution on the initial state. Because of the orthogonality relation Eq. (11), we obtain L ~ ( p ~ ) = l ~ p ~ - 9~. Because L= is invariant, we also have L=(po) - L ~ ( p ~ ) . Thus, p= --- L=(po) and the stationary state depends on the initial state as k
p ~ -- Z ( l t p o ) r ~ .
(15)
Note, that in the ergodic case (k = 1) the dependence on the initial state disappears, as it should, since l~po = 1 for any initial distribution. The time to approach stationarity is also given by the eigenvalues of T. In particular, each eigenvalue whose norm I~1 < 1 corresponds to a transient mode in Eq. (12) with relaxation time z= -- - 1 / l o g ~,=. Both concepts of irreducibility and periodicity are important for neural networks and we therefore illustrate them with a number of simple examples. Consider a network of two neurons connected symmetrically by a synaptic weight w = wz2 = w21. First consider sequential dynamics. The transition matrix T has four eigenvalues. Their values as a function of w are plotted in Fig. 2a. We observe, that for small w there exists only one eigenvalue 1. Its corresponding right eigenvector is the Boltzmann-Gibbs distribution p(sl,s2) = e x p ( w s l s z ) / Z as will be shown below. For small weights, the dynamics is ergodic: for any initialization of the network the asymptotic stationary distribution is the Boltzmann-Gibbs distribution. The dominant relaxation time is given by the largest eigenvalue that is smaller than 1. For larger w, we observe that the relaxation time becomes infinite because a second eigenvalue approaches l. This means that some transitions in the state space require infinite time and therefore ergodicity is broken. The large weight prohibits the two neurons to have opposite value and therefore only the states (1, l) and ( - 1 , - 1) have positive probability in the stationary distribution. Let us denote the 4 states (1, 1), (1, - 1), ( - 1, 1), ( - 1, - 1) by s~',/t = 1 , . . . , 4. The right eigenvectors with eigenvalue 1 are the Boltzmann-Gibbs distribution
sequential
1.5
0.5
0.5
-0.5
-0.5 0
1
2 W
Fig. 2.
parallel
1.5
3
4
_
0
1
2
3
4
W
Eigenvalues of T as a function of w under sequential and parallel dynamics. For large w, multiple eigenvalues 1 signal ergodicity breaking.
H.J. Kappen
530
and the vector r~(~) -
89(~s~, - ~ s 4 ) .
The stationary distribution is no longer unique and consists of any linear combination of rl and r2 that is normalized and positive: po~ = rl + p2r2, with - 1 < P2 < 1. The left eigenvectors with eigenvalue 1 are t~(s) =
1
12(S) = ~s.s' -- ~s.s 4
and the corresponding quantities Ll and L2 are conserved. The dependence of the stationary distribution on the initial distribution is given by Eq. (15) with k = 2. In particular, the 4 pure states are mapped onto: s z : L2 = S 2"3 : L 2 : s4:L2
1 ~
p ~ (s) = ~ (s) + r 2 ( s ) =
0 ---+ p~
= -1
~
~.~,,
(s) = rl ( s ) ,
p ~ (s) = r~ (s) - ~ 2 ( s) = ~ 4 .
For the same network with parallel dynamics, the eigenvalues are depicted in Fig. 2b. For small weights the network is again ergodic. The stationary distribution is given by Eq. (20) and is flat: independent of w and s. For large weights ergodicity breaking occurs together with the occurence of a cycle of period 2 and two additional eigenvalues 1. Using the invariants, it is easy to show that the four pure states are mapped onto the stationary distributions: s~ ~
p~(s)
-
~.~,,
S 2"3 ---+ p v c ( S ) - - 21-(~s.s2 d- ~s.s3),
s 4 ~ Po~ (s) - 8s.s4. States s ~ and S 4 a r e two attractors: Ts 1'4 - - S 1'4. States 8 2 and s 3 form a limit cycle of length 2: T2s 2 - Ts 3 = s 2. Note in particular, that none of the states is mapped onto the ergodic stationary distribution Eq. (20) when ergodicity is broken. In our examples we have seen that for symmetric networks, all eigen values of T are real. This is indeed in general true for both parallel and sequential dynamics" - 1 <_ ~ _< 1. In addition, one can show for sequential dynamics (symmetric or asymmetric) that all eigenvalues are within the circle centered at 89 + 0i with radius 1 [18]. The proof of this last statement again uses Gershgoren's theorem and the special property of sequential dynamics that T ( F i s l s ) + T ( s l s 1/n. As a consequence, sequential dynamics has always periodicity 1 since other eigenvalues with ;~1- 1 are excluded. Note, that this property holds regardless of whether the network has symmetric or asymmetric connectivity. It also follows that for parallel dynamics with symmetric weights one can have at most periodicity 2. Parallel dynamics with asymmetric weights can have arbitrary periodicity and will be discussed in Section 6.1.
An introduction to stochastic neural networks
531
5. Boltzmann-Gibbs distributions
If we consider a stochastic neural network with a random connectivity matrix, what will the behavior of the network be? This is a rather difficult question to answer in general, but in some specific cases quite a lot is known. In particular for symmetrically connected networks with sequential dynamics, the equilibrium distribution is the Boltzmann-Gibbs distribution which plays a central role in statistical physics (see also the chapter by Coolen in this book). In this section we derive the Boltzmann-Gibbs distribution. Then we indicate the computational problems associated with the computation of statistics of the Boltzmann-Gibbs distribution. Subsequently, we will use the cavity method to describe the behavior of an ensemble of randomly generated networks. It is shown that depending on the type of connectivity in the network, it can be in one of three possible phases: a paramagnetic phase where neural firing is weakly correlated, a ferromagnetic phase where large groups of neurons assume either maximal or minimal firing rates, and a spin-glass phase where neurons are frozen in a random disordered state. Subsequently, we briefly discuss the Hopfield attractor neural network and explain how the above phases arise in this model and affect its storage capacity.
5.1. The stationary distribution In the case that the synaptic connectivity is symmetric, wij = wji one can compute the stationary probability distribution for the parallel and sequential dynamics explicitly. In both cases the derivation uses the argument of detailed balance, which states that for the dynamics T(stls) there exists a function p(s) such that
T(slst)p(s ') = T(stls)p(s)
for all s,s t.
(16)
If detailed balance holds, it implies that p(s) is a stationary distribution of T, which is easily verified by summing both sides of Eq. (16) over all states s t and using Eq. (10). However, the reverse is not true: many stochastic dynamics do not satisfy detailed balance and a solution to Eq. (13) is then typically not available in analytical form, although its existence is dictated by the Perron-Frobenius theorem
[20]. For random sequential dynamics, T is given by Eqs. (8) and (5) and the detailed balance equation reads T(Fisls)p(s ) = T(slFis)p(Fis) for all states s and all neighbor states F/s. It is easy to show that
T'sis' ( ( ) ) TtP.slsa = exp
2 ~wi/s/+Oi J
si
o
Consider the distribution p(s) - ~ e x p
ZWijSiSj + Zi Oisi) " lj . .
(17)
532
H.J. Kappen
p(s)
is called a Boltzmann-Gibbs distribution and plays a central role in statistical physics. For this reason, the expression in the exponent is often referred to as the energy: 1
--E(s) -- -~ Z
WijSiSj + ~ ij
OiSi"
(18)
i
States of low energy have high probability. Z is a normalization constant, Z- ~
exp(-E(s))
(19)
s
and is called the partition function, weights ~ j and
p(s___~)= exp (2 ( Zj
p(FlS)
W~ijSj _[-
p(s) only depends
on the symmetric part of the
Oi) si) "
Thus for symmetric weights, detailed balance is satisfied between all neighboring states. Since all values of T are zero for nonneighboring states this proves that p(s) is the equilibrium distribution. 4
5.2. Computing statistics p(s) in Eqs. (17) and (20) give an analytical expression of the stationary probability distribution of an arbitrary network with symmetric connectivity and sequential and parallel dynamics, respectively. From these equations we can compute any interesting statistics, such as for instance the mean firing rate of each of the neurons: mi -- (si) -- Z
(21)
sip(s)' s
and correlations between neurons: )(.ij- (sisj) - (si) (sj) - Z
s i s j p ( s ) - mimj.
(22)
s
When all neurons are updated in parallel, the transition matrix is given by Eq. (6). As in the case of sequential dynamics, we can again compute the stationary distribution for symmetric weights. We use again detailed balance:
T(s'[s) exp (~-'~ij wijsjs: + ~-~iOis0 cosh(hi(s')) T(s[ff) -exp(~f~ijwijs~.si+ ~-~iO~s~)~i cosh(hi(s)) When the weights are symmetric, the term involving the double sum over i and and j cancels and the remainder is of the form p(s')/p(s), with (20)
p(s) = ~exp Z . logcosh
wijsj + Oi + . Oisi .
t
This is the equilibrium distribution for parallel dynamics [17].
An introduction to stochastic neural networks
533
H o w e v e r , these c o m p u t a t i o n s are in general too time c o n s u m i n g due to the sum over all states, which involves 2 n terms. F o r some distributions, the sum can be p e r f o r m e d efficiently. F o r B o l t z m a n n Gibbs distributions, the subset o f probability distributions for which the sum over states can be p e r f o r m e d efficiently are called decimatable distributions [21]. These include factorized distributions, trees and some other special graphs as subsets. F o r factorized distributions, p(s) = I-Iipi(si), the energy only depends linearly on si and the sum over states can be p e r f o r m e d by factorization:
exp( i i) U(Vexp, ;s;,)H F r o m Eqs. (17) and (20) we infer that this c o r r e s p o n d s to the rather uninteresting case o f a n e t w o r k w i t h o u t synaptic connections. 5 In general, the sum over states c a n n o t be c o m p u t e d in any simple way. In this case we call the the probability distribution intractable a n d one needs to apply a p p r o x i m a t i o n m e t h o d s to c o m p u t e the partition function a n d statistics such as Eq. (21) a n d (22). F o r specific models, i.e. specific realizations o f the connections and thresholds, one can obtain a generic description o f the n e t w o r k behavior by using M F theory. Such an a p p r o a c h typically considers not one n e t w o r k , but an ensemble o f networks and the limit o f n ---, e~. One can then c o m p u t e the average b e h a v i o r o f such networks. This a p p r o a c h has been successfully applied to m a n y neural n e t w o r k models, such as the a t t r a c t o r neural n e t w o r k p r o p o s e d by Hopfield [5]. In this section we will briefly outline this a p p r o a c h and give some o f the m o s t well-known results. F o r a m o r e complete review see the c o n t r i b u t i o n by C o o l e n in this volume. 5.3. The cavity m e t h o d There are various ways to derive M F results. The m o s t well-known a p p r o a c h is to use the replica m e t h o d . Here we will consider a s o m e w h a t simpler a p p r o a c h called the cavity m e t h o d , which dates back to [22]. 6 If we w a n t to c o m p u t e the m e a n firing rate o f n e u r o n i we m u s t c o m p u t e the exponential sum in Eq. (21). D u e to the d e p e n d e n c e o f p(s) on Z, we m u s t also 5
The probability distribution p(s) is called a tree when between any two neurons in the network there exists only one path, where a path is a sequence of connections. Alternatively, one can order the neurons in the graph with labels 1,..., n s u c h that neuron i is connected to any number of neurons with higher label but only to at most one neuron with lower label. For Boltzmann-Gibbs distributions which are trees:
: 6
t: cosh,wi /,
where pi labels the parent of neuron i. For parallel dynamics, such nontrivial decimatable structures do not exist. Here we use a particularly transparent formulation which was communicated to me by Manfred Opper.
H.J. Kappen
534
compute the exponential sum in Eq. (19). The idea of the cavity method is to separate these sums in a contribution from neuron i and a contribution from all other neurons in the following way:
Z - Z
Z
s\i
exp(sihi)exp(-E\i) - 2(cosh hi)\iZ\i,
Si
Z(si) -- Z Z si exp(sihi) s\i Si
exp(-E\i)
-
2(sinh hi)\iZ\i ,
where E\i denotes all contributions to the energy excluding dependencies on si and (-)\; denotes ensemble average with respect to the Boltzmann-Gibbs distribution P~i = exp(-E\i)/Z\i. Thus we obtain (s;) -
(sinhhi)\i
(coshhi)\i
(23)
hi
is the local field defined previously: h~ - ~-~j#~w~jsj + Oi. It is a stochastic quantity consisting of a sum of contributions from all other neurons. In particular, hi does not depend on s~. However, (h~) does depend on s~, because si affects the mean firing rates of all neurons in the network. For instance, if all connections are positive, si -- +1 will increase (decrease) all firing rates (sj) slightly. Instead we consider the restricted averages and write
h i - (hi)\i + ui.
(24)
u; is a stochastic quantity that we assume to be symmetrically distributed under P~i: p~i(--Ui) - - p~i(Ui). In particular, (ui)\i -- O. Substituting Eq. (24) in Eq. (23) we obtain
(sinh(hi))\i - sinh((hi)\i)(exp ui)\i
(cosh(hi))\i = cosh((hi)\i)(exp Ui)\i and thus (s/.) - tanh((h;)\/).
(25)
This is the main result of the cavity method. It states that the expected firing rate of neuron i only depends on the expected value of the local field computed in the
absence of neuron i. 5.4. Quenched average solution for the S K model We can use Eq. (25) to compute the typical behavior of an ensemble of networks in the limit of large n. Before we consider the attractor neural network, we will first analyze the behavior of a simpler model, the so-called Sherrington Kirkpatrick (SK) [23] model. In this model one assumes that w/j are drawn independently at random from a Gaussian distribution with mean value Join - 1 and variance j 2 / n - 1. Oi is drawn independently at random from a Gaussian distribution with mean value I0 and variance I 2. For one realization of the weights, (hi)\/ is not a random quantity, but just a number given by
An introduction to stochastic neural networks
535
wij<sJ)\ i + Oi. J
We compute the distribution of (hi)\i in the ensemble of networks. This is called a quenched average, where quenched means fixed and refers to the fact that we compute the Boltzmann-Gibbs statistics for a fixed realization of the weights and thresholds and, after that, average the resulting firing rates over all realizations of weights and thresholds. The first term in (hi)\i depends on the connections to neuron i. It is multiplied by a term which is an expectation value computed in the network from which neuron i is removed and therefore does not depend on the connections to neuron i: wij,j-- 1,... ,n. We can therefore easily compute the statistics with respect to wij and 0i. Since (hi)\i is a large sum of r a n d o m contributions, it has a Gaussian distribution. Its mean value and variance are
<w=~ JOl Zj#i (sJ)\ i + Io -- Jom + Io, 1
<(w_(((hi)\i)w) 2_ jZq + i 2, where we have defined m - 1 / ( n - 1)Ejr a n d q - 1 / ( n - 1)~j#i<Sj)~i a n d (')w denotes average with respect to Wiy. Note, that m and q are independent of i in the limit n ~ cr m
1 tl
1 | Z<SJ)\ i ~--Z<Sj), j#i j
1
q .
2 l
jr
1
)2
j
9
Thus the quenched average of Eq. (25) becomes m- ~
1
Jf dze -z2 /2 tanh( v/qJ 2 -Jr-IZz + Jom -+-I0)
(26)
and the quenched average of the square of Eq. (25) becomes q -- ~
1 f
dz e_Z2/2t a n h 2 ( v / q J 2 + I2z -4- Jom -Jr-Io).
(27)
These equations are identical to the replica symmetric solution as given in [23]. The solutions for m and q of Eqs. (26) and (27) can be computed for different values of Jo,J, Io and I. Here we restrict ourselves to the case I0 = I = 0, ie. 0i = 0. For both J0 and J small, the only solution is m = q = 0. F r o m the definitions of m and q we see that this means that (si) = 0. This is the regime where the couplings between neurons are small and can in fact be ignored. The mean firing rate is given by the threshold value: (si) ~ tanh(0i). This is called the paramagnetic phase. One can perform a linear stability analysis of the solution m = q = 0 and one finds that the solution is stable as long as 0 < J0 < 1 and 0 < j2 < 1. For J0 > 1 and
H.J. Kappen
536
J < J0, most of the weights are positive. The neurons will excite each other and the net result is that all neurons will align. The solution that is obtained is q = m = 1 which means (si) = 1. This is the ferromagnetic phase. On the other hand, for J > 1 and J0 < J the weights are large but of opposite sign. As a result, the network is frustrated [24]. When for instance three neurons are connected by two positive connections and one negative connection, there is no configuration s such that wijsisj > 0 for all pairs. As a result of frustration, the energy contains (often exponentially) many local minima that have approximately the same value. In contrast, in the absence of frustration the energy contains only one (or maybe a few, due to symmetries) minimum. The solution is m = 0 and q > 0. It means that the each of the neurons has mean value (Si)=-'I-V/c~ and is frozen in this state. This frozen disorder is called the spin glass phase. The results are summarized in Fig. 3. When thresholds are present the behavior of the network remains qualitatively the same. For instance, there is still a paramagnetic phase where neurons fire more or less independently, but the mean firing rates are now not zero but given by (si) ~ tanh(0i). For larger weights there are transitions to ferromagnetic and spinglass phases. The transitions between phases is less abrupt for nonzero thresholds. When I0 and I are large compared to J0 and J, respectively, the collective behavior of the network breaks down and each neuron aligns according to its threshold.
6. Asymmetric networks
6.1. The differences between symmetric and asymmetric networks In Section 5, we have studied the typical behavior of symmetric networks. However, the assumption of symmetry is rather unrealistic. What differences should we expect when we consider asymmetric networks?
rn=0 q>O
f /
,, -/ J J / / / / I/ I I I
m-0 q=0
I
m=l q=l
I I I I I v
J0 Fig. 3.
An introduction to stochastic neural networks
537
There are several ways to introduce asymmetry in neural networks. One approach uses temporal associations of neural activity of the form wij (x s i ( t ) s j ( t + 1). In this way, a s e q u e n c e of patterns is memorized by the network. Such a type of memory may be needed to represent various temporal behaviors, such as motor control tasks or generation of speech. The simplest way to memorize such sequences is by using parallel dynamics. One simply assumes wij e( y~,p~g+li~ , with ~ , g = 1 , . . . , p the sequence of patterns. The effect is that the dynamics of the network contains a limit cycle formed by the sequence of patterns. Regarding the storage capacity of such a network, one can show a similar result as in the symmetric case, i.e. that for low noise level and for small enough p the sequence is recalled [25]. See for instance [26] for a review of these models. Another frequently studied model is the asymmetric SK model. The weights are drawn at random from a mean zero Gaussian distribution. The asymmetry is controlled by a parameter q
(wZ)w
9
rl = 1 , 0 , - 1 describe the symmetric, asymmetric and anti-symmetric case, respectively. The behavior of these networks have been studied extensively in the noiseless limit [27-30]. For symmetric and antisymmetric networks it is easy to show that they have fixed points and limit cycles of length 2 (symmetric case) and limit cycles of length 4 (antisymmetric) [31,18]. 7 As we saw in the previous section, the symmetric noiseless network is a spin glass. Ergodicity is severely broken: Any initial state will converge to one of the exponentially many local minima or limit cycles of length 2, We give the derivation for
L(t, t + 11 = - ~
0i =
0. Consider the two-time L y a p u n o v function
~(t + l lw~j~j(tl.
i,j
Under parallel dynamics the change of L in one time step is
AL = L(t + 1, t + 2) - L(t, t + 1)
We consider the symmetric and antisymmetric case: wij = kwji with k = +1. In addition, define ~i(t) = +1 such that si(t + 2) = ~ci(t)si(t). Then
AL = Z ( k ~ c i - 1)si(t + 2)hi(t + 1). i
Due to the parallel dynamics, si(t + 2)hi(t + 1) is always positive for all i. Since kK~ -- +1, we have AL _< 0. Since L is bounded from below the dynamics converges to a state where AL = 0 and therefore
k~i=l
for alli.
F o r symmetric networks, k -- 1 and thus si(t + 2) = si(t): the network has limit cycles of length 1 and 2. F o r antisymmetric networks, k = - 1 and thus s~(t + 2) = -si(t). This excludes fixed points and limit cycles of length 2 but allows limit cycles of length 4.
538
H.J. Kappen
for sequential or parallel dynamics respectively. The relaxation time is polynomial in the size of the network [29]. Ergodicity is also broken in the antisymmetric network, leading to exponentially many limit cycles of length 4, for sequential or parallel dynamics, respectively. For asymmetric networks, with - q c < q < tic and rl~ ~ 0.5, the behavior is radically different. In the noiseless limit, there exist exponentially long limit cycles that dominate the network dynamics [27]. Thus, to compute statistics in the stationary state requires a simulation time that is exponential in the network size [29]. This latter feature is particularly important, because it means that the behavior of the network for any finite time is transient and its stationary statistics become in a sense irrelevant. The behavior of the network can be interpreted as chaotic: divergent trajectories and exponential size of transients [32]. We illustrate the effect of asymmetry in Fig. 4, where we show the eigenvalue spectrum of a fully connected symmetric, asymmetric and antisymmetric network of 8 neurons with parallel dynamics. The symmetric network has positive and negative real eigenvalues indicating the possibility of fixed point solutions and limit cycles of length 1 and 2. The antisymmetric network has eigenvalues at multiples of i and therefore displays period 4 cycles. Both symmetric and antisymmetric networks display ergodicity breaking. The asymmetric network has eigenvalues all over the complex circle. In the noiseless case we discern a periodic orbit of length 14. No ergodicity breaking occurs.
6.2. Mean field theory & the absence of detailed balance In Section 5 we have seen how a quenched averaged approach is capable of describing the typical behavior of neural networks. However, in many instances we are not satisfied with such average results but would like to say something about an individual network. An example is when we consider learning. It has been well established experimentally, that synapses change their strength as a function of the firing of the pre- and postsynaptic neuron. In order to compute these changes, one needs therefore estimates of the mean firing rates and the (time-delayed) correlations of the pre- and postsynaptic neuron. However, as we have seen these quantities are difficult to compute. In this section we therefore consider a form of MF theory that was previously proposed by Plefka [33] for Boltzmann-Gibbs distributions. It turns out that the restriction to Boltzmann-Gibbs distributions is not necessary and one can derive results that are valid also for asymmetric networks as well as for parallel dynamics. We therefore consider the general case. A drawback of this approach is that it is only valid for small values of the weights. However, as we have seen in Section 2 this is to be expected for biological networks, because due to noise the effective synaptic strength scales with 1/x/-n. We use this method to compute the MF equations and correlations for asymmetric stochastic networks with sequential dynamics. Subsequently, we will illustrate the approach for learning in Boltzmann Machines. Our argument uses an information geometric viewpoint. For an introduction to this approach see for instance [34]. In Section 4 we have seen that when the sto-
539
An introduction to stochastic neural networks
Symmetric, noise
Antisymmetric, noise
Asymmetric, noise 1
0.5
1
0.5 0
-0.5 -1
9, . -
0.5
9
0
Symmetric, no noise
1
-1
..... ., .
, ........
". 9
. .
1
1 ",..
0.5
0
Antisymmetric, no noise
Asymmetric, no noise 1
'..
0.5
0 -0.5
i 9. .... J..
1
0.5
9 ".
i 9 99
-0.5
-1 0
....
"
0
..
-0.5
".,.. _,
9149
0
-0.5
-0.5 .,
-1
0
I
-I
0
I
-I
0
I
Fig. 4. Eigenvalues of the transition matrix T for the fully connected network with random symmetric, asymmetric and antisymmetric connections and 0i = 0. The eigenvalues are complex numbers )~, with I)~I < 1. There is always at least one eigenvalue X = 1 (the PerronFrobenius (PF) eigenvalue) and the corresponding right eigenvector of T is the stationary distribution. The Markov process is called periodic with periodicity d when T has eigenvalues )~ -- exp(2 rdn/d), n = 0 , . . . , d - 1. See Section 4 for additional details. Top row: Weights are drawn from a Gaussian distribution with mean zero and variance 1In. Due to the small weights the dynamics is rather noisy. All eigenvalues except for the PF eigenvalue are in the interior of the unit circle, which means that these modes do not survive asympotically. Bottom row: same weights as in the top row, but scaled wij ~ ~wij, with 13---, ec. In this case the dynamics is deterministic. We clearly see the limit cycles of period 2 for the symmetric case. Ergodicity is broken: in this example, the number of eigenvalues 1, 0 and - 1 are 22, 216 and 18, respectively. Thus, there are 22 independent stationary distributions, of which 4 are fixed points and 18 are limit cycles of length 2. Also in the antisymmetric case ergodicity is broken: in this example, there are 204 eigenvalues 0 and 13 cycles of length 4 with eigenvalues (1, i, - 1, -i). In contrast, the asymmetric case is ergodic: there is only one eigenvalue 1, forming with the other 13 nonzero eigenvalues a limit cycle of length 14. These cycles persist forever. chastic n e u r o n dynamics is ergodic, it has a u n i q u e stationary probability distribution. W e will assume ergodicity and denote the stationary distribution by p(sl0, w), which is a probability distribution over s and depends on the weights and thresholds o f the network. Unless the connectivity is symmetric, we do not k n o w its functional f o r m explicitly. Let ~ = {p(s[0, w)} be the m a n i f o l d o f all the probability distributions over the state space Y that can be o b t a i n e d by considering different values o f 0, w. contains a s u b m a n i f o l d ~ c ~ of factorized probability distributions. This subm a n i f o l d is described by
H.J. Kappen
540
+//{- {q(sl0 , w) C ~@[w- 0}. 0 - ( 0 1 , . . . , 0,) parametrizes the manifold ~/, and w parametrizes the remainder of the manifold ~'. For q C ~ we can write the stationary distribution explicitly: exp(0qsi)
1
q(slOq) -- H 2 cosh(0 q) = H ~(1 + mqsi), i
i
with m/q = ( s i ) q - tanh(0q). Here, (.)q denotes expectation value with respect to the distribution q. The submanifold ~//t' describes the factorized stationary distributions for networks with all synaptic connections zero. Consider a network, whose weights and thresholds are given by 0, w. This network has a stationary distribution p(sl0, w) e ow. We want to find its M F approximation which we define as the factorized distribution q r ~ that we obtain by orthogonal projection of p onto Jr'. It can then be shown [34,35], that the orthogonal projection onto ~r found by minimizing the relative entropy
D(p, q) = ~-~p(sl0 w)log(p(s!O-.'-w))\ s ' \ q(s] Oq) / with respect to the coordinates of 0 q of the factorized distribution q. We find
dD(p, q) dOq
= mq - mp -- 0
(28)
with ~ (Si)p. This equation states that the closest factorized model has its first moments equal to the first moments of the target distribution p. This is illustrated in Fig. 5. We need to solve Eq. (28) for 0q - tanh -l (mq). However, we cannot compute since we do not know the stationary distribution p. Even if we knew p (for instance Boltzmann-Gibbs distribution) it would be of little help, since computation of m~/is intractable. In order to proceed, we assume that the distribution p is somehow close
0 ~"
.
5
~
0
-o.5 t
"'~.z~~~--~ -0.5
0 0.5 tanh(w)
Fig. 5. Manifold of probability distributions ~ is computed for a Boltzmann-Gibbs distribution on two variables p(sl,s21w, O) --exp(wsls2 + 0(Sl + se))/Z. Solid lines are lines of constant (sl) = ($2). Broken lines are lines of c o n s t a n t (sis2). Both (w, 0) and ((sl), (S1S2)) a r e coordinates systems of ~. d / i s given by the line w - 0. For any p E ~, the closest q r ~/~ satisfies (S)q- (S)p.
A n introduction to stochastic neural networks
541
to the submanifold ~g//. Define d O / -
0 i -0
q
and d w i j - w i j - 0 - wij. Expanding
d m i - mPii- m q to second-order we obtain Omi [ 0 - dmi ~ Z - ~ j dOj J
1 j ~x nt--~
q
a2mi ] dOjd0K, OOjOOK,q
(29)
,
where Oi = (0i, wij) is the vector of all weights and thresholds. In order to proceed we need to compute the dependence of mi on O, w in the factorizedpoint q. We can use Eqs. (13) and (21) and the definitions of the transition matrices T for sequential and parallel dynamics Eqs. (8), (9) and (6) to get the implicit relations
(tanh(hi(s)).
(Si) =
(30)
This equation holds for both sequential and parallel dynamics. The computation of the derivatives is tedious but straightforward. It is presented in Appendix A. The result is
Eq. (31) is our main result and gives the approximate mean firing rates for arbitrary dynamics and arbitrary (but small) synaptic connections. In the case of symmetric connections w~j = wji, Eq. (31) were first derived by Thouless, Anderson and Palmer and are referred to as the TAP equations [36]. The correlations can be computed in a similar manner, but depend on the type of dynamics. We restrict ourselves to sequential dynamics and equal time correlations. From Eq. (22) we obtain
(sisj) -- 1 (si tanh(hj(s))) + (i ~ j).
(32)
When we expand g;j around the factorized solution g;jq = 0, we obtain 1 (1 - m 2 ) ( 1
~ij - - -2
+
(1 - m
--
mj2 )Wij
l(1 -
-
+ 2 mimj(wji)
+
Jl
(33).
To evaluate the quality of our M F approximations, we compare them to results of Monte Carlo (MC) simulations. We consider networks of n = 100 neurons. We choose w~ i -r j random and independently from a normal distribution with mean zero and variance 1/v/-n. We consider two different types of weights: symmetric weights w ~ - w ) ~ and asymmetric weights, where w~ and w~ are drawn independently. We consider two types of thresholds: 0~ = 0 and 0~ random and independently from a normal distribution with mean zero and variance 1. Since the approximation is expected to deteriorate with increasing weights size, we consider 0 00) and vary 0 _< [3 _< 1. networks with (Wij, Oi) - - ~ ( W ij'
542
H.J. Kappen
We use MC simulations to estimate the mean firing rates (si) and correlations )r The states are generated using sequential Glauber dynamics. To minimize the initialization (burn in) effect, we start the network in a random state and do not include the first to iterations. We compute the average over the subsequent z states: (Si) MC - -1 t=~~ s,(t), T t=to
Z,j
_
l ' T
~T
,,(t)ss(t)
(34/
-- (s,) M c
(,j)MC
(35)
t=to
The results are rather dependent on a proper choice of to and z. We obtained stable results by choosing to -- 105n and z = 106n. These values are rather large, but necessary to get results accurate enough to compute the small Zij.'s. (The Zij's are small because to lowest order Zi9 oc wi# oc 1 Ix~ft.) From Eq. (31) we compute the M F approximation of the mean firing rates. In order to assess the importance of the second-order (TAP) contribution, we also compute these approximate values taking only the terms of O(w) into account (MF). In Fig. 6, we show the root-mean-square (RMS) values of the mean firing rates as a function of [3 for the MC solution, the M F solution and the TAP solution. The statistical errors in the MC results for m; are of the order 8rni ,~ 0.002. In addition, we show the RMS values of the difference between the M F and MC solution and between the TAP and M C solution. We conclude that the second-order approximation is significantly better than the first-order approximation when [3 < 1, both for symmetric and asymmetric networks. The results for the correlations are presented in Fig. 7. The statistical errors in the MC results for Z;j are of the order 8Zi/~ 0.002. We compute the TAP-values for the mean firing rates and insert these in Eq. (33) to compute the correlations. We consider again separately the O(w) approximation and the O(w 2) approximation. We conclude that the second-order approximation is significantly better than the first-order approximation when 13 < 0.5, both for symmetric and asymmetric networks.
7. Learning in neural networks 7.1. A t t r a c t o r
neural networks
In 1982, John Hopfield wrote a seminal paper, where he proposed a stochastic neural network in which the connections are the result of Hebbian learning [5]. Hebbian learning is the mechanism that neurons increase their connection strength when they are both active at the same time. The rationale is that when a presynaptic spike contributes to the firing of the postsynaptic neuron, it is likely that its contribution is of some functional importance to the animal and therefore the efficacy of the responsible synapse should be increased. If however, the presynaptic spike does not result in the firing of the postsynaptic cell, or vice versa, that the postsynaptic
An introduction to stochastic neural networks
543
rms(m), n = 1 0 0
symmetric, e = randn
symmetric, e = 0
0.8
0.8
~,0.6
~,0.6
~
.,
L.
0.2
+ .+
m rnf 0~, 0
_,,._ mtap +9
Eo.4
'0.4 0.2
m mc +
1
m mc - m mf mmC
0.2
m t
0.4
0.6
0.8
1
0
0
0.2
asymmetric, e = 0
1
0.4
0.6
0.8
1
asymmetric, e = randn
0.8
0.8
~-0.6
~,0.6
v
E "-" 0.4
"~" 0.4
0.2
0.2
0
0.2
0.4
0.6
0.8
1
O~ 0
"
.i.
9
0.2
:
.w.
-
"
.,.
0.4
'
-
.
-
-
'
0.6
'
0.8
1
Fig. 6. Mean firing rates as a function of the strength of the connections for sequential dynamics, n - 100. RMS values of Monte Carlo results (--), first-order approximation (+-), second-order approximation (,-). RMS values of difference between first-order approximation and MC value (+..) and difference between second-order approximation and MC
value (,-.). We define RMSZ(m) __ ;1 ~ i n m2. cell fires in the absence of the pre-synaptic spike, the synapse is probably not very important and its strength is decreased. One could summarize this behavior as ~ ) w i j - l"l(Yi(t)yj(t)- )~wij),
where yi(t) = 0, 1 denotes the firing of neuron i between time t and t + r as defined in Section 2, q the learning rate and )~ is a small positive constant. Although the mechanism of Hebbian learning has been confirmed in various experiments [37], the picture is considered to be too simple. In particular, synapses display an interesting history-dependent dynamics with characteristic time scales of several msec to hours. The analysis of stochastic networks with Hebbian connectivity was performed in a series of papers by Amit et al. [9,10]. They considered various 'Hebbian' learning rules which are similar, but not quite identical to the Hebbian mechanism discussed above. Nevertheless, one expects that the behavior of this model is qualitatively the same as for biological networks. Due to the symmetric connectivity, the stationary behavior of the network can be computed and is given by Eq. (17). The patterns ~ become stable attractors of the
H.J.
544
Kappen
rms(xij) , i<>j, n= 100 symmetric, 8 = 0
0.1
...
.*
0.06
0.08 +: 7*
,,~ 0.06
,-, 0.06
"- 0.04
"- 0.04
0.02
~mc -+-
--~ + .
Xt~P(O(w))
xtaP(o(w2))
,,+
~
0.2
,-
ztaP(o(w2))0.1
0.4 [~ 0.6
0.8
'
1
0
,.~ 0.06
.~ 0.06
" 0.04
'- 0.04
0
0.2
0.4
0.4
0.6
0.8
1
0.02
. + - ..,.0.6
0.2
asymmetric, 0 = randn
0.1
0.08
..§
..+ ..4"- ~, .~." ....
O:
0.08
0.02
.-.4-'-+
0.02
asymmetric, O = 0
Z mc - Z t a P ( o ( w ) ) Z mc _
symmetric, e = randn
o.1
0.8
1
O: 0
0.2
0.4
0.6
0.8
1
Fig. 7. Correlations as a function of the strength of the connections for sequential dynamics, n = 100. RMS values of Monte Carlo results (--), first-order approximation ( + - ) , second-order approximation ( , - ) . RMS values of difference between first-order approximation and MC value (+-.) and difference between second-order approximation and MC value (,--). We define RMS2(z)= ~ 9 Ei>jn ~'U"9 dynamics when the number of patterns is sufficiently small and 13is sufficiently large. Thus the network operates as a distributed memory. When 13 is too small, all attractors become unstable and the firing of the neurons becomes more or less uncorrelated. This behavior is similar to the paramagnetic phase discussed in the SK model. When the number of patterns is too large, the network behaves as a spin glass whose minima are uncorrelated with the stored patterns. This behavior is to a large extent independent of whether the neuron dynamics is sequential or parallel (see Section 3 for the definition of these terms).
7.2. Boltzmann machines Another well-known application of the Boltzmann-Gibbs distribution are Boltzmann Machines [8]. The basic idea is to treat the distribution Eq. (17) as a statistical model, and to use standard statistical tools to estimate its parameters w u and Oi. Let us partition the neurons in a set of nv visible units and nh hidden units (nv + nh = n). Let c( and 13 label the 2 "v visible and 2 "h hidden states of the network,
An introduction to stochastic neural networks
545
respectively. Thus, every state s is uniquely described by a tuple ctl3. Learning consists of adjusting the weights and thresholds in such a way that the BoltzmannGibbs distribution on the visible units p ~ - ~ F P ~ approximates a target distribution q~ as closely as possible. A suitable measure for the difference between the distributions p~ and q~ is the relative entropy [38] K - Z
q~ log q~.
(36)
P~
It is easy to show that K _> 0 for all distributions p~ and K - 0 iff p~ - q~ for all ~. Therefore, learning consists of minimizing K with respect to wi2 and 0i using gradient descent and the learning rules are given by [8,39] 8K ~0 i -- --1] ~//
- - ]] ( ( S i ) c - - (Si)),
8K 6wij -- --]]~wij - T~((siSj)c - (sisj)) i ~ j.
(37)
The parameter 11 is the learning rate. The brackets (.) and (')c denote the 'free' and 'clamped' expectation values, respectively. The 'free' expectation values are defined as usual:
aF s i sj p ~ .
(38)
N
The 'clamped' expectation values are obtained by clamping the visible units in a state a and taking the expectation value with respect to q~: (Si) c -- ~
Si~ q~PFl~,
s i sj q~P~l~, aF
s~~ is the value of neuron i when the network is in state a]3. P~I~ is the conditional probability to observe hidden state 13 given that the visible state is a. Note that in Eqs. (37)-(39), i and j run over both visible and hidden units. Thus, the BM learning rules contain clamped and free expectation values of the Boltzmann-Gibbs distribution. The computation of the free expectation values is intractable, because the sums in Eq. (38) consist of 2" terms. If q~ is given in the form of a training set of p patterns, the computation of the clamped expectation values, Eq. (39), contains p2 ~ terms. This is intractable as well, but usually less expensive than the flee expectation values. As a result, the exact version of the BM learning algorithm cannot be applied to practical problems.
546
H.J. Kappen
We therefore apply the MF approximation as discussed in the previous section. Due to the symmetric weights, the Boltzmann Machine is an equilibrium system and we can improve on our estimates of the correlations between neurons, Eq. (33), using the linear response theorem [40]. The starting point is to observe the exact relations (si) - ~ log_____fZ,
(40)
O0i ~2 log Z Zij =
~Oi~Oj
,
(41)
which follow immediately from the definition of Z. We can combine these equations and obtain O(Si)
~i~ = ~0;
Omi
~ ~0~
(42)
Thus, the correlations are given by the derivative of the equilibrium firing rates with respect to the thresholds. In the last step we have replaced these firing rates by their MF estimates, Eq. (31). We can compute the right-hand side of Eq. (42) from Eq. (31). Having obtained estimates for the statistics, this basically solves the learning problem. For arbitrary wij and 0 i w e can compute the mean firing rates and correlations (both clamped and free) and insert these values into the learning rule Eq. (37). The situation is particularly simple in the absence of hidden units, s In this case, (')c does not depend on wij and 0i and are simply given by the statistics of the data: If the data consist of p patterns with equal probability, s~, ~t = 1 , . . . , p , then ' Thus our task is to find Wij and 0i such that (Si) c = P1~ , s~ and (SiSj) c = p1~ o s~~s). the (MF approximations of the) free mean firing rates and correlations are equal to (si)~ and (sisj)~, respectively: mi--
(43)
(Si)c ,
~,ij -- (SiSj)c --
mimj, i r j.
(44)
Eqs. (43) and (44) are n + 89n(n - 1) equations with an equal number of unknowns wij and 0i and can be solved using standard numerical routines. We can however, make a significant improvement in the learning procedure when we observe that the TAP term in Eq. (31) represents a self-coupling to neuron i. Instead of using the TAP approximation to relate this self-coupling to the off-diagonal weights wij, we propose to introduce additional parameters, diagnonal weights wii, which we estimate in the learning process. We therefore need n additional equations for learning, for which we propose Zii = 1 - m2i . This equation is true by definition for the exact Z, but becomes an additional constraint on wij and 0i when Z is the linear response approximation Eq. (42). Thus our basic equations become 8
The following discussion can be extended to hidden units using an EM-type of iteration procedure.
An introduction to stochastic neural networks
mi ~,~1
547
0,) ~Oj__
~ij
(46)
= ~mi -- 1 -- m 2 -- wij.
Note, that the sum over j in the equation for mi now also includes a contribution wiimi. From Eqs. (43)-(46) we can compute the solution for wij and 0i in closed form: mi = (Si)c,
(47)
c~j = (sisj) c - ( S ~ ) c ( S j ) c ,
(48)
~)ij
wij = 1 - m
(C_ 1
2
(49)
)ij,
Oi -- tanh -1 (mi) -- f i wijmj. j--1 7.3. C l a s s i f i c a t i o n
(50)
of digits
We demonstrate the quality of the above MF approximation for Boltzmann Machine learning on a digit recognition problem. The data consists of 11,000 examples of handwritten digits (0-9) compiled by the US Postal Service Office of Advanced Technology. The examples are preprocessed to produce 8 x 8 binary images. Some examples are shown in Fig. 8. Our approach is to model each of the digits with a separate Boltzmann Machine. For each digit, we use 700 patterns for training using the approach outlined above. We thus obtain 10 Boltzmann distributions log p(s1141 ~) - - E ( s l W ~) - log z(w~),
a = 0 , . . . , 9,
where W~ - ( w ~ , 0~) are the weights and thresholds for digit a. We then test the performance of these models on a classification task using the same 700 training patterns per digit as well as the 400 test patterns per digit. We classify each pattern to the model ~ with the highest probability. The normalization log Z(W ~) is intractable and depends on a and therefore affects classification. We use its MF approximation given by [41,42]
1
log Z - --~ E wijmimj - Z Oimi u i
1Z((1
2 i
+
mi)log(1
+ mi) + (1
-
mi)log(1 - mi))
The correlation matrix cij in Eq. (48) is (close to) singular. This results in very large weights in Eq. (49) and we should question the validity of the MF approximation. We propose to solve this problem by adding a flat distribution to the training data:
H.J. Kappen
548
ra
, , '.,g
II-.
/ .o. 72 w m.ii
Fig. 8. Sample of patterns of the 8 • 8 handwritten digits of the US Postal Service Office of Advanced Technology. In each row from left to right: the mean digit per class, a nice example and two rather bad examples.
1
q~ -+ (1 - k)q~ + ~ 2-;'
(51)
(Si)c -+ (1 - k)(Si)c,
(52)
(sisj) c ~
(53)
(1
-
~)(SiSj) c + ~'~ij"
In Fig. 9 we show the result o f the B o l t z m a n n M a c h i n e classifier as a function of ~. W e see that the classification error d e p e n d s strongly on the value o f ~. H o w e v e r , there is no overfitting effect in the sense that a value that is optimal on the training set is also optimal on the test set. The optimal ~ on the training set is ~ = 0.24. The classification error on the test set for this value of ~ is 4.62%. In [43,44] this classification p r o b l e m is used on the same data to c o m p a r e a n u m b e r o f algorithms. The
549
An introduction to stochastic neural networks
0.12 - - trainl
- test I
0.1 ,_
0
W
/
0.08 \
0.06
/ "\
./"
0.04 0.02
0
0.5
1
Z,
Fig. 9.
Classification error of the Boltzmann Machine on the handwritten digits as a function of )v.
Table 1 Classification error rates for the test data set of handwritten digits. The first tree were reported by [43], the fourth was reported in [44] Nearest neighbor Back-propagation Wake-sleep Sigmoid belief Boltzmann machine
6.7% 5.6 % 4.8% 4.6% 4.6%
r e p o r t e d e r r o r rates on the test set are s u m m a r i z e d in Table 1. The result o b t a i n e d with the b a c k p r o p a g a t i o n m e t h o d is rather competitive: I tried to r e p r o d u c e it a n d it requires extensive training times a n d the result is n o t so g o o d in all runs. The three best m e t h o d s in Table 1 are all unsupervised methods. T h e y do density estimation on each o f the classes separately and are not optimized for classification. Therefore, it is e n c o u r a g i n g that these m e t h o d s are capable o f o u t p e r f o r m i n g the multi-layered perceptron. The B o l t z m a n n M a c h i n e yields as g o o d p e r f o r m a n c e as the best unsupervised m e t h o d k n o w n on this data. The m a i n a d v a n t a g e o f the Boltzrnann M a c h i n e is that no hidden structure is needed in contrast to all the other m e t h o d s in Table 1 except for the nearest n e i g h b o r m e t h o d . As a result, the B o l t z m a n n M a c h i n e solution is trained and tested in several minutes, whereas the other m e t h o d s require several hours. 9
A comparison on a larger OCR problem was done in [45] which yields the same conclusion regarding the unsupervised methods. In this case, however, significant improvements have been reported using supervised methods (see h t t p " //www. research, art. com/yann/ocr/mnist / index, html).
550
H.J. Kappen
Abbreviations BM, Boltzmann Machine Eq., equation Fig., figure MC, Monte Carlo MF, mean field PSP, Post Synaptic Potential RMS, root mean square SK-model, Sherrington-Kirkpatrick model TAP-equation, Thouless-Anderson-Palmer equation
Acknowledgements I would like to thank Wim Wiegerinck and Tom Heskes for useful discussions. This research was funded in part by the Dutch Technology Foundation (STW).
Appendix A. T A P equations
In this appendix we present the main steps to derive the TAP equations Eq. (31). We start with the computation of the derivatives in Eq. (29):
e(si) [ eOj q -where mi.q Similarly,
-
-
8(si) [' _
Z
~p(s) ~Oj [q tanh(O q) + q(s)(1 - m2i.q)~ij-- ( 1 - m2i,q)~ij
s
tanh(O q) is the mean firing rate of neuron i in the factorized model q.
(1 -
OWjk q
Using
m2q)~)ijmk.q.
mi = mi.p mi.q because of Eq. (28) we obtain to lowest order =
(A.1) This is equivalent to mi tanh ( ~-~'~jwijmj + ~). In a similar way one computes the second-order derivatives and the result is -
-
~2 E (Si) ] dOj dO, - - 2 mi(1 - m2)(dO)~, jk ~Oj~Ok q
~2Swk,(si>,1,dOj dwk, -- (1 - m2i)Z ((1 - m2) dOg.- 2mimj dO/) dwij, Z 8Oj jkl
q
j
551
An introduction to stochastic neural networks ~2
\[
\sum_{jklm} \frac{\partial^2 \langle s_i\rangle}{\partial w_{jk}\,\partial w_{lm}}\bigg|_q \mathrm{d}w_{jk}\,\mathrm{d}w_{lm}
= \big(1 - m_i^2\big) \sum_{jk}\Big(\big(1 - m_j^2\big)\, m_j\,\mathrm{d}w_{kj}\,\mathrm{d}w_{ik}
+ \big(1 - m_j^2\big)\, m_k\,\mathrm{d}w_{jk}\,\mathrm{d}w_{ij}
- 2\, m_i \langle s_j s_k\rangle\,\mathrm{d}w_{ij}\,\mathrm{d}w_{ik}\Big).
\]
Substituting this into Eq. (29) we obtain
\[
0 = \mathrm{d}m_i - \big(1 - m_i^2\big)\Big(\Delta_i - m_i \Delta_i^2
+ \sum_j \big(1 - m_j^2\big)\, w_{ij}\,\Delta_j - m_i \sum_j w_{ij}^2 \big(1 - m_j^2\big)\Big),
\]
where we have defined Δ_i = dθ_i + ∑_j dw_{ij} m_j. Since Δ_i = 0 + O(w²), according to Eq. (A.1), we obtain
\[
\Delta_i = m_i \sum_j w_{ij}^2 \big(1 - m_j^2\big) + \mathcal{O}(w^3),
\]
which is equivalent to Eq. (31).

References

1. McCulloch, W.S. and Pitts, W. (1943) Bull. Math. Biophys. 5, 114-133.
2. Rosenblatt, F. (1958) Psychol. Rev. 65, 386-408.
3. Amari, S. (1967) IEEE Trans. on Electron. Comput. 16, 299-307.
4. Rumelhart, D., Hinton, G. and Williams, R. (1986) Nature 323, 533-536.
5. Hopfield, J. (1982) Proc. Natl. Acad. Sci. USA 79, 2554-2558.
6. Kohonen, T. (1982) Biol. Cybern. 43, 59-69.
7. Sompolinsky, H. and Kanter, I. (1986) Phys. Rev. Lett. 57, 2861-2864.
8. Ackley, D., Hinton, G. and Sejnowski, T. (1985) Cog. Sci. 9, 147-169.
9. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1985) Phys. Rev. A 32, 1007.
10. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1985) Phys. Rev. Lett. 55, 1530-1533.
11. Mason, A., Nicoll, A. and Stratford, K. (1990) J. Neurosci. 11, 72-84.
12. Katz, B. (1966) Nerve, Muscle and Synapse. McGraw-Hill, New York.
13. Koch, Chr. (1999) Biophysics of Computation. Oxford University Press, Oxford.
14. Abbott, L.F., Varela, J.A., Sen, K. and Nelson, S.B. (1997) Science 275, 220-224.
15. Markram, H. and Tsodyks, M. (1996) Nature 382, 807-810.
16. Rall, W. and Rinzel, J. (1973) Biophys. J. 13, 648-688.
17. Little, W.A. (1974) Math. Biosci. 19, 101-120.
18. Peretto, P. (1992) An Introduction to the Modeling of Neural Networks. Cambridge University Press, Cambridge.
19. Kappen, H.J. (1997) Phys. Rev. E 55, 5849-5858; SNN-96-044, F-96-094.
20. Grimmett, G.R. and Stirzaker, D.R. (1992) Probability and Random Processes. Clarendon Press, Oxford.
21. Saul, L. and Jordan, M.I. (1994) Neural Comput. 6, 1174-1184.
22. Onsager, L. (1936) J. Amer. Chem. Soc. 58, 1486-1493.
23. Sherrington, D. and Kirkpatrick, S. (1975) Phys. Rev. Lett. 35, 1792-1796.
24. Toulouse, G. (1977) Comm. Phys. 2, 115-119.
25. Düring, A., Coolen, A.C.C. and Sherrington, D. (1998) J. Phys. A: Math. General 31, 8607-8621.
26. Amit, D. (1989) Modeling Brain Function. Cambridge University Press, Cambridge.
27. Gutfreund, H., Reger, J.D. and Young, A.P. (1988) J. Phys. A: Math. General 21, 2775-2797.
28. Crisanti, A. and Sompolinsky, H. (1988) Phys. Rev. A 37, 4865-4874.
29. Nützel, K. and Krey, U. (1993) J. Phys. A: Math. General 26, L591-L597.
30. Eissfeller, H. and Opper, M. (1994) Phys. Rev. E 50, 709-720.
31. Goles, E. and Vichniac, G.Y. (1986) in: Proceedings AIP Conference, ed. J.S. Denker, pp. 165-181, American Institute of Physics.
32. Crisanti, A., Falcioni, M. and Vulpiani, A. (1993) J. Phys. A: Math. General 26, 3441-3453.
33. Plefka, T. (1982) J. Phys. A 15, 1971-1978.
34. Amari, S.-I. (1992) IEEE Trans. Neural Networks 3, 260-271.
35. Tanaka, T. (1999) in: Advances in Neural Information Processing Systems 11, eds M.S. Kearns, S.A. Solla and D.A. Cohn, pp. 351-357, MIT Press, Cambridge.
36. Thouless, D.J., Anderson, P.W. and Palmer, R.G. (1977) Philos. Mag. 35, 593-601.
37. Kelso, S.R., Ganong, A.H. and Brown, T.H. (1986) Proc. Natl. Acad. Sci. 83, 5326-5330.
38. Kullback, S. (1959) Information Theory and Statistics. Wiley, New York.
39. Hertz, J., Krogh, A. and Palmer, R. (1991) Introduction to the Theory of Neural Computation, Santa Fe Institute, Vol. 1. Addison-Wesley, Redwood City.
40. Parisi, G. (1988) Statistical Field Theory. Frontiers in Physics. Addison-Wesley, Reading, MA.
41. Kappen, H.J. and Rodríguez, F.B. (1999) in: Advances in Neural Information Processing Systems 11, eds M.S. Kearns, S.A. Solla and D.A. Cohn, pp. 280-286, MIT Press.
42. Kappen, H.J. and Rodríguez, F.B. (1998) Neural Comput. 10, 1137-1156.
43. Hinton, G.E., Dayan, P., Frey, B.J. and Neal, R.M. (1995) Science 268, 1158-1161.
44. Saul, L.K., Jaakkola, T. and Jordan, M.I. (1996) J. Artificial Intell. Res. 4, 61-76.
45. Leisink, M. and Kappen, H.J. (2000) in: Proceedings IJCNN. Submitted.
CHAPTER 14
Statistical Mechanics of Recurrent Neural Networks I - Statics
A.C.C. COOLEN Department of Mathematics, King's College London, Strand, London WC2R 2LS, UK
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction .................................................. 555
2. Definitions and properties of microscopic laws ................ 557
   2.1. Stochastic dynamics of neuronal firing states ............ 558
   2.2. Synaptic symmetry and Lyapunov functions ................. 562
   2.3. Detailed balance and equilibrium statistical mechanics ... 565
3. Simple recurrent networks with binary neurons ................. 569
   3.1. Networks with uniform synapses ........................... 569
   3.2. Phenomenology of Hopfield models ......................... 572
   3.3. Analysis of Hopfield models away from saturation ......... 577
4. Simple recurrent networks of coupled oscillators .............. 583
   4.1. Coupled oscillators with uniform synapses ................ 583
   4.2. Coupled oscillator attractor networks .................... 585
5. Networks with Gaussian distributed synapses ................... 592
   5.1. Replica analysis ......................................... 592
   5.2. Replica-symmetric solution and AT-instability ............ 595
6. The Hopfield model near saturation ............................ 600
   6.1. Replica analysis ......................................... 600
   6.2. Replica symmetric solution and AT-instability ............ 606
7. Epilogue ...................................................... 615
Acknowledgement .................................................. 617
References ....................................................... 617
1. Introduction
Statistical mechanics deals with large systems of stochastically interacting microscopic elements (particles, atomic magnets, polymers, etc.). The strategy of statistical mechanics is to abandon any ambition to solve models of such systems at the microscopic level of individual elements, but to use the microscopic laws to calculate equations describing the behavior of a suitably chosen set of macroscopic observables. The toolbox of statistical mechanics consists of methods to perform this reduction from the microscopic to a macroscopic level, which are all based on efficient ways to do the bookkeeping of probabilities. The experience and intuition that have been built up over the last century tell us what to expect, and serve as a guide in finding the macroscopic observables and in seeing the difference between relevant mathematical subtleties and irrelevant ones. As in any statistical theory, clean and transparent mathematical laws can be expected to emerge only for large (preferably infinitely large) systems. In this limit one often encounters phase transitions, i.e. drastic changes in the system's macroscopic behavior at specific values of global control parameters. Recurrent neural networks, i.e. neural networks with synaptic feedback loops, appear to meet the criteria for statistical mechanics to apply, provided we indeed restrict ourselves to large systems. Here the microscopic stochastic dynamical variables are the firing states of the neurons or their membrane potentials, and one is mostly interested in quantities such as average state correlations and global information processing quality, which are indeed measured by macroscopic observables. In contrast to layered networks, one cannot simply write down the values of successive neuron states for models of recurrent neural networks; here they must be solved from (mostly stochastic) coupled dynamic equations. Under special conditions ('detailed balance'), which usually translate into the requirement of synaptic symmetry, the stochastic process of evolving neuron states leads towards an equilibrium situation where the microscopic state probabilities are known, and where the techniques of equilibrium statistical mechanics can be applied in one form or another. The equilibrium distribution found, however, will not always be of the conventional Boltzmann form. For nonsymmetric networks, where the asymptotic (stationary) statistics are not known, dynamical techniques from nonequilibrium statistical mechanics are the only tools available for analysis. The 'natural' set of macroscopic quantities (or 'order parameters') to be calculated can be defined in practice as the smallest set which will obey closed deterministic equations in the limit of an infinitely large network. Being high-dimensional nonlinear systems with extensive feedback, the dynamics of recurrent neural networks are generally dominated by a wealth of attractors (fixed-point attractors, limit-cycles, or even more exotic types), and the

Fig. 1. Information processing by recurrent neural networks through the creation and manipulation of attractors in state space. Patterns stored: the microscopic states ξ^μ. If the synapses are symmetric we will generally find that the attractors will have to be fixed-points (left picture). With non-symmetric synapses, the attractors can also be sequences of microscopic states (right picture).
practical use of recurrent neural networks (in both biology and engineering) lies in the potential for creation and manipulation of these attractors through adaptation of the network parameters (synapses and thresholds). Input fed into a recurrent neural network usually serves to induce a specific initial configuration (or firing pattern) of the neurons, which serves as a cue, and the 'output' is given by the (static or dynamic) attractor which has been triggered by this cue. The most familiar types of recurrent neural network models, where the idea of creating and manipulating attractors has been worked out and applied explicitly, are the socalled attractor neural networks for associative memory, designed to store and retrieve information in the form of neuronal firing patterns and/or sequences of neuronal firing patterns. Each pattern to be stored is represented as a microscopic state vector. One then constructs synapses and thresholds such that the dominant attractors of the network are precisely the pattern vectors (in the case of static recall), or where, alternatively, they are trajectories in which the patterns are successively generated microscopic system states. From an initial configuration (the 'cue', or input pattern to be recognized) the system is allowed to evolve in time autonomously, and the final state (or trajectory) reached can be interpreted as the pattern (or pattern sequence) recognized by network from the input (see Fig. 1). For such programs to work one clearly needs recurrent neural networks with extensive 'ergodicity breaking': the state vector will during the course of the dynamics (at least on finite time-scales) have to be confined to a restricted region of state space (an 'ergodic component'), the location of which is to depend strongly on the initial conditions. Hence our interest will mainly be in systems with many attractors. This, in turn, has implications at a theoretical/mathematical level: solving models of recurrent neural networks with extensively many attractors requires advanced tools from disordered systems theory, such as replica theory (statics) and generating functional analysis (dynamics). It will turn out that a crucial issue is whether or not the synapses are symmetric. Firstly, synaptic asymmetry is found to
rule out microscopic equilibrium, which has implications for the mathematical techniques which are available: studying models of recurrent networks with nonsymmetric synapses requires solving the dynamics, even if one is only interested in the stationary state. Secondly, the degree of synaptic asymmetry turns out to be a deciding factor in determining to what extent the dynamics will be glassy, i.e. extremely slow and nontrivial, close to saturation (where one has an extensive number of attractors). In this paper (on statics) and its sequel (on dynamics) I will discuss only the statistical mechanical analysis of neuronal firing processes in recurrent networks with static synapses, i.e. network operation as opposed to network learning. I will also restrict myself to networks with either full or randomly diluted connectivity, the area in which the main progress has been made during the last few decades. Apart from these restrictions, the text aims to be reasonably comprehensive and selfcontained. Even within the confined area of the operation of recurrent neural networks a truly impressive amount has been achieved, and many of the constraints on mathematical models which were once thought to be essential for retaining solvability but which were regrettable from a biological point of view (such as synaptic symmetry, binary neuron states, instantaneous neuronal communication, a small number of attractors, etc.) have by now been removed with success. At the beginning of the new millennium we know much more about the dynamics and statics of recurrent neural networks than ever before. I aim to cover in a more or less unified manner the most important models and techniques which have been launched over the years, ranging from simple symmetric and non-symmetric networks with only a finite number of attractors, to the more complicated ones with an extensive number, and I will explain in detail the techniques which have been designed and used to solve them. In the present paper I will first discuss and solve various members of the simplest class of models: those where all synapses are the same. Then I turn to the Hopfield model, which is the archetypical model to describe the functioning of symmetric neural networks as associative memories (away from saturation, where the number of attractors is finite), and to a coupled oscillator model storing phase patterns (again away from saturation). Next I will discuss a model with Gaussian synapses, where the number of attractors diverges, in order to introduce the so-called replica method, followed by a section on the solution of the Hopfield model near saturation. I close this paper with a guide to further references and an assessment of the past and future deliverables of the equilibrium statistical mechanical analysis of recurrent neural networks.
2. Definitions and properties of microscopic laws

In this section I define the most common microscopic models for recurrent neural networks, I show how one can derive the corresponding descriptions of the stochastic evolution in terms of evolving state probabilities, and I discuss some fundamental statistical mechanical properties.
2.1. Stochastic dynamics of neuronal firing states

2.1.1. Microscopic definitions for binary neurons
The simplest nontrivial definition of a recurrent neural network is that where N binary neurons σ_i ∈ {−1, 1} (in which the states '1' and '−1' represent firing and rest, respectively) respond iteratively and synchronously to post-synaptic potentials (or local fields) h_i(σ), with σ = (σ_1, ..., σ_N). The fields are assumed to depend linearly on the instantaneous neuron states:

Parallel:
\[
\sigma_i(\ell+1) = \mathrm{sgn}\big[h_i(\sigma(\ell)) + T\eta_i(\ell)\big],
\qquad
h_i(\sigma) = \sum_j J_{ij}\sigma_j + \theta_i.
\tag{1}
\]
The stochasticity is in the independent random numbers η_i(ℓ) ∈ ℝ (representing threshold noise), which are all drawn according to some distribution w(η). The parameter T is introduced to control the amount of noise. For T = 0 the process (1) is deterministic: σ_i(ℓ + 1) = sgn[h_i(σ(ℓ))]. The opposite extreme is choosing T = ∞, here the system evolution is fully random. The external fields θ_i represent neural thresholds and/or external stimuli, J_ij represents the synaptic efficacy at the junction j → i (J_ij > 0 implies excitation, J_ij < 0 inhibition). Alternatively we could decide that at each iteration step ℓ only a single randomly drawn neuron σ_{i_ℓ} is to undergo an update of the type (1):

Sequential:
\[
i \neq i_\ell:\quad \sigma_i(\ell+1) = \sigma_i(\ell),
\qquad
i = i_\ell:\quad \sigma_i(\ell+1) = \mathrm{sgn}\big[h_i(\sigma(\ell)) + T\eta_i(\ell)\big]
\tag{2}
\]
with the local fields as in (1). The stochasticity is now both in the independent random numbers η_i(ℓ) (the threshold noise) and in the site i_ℓ to be updated, drawn randomly from the set {1, ..., N}. For simplicity we assume w(−η) = w(η), and define
\[
g[z] = 2\int_0^z \mathrm{d}\eta\, w(\eta): \qquad
g[-z] = -g[z], \quad \lim_{z\to\pm\infty} g[z] = \pm 1, \quad \frac{\mathrm{d}}{\mathrm{d}z}g[z] \geq 0.
\]
Popular choices for the threshold noise distributions are
\[
w(\eta) = (2\pi)^{-1/2}\,\mathrm{e}^{-\frac{1}{2}\eta^2}: \quad g[z] = \mathrm{Erf}\big[z/\sqrt{2}\big],
\qquad
w(\eta) = \tfrac{1}{2}\big[1 - \tanh^2(\eta)\big]: \quad g[z] = \tanh(z).
\]
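As a quick illustration of how the sequential rule (2) with the tanh-type threshold noise is simulated in practice, the Python sketch below performs random single-site updates using the probability ½[1 + tanh(βh_i)] for the state +1. It is not code from the text; the network used (random symmetric couplings, zero thresholds) and all function names are illustrative assumptions.

import numpy as np

def sequential_sweep(sigma, J, theta, beta, rng):
    # One sweep of the sequential rule (2): N randomly drawn single-site updates.
    N = len(sigma)
    for _ in range(N):
        i = rng.integers(N)
        h = J[i] @ sigma + theta[i]
        # For w(eta) = (1/2)[1 - tanh^2(eta)] one has g[z] = tanh(z), so that
        # Prob[sigma_i = +1] = (1/2)[1 + tanh(beta*h_i)].
        p_up = 0.5 * (1.0 + np.tanh(beta * h))
        sigma[i] = 1 if rng.random() < p_up else -1
    return sigma

# usage: a small random symmetric network without self-interactions
rng = np.random.default_rng(1)
N, beta = 20, 2.0
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
theta = np.zeros(N)
sigma = rng.choice([-1, 1], size=N)
for sweep in range(50):
    sigma = sequential_sweep(sigma, J, theta, beta, rng)
print(sigma)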
2.1.2. From stochastic equations to evolving probabilities
From the microscopic Eqs. (1) and (2), which are suitable for numerical simulations, we can derive an equivalent but mathematically more convenient description in terms of microscopic state probabilities p_ℓ(σ). Eqs. (1) and (2) state that, if the system state σ(ℓ) is given, a neuron i to be updated will obey
\[
\mathrm{Prob}\big[\sigma_i(\ell+1)\big] = \tfrac{1}{2}\big[1 + \sigma_i(\ell+1)\, g[\beta h_i(\sigma(\ell))]\big]
\tag{3}
\]
with β = T^{-1}. In the case (1) this rule applies to all neurons, and thus we simply get p_{ℓ+1}(σ) = ∏_{i=1}^{N} ½[1 + σ_i g[βh_i(σ(ℓ))]]. If, on the other hand, instead of σ(ℓ) only the probability distribution p_ℓ(σ) is given, this expression for p_{ℓ+1}(σ) is to be averaged over the possible states at time ℓ:

Parallel:
\[
p_{\ell+1}(\sigma) = \sum_{\sigma'} W[\sigma;\sigma']\, p_\ell(\sigma'),
\qquad
W[\sigma;\sigma'] = \prod_{i=1}^{N} \tfrac{1}{2}\big[1 + \sigma_i\, g[\beta h_i(\sigma')]\big].
\tag{4}
\]
This is the standard representation of a Markov chain. Also the sequential process (2) can be formulated in terms of probabilities, but here expression (3) applies only to the randomly drawn candidate i_ℓ. After averaging over all possible realizations of the sites i_ℓ we obtain
\[
\mathrm{Prob}\big[\sigma(\ell+1)\big]
= \frac{1}{N}\sum_i \tfrac{1}{2}\big[1 + \sigma_i(\ell+1)\, g[\beta h_i(\sigma(\ell))]\big]\prod_{j\neq i}\delta_{\sigma_j(\ell+1),\,\sigma_j(\ell)}
\]
(with the Kronecker symbol: δ_ij = 1 if i = j, δ_ij = 0 otherwise). If, instead of σ(ℓ), the probabilities p_ℓ(σ) are given, this expression is to be averaged over the possible states at time ℓ, with the result:
\[
p_{\ell+1}(\sigma) = \frac{1}{N}\sum_i \tfrac{1}{2}\big[1 + \sigma_i\, g[\beta h_i(\sigma)]\big]\, p_\ell(\sigma)
+ \frac{1}{N}\sum_i \tfrac{1}{2}\big[1 + \sigma_i\, g[\beta h_i(F_i\sigma)]\big]\, p_\ell(F_i\sigma)
\]
with the state-flip operators F_iΦ(σ) = Φ(σ_1, ..., σ_{i−1}, −σ_i, σ_{i+1}, ..., σ_N). This equation can again be written in the standard form p_{ℓ+1}(σ) = ∑_{σ'} W[σ; σ'] p_ℓ(σ'), but now with the transition matrix

Sequential:
\[
W[\sigma;\sigma'] = \delta_{\sigma,\sigma'} + \frac{1}{N}\sum_i \Big\{ w_i(F_i\sigma)\,\delta_{\sigma,F_i\sigma'} - w_i(\sigma)\,\delta_{\sigma,\sigma'} \Big\},
\tag{5}
\]
where δ_{σ,σ'} = ∏_i δ_{σ_i,σ_i'} and
\[
w_i(\sigma) = \tfrac{1}{2}\big[1 - \sigma_i \tanh[\beta h_i(\sigma)]\big].
\tag{6}
\]
Note that, as soon as T > 0, the two transition matrices W[σ; σ'] in (4) and (5) both describe ergodic systems: from any initial state σ' one can reach any final state σ with nonzero probability in a finite number of steps (being one in the parallel case, and N in the sequential case). It now follows from the standard theory of stochastic processes (see e.g. [1,2]) that in both cases the system evolves towards a unique stationary distribution p_∞(σ), where all probabilities p_∞(σ) are nonzero.
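For very small networks the sequential transition matrix (5) can be written out explicitly, which is a convenient way to check ergodicity and convergence to a unique stationary distribution numerically. The sketch below does this by brute-force enumeration of all 2^N states; it is an illustrative construction under the stated assumptions (tanh threshold noise, a small random symmetric network) with hypothetical function names, not code from the text.

import itertools
import numpy as np

def transition_matrix_sequential(J, theta, beta):
    # Explicit 2^N x 2^N matrix W[s; s'] of Eq. (5); feasible only for small N.
    N = len(theta)
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=N)]
    index = {tuple(s): k for k, s in enumerate(states)}
    W = np.zeros((len(states), len(states)))

    def w(i, s):
        # transition rate w_i(s) of Eq. (6)
        return 0.5 * (1.0 - s[i] * np.tanh(beta * (J[i] @ s + theta[i])))

    for kp, sp in enumerate(states):     # column index: the state s' we start from
        stay = 1.0
        for i in range(N):
            s_new = sp.copy()
            s_new[i] *= -1               # flip site i with probability w_i(s')/N
            W[index[tuple(s_new)], kp] += w(i, sp) / N
            stay -= w(i, sp) / N
        W[kp, kp] += stay
    return W, states

# usage: iterate p -> W p and check that a normalized, strictly positive
# stationary distribution is approached
rng = np.random.default_rng(2)
N, beta = 3, 1.0
J = rng.normal(size=(N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)
W, states = transition_matrix_sequential(J, np.zeros(N), beta)
p = np.full(2**N, 1.0 / 2**N)
for _ in range(2000):
    p = W @ p
print(p.sum(), p)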
2.1.3. From discrete to continuous times
The above processes have the (mathematically and biologically) less appealing property that time is measured in discrete units. For the sequential case we will now assume that the duration of each of the iteration steps is a continuous random number (for parallel dynamics this would make little sense, since all updates would still be made in full synchrony). The statistics of the durations is described by a function π_ℓ(t), defined as the probability that at time t precisely ℓ updates have been made. Upon denoting the previous discrete-time probabilities as \hat{p}_ℓ(σ), our new process (which now includes the randomness in step duration) will be described by
\[
p_t(\sigma) = \sum_{\ell\geq 0}\pi_\ell(t)\,\hat{p}_\ell(\sigma)
= \sum_{\ell\geq 0}\pi_\ell(t)\sum_{\sigma'} W^\ell[\sigma;\sigma']\, p_0(\sigma')
\]
and time has become a continuous variable. For π_ℓ(t) we make the Poisson choice π_ℓ(t) = (1/ℓ!)(t/Δ)^ℓ e^{−t/Δ}. From ⟨ℓ⟩_π = t/Δ and ⟨ℓ²⟩_π = t/Δ + t²/Δ² it follows that Δ is the average duration of an iteration step, and that the relative deviation in ℓ at a given t vanishes for Δ → 0 as √(⟨ℓ²⟩_π − ⟨ℓ⟩_π²)/⟨ℓ⟩_π = √(Δ/t). The nice properties of the Poisson distribution under temporal derivation allow us to derive
\[
\Delta\,\frac{\mathrm{d}}{\mathrm{d}t}\, p_t(\sigma) = \sum_{\sigma'} W[\sigma;\sigma']\, p_t(\sigma') - p_t(\sigma).
\]
For sequential dynamics we choose Δ = 1/N so that, as in the parallel case, in one time unit each neuron will on average be updated once. The master equation corresponding to (5) acquires the form
\[
\frac{\mathrm{d}}{\mathrm{d}t}\, p_t(\sigma) = \sum_i \big\{ w_i(F_i\sigma)\, p_t(F_i\sigma) - w_i(\sigma)\, p_t(\sigma) \big\}.
\tag{7}
\]
The w_i(σ) of (6) now play the role of transition rates. The choice Δ = 1/N implies √(⟨ℓ²⟩_π − ⟨ℓ⟩_π²)/⟨ℓ⟩_π = √(1/Nt), so for N → ∞ we will no longer have uncertainty in where we are on the t axis.
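In a simulation the Poisson choice for π_ℓ(t) simply means that the number of single-site updates performed up to time t is drawn from a Poisson distribution with mean t/Δ = Nt. The minimal sketch below makes that concrete; it reuses the illustrative update rule of the earlier sketch, and the network and function names are again assumptions rather than anything specified in the text.

import numpy as np

def run_continuous_time(sigma, J, theta, beta, t, rng):
    # Sequential dynamics with step duration Delta = 1/N: the number of updates
    # made up to time t is Poisson distributed with mean N*t, so that each
    # neuron is updated on average once per unit of time.
    N = len(sigma)
    for _ in range(rng.poisson(N * t)):
        i = rng.integers(N)
        h = J[i] @ sigma + theta[i]
        sigma[i] = 1 if rng.random() < 0.5 * (1.0 + np.tanh(beta * h)) else -1
    return sigma

# usage: same kind of illustrative random symmetric network as before
rng = np.random.default_rng(4)
N, beta = 20, 2.0
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)
sigma = rng.choice([-1, 1], size=N)
sigma = run_continuous_time(sigma, J, np.zeros(N), beta, t=50.0, rng=rng)
print(sigma)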
2.1.4. Microscopic definitions for continuous neurons
Alternatively, we could start with continuous neuronal variables σ_i (representing e.g. firing frequencies or oscillator phases), where i = 1, ..., N, and with stochastic equations of the form
\[
\sigma_i(t+\Delta) = \sigma_i(t) + \Delta f_i(\sigma(t)) + \sqrt{2T\Delta}\;\xi_i(t).
\tag{8}
\]
Here we have introduced (as yet unspecified) deterministic state-dependent forces f_i(σ), and uncorrelated Gaussian distributed random forces ξ_i(t) (the noise), with ⟨ξ_i(t)⟩ = 0 and ⟨ξ_i(t)ξ_j(t')⟩ = δ_ij δ_{t,t'}. As before, the parameter T controls the amount of noise in the system, ranging from T = 0 (deterministic dynamics) to T = ∞ (completely random dynamics). If we take the limit Δ → 0 in (8) we find a Langevin equation (with a continuous time variable):
\[
\frac{\mathrm{d}}{\mathrm{d}t}\sigma_i(t) = f_i(\sigma(t)) + \eta_i(t).
\tag{9}
\]
This equation acquires its meaning only as the limit Δ → 0 of (8). The moments of the new noise variables η_i(t) = ξ_i(t)√(2T/Δ) in (9) are given by ⟨η_i(t)⟩ = 0 and ⟨η_i(t)η_j(t')⟩ = 2Tδ_ij δ(t − t'). This can be derived from the moments of the ξ_i(t). For instance:
\[
\langle\eta_i(t)\eta_j(t')\rangle
= \lim_{\Delta\to 0}\frac{2T}{\Delta}\,\langle\xi_i(t)\xi_j(t')\rangle
= 2T\delta_{ij}\lim_{\Delta\to 0}\frac{1}{\Delta}\,\delta_{t,t'}
= 2TC\,\delta_{ij}\,\delta(t-t').
\]
The constant C is found by summing over t', before taking the limit Δ → 0, in the above equation:
\[
\int_{-\infty}^{\infty}\mathrm{d}t'\,\langle\eta_i(t)\eta_j(t')\rangle
= \lim_{\Delta\to 0} 2T\sum_{t'=-\infty}^{\infty}\langle\xi_i(t)\xi_j(t')\rangle
= 2T\delta_{ij}\lim_{\Delta\to 0}\sum_{t'=-\infty}^{\infty}\delta_{t,t'}
= 2T\delta_{ij}.
\]
Thus C = 1, which indeed implies ⟨η_i(t)η_j(t')⟩ = 2Tδ_ij δ(t − t'). More directly, one can also calculate the moment generating function
\[
\Big\langle \exp\Big(\mathrm{i}\int\mathrm{d}t\sum_i\psi_i(t)\,\eta_i(t)\Big)\Big\rangle
= \lim_{\Delta\to 0}\prod_{i,t}\int\frac{\mathrm{d}z}{\sqrt{2\pi}}\,
\mathrm{e}^{-\frac{1}{2}z^2 + \mathrm{i}z\psi_i(t)\sqrt{2T\Delta}}
= \lim_{\Delta\to 0}\prod_{i,t}\mathrm{e}^{-T\Delta\psi_i^2(t)}
= \mathrm{e}^{-T\int\mathrm{d}t\sum_i\psi_i^2(t)}.
\tag{10}
\]
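The discretization (8) is itself the natural way to simulate the Langevin equation (9) on a computer (an Euler-Maruyama scheme). The sketch below implements it for a generic force; the specific force used in the usage example, simple linear relaxation f_i(σ) = −σ_i, is an illustrative assumption chosen only because its stationary variance equals T and is therefore easy to check, and the function names are hypothetical.

import numpy as np

def euler_maruyama(f, sigma0, T, dt, n_steps, rng):
    # Direct simulation of Eq. (8):
    # sigma(t+dt) = sigma(t) + dt*f(sigma) + sqrt(2*T*dt)*xi,
    # with xi independent standard Gaussian numbers at every step.
    sigma = sigma0.copy()
    traj = [sigma.copy()]
    for _ in range(n_steps):
        xi = rng.standard_normal(sigma.shape)
        sigma = sigma + dt * f(sigma) + np.sqrt(2.0 * T * dt) * xi
        traj.append(sigma.copy())
    return np.array(traj)

# usage: with f(sigma) = -sigma the late-time variance should be close to T
rng = np.random.default_rng(3)
traj = euler_maruyama(lambda s: -s, np.zeros(200), T=0.5, dt=0.01, n_steps=20000, rng=rng)
print(traj[-5000:].var())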
2.1.5. From stochastic equations to evolving probabilities A mathematically more convenient description of the process (9) is provided by the Fokker-Planck equation for the microscopic state probability density pt(ff) = (5[ff -- ~(t)]), which we will now derive. For the discrete-time process (8) we expand the 8-distribution in the definition ofpt+A(a) (in a distributional sense):
Pt+a(")-Pt(~)- /8 [r ~- -- E
or(t)- Af (r ~
0
~~(t)]
/ 8[" -- l~(t)] [ A f i ( . ( t ) )
)-
(8[r
or(t)])
+ 42 TA~i(t)] )
i ~2
+ TA E.. lj
O(yiO(yj (8[t~ -
t~(t)]~i(t)~j(t)) + (_9(A))
The variables a(t) depend only on noise variables ~y(t') with t' < t, so that for any function A: (A[a(t)]~i(t)) - (A[c~(t)])(~i(t)) - O, and (A[a(t)]~i(t)~y(t)) = 8iy(A[a(t)]). As a consequence: 1
0
[Pt+A(a) -- pt(a)] = -- E ~
(8[a -- a(t)JJ~(a(t)))
i 02 l
0 =
02
+rZ
-
i
i
+ o(a
)
562
A . C . C . Coolen
By taking the limit A ~ 0 we then arrive at the Fokker-Planck equation: a dpt(l~)-
_ Z
~
~)2 [pt(~)fi(~)]
i
_ql_TZ_~2iPt(l~)" i
(11)
2.1.6. Examples: graded response neurons and coupled oscillators In the case of graded response neurons the continuous variable O"i represents the membrane potential of neuron i, and (in their simplest form) the deterministic forces are given by f ( ~ ) - y'~4Jqtanh[7~j]- ~; + 0i, with ? > 0 and with the 0i representing injected currents. Conventional notation is restored by putting (Yi ~ Ui. Thus equation (9) specializes to d
d---~tui(t) -- Z Jij tanh[Tuj(t)] - ui(t) + Oi -q- rli(t).
(12)
J One often chooses T = 0 (i.e. rl;(t) = 0), the rationale being that threshold noise is already assumed to have been incorporated via the nonlinearity in (12). In our second example the variables 13"i represent the phases of coupled neural oscillators, with forces of the form f . ( a ) - ~ j J i j s i n ( ~ j - ~i)+coi. Individual synapses Ji~ now try to enforce either pair-wise synchronization (Jij > O) or pair-wise antisynchronization (Jij < 0), and the f.l)i represent the natural frequencies of the individual oscillators. Conventional notation dictates (3"i --+ ~)i, giving d dt qbi(t) - o3i + Z Jij sin[qbj(t) - ~i(t)]-~- rli(t ). J
(13)
2.2. Synaptic symmetry and Lyapunov functions 2.2.1. No&e-free symmetric networks of b&ary neurons In the deterministic limit T ~ 0 the rules (1) for networks of synchronously evolving binary neurons reduce to the deterministic map (14)
cyi(g + 1) - sgn[hi(t~(g))].
It turns out that for systems with symmetric interactions, Ji2 - Jji for all (ij), one can construct a Lyapunov function, i.e. a function of ty which during the dynamics decreases monotonically and is bounded from below (see e.g. [3]):
Binary & Parallel: Z[6] -- - ~
Ihi(6)[- ~ i
(YiOi 9
i
Clearly L>~ - ~-~i[~-~j [Jij] + 10i[] - ~ i p0/[. During iteration of (14) we find:
(15)
Stat&tical mechanics of recurrent neural networks I - statics
563
Oi[(Yi(e -[- 1) - cr/(g)]
-- Z i = -- Z
(Yi(g)hi(~(g AV l))
[hi(~(e Av l))[ Av Z i
i
-- - Z
Ihi(~(g + 1))1[1 - r
+ 2)cyi(g)] ~<0
i
(where we used (14) and Jij = Jji). So L decreases monotonically until a stage is reached where ~i(g + 2 ) = c~;(g) for all i. Thus, with symmetric interactions this system will in the deterministic limit always end up in a limit cycle with period ~<2. A similar result is found for networks with binary neurons and sequential dynamics. In the limit T ~ 0 the rules (2) reduce to the map
cri(g + 1) = ~i,iesgn[hi(g(g))] + [1 - ~i,ie](yi(g)
(16)
(in which we still have randomness in the choice of site to be updated). For systems with symmetric interactions and without self-interactions, i.e. Jii = 0 for all i, we again find a Lyapunov function:
Binary & Sequential: 1
L[~] -- - ~ ~ . . ~iJijcYJ -- Z U
(17)
cYiOi" i
This quantity is bounded from below: L>~ - 1 ~-~ij [Jij[- ~ i [0i[. Upon calling the site ie selected for update at step g simply i, the change in L during iteration of (16) can be written as:
L[6(g + 1)] - L[6(g)] - -0i[(Yi(~ Av 1) - (3i(g)] 1
2 ~ Jik[cYi(g + 1)r
+ 1) - cYi(g)cYk(g)]
k
1
2 Z Jji[~Yj(g + l)o'i(g -+- 1) - ~ j ( g ) ~ i ( g ) ] J
-
-[hi(~(e))l[!
-
+
Here we used (16), Jij = Jji, and absence of self-interactions. Thus L decreases monotonically until ~i(t + 1) = ~i(t) for all i. With symmetric synapses, but without diagonal terms, the sequentially evolving binary neurons system will in the deterministic limit always end up in a stationary state.
A. C. C. Coolen
564
2.2.2. Noise-free symmetric networks of continuous neurons One can derive similar results for models with continuous variables. Firstly, in the deterministic limit the graded response equations (12) simplify to
d dt ui(t) -- Z Jij tanh[Tuj(t)] - ui(t) + Oi. J
(18)
Symmetric networks again admit a Lyapunov function (there is no need to eliminate self- in teracti ons):
Graded response: 1
L[u] - - ~ Z Jij tanh[yui] tanh[yuj] ij
+ ~ . I 7f
Igidvv[1 - tanh2[Tv]] - Oitanh[Tu,]
Clearly L >~ - 89y~'~;jI J o ] - ~ ; 10i[ (the term in L[u] with the integral is nonnegative). During the noise-free dynamics (18) one can use the identity ~L/~ui-711 - tanh2[Tui]](du~/dt), valid only when Jij - J j ; , to derive
d OLdui = _ T Z [ 1 _ tanh2[Tu/] ] [ d~ 1ui2 ~<0. _dtL-Z~uidt__~ i
i
Again L is found to decrease monotonically, until d u i / d t - 0 for all i, i.e. until we are at a fixed-point. Finally, the coupled oscillator equations (13) reduce in the noise-free limit to
d dt ~pi(t) - oi + Z Jii sin[~j(t) - ~/(t)].
(19)
J Note that self-interactions Jii always drop out automatically. For symmetric oscillator networks, a construction of the type followed for the graded response equations would lead us to propose
Coupled oscillators: 1
Z[~]-
(20)
- ~ E Jij cos[(~i - ~)j] - Z o)i~)i. u i
This function indeed decreases monotonically, due ~ L - Z . ~)i dt 1
to ~L/~dpi-
-ddpi/dt:
1
. d--t~)i 40.
In fact (19) describes gradient descent on the surface L[~]. However, due to the term with the natural frequencies oi the function L[r is not bounded, so it cannot be a Lyapunov function. This could have been expected; when Jiy--0 for all (i,j), for instance, one finds continually increasing phases #pi(t) = #pg(O)+ oit. Removing the
565
Statistical mechanics of recurrent neural networks I - statics
COi, in contrast, gives the bound L >>, - ~-~j [Jij[. Now the system must go to a fixedpoint. In the special case coi = co (N identical natural frequencies) we can transform away the coi by putting ~(t) = ~i(t) + cot, and find the relative phases ~i to go to a fixed-point.
2.3. Detailed balance and equilibrium statistical mechanics 2.3.1. Detailed balance for binary networks The results obtained above indicate that networks with symmetric synapses are a special class. We now show how synaptic symmetry is closely related to the detailed balance property, and derive a number of consequences. An ergodic Markov chain of the form (4) and (5), i.e. Ps
(0") -- ~
W[~; ~']ps
(21)
~,t
is said to obey detailed balance if its (unique) stationary solution pc~(a) has the property (22)
W[~; cr']p~ (a') = Wilt'; a]p~ (~r) for all ~r, a'.
All pc~ (~) which satisfy (22) are stationary solutions of (21), this is easily verified by substitution. The converse is not true. Detailed balance states that, in addition to pc~(a) being stationary, one has equilibrium: there is no net probability current between any two microscopic system states. It is not a trivial matter to investigate systematically for which choices of the threshold noise distribution w(rl) and the synaptic matrix {J;y} detailed balance holds. It can be shown that, apart from trivial cases (e.g. systems with self-interactions only) a Gaussian distribution w(q) will not support detailed balance. Here we will work out details only for the choice w(rl) - 89 [1 - tanhZ(rl)], and for T > 0 (where both discrete systems are ergodic). For parallel dynamics the transition matrix is given in (4), now with g[z] = tanh[z], and the detailed balance condition (22) becomes
Hi cosh[ hi( r')] All
p~(~)
are
=
Hi cosh[ hi( r)]
nonzero
(ergodicity),
for all
so
we
(23)
may
safely
put
p~(~) =
e~[~/~ l--[icosh[f~hi(~)], which, in combination with definition (1) simplifies the detailed balance condition to: cri[J~j - Jig]Cry. for all or, ~'.
K(cr) - K(~r') --- ~
(24)
ij Averaging (24) over all possible ~' gives K(~) -= (K(~'))~, for all ~, i.e. K is a constant, whose value follows from normalizing p ~ (~). So, if detailed balance holds the equilibrium distribution must be:
A. C. C. Coolen
566
P e q ( g ) ~"
e~iOicri H c o s h [ [ 3 h i ( g ) ] .
(25)
i
For symmetric systems detailed balance indeed holds: (25) solves (23), since K(g) = K solves the reduced problem (24). For nonsymmetric systems, however, there can be no equilibrium. For K(g) = K condition (24) becomes ~ij CYi [Jij - Jji] cy~ -- 0 for all g, g' E {-1, 1 }U. For N >I 2 the vector pairs (g, g') span the space of all N • N matrices, so J i j - Jji must be zero. For N = 1 there simply exists no non-symmetric synaptic matrix. In conclusion: for binary networks with parallel dynamics, interaction symmetry implies detailed balance, and vice versa. For sequential dynamics, with w ( r l ) = 89 tanhZ(q)], the transition matrix is given by (5) and the detailed balance condition (22) simplifies to
e-~'~ih'(")p~ (g)
ef~ihi(F'a)pcc (Fig) .
.
.
_
_
cosh[[3h/(Fig)]
cosh[[3hi(g)]
for all g and all i.
Self-interactions Jii, inducing h;(F/g) :/: hi(a), complicate matters. Therefore we first consider systems where all Jii - O. All stationary probabilities p~ (g) being nonzero (ergodicity), we may write:
([;
p~ (g) - exp 13
1
Oil~Yi -Jr--2 Z (YiJij(~YJ -'~ K(g) ir
])
.
(26)
Using relations like ~-~'~k#ZJklF~-(cykcrt)-- ~~kr -- 2cYi ~kr -Jr-Jki]CYk we can simplify the detailed balance condition to K ( F i g ) - K(g) = (Yi ~ k r Jki](Yk for all g and all i. If to this expression we apply the general identity [1 - F i ] f ( g ) = 2~i(crif(g)),,, we find for i-r j: [ F j - 1][F~- 1 ] K ( g ) - - 2 ~icYj[Jiy- Jji]
for all g and all i-r j.
The left-hand side is symmetric under permutation of the pair (i,j), which implies that the interaction matrix must also be symmetric: Jij--Jig for all (i,j). We now find the trivial solution K(g) = K (constant), detailed balance holds and the corresponding equilibrium distribution is Peq(g) ~
e -~"("/,
1
H(g) - - ~ - ' ~ cYiJij~j - ~ OicYi 9 ir 9 i
(27)
In conclusion: for binary networks with sequential dynamics, but without self-interactions, interaction symmetry implies detailed balance, and vice versa. In the case of self-interactions the situation is more complicated. However, here one can still show that nonsymmetric models with detailed balance must be pathological,since the requirements can be met only for very specific choices for the {J j}. 2.3.2. Detailed balance for networks with continuous neurons Let us finally turn to the question of when we find microscopic equilibrium (stationarity without probability currents) in continuous models described by a Fokker-
567
Statistical mechanics of recurrent neural networks I - statics
Planck equation (11). Note that (11) can be seen as a continuity equation for the density of a conserved quantity: dpt(6) + ~ . Ji(~, t) = 0. The components J/(6, t) of the current density are given by Ji(g, t) =
(6) - T
Pt(~)
Stationary distributions p~(6) are those which give ~ i ~J~-(e, e,~, , ~ ) = 0 (divergencefree currents). Detailed balance implies the stronger statement J~(6, ~ ) = 0 for all i (zero currents), so j~(6) = T~ log p ~ ( 6 ) / ~ i , or
f,.(6) = -OH(6)/O~,
poo(e) ~ e -~H(')
(28)
for some H(6), i.e. the forces J}(6) must be conservative. However, one can have conservative forces without a normalizable equilibrium distribution. Just take H(6) = 0, i.e. J](6, t ) = 0: here we have Peq(6)= C, which is not normalizable for ~ c ~ n. For this P2articular case Eq. (11) is solved easily: pt(6)-[4~Tt]-N/2 f de,p0(6,)e-[~-,,'] /4Tt, SO the limit l i m t ~ pt(g) indeed does not exist. One can prove the following (see e.g. [4]). If the forces are conservative and if p~(6) ~ e -~H(~) is normalizable, then it is the unique stationary solution of the Fokker-Planck equation, to which the system converges for all initial distributions p0 C L I [ ~ N] which obey f~u d6 e ~/4(~)p02(6)< ~ . Assessing when our two particular model examples of graded response neurons or coupled oscillators obey detailed balance has thus been reduced mainly to checking whether the associated deterministic forces f-(a) are conservative. Note that conservative forces must obey for all 6, for all i r j : 8j~(6)/~(yj -- ~ f j ( a ) / ~ ( Y i - - -
O.
(29)
In the graded response equations (18) the deterministic forces are j~(u)= ~ j J i j tanh[yuj] - ui + Oi. Here 8f.(u)/~uj - 8fj(u)/~ui = y{Jij[1- tanhZ[yuj] Jji[1 - tanhZ[yu;]}. At u = 0 this reduces to Jij - Jy; = 0, i.e. the interaction matrix must be symmetric. For symmetric matrices we find away from u = 0 : 8 N ( u ) / ~ u j - ~ f j ( u ) / 8 u i - yJij{tanhZ[yui]- tanhZ[yuj.]}. The only way for this to be zero for any u is by having J~j = 0 for all i ~: j, i.e. all neurons are disconnected (in this trivial case the system (18) does indeed obey detailed balance). Network models of interacting graded-response neurons of the type (18) apparently never reach equilibrium, they will always violate detailed balance and exhibit microscopic probability currents. In the case of coupled oscillators (13), where the deterministic forces are fi(#) - ~ j J i j sin[~j - qbi] + c0i one finds the left-hand side of condition (29) to give 8 f ( ~ ) / ~ j - ~ f J ( ~ ) / 8 ~ i = [Jij-Jji]cos[~j-~i]. Requiring this to be zero for any # gives the condition Jgj. = Jji for any i r j. We have already seen that symmetric oscillator networks indeed have conservative forces: f . ( ~ ) - -8H(dp)/8~i, with H ( ~ ) = - 8 9 If in addition we choose all coi = 0 the function H(a) will also be bounded from below, and, although p ~ ( # ) ~ e -l~(*) is still not normalizable on ~ ~ NN, the full 2~-periodicity of the function H(e) now allows us to identify qbi + 2n = qb~ for all i, so that now
A. C.C. Coolen
568
I~1 E [--7I, K]N and f dff e -1~/(*) does exist. Thus symmetric coupled oscillator networks with zero natural frequencies obey detailed balance. In the case of nonzero natural frequencies, in contrast, detailed balance does not hold.
2.3.3. Equilibrium statistical mechanics The above results establish the link with equilibrium statistical mechanics (see e.g. [5,6]). For binary systems with symmetric synapses (in the sequential case: without self-interactions) and with threshold noise distributions of the form w ( q ) - i l l - tanh2(rl)], detailed balance holds and we know the equilibrium distributions. For sequential dynamics it has the Boltzmann form (27) and we can apply standard equilibrium statistical mechanics. The parameter 13 can formally be identified with the inverse 'temperature' in equilibrium, 13- T -1, and the function H(a) is the usual Ising spin Hamiltonian. In particular we can define the partition function Z and the free energy F: 1 e_~.(~ )
Peq(l~) -- Z
Z-
'
Z e -~"(~
H(~) =
1
-2Z
i#j
criJij(~J -- Z
i
Oiui,
F = _[~-1 logZ.
(30) (31)
The free energy can be used as the generating function for equilibrium averages. Taking derivatives with respect to external fields 0i and interactions Jij, for instance, produces (oi) - -OF/OOi and (oioj) = -OF/OJij, whereas equilibrium averages of arbitrary state variables f(tr) can be obtained by adding suitable generating terms to the Hamiltonian: H(~) ~ H(~) + 2f(~), 0C) - lim;~0 i~F/52. In the parallel case (25) we can again formally write the equilibrium probability distribution in the Boltzmann form [7] and define a corresponding partition function 2 and a free energy F: 1
Peq(~) -- ~ e -~('~) ' 2-
Z
e-~/Q(a)'
/-)(~) ---- -- Z i Oil~i -- ~| ~ i. log2cosh[13hi(a)], p -- _~-1
logZ,
(32) (33)
which again serve to generate averages: H(a) ~ / 4 ( a ) + )~f(a), ( f ) - lim;.~0 OP/OX. However, standard thermodynamic relations involving derivation with respect to 13 need no longer be valid, and derivation with respect to fields or interactions generates different types of averages, such as
--~Pl~Oi- ((Yi).qt_ (tanh[13hi(a)]),
-aP/~J,-, = (cy,tanh[13hi(o)]),
i # j ' - ~ f ' / ~ J i j = (cri tanh[13hj(a)] ) + (crj tanh[13hi(a)]). One can use (cyi) = (tanh[13hi(a)]), which can be derived directly from the equilibrium equation poq(a) - y~'~, W[~; a']peq(a'), to simplify the first of these identities.
569
Statistical mechanics of recurrent neural networks I - statics
A connected network of graded-response neurons can never be in an equilibrium state, so our only model example with continuous neuronal variables for which we can set up the equilibrium statistical mechanics formalism is the system of coupled oscillators (13) with symmetric synapses and absent (or uniform) natural frequencies COi. If we define the phases as l~i c [ - g , g] we have again an equilibrium distribution of the Boltzmann form, and we can define the standard thermodynamic quantities: Peq((~) - - ~ 1 e_~H(O ) ,
H(t~) -- - - ~1Z J i j c o s [ ~ i -
~j],
(34)
ij Z . . . .
dO e -~/4(~),
F - - ~ - 1 logZ.
(35)
These generate equilibrium averages in the usual manner. For instance {cos[qb/- qbj]) = - g F / g J i j , whereas averages of arbitrary state variables f ( ~ ) follow, as before, upon introducing suitable generating terms: H(~)---, H ( ~ ) + 2f(~), ( f ) = l i m ~ 0 5F/8~. In this chapter we restrict ourselves to symmetric networks which obey detailed balance, so that we know the equilibrium probability distribution and equilibrium statistical mechanics applies. In the case of sequential dynamics we will accordingly not allow for the presence of self-interactions. 3. Simple recurrent networks with binary neurons
3.1. Networks with uniform synapses We now turn to a simple toy model to show how equilibrium statistical mechanics is used for solving neural network models, and to illustrate similarities and differences between the different dynamics types. We choose uniform infinite-range synapses and zero external fields, and calculate the free energy for the binary systems (1) and (2), parallel and sequential, and with threshold noise distribution w ( r l ) = 12 [1 - tanhZ(q)] 9
Jij - Jji - J / N (i :/= j),
Jii - 0i = 0
forall i.
The free energy is an extensive object, l i m N ~ F I N is finite. For the models (1) and (2) we now obtain: Binary and sequential: lim F / N - - lim (13N)-1 log E
N---+oo
N---<xD
ef3N[1jm2(")]
Binary and parallel: lim F / N - - lim ([3N)-1 log Z
N--+oo
N--+oc
eN[l~176
with the average activity m(~) - ~ ~ cyk. We have to count the number of states with a prescribed average activity m = 2 n / N - 1 (n is the number of neurons i with cyi- 1), in expressions of the form
A.C.C. Coolen
570
.
o
.
.
1
-1
i
f/T
.
i
f/T
m
-1
-2
-2 -1.0
-0.5
010 m
0.5
-3 -1.0
1.0
-0.5
010 m
015
1.0
0 0.0
. . . . . 0.5
1.0
1.5
T/J
Fig. 2. The functions fseq(m)/T (left) and fpar(m)/T (middle) for networks of binary neurons and uniform synapses, and for different choices of the re-scaled interaction strength J / T 5 2, 3 89 1 3 ,_~ (from top to bot(T = 13-1). Left picture (sequential dynamics): J / T = 2, ,_, tom). Middle picture (parallel dynamics): J / T = + I, + 1,-t-1 (from top to bottom, here the free energy is independent of the sign of J). The right picture gives, for J > 0, the location of the nonnegative minimum Offseq(m) and fpar(m) (which is identical to the average activity in thermal equilibrium) as a function of T/J. A phase transition to states with nonzero average activity occurs at T/J = 1.
l l o g Z e N U [ m ( * ) ] - - I 1 o g ~ (N)emU[2n/m-l] . N .=o
N
_- _1 log N
fl
= log 2 +
1
Z
dm e N[l~ 2-c* (m)+U[m]] lim _1 log e NU[m({t)] N~N {t
max { U[m] - c* (m)} mE[-1.1]
with the entropic function c*(m) - 1 (1 + m) log(1 + m) + 1 (1 - m) log(1 - m). In order to get there we used Stirling's formula to obtain the leading term of the factorials (only terms which are exponential in N survive the limit N ~ ~ ) , we converted (for N - - , oo) the s u m m a t i o n over n into an integration over m = 2 n / N - 1 C [-1, 1], and we carried out the integral over m via saddle-point integration (see e.g. [8]). This leads to a saddle-point problem whose solution gives the free energies: lim F I N -
N---~c
lim F I N -
N---~cx~
min fseq(m)
mC[-1,1]
~
min fpar(m),
mE[-1.1]
13fseq(m)- c*(m) - l o g 2
1
-~f3Jm 2
(36)
~fpar(m) - c* (m) - 2 log 2 - log cosh[13Jm]. (37)
The functions to be minimized are shown in Fig. 2. The equations from which to solve the minima are easily obtained by differentiation, using ~d C* (m) - t a n h - 1 (m). F o r sequential dynamics we find
Statistical mechanics of recurrent neural networks I - statics
571
Binary and sequential: m = tanh[13 Jm]
(38)
(the so-called Curie-Weiss law). For parallel dynamics we find m = tanh[[3J tanh[[~Jm]]. One finds that the solutions of the latter equation again obey a Curie-Weiss law. The definition rh = tanh[131JIm ] transforms it into the coupled equations m=tanh[[~[Jlrh ] and rh=tanh[13]JIm], from which we derive 0 ~<[m - rh] 2 - [m - rh] [tanh[131JIrh] - tanh[IglJIm]] ~<0. Since tanh[131JIm ] is a monotonically increasing function of m, this implies rh = m, so
Binary & Parallel: m = tanh[[3[JIm ].
(39)
Our study of the toy models has thus been reduced to analyzing the nonlinear equations (38) and (39). If J / > 0 (excitation) the two types of dynamics lead to the same behavior. At high noise levels, T > J, both minimization problems are solved by m = 0 (see Fig. 2), describing a disorganized (paramagnetic) state. This can be seen upon writing the right-hand side of (38) in integral form: m 2 - m tanh[13Jm] - ~Jm 2
/0'
dz[1 - tanhZ[~Jmz]] <. ~Jm 2.
So m2[1 - ~J] ~<0, which gives m = 0 as soon as 13J < 1. A phase transition occurs at 7' = J (a bifurcation of nontrivial solutions of (38)), and for 7' < J the equations for m are solved by the two nonzero solutions of (38), describing a state where either all neurons tend to be firing (m > 0) or where they tend to be quiet (m < 0). This becomes clear when we expand (38) for small m: m = [3din + (9(m3), so precisely at [3,/- 1 one finds a de-stabilization of the trivial solution m = 0, together with the creation of (two) stable nontrivial ones (see also Fig. 2). Furthermore, using the identity c*(tanhx) = x t a n h x - logcoshx, we obtain from (36) and (37) the relation l i m N ~ F / N = 2 l i m x ~ F / N . For J < 0 (inhibition), however, the two types of dynamics give quite different results. For sequential dynamics the relevant minimum is located at m = 0 (the paramagnetic state). For parallel dynamics, the minimization problem is invariant under J --+ - J , so the behavior is again of the Curie-Weiss type (see Fig. 2 and Eq. (39)), with a paramagnetic state for T > IJI, a phase transition at T = ]JI, and order for T < [JI. This difference between the two types of dynamics for J < 0 is explained by studying dynamics. As we will see in a subsequent chapter, for the present (toy) model in the limit N ~ oc the average activity evolves in time according to the deterministic laws d dt m - tanh[13Jm] - m,
m(t + 1) = tanh[~Jm(t)]
for sequential and parallel dynamics, respectively. For J < 0 the sequential system always decays towards the trivial state m - 0, whereas for sufficiently large [3 the
572
A.C.C. Coolen
parallel system enters the stable limit-cycle re(t) -- MI~( - 1)t (where MI3 is the nonzero solution of (39)). The concepts of 'distance' and 'local minima' are quite different for the two dynamics types; in contrast to the sequential case, parallel dynamics allows the system to make the transition m ~ - m in equilibrium.
3.2. Phenomenology of Hopfield models 3.2.1. The ideas behind the Hopfield model The Hopfield model [9] is a network of binary neurons of the type (1) and (2), with threshold noise w(q) - 89 [1 - tanhz(rl)], and with a specific recipe for the synapses Jij aimed at storing patterns, motivated by suggestions made in the late 1940s [10]. The original model was in fact defined more narrowly, as the zero noise limit of the system (2), but the term has since then been accepted to cover a larger network class. Let us first consider the simplest case and try to store a single pattern ~ c { - 1 , 1 }u in noise-less infinite-range binary networks. Appealing candidates for interactions and thresholds would be Jij -- ~i~j and 0i - 0 (for sequential dynamics we put Jii -- 0 for all i). With this choice the Lyapunov function (17) becomes:
Zseq [~] -- -~ N - -~
~i(Yi
.
It will have to decrease monotonically during the dynamics, from which we immediately deduce Z
~i(Yi(O) > O" ~((X)) -- ~, i
Z
~i(Yi(O) < O" ~((2~3) -- --~. i
This system indeed reconstructs dynamically the original pattern ~ from an input vector a(0), at least for sequential dynamics. However, en passant we have created an additional attractor: the state - ~ . This property is shared by all binary models in which the external fields are zero, where the Hamiltonians H(~) (30) and H(a) (32) are invariant under an overall sign change a ~ - a . A second feature common to several (but not all) attractor neural networks is that each initial state will lead to pattern reconstruction, even nonsensical (random) ones. The Hopfield model is obtained by generalizing the previous simple one-pattern recipe to the case of an arbitrary number p of binary patterns ~ = ( ~ t , . . . , ~ N ) C {--1, 1}U:
1 P Jij -- ~ Z ~'~'~' la=l
0; = 0 for all i (sequential dynamics: J;/---, 0 for all i).
(40)
The prefactor N -l has been inserted to ensure that the limit N ~ ~ will exist in future expressions. The process of interest is that where, triggered by correlation between the initial state and a stored pattern ~z, the state vector ~ evolves towards ~ . If this happens, pattern ~ is said to be recalled. The similarity between a state vector and the stored patterns is measured by so-called overlaps
Stathstical mechanics of recurrent neural networks I - statics
573
Fig. 3. Information represented as specific microscopic neuronal firing patterns ~ in an N = 841 Hopfield network and drawn as images in the plane (black pixels: ~; = 1, white pixels: ~i = - 1). 1 m.(~) = ~ Z
~"o,.
(41)
i
Numerical simulations illustrate the functioning of the Hopfield model as an associative memory, and the description of the recall process in terms of overlaps. Our simulated system is an N = 841 Hopfield model, in which p = 10 patterns have been stored (see Fig. 3) according to prescription (40). The two-dimensional arrangement of the neurons in this example is just a guide to the eye; since the network is fully connected the physical location of the neurons is irrelevant. The dynamics is as given by (2), with T - 0.1. In Fig. 4 we first show (left column) the result of letting the system evolve in time from an initial state, which is a noisy version of one of the stored patterns (here 40% of the neuronal states cy; where corrupted, according to cy; ~ -r The top left row of graphs shows snapshots of the microscopic state as the system evolves in time. The bottom left row shows the values of the p = 10 overlaps m,, as defined in (41), as functions of time; the one which evolves towards the value 1 corresponds to the pattern being reconstructed. The right column of Fig. 4 shows a similar experiment, but here the initial state is drawn at random. The system subsequently evolves towards a mixture of the stored patterns, which is found to be very stable, due to the fact that the patterns involved (see Fig. 3) are significantly correlated. It will be clear that, although the idea of information storage via the creation of attractors does work, the choice (40) for the synapses is still too simple to be optimal; in addition to the desired states ~" and their mirror images -~", even more unwanted spurious attractors are created. Yet this model will already push the analysis to the limits, as soon as we allow for the storage of an extensive number of patterns ~".
3.2.2. Issues related to saturation: storage capacity and non-trivial dynamics In our previous simulation example the loading of the network was modest; a total of 89 1) - 353,220 synapses were used to store just p N - 8410 bits of information. Let us now investigate the behavior of the network when the number of patterns scales with the system size as p = a N (0~ > 0); now for large N the number 1) ~ 20~. This is called the saturation of bits stored per synapse will be p N / 8 9 regime. Again numerical simulations, but now with finite 0~, illustrate the main features and complications of recall dynamics in the saturation regime. In our ex-
Fig. 4. Information processing in a sequential dynamics Hopfield model with N = 841, p = 10 and T = 0.1, and with the p = 10 stored patterns shown in Fig. 3. Left pictures: dynamic reconstruction of a stored pattern from an initial state which is a corrupted version thereof. Top left: snapshots of the system state at times t = 0 , 1 , 2 , 3 , 4 iterations/neuron. Bottom left: values of the overlap order parameters as functions of time. Right pictures: evolution towards a spurious state from a randomly drawn initial state. Top right: snapshots of the microscopic system state at times t = 0 , 1 , 2 , 3 , 4 iterations/neuron. Bottom right: values of the overlap order parameters as functions of time.
Statistical mechanics of recurrent neural networks I - statics
575
10 0.8
0.6
m 5
0.4
0.2
0.0
0
5
10
15
20
25
0 0.0
0.2
t
0.4
0.6
0.8
L.0
m
Fig. 5. Simulations of a parallel dynamics Hopfield model with N - 3 0 , 0 0 0 and = T = 0.1, and with random patterns. Left: overlaps m = ml(~r) with pattern one as functions of time, following initial states correlated with pattern one only, with m1(~(0))c{0.1,...,0.9}. Right: corresponding flow in the (m,r) plane, with r = ~--1 ~-~g>l mp 2 (~) measuring the overlaps with nonnominated patterns. ample the dynamics is given by (1) (parallel updates), with T - 0.1 and threshold noise distribution w ( q ) - i l l - t a n h Z ( r l ) ] ; the patterns are chosen randomly. Figure 5 shows the result of measuring in such simulations the two quantities m -- ml(t$),
r--
~ - l~mg
2(~)
(42)
g>l
following initial states which are correlated with pattern ~1 only. For large N we can distinguish structural overlaps, where m g ( ~ ) = (9(1), from accidental ones, where m~(n) - (9(N- 89 (as for a randomly drawn ~). Overlaps with nonnominated patterns are seen to remain (9(N- 89 i.e. r(t) - (9(1). We observe competition between pattern recall (m ~ 1) and interference of nonnominated patterns (m ~ 0, with r increasing), and a profound slowing down of the process for nonrecall trajectories. The initial overlap (the 'cue') needed to trigger recall is found to increase with increasing (the loading) and increasing T (the noise). Further numerical experimentation, with random patterns, reveals that at any noise level T there is a critical storage level ac(T) above which recall is impossible, with an absolute upper limit of ~c = maxr ~c(T) = ~c(0) ~ 0.139. The competing forces at work are easily recognized when working out the local fields (1), using (40): l
hi(~) -- ~] ml (($) -J- ~ ~ g>l
~gi Z jr
p
C(N-1
).
(43)
The first term in (43) drives ~ towards pattern {1 as soon as ml (~) > 0. The second terms represent interference, caused by correlations between r and nonnominated patterns. One easily shows (to be demonstrated later) that for N ~ ec the fluctua-
A.C.C. Coolen
576
tions in the values of the recall overlap m will vanish, and that for the present types of initial states and threshold noise the overlap m will obey
m(t + 1) -
/ dzPt(z)tanh[13(m(t) + z)],
I
'im' /I -
9
-
--
--
B>I
j#i
j
j(t)
1)
,44,
If all 6/-(0) are drawn independently, Prob[cyi(0) - +~]] - 1 [1 4-m(0)], the central limit theorem states that Po(z) is Gaussian. One easily derives (z)0 - 0 and (z2)0 - 0t, so at t - 0 Eq. (44) gives re(l) =
~
dz
__k2 7t e - tanh[13(m(0) + zx/~)].
(45)
The above ideas, and Eq. (45) in particular, go back to [11]. For times t > 0, however, the independence of the states (3"i need no longer hold. As a simple approximation one could just assume that the O" i remain uncorrelated at all times, i.e. Prob[cy/(t) - -t-~]]- 89 + m(t)] for all t>~0, such that the argument given for t - 0 would hold generally, and where (for randomly drawn patterns) the mapping (45) would describe the overlap evolution at all times: m(t+l)-
~ dz e - - - ~ 2 tanh[13(m(t) + zv/~)]
(46)
This equation, however, must be generally incorrect. Firstly, Fig. 5 already shows that knowledge of m(t) only does not yet permit prediction of m(t + 1). Secondly, upon working out its bifurcation properties one finds that Eq. (46) predicts a storage capacity of arc - 2/rt ~ 0.637, which is no way near to what is actually being observed. We will see in the paper on dynamics that only for certain types of extremely diluted networks (where most of the synapses are cut) Eq. (46) is indeed correct on finite times; in these networks the time it takes for correlations between neuron states to build up diverges with N, so that correlations are simply not yet noticeable on finite times. For fully connected Hopfield networks storing random patterns near saturation, i.e. with 0t > 0, the complicated correlations building up between the microscopic variables in the course of the dynamics generate an interference noise distribution which is intrinsically non-Gaussian, see e.g. Fig. 6. This leads to a highly nontrivial dynamics which is fundamentally different from that in the l i m u ~ PIN = 0 regime. Solving models of recurrent neural networks in the saturation regime boils down to calculating this non-Gaussian noise distribution, which requires advanced mathematical techniques (in statics and dynamics), and constitutes the main challenge to the theorist. The simplest way to evade this challenge is to study situations where the interference noise is either trivial (as with asymmetric extremely diluted models) or where it vanishes, which happens in fully connected networks when 0 t - limu+~ p/ N = 0 (as with finite p). The latter 0 t - 0 regime is the one we will explore first.
Statistical mechanics of recurrent neural networks I - statics
577
1.5
m(O)=0.9
1.0
P(z) 0.5
m(O)=0.1
0.0
-3
-1
1
3
; ~,~cyj, as meaFig. 6. Distributions of interference noise variables Z i T- ~~}-~" g>l ~i ~ .JT: . sured in the simulations of Fig. 5, at t = 10. Uni-modal histogram: noise distribution following m(0)= 0.9 (leading to recall). Bi-model histogram: noise distribution following m(0) = 0.1 (not leading to recall).
3.3. Analysis of Hopfield models away from saturation 3.3.1. Equilibrium order parameter equations A binary Hopfield network with parameters given by (40) obeys detailed balance, and the Hamiltonian H(~) (30) (corresponding to sequential dynamics) and the pseudo-Hamiltonian H(~) (32) (corresponding to parallel dynamics) become
.(o)-
2(~) + ~p,
-
H(~) = -
g=l
~
E
log 2 cosh ]3Z ~m~(~) "
la=l
]
(47)
with the overlaps (41). Solving the statics implies calculating the free energies F and
P:
1 F = - ~ log ~
e-
13B(.)
,
/~
- -
1 -~
log Z e-~q(")"
Upon introducing the shorthand notation m - (ml,... ,mp) and { i - (~],.-., ~/P), both free energies can be expressed in terms of the density of states ~ ( m ) - 2 -N ~ , , 8Ira - m(t~)]"
1 log 2 - ~ 1 log f dm ~(m)e -89 F / N -- - -~ F / N = - ~ l o g 2 - ~ i flog
p -~ 2-N
dm ~(m)e~,=~ Nl o g 2 c o s h [ ~ { i . m ]
(48)
(49)
(note: f d m S [ m - r e ( a ) ] - 1). In order to proceed we need to specify how the number of patterns p scales with the system size N. In this section we will follow [12]
A.C.C. Coolen
578
(equilibrium analysis following sequential dynamics) and [13] (equilibrium analysis following parallel dynamics), a n d assume p to be finite. O n e can n o w easily calculate the leading c o n t r i b u t i o n to the density of states, using the integral r e p r e s e n t a t i o n o f the 8-function a n d keeping in m i n d that according to (48) a n d (49) only terms exponential in N will retain statistical relevance for N ~ c~: lim --l log ~ ( m ) -- lim --'logfdxeiNx'm<e--i~-~iNl'?'i'x> N N---~ N
N-+~
= lim --1 log f
N-~ N
d x e N[ixm+(l~176
J
with the a b b r e v i a t i o n (r -limu~ ~ ~-~U_, r The leading c o n t r i b u t i o n to b o t h free energies can be expressed as a finite-dimensional integral, for large N d o m i n a t e d by that saddle-point (extremum) for which the extensive e x p o n e n t is real and maximal" lim
F/N-
-~
FIN-
- ~ log f [3N
N---,~c
lim i~,- ~
[~N
1
log f dm dx dm dx
e -NI3f(m'x) - -
extrx,m f ( m , x)
e -Nf~f(m'x) --
extrx,m f ( m , x)
with f ( m , x) - - ~1 !112 - i x . m - 13- 1 (log 2 cos[13~ 9x])g f ( m , x) -
_1~-1
(log 2 cosh[13~ 9m])~, - ix- m - ~ - 1 (log 2 cos[13~ 9x])~.
The saddle-point e q u a t i o n s for f a n d f are given by:
f"
x-im,
im - (~ tan[[3~ 9x])g,
f"
x = i(~tanh[13~, m])~,
im-
(~ tan[13~ 9x])~.
In saddle-points x turns out to be purely imaginary. H o w e v e r , after a shift of the integration contours, putting x = ix* (m) + y (where ix* (m) is the i m a g i n a r y saddlepoint, a n d where y c ~P) we can eliminate x in favor o f y c ~P which does have a real saddle-point, by construction. ~ W e then obtain 2
Our functions to be integrated have no poles, but strictly speaking we still have to verify that the integration segments linking the original integration regime to the shifted one will not contribute to the integrals. This is generally a tedious and distracting task, which is often skipped. For simple models, however (e.g. networks with uniform synapses), the verification can be carried out properly, and all is found to be safe. Here we used the equation 8f(m, x)/gm = 0 to express x in terms of m, because this is simpler. Strictly speaking we should have used 8f(m,x)/gx = 0 for this purpose; our short-cut could in principle generate additional solutions. In the present model, however, we can check explicitly that this is not the case. Also, in view of the imaginary saddle-point x, we cannot be certain that, upon elimination of x, the relevant saddle-point of the remaining function f(m) must be a minimum. This will have to be checked, for instance by inspection of the T --+ c~ limit.
579
Stat&tical mechanics of recurrent neural networks I - statics
Sequential dynamics: m
(~ tanh[]3~ 9m])~,
-
Parallel Dynamics: m
--
(~ tanh[[3~- [(~' tanh[[3~' 9m])~,]])~
(compare to e.g. (38) and (39)). The solutions of the above two equations will in general be identical. To see this, let us denote nil = (~, tanh[J3~ 9m])~, with which the saddle point equation for f decouples into" m
(~tanh[13~. th])~,
-
th - (~tanh[J3~.m])~
so [m
-
fia]2 - ([(~. m) - (~. rh)][tanh(13~, tit) - tanh(fl~ 9m)])~.
Since tanh is a monotonicaly increasing function, we must have [ m - r h ] - ~ - 0 for each ~ that contributes to the averages (...)~. For all choices of patterns, where the covariance matrix C,~ - (~,,~,~)~ is positive definite, we thus obtain m - ill. The final result is" for both types of dynamics (sequential and parallel) the overlap order parameters in equilibrium are given by the solution m* of m
-
(~ tanh[[3~ 9m])~,
(50)
which minimizes 3 f ( m ) - ~ lm2 - ~ 1 (log2cosh[13~.m]) ~,.
(51)
The free energies of the ergodic components are l i m N _ ~ F / N - - f ( m * ) and limN_~ F / N -- 2f(m*). Adding generating terms of the form H ~ H + )~g[m(t~)] to the Hamiltonians allows us to identify (g[m(t~)])eq - lirnz_~0 ~F/ 9 = g[m*]. Thus, in equilibrium the fluctuations in the overlap order parameters m(ty) (41) vanish for N --, oc. Their deterministic values are simply given by m*. Note that in the case of sequential dynamics we could also have used linearization with Gaussian integrals (which we will use for coupled oscillators with uniform synapses) to arrive at this solution, with p auxiliary integrations, but that for parallel dynamics this would not have been possible.
3.3.2. Analysis of order parameter equations." pure states and mixture states We will restrict our further discussion to the case of randomly drawn patterns, so _
2-p
-
0,
-
~z{-1,1)~
We here indeed know the relevant saddle-point to be a minimum: the only solution of the saddle-point equations at high temperatures, m - - O , is seen to minimize f(m), since f ( m ) + ~ - 1 log2 = 89 - ]3) + (9(]33).
A. C. C. Coolen
580
(generalization to correlated patterns is in principle straightforward). We first establish an upper bound for the temperature for where nontrivial solutions m* could exist, by writing (50) in integral form: mr
-
m) fo
dk[1 - tanh2[13)~ 9m]] )
from which we deduce 0=m 2-13((~.m)2f0'dk[1-tanh2[13k~.m]])
~> m 2 - 1 3 ( ( ~ - m ) 2 ) r
= !112(1 - ~). For T > 1 the only solution of (50) is the paramagnetic state m - 0, which gives for the free energy per neuron - T l o g 2 and - 2 T l o g 2 (for sequential and parallel dynamics, respectively). At T - 1 a phase transition occurs, which follows from expanding (50) for small Iml in powers of r - 13- 1" 1
m r - (1 + z)m, - - 3 Z mvmpmx(~r~v~P~x)~ +(-9(m5' zm3) vpX
=m r l+'c-m
2+~m
+ C ( m 5,'cm 3)
The new saddle-point scales as m r - ~ r ~ 1/2 + (9(~3/2), with for each g: r~r - 0 or 0 - 1 - m2 + _}mr .-2 The solutions are of the form r~r E { - ~ , 0, r~}. If we denote with n the number of nonzero components in the vector m, we derive from the above identities: r~r - 0 or r~r - + v / - 3 / v / 3 n - 2. These saddle-points are called mixture states, since they correspond to microscopic configurations correlated equally with a finite number n of the stored patterns (or their negatives). Without loss of generality we can always perform gauge transformations on the set of stored patterns (permutations and reflections), such that the mixture states acquire the form n times
p - n times
m -- m , , (~1 , . . . , ~ 1 , 0 , . . . , 0 ) ,
mn -- [ '3/ ]2 3--n2
( 1 3 - 1 1/2) + . . .
(52)
These states are in fact saddle-points of the surface f(m) (51) for any finite temperature, as can be verified by substituting (52) as an ansatz into (50):
v~n ~>n"
0 - (~g tanh [Igm,~Z ~v] ) v~
581
Stat&tical mechanics of recurrent neural networks I - statics
The second equation is automatically satisfied since the average factorises. The first equation leads to a condition determining the amplitude mn of the mixture states:
(53)
mn - < [!~<~n~~ tanh[~mnv~<~n~V]>~" The corresponding values of f(m), to be denoted by f,, are
fn - nm~ -
log 2 cosh 13m~~
(54)
~v
v~
The relevant question at this stage is whether or not these saddle-points correspond to local minima of the surface f ( m ) (51). The second derivative of f ( m ) is given by
(55)
02f(m) = ~Sov- 13(r162 [1 - tanh2[13~ 9m]] )~
~m,~mv
(a local minimum corresponds to a positive definite second derivative). In the trivial saddle-point m = 0 this gives simply 8or(1 - 13), so at T - 1 this state destabilizes. In a mixture state of the type (52) the second derivative becomes:
tanh [0m ll>
Due to the symmetries in the problem the spectrum of the matrix D (~) can be calculated. One finds the following eigenspaces, with Q = (tanhZ[13m~ ~ 0 ~ ~0])~ and R - (~1 ~2 tanh2[13m~ }-~0~n ~0])~" Eigenvalue Eigenspace I
:x = (0,...,0,Xn+l,...,Xp)
1-1311-Q]
II : x = ( 1 , . . . , 1 , 0 , . . . , 0 ) 1 - 1311 - Q + ( 1 - n)R] III : x = ( X l , . . . ,Xn, 0 , . . . , 0), }--~gxg = 0 1 - 1311 - Q + R ] Eigenspace III and the quantity R only come into play for n > 1. To find the smallest eigenvalue we need to know the sign of R. With the abbreviation M~ -- Y~'~o~n~0 we find:
n(n-
1)R = (M~ tanh2[13mnMJ)~-
n(tanh2[~m~M~])~
= ([M~ - (M~2)~,]tanh2[13m.IM~l])~ _
-- ([M~ -(M~2)~,] {tanh 2 [ 1 3 m ~ ~
- t a n h 2 [13m~v/(M~2)~,J } ) i > 0 .
We may now identify the conditions for an n-mixture state to be a local minimum of f(m). For n = 1 the relevant eigenvalue is I, now the quantity Q simplifies consid-
582
A.C.C. Coolen
1
- 2 l ' ' ' I 'x\i\l''' I ' ' ' I'' 't
.6
-.4 ~
.2
-.6
0
0
.2
.4
T
.6
.8
1
-.7
0
\\\
.2
.4
T
.6
~
.8
i
Fig. 7. Left picture: Amplitudes m,, of the mixture states as functions of temperature. From top to bottom: n = 1,3, 5, 7, 9, 11, 13. Solid: region where they are stable (local minima off). Dashed: region where they are unstable. Right picture: corresponding 'free energies' fn. From bottom to top: n = 1,3, 5, 7, 9, 11, 13. Dashed line: 'free energy' of the paramagnetic state m = 0 (for comparison). erably. For n > 1 the relevant eigenvalue is III, here we can combine Q and R into one single average: n - - 1" 1-1311-tanh2[[3ml]] > 0 , n--2"
1-13>0,
n>~3" 1 - 13[1 - (tanh2 [13mn ~ = 3 ~o] )~ 1 > 0 . The n - 1 states, correlated with one pattern only, are the desired solutions. They are stable for all T < 1, since partial differentiation with respect to 13 of the n - 1 amplitude Eq. (53) gives ml -- tanh[13ml] ~ 1 - 1311 - tanhZ[13m,]]- m,[1 -tanhZ[~ml]](Ornl/~) -1 (clearly sgn[m~] = sgn[i~m,/~13]). The n = 2 mixtures are always unstable. For n >_-3 we have to solve the amplitude Eq. (53) numerically to evaluate their stability. The result is shown in Fig. 7, together with the corresponding 'free energies' fn (54). It turns out that only for odd n will there be a critical temperature below which the nmixture states are local minima of f ( m ) . F r o m Fig. 7 we can also conclude that, in terms of the network functioning as an associative memory, noise is actually beneficial in the sense that it can be used to eliminate the unwanted n > 1 ergodic components (while retaining the relevant ones: the pure n = 1 states). In fact the overlap equations (50) do also allow for stable solutions different from the n-mixture states discussed here. They are in turn found to be continuously bifurcating mixtures of the mixture states. However, for random (or uncorrelated) patterns they come
Stat&ticalmechanicsof recurrentneuralnetworksI- statics
583
into existence only near T = 0 and play a marginal role; phase space is dominated by the odd n-mixture states. We have now solved the model in equilibrium for finite p and N --+ c~. Most of the relevant information on when and to what extent stored random patterns will be recalled is summarized in Fig. 7. For nonrandom patterns one simply has to study the bifurcation properties of Eq. (50) for the new pattern statistics at hand; this is only qualitatively different from the random pattern analysis explained above. The occurrence of multiple saddle-points corresponding to local minima of the free energy signals ergodicity breaking. Although among these only the global minimum will correspond to the thermodynamic equilibrium state, the nonglobal minima correspond to true ergodic components, i.e. on finite time-scales they will be just as relevant as the global minimum. 4. Simple recurrent networks of coupled oscillators
4.1. Coupled oscillators with uniform synapses Models with continuous variables involve integration over states, rather than summation. For a coupled oscillator network (13) with uniform synapses J/j - J / N and zero frequencies o~/= 0 (which is a simple version of the model in [14]) we obtain for the free energy per oscillator: N++ ~-~ log N--~oolimF/N--limlpl,
f:f: .-.
d~
We would now have to 'count' microscopic states with prescribed average cosines and sines. A faster route exploits auxiliary Gaussian integrals, via the identity e~y =
Dze yz
(56)
with the shorthand Dx - (2~) -•2e-~1,.2. ~ ax (this alternative would also have been open to us in the binary case; my aim in this section is to explain both methods): N ~ ~-~-/log N---~oolimF/N--limlvi,
z
f~f~ -"
d~
o, exp
N~ ~
log
N~ ~
log
/
cos,,/, +,
Dx Dy
[/-i
dq qe -89
dqb e c~
s'n'*/']
v/~'(~2+y2)/N
..... 1
dqb e ~ [ J [ q c ~
,
A.C.C. Coolen
584
-1
f/T
-2
0
- 1.0
-0.5
0.0
0.5
1.0
'
0.0
0.5
q
1.0
1.5
T/J
Fig. 8. The function f ( q ) / T (left) for networks of coupled oscillators with uniform synapses Jij = J / N , and for different choices of the re-scaled interaction strength J / T (T = f3-1) 9 5 1' '1_ ~-(from top to bottom). The right picture gives, for J > 0, the location of the J/T= 2' nonnegative minimum off(q) (which measures the overall degree of global synchronisation in thermal equilibrium) as a function of T/J. A transition to a synchronised state occurs at v / J = 89
where we have transformed to polar coordinates, (x, y) = q v/f3lJlN(cos 0, sin 0), and where we have already eliminated (constant) terms which will not survive the limit N ~ c~. Thus, saddle-point integration gives us, quite similar to the previous cases (36) and (37): lim F I N - m i n f ( q ) N--,~ q>~O
J > 0: ~3f(q) = 89
2 - l o g [ 2 rtlo(BlJlq)]
J < O" ~ f (q) = 89
2 - l o g [ 2 rclo(iBlJIq)]
(57)
in which the In(z) are the modified Bessel functions (see e.g. [15]). The function f ( q ) is shown in Fig. 8. The equations from which to solve the minima are obtained by differentiation, using ~Io(z) = I1 (z)" I1 (~[J]q)
J > O. q
J < O" q = i II (i~lJ[q)
lo(f31Jlq)'
Io(if31JIq)
(58)
o
Again, in both cases the problem has been reduced to studying a single nonlinear equation. The physical meaning of the solution follows from the identity - 2 8 F / O J = (N -1 ~-~iTkj COS(~ i -- ~)j))"
COS((~i)
lim N---+oc
.
+ lim N - - ~
sin(~/)
= sgn(J)q 2.
.
F r o m this equation it also follows that q~< 1. Note: since Of(q)/Oq = 0 at the minimum, one only needs to consider the explicit derivative of f ( q ) with respect to J. If the synapses induce antisynchronization, J < 0, the only solution of (58) (and
Statbstical mechanics of recurrent neural networks I - statics
585
the minimum in (57)) is the trivial state q = 0. This also follows immediately from the equation which gave the physical meaning of q. For synchronizing forces, J > 0, on the other hand, we again find the trivial solution at high noise levels, but a globally synchronized state with q > 0 at low noise levels. Here a phase transition occurs at T - 89 (a bifurcation of nontrivial solutions of (58)), and for T < 89 the minimum of (57) is found at two nonzero values for q. The critical noise level is again found upon expanding the saddle-point equation, using Io(z) = 1 + (9(z2) and I1 (z) - lz + (9(z3) 9 q - 1 IMq + (9(q3). Precisely at 13J - 2 one finds a de-stabilization of the trivial solution q = 0, together with the creation of (two) stable nontrivial ones (see Fig. 8). Note that, in view of (57), we are only interested in nonnegative values of q. One can prove, using the properties of the Bessel functions, that there are no other (discontinuous) bifurcations of nontrivial solutions of the saddle-point equation. Note, finally, that the absence of a state with global antisynchronization for J < 0 has the same origin as the absence of an antiferromagnetic state for J < 0 in the previous models with binary neurons. Due to the long-range nature of the synapses J;j = J / N such states simply cannot exist: whereas any set of oscillators can be in a fully synchronized state, if two oscillators are in anti-synchrony it is already impossible for a third to be simultaneously in antisynchrony with the first two (since antisynchrony with one implies synchrony with the other).
4.2. Coupled oscillator attractor networks 4.2.1. Intuition and definitions Let us now turn to an alternative realization of information storage in a recurrent network based upon the creation of attractors. We will solve models of coupled neural oscillators of the type (13), with zero natural frequencies (since we wish to use equilibrium techniques), in which real-valued patterns are stored as stable configurations of oscillator phases, following [16]. Let us, however, first find out how to store a single pattern { c [-rt,/1~]x in a noise-less infinite-range oscillator network. For simplicity we will draw each component ~i independently at random from [-Tt, rt], with uniform probability density. This allows us to use asymptotic properties such as IN -1 }-~jeie~sl--(_9(N- 89 for any integer g. A sensible choice for the synapses would be J/j = cos[~/- ~j]. To see this we work out the corresponding Lyapunov function (20): 1
L[~] =
2N 2 ~ cos[~i- ~j] cos[qbi- qbj], /7
L[{] -
2 N 2 Z cOS2 [~i -- ~j] -- -- ~ -Jr- C /7
(the factors of N have been inserted to achieve appropriate scaling in the N ~ ec limit). The function L[~], which is obviously bounded from below, must decrease monotonically during the dynamics. To find out whether the state { is a stable fixedpoint of the dynamics we have to calculate L and derivatives of L at ~ = {:
586
A. C. C. Coolen
~L
1
2N 2 Z sin[2(~i- ~j)], J ~2L
] --
i~2L
1 i
J
~ j'~i~)
j ~
--
1
N 2 COS2[~i- ~].
Clearly l i m u ~ L[~]- -1. Putting t ~ - ~ + A~, with A ~ ; - (9(N~ we find -
--
+ -2 Z i
',~
1
= 4N Z
A~
1
2N 2Z
i 1
1
= ~
A'~iA~j ~ i ~ ) j
-~-
(9(A~3)
ij AdpiAdpj cOs2[~i - ~J] + (9(U- 89A~3)
ij
AO)~ -
~
A4),
--
At~i cos(2
~i)
9 1
-
A~i sin(2 ~i)
+ C(N- 89A~3).
(59)
In leading order in N the following three vectors in ~N are normalized and orthogonal: el
1
- ~ ( 1 , 1,..., 1),
e 2 - - x x/2 / N (cos(2 ~ 1) ~'''~ cos(2 ~N))
x/2 (sin(2 ~ 1) ~'''~ sin(2 ~N)) " e2--v/N We may therefore use A~ 2 >/(A~ .el into (59) leads to
)2 +
(A~. e2)2 + (A~ .e3)2, insertion of which 2
L[~ + A ~ ] - L[~] >/
+
1
A(~i sin(2 ~i)
+(9( N--~, A~3) 9
Thus for large N the second derivative of L is nonnegative at t~ - ~, and the phase pattern ~ has indeed become a fixed-point attractor of the dynamics of the noise-free coupled oscillator network. The same is found to be true for the states = +~ + ct(1,..., 1) (for any at).
587
Statistical mechanics of recurrent neural networks I - statics
4.2.2. Storing p phase patterns." equilibrium order parameter equations We next follow the strategy of the Hopfield model and attempt to simply extend the above recipe for the synapses to the case of having a finite number p of phase patterns ~ ' - ( ~ , . . . , ~ ) E [-~, ~]N, giving 1
p
Jij - -~ Z c~ ~t=l
- ~]
(60)
(the factor N, as before, ensures a proper limit N ~ ~ later). In analogy with our solution of the Hopfield model we define the following averages over pattern variables:
(g[~])~-
lim 1 N___+~NZ g[~i]' i
We can write the Hamiltonian H(~) of (34) in the form
-
1
P
2N Z
~t=l
Z
cos[~ - ~]
ij
COS[~)i- ~)j]
2 g=l mc~c(~))2q- m~s(~))2 q- m s c ( * ) 2 -+-m~s(~) 2 } in which
mcc
1 ~ Z
1
mcs(Op)--~Zcos(r
cos(r i
~t(~)_ 1 NZ
m~c
(61)
i
sin(~/~)COS(~)i),
~t(~)_ 1 m~ ~ Z
i
sin(~) sin(qbi).
(62)
i
The free energy per oscillator can now be written as
F / N -- - ~] log / -.-
J d~ e- 'H(r-
|
]3N log
/
...
J d~ e 89~-~"' ~--~"** m'
**(,)2
with ** E {cc, ss, cs, sc}. Upon introducing the notation m * * - (m~.,...,nr~**) we can again express the free energy in terms of the density of states ~({m**}) - (2rt) -N f . . . f d{b I-I** 5[m** - m**({b)]"
F / N - - - - ~ l1o g ( 2 r c ) - - ~1l o g f H din** ~ ({m**})eJl3N~-~**m2*
(63)
Since p is finite, the leading contribution to the density of states (as N ~ e~), which will give us the entropy, can be calculated by writing the 8-functions in integral representation:
A.C.C. Coolen
588
/E
i//d~
lim --1log ~({m**})
f~f
= lim 1 log
N-,ccN
1-I dx**eiNx**m** x **
...
" COS(~)sin(0/) + Xcs
x exp - i Z Z [xc~Cc~ i
(2rt) N
p
+Xs~ sin(~) cos((~/) -+-X~sssin(~) sin(~/)] ) = extr{x..} i ~ x** .m** + " cos(~,) + Xcs
log
~-exp
- i 5-~[x~ cos(r
sin(,) + x~c sin(~.,)cos(,)
+.ssSin,,,sin, ,,)l} r
The relevant extremum is purely imaginary so we put x~ = il3y** (see also our previous discussion for the Hopfield model) and, upon inserting the density of states into our original expression for the free energy per oscillator, arrive at lim F I N -
N--~,oc
extr{m..a,..if({m** y**}) 1
1
2
f({m**, y**})- - ~ l o g ( 2 n ) - ~ ~ m * * + E y * * 9m**
, {lo /d* (
13
~--nexp 13Z[Yc~c cos(~.)cos(,) + Y,~scos(~.) sin(,) ~t
+Y~scSin(~.)cos(d~)+Y~ssSin(~.)sin(d~)])). r Taking derivatives with respect to the order parameters m** gives us y** = m**, with which we can eliminate the y**. Derivation with respect to the m** subsequently gives the saddle-point equations .
mcc
=
fdqb cos[qb]exp(13cos[r ~v[mc~ccos[r + ms~sin[~]] + 13sin[O] Ev[mc"scos[r + msVssin[r
f dd~exp(~c~s[d~]y]~v[m~c~s[~v] + ~sin[~v]] + f5sin[~] y~v[~C~sc~s[~v]+ ~`'ssin[~v]])
\
/r (64)
m~cs= (cos[~.] f d~ sin[~]exp(13cos[~] y~v[mc~cos[~v] + ms~sin[~v]]+ 13sin[~] Y'jv[mcv cos[~v] + ms~ssin[~v]]))
f dd~exp(f5c~s[d~]y-~v[m~c~s[~v] + m~ sin[~v]] + ~5sin[d~]~-'v[mc~c~s[~v] + ms~sin[~v]])
~' (65)
Statbstical mechanics of recurrent neural networks I - statics
589
m~c-- (sin[r ] v v mcs v cos[~v] + mss v sin[~v]]) )~,, fdd~cos[d~]exp(~3cos[d~]Y~v[mccCOS[~v]+mscsin[~v]]+~3sin[d~]~v[ f dqbexp(13cos[qb] ~-~v[mc~cos[~v] + msV~sin[~v]] + 13sin[qb] ~v[m~v cos[~v] + m~ sin[~v]]) (66) m~ = (sin[~t ] f dO sin[qb]exp(13cos[dp] ~-~v[m~V~cos[~v] + m~ sin[~v]] + 13sin[qb] ~-'jv[mcV~cos[~v] + ms~sin[~v]]))~. f dOexp(13 cos[qb] ~--~v[m~)cos[~v] + m~) sin[~v]] + 13sin[qb] ~--]~v[m~ cos[~v] + m~ sin[~v]]) (67) The equilibrium values of the observables m**, as defined in (61) and (62), are now given by the solution of the coupled equations (64)-(67)which minimizes 1
2
'//
13 log
d , exp
(
13cos[,] Z
Imp; cos[~,,] + m,;
sin[~,,] 1
v
+ sinI,l
+ < sinI vll/? 9 v
(6a)
/I
We can confirm that the relevant saddle-point must be a minimum by inspecting the 2 13 = 0 limit (infinite noise levels): limp~0f({m**}) - 89 ~** m** - ~ log(2n).
4.2.3. Analys& of order parameter equations: pure states From now on we will restrict our analysis to phase pattern components ~/~ which have all been drawn independently at random from I-n, n], with uniform probability density, so that (g[~,])~ - (2rt) -p frtn.. "f-~Ttd~, g[~,]. At 13 - 0 (T - c~) one finds only the trivial state m.~. = 0. It can be shown that there will be no discontinuous transitions to a nontrivial state as the noise level (temperature) is reduced. The continuous ones follow upon expansion of the equations (64)-(67) for small {m**},which is found to give (for each p and each combination **): 1
m.~. - ~ 13m.~.+ C({m2**}). Thus a continuous transition to recall states occurs at T -- ~. Full classification of all solutions of (64)-(67) is ruled out. Here we will restrict ourselves to the most relevant ones, such as the pure states, where m.~.- m**8,z (for some pattern label )~). Here the oscillator phases are correlated with only one of the stored phase patterns (if at all). Insertion into the above expression for f({m**}) shows that for such solutions we have to minimize f({m** }) -- ~1 Z
2- ~ m**
~ log
f
dqb exp(13 cos[~] [m~ccos[~] + msc sin[~]]
+ 13sin[qb] [m~ cos[~] + m~ sin[~]]).
(69)
590
A.C.C. Coolen
We anticipate solutions corresponding to the (partial) recall of the stored phase pattern {z or its mirror image (modulo overall phase shifts ~/. ~ ~i + 8, under which the synapses are obviously invariant). Insertion into (64)-(67) of the state ~)i -- ~ Ar-~ gives (m~, msc, mc~, ms~) -- 89(cos 8, - sin 8, sin 8, cos 8). Similarly, insertion into (64)-(67) of i~ i __ __~Xi -'[- ~ gives (mcc, m~c, mcs, mss) -12 (cos 8, sin 8, sin 8 , - cos 8) Thus we can identify retrieval states as those solutions which are of the form (i) retrieval o f { x(ii) retrieval of - { x .
(mc~,m~,mc,,mss) -- m(cos S, - sin S, sinS, cosS) (m~c, re,c, m~,, re,s) -- re(cos 8, sin 8, sin 8, -cos 5)
with full recall corresponding to m - 89 Insertion into the saddle-point equations and into (69), followed by an appropriate shift of the integration variable ~, shows that the free energy is independent of 8 (so the above two ans/itze solve the saddlepoint equations for any 8) and that 1 f as) c o s [ * ] e [3me~ m = ~ f dqb e[3mc~ *]
1 '
f(m) - m 2 -
~
f cos(,] log j dd~ e 13m 9
Expansion in powers of m, using log(1 + z) = z - 892 + (9(z3), reveals that nonzero minima m indeed bifurcate continuously at T - 13-1 - 1 4" . ) m 2 + ~4~3m 4 + (_9(m6). f ( m ) + gllog[2g]-- ( 1 - - 1~f3
(70)
Retrieval states are obviously not the only pure states that solve the saddle-point equations. The function (69) is invariant under the following discrete (noncommuting) transformations: I" (mcc, msc, mcs, m~)
II: (mec, msc, mcs, mss)
--+ --~
(m~c,m~c,-mcs,-mss), (mr
We expect these to induce solutions with specific symmetries. In particular we anticipate the following symmetric and antisymmetric states: (iii) symmetric under I: (iv) antisymmetricunder I:
(mcc,msc,mcs,mss) = v~m(cosS, sinS,0,0), (mcc,m.~<,mc.~,mss) = v/2m(0, 0, cos S, sinS),
(v) symmetric under II: (mcc,msc,mcs,ms~) = m(cosS, sin 8,cosS, sin 8), (vi) antisymmetric under II: (m~,ms<,m~,ms~) = m(cosS, sin 8 , - c o s S , - sin 8). Insertion into the saddle-point equations and into (69) shows in all four cases the parameter 8 is arbitrary and that always m--~
1 f d~ cos[~,] fdqbc~176176 ~ f d , e~m~cos[,lcos[~]
f (m) _ m 2
lf
- g
~log
f
d~ e ~m~c~176
Statistical mechanics of recurrent neural networks I - statics
-0.5
f/T
0.5
-1.5
-2.5 - 1.0
591
m
-0.5
0.0 m
0.5
1.0
0.0 0.00
0.25
0.50
0.75
T
Fig. 9. The function f ( m ) / T (left) for networks of coupled oscillators with phase patterns stored via the synapses Jij = N -1 ~--~cos[~/~ - ~ ] , and for different choices of 13= T -1" 13= 1,3, 5, 7 (from top to bottom). The right picture gives the location of the nonnegative minimum of f(m) (which measures the overall degree of global synchronization with one recalled phase pattern in thermal equilibrium) as a function of T. A transition to a recall state 1 occurs at T = ~.
Expansion in powers of m reveals that nonzero solutions m here again bifurcate continuously at T = 1. 1 log[2 rt] -- ( 1 -- ~1 13) m 2 + ~3~4133m 4 0+( m 6 f ( m ) + -~
)9
(71)
However, comparison with (70) shows that the free energy of the pure recall states is lower. Thus the system will prefer the recall states over the above solutions with specific symmetries. Note, finally, that the free energy and the order parameter equation for the pure recall states can be written in terms of modified Bessel functions as follows: m
m
111~(13m)
m
210(13m) '
f (m) -- m 2 - ~ 1 log[2 rtI0(13m)]
The behavior of these equations and the observable m for different noise levels are shown in Fig. 9. One easily proves that [m[ ~< 89 and that l i m ~ m - 89 Following the transition to a state with partial recall of a stored phase pattern at T - 88further reduction of the noise level T gives a monotonic increase of retrieval quality until retrieval is perfect at T - 0.
A.C.C. Coolen
592
5. Networks with Gaussian distributed synapses The type of analysis presented so far to deal with attractor networks breaks down if the number of patterns stored p no longer remains finite for N ~ c~, but scales as p = a N (0~ > 0). Expressions such as (48) and (49) can no longer be evaluated by saddle-point methods, since the dimension of the integral diverges at the same time as the exponent of the integrand. The number of local minima (ergodic components) of Hamiltonians such as (30) and (32) will diverge and we will encounter phenomena reminiscent of complex disordered magnetic systems, i.e. spin glasses. As a consequence we will need corresponding methods of analysis, in the present case: replica theory.
5.1. Replica analysis 5.1.1. Replica calculation of the disorder-averaged free energy As an introduction to the replica technique solution of a recurrent neural network model which a single pattern ~ - ( ~ 1 , . . . , ~ N ) E { - 1 , type recipe) on a background of zero-average SK model, [17]):
Jij
-
Jo
+
J
we will first discuss the equilibrium with binary neurons ~ / c { - 1 , 1 } in 1}u has been stored (via a HebbianGaussian synapses (equivalent to the
zij - 0, z~ = 1,
(72)
in which J0 > 0 measures the embedding strength of the pattern, and the zij (i < j) are independent Gaussian random variables. We denote averaging over their distribution by = (the factors in (72) involving N ensure appropriate scaling and statistical relevance of the two terms, and as always Jii " - - 0 ) . Here the Hamiltonian H (30), corresponding to sequential dynamics (2), becomes
1
l
J
H(~) - - -~NJom2 (ff ) + -~Jo - - ~
i~<j (TYi(TYjZij
(73)
with the overlap m(~) - ~ Y~k Crk~k which measures pattern recall quality. We clearly cannot calculate the free energy for every given realization of the synapses, furthermore it is to be expected that for N ~ c~ macroscopic observables like the free energy per neuron and the overlap m only depend on the statistics of the synapses, not on their specific values. We therefore average the free energy over the disorder distribution and concentrate on
F
-
1 limlogZ,
- - ~N--~:x~
The disorder average is identity
Z--Ze-13H(*/ ~r
"
(74)
transformed into an average of powers of Z, with the
593
Statistical mechanics of recurrent neural networks I - statics
l o g Z = l i m l - - Z[~ - 1 ] n--*On
or, equivalently,
logZ - lim 1 logZ~"
(75)
n-+0n
The so-called 'replica trick' consists in evaluating the averages Z" for integer values of n, and taking the limit n ~ 0 afterwards, under the assumption that the resulting expression is correct for noninteger values of n as well. The integer powers of Z are written as a product of terms, each of which can be interpreted as an equivalent copy, or 'replica' of the original system. The disorder-averaged free energy now becomes fi -
- lim
,-~o ~1 log Z-~ - - ,~01im~1 log ,,,Z.. .,~ne-f~~ = ' H(,~)
From now Roman indices will refer to sites, i.e. i --- 1... N, whereas Greek indices will refer to replicas, i.e. a - 1 . . . n. We introduce a shorthand for the Gaussian measure, Dz - (2rt)- 89e@ 2dz, and we will repeatedly use the identity f Dz e~z - e89~. Upon insertion of the Hamiltonian (73) we obtain 1
N l o g 2 _ lim(~n) -1 log e-~--]~,<s~ ' ~ j Z ~0,- ~j //
1
-~Nlog2-
Oz
0
.
e ~~z~-]~ ~ )
.
-1
n-~01im(]3n)
• log exp/-~--~
Z~i~j(~(~ ~
" - [ - ~ - Z Z Oi -~ Oj Oi ~
We now complete the sums over sites in this expression,
Z(Yi~j iT~j
~ _
(~
1
-N,
~ _ j~ , ~ v ,i~ j Z(YiO
ir
_
,~7,Fi
"
1
-N.
The averaging over the neuron states {a ~} in our expression for F will now factorize nicely if we insert appropriate 8-functions (in their integral representations) to isolate the relevant terms, using
1=
1-
dql-I6 q ~ -
f
dmH8
1
~
[m ~ - ~ 1 t~.. ~icr7 ]
-
=
N
dqdq
eiN~~ 0~[q~-~~--~i~P~]
I2--~]n/d m d i l l eiN~ rh~[m~-~/~;~]
The integrations are over the n x n matrices q and ~] and over the n-vectors m and fla. After inserting these integrals we obtain
594
A. C. C. Coolen
lim fi'/N = N---,oc
--
x log
N (
•
1 ~log 2 - lim lim 1
N----~oc n---~O ~3Nn
dq dq dm dill e - 89176188
[
1
2
a1~2j2
2])
X ( e x p ( - - i / ~ . [~O~v'TcY~+~rha~i~ The neuronal averages factorize and are therefore reduced to single-site ones. A simple transformation (Yi ~ ~i(Yi for all i eliminates the pattern components ~i from our equations, and the remaining averages involve only one n-replicated neuron (Cyl,..., %). Finally one assumes that the two limits n ~ 0 and N ---, c~ commute. This allows us to evaluate the integral with the steepest-descent method: limvc n---*O lim Nn - -1 N---,
log [J dx eN~
= n--+O lim N---~ lim:x~ ~ 1 log e N extr (I)+-.-
__
lim -n1 extra.
(76)
n--,0
The result of these manipulations is lim f ' / N - lim extrf(q, m; q, ill)
N--~ac
(77)
n--+0
f(q, m; (~,m) = - ~1 log 2 - ~1 [ log(e -1" ~-"~~, ~ ,
era (yv- i ~'~ ~ ^
m~),
1 ay
2
~
~
1
~2j2
2]
(78)
cxy
Variation of the parameters {q~l~} and {m~} allows us to eliminate immediately the conjugate parameters {q~l~} and {rh~}, since it leads to the saddle-point requirements 1
dlaf~ = -~ i[32jZqaf~,
(79)
rh~ - i~Joma.
Upon elimination of {0~l~,rh~} according to (79) the result (77) and (78) is simplified to lim F l U = lim extrf(q, m),
N--,ac
f(q,m) =
(80)
n-+0
_
1 ~2 2 J0 2 ~log2 +-~--ffnZ q a v + ~ n n Z m ~ ~ty
_ 13__~ l o g l (e 11~2J2~ a r
~t qaYcYacrv+13J~'~-'~ mac~
/•"
(81)
595
Statistical mechanics of recurrent neural networks I - statics
Variation of the remaining parameters {qs~} and {ms} gives the final saddle-point equations
The diagonal elements are always qss = 1. For high noise levels, ~ ~ 0, we obtain the trivial result qsv=Ssv,
ms--0.
Assuming a continuous transition to a non-trivial state as the noise level is lowered, we can expand the saddle-point equations (82) and (83) in powers of q and m and look for bifurcations, which gives 0~ r 9): qko -- ~2jZqko -+- (-9(q,m) 2,
m~ -- ~J0mz + (-9(q,m) 2.
Therefore we expect transitions either at T = J0 (if J0 > J) or at T = J (if J > J0). The remaining program is: find the saddle-point (q, m) for T < max{J0,J} which for integer n minimizes f , determine the corresponding minimum as a function of n, and finally take the limit n --+ 0. This is in fact the most complicated part of the procedure.
5.2. Replica-symmetric solution and A T-&stability 5.2.1. Physical interpretation of saddle points To obtain a guide in how to select saddle-points we now turn to a different (but equivalent) version of the replica trick (75), which allows us to attach a physical meaning to the saddle-points (m, q). This version transforms averages over a given measure W: n-1
n---~0n 7= 1 ~l...~n
s--1
The trick again consists in evaluating this quantity for integer n, whereas the limit refers to noninteger n. We use (84) to write the distribution P(m) of overlaps in equilibrium as
596
A.C.C.
Coolen
P(m) -- ~-~'~8[m -- 1 ~-~i ~icri]e-f~H('~) ~-~ e-f~H(~) = lim 1 .-
8 m
~i(Y~
-
e-
on
If we average this distribution over the disorder, we find identical expressions to those encountered in evaluating the disorder averaged free energy. By inserting the same delta-functions we arrive at the steepest descend integration (77) and find P(m) -- lim 1 Z n---,0 n
8[m - my]
(85)
7
where {my} refers to the relevant solution of (82) and (83). Similarly we can imagine two systems ~ and ~' with identical synapses {Jij}, both in thermal equilibrium. We now use (84) to rewrite the distribution P(q) for the mutual overlap between the microstates of the two systems E,~,,~' 8[q - ~ E i cricY~]e-13"('~)-13"(e) P(q) =
~-]~.o., e - f 3 H ( n ) - f 3 H ( ~ ' )
=lim
1
[ l~. ] i ~ ~n(~/ 8 q-~ ~)~[ e-
n - ~ o n ( n - 1 ) ~-'~ Z ~r
~l ...or"
Averaging over the disorder again leads to the steepest descend integration (77) and we find 1
(86)
P(q) - - l i m ,,--,o n(n - 1 ) Z 8 [q - qxv], key
where {qxv} refers to the relevant solution of (82) and (83). We can now partly interpret the saddle-points (m, q), since the shape of P(q) and P(m) gives direct information on the structure of phase space with respect to ergodicity. The crucial observation is that for an ergodic system one always has
[
P(m) - 8 m - - ~
.
~i((Yi)eq
]
,
P(q) - 8 q - -~
. (O'i)eq
(87)
9
If, on the other hand, there are L ergodic components in our system, each of which corresponding to a pure Gibbs state with microstate probabilities proportional to exp(-[3H) and thermal averages (...)t, and if we denote the probability of finding the system in component g by ~ , we find P(m) - Z g= 1
W~8 m - ~
~i((Yi)g ' P(q) - Z "
g,g~= l
WeWe 8 q - ~
((Yi) "
597
Statistical mechanics of recurrent neural networks I - statics
For ergodic systems both P(m) and P(q) are 8-functions, for systems with a finite number of ergodic components they are finite sums of 8-functions. A diverging number of ergodic components generally leads to distributions with continuous pieces. If we combine this interpretation with our results (85) and (86) we find that ergodicity is equivalent to the relevant saddle-point being of the form: q~13--8oq3+q[1-8~13] ,
ms--m,
(88)
which is called the 'replica symmetry' (RS) ansatz. The meaning of m and q is deduced from (87) (taking into account the transformation c~/---+ ~/cy; we performed along the way): 1 m -- ~ Z
1
~i((Yi)eq '
2 ((3"i)eq"
q- ~ Z
i
i
5.2.2. Replica symmetric solution Having saddle-points of the simple form (88) leads to an enormous simplification in our calculations. Insertion of (88) as an ansatz into Eqs. (81)-(83) gives
1
f(q, m) - - ~ l o g 2
--41 ~j2 (1 -
-~---~log exp
q)2
+-~Joml 2
q~2j2
~
+~JomZ~ ~
+C(n),
(O'1 ~2 exP( 89q~2j2 [~--~'~cya]2+[3Jom ~--2~cya))~ q -m
(exp(lq[ 32J2 [Y'~a cYa]2+[3J0 m E a cya)), (c~l exp(!2 q~2j2 [~-~.~ (y~] 2 +~Jom ~
'
cy~))~
(exp(lq~2j2 l e a cYu]2q-~J0 m E ~ cr~))~ We linearize the terms [ y ~ (3"a] 2 by introducing a Gaussian integral, and perform the average over the remaining neurons. The solutions m and q turn out to be welldefined for n ---+0 so we can take the limit: limf(q, m ) - - ~ 1l o g 2 - ~1 j 2 ( 1 n--+0
'/
13
1 - q)2 _+__~Jom 2
Dz logcosh [IgJom + f3Jzv/-q] ,
q - / Dz tanh2 [13Join + f3JZv~],
m - / Dz tanh [J3Jom + f3Jzx/~ ] .
Writing the equation for m in integral form gives
m - ~ Join
/o' [ / d)~ 1 -
Dz tanh 2 [)~13Join + ~ JZv~ ] 9
(89) (90)
A.C.C. Coolen
598
I
I
I
I
]
I
I
I
I
1.5
T/J 1
i I
0
I
i
i
I
j
i
J
I
J
i\J
1
.5
i
I
J
i
i
J
1.5
I
2
Jo/J
Fig. 10. Phase diagram of the model (72) with Gaussian synapses, obtained from the replica-symmetric solution. P: paramagnetic phase, m = q = 0 (more or less random evolution). SG: spin-glass phase, m - 0, q # 0 ('frozen' equilibrium states without pattern recall). F: recall ('ferro-magnetic') phase, m # 0, q # 0. Solid lines: second-order transitions. Dashed: the AT instability.
From this expression, in combination with (90), we conclude: T>Jo:
m=O
T > J 0 and T > J :
m=q=0.
Linearization of (90) for small q and m shows the following continuous bifurcations: at J0>J: J0 < J :
T<max{Jo,J}:
T=Jo T=J T=Jo[1-q]
from m=q=0 re=q=0 m=0, q>0
to m-f0, q > 0 m=0, q>0 m#0, q>0
Solving numerically equations T = J 0 [ 1 - q] and (90) leads to the phase diagram shown in Fig. 10.
5.2.3. Breaking of RS: the AT instability If for the replica symmetric solution we calculate the entropy S = ~2~F/O~ numerically, we find that for small temperatures it becomes negative. This is not possible. Firstly, straightforward differentiation shows ~S/~[3- [3[(H)~q(HZ)eq]~<0, so S increases with the noise level T. Let us now write H(~) - H0 + / t ( n ) , where H0 is the ground- state energy and/q(~) ~>0 (zero only for ground-state configurations, the number of which we denote by No ~> 1). We now find
599
Stat&tical mechanics of recurrent neural networks I - statics
limS-lim{logEe-PH(")+[~(H)eq}
T--~O
13--~c~
6
= lim[logEe-I~H(~)+13(/t)eq]~>logN 0. 1~----+OO
We conclude that S~>0 for all T. At small temperatures the RS ansatz (88) is apparently incorrect in that it no longer corresponds to the minimum of f(q,m) (81). If saddle-points without RS bifurcate continuously from the RS one, we can locate the occurrence of this 'replica symmetry breaking' (RSB) by studying the effect on f(q, m) of small fluctuations around the RS solution. It was shown [19] that the 'dangerous' fluctuations are of the form qaf~ ~ 8~f~+ q[1 - 8~f~] + rlaf~,
Eq~pP
0 Va.
(91)
in which q is the solution of (90) and rl~p -rlp~. We now calculate the resulting change in f(q,m), away from the RS value f(qrs,mgs), the leading order of which is quadratic in the fluctuations {q~p} since the RS solution of (90) is a saddle-point: ~3j4
[~j2
f(q, m) - f(qrs, mRS)
2
-~-nZ q~
8n Z Z rip,rio~G~O~ ~r pCz
with ( cY~(YvCrpCrZexp ( 89q132j2 [~--]~cr~]2+ 13mJo}-~.acry) }. Gavp~" -
"
. ..(exP( . 89176
Because of the index permutation symmetry in the above average we can write for -r y and p -r ~: G~ypk -- 8o~pS~,k -+- 8~kS~,p -+- 6411 -- 8~p][1 -- 8%][1 - 8~][1 - By0] +
-
+
-
+
-
+
-
with
Ge - f Dz tanhe [13J0m + [3Jzx/~] cosh" [13J0m + [3Jzv~] f Dz cosh" [13Jom + [3Jzx/ff] Only terms which involve precisely two 8-functions can contribute, because of the requirements o~~: % p ~: ~ and ~ p q~p - 0. As a result: ~j2
f(q, m) -- f(qrs, mRS) -- W
[1--
~2j2
(1-2G2+G4)]Eq~
2
v.
The condition for the RS solution to minimize f(q, m), if compared to the so-called 'replicon' fluctuations (91), is therefore
600
A. C. C. Coolen
1 > ~2j2 lirn(1 - 2 G2 + G4). n 0
After taking the limit in the expressions Ge this condition can be written as 1 > ~2j2 f Dz cosh -4 [~Jom + ~Jzx/~] 9
(92)
The so-called AT line in the phase diagram where this condition ceases to be met, indicates a continuous transition to a complex 'spin-glass' state where ergodicity is broken (i.e. the distribution P(q) (86) is no longer a 8-function). It is shown in figure 10 as a dashed line for Jo/J > 1, and coincides with the line T / J - 1 for J0
6. The Hopfield model near saturation
6.1. Replica analysis We now turn to the Hopfield model with an extensive number of stored patterns, i.e. p -- aN in (40). We can still write the free energy in the form (48), but this will not be of help since here it involves integrals over an extensive number of variables, so that steepest descent integration does not apply. Instead, following the approach of the previous model (72), we assume [18] that we can average the free energy over the distribution of the patterns, with help of the replica-trick (75): - - lim 1 log E e-~ ~ - - ' 14(~). n~0 [3n nl ...nn
Greek indices will denote either replica labels or pattern labels (it will be clear from the context), i.e. ~, [3 - 1 , . . . , n and Ix, v -- 1 , . . . , p . The p x N pattern components { ~ } are assumed to be drawn independently at random from { - 1 , 1}.
6.1.1. Replica calculation of the disorder-averagedfree energy We first add to the Hamiltonian of (30) a finite number g of generating terms, that will allow us to obtain expectation values of the overlap order parameters m~ (41) by differentiation of the free energy (since all patterns are equivalent in the calculation we may choose these t nominated patterns arbitrarily):
O --4 m + Z ~la Z G'i~/g' W-I i
(mla(6))eq "-" ~ir~
F/N.
(93)
We know how to deal with a finite number of overlaps and corresponding patterns, therefore we average only over the disorder that is responsible for the complications: the patterns { ~ + 1 , . . . , ~p} (as in the previous section we denote this disorder-averaging by .-~.). Upon inserting the extended Hamiltonian into the replica-expression for the free energy, and assuming that the order of the limits N -~ e~ and n -* 0 can be interchanged, we obtain for large N:
601
Statistical mechanics of recurrent neural networks I - statics
1
1
FIN - -~~ - -~log 2 -
1
lim
o ~Nn l.t~
x
e ~N~
~
i
Y['~e[[~], ~ ]
.
We linearize the la~
1
.
F/N-2~--~log2-~l~N
1
n
Anticipating that only terms exponential in the system size N will retain statistical relevance in the limit N ---+ oe, we rescale the n x g integration variables m according to m ~ mv/-13N: -
1
1
F/N - -~~ - -~log 2 -
•
lim
1
n~O~3Nn
[~3NI ~ / ~ dine-89 ( }13~'~e~--~;cr~[m~-~~ e
~ / {~} . (94)
Next we turn to the disorder average, where we again linearize the exponent containing the pattern components using the identity (56), with Dz = ( D z l , . . . , Dz,): 2
={f
P--g
Dz~i c~189
= {/Dz
e~}-~z~z~}-~,~P~'~+e(-})}p.
p-e (95)
We are now as in the previous case led to introducing the replica order parameters q~t3:
A. C. C. Coolen
602
1--
__
/
dq l~I~i q ~ g - ~ ~ c y ~ c y ~
E '
N
]
dq dfi e
Inserting (95) and the above identities into (94) and assuming that the limits N ~ cr and n ---, 0 commute gives:
1
1
x exp
([
lim fi'/N- 2 a -- ~ log 2 -- U--~c N---~c lim n--~O lim ~ - 1~ log
/
N i ~ gl:~q~ - -~I3m2 +
~ log
'
la ~
zt
dm dq dq
i
/
1)
z~z~q~
Dz e~~
0t[3
i
{o~}
The n-dimensional Gaussian integral over z factorizes in the standard way after appropriate rotation of the integration variables z, with the result: log /
D z e ~ Y~'~,~"'"~q'~ -
1 log det[l - ~1
2
in which I denotes the n • n identity matrix. The neuron averages factorize and are reduced to single-site ones over the n-replicated neuron o = ( o l , . . . , o,): lim n----~0 lim - ~1 log N----~cclimFIN = 210 t - ~_1log 2 - N---,cx~
N i~O~q~-
xexp
la~
f
13m2-
ct
dm dq d~] ~logdet[I-~q]
ctl3
o
and we arrive at integrals that can be evaluated by steepest descent, following the manipulations (76). If we denote averages over the remaining g patterns in the familiar way - (~,,---,~e),
(r
= 2 -~
y~
r
~E{-I,I} ~
we can write the final result in the form lim
N---+~
f'/N =
lim extrf(m, q, q),
n--+0
(96)
Statistical mechanics of recurrent neural networks I - statics
1 1 f(m, q, q) - ~ a - -~log 2 P
603
1
+ i ~ ft~6qa~- -~13m2- ~ a log det[I - 6q] 9 Having arrived at a saddle-point problem we now first identify the expectation values of the overlaps with (93) (note: extremization with respect to the saddle-point variables and differentiation with respect to k commute): (m~t(~))eq
a --.~01im~im~ extrf (m ,q,~) (~" (lY]a {yaexp(13Y~"~<e~ a cYa~om~- i ~af~ c)a6cYacYf~))~ / = li~
(97)
(e---~;i ; ~ ~ ~ ~ ~ya ~Z m--~~ i--~~--~-0aZ~yacYZi i2
which is to be evaluated in the k = 0 saddle-point. Having served their purpose, the generating fields k, can be set to zero and we can restrict ourselves to the k = 0 saddle-point problem: 1 1 1 [ 1 1 f(m, q, r -- ~ - ~log2 - ~ L i ~ c)~qa~ -- ~ [~m2 -- ~ l o g det[I - 13q]
(98) Variation of the parameters {m~,O~,q~6} gives the saddle-point equations: m~-
~, /exp(13~g_<]~c.~c.~,m~_i~Z~_-~ii- ~ /r
qxp --
((cyxcyoexp( ~ ~.<_e ~--~acya~.m~"- i~a~ Oa~cy~cyO)).) (exp(13~._<e Y]--- 7---Z-g....--
1 [dzz~zoe _ l2z. [l--[~qlz 0)~0 -- ~ i ~ ~ f dz e -lz[I-l~q]z furthermore,
(99)
(lOO)
(101)
604
A. C. C. Coolen
(m~t(a))eq -- lim 1 Z m~ n--,On
(102)
replaces the identification (97). As expected, one always has q~ = 1. The diagonal elements ~ drop out of (99) and (100), their values are simply given as functions of the remaining parameters by (101). 6.1.2. Physical &terpretation of saddle po&ts
We proceed along the lines of the Gaussian model (72). If we apply the alternative version (84) of the replica trick to the Hopfield model, we can write the distribution of the g overlaps m = (ml,... ,me) in equilibrium as ~ ~ 8 [m - ~1 ~ . ~ , i ]i~e_13n(,,, )
P(m) - lim 1 ~ n--,0/1
7
... "
with ~ i - ( ~ ] , ' ' ' , ~ff)" Averaging this distribution over the disorder leads to expressions identical to those encountered in evaluating the disorder averaged free energy. By inserting the same delta-functions we arrive at the saddle-point integration (96) and (98) and find P(m) - lim 1 y ~ ~5[m- m~]
n~0 n
(103)
7
where my = (m~,..., m,t ) refers to the relevant solution of (99)-(101). Similarly we imagine two systems n and ~' with identical realization of the interactions {Jij}, both in thermal equilibrium, and use (84) to rewrite the distribution P(q) for the mutual overlap between the microstates of the two systems P(q)--lim
1
.~on(n-
Z 1)
8 [q - 1 ~ . ~cr[] H e-13/4(~/
~,...~.
.
Averaging over the disorder again leads to the steepest descend integration (96) and (98) and we find P(q) = lim 1 ,--.o ,(n - 1)
5 [q - q~v],
(104)
where {qzv} refers to the relevant solution of (99)-(101). Finally we analyze the physical meaning of the conjugate parameters {q~l~} for -r 13. We will do this in more detail, the analysis being rather specific for the Hopfield model and slightly different from the derivations above. Again we imagine two systems a and a' with identical interactions {Jij}, both in thermal equilibrium. We now use (84) to evaluate the covariance of the overlaps corresponding to nonnominated patterns:
605
Statistical mechanics of recurrent neural networks I - statics
r---
O'i ~il;t
~ It:g+ 1
i
O"i~/it ) eq
"
/ eq
[l~i ]b~i
=limN-g/ex
]~
(105)
(using the equivalence of all such patterns). We next perform the same manipulations as in calculating the free energy. Here the disorder average involves
=
{j. Dz e (~--~)~Z~ , ~ i 07~i / p-<-I / oz j
=
Dze(~)
2~z~2i~ ~
02
~ az~ez~
'
f
z~z~ ~(~) ~ z ~ <~,
Dz ~---
~
(after partial integration). We finally obtain an expression which involves the surface
(98):
f am dq dfi,L
j e-~nNf(m'q'cl)
I f dz zkZpc-lz'[1-]3q]z]
1 lim
1
~
lim
f dm dq dq e-13nNf(m,q,q)
The normalization of the above integral over {m, q, r follows from using the replica procedure to rewrite unity. The integration being dominated by the minima o f f , we can use the saddle-point Eq. (101) to arrive at
n-~On(n1 1) Z lim
-
1
(106)
q;cp -- ~ ie~ 2r.
The result (105) and (106) provides a physical interpretation of the order parameters Ergodicity implies that the distributions P(q) and P(m) are 8-functions, this is equivalent to the relevant saddle-point being of the form: m~-mit,
qvo-Svn+q[1-Svn],
1 [1 - 6v0]] , c)vo-~ia[32[RSvn+r
(107)
which is the RS ansatz for the Hopfield model. The RS form for {q~} and {m~} is a direct consequence of the corresponding distributions being 8-functions, whereas the RS form for {0~} subsequently follows from (101). The physical meaning of mit and q is
606
A. C. C. Coolen
m~t -- (m~(~))eq ,
1
2
q ~---N ~
(~i)eq" i
Before proceeding with a full analysis of the RS saddle-point equations, we finally make a few tentative statements on the phase diagram. For 13- 0 we obtain the trivial result qxp - 8~p, Oxp - 0, m~ = 0. We can identify continuous bifurcations to a nontrivial state by expanding the saddle-point equations in first-order in the relevant parameters: m~ - ~3m~ + . . . ,
qzo - - 2 iOzo + . . . ( k
r p),
1 i0t[~ [ [~ 8Z~]] + 0~o=21ZI3 8~0 + 1 - 1 3 q~p[1 . . . . Combining the equations for q and q gives q~o = ~
qxo + " " Thus we expect a
continuous transition at 7' = 1 + v ~ from the trivial state to an ordered state where q~o -r 0, but still (m~)eq - 0 (a spin-glass state). 6.2. Replica symmetric solution and AT-instability
The symmetry of the ansatz (107) for the saddle-point allows us to diagonalize the matrix A - I-13(I which we encountered in the saddle-point problem, A~[~ - [1 - 13(1 - q)]8~13 - 13q: eigenspace
eigenvalue
multiplicity
x-(I,...,1)
1-13(1-q)-~3qn
~-~'~x~ - 0
1 - 13(1 - q)
1
n- 1
so that log det A -
log[1 - 13(1 - q ) -
~qn] + ( n - l)log[1 - ]3(1 - q ) ]
= n log[l - 13(1 - q)] - 1 - 13(1 - q) + (9(n2)" Inserting the RS ansatz (107) for the saddle-point into (98), utilizing the above expression for the determinant and the shorthand m = ( m l , . . . ,me), gives 1
f(mRS, qRS, qRS) -- -- ~ l o g 2 + 1~[1 + 13r(1 - q)] |
+5
+~-~ l o g [ 1 - 1 3 ( 1 - q ) ] - l _ 1 3 ( 1 _ q )
1 ( ( 13n log e [~m~~
o~ +s~rl3, ~
-
[Y~'~~
2))
-1 +(9(n)
We now linearize the squares in the neuron averages with (56), subsequently average over the replicated neuron t~, use coshn[x] - 1 + n log cosh[x] + (9(n2), and take the limit n -+ 0:
607
Stat&tical mechanics of recurrent neural networks I - statics
lim FRS/N - lim f (mRS, qRS, {]RS) N--+~ n---~0 = ~1m 2 + ~1a 1 - ~
[l + [ 3 r ( 1 - q ) + ~
1 l o g [ l - 13(1 - q ) ]
Dz log 2 cosh 13Ira. ~ + z v ~
>
- 1 _ 13(1 _ q)
q J
.
(108)
The saddle-point equations for m, q and r can be obtained either by insertion of the RS ansatz (107) into (99)-(101) and subsequently taking the n ~ 0 limit, or by variation of the RS expression (108). The latter route is the fastest one. After performing partial integrations where appropriate we obtain the final result:
m-({fDztanhf~[m.{+zvGT]>, q=
tanh213[m.~+z~>
r-q[1-~(1-q)]
,
(109)
-2.
(110)
By substitution of the equation for r into the remaining equations this set can easily be further reduced, should the need arise. In case of multiple solutions of (109) and (110) the relevant saddle-point is the one that minimizes (108). Clearly for a = 0 we recover our previous results (50) and (51).
6.2.1. Analysis of RS order parameter equations and phase diagram We first establish an upper bound for the temperature T = 1/13 for nontrivial solutions of the set (109) and (110) to exist, by writing (109) in integral form: mvt
- 13<~,(~m)fol dk/
Dz[1 - tanh 213(~r 9m + z v ~ ) ] }~
from which we deduce
< fol/
0 - m 2 - 13 (~-m) 2
dE
Dz[1 - tanh 213(E~,.m + zv/~)]
>
Therefore m = 0 for 7' > 1. If 7" > 1 we obtain in turn from (110), using tanh 2 (x) < x 2 and 0 _< q _< 1" q - 0 or q _< 1 + ~ 7". We conclude that q - 0 for 7' > 1 + v~. Secondly, for the free energy (108) to be well defined we must require q > 1 - 7". Linearization of (109) and (110) for small q and m shows the continuous bifurcations: at
from
to
cx>0:
T=l+v/~
m=O,q=O
m=O,q>O
~x=O:
T= 1
m=O,q--O
m:/:O,q>O
608
A. C. C. Coolen
The upper bound T = 1 + x/~ turns out to be the critical noise level indicating (for cx > 0) a continuous transition to a spin-glass state, where there is no significant alignment of the neurons in the direction of one particular pattern, but still a certain degree of local freezing. Since m = 0 for T > 1 this spin-glass state persists at least down to T = 1. The quantitative details of the spin-glass state are obtained by inserting m = 0 into (110) (since (109) is fulfilled automatically). The impact on the saddle-point Eqs. (109) and (110) of having a > 0, a smoothening of the hyperbolic tangent by convolution with a Gaussian kernel, can be viewed as noise caused by interference between the attractors. The natural strategy for solving (109) and (110) is therefore to make an ansatz for the nominated overlaps m of the type (52) (the mixture states). Insertion of this ansatz into the saddle-point equations indeed leads to self-consistent solutions. One can solve numerically the remaining equations for the amplitudes of the mixture states and evaluate their stability by calculating the eigenvalues of the second derivative of f ( m , q, ~]), in the same way as for ~ = 0. The calculations are just more involved. It then turns out that even mixtures are again unstable for any T and a, whereas odd mixtures can become locally stable for sufficiently small T and a. Among the mixture states, the pure states, where the vector m has only one nonzero component, are the first to stabilize as the temperature is lowered. These pure states, together with the spin-glass state (m = 0, q > 0), we will study in more detail. Let us first calculate the second derivatives of (108) and evaluate them in the spin-glass saddle-point. One finds, after elimination of r with (110): a2f /am~am,, - 8pv[1 - 13(1 - q)],
a2f /am~aq = o.
The (g + 1) x (g + 1) matrix of second derivatives with respect to variation of (m, q), evaluated in the spin-glass saddle-point, thereby acquires a diagonal form 1 -
[3(1 -
q) ,
82f =
.. 1 - [3(1 - q)
O2f/Oq2
and the eigenvalues can simply be read off. The g-fold degenerate eigenvalue 1 - [ 3 ( 1 - q) is always positive (otherwise (108) would not even exist), implying stability of the spin-glass state in the direction of the nominated patterns. The remaining eigenvalue measures the stability of the spin-glass state with respect to variation in the amplitude q. Below the critical noise level T = 1 + x/~ it turns out to be positive for the spin-glass solution of (110) with nonzero q. One important difference between the previously studied case cx = 0 and the present case cx > 0 is that there is now an m = 0 spin-glass solution which is stable for all T < 1 + x/~. In terms of information processing this implies that for a > 0 an initial state must have a certain nonzero overlap with a pattern to evoke a final state with m r 0, in order to avoid ending up in the m = 0 spin-glass state. This is clearly consistent with the observations in Fig. 5. In contrast, for a = 0, the state with m - 0 is unstable, so any initial state will eventually lead to a final state with m :/: 0.
Stathctical mechanics of recurrent neural networks I - statics
.8
609
f
\k\\\\\ ......
\, \k
.6 m
f
/
.4
~ I
-.55L-
0
0
.2
.4
T
.6
.8
-.6
1
0
i'-i
i\ i \\\
i'\i \\\ ~ :, T, i'xi \\\~
:: \i
.2
~I
.4
T
.6
.8
~[~
1
Fig. 11. Left: RS amplitudes m of the pure states of the Hopfield model versus temperature. From top to bottom: ct = 0.000- 0.125 (Act -0.025). Right, solid lines: 'free energies' f of the pure states. From bottom to top: a t - 0.000- 0.125 (Act = 0.025). Right, dashed lines: 'free energies' of the spin-glass state m = 0 (for comparison). From top to bottom: ct = 0.000 - 0.125 (Act = 0.025).
Inserting the pure state ansatz m = re(l, 0 , . . . , 0) into our RS equations gives m-
Dz tanh 1 3 m + l _ 1 3 ( 1 _ q ) .
'
q-
Dz tanh 2 1 3 m + l _ 1 3 ( 1 _ q ) -
'
(lll)
l m2 f--~ 13
1 [ +~at (l-q)
1 + 13(1- q ) ( 1 3 - 2 ) l l o g [ 1 _ 13(1- q)]] [1-13(1-q)]2 +
Dz log 2 cosh 13m + 1 - 13(1 - q) "
(112)
If we solve Eq. (111) numerically for different values of {z, and calculate the corresponding 'free energies' f (112) for the pure states and the spin-glass state m = 0, we obtain Fig. 11. For ct > 0 the nontrivial solution m for the amplitude of the pure state appears discontinously as the temperature is lowered, defining a critical temperature TM (:t). Once the pure state appears, it turns out to be locally stable (within the RS ansatz). Its 'free energy' f , however, remains larger than the one corresponding to the spin-glass state, until the temperature is further reduced to below a second critical temperature Tc(cz). For T < Tc(Gt) the pure states are therefore the equilibrium states in the thermodynamics sense. By drawing these critical lines in the ({z,T) plane, together with the line Tg(~) = 1 + v ~ which signals the second-order transition from the paramagnetic to
610
A. C. C. Coolen
the spin-glass state, we obtain the RS phase diagram of the Hopfield model, depicted in Fig. 12. Strictly speaking the line TM would appear meaningless in the thermodynamic picture, only the saddle-point that minimizes f being relevant. However, we have to keep in mind the physics behind the formalism. The occurrence of multiple locally stable saddle-points is the manifestation of ergodicity breaking in the limit N ~ oo. The thermodynamic analysis, based on ergodicity, therefore applies only within a single ergodic component. Each locally stable saddle-point is indeed relevant for appropriate initial conditions and time-scales.
6.2.2. Zero temperature, storage capacity The storage capacity mc of the Hopfield model is defined as the largest m for which locally stable pure states exist. If for the m o m e n t we neglect the low temperature reentrance peculiarities in the phase diagram (12) to which we will come back later, the critical temperature TM(m), where the pure states appear decreases monotonically with m, and the storage capacity is reached for T - 0. Before we can put T ~ 0 in (111), however, we will have to rewrite these equations in terms of quantities with well-defined T ~ 0 limits, since q ~ 1. A suitable quantity is C - 13(1 - q ) , which obeys 0 < C <_ 1 for the free energy (108) to exist. The saddle-point equations can now be written in the form
1.5~ ,
,
,
,
I
9 ,
, -,
I
,
,
,
,
,
,
SG
r. TR
0
.05
.1
.15
,
-]
//
Cbc
.5
of
,
0.15
9
.2
Fig. 12. Phase diagram of the Hopfield model. P: paramagnetic phase, m = q = 0 (no recall). SG: spin-glass phase, m = 0, q-J-0 (no recall). F: pattern recall phase (recall states minimise f), m r 0, q -7(=0. M: mixed phase (recall states are local but not global minima off). Solid lines: separations of the above phases (Tg: second-order, TM and To: first-order). Dashed: the AT instability for the recall solutions (TR). Inset: close-up of the low temperature region.
Statistical mechanics o f recurrent neural networks I - statics
m=
Dz tanh ]3m +
611
[
C-~mm
D z t a n h 13m + 1
'
in which the limit T ~ 0 simply corresponds to tanh(13x) ~ sgn(x) and q -+ 1. After having taken the limit we perform the Gaussian integral: [m(1 - C)
m - eft[
-~-
C
(1 - C) ~/_-~ e -rn2
( l - C ) 2/2~
V ~7~
This set can be reduced to a single transcendental equation by introducing x=m(1-C)/~/2-d: xx/2 ~ -- F(x) ,
F(x) -- erf(x)
v2x~ e _x2.
(113)
Eq. (113) is solved numerically (see Fig. 13). Since F(x) is antisymmetric, solutions come in pairs ( x , - x ) (reflecting the symmetry of the Hamiltonian of the system with respect to an overall state-flip a ---+ - ~ ) . For a < ~c ~ 0.138 there indeed exist pure state solutions x # 0. For a > ac there is only the spin-glass solution x = 0. Given a solution x of (113), the zero temperature values for the order parameters follow from
limm erf l lime [l+? eX2]
T-+O
~
-!
T--+O
with which in turn we can take the zero temperature limit in our expression (112) for the free energy: 9
l~f
1
1
2
- ~erf2[x] + - e -x2 - -
e x2 + V/~]
xv/-fferf[x]
+
e-X2].
Comparison of the values for limr_-+0f thus obtained, for the pure state m > 0 and the spin-glass state m = 0 leads to Fig. 13, which clearly shows that for sufficiently small a the pure states are the true ground states of the system. 6.2.3. The AT-instability
As in the case of the Gaussian model negative entropies at sufficiently low broken. We can locate continuous RS (98) of small replicon [19] fluctuations q~ ~ ~
+ q[1 - 8~] + rl~,
(72), the above RS solution again generates temperatures, indicating that RS must be breaking by studying the effect on f ( m , q, q) around the RS solution:
rl~-rl[~,
rl~-0,
Zrl~-0.
(114)
The variation of q induces a similar variation in the conjugate parameters r through Eq. (101): 1 76
-
Fig. 13. Left: solution of the transcendental equation F ( x ) = x a , wherex = erf1"'(m). The storage capacity IX, 0.138 of the Hopfield model is the largest a for which solutions x # 0 exist. Middle picture: RS amplitudes rn of the pure states of the Hopfield model for T = 0 as a function of a = p / N . The location of the discontinuity, where rn vanishes, defines the storage capacity a, 0.138. Right picture, solid line: T = 0 'free energy' f of the pure states. Dashed lines: T = 0 'free energy' of the spin-glass state m = 0 (for comparison).
-
613
Statistical mechanics of recurrent neural networks I - statics
with f dz zazf~zTz6e-lz'[1-f3qRs]z
g~13r6 -
f d z e- 89z'[1-13qRs]z
f dz z~zl3e-lz[1-13q~]z g~ -
'
f dze -lz'[1-13qRs]z
Wick's theorem (see e.g. [4]) can now be used to write everything in terms of second moments of the Gaussian integrals only:
with which we can express the replicon variation in q, using the symmetry of { q ~ } and the saddle-point Eq. (101), as
78 = ~32~-~[RS~ r + r[1 - 8~v]] fly6 [R86[~ + r[1 - 8613]] v#6 - ~ 2 ( R - r)2qa6
(115)
since only those terms can contribute which involve precisely two &symbols, due to )--~ rl~ = 0. We can now calculate the change in f ( m , q, q), away from the RS value f(mRS, qRS,qRS), the leading order of which must be quadratic in the fluctuations {rl~} since the RS solution is a saddle-point: f(mRs, q, q) - f (mRs, qRS, qRS) 1 [ 1 det[1 - 13(qRS + q)] - i T r [ q R s " q] = 13n [ 2 a log det[1 - 13qRS]
(
(el3~-mRs~ cra--icr'[qRs +89
a) I " (116)
+ ~j32Tr[/l" q +/1" qRS]-- log
Evaluating (116) is simplified by the fact that the matrices qRS and 11 commute, which is a direct consequence of the properties (114) of the replicon fluctuations and the form of the replica-symmetric saddle-point. If we define the n • n matrix P as the projection onto the vector ( 1 , . . . , 1), we have Pc,13 = n-i,
P - q = q - P = 0, qRs
qRS " 111-- II " qRS --- (1 [1
-
--
-- (1 -
q)l + nqP, (117)
q)11
13qRS]-1_ 1 -- 1 - 13(1 - q ) 1 + [1 - 13(1 - q ) -
We can now simply log det M = Tr log M:
expand
the
relevant
fJnq
p.
j3nq][1 - 13(1 - q ) ] terms,
using
the
identity
614
A.C.C. Coolen
log
"1 det[l - [3(qRS + 11)] det[1- ]~qRS] = Tr logLlr _ 131111- [3qRS]-lJ
_-~r {-~,I.- ~ 1 ' _~1~ I,I.-~,Rsl- 11~) + ~,3/ 1
=-
[3~
2 {1 - [3(1 - q)12
Tr112 + (9(113)
(118)
Finally we address the remaining term in (116), again using the RS saddle-point Eqs. (109) and (110) where appropriate:
log
(elSe'mRs~ (Y~--/tI'IiRSli)li
- 1 ~132Tr[fl " qRs] +
1
(~2[34Z
. . . + Tl~l~rlYa[G~6Ya. H~I3Ya]
(119)
~6ya with
Gex[3y8_ { ((3"~(Y[3(YY~6 e[3{mRsE~ (Y~--i~'/IRS")~;
H~ya -
/ (O" O'[3e [3~'mRsE ey~--ili'/IRSa)~(O'y(3"6 e6r mRsE~ (Y~--i"'IiRS")t~/ (e [~----~-m R----sZ: o"-~---i----~cl RSO"---~; i ~----~. e inR'---'~S; CY---~---i-RS -~'CO'l------~; "
Inserting the ingredients ((115), (117)-(119)) into expression (116) and rearranging terms shows that the linear terms indeed cancel, and that the term involving H~l~ya does not contribute (since the elements H~ya do not depend on the indices for e~ =/= [3 and 7 r ~5), and we are left with: f(mRS, el, q) -- f(mRS, qRS, qRS) _ 1 - ]3--n
{
1 ~]3 2 4[1 - [3(1 - q)]2
1 ex2[38(R r)4 8 Z
Tr 112 _~_ (~4
~ya
~
(R
r)2 -
Tr
112
] rl~[~rlvaG~[~va + " "
Because of the index permutation symmetry in the neuron average we can write for r 7 and p r ~L:
615
Statistical mechanics of recurrent neural networks I - statics
with
Ge _ ( f Dztanhe f3[m "{ + zv/~7] cosh" f3[m " { + z ~ f D z c o s h " 13[m. { + z x / ~
) . g
Only terms which involve precisely two 8-functions can contribute, because of the replicon properties (114). As a result: f(mRS, q. I]) -- f (mRS. qRS. t]RS)
1
= 13nTr
1
112 [
~132
1
411 - 13(1 _q)]2 -[-2 ~[~4(R - r)2
-~ 1 ~21~8(R - r)4 [1 - 2G2 + G4]j] +-" Since Tr 112 - ~a[3 ]]2 the condition for the RS solution to minimize f ( m . q. q). if compared to the 'replicon' fluctuations, is therefore -
1 + 2 132(R - r) 2 - a136(R - r)4[1 - 2 G2 + G4] > 0. [1 - 13(1 - q)]2
After taking the limit in the expressions
(120)
Ge and after evaluating
lim R - 1 lim g ~ = lim 1 f dz zZe - 89 .40 ~.--+0 n--+0nl3 f dz e- 89z'[I-[3qRs]z =lim 1 [
n-1
n~0n-~ 1 . 13(1 . . q). +1
1
13(1
] q+nq)
1 1-13+213q -- - 13[1 - 13(1 - q)]2
and using (110), condition (120) can be written as [1_13( 1 _ q)]2 > a l 3 2 ( / D z cosh-4 i~[m . ~ + z x / ~ )
(121)
The AT line in the phase diagram, where this condition ceases to be met, indicates a second-order transition to a spin-glass state, where ergodicity is broken in the sense that the distribution P(q) (104) is no longer a 8-function. In the paramagnetic regime of the phase diagram, m = 0 and q = 0, the AT condition reduces precisely to T > Tg = 1 + x/~. Therefore the paramagnetic solution is stable. The AT line coincides with the boundary between the paramagnetic and spin-glass phase. Numerical evaluation of (121) shows that the RS spin-glass solution remains unstable for all T < Tg, but that the retrieval solution m r is unstable only for very low temperatures T < TR (see Fig. 12).
7. Epilogue In this paper I have tried to give a self-contained expos6 of the main issues, models and mathematical techniques relating to the equilibrium statistical mechanical
616
A. C. C. Coolen
analysis of recurrent neural networks. I have included networks of binary neurons and networks of coupled (neural) oscillators, with various degrees of synaptic complexity (albeit always fully connected), ranging from uniform synapses, via synapses storing a small number of patterns, to Gaussian synapses and synapses encoding an extensive number of stored patterns. The latter (complex) cases I only worked out for binary neurons; similar calculations can be done for coupled oscillators (see [16]). Networks of graded response neurons could not be included, because these are found never to go to (detailed balance) equilibrium, ruling out equilibrium statistical mechanical analysis. All analytical results and predictions have later also been confirmed comprehensively by numerical simulations. Over the years we have learned an impressive amount about the operation of recurrent networks by thinking in terms of free energies and phase transitions, and by having been able to derive explicit analytical solutions (since a good theory always supersedes an infinite number of simulation experiments ...). I have given a number of key references along the way; many could have been added but were left out for practical reasons. Instead I will just mention a number of textbooks in which more science as well as more references to research papers can be found. Any such selection is obviously highly subjective, and I wish to apologize beforehand to the authors which I regret to have omitted. Several relevant review papers dealing with the statistical mechanics of neural networks can be found scattered over the three volumes [20-22]. Textbooks which attempt to take the interested but nonexpert reader towards the expert level are [8,23]. Finally, a good introduction to the methods and backgrounds of replica theory, together with a good collection of reprints of original papers, can be found in [24]. What should we expect for the next decades, in the equilibrium statistical mechanics of recurrent neural networks? Within the confined area of large symmetric and fully connected recurrent networks with simple neuron types we can now deal with fairly complicated choices for the synapses, inducing complicated energy landscapes with many stable states, but this involves nontrivial and cutting-edge mathematical techniques. If our basic driving force next is the aim to bring our models closer to biological reality, balancing the need to retain mathematical solvability with the desire to bring in more details of the various electro-chemical processes known to occur in neurons and synapses and spatio-temporal characteristics of dendrites, the boundaries of what can be done with equilibrium statistical mechanics are, roughly speaking, set by the three key issues of (presence or absence of) detailed balance, system size, and synaptic interaction range. The first issue is vital: no detailed balance immediately implies no equilibrium statistical mechanics. This generally rules out networks with nonsymmetric synapses and all networks of graded response neurons (even when the latter are equipped with symmetric synapses). The issue of system size is slightly less severe; models of 1 networks with N < ~ neurons can often be solved in leading order in N-~, but a price will have to be paid in the form of a reduction of our ambition elsewhere (e.g. we might have to restrict ourselves to simpler choices of synaptic interactions). 
Finally, we know how to deal with fully connected models (such as those discussed in this paper), and also with models having dendritic structures which cover a long
Statistical mechanics of recurrent neural networks I - statics
617
(but not infinite) range, provided they vary smoothly with distance. We can also deal with short-range dendrites in one-dimensional (and to a lesser extent twodimensional) networks; however, since even the relatively simple Ising model (mathematically equivalent to a network of binary neurons with uniform synapses connecting only nearest-neighbor neurons) has so far not yet been solved in three dimensions, it is not realistic to assume that analytical solution will be possible soon of general recurrent neural network models with short range interactions. On balance, although there are still many interesting puzzles to keep theorists happy for years to come, and although many of the model types discussed in this text will continue to be useful building blocks in explaining at a basic and qualitative level the operation of specific recurrent brain regions (such as the CA3 region of the hippocampus), one is therefore led to the conclusion that equilibrium statistical mechanics has by now brought us as far as can be expected with regard to increasing our understanding of biological neural networks. Dale's law already rules out synaptic symmetry, and thereby equilibrium statistical mechanics altogether, so we are forced to turn to dynamical techniques if we wish to improve biological realism.
Acknowledgements It is my pleasure to thank David Sherrington and Nikos Skantzos for their direct and indirect contributions to this review. References 1. Van Kampen, N.G. (1992) Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam. 2. Gardiner, C.W. (1994) Handbook of Stochastic Methods. Springer, Berlin. 3. Khalil, H.K. (1992) Nonlinear Systems. MacMillan, New York. 4. Zinn-Justin, J. (1993) Quantum Field Theory and Critical Phenomena. U.P., Oxford. 5. Yeomans, J.M. (1992) Statistical Mechanics of Phase Transitions. U.P., Oxford. 6. Plischke, M. and Bergersen, B. (1994) Equilibrium Statistical Mechanics. World Scientific, Singapore. 7. Peretto, P. (1984) Biol. Cybern. 50, 51. 8. Peretto, P. (1992) An Introduction to the Theory of Neural Computation. U.P., Cambridge. 9. Hopfield, J.J. (1982) Proc. Natl. Acad. Sci. USA. 79, 2554. 10. Hebb, D.O. (1949) The Organization of Behaviour. Wiley, New York. 11. Amari, S.-I. (1977) Biol. Cybern. 26, 175. 12. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1985) Phys. Rev. A 32, 1007. 13. Fontanari, J.F. and K6berle, R. (1988) J. Physique 49, 13. 14. Kuramoto, Y. (1984) Chemical Oscillations, Waves and Turbulence. Springer, Berlin. 15. Abramowitz, M. and Stegun, I.A. (1972) Handbook of Mathematical Functions. Dover, New York. 16. Cook, J. (1989) J. Phys. A 22, 2057. 17. Sherrington, D. and Kirkpatrick, S. (1975) Phys. Rev. Lett. 35, 1972. 18. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1985) Phys. Rev. Lett. 55. 19. de Almeida, J.R.L. and Thouless, D.J. (1978) J. Phys. A 11, 983. 20. Domany, E., van Hemmen, J.L. and Schulten, K. eds (1991) Models of Neural Networks I. Springer, Berlin. 21. Domany, E., van Hemmen, J.L. and Schulten, K. eds (1994) Models of Neural Networks II. Springer, Berlin.
618
A. C. C. Coolen
22. Domany, E., van Hemmen, J.L. and Schulten, K. eds (1995) Models of Neural Networks III. Springer, Berlin. 23. Coolen, A.C.C. and Sherrington, D. (2000) Statistical Physics of Neural Networks. U.P., Cambridge. 24. M+zard, M., Parisi, G. and Virasoro, M.A. (1987) Spin-Glass Theory and Beyond. World Scientific, Singapore.
C H A P T E R 15
Statistical Mechanics of Recurrent Neural Networks I I - Dynamics
A.C.C. C O O L E N Department of Mathematics, King's College London, Strand, London WC2R 2LS, UK
9 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
619
Contents
1.
Introduction
2.
A t t r a c t o r neural networks with binary neurons
.................................................
621
............................
2.1. Closed macroscopic laws for sequential dynamics
3.
627
2.3. Closed macroscopic laws for parallel dynamics . . . . . . . . . . . . . . . . . . . . . . . . .
632
2.4. Application to separable a t t r a c t o r networks . . . . . . . . . . . . . . . . . . . . . . . . . . .
635
A t t r a c t o r neural networks with c o n t i n u o u s neurons
637
.........................
......................................
637
3.2. Application to graded response a t t r a c t o r networks . . . . . . . . . . . . . . . . . . . . . . .
641
C o r r e l a t i o n and response functions
646
4.1.
...................................
Fluctuation--dissipation theorems
.................................
646
4.2. Example: simple a t t r a c t o r networks with binary neurons . . . . . . . . . . . . . . . . . . .
650
4.3. Example: graded response neurons with uniform synapses
653
..................
Dynamics in the complex regime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
655
5.1. Overview of methods and theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
655
5.2. Generating functional analysis for binary neurons
.......................
659
5.3. Parallel dynamics Hopfield model near saturation
.......................
667
5.4. Extremely diluted a t t r a c t o r networks near saturation 6.
623
2.2. Application to separable a t t r a c t o r networks . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1. Closed macroscopic laws
5.
.......................
623
Epilogue
...................................................
Acknowledgements References
.............................................
.....................................................
620
.....................
675 682 683 683
1. Introduction
This paper, on solving the dynamics of recurrent neural networks using nonequilibrium statistical mechanical techniques, is the sequel of[l], which was devoted to solving the statics using equilibrium techniques. I refer to [1] for a general introduction to recurrent neural networks and their properties. Equilibrium statistical mechanical techniques can provide much detailed quantitative information on the behavior of recurrent neural networks, but they obviously have serious restrictions. The first one is that, by definition, they will only provide information on network properties in the stationary state. For associative memories, for instance, it is not clear how one can calculate quantities like sizes of domains of attraction without solving the dynamics. The second, and more serious, restriction is that for equilibrium statistical mechanics to apply the dynamics of the network under study must obey detailed balance, i.e. absence of microscopic probability currents in the stationary state. As we have seen in [1], for recurrent networks in which the dynamics take the form of a stochastic alignment of neuronal firing rates to postsynaptic potentials which, in turn, depend linearly on the firing rates, this requirement of detailed balance usually implies symmetry of the synaptic matrix. From a physiological point of view this requirement is clearly unacceptable, since it is violated in any network that obeys Dale's law as soon as an excitatory neuron is connected to an inhibitory one. Worse still, we saw in [1] that in any network of graded-response neurons detailed balance will always be violated, even when the synapses are symmetric. The situation will become even worse when we turn to networks of yet more realistic (spike-based) neurons, such as integrate-andfire ones. In contrast to this, nonequilibrium statistical mechanical techniques, it will turn out, do not impose such biologically nonrealistic restrictions on neuron types and synaptic symmetry, and they are consequently the more appropriate avenue for future theoretical research aimed at solving biologically more realistic models. The common strategy of all nonequilibrium statistical mechanical studies is to derive and solve dynamical laws for a suitable small set of relevant macroscopic quantities from the dynamical laws of the underlying microscopic neuronal system. In order to make progress, as in equilibrium studies, one is initially forced to pay the price of having relatively simple model neurons, and of not having a very complicated spatial wiring structure in the network under study; the networks described and analyzed in this paper will consequently be either fully connected, or randomly diluted. When attempting to obtain exact dynamical solutions within this class, one then soon finds a clear separation of network models into two distinct complexity classes, reflecting in the dynamics a separation which we also found in the statics. In statics one could get away with relatively simple mathematical techniques as long as the number of attractors of the dynamics was small compared to the number N of 621
622
A.C.C. Coolen
neurons. As soon as the number of attractors became of the order of N, on the other hand, one entered the complex regime, requiring the more complicated formalism of replica theory. In dynamics we will again find that we can get away with relatively simple mathematical techniques as long as the number of attractors remains small, and find closed deterministic differential equations for macroscopic quantities with just a single time argument. As soon as we enter the complex regime, however, we will no longer find closed equations for one-time macroscopic objects: we will now have to work with correlation and response functions, which have two time arguments, and turn to the less trivial generating functional techniques. 1 In contrast to the situation in statics [1], I cannot in this paper give many references to textbooks on the dynamics, since these are more or less nonexistent. There would appear to be two reasons for this. Firstly, in most physics departments nonequilibrium statistical mechanics (as a subject) is generally taught and applied far less intensively than equilibrium statistical mechanics, and thus the nonequilibrium studies of recurrent neural networks have been considerably less in number and later in appearance in literature than their equilibrium counterparts. Secondly, many of the popular textbooks on the statistical mechanics of neural networks were written around 1989, roughly at the point in time where nonequilibrium statistical mechanical studies just started being taken up. When reading such textbooks one could be forgiven for thinking that solving the dynamics of recurrent neural networks is generally ruled out, whereas, in fact, nothing could be further from the truth. Thus the references in this paper will, out of necessity, be mainly to research papers. I regret that, given constraints on page numbers and given my aim to explain ideas and techniques in a lecture notes style (rather than display encyclopedic skills), I will inevitably have left out relevant references. Another consequence of the scarce and scattered nature of the literature on the nonequilibrium statistical mechanics of recurrent neural networks is that a situation has developed where many mathematical procedures, properties and solutions are more or less known by the research community, but without there being a clear reference in literature where these were first formally derived (if at all). Examples of this are the fluctuation-dissipation theorems (FDTs) for parallel dynamics and the nonequilibrium analysis of networks with graded response neurons; often the separating boundary between accepted general knowledge and published accepted general knowledge is somewhat fuzzy. The structure of this paper mirrors more or less the structure of[l]. Again I will start with relatively simple networks, with a small number of attractors (such as systems with uniform synapses, or with a small number of patterns stored with Hebbian-type rules), which can be solved with relatively simple mathematical techniques. These will now also include networks that do not evolve to a stationary
A brief note about terminology: strictly speaking, in this paper we will apply these techniques only to models in which time is measured in discrete units, so that we should speak about generating functions rather than generating functionals. However, since these techniques can and have also been applied intensively to models with continuous time, they are in literature often referred to as generating functional techniques, for both discrete and continuous time.
Statistical mechanics of recurrent neural networks H - dynamics
623
state, and networks of graded response neurons, which could not be studied within equilibrium statistical mechanics at all. Next follows a detour on correlation and response functions and their relations (i.e. FDTs), which serves as a prerequisite for the last section on generating functional methods, which are indeed formulated in the language of correlation and response functions. In this last, more mathematically involved, section I study symmetric and nonsymmetric attractor neural networks close to saturation, i.e. in the complex regime. I will show how to solve the dynamics of fully connected as well as extremely diluted networks, emphasizing the (again) crucial issue of presence (or absence) of synaptic symmetry, and compare the predictions of the (exact) generating functional formalism to both numerical simulations and simple approximate theories. 2. Attractor neural networks with binary neurons
The simplest nontrivial recurrent neural networks consist of N binary neurons ~; c { - 1 , 1} (see [1]) which respond stochastically to postsynaptic potentials (or local fields) hi(e), with 6 = (Cyl,..., CYN). The fields depend linearly on the instantaneous neuron states, h;(6) = Y~'~yJijcyj+ 0;, with the Jij representing synaptic efficacies, and the 0; representing external stimuli and/or neural thresholds.
2.1. Closed macroscopic laws for sequential dynamics First I show how for sequential dynamics (where neurons are updated one after the other) one can calculate, from the microscopic stochastic laws, differential equations for the probability distribution of suitably defined macroscopic observables. For mathematical convenience our starting point will be the continuous-time master equation for the microscopic probability distribution pt(6) pt(6)
-
Z{wi(Fi.)pt(Fia) i
- wi(")pt(a)},
Wi(6) -- ~[l -- (5"i tanh[[3hi(~)]]
(1) with f/~(6) = ( I ) ( ( y l , . . . , ( Y i - 1 , - o ' i , o ' i + l , . . . , (YN) (see [1]). I will discuss the conditions for the evolution of these macroscopic state variables to become deterministic in the limit of infinitely large networks and, in addition, be governed by a closed set of equations. I then turn to specific models, with and without detailed balance, and show how the macroscopic equations can be used to illuminate and understand the dynamics of attractor neural networks away from saturation.
2.1.1. A toy model Let me illustrate the basic ideas with the help of a simple (infinite range) toy model:
Jij = (J/N)qi~j and 0i = 0 (the variables q/ and ~; are arbitrary, but may not depend on N). For rl; = ~; = 1 we get a network with uniform synapses. For qg = ~,i c { - 1 , 1} and J > 0 we recover the Hopfield [2] model with one stored pattern. Note: the synaptic matrix is nonsymmetric as soon as a pair (ij) exists such
A.C.C. Coolen
624
that flyby ~ qj~i, so in general equilibrium statistical mechanics will not apply. The local fields become h i ( o ) - Jqim(u) with m(u) _ 1 ~-]~k~kCrk 9Since they depend on the microscopic state u only through the value of m, the latter quantity appears to constitute a natural macroscopic level of description. The probability density of finding the macroscopic state m(u) = m is given by ~t[m] = ~ , p t ( u ) 8 [ m - m(u)]. Its time derivative follows upon inserting (1): dt ~@t[m] - Z ~
_
d dm
~--~pt(u)w,(u) 8 m - m(u) +-~,cyk
- 5[m - m(u)]
k=l
Z p , ( u ) 8 [ m _ m ( u ) ] ~ Z ~kcrkwk(u) ~ k=l
+ (9
Inserting our expressions for the transition rates wi(u) and the local fields h/(u) gives:
dt
d{ [ ,N
~
~t[m] m - -~ Z
~* tanh[rlklk/m]
]}
+ (9(N -1).
k=l
In the limit N ~ ~ only the first term survives. The general solution of the resulting Liouville equation is ~@,[m] = fdmo~o[mo]8[m-m(t[mo)], where m(t[mo) is the solution of d 1 N dt m -- lim --~-~ ~k tanh[rlkl3dm] - m, N~ocN k=l
m(0) = m0.
(2)
This describes deterministic evolution; the only uncertainty in the value of m is due to uncertainty in initial conditions. If at t - 0 the quantity m is known exactly, this will remain the case for finite time-scales; m turns out to evolve in time according
to (2).
2.1.2. Arbitrary synapses Let us now allow for less trivial choices of the synaptic matrix {Jij} and try t o calculate the evolution in time of a given set of macroscopic observables f~(u) = (f~l(u),..., f~n(u)) in the limit N ~ c~. There are no restrictions yet on the form or the number n of these state variables; these will, however, arise naturally if we require the observables f~ to obey a closed set of deterministic laws, as we will see. The probability density of finding the system in macroscopic state ~ is given by: ~@t[f~]- ~--~,pt(u)8[~- a(u)].
(3)
Its time derivative is obtained by inserting (1). If in those parts of the resulting expression which contain the operators F~ we perform the transformations u ~ Fiu, we arrive at
Statistical mechanics of recurrent neural networks H - dynamics
d
d~ ~t[n] -
E Pt(6)wi("){8[n - n(F/~r)]-
Z i
6
625
8[n -
n(~r)]}.
Upon writing ~.(Fi6) = ~ . ( 6 ) + Ai~t(6) and making a Taylor expansion in powers of {Ai.(6)}, we finally obtain the so-called Kramers-Moyal expansion:
g-------~.
at ~ t [ n ] - Z g ~> 1
It
involves
"'" ~q=l
conditional
8f~.,
averages
(f(6))n; t
Ajar(") = nB(Fj~r ) -- n~t(@): 2
K(
[n. t] -
(
wj(
8f~.e ~t[n]F([)..,e[n; t ] .
(4)
pe-1
)Aj.l ( - ) ' "
)
and
the
'discrete
derivatives'
f~;t
(s)
(f(6))n: t = E . p t ( 6 ) 8 [ f ~ - n ( 6 ) ] f ( 6 ) E pt(.)6[n- n(.)] "
Retaining only the g = 1 term in (4) would lead us to a Liouville equation, which describes deterministic flow in g~ space. Including also the g = 2 term leads us to a Fokker-Planck equation which, in addition to flow, describes diffusion of the macroscopic probability density. Thus a sufficient condition for the observables g~(6) to evolve in time deterministically in the limit N --+ oo is:
,im >~ 1 ~ g ,.. _
"
Pl=l
~_.~ x
- o.
(6)
pg=l j = l
In the simple case where all observables f~, scale similarly in the sense that all 'derivatives' A j , - f ~ , ( F i 6 ) - f~,(6) are of the same order in N (i.e. there is a monotonic function/~N such that Aj, -- (9(/~S) for all jt-t), for instance, criterion (6) becomes: lim n~NX/~
N---+oo
= 0.
(7)
If for a given set of observables condition (6) is satisfied we can for large N describe the evolution of the macroscopic probability density by a Liouville equation:
~=1
Expansion (14) is to be interpreted in a distributional sense, i.e. only to be used in expressions of the form f df~ ~ t ( ~ ) G ( ~ ) with smooth functions G(f~), so that all derivatives are well-defined and finite. Furthermore, (4) will only be useful if the Aj,, which measure the sensitivity of the macroscopic quantities to single neuron state changes, are sufficiently small. This is to be expected: for finite N any observable can only assume a finite number of possible values; only for N ~ oc may we expect smooth probability distributions for our macroscopic quantities.
626
A.C.C. Coolen
whose solution describes deterministic flow: ~t[~] = f d~0~0[~o]5[l'~- ~(t]~o)] with ~(t[~o) given, in turn, as the solution of d ~(t) - r ( l ) [ ~ ' ~ ( / ) ; t] dt
n(0) = n0.
(8)
In taking the limit N ~ ~ , however, we have to keep in mind that the resulting deterministic theory is obtained by taking this limit for finite t. According to (4) the g > 1 terms do come into play for sufficiently large times t; for N ~ ~ , however, these times diverge by virtue of (6).
2.1.3. The issue of closure Eq. (8) will in general not be autonomous; tracing back the origin of the explicit time dependence in the right-hand side of (8) one finds that to calculate F (1) one needs to know the microscopic probability density pt(a). This, in turn, requires solving Eq. (1) (which is exactly what one tries to avoid). We will now discuss a mechanism via which to eliminate the offending explicit time dependence, and to turn the observables ~ ( a ) into an autonomous level of description, governed by closed dynamic laws. The idea is to choose the observables ~ ( a ) in such a way that there is no explicit time dependence in the flow field F(~)[~; t] (if possible). According to (5) this implies making sure that there exist functions ~ [ ~ ] such that N
lim ~
N--- vc
wj(a)Aj~(a) - ~,[~(a)]
(9)
j=l
in which case the time dependence of F (~) indeed drops out and the macroscopic state vector simply evolves in time according to: d
dt ~ - ~[1"~],
~[~'~] = ((I) 1 [ ~ ' ~ ] , . . . ,
(I)n[~"~]).
Clearly, for this closure method to apply, a suitable separable structure of the synaptic matrix is required. If, for instance, the macroscopic observables ~ depend linearly on the microscopic state variables a (i.e. ~ ( a ) - ~ ~-~'~x_1C%jOj), we obtain with the transition rates defined in (1): d _f l y _
dt
lim 1 N N----~vc N Z j=l
c%j tanh([3hj(a)) - ~
(10)
in which case the only further condition for (9) to hold is that all local fields hk(a) must (in leading order in N) depend on the microscopic state a only through the values of the observables ~; since the local fields depend linearly on a this, in turn, implies that the synaptic matrix must be separable: if Jij = Y-]~Ki, c%j then indeed hi(o) = ~-2~K i ~ ( a ) + Oi. Next I will show how this approach can be applied to
Stat&tical mechanics of recurrent neural networks H - dynamics
627
networks for which the matrix of synapses has a separable form (which includes most symmetric and nonsymmetric Hebbian type attractor models). I will restrict myself to models with 0 / = 0; introducing nonzero thresholds is straightforward and does not pose new problems.
2.2. Application to separable attractor networks 2.2.1. Separable models: description at the level of sublattice activities We consider the following class of models, in which the interaction matrix has the form
1
Jij - ~ Q(~i; ~j),
~i - ( ~ , . . . , ~P).
(11)
The components ~ , representing the information ('patterns') to be stored or processed, are assumed to be drawn from a finite discrete set A, containing nA elements (they are not allowed to depend on N). The Hopfield model [2] corresponds to choosing Q(x; y ) = x . y and A = { - 1 , 1}. One now introduces a partition of the system {1,... ,N} into rflA so-called sublattices In:
{1,...,N}-UIn,
In-{i]{ i-R},
11cA p.
(12)
n
The number of neurons in sublattice I n is denoted by ]In[ (this number will have to be large). If we choose as our macroscopic observables the average activities ('magnetisations') within these sublattices, we are able to express the local fields hk solely in terms of macroscopic quantities:
1
mn(~) - - ~ ] i~ci cyi,
hk(~) - ZPnQ({k;q)mnn
(13)
with the relative sublattice sizes Pn = ]In]IN. If all Pn are of the same order in N (which, for example, is the case if the vectors {i have been drawn at random from the set Ap) we may write Ayn = Cg(nPAN - l ) and use (7). The evolution in time of the sublattice activities is then found to be deterministic in the N ~ e~ limit if limx~p/logN = 0. Furthermore, condition (9) holds, since ~
wj(~)Ajn(~ ) - tanh ~ ~-~Pn'O(ll; II')mn'
j=l
- ran.
n'
We may conclude that the situation is that described by (10), and that the evolution in time of the sublattice activities is governed by the following autonomous set of differential Eqs. [3]:
d
[
]
dt mn - tanh ]3Z p n , Q(ll; II')m n, - m n !!'
(14)
A.C.C. Coolen
628
We see that, in contrast to the equilibrium techniques as described in [1], here there is no need at all to require symmetry of the interaction matrix or absence of selfinteractions. In the symmetric case Q ( x ; y ) = Q(y;x) the system will approach equilibrium; if the kernel Q is positive definite this can be shown, for instance, by inspection of the Lyapunov function 3 5t'{mn}: ~c~a{mn} - 2l ZPnmnQ(R;nn'
R')mn'Pn' - -~1Zpnlogcosh[~y-~Q(q;q,)mn,pn,] n n'
which is bounded from below and obeys: dso__ dt
Z
n ct-tmn Q(R;
)
n' ~ m n '
<~ O.
(15)
nn'
Note that from the sublattice activities, in turn, follow the 'overlaps' mr(g ) (see [1]): 1
N
mr(g) - -N Z ~i (yi ZPnq.mn" -
(16)
-
i:1
n
Simple examples of relevant models of the type (11), the dynamics of which are for large N described by Eq. (14), are for instance the ones where one applies a nonlinear o p e r a t i o n , to the standard Hopfield-type [2] (or Hebbian-type) interactions. This nonlinearity could result from e.g. a clipping procedure or from retaining only the sign of the Hebbian values: Jij
-
l (
-~*
)
Z ta la " bt <~p
e.g. (I)(x)-
-K
for
x ~
x
for
-K<x
K
for
x>>.K
or
O(x) - sgn(x).
The effect of introducing such nonlinearities is found to be of a quantitative nature, giving rise to little more than a re-scaling of critical noise levels and storage capacities. I will not go into full details, these can be found in e.g. [4], but illustrate this statement by working out the p = 2 equations for randomly drawn pattern bits ~ c { - 1 , 1}, where there are only four sublattices, and where Pn = 1 for all 11. Using (I)(0) = 0 and (I)(-x) = -(I)(x) (as with the above examples) we obtain from (14): dtmn - tanh
13(I)(2)(mn - m_n) - m n.
(17)
Here the choice made for ~(x) shows up only as a rescaling of the temperature. From (17) we further obtain ~ (m n + re_n) - - ( m n + m_n). The system decays exA function of the state variables which is bounded from below and whose value decreases monotonically during the dynamics, see e.g. [5]. Its existence guarantees evolution towards a stationary state (under some weak conditions).
Statistical mechanics of recurrent neural networks H - dynamics
629
ponentially towards a state where, according to (16), mn = -m_n for all 11. If at t = 0 this is already the case, we find (at least for p - 2) decoupled equations for the sublattice activities.
2.2.2. Separable models." description at the level of overlaps Equations (14) and (16) suggest that at the level of overlaps there will be, in turn, closed laws if the kernel Q is bilinear: 4, Q(x; y) - ~-]~vx~A~vyv, or: Jij - ~
~/~A~v~.,
~i = (~],.-., ~P).
(18)
pv=l We will see that now the ~ need not be drawn from a finite discrete set (as long as they do not depend on N). The Hopfield model corresponds to A~v = ~v and ~/~ c { - l , 1}. The fields hk can now be written in terms of the overlaps m~" 1
hk(6) -- ~k. Am(a),
m-
(ml,...,mp),
N
m~(6)-~Z~icYi.
(19)
i=1
For this choice of macroscopic variables we find Aj, = (9(N-1), so the evolution of the vector m becomes deterministic for N ~ ~ if, according to (7), l i m u ~ p / V ~ = 0. Again (9) holds, since N
Z
1
N
wj(6)Aj~ (6) - ~ Z
j--1
{k tanh[13{k" Am] - m.
k=l
Thus the evolution in time of the overlap vector m is governed by a closed set of differential equations: ~m
-
({ tanh[13{ 9Am]){ - m,
(~({)){ -
d{ p({)~({)
(20)
with p({) = limN~o~ N -~ Y~'~i~[{ - {i]. Symmetry of the synapses is not required. For certain nonsymmetric matrices A one finds stable limit-cycle solutions of (20). In the symmetric case Ag~ = A~ the system will approach equilibrium; the Lyapunov function (15) for positive definite matrices A now becomes: 5~
= ~1m . Am - ~1 (log cosh[[3~. Am]) ~.
Fig. 1 shows in the ml,mz-plane the result of solving the macroscopic laws (20) numerically for p - 2, randomly drawn pattern bits ~ E { - 1 , 1}, and two choices of the matrix A. The first choice (upper row) corresponds to the Hopfield model; as the noise level T = 13-1 increases the amplitudes of the four attractors (corresponding to the two patterns ~ and their mirror images - ~ ) continuously decrease, until at the 4
Strictly speaking, it is already sufficient to have a kernel which is linear in y only, i.e. Q(x; y) = Y'~vfv(x)y~.
630
A.C.C. Coolen
T=0.1
T=0.6
i
li
T=1.1
w ~~~
~
1
0
-1
0
1 1
-1
0
0
~
-1
~
~
0
Trt!
Fig. 1.
1
~
~
1
I
I
I
I
i
I
i
]
I
I
I
I
~
I
i
I
1
i
i
i
i
9
-1 -1
I
0
-1
1
I
! 1 - -1
0 '071,1
w'tj
1
Flow diagrams obtained by numerically solving Eq. (20) for p = 2. Upper row: /
\
A~v = 5~v (the Hopfield model); lower row: A = (1_1
11) (here the critical noise level is
\
/
Tc = 1).
N= 1000 1
'v
~
.
.
,.,
,
,
,
|
-1
I
I
t \l
I
0 17"1. t
Fig. 2.
1,",
,
,
,
,~
I
I
,
I --1
1
,
,
,
,
1
0I
0
0
--1
Theory
N=3000 ,
-1
0
-1
I
-1
i
I
,,1
i
i
i
Ii
0
17t~
Comparison between simulation results for finite systems (N - 1000 and N = 3000) /
and the N = ~ analytical prediction (20), for p = 2, T = 0.8 and A = ( 1 \ -1
N
I)
1/"
critical noise level Tc = 1 (see also [1]) they merge into the trivial attractor in = (0, 0). The second choice corresponds to a nonsymmetric model (i.e. without detailed balance); at the macroscopic level of description (at finite time scales) the system clearly does not a p p r o a c h equilibrium; macroscopic order now manifests itself in the form of a limit-cycle (provided the noise level T is below the critical value Tc = 1 where this limit-cycle is destroyed). To what extent the laws (20) are in agreement with the result of performing the actual simulations in finite systems is illustrated in Fig. 2. Other examples can be found in [6,7].
Statistical mechanics of recurrent neural networks H - dynamics
50[_lllllll::
631
I I ,~l~,,i,,,
40
30--
20--
lo ~-
0
.2
.4
.6
.8
1
T
Fig. 3. Asymptotic relaxation times ~, of the mixture states of the Hopfield model as a function of the noise level T - 13-1. From bottom to top'n = 1,3,5,7,9, 11, 13.
As a second simple application of the flow Eq. (20) we turn to the relaxation times corresponding to the attractors of the Hopfield model (where A~v = 5~v). Expanding (20) near a stable fixed-point m*, i.e. r e ( t ) = m* + x(t) with Ix(t)l << 1, gives the linearized equation d d t X ~ - ~-~ [ [ 3 ( ~ tanh[[~- m*])~- 8~]x~ + C9(x2).
(21)
V
The Jacobian of (20), which determines the linearized Eq. (21), turns out to be minus the curvature matrix of the free energy surface at the fixed-point (c.f. the derivations in [1]). The asymptotic relaxation towards any stable attractor is generally exponential, with a characteristic time ~ given by the inverse of the smallest eigenvalue of the curvature matrix. If, in particular, for the fixed point m* we substitute an n-mixture state, i.e. m~ = m, (It ~< n) and m~ = 0 (It > n), and transform (21) to the basis where the corresponding curvature matrix D (') (with eigenvalues D~) is diagonal, x ~ ~, we obtain ~ (t) - ~ (0)e-tD~ + . . . so z - 1 - min~ D~, which we have already calculated (see [1]) in determining the character of the saddle-points of the free-energy surface. The result is shown in Fig. 3. The relaxation time for the n-mixture attractors decreases monotonically with the degree of mixing n, for any noise level. At the transition where a macroscopic state m* ceases to correspond to a local minimum of the free energy surface, it also destabilizes in terms of the linearized dynamic Eq. (21) (as it should). The Jacobian develops a zero eigenvalue, the relaxation time diverges, and the long-time behavior is no longer
A.C.C. Coolen
632
obtained from the linearized equation. This gives rise to critical slowing down (power law relaxation as opposed to exponential relaxation). For instance, at the transition temperature Tc = 1 for the n = 1 (pure) state, we find by expanding (20): d [~ 2 _ m 2 ] + C ( m 5) dt m~ - m, m, which gives rise to a relaxation towards the trivial fixed-point of the form m ~ t- 89 If one is willing to restrict oneself to the limited class of models (18) (as opposed to the more general class (11)) and to the more global level of description in terms o f p overlap parameters m, instead of rflA sublattice activities mn, then there are two rewards. Firstly there will be no restrictions on the stored pattern components ~ (for instance, they are allowed to be real-valued); secondly the number p of patterns stored can be much larger for the deterministic autonomous dynamical laws to hold (p << x/~ instead ofp << log N, which from a biological point of view is not impressive.
2.3. Closed macroscopic laws for parallel dynamics We now turn to the parallel dynamics counterpart of (1), i.e. the Markov chain .I~l 1 [1 + ~!
(3"i
tanh[13hi(gl)]]
(22)
"___
(with cyi C {-1, 1}, and with local fields hi(g) defined in the usual way). The evolution of macroscopic probability densities will here be described by discrete mappings, instead of differential equations.
2.3.1 The toy model Let us first see what happens to our previous toy model: Jij - (J/N)rli~j and 0i - 0. As before we try to describe the dynamics at the (macroscopic) level of the quantity m(~) = ~ ~-~'-k~kcYk 9The evolution of the macroscopic probability density ~@t[m] is obtained by inserting (22): ~t+l [m] - Z
~i[m - m(o)] W[~; ~']pt(~') - I dm' ~[m,m']~t[m']
(23)
,/
with W,[m,
m'] --
-
-
m(,,')]
~-~',,, ~5[m' - m(~')]pt(~') We now insert our expression for the transition probabilities W[~; ~'] and for the local fields. Since the fields depend on the microscopic state ~ only through m(~), the distribution pt(~) drops out of the above expression for Wt which thereby loses its explicit time dependence, ~ Ira, m'] ~ Wire, m']"
with (...)~ - 2 - s ~-'~... fY
Statistical mechanics of recurrent neural networks H - dynamics
633
Inserting the integral representation for the ~-function allows us to perform the average:
tp _ i~km + (log cosh ~[Jqm' - ik~])q,~ - (log cosh ]3[Jrlm'l)q. Since Wire, m'] is (by construction) normalized, f dm/~[m, m ' ] - 1, we find that for N ~ e~ the expectation value with respect to W[m, m'] of any sufficiently smooth function f ( m ) will be determined only by the value m*(m') of m in the relevant saddle-point of W:
dm f (m)
wire, m'] -
f dm dk f (m) eN~(m,m',k) fdmdkeN~e(m,m,,k ) ~ f(m*(m'))
(N ---+oe).
Variation of 9 with respect to k and m gives the two saddle-point equations: m - (~ tanh ~3[Jqm'- ~k])n,~,
k - 0.
We may now conclude that l i m x ~ W[m,m'] = 8[m-m*(m')] (~ tanh(13Jrlm'))q,~, and that the macroscopic Eq. (23) becomes: ~t+l[m]- /am'
film-(~tanh(~Jqm'))q~]~t[m']
with m*(m')=
(N --~ c~).
This describes deterministic evolution. If at t = 0 we know m exactly, this will remain the case for finite time scales, and m will evolve according to a discrete version of the sequential dynamics law (2):
mt+l - (~ tanh[13Jrlmt])q,~
(24)
2.3.2. Arbitrary synapses We now try to generalize the above approach to less trivial classes of models. As for the sequential case we will find in the limit N ~ ~ closed deterministic evolution equations for a more general set of intensive macroscopic state variables ~ ( a ) = f~l ( a ) , . . . , Dn(a) if the local fields hi(a) depend on the microscopic state a only through the values of ~ ( a ) , and if the number n of these state variables necessary to do so is not too large. The evolution of the ensemble probability density (3) is now obtained by inserting the Markov Eq. (22):
~,+, [sal - f an' wt[a, a']~,[a']
(25)
~[Sa, Sa'] -- E~176~[Sa- sa(.)]~[a' - a(~')] rv[,,;.']p,(~') E . , ~[sa' - a(~,)lp,(~,) = (3[~ - ~(a)]( eE;[f~ihi("')-l~176
(26)
A.C.C. Coolen
634
with (...)~ - 2 -N ~-'~..., and with the conditional (or sub-shell) average defined as in (5). It is clear from (26) that in order to find autonomous macroscopic laws, i.e. for the distribution pt(6) to drop out, the local fields must depend on the microscopic state 6 only through the macroscopic quantities f~(6)" hi(6)= hi[~(6)]. In this case ~ loses its explicit time dependence, ~[f~,f~'] ~ W[f~,~']. Inserting integral representations for the &functions leads to:
--i[3K 9~ + ~1l o g ( eP[~-]~;~/hi[n']-iNK.~-~(6)]), - ~1Z
logcosh[]3hi[~']]. i
Using the normalization f d ~ l~[f~, f ~ ' ] - 1, we can write expectation values with respect to 1~[s f~'] of macroscopic quantities f[~] as /d~f[~]
1~[~, a'] : f d ~ dK f[~] e Nv(n'n''K) f d ~ dK eNV(n'n'.K)
(27)
For saddle-point arguments to apply in determining the leading order in N of (27), we encounter restrictions on the number n of our macroscopic quantities (as expected), since n determines the dimension of the integrations in (27). The restrictions can be found by expanding qJ around its maximum q~*. After defining x = (~, K), of dimension 2n, and after translating the location of the maximum to the origin, one has V(x)
=
1 - -2 Z X~XvH~v + E XpXvXpLpvp + (9(x4) pv
pvp
giving fdxg(x)e Nv(x) fdxeNV(x) -o(o) f d x [g(x) - g(0)] e x p ( - 1 N x
fdxexp(- 89
9Hx + N ~-'~r,vpx~xvxpLm, p + (9 (Nx4))
Hx + N~,,wx~x,~xpL~v o + (9(Nx4))
f dy [g(y/x/~) - g(0)] exp (- 89 y. Hy + y'~,vpy~,yvypL~,vp/ ~ + (9 (y4/N)) f dy exp( - l y . Hy + E,vpy, yvyoL,vp/ x/~ + (9(y4/N))
fdyexp(- 89
[1 + ~-~pvoy~yvyoL~vo/v/N+ (9(y6/N)]
= (9(n2/N) + (9(n4/N 2) + nondominant terms, (N,n ~ c~) with H denoting the Hessian (curvature) matrix of the surface 9 at the minimum q~*. We thus find
Statistical mechanics of recurrent neural networks H - dynamics
lim n/v/N = 0:
N---~
635
f
lim [ d~f[~]/,V[~, ~'] - f [ ~ * (f~')]
N--+cx~ j
where 12"(12') denotes the value of 12 in the saddle-point where qJ is minimized. Variation of ~P with respect to f~ and K gives the saddle-point equations:
-- I~-~(~) e~[~-~i(yihi[~u162 ( e~[~-~i (~ihi[~t]-iNK'~-~(ff)]).
K=O.
We may now conclude that limN_~ W[~, ~'] = 8 [ ~ - ~* (~')], with
=
( e ~ ~-~i (~Yihi[~t]>
and that for N ~ ec the macroscopic Eq. (25) becomes ~t+1[~2] : f df~' 8[s ~2" (~')]5~t[~2']. This relation again describes deterministic evolution. If at t = 0 we know s exactly, this will remain the case for finite time scales and ~ will evolve according to ~"~(t + 1)
-- I~-~(~) e~ ~i (~ihi[~'~(t)]lr~
(28)
As with the sequential case, in taking the limit N ~ c~ we have to keep in mind that the resulting laws apply to finite t, and that for sufficiently large times terms of higher order in N do come into play. As for the sequential case, a more rigorous and tedious analysis shows that the restriction n/x/N-~ 0 can in fact be weakened to n/N ~ O. Finally, for macroscopic quantities s which are linear in ~, the remaining ~-averages become trivial, so that [8]: 1
f2~ (~) - ~ Z
c%icri"
f2~ (t + 1) - Nlim ~ N ~1
i
o3~itanh[~hi{s
(29)
i
(to be compared with (10), as derived for sequential dynamics).
2.4. Application to separable attractor networks 2.4.1. Separable models: sublattice activities and overlaps The separable attractor models (11), described at the level of sublattice activities (13), indeed have the property that all local fields can be written in terms of the macroscopic observables. What remains to ensure deterministic evolution is meeting the condition on the number of sublattices. If all relative sublattice sizes p~ are of the same order in N (as for randomly drawn patterns) this condition again translates into l i m N ~ p/log N = 0 (as for sequential dynamics). Since the sublattice activities are linear functions of the or/, their evolution in time is governed by Eq. (29), which acquires the form:
A.C.C. Coolen
636
u=0.75
v=l.0
l
u=0.5
"
u=0.25
'
i
u=0.0
/
I
I
I
I
I
I
.6 .4
I
o
I
20
40
60
80
I00
t (iterations)
Fig. 4. Evolution of overlaps m,(6), obtained by numerical iteration of the macroscopic parallel dynamics laws (31), for the synapses Ji9 = v-~Y~ ~i~~,j~ + - l~v ~ ~i+l ~ , with p = 10 and T = 0.5. (30) As for sequential dynamics, symmetry of the interaction matrix does not play a role. At the more global level of overlaps m~(6)= N -1 ~ ; ~ , ~ i we, in turn, obtain autonomous deterministic laws if the local fields hi(6) can be expressed in terms if m(6) only, as for the models (18) (or, more generally, for all models in which the interactions are of the form Jij - ~ <~p f ~ ) , and with the following restriction on the number p of embedded patterns: l i m u - ~ p / v ~ = 0 (as with sequential dynamics). For the bilinear models (18), the evolution in time of the overlap vector m (which depends linearly on the cyi) is governed by (29), which now translates into the iterative map: m(t + 1) = ({ tanh[]3{ 9Am(t)]){
(31)
with p({) as defined in (20). Again symmetry of the synapses is not required. For parallel dynamics it is far more difficult than for sequential dynamics to construct Lyapunov functions, and prove that the macroscopic laws (31) for symmetric systems evolve towards a stable fixed-point (as one would expect), but it can still be done. For nonsymmetric systems the macroscopic laws (31) can in principle display all the interesting, but complicated, phenomena of nonconservative nonlinear systems. Nevertheless, it is also not uncommon that the Eq. (31) for nonsymmetric systems can be mapped by a time-dependent transformation onto the equations for related symmetric systems (mostly variants of the original Hopfield model). As an example we show in Fig. 4 as functions of time the values of the overlaps {m~} for p = 10 and T = 0.5, resulting from numerical iteration of the macroscopic laws (31) for the model
Statistical mechanics of recurrent neural networks H - dynamics
Jij
=
v NZ g
gg ~i~J-'~-
1- v ~-+1 ~'~ N Z"
637
(g'modp)
g
i.e. A)~p = vS)~p -q- (1 - V)8)~,p+I ( ~ , p : mod p), with randomly drawn pattern bits ~/~ c { - 1 , 1}. The initial state is chosen to be the pure state mo - 8~,1. At intervals of At = 20 iterations the parameter v is reduced in Av = 0.25 steps from v = 1 (where one recovers the symmetric Hopfield model) to v = 0 (where one obtains a nonsymmetric model which processes the p embedded patterns in strict sequential order as a period-p limit-cycle). The analysis of Eq. (31) for the pure sequence processing case v = 0 is greatly simplified by mapping the model onto the ordinary (v = 1) Hopfield model, using the index permutation symmetries of the present pattern distribution, as follows (all pattern indices are periodic, rood p). Define mo(t) = M._t(t), now
We can now immediately infer, in particular, that to each stable macroscopic fixedpoint attractor of the original Hopfield model corresponds a stable period-p macroscopic limit-cycle attractor in the v = 1 sequence processing model (e.g. pure states +--, pure sequences, mixture states ~ mixture sequences), with identical amplitude as a function of the noise level. Fig. 4 shows for v = 0 (i.e. t > 80) a relaxation towards such a pure sequence. Finally we note that the fixed-points of the macroscopic Eqs. (14) and (20) (derived for sequential dynamics) are identical to those of (30) and (31) (derived for parallel dynamics). The stability properties of these fixed points, however, need not be the same, and have to be assessed on a case-by-case basis. For the Hopfield model, i.e. Eqs. (20) and (31) with A~v = 8or, they are found to be the same, but already for Aov - - 8 ~ the two types of dynamics would behave differently. 3. Attractor neural networks with continuous neurons
3.1. Closed macroscopic laws 3.1.1. General derivation We have seen in [1] that models of recurrent neural networks with continuous neural variables (e.g. graded response neurons or coupled oscillators) can often be described by a Fokker-Planck equation for the microscopic state probability density d
02
0
~pt( ~ ) = I ~ ~ [Pt (~)~ (~)] + r ~ ~2i Pt(a)" i
(32)
i
Averages over pt(~r) are denoted by (G) = f d~rpt(a)G(~r, t). F r o m (32) one obtains directly (through integration by parts) an equation for the time derivative of averages:
A.C.C. Coolen
638
d dt ( a ) -
/SG)/~. ~ +
If " 838G) (g)+T~i
(33)
In particular, if we apply (33) to G(g, t) = 8 [ ~ - ~(g)], for any set of macroscopic observables ~(g) = (~l(g),..., f~,(g)) (in the spirit of Section 2), we obtain a dynamic equation for the macroscopic probability density P t ( ~ ) = ( 8 [ ~ - ~(g)]), which is again of the Fokker-Planck form: d
dtPt(~)--~~-~
~ 0 )}
Pt(~)~
+ T Z 8a, Sav
(~)+T~i
Pt(a)
~-~i~,(~)n;t
~ . ~-~/a,(a)
pv
av(-)
(34) ~;t
with the conditional (or subshell) averages: (a(g))n. ' = f d g p t ( g ) 8 [ a - a(g)]G(~) f d~pt(g)8[a - a(~)] "
(35)
From (34) we infer that a sufficient condition for the observables ~(~) to evolve in time deterministically (i.e. for having vanishing diffusion matrix elements in (34)) in the limit N ~ ec is lim
8 n~ (g) ~-~.
N---~9c
- O.
(36)
~:t
If (36) holds, the macroscopic Fokker-Planck Eq. (34) reduces for N ~ ec to a Liouville equation, and the observables ~(g) will evolve in time according to the coupled deterministic equations: at
o
N---, ~c
.
......, ~"~P( ~ )
(37)
~:t
The deterministic macroscopic Eq. (37), together with its associated condition for validity (36) will form the basis for the subsequent analysis. 3.1.2. Closure: a toy model again. The general derivation given above went smoothly. However, Eq. (37) are not yet closed. It turns out that to achieve closure even for simple continuous networks we can no longer get away with just a finite (small) number of macroscopic observables (as with binary neurons). This I will now illustrate with a simple toy network of graded response neurons: d dtUi(t) - Z
Jijg[uj(t)] - ui(t) + rli(t ) J
(38)
Statistical mechanics of recurrent neural networks H -
639
dynamics
with g[z]- 89 1] and with the standard Gaussian white noise hi(t) (see [1]). In the language of (32) this means f-(u) - }-]jJijg[uj] - ui. We choose uniform synapses Ji# - J / N , so J}(u) ~ (J/N) ~-~j g[uj] - ui. If (36) were to hold, we would find the deterministic macroscopic laws
d
/~. [J~. dt f~€ - N---~oclim . . g[uj]-
bli
8]8
_.t_T~iui ~u/f~,(u)
>
.
(39)
f~;t
In contrast to similar models with binary neurons, choosing as our macroscopic level of description f~(u) again simply the average re(u) = N -1 ~ i ui now leads to an equation which fails to close:
d
dt m - N---~oc lim J
g[uj]
>
-m.
m;t
The term N -1 ~7g[uj] cannot be written as a function of N -1 }-2~;ui. We might be tempted to try dealing with this problem by just including the offending term in our macroscopic set, and choose n ( u ) = (N -1 ~-~i ui, N-1 ~ig[ui]) 9This would indeed solve our closure problem for the m-equation, but we would now find a new closure problem in the equation for the newly introduced observable. The only way out is to choose an observable function, namely the distribution of potentials
p(u; u) - ~ ~
8[u - ui],
p(u) - (p(u; u)) -
8[u - ui] .
(40)
i
This is to be done with care, in view of our restriction on the number of observables: we evaluate (40) at first only for n specific values u~ and take the limit n --+ oc only after the limit N ~ oc. Thus we define f ~ , ( u ) - ~ ~ i 8[b/g- b/i], condition (36) reduces to the familiar expression l i m N ~ n / v ~ = 0, and we get for N ~ ec and n ---, e~ (taken in that order) from (39) a diffusion equation for the distribution of membrane potentials (describing a so-called 'time-dependent Ornstein-Uhlenbeck process' [9,10]):
d t P ( U ) - - - ~ u {8 P(U d
)
[j
/dulp(d
)g[u']
-u]
}+
~2
T
~ u2
P(u) .
(41)
The natural 5 solution of (41) is the Gaussian distribution
Or(u) - [2 rczZ(t)] -1 e - 89
(42)
in which 22 - IT + (222 _ T)e-Zt] l, and ~ evolves in time according to
d
dt~t - J
5
/
Dzg[~ + Zz] - ~,
(43)
For non-Gaussian initial conditions po(U) the solution of (41) would in time converge towards the Gaussian solution.
A. C. C. Coolen
640
#
p(s)
0
0
5 t
10
0
5 t
10
0
0.0
0.5 s
Fig. 5.
Dynamics of a simple network of N graded response neurons (38) with synapses and nonlinearity g[z]=~[1 +tanh(~,z)], for N ~ c ~ , ~ / = J = 1, and T E {0.25, 0.5, 1,2, 4}. Left: evolution of average membrane potential (u) - ~, with noise levels T increasing from top graph (T = 0.25) to bottom graph (T ~ 4). Middle: evolution of the width Y. of the membrane potential distribution, s 2 = (u2) - (u) ~, with noise levels decreasing from top graph (T = 4) to bottom graph (T = 0.25). Right: asymptotic (t = co) distribution of neural firing activities p(s) = (8[s - g[u]]), with noise levels increasing from the sharply peaked curve (T = 0.25) to the almost flat curve (T = 4).
Jij--J/N
(with Dz = (2n)- 89e-~-~dz). We can now also calculate the distribution p(s) of neuronal firing activities s i - g[ui] at any time"
j p(s)
-
_ du
p(u)5[s - g[u]]
F o r our choice g[z] = 89+ 89 nation with (42):
p(r fo ds'p(gin'[s']) we have ginV[s] = ~ l o g [ s / ( 1 -
s)], so in combi-
e x p ( - 8 9[(27)-' log[s/(1 - s)] - fi]z/E2) 0<s<
1"
p(s)
-
f) ds, e x p ( _ l [(27)_l log[s,/(l _ s,)] _ ~]2/Z2 )
.
(44)
The results of solving and integrating numerically (43) and (44) are shown in Fig. 5, for Gaussian initial conditions (42) with ~0 = 0 and ~0 = 1, and with parameters = J = 1 and different noise levels T. F o r low noise levels we find high average m e m b r a n e potentials, low m e m b r a n e potential variance, and high firing rates; for high noise levels the picture changes to lower average m e m b r a n e potentials, higher potential variance, and uniformly distributed (noise-dominated) firing activities. The extreme cases T = 0 and T = e~ are easily extracted from our equations. F o r T = 0 one finds Z ( t ) - s -t and ~ a - J g [ ~ ] ~. This leads to a final state where t~ - 1 j + 1 j t a n h [ ~ ] and where p(s) - 8 [ s - a/J]. F o r T = c~ one finds s = c~ (for any t > 0) and ~ 8 9 t~. This leads to a final state where ~ = l j and where p ( s ) = l for a l l 0 < s < 1. N o n e of the above results (not even those on the stationary state) could have been obtained within equilibrium statistical mechanics, since any n e t w o r k of connected graded response neurons will violate detailed balance [1]. Secondly, there
641
Statistical mechanics of recurrent neural networks H - dynamics
appears to be a qualitative difference between simple networks (e.g. Jzj - J / N ) of binary neurons versus those of continuous neurons, in terms of the types of macroscopic observables needed for deriving closed deterministic laws: a single number m = N -1 ~--~i (3"i versus a distribution p(cr) = N - 1 E i 8[O" -- O'i]. Note, however, that in the binary case the latter distribution would in fact have been characterized fully by a single number: the average m, since p(cy) - 89 [1 + m]8[cy - 1] + 1 [1 - m]8[cr + 1]. In other words: there we were just lucky.
3.2. Application to graded response attractor networks 3.2.1. Derivation of closed macroscopic laws I will now turn to attractor networks with graded response neurons of the type (38), in which p binary patterns ~,~ - (~,~,..., ~ ) c {-1, 1}x have been stored via separable Hebbian-type synapses (18): Jij -- (2/N) ~-'~lav=l P ~A~v~jv (the extra factor 2 is inserted for future convenience). Adding suitable thresholds 0 ; - - 8 9 ~2,jJij to the right-hand sides of (38), and choosing the nonlinearity g[z] - 89 + tanh[yz]) would then give us
d ui(t) - Z
d--t
r
1
gv
Z
r tanh[yuj(t)] - ui(t) + rli(t )
j
so the deterministic forces are f.(u) = N -a }-~'~vr ~ j r tanh[Tuj] - ui. Choosing our macroscopic observables ~(u) such that (36) holds, would lead to the deterministic macroscopic laws d dt n . -
lim ~
N--+ oc pv
A.v
( [ 1 j~. "
~j tanh[Tuj]
][~.
0 ]/ ~ ~ n~ (u)
~;t
As with the uniform synapses case, the main problem to be dealt with is how to choose the ~ ( u ) such that (45) closes. It turns out that the canonical choice is to turn to the distributions of membrane potentials within each of the 2p sublattices, as introduced in (12): I.
--
{il{ i - , } "
1
On(U;u) = ~ ~
8[u - ui],
On(u) -- {On(u; u))
(46)
with R E {-1, 1}P and limu+~ ]I,[/N = p,. Again we evaluate the distributions in (46) at first only for n specific values u u and send n + oc after N ~ oo. Now condition (36) reduces to l i m s _ ~ 2P/x/N = 0. We will keep p finite, for simplicity. Using identities such as ~ i . . . . Y~'~.~ie/. "-" and i_1 ~
i E I,i "
~/P.l(u;u) -- -II,
~[u-
~2
u;]
~2
b~ui2p,(u'u~ ) = II, I-~ ~ U 2 8[u- u;]
A.C.C. Coolen
642
we then obtain for N ~ ~ and n ~ ~ (taken in that order) from Eq. (45) 2p coupled diffusion equations for the distributions pq(u) of membrane potentials in each of the 2p sublattices Iq" dt pq(u) = - ~ u
pq(u) _ ~ rl~,A,v~pq, rl'~ lay= 1
du'pq,(u')tanh[Tu']-u
qt
~2
+ T~USu2pq(u ).
(47)
Eq. (47) is the basis for our further analysis. It can be simplified only if we make additional assumptions on the system's initial conditions, such as 6-distributed or Gaussian distributed pq(u) at t - 0 (see below); otherwise it will have to be solved numerically.
3.2.2. Reduction to the level of pattern overlaps It is clear that (47) is again of the time-dependent Ornstein-Uhlenbeck form, and will thus again have Gaussian solutions as the natural ones: p,.q(u) = [2rtE2(t)] --'-"e--'-'[u-oq(t)]-'/Y~,(')
(48)
in which 2 q ( t ) - IT + (22(0) - T)e-2t] ~-, and with the ~q(t) evolving in time according to
d
dt~q - ~ ' p q , ( q . Aq')
/
Dz tanh[7(~q, + Eq,z ) ] - ~q.
(49)
q'
Our problem has thus been reduced successfully to the study of the 2p coupled scalar Eqs. (49). We can also measure the correlation between the firing activities si(ui) - 89 [1 + tanh(q, ui)] and the pattern components (similar to the overlaps in the case of binary neurons). If the pattern bits are drawn at random, i.e. l i m N ~ [Iq[/N- p q - 2 -p for all q, we can define a 'graded response' equivalent m~(u) = 2N -1 ~ i ~Si(Ui) C [-1, 1] of the pattern overlaps: 2 m,(u) = ~ Z
1 ~si(u) = ~ ~ i
~ tanh(yu/) + C(N- 89 i
= ~--~pqrl0 / du pq(u; u)tanh(~/u) + (9(N- 89
(50)
q
Full recall of pattern g implies si(ui)- l[~i' + 1], giving m , ( u ) - 1. Since the distributions pq(u) obey deterministic laws for N ---, oo, the same will be true for the overlaps m - - ( m l , . . . ,mp). For the Gaussian solutions (49) of (47) we can now proceed to replace the 2p macroscopic laws (49), which reduce to ~ ~q = q - A m - aq and give ~q--t~q(0)e-t-q - q. A f~ dseS-tm(s), by p integral equations in terms of overlaps only:
643
Statistical mechanics of recurrent neural networks H - dynamics
m~(t) - ~_~p.rlu !1
/Dz [(
tanh 3' ~.(0)e -t + 11-A
I'
dse'-tm(s)
+z~/T + (E2,(0)- T) e-2t)]
(51)
with Dz - (2r~)-1 e-~1~,2dz. Here the sublattices only come in via the initial conditions.
3.2.3. Extracting the physics from the macroscopic laws The equations describing the asymptotic (stationary) state can be written entirely without sublattices, by taking the t --+ ~ limit in (51), using ~ --+ II- Am, Z~ ---+ x/T, and the familiar notation (9(~))~ - limx--+oc1 ~ i g ( ~ i ) -- 2-p ~c{-1,1} p 9(~)" p,1( u ) = [ 2 rtT] -1 e--}['-'rAm]2/r .
m,-(~gJDztanh[7({.Am+zv/-T)])~
(52)
Note the appealing similarity with previous results on networks with binary neurons in equilibrium [1]. For T = 0 the overlap Eq. (52) becomes identical to those found for attractor networks with binary neurons and finite p (hence our choice to insert an extra factor 2 in defining the synapses), with 3' replacing the inverse noise level [3 in the former. For the simplest nontrivial choice A,v - 8,v (i.e. Jiy = (2/N) ~ g ~ig~y' " as in the Hopfield [2] model) Eq. (52) yields the familiar pure and mixture state solutions. For T = 0 we find a continuous phase transition from nonrecall to pure states of the form m, = mg~v (for some v) at 7c = 1. For T > 0 we have in (52) an additional Gaussian noise, absent in the models with binary neurons. Again the pure states are the first nontrivial solutions to enter the stage. Substituting m, = mg,v into (52) gives m
-
f Dz tanh[3'(m + zV/-T)].
(53)
Writing (53) as mZ-3'm~odk[1-fDztanh2[3'(k+zx/-T)]] <~7m 2, reveals that m = 0 as soon as 3' < 1. A continuous transition to an m > 0 state occurs when 3'-1 _ 1 - fDztanhZ[3'zx/T]. A parametrization of this transition line in the (7, T)plane is given by 3'-1 (x) - 1 - f Dz tanhZ(zx),
T(x)
-- X2/3'2(X),
X
~ O.
(54)
Discontinuous transitions away from m = 0 (for which there is no evidence) would have to be calculated numerically. For 3' - oc we get the equation m - erf[m/v/2-T], giving a continuous transition to m > 0 at T~ = 2/re ~ 0.637. Alternatively, the latter number can also be found by taking limx+oo T(x) in the above parametrization: rc(V -
- xlim x2 1 -
[j" - li+rn
Dz tanh2(zx)
d ]2 [ / ]2 D Zdzz tanh(zx) - 2 Dz 6(z) - Z/re.
644
A. C. C. Coolen
1.0
P T
0.5
0.0
9 0.0
O.5
1.0
0 1.5
0.0
0.3
u~ 0.6
0.9
1/v Fig. 6. Left: phase diagram of the Hopfield model with graded-response neurons and (Z/N) y'~o~,.~~j, o away from saturation. P: paramagnetic phase, no recall. R: pattern recall phase. Solid line: separation of the above phases, marked by a continuous transition. Right: asymptotic recall amplitudes m = (2/N) ~i ~/si of pure states (defined such that full recall corresponds to m = 1), as functions of the noise level T, for ?-l E {0.1,0.2,..., 0.8, 0.9} (from top to bottom).
Jij =
The resulting picture of the network's stationary state properties is illustrated in Fig. 6, which shows the phase diagram and the stationary recall overlaps of the pure states, obtained by numerical calculation and solution of Eqs. (54) and (53). Let us now turn to dynamics. It follows from (52) that the 'natural' initial conditions for ~ and ~ are of the form: ~ ( 0 ) = q. k0 and Z~(0) = Z0 for all q. Equivalently: t = 0"
p,l(u) = [2 rtE~]-' e- 89[u-'lk~
ko E NP, Eo C N.
These would also be the typical and natural statistics if we were to prepare an initial firing state {si} by hand, via manipulation of the potentials {ui}. For such initial conditions we can simplify the dynamical Eq. (51) to
m~t(t)--(~t/Dztanh [?(~-[koe -t +Afotdse~-tm(s)] +zv/T+(EZ-T)e-Zt)])
.
(55)
For the special case of the Hopfield synapses, i.e. A~v - 8~v, it follows from (55) that recall of a given pattern v is triggered upon choosing ko.r, -- koS~,, (with k0 > 0), since then Eq. (55) generates rn~,(t)= m(t)8~v at any time, with the amplitude m(t) following from
m(t) - /Dztanh[y[koe-'+
ftdse'-tm(s)+zv/T+(EZ-T)e-Zt]
1
(56)
which is the dynamical counterpart of Eq. (53) (to which indeed it reduces for t ----+ CO).
Statistical mechanics of recurrent neural networks H - dynamics
645
1.0 I
m 0.5
0.0
0
10
20
30
Fig. 7. Overlap evolution in the Hopfield model with graded-response neurons and Jij = (2/N) ~--~~t~i~j, ~ ~ away from saturation. Gain parameter: y = 4. Initial conditions pn(u) = 8[u- k0rlv] (i.e. triggering recall of pattern v, with uniform membrane potentials within sublattices). Lines: recall amplitudes m = (2/N) ~ i ~si of pure state v as functions of time, for T = 0.25 (upper set), T = 0.5 (middle set) and T = 0.75 (lower set), following different initial overlaps m0 6 {0.1,0.2,..., 0.8, 0.9}.
We finally specialize further to the case where our Gaussian initial conditions are not only chosen to trigger recall of a single pattern ~v, but in addition describe uniform membrane potentials within the sublattices, i.e. k0,~ = k08~v and E0 = 0, so pn(u) = 8 [ u - k 0 q v ] . Here we can derive from (56) at t = 0 the identity m0 = tanh[yk0], which enables us to express k0 as k0 = (27) -1 log[(1 + m0)/(1 - m0)], and find (56) reducing to
m(t)--j'Dztanh[e-tl~ I+I -m:]89 "
z~/T(l_ e_Zt)-] .] (57)
Solving this equation numerically leads to graphs such as those shown in Fig. 7 for the choice y = 4 and T C {0.25, 0.5, 0.75}. C o m p a r e d to the overlap evolution in large networks of binary networks (away from saturation) one immediately observes richer behavior, e.g. nonmonotonicity. The analysis and results described in this section, which can be done and derived in a similar fashion for other networks with continuous units (such as coupled oscillators), are somewhat difficult to find in research papers. There are two reasons for this. Firstly, nonequilibrium statistical mechanical studies only started being carried out around 1988, and obviously concentrated at first on the (simpler) networks with binary variables. Secondly, due to the absence of detailed balance in networks of graded response networks, the latter appear to have been suspected of consequently having highly complicated dynamics, and analysis terminated with pseudo-equilibrium studies [11]. In retrospect that turns out to have been too pessimistic a view on the power of nonequilibrium statistical mechanics: one finds that
A. C.C. Coolen
646
dynamical tools can be applied without serious technical problems (although the calculations are somewhat more involved), and again yield interesting and explicit results in the form of phase diagrams and dynamical curves for macroscopic observables, with sensible physical interpretations. 4. Correlation and response functions
We now turn to correlation functions Cij(t, t') and response functions Gij(t, t'). These will become the language in which the generating functional methods are formulated, which will enable us to solve the dynamics of recurrent networks in the (complex) regime near saturation (we take t > t'):
Cij(t,{) = (~i(t)cyj(t')),
(58)
Gij(t,t') = O(~i(t))/OOj({).
The {cy~} evolve in time according to equations of the form (1) (binary neurons, sequential updates), (22) (binary neurons, parallel updates) or (32) (continuous neurons). The 0~ represent thresholds and/or external stimuli, which are added to the local fields in the cases (1) and (22), or added to the deterministic forces in the case of a Fokker-Planck Eq. (32). We retain 0;(t) = 0/, except for a perturbation 80j(t') applied at time t' in defining the response function. Calculating averages such as (58) requires determining joint probability distributions involving neuron states at different times.
4.1. Fluctuation-dissipation theorems 4.1.1. Networks of binary neurons For networks of binary neurons with discrete time dynamics of the form p~+~(t~) = y]~, W[~;tr162 the probability of observing a given 'path' t~(g') t~(g' + 1) ~ -.. ~ ty(g- 1) ~ t~(g) of successive configurations between step g' and step g is given by the product of the corresponding transition matrix elements (without summation): Prob[t~(g'),..., t~(g)] = W[tr(g); ~(g - 1)] W[t~(g - 1); t~(g - 2)]... x W[t~(( + 1); t~(()]pe(tT(()). This allows us to write
Cij(g, g') - Z
" "Z Prob[t~(g'),..., t~(g)]cyi(g)cyj(g') - Z
~(t')
~(t)
c~c~ Wt-e' [tT;t~']pe (re),
~' (59) (60)
~ t ~tt
From (59) and (60) it follows that both Cij(g, g') and G;j(g, g') will in the stationary state, i.e. upon substituting pe,(tT') =p~(t~'), only depend on g - g ' : Ciy(g,g') Cij(g- g') and Gij(g, g ' ) ~ Giy(g- g'). For this we do not require detailed bal-
Statisticalmechanicsof recurrentneuralnetworksH- dynamics
647
ance. Detailed balance, however, leads to a simple relation between the response function G~i('c) and the temporal derivative of the correlation function Gj(z). We now turn to equilibrium systems, i.e. networks with symmetric synapses (and with all Jii "- 0 in the case of sequential dynamics). We calculate the derivative of the transition matrix that occurs in (60) by differentiating the equilibrium condition Peq(a) = ~--2.~,W[a; aqPeq(a') with respect to external fields:
~Oj Pen (a) -- Z~t
{~w[.;~'] - -~Oj
Peq (a t) -+- W[a; a t] ~-0-TPeq(a t)
9
Detailed balance implies P e q ( a ) - Z - l e -13/4('~) (in the parallel case we simply substitute the appropriate Hamiltonian H -+ H), giving 8Peq(a)/80j -[Z-ISZ/~Oj + f38H(6)/8Oj]Peq(6), so that
Peq(at) -- 13 Z6, m[a; a t] OH(at) ~0j Peq( at ) -- ~~H(a) - - ~ j Peq(6)
~, - -~0j
(the term containing Z drops out). We now obtain for the response function (60) in equilibrium:
Gij(g)-[3ZcyiWg-I[6;gq(ZW[6';g"]~H(6")(6"~H(6~) } p) e q ••,
,,,
O0j
O_s peq(a')
" (61)
The structure of (61) is similar to what follows upon calculating the evolution of the equilibrium correlation function (59) in a single iteration step:
Cij(g) - Cij(g -1) -- Za~r,(yiwg-l [a;at] ( Za. W[6';~"]cy'fPeq(a")-
(3"j.Peq(at) }. (62)
Finally we calculate the relevant derivatives of the two Hamiltonians H(6) -- - ~ i < j J;jeyicyj - ~ i 0icyi and / t ( a ) - - y~'~;0;cy;- 13-1 ~ log 2 cosh[13h;(6)] (with h~(6) - ~jJ;jcyj + 0z), see [1]:
8H(6)/8Oj - -%-,
8/t(~r)/80j - -c~j - tanh[13hj(~r)].
For sequential dynamics we hereby arrive directly at a FDT. For parallel dynamics we need one more identity (which follows from the definition of the transition matrix in (22) and the detailed balance property) to transform the tanh occurring in the derivative of H:
tanh[~hj(g')]Peq(a') - Z girt
cY~~W[a''; 6']Peq(a') - Z
W[6';
att]cyfPeq(att).
~tt
For parallel dynamics g and g' are the real time labels t and t', and we obtain, with
z --t-f:
648
A . C. C. C o o l e n
Binary & Parallel: Gij(r, > O) = -~[Cij(r, + 1) - Cij(r, - 1)],
Gij(r ~< 0) = 0.
(63)
For the continuous-time version (1) of sequential dynamics the time t is defined as and the difference equation (62) becomes a differential equation. For perturbations at time t' in the definition of the response function (60) to retain a nonvanishing effect at (re-scaled) time t in the limit N ~ c~, they will have to be rescaled as well: 60j(t') ~ NSOj(t~). As a result: Binary & Sequential: t = g/N,
a,j(~) -
- f 3 0 ( ~ ) ~d
Cij('c) .
(64)
The need to re-scale perturbations in making the transition from discrete to continuous times has the same origin as the need to re-scale the random forces in the derivation of the continuous-time Langevin equation from a discrete-time process. Going from ordinary derivatives to functional derivatives (which is what happens in the continuous-time limit), implies replacing Kronecker delta's ~t,t' by Dirac deltafunctions according to ~)t,t' "-+ A S ( t - t'), where A is the average duration of an iteration step. Eqs. (63) and (64) are examples of so-called fluctuation-dissipation theorems (FDT). 4.1.2. N e t w o r k s with continuous neurons
For systems described by a Fokker-Planck Eq. (32) the simplest way to calculate correlation and response functions is by first returning to the underlying discrete-time system and leaving the continuous time limit A ~ 0 until the end. In [1] we saw that for small but finite time-steps A the underlying discrete-time process is described by t - - gA,
pgA+A(6) -- [1 + A&~f'n+ 6O(A3)]ptA(6)
with g - 0, 1,2,... and with the differential operator (65)
l
From this it follows that the conditional probability density pea(~l~', CA) for finding state 6 at time s given the system was in state 6' at time g'A, must be peA (616', g'A ) -- [1 + A5% + (9(A3)]t-e'816 - 6'].
(66)
Eq. (66) will be our main building block. Firstly, we will calculate the correlations:
Cij(gA, gtm)
--
(o'i(~A)o'j(gtA))=
S d6d6' erioj.peA(6i6', g'A)pea(6' )
= S d6 cyi[l + A~f'a +
o(A)l' "f d6' or} 8[6-6']pe, A(6')
-- j" d6 trill + A:LP~+ C(A~)]g-t' [6jpg, A(6)].
649
Statistical mechanics of recurrent neural networks H - dynamics
At this stage we can take the limits A ~ 0 and g, g' ~ oe, with t = gA and t' - gA finite, using limA_0[1 + AA]k/a -- ekA"
Cij(t, t') - / d a ~i e (t-t')-~ [c~jpt,(a)].
(67)
Next we turn to the response function. A perturbation applied at time t' = g'A to the Langevin forces f-(a) comes in at the transition a(g'A) ---, a(g'A + A). As with sequential dynamics binary networks, the perturbation is re-scaled with the step size A to retain significance as A ~ 0:
Gij(gA, gtA) - AO0j(g'A) = AS0j(g tA) a d a d a ' (y, pea(a[at, etA)pea(a ') -
-
f
d a d a t da t' (yipga(alat',gtA + A)[
- [da
g'a)]
pea(a t)
da' da" cyi[1 + A5% + (9(A~)]g-e-18[a - a"]
x
[1 +
+
"-
pe,
- - f da d~ tda 't (Yi[1 -71--A~Pa .qt_ (fi(A~)] g-g'-I ~[ff _ fftt]~[fftt _ at] X -
~-Jr-
-/da
(Q(A '-] 2) p/~tA({]r, )
cyi[i + A ~ , + (9(A~)]e-e-' [~-~s
+ e ( a l)
J
We take the limits A ~ 0 and g, g' ~ ~ , with t = gA and t' = gtA finite:
Gij(t, t') - - / d a cyi e (t-t')~ &yjpt,(a). ~
(68)
Eqs. (67) and (68) apply to arbitrary systems described by Fokker-Planck equations. In the case of conservative forces, i . e . f . ( ~ ) = - ~ H ( ~ ) / ~ c y i , and when the system is in an equilibrium state at time t' so that C~j(t,t')= Cij(t-t') and Gij(t, t') = Gij(t- t'), we can take a further step using pt,(~) --Peq(~) -- Z -1 e -~/4(~). In that case, taking the time derivative of expression (67) gives O---~Cij(T,) --
f
dl~ (y i e ~ ~
~~ [o'jPeq (~)] 9
Working out the key term in this expression gives ~G[~jPeq(~)] = - ~ - ~ 5-~ l
(~)- T
[cYjJi(~)] i
[~jPeq(~)] -- Th-~ Peq(~)
650
A.C.C. Coolen
with the components of the probability current density J;(6) - [J~(6) - T ~-~7]Peq(6) In equilibrium, however, the current is zero by definition, so only the first term in the above expression survives. Insertion into our previous equation for OCij(r)/O~, and comparison with (68) leads to the FDT for continuous systems: Continuous: d Gij( ~) - - [30( ~) - ~ Cij( ~) .
(69)
We will now calculate the correlation and response functions explicitly, and verify the validity or otherwise of the FDT relations, for attractor networks away from saturation. 4.2. Example: simple attractor networks with binary neurons 4.2.1. Correlation and response functions f o r sequential dynamics We will consider the continuous time version (1) of the sequential dynamics, with the local fields hi(6) = ~ j J i j ~ j + 0i, and the separable interaction matrix (18). We already solved the dynamics of this model for the case with zero external fields and away from saturation (i.e. p << x/N). Having nonzero, or even time-dependent, external fields does not affect the calculation much; one adds the external fields to the internal ones and finds the macroscopic laws (2) for the overlaps with the stored patterns being replaced by
d i n ( t ) - N---~:~c lim N1 E dt
~i tanh[[3{i 9AIn(t) + 0i(t)] - re(t).
(70)
i
Fluctuations in the local fields are of vanishing order in N (since the fluctuations in m are), so that one can easily derive from the master Eq. (1) the following expressions for spin averages: d ----(cri(t)) -- tanh 13[~i 9Am(t) + 0i(t)]- ((Yi(t)) dt iCj:
d
(71)
(cYi(t)crj(t)) - tanh ~[~i" Am(t) + Oi(t)l((Yj(t)) -'[- tanh 13[~j 9Am(t) + Oj(t)](oi(t)) - 2(cyi(t)cyj(t)).
(72)
Correlations at different times are calculated by applying (71) to situations where the microscopic state at time t' is known exactly, i.e. where pt,(6) = ~,~, for some 6': (cYi(t))l~(t,)__a, = cyl e-(t-t') +
f
t
dseS-ttanh~[~i
9Am(s;6',t')-it- 0i(s)]
(73)
with m(s;6',t ~) denoting the solution of (70) following initial condition lli(t t) --1~"~ i ISYti~i. If we multiply both sides of (73) by cy~ and average over all possible states 6' at time t' we obtain in leading order in N:
Statistical mechanics of recurrent neural networks H - dynamics
(cyi(t)cyj(t')> - (r
651
e -(t-t') ds e'-t(tanh 13[{i 9Am(s; ~(t'),t') + Oi(s)]~j(t')).
+
Because of the existence of deterministic laws for the overlaps m in the N ~ cr limit, we know with probability one that during the stochastic process the actual value m(c~(t')) must be given by the solution of (70), evaluated at time t'. As a result we obtain, with Cij(t, t') = (cyi(t)cyj(t')):
f
Cij(t,{) = Cij({,{)e -(t-t') -Jr-
t
dseS-ttanh~[~i.Am(s)
+Oi(s)](cyj(t')>.
(74)
Similarly we obtain from the solution of (71) an equation for the leading order in N of the response functions, by derivation with respect to external fields: O
•
oj(t')
+
-
or
Gij(t, {) = ~6ijO(t - { ) e -(t-t') [1 - tanh 2 ~[~i" Am({) + 0i({)]] + ~ O ( t - t')
dse "-t [1 - tanh 213[{i 9Am(s) + Oi(s)]]
1
• ~ Z (~i" A~k)Gkj(s, {).
(75)
k
For t = t' we retain in leading order in N only the instantaneous single site contribution lim Gij(t, {) - ~6ij[1 - tanh 2 ~[~i" Am(t) + 0i(t)]]. t' Yt
(76)
This leads to the following ansatz for the scaling with N of the Gij(t, t'), which can be shown to be correct by insertion into (75), in combination with the correctness at t = t' following from (76):
i = j:
Gii(t, {) = (9(1),
i =fi j:
Gij(t, {) -- (_9(N-1).
Note that this implies -~ ~k(~i" A~k)Gkj(s, t') = 0(~v). In leading order in N we now find
Gij(t, {) -- ~SijO(t - {) e -(t-t') [1 - tanh 2 ~[~i" Am({) + 0i({)]].
(77)
652
A.C.C. Coolen
For those cases where the macroscopic laws (70) describe evolution to a stationary state m, obviously requiring stationary external fields 0;(t) --- 0i, we can take the limit t ~ c~, with t - t' = z fixed, in the two results (74) and (77). Using the t ~ ~ limits of (71) and (72) we subsequently find time translation invariant expressions: l i m t ~ Cij(t, t - z) -- Cij('~ ) and l i m t ~ Gij(t, t - ~) = Gij('c), with in leading order in N Cij(z)
-
tanh 13[~i 9Am + 0i] tanh p[~j. Am + 0j] (78)
+ 8ij e-~ [ 1 - tanh2 P[~i" Am + 0i]]
(79)
Gij(z) - p6ijO(z)e -~ [1 - tanh 2 P[{i" Am + Oi]] for which indeed the F D T (64) holds: aij(T) = - - ~ 0 ( ~ ) ~
Cij(T).
4.2.2. Correlation and response functions for parallel dynamics We now turn to the parallel dynamical rules (22), with the local fields h i ( 6 ) - ~_,jJij~j + Oi, and the interaction matrix (18). As before, having timedependent external fields amounts simply to adding these fields to the internal ones, and the dynamic laws (31) are found to be replaced by
m(t+ 1)-
lim N1 ~ N---~vc
i
~i tanh[13~i- Am(t) + Oi(t)l.
(80)
Fluctuations in the local fields are again of vanishing order in N, and the parallel dynamics versions of Eqs. (71) and (72), to be derived from (22), are found to be ( ~ i ( t -Jr- 1))
i :/: j"
(81)
-- tanh P[~i" Am(t) + 0i(t)],
((yi(t + 1)crj(t + 1)) -- tanh P[~i" Am(t) + 0i(t)] tanh 13[~j 9Am(t) + 0j(t)]. (82)
With re(t; 6', t') denoting the solution of the map (80) following initial condition m(t') = ~ ~--~;crib,/, we immediately obtain from Eqs. (81) and (82), the correlation functions:
Cij(t, t) - 8/j + [1 - 8ij] tanh 13[~/ 9Am(t - 1) + 0/(t - 1)] x tanh p[~j. Am(t - 1) + 0j(t - 1)],
t>{"
Cij(t,t') - (tanh P[~i" A m ( t -
(83)
1; 6({), {) + Oi(t- 1)]t~j({))
= tanh 13[~i 9Am(t - 1) + Oi(t -- 1)] X tanh 13[~j 9Am(t' - 1) + 0j(t' - 1)].
(84)
From (81) also follow equations determining the leading order in N of the response functions Gij(t, t'), by derivation with respect to the external fields 0j(t'):
653
Statistical mechanics of recurrent neural networks H - dynamics t' > t -
1:
o j(t,t') =0,
t' = t -
l"
Giy(t,t') - [~Sij[1 - tanh 2 ~[~i" A m ( t - l) + O i ( t - 1)]],
t' < t -
1"
Gij(t,t') = 1311 - tanh 213[{i 9A m ( t - 1) + O i ( t - 1)]]
(85)
1 )< ~ E k ( ~ i " A~,k)Gkj(t - 1, t').
It now follows iteratively that all off-diagonal elements must be of vanishing order in N: Gij(t, t - 1) = 8iyG~i(t, t - 1) - . Gij(t, t - 2) = 8ijGii(t, t - 2) -~ ..., so that in leading order Gij(t, t') = ~SiiSt,t,+, [1 - tanh 2 [3[~,i. Am(t') +
Oi(tt)]].
(86)
For those cases where the macroscopic laws (80) describe evolution to a stationary state m, with stationary external fields, we can take the limit t --+ oc, with t - t' = fixed, in (83), (84) and (86). We find time translation invariant expressions: limt_~ Cij(t, t - ~) = Cij(T~) and limt_~ Gij(t, t - ~) = Gij(z), with in leading order in N: Cij('c) = tanh 13[{i 9Am + 0;] tanh 13[~,j-Am + 0j] + 6ij8~,o [1 - tanh 2 ~[{i" Am + 0i]]
(87)
(88)
Gu(x ) = [36u~,1 [1 - tanh 2 ~[~i" A m nt- 0i]]
obeying the F D T (63)" Gq(x > 0) = -[3[Cij('c + 1 ) - C/y(~- 1)]. 4.3. Example." graded response neurons with uniform synapses
Let us finally find out how to calculate correlation and response function for the simple network (38) of graded response neurons, with (possibly time-dependent) external forces Oi(t), and with uniform synapses J i j - J / N : dui(t ) J d-tt = -N Z
(89)
g[yuj(t)] - ui(t) + Oi(t) + qi(t). J
For a given realization of the external forces and the Gaussian noise variables {rli(t)} we can formally integrate (89) and find ui(t) -- ui(O)e -t +
/0
ds e s-t
du p(u; u(s))g[yu] + Oi(s) + rli(S)
J
(90)
with the distribution of membrane potentials p ( u ; u ) = N -I } - ~ i S [ u - ui]. The correlation function Cij(t, t') = (ui(t)uj(t')) immediately follows from (90). Without loss of generality we can define t > t'. For absent external forces (which we only need to define the response function), and upon using ( q i ( s ) ) = 0 and ( r l i ( s ) q j ( s ' ) ) = 2 T 8 i ~ 8 ( s - s'), we arrive at
654
A.C.C. Coolen
Cij(t,t') - r s i j ( e r - t -
x
(E
e -t'-t) +
[
u i ( O ) e -t + J
//o
uj(O) e -e + J
/
dug[yu]
/ot
dsC-tp(u;u(s))
ds' e"-"p(u; u(s'))
du g['~u]
]1 .
For N ~ oc, however, we know the distribution of potentials to evolve deterministically: O ( u ; u ( s ) ) ---, p,(u) where p,(u) is the solution of (41). This allows us to simplify the above expression to Cij(t, t') = TSij(e/-' - e-t'-') +
N ~ ec"
x
uj(O)e-" + J
(I
ui(0) e-' + J
du g[~u]
/
du g[,fu]
ds' e"-eO,,(u)
/0 .
ds e'-tps(u)
1
(91)
Next we turn to the response function Gig(t,/)= 8(ui(t))/8~,j(t') (its definition involves functional rather than scalar differentiation, since time is continuous). After this differentiation the forces {0i(s)} can be put to zero. Functional differentiation of (90), followed by averaging, then leads us to Gij(t, t') - O(t - t')Sij e t'-t - J
(
~iI~ a [ u - u,(s)] ~)Oj(t')
/
du g[yu]-~u
ds e "-t
"
In view of (90) we make the self-consistent ansatz 8 u k ( s ) / 8 ~ j ( s ' ) - C(N -1) for k # j. This produces N ---+ cx~"
Gij(t, t') - O ( t -
(92)
tt)Sije t'-t.
Since Eq. (41) evolves towards a stationary state, we can also take the limit t ~ ~ , with t - t ' = z fixed, in (91). Assuming nonpathological decay of the distribution of potentials allows us to put l i m t ~ : fo d s C - t P s ( u ) - p(u) (the stationary solution of (41)), with which we find not only (92) but also (91) reducing to time translation invariant expressions for N ~ ~ , l i m t ~ C i j ( t , t - x ) = Cij(T,) and l i m t ~ G u ( t , t - ~) = Gij(z), in which Cij('c) -- T~)ije -~ -+- j 2
{/
}-)
du p(u)g[yu]
,
Gij(Y,) - O(y,)aije -~.
(93)
Clearly the leading orders in N of these two functions obey the F D T (69): G~y(~) - - 6 0 ( ~ ) d Cig(z). As with the binary neuron attractor networks for which we calculated the correlation and response functions earlier, the impact of detailed balance violation (occurring when A~v r Avp in networks with binary neurons and synapses (18), and in all networks with graded response neurons [1]) on the validity of the FDTs, vanishes for N ~ e~, provided our networks are relatively simple and
Statistical mechanics of recurrent neural networks H - dynamics
655
evolve to a stationary state in terms of the macroscopic observables (the latter need not necessarily happen, see e.g. Figs. 1 and 4). Detailed balance violation, however, would be noticed in the finite size effects [12].
5. Dynamics in the complex regime The approach we followed so far to derive closed macroscopic laws from the microscopic equations fails when the number of attractors is no longer small compared to the number N of microscopic neuronal variables. In statics we have seen [1] that, at the work floor level, the fingerprint of complexity is the need to use replica theory, rather than the relatively simple and straightforward methods based on (or equivalent to) calculating the density of states for given realizations of the macroscopic observables. This is caused by the presence of a number of 'disorder' variables per degree of freedom which is proportional to N, over which we are forced to average the macroscopic laws. One finds that in dynamics this situation is reflected in the inability to find an exact set of closed equations for a finite number of observables (or densities). We will see that the natural dynamical counterpart of equilibrium replica theory is generating functional analysis.
5.1. Overview of methods and theories Let us return to the simplest setting in which to study the problem: single pattern recall in an attractor neural network with N binary neurons and p - aN stored patterns in the nontrivial regime, where a > 0. We choose parallel dynamics, i.e. (22), with Hebbian-type synapses of the form (18) with A p v - 8~v, i.e. J i j ~i ~j' giving us the parallel dynamics version of the Hopfield model [2]. Our interest is in the recall overlap m(a) - N -1 ~ ; c~i~] between system state and pattern one. We saw in [1] that for N ~ oo the fluctuations in the values of the recall overlap m will vanish, and that for initial states where all cyi(0) are drawn independently the overlap m will obey
m(t + 1) - I dzPt(z)tanh[13(m(t) + z)], ,/
lira'S( Ez
-
-
,94
and that all complications in a dynamical analysis of the a > 0 regime are concentrated in the calculation of the distribution Pt(z) of the (generally nontrivial) interference noise.
5.1.1. Gaussian approximations As a simple approximation one could just assume [13] that the cyi remain uncorrelated at all times, i.e. P r o b [ c y ; ( t ) - - + - ~ ] - 1[1 • m(t)] for all t~> 0, such that the argument given in [1] for t - 0 (leading to a Gaussian P(z)) would hold generally, and where the mapping (94) would describe the overlap evolution at all times:
A.C.C. Coolen
656
Pt(z) - [2 rta]-' e-}2/~ :
m(t + 1) = / Dz tanh[13(m(t) + zv/-~)]
(95)
with the Gaussian measure Dz = (2n) -89e-~2dz. This equation, however, must be generally incorrect. Firstly, Fig. 5 in [1] shows that knowledge of re(t) only does not permit prediction of m(t + 1). Secondly, expansion of the right-hand side of (95) for small re(t) shows that (95) predicts a critical noise level (at at = 0) of Tc = ~ c 1 - 1, and a storage capacity (at T = 0) of ac = 2/n .~ 0.637, whereas both numerical simulations and equilibrium statistical mechanical calculations [1] point to ac ~ 0.139. Rather than taking all cyi to be independent, a weaker assumption would be to just assume the interference noise distribution Pt(z) to be a zero-average Gaussian one, at any time, with statistically independent noise variables z at different times. One can then derive (for N ~ c~ and fully connected networks) an evolution equation for the width E(t), giving [14,15]:
Pt(z) = [2 rtze(t)] - 89 -'/z2(t) 9 m(t + 1) - / Dz tanh[13(m(t) + zZ(t))] Z2(t + 1) = ~ + 2 ~m(t + 1)m(t)h[m(t), Z(t)] + Z2(t)h2[m(t), Z(t)] with h[m,E] - 1311 - f D z tanhZ[13(m +zE)]]. These equations describe correctly the qualitative features of recall dynamics, and are found to work well when retrieval actually occurs. For nonretrieval trajectories, however, they appear to underestimate the impact of interference noise: they predict Tc-- 1 (at at = 0) and a storage capacity (at T = 0) of 0tc ~ 0.1597 (which should have been about 0.139). A final refinement of the Gaussian approach [16] consisted in allowing for correlations between the noise variables z at different times (while still describing them by Gaussian distributions). This results in a hierarchy of macroscopic equations, which improve upon the previous Gaussian theories and even predict the correct stationary state and phase diagrams, but still fail to be correct at intermediate times. The fundamental problem with all Gaussian theories, however sophisticated, is clearly illustrated in Fig. 6 of [1]: the interference noise distribution is generally not of a Gaussian shape. Pt(z) is only approximately Gaussian when pattern recall occurs. Hence the successes of Gaussian theories in describing recall trajectories, and their perpetual problems in describing the nonrecall ones.
5.1.2. Non-Gaussian approximations In view of the non-Gaussian shape of the interference noise distribution, several attempts have been made at constructing non-Gaussian approximations. In all cases the aim is to arrive at a theory involving only macroscopic observables with a single time argument. Fig. 6 of [1] suggests that for a fully connected network with binary neurons and parallel dynamics a more accurate ansatz for Pt(z) would be the sum of two Gaussians. In [17] the following choice was proposed, guided by the structure of the exact formalism to be described later:
Stat&tical mechanics of recurrent neural networks H - dynamics
657
P,(z) : P+ (~) + P;- (z), -
i
~,,~,(t) j•i
la>l
1 + re(t) e_ 89 Pt!(z) =- 2 2 ( t ) v ' ~ followed by a self-consistent calculation of d(t) (representing an effective 'retarded self-interaction', since it has an effect equivalent to adding hi(~(t))~ hi(~(t)) + d(t)~i(t)), and of the width E(t) of the two distributions Pt~(z), together with m(t +
1) - 1 [1 + re(t)] J Dz tanh[]3(m(t) +
d(t) +
+21 [1 - m(t)] J Dz tanh[13(m(t) -
zZ(t))]
d(t) + z2(t))]
The resulting three-parameter theory, in the form of closed dynamic equations for {m, d, E}, is found to give a nice (but not perfect) agreement with numerical simulations. A different philosophy was followed in [18] (for sequential dynamics). First (as yet exact) equations are derived for the evolution of the two macroscopic observables m(~) - ml (~) and r(r - ~-1 ~ > l m~(r with m~(r - N -1 ~ i ~/~(Yi, which are both found to involve Pt(z): dt m =
dz Pt(z) tanh[13(m + z)],
~ r - -~
dzPt(z)ztanh[~(m + z)] + 1 - r.
Next one closes these equations by hand, using a maximum-entropy (or 'Occam's Razor') argument: instead of calculating Pt(z) from (94) with the real (unknown) microscopic distribution pt(r it is calculated upon assigning equal probabilities to all states a with re(a) = m and r(a) = r, followed by averaging over all realizations of the stored patterns with ~t > 1. In order words: one assumes (i) that the microscopic states visited by the system are 'typical' within the appropriate (m,r) subshells of state space, and (ii) that one can average over the disorder. Assumption (ii) is harmless, the most important step is (i). This procedure results in an explicit (nonGaussian) expression for the noise distribution in terms of (re, r) only, a closed two-parameter theory which is exact for short times and in equilibrium, accurate predictions of the macroscopic flow in the (m, r)-plane (such as that shown in Fig. 5 of [1]), but (again) deviations in predicted time dependencies at intermediate times. This theory, and its performance, was later improved by applying the same ideas to a derivation of a dynamic equation for the function Pt (z) itself (rather than for m and r only) [19]; research is still under way with the aim to construct a theory along these lines which is fully exact.
A.C.C. Coolen
658
5.1.3. Exact results: generat&g functional analysis The only fully exact procedure available at present is known under various names, such as 'generating functional analysis', 'path integral formalism' or 'dynamic meanfield theory', and is based on a philosophy different from those described so far. Rather than working with the probability pt(~) of finding a microscopic state tT at time t in order to calculate the statistics of a set of macroscopic observables fl(t~) at time t, one here turns to the probability Probity(0),..., t~(tm)] of finding a microscopic path t ~ ( 0 ) ~ t ~ ( 1 ) 7 . . . ~ t~(tm). One also adds time-dependent external sources to the local fields, hi(ty) ~ h;(t~) + Oi(t), in order to probe the networks via perturbations and define a response function. The idea is to concentrate on the moment generating function Z[~], which, like Prob[t~(0),..., t~(tm)], fully captures the statistics of paths:
Z[,l,] - ( e-' E, :g:"_-'o
(96)
It generates averages of the relevant observables, including those involving neuron states at different times, such as correlation functions Cij(t,t')= (cyi(t)cyj(t')) and response functions Gij(t,t')= i~(~i(t))/~Oj(t'), upon differentiation with respect to the dummy variables {~i(t)}:
(cyi(t))
= i lim ~
'
Cij(t, t')
= -lim (97)
Gij(t, t')
- i lim
Next one assumes (correctly) that for N ~ c~ only the statistical properties of the stored patterns will influence the macroscopic quantities, so that the generating function Z[~] can be averaged over all pattern realizations, i.e. Z[~] ~ Z[~]. As in replica theories (the canonical tool to deal with complexity in equilibrium) one carries out the disorder average before the average over the statistics of the neuron states, resulting for N ~ ~ in what can be interpreted as a theory describing a single 'effective' binary neuron cy(t), with an effective local field h(t) and the dynamics Prob[cy(t + 1) - +1] - 89 + tanh[13h(t)]]. However, this effective local field is found to generally depend on past states of the neuron, and on zero-average but temporally correlated Gaussian noise contributions qb(t): h(t]{~}, {~}) -
re(t) + O(t) + a Z R ( t , { ) c y ( { )
+ x/~dp(t).
(98)
t' < t
The first comprehensive neural network studies along these lines, dealing with fully connected networks, were carried out in [20,21], followed by applications to a-symmetrically and symmetrically extremely diluted networks [22,23] (we will come back to those later). More recent applications include sequence processing networks [24]. 6 For N ~ e~ the differences between different models are found to show up only in the 6
In the case of sequence recall the overlap m is defined with respect to the 'moving' target, i.e. t m(t) = ~1 ~i c~i(t)~i.
Statistical mechanics of recurrent neural networks H - dynamics
659
actual form taken by the effective local field (98), i.e. in the dependence of the 'retarded self-interaction' kernel R(t, t') and the covariance matrix (dp(t)qb(t)) of the interference-induced Gaussian noise on the macroscopic objects C = { C ( s , s ' ) = l i m u ~ 1 ~-~i Cii( S, st) } and G - { G(s, s') - l i m N ~ 1 ~ i Gii( S, st) }" For instance: 7 Model
Synapses Jij
Fully connected, static patterns
1 ~-~1 ~Y
Fully connected, pattern sequence
g=l i
Symm extr diluted, static patterns
-7
Asymm extr diluted, static patterns
cij
~j
~i ~ ~c
c ~-'~t=l ~i~J g la
R(t,t')
@(t)~b(t'))
[(1 -- G)-lGl(t,t ')
[(1-G)-Ic(1-Gt)-II(t,t')
0
Y~'~._>o[(Gt)" CG"] (t. t ' )
G(t.t')
C(t.t')
0
C(t,t')
with the cij drawn at random according to P ( c i j ) - ~vScij,1 + (1-~v)Scij,0 (either symmetrically, i.e. cij = cji, or independently) and where cii = 0, l i m N ~ c / N = O, and c--+ ec. In all cases the observables (overlaps and correlation and response functions) are to be solved from the following closed equations, involving the statistics of the single effective neuron experiencing the field (98):
m(t) = (or(t)),
C(t,t') = (cy(t)cy(t')),
G(t,t')--~(cr(t))/~O(t').
(99)
It is now clear that Gaussian theories can at most produce exact results for asymmetric networks. Any degree of symmetry in the synapses is found to induce a nonzero retarded self-interaction, via the kernel K(t,t'), which constitutes a nonGaussian contribution to the local fields. Exact closed macroscopic theories apparently require a number of macroscopic observables which grows as C(t 2) in order to predict the dynamics up to time t. In the case of sequential dynamics the picture is found to be very similar to the one above; instead of discrete time labels t c {0, 1 , . . . , tin}, path summations and matrices, there one has a real time variable t c [0, tm], path-integrals and integral operators. The remainder of this paper is devoted to the derivation of the above results and their implications.
5.2. Generating functional analysis for binary neurons 5.2.1. General definitions I will now show more explicitly how the generating functional formalism works for networks of binary neurons. We define parallel dynamics, i.e. (22), driven as usual by local fields of the form hi(~; t) - }-~jJijcYj + Oi(t), but with a more general choice of Hebbian-type synapses, in which we allow for a possible random dilution (to reduce repetition in our subsequent derivations): p
Jij
7
cijv~--
- -- ~
c ~t=l
~ ~t
~i ~j ,
p - ~c.
(100)
In the case of extremely diluted models the structure variables are also treated as disorder, and thus averaged out.
A. C. C. Coolen
660
Architectural properties are reflected in the variables c/j E {0, 1}, whereas information storage is to be effected by the remainder in (100), involving p randomly and independently drawn patterns { ~ - ( ~ , . . ' , ~ , ~ u ) C {--1, 1}N. I will deal both with symmetric and with asymmetric architectures (always putting c,/= 0), in which the variables c;j are drawn randomly according to Symmetric:
cij = cji,
Vi < j
c
P(cij) = -~ 8c~j.1 +
1 - -~ 8c~j,o.
(101)
Asymmetric:
V i =fi j
P(cij) = ~ 8~,j.1 +
( 1c-) ~
8~,.~..0
(102)
(one could also study intermediate degrees of symmetry; this would involve only simple adaptations). Thus ckl is statistically independent of cij as soon as (k, l)~ { (i, j), (j, i) }. In leading order in N one has ( ~ j cij) = c for all i, so c gives the average number of neurons contributing to the field of any given neuron. In view of this, the number p of patterns to be stored can be expected to scale as p = ~c. The connectivity parameter c is chosen to diverge with N, i.e. limu__,~ c-1 _____0. If C = N we obtain the fully connected (parallel dynamics) Hopfield model. Extremely diluted networks are obtained when limN_~ c / N = O. For simplicity we make the so-called 'condensed ansatz': we assume that the system state has an C(N ~ overlap only with a single pattern, say g = 1. This situation is induced by initial conditions: we take a randomly drawn ~(0), generated by 1[ 1 -mo]8~,(o),_~] }
I~.{ 1 [1 +
(103)
1 so
m0 i
The patterns ~t > 1, as well as the architecture variables cij, are viewed as disorder. One assumes that for N ~ oc the macroscopic behaviour of the system is 'selfaveraging', i.e. only dependent on the statistical properties of the disorder (rather than on its microscopic realisation). Averages over the disorder are written as =-=.-. We next define the disorder-averaged generating function: Z[l[/]
~
(e-i~-~i~-~t r
(104)
in which the time t runs from t - 0 to some (finite) upper limit tm. Note that Z[0] = 1. With a modest amount of foresight we define the macroscopic site-averaged and disorder-averaged objects re(t) - N -1 }-~i ~ (cri(t)), C(t, t') - N -1 }-~i (~i(t)cyi(t')) and G(t,t') = N -1 ~i~(cyi(t))/~Oi(t'). According to (97) they can be obtained from (104) as follows:
m(t) = lim i ~Z[~J ,~o "N ~. ~j 8~j(t)' J
(105)
661
Statistical mechanics of recurrent neural networks H- dynamics
C(t,
{)
-
-
lim 1 ~
G(t,{)-
~2~[~]
,+oNZT-'. ~ j ( t ) ~ j ( t ' ) '
lim i /~.
~2Z[~]
~,+o-N . ~%(t)~Oj(t')"
(106)
So far we have only reduced our problem to the calculation of the function Z[~] in (104), which will play a part similar to that of the disorder-averaged free energy in equilibrium calculations (see [1]).
5.2.2. Evaluationof the disorder-averagedgeneratingfunction As in equilibrium replica calculations, the hope is that progress can be made by carrying out the disorder averages first. In equilibrium calculations we use the replica trick to convert our disorder averages into feasible ones; here the idea is to isolate the local fields at different times and different sites by inserting appropriate ~5-distributions:
l- Hit/dhi(t)~)[hi(t)- Z Jijcyj(t)- 0i(t)] 9
j
- f{dhdh}exp(i~it [ti(t)[hi(t)-~j Jijcrj(t)-Oi(t)]) with {dh dh} -
I-Iit[dhi(t)dhi(t)/2rc],giving
Z[~/] -- f{dhd]l}eiEithi(t)[hi(t)-~
J~
)pf
in which (...)pf refers to averages over a constrained stochastic process of the type (22), but with prescribed fields at all sites and at all times. Note that with such prescribed fields the probability of generating a path {6(0),..., 6(tm)} is given by
{hi(t)}
6(tm)l{hi(t)}] =P(6(O))exp(Z[f3cyi(t 1)hi(t)-log2cosh[f3hi(t)]])
Prob[6(0),...,
SO
z[,] = f{dh dh}
a(tmP(6(0))eNg[{~}'{h}] )~ Hexp(i/ai(t)[hi(t)it--0i(t)]
--iqti(t)(Yi(t) + ~(Yi(t + l )hi(t) - l o g 2 cosh[~hi(t)])
(107)
with 1 o~[(.}, {h}] = ~log
[e_i~-~it~i(t)~-~/ji/cy/(t) ]
(10s)
662
A.C.C.
Coolen
We concentrate on the term o~[...] (with the disorder), of which we need only know the limit N ~ oc, since only terms inside Z[O] which are exponential in N will retain statistical relevance. In the disorder-average of (108) every site i plays an equivalent role, so the leading order in N of (108) should depend only on site-averaged functions of the {c~i(t),[~i(t)}, with no reference to any special direction except the one defined by pattern {l. The simplest such functions with a single time variable are
a(t; (6}) = - ~1 Z ~ ] c Y i ( t ) ,
k(t; {h)) = ~1Z
i
~l[~i(t) i ,
(109)
i
whereas the simplest ones with two time variables would appear to be 1
q(t, t'; {6}) -- ~ Z ~i(t)CYi(t'),
1
Q(t,t'; {h}) - ~ ~
[~i(t)hi(t'),
x(t,t';
h}) =
(110)
i
i
1
(111) i
It will turn out that all models of the type (100), with either (101) or (102), have the crucial property that (109-111) are in fact the only functions to appear in the leading order of (108): o~[...] - ~ [ { a ( t ; . . . ) , k ( t ; . . . ) , q ( t , t ' ; . . . ) , Q(t,t';...),K(t,t';...)}]
+ . - . (U --~ oc) (112)
for some as yet unknown function q)[...]. This allows us to proceed with the evaluation of (107). We can achieve site factorization in (107) if we isolate the macroscopic objects (109-111) by introducing suitable 8-distributions (taking care that all exponents scale linearly with N, to secure statistical relevance). Thus we insert
lm/
1 - I-I
da(t)
[aItl
-
lt;
t=O
--
a(t) a(t) - -~ Z. ~J cyj(t)
da da exp iN
tmj 1 - II
,
J
dk(t)8[k(t) - k(t; {h})]
t=O
=
N
dkdlcexp
iN~/r
(t)-~
. ~/~j(t)
,
tm / 1 -
H
dq(t,t'l
[q(t,t'l
- q(t,t';
t,d = 0
--
dq dq exp iN Z t,#
O(t' t') q(t, t') - ~ ~
Gj(t)cyj(t') j
,
663
Statistical mechanics of recurrent neural networks H - dynamics
1 -- H
dQ(t, {)8[Q(t, {) - Q(t, {; (h})]
t,tt =O
/
(
[
1
j
1
j
dO d0 exp iN Z Q(t, {) Q(t, {) - ~ Z hj(t)l~j({)
,m/ dK(t, {)8[K(t, {) - K(t,
1= H
t,t ~
1),
{; {a, I]})]
t,t~=O
j
(
dK dl(exp iN Z Is t,t'
[
1).
t') K(t, t') - ~ Z l~j(t)cyj(t')
Insertion of these integrals into (107), followed by insertion of (112) and usage of the shorthand Via, fi, k, k, q, 61,Q, Q, K, K] - i Z[gt(t)a(t) + [c(t)k(t)] t
+ i Z[c)(t, {)q(t, {) + Q(t, {)Q(t, {) + I~(t, {)K(t, {)]
(113)
t ,fl
then leads us to Z[~]- f dadfidkdf~dqdqdQd0 dKdl~exp(NW[a, fi,k,l~,q,q, Q, I~,K,I~] + N*[a, k,q, Q,K] + (9(...)) j {dh dh} ,(~o)"""ff(/m)~P(*(O)) X Hexp(ihi(t)[hi(t) it
- 0 i ( t ) ] - iqli(t)(yi(t ) + ~(yi(t + 1 ) h i ( t ) - l o g 2 c o s h [ ~ 3 h i ( t ) ] )
x HexPi (-i~] Z[~(t)cyi(t)t
+ ~:(t)hi(t)]- i~-~[gl(t,')cyi(t)cyi(t')t,
+ Q(t,{)hi(t)hi({)+K(t,{)hi(t)cyi({)])
(114)
in which the term denoted as (9(...) covers both the nondominant orders in (108) and the (9(logN) relics of the various pre-factors [N/2rc] in the above integral representations of the 8-distributions (note: tm was assumed fixed). We now see explicitly in (114) that the summations and integrations over neuron states and local fields fully factorize over the N sites. A simple transformation {cyi(t),hi(t),fzi(t)) ---+ {~]cYi(t), ~lhi(t ) i , ~]/~/(t)} brings the result into the form
A. C. C. Coolen
664
/{dhdh} ~(~o)"""Z P ( g ( 0 ) ) "(tin) • H exp(ihi(t)[hi(t) -
~)0/(t)] -i~]~li(t)cyi(t )
it
-+-~i(t -+-1)hi(t) • I~exp (-i~]
log 2 cosh[~hi(t)])
~-'~[gt(t)cyi(t)t
+ k(t)]'li(t)]
--i Z[q(t't.t' t')~i(t)~i(t')
+ Q(t, t')hi(t)tti(t') + K(t, t')hi(t)~i(t')])
= eNE[fi.k.iI.Q .R]
with
1
E[fi, l ~ , t ] , 0 , 1 ( ] - ~ Z l o g i
/
{dhdh}
Z
n0(cy(0))
cy(0).-.~(tm)
• exp(Z{ih(t)[h(t)-~]Oi(t) ] i~]~i(t)c~(t)}) • exp ( Z{[3cy(tt + 1)h(t)- log2cosh[[3h(t)]}
-iZ[a(t)cy(t ) +/~(t)h(t)]- i Z[O(t, {)~(t)cy({) t
t,t'
+ Q(t, {)h(t)h({) + Is {)h(t)cy({)])
(115)
in which {dh dh} - I-I,[dh(t)dh(t)/2n] and no(Cy) - 89 [1 + m018~,, + 89 [1 - m018,~,-1. At this stage (114) acquires the form of an integral to be evaluated via the saddlepoint (or 'steepest descent') method: Z[{~(t) }] = f da dfi dk dl~ dq dq dQ dO dK dl~ eN{v[]+*[]+z[l}+e()
(116)
in which the functions V[...], (I)[...] and Z[...] are defined by (112), (113) and (115).
5.2.3. The saddle-point problem The disorder-averaged generating function (116) is for N ~ c~ dominated by the physical saddle-point of the macroscopic surface
T[a, fi, k, l~,q,(], Q, 0, K, I(] + ~[a,k,q, Q,K] + E[fi, i~,l], 0, I(]
(117)
with the three contributions defined in (112), (113) and (115). It will be advantageous at this stage to define the following effective measure (which will be further simplified later):
665
Statistical mechanics of recurrent neural networks H - dynamics
(f[{,,), {h), (~)]), 1
{f{dhd[~}~_,<~(o)...<~(t,,)Mi!.{_eY2,{h),{h)]f[{er),{h),{ft)] } --N~i f{dh dh} 2~(o)...~(,m)M/[{~, ~ ; (-~7i
(118)
with
M,[(~), {h), {~)} - ~o(~(0)) /
x exp (~{i[~(t)[h(t)
\
- ~]0i(t)]- i~] ~ti(t) cy(t)
t \
+ ~c~(t + 1)h(t) -log2cosh[~h(t)])) /
x exp ( - i Z[fi(t)~(t) + k(t)/~(t)] \ t
i~-~[O(t, {)cy(t)cy({) t,tI
+ Q(t, t')h(t)h(t') + I~(t, tt)h(t)cy(t')]) /
in which the values to be inserted for {th(t),[c(t),O(t,t'), Q(t,t'),Is are given by the saddle-point of (117). Variation of (117) with respect to all the original macroscopic objects occurring as arguments (those without the 'hats') gives the following set of saddle-point equations:
gt(t) = i~r O(t, t')
~:(t) =- i~/~k(t),
-- iO@/~q(t, t'),
Q(t, t') -
(119)
io@/~Q(t, t'),
Is t') - i~/OK(t, t'). (120)
Variation of (117) with respect to the conjugate macroscopic objects (those with the 'hats'), in turn, and usage of our newly introduced short-hand notation (...),, gives:
a(t) = (cy(t)),,
k(t) = (/~(t)),,
q(t,t')- (cy(t)cy(t')),,
(121)
Q(t,t')-- (h(t)h(t')),,
K(t,t')- (h(t)cy({)),. (122)
The coupled equations (119)-(122) are to be solved simultaneously, once we have calculated the term q)[...] (112) which depends on the synapses. This appears to be a formidable task; it can, however, be simplified considerably upon first deriving the physical meaning of the above macroscopic quantities. We apply (105)-(116), using identities such as ~z[...] =
i
1
[f{dh d/~) Z~(0)...~(tm)Mj[{cy}, {h}, {/~)]r
~a[...] =
i
1
[f{dh dh} ~(O)".~(tm)Mj[{CY}, {h}, {h}]h(t)]
~,l,j(t)
~0j(t)
x' 'J L 7-i-d
i,5 7'-;:
i-';:;;
_i
666
A. C. C. Coolen
1 [f{dh d/~} E~(o)...~(t~)M~[(cr}, {h}, {/~}]cy(t)cr(t')] N
5:E[...] 5~j(t)8~j(t')
oz[..(,,]]) -N [[0-:,[....3] o,~(,) ] [o~,~
[
a2Z[. . .] OOj(t)OOj(t')
1 f{dh d/~} ~(o)...~(,,.)Mj[{ey}, {h}, {h)]h(t)]~(t')] N
], i [f{dhd/~} ~(o)...~(t~)Mj[{~}, {h}, {/;}]cr(t)h(t').]
~%(t)~oj(t')
[~z[ 3] pz[.. 31
- N L Oqlj(t) J Li~0j(t') J
and using the short-hand notation (118) wherever possible. Note that the external fields {q~i(t), Oi(t)} occur only in the function El...], not in W[...] or O[...], and that overall constants in Z[~] can always be recovered a posteriori, using Z[0] - 1: i
m (t ) = limo~ ~i ~~
f da 9" d R [[e,;ft)j NO=-] oN[V+*+E] +((---) "
f d-a : : : -~ -~ [~
~
= lira (cr(t)) ~0
*'
C(t, t') - - ~lim I Z o N
f d a . . . dI([-[Oqti(t)~i(t') NO=-= -Jr-O~li(t) NeZ O]~t~)"] eN[V+r i f d a . . . dK eN[V+r = lim (~(t)cy(t')), ~0
1
iG(t, t / )
lim
V~
f da.. dR [ N~2Z "
Z..., i = lirn(cy(t)h(t')/ . ~0
N0Z
N~.E ]
LO,i(t)ooi(t, ) + o,i(t ) o~i(t ~)
eN[V+r
f d a ... dl~ eN[V+c+zl+r
*
Finally we obtain useful identities from the seemingly trivial statements N - ' ~ i ~]OZ[O]/OOi(t) = 0 and N - ' ~ i 02Z[O]/aOi(t)OOi(t') = 0: 0
i S-"
[ NO-=] ..,~N[V+O+E]+C(...) f da . . . .d~ . L~o,(t)j
= lim (h(t)/ ~0
*'
667
Statistical mechanics of recurrent neural networks H - dynamics
0 -- - *---+0 lim N1 Z
f da "'" dl~ [ooi(t)ooi(t') [. NozE 4- OOi(t) N~E 6-~-(7) N~E ] eX[q'+O+Z]+0(..) i f d a . . . dI( eU[~'+O+Z]+(~()
= lim ([7(t)h(t')) . ~-+0 * In combination with (121) and (122), the above five identities simplify our problem considerably. The dummy fields ~i(t) have served their purpose and will now be put to zero, as a result we can now identify our macroscopic observables at the relevant saddle-point as:
a(t) = m(t),
k(t) = O,
q(t, {) = C(t, {),
Q(t, {) = O,
K(t, {) = iG({, t). (123)
Finally we make a convenient choice for the external fields, Oi(t)- ~0(t), with which the effective measure (...), of (124) simplifies to
(f[{~}, {h}, {~i)]>, -~
/'{dh d/~} ~-~'~(O)...,~(tm)M[{cy}, {h}, {h)] f[{cy}, {h), {/~}] f{dh d/~} 2,,(0)...o-(t.,)M[{~}, {h}, {h}] (124)
with
M[{~,}, {h), {~}] - ~0(~,(0)) x exp ( Z { i h ( t ) [ h ( t ) t
0(t)] + ]3cy(t+ 1)h(t)
- l o g 2cosh[~h(t)]} - i Z [ g t ( t ) c r ( t ) +/c(t)/~(t)]) t t,tt
In summary our saddle-point equations are given by (119)-(122), and the physical meaning of the macroscopic quantities is given by (123) (apparently many of them must be zero). Our final task is finding (112), i.e. calculating the leading order of ~-[{~}, {h}]
-
1 log
~
[e_i~-~it~i(t)~-~yj~jcyj(t)]
(125)
which is where the properties of the synapses (100) come in.
5.3. Parallel dynamics Hopfield model near saturation 5.3.1. The disorder average The fully connected Hopfield [2] network (here with parallel dynamics) is obtained upon choosing c = N in the recipe (100), i.e. cij = 1 - 6ij and p = adV. The disorder
668
A. C. C. Coolen
average thus involves only the patterns with ~t > 1. In view of our objective to write (125) in the form (112), we will substitute the observables defined in (109)-(111) whenever possible. Now (125) gives ~[...] : ~log
xp(-iN-' Z
Z t
Z
~t
{~{~h,(t)c~j(t))
iT~j
: iotE K(t,t;{~, h}) - iZ a(t)k(t) t
t
+~176
[~i~,ihi(t)/v/N][~i~,icYi(t)/x/~])]+(9(N-m)" (126)
We concentrate on the last term:
=
/
[ (--~
dx dy d~ d~ ei[x.x+y.y_x.y] exp (271;) 2(t''+l)
_J'dxdyd~d, ( (2 ~-)-2(-3-~-m-~iiexp i[R-x + ~ ' . y -
"
)]
~iE[Yc(t)~i(t) + f~(t)[~i(t)] t
x. y]
+~l~176
_-/dxdYd~d~ ( i[R-x + ~-y _ x (2 ~)2(tm+l) e x p
21N ~,
y]
' [2(t)~i(t)+ f2(t)hi(t)]/ +(9(N -I))
_-fdxdyd'~d:~(
(2X)2(t,,,+l) exp i [ i - x + ~,.y
_ x . y] _'-2Z[x(t)2(t')q(t't')t.t'
+ 22(t)fi(t')K(t',t) + ,f(t)fi(t')O(t,t')] + (9(N-1)). Together with (126) we have now shown that the disorder average (125) is indeed, in leading order in N, of the form (112) (as claimed), with
669
Statistical mechanics of recurrent neural networks H - dynamics
O[a, k, q, Q, K] - is Z
K(t, t) - ia. k + a log / dx dy dz~ d~, t
( 2 n ) 2(tm+l)
( i[R.x+~'.y-x.y]-~
•
1 [~: " q:~ + 2 ' " K:~ + ~' " Q-9])
=i~ZK(t,t)-ia.k t
+ otlog
/
dudv ( l[u.qu+2v.Ku_2iu.v+v. (2 T~)tm-~-I exp - ~
Qv]) (127)
(which, of course, can be simplified further). 5.3.2. Simplification o f the saddle-point equations We are now in a position to work out Eqs. (119) and (120). For the single-time observables this gives ci(t) - k(t) and/~(t) - a(t), and for the two-time ones: 1
f du dv u(t)u(t') e x p ( - 89[u .qu § 2 v. Ku - 2 iu. v + v. Qv]) f du dv e x p ( - 89[u .qu + 2 v. Ku - 2 iu. v § v. Qv])
1
f du dv v(t)v(t') e x p ( - 89[u-qu + 2 v. Ku - 2 iu- v + v. Qv]) fdudvexp(- 89 + 2v. K u - 2iu. v + v. Qv])
(l(t, t') -- -- -~ ~i
Q( t, t') -- - -~ oti
Is
f du dv v(t)u(t') e x p ( - 89[u .qu + 2 v. Ku - 2 iu. v + v. Qv]) {) -- - ~ i ~ f du dv e x p ( - 89[u .qu + 2 v- Ku - 2 iu. v + v. Qv])
-- Ot~t,t,.
At the physical saddle-point we can use (123) to express all nonzero objects in terms of the observables re(t), C(t, t') and G(t, t'), with a clear physical meaning. Thus we find fi(t) - 0,/c(t) - m(t), and 1 f du dv u(t)u(t') e - 89 O(t,t') -- --~o~i fdudve_l[,.cu_2iu.[l_G]v ] t
: 0
l[uCu 21u [1 G]v]
1 .fdudvv(t)v(t)e-~ " - " Q(t,t')---ou ~ - - - - y . ~ 2 f d u d v e-~ [ucu-2Zu'[1-G]v]
=
1
~i
(128) [[(1-G)
1
- C ( 1 - G ~)
-1
](t,t') (129)
/~(t, t') + ~St,t' -- - ~ i f du dv v(t)u(t') e- 89[u'cu-2iu[1-G]v] : f du dve -l[u'Cu-2iu'[1-G]v]
- G) -1 (t, t') ( 30)
(with Gt (t, t ' ) - G(t', t), and using standard manipulations of Gaussian integrals). Note that we can use the identity (1 - G) -1 - 1 - ~e~>0 G e - 1 - Y~'~e>oGe-G ( 1 - G) -1 to compactify (130) to
A.C.C. Coolen
670
(131)
/C(t, t') -- cx[G(1 - G)-l](t, t').
We have now expressed all our objects in terms of the disorder-averaged recall overlap m = {m(t)} and the disorder-averaged single-site correlation and response functions C = {C(t,t')} and G = {G(t,t')}. We can next simplify the effective measure (124), which plays a crucial role in the remaining saddle-point equations. Inserting a(t) = O(t,t') = 0 and k(t) = re(t) into (124), first of all, gives us
+ [3cy(t+ 1)h(t) - l o g 2 cosh[[3h(t)]} -i~Q(t,t')h(t)h(t')).t.t, (132) Secondly, causality ensures that G(t,t')= 0 for t ~< t', from which, in combination with (131), it follows that the same must be true for the kernel/C(t, t'), since
IC(t,t') - cx[G(1 - G)-'](t, t') - cx{G + G 2 + G 3 + - - . } ( t , t'). This, in turn, guarantees that the function M[...] in (132) is already normalized: f{dhd/~}
Z M[{cy}, {h}, {/~}] - 1. ~(0)...o(t,,,)
One can prove this iteratively. After summation o v e r O(tm) (which due to causality cannot occur in the term with the kernel IC(t,t')) one is left with just a single occurrence of the field h(tm) in the exponent, integration over which reduces to 8[/~(tm)], which then eliminates the conjugate field h(tm). This cycle of operations is next applied to the variables at time tm - 1, etc. The effective measure (124) can now be written simply as
/{dhd~} M[{~},{h},{/~}]f[{cy), {h}, {/~)]
(f[{cy}, {h}, {/t}]), = ~(0)---~(t,,,)
with M[...] as given in (132). The remaining saddle-point equations to be solved, which can be slightly simplified by using the identity (cr(t)/~(t')), = iO(cy(t)),/OO(t'), are
m(t) = (o(t)),,
C(t,t') = (cy(t)cy(t')),,
G(t,t') = O(cy(t)),/OO(t').
(133)
5.3.3. Extracting the physics from the saddle-point equations At this stage we observe in (133) that we only need to insert functions of spin states into the effective measure (...), (rather than fields or conjugate fields), so the effective measure can again be simplified. Upon inserting (129) and (131) into the function (132) we obtain (f[{cy}]), - ~,~(0)...~(t~)Prob[{cy}]f[{cr}], with
Statistical mechanics o f recurrent neural networks H -
dynamics
671
Prob[{cy}]- ~0(c~(0))f{d~}P[{~}] tl-I[~[1 + cy(t+ l)tanh[]3h(t {cy}, {qb})]] (134) in which rCo(c~(O))- 1[1 + ~(O)mo], and h(tl{cy}, { , } ) - m(t) + O(t) + ~ Z [ G ( 1 - G)-'J(t, {)cy({) + ~ 89
(135)
tt
P[{qb)] -
exp(-I
*It)Ill-
F
1/'- ")]/', t')*/")) ~
(2 rc)(t~+l)/2det -1 L(~ - Gt)C -I (1 - G)]
(136)
(note: to predict neuron states up until time tm we only need the fields up until time tm -- 1). We recognize (134) as describing an effective single neuron, with the usual dynamics Prob[~(t + 1) - +1] - 89 + tanh[~h(t)]], but with the fields (135). This result is indeed of the form (98), with a retarded self-interaction kernel R(t, t t) and covariance matrix (d?(t)d?(t')) of the Gaussian qb(t) given by
R(t,{) -- [G(1 - G)-l](t, {),
(~(t)~({)) - [(1 - G)-IC(1 - Gt)-l](t, {). (137)
For a -+ 0 we loose all the complicated terms in the local fields, and recover the type of simple expression we found earlier for finite p: m(t + 1) = tanh[13(m(t) + 0(t))]. It can be shown [25] (space limitations prevent a demonstration in this paper) that the equilibrium solutions obtained via replica theory in replica-symmetric ansatz [26] can be recovered as those time-translation invariant solutions s of the above dynamic equations which (i) obey the parallel dynamics FDT, and (ii) obey lim~_~ G ( z ) = 0. It can also be shown that the AT [27] instability, where replica symmetry ceases to hold, corresponds to a dynamical instability in the present formalism, where so-called anomalous response sets in: l i m ~ G(z) r 0. Before we calculate the solution explicitly for the first few time-steps, we first work out the relevant averages using (134). Note that always C(t,t) = (cy2(t)), = 1 and G(t, t t) = R(t, t') = 0 for t ~< t'. As a result the covariance matrix of the Gaussian fields can be written as (qb(t)d~(t')) - [(1 - G ) - I c ( 1 - Gt)-l](t, t')
= Z
[~,s + R(t,~)]c(~,~')[~,~, + R(t',~')]
s,# >~ 0 t
= Z
t~
Z [St,~ + R(t,s)]C(s,s')[8~,,t, + R(t',s')].
s=O s'=O
8
i.e. m(t) = m, C(t, t') = C ( t -
t') and G(t, t') = G ( t -
t').
(138)
A.C.C. Coolen
672
Considering arbitrary positive integer powers of the response function immediately shows that (G~)(t,t ' ) = 0
ift'>t-g
(139)
which, in turn, gives t-/
R(t, {) - ~-~(Gt)(t, t') - Z ( G e ) ( t , {). t>0
(14o)
C=I
Similarly we obtain from (1 - G ) -~ - 1 + R that for t' i> t: (1 - G ) - l ( t , t ') - St,t,. To suppress notation we will simply put h(tl..) instead of h(t]{~}, {~}); this need not cause any ambiguity. We notice that summation over neuron variables or(s) and integration over Gaussian variables ~(s) with time arguments s higher than those occurring in the function to be averaged can always be carried out immediately, giving (for t > 0 and t' < t):
re(t) -
~ ~0(cy(0)) f{d~}P[{~}] tanh[13h(t- 1[..)] Or o(0)...o(t- l) xU~t-21 [1 + cy(s + 1) tanh[13h(s[..)]]
(141)
s-----0
G(t,t')
~lC(t,t'+
1)-
t
~
~0(~(0))/{dr
11..)]
cy(0)...t~(t- 1 ) "x
'-2~ 1 [1 + c~(s + 1)tanh[]3h(s I..)]]~ x tanh[13h(t']..)] __ H s--0
(142)
J
(which we obtain directly for t' = t - 1, and which follows for times t' < t - 1 upon using the identity cy[1- tanh2(x)] = [1 + cytanh(x)][cy- tanh(x)]). For the correlations we distinguish between t ' - t - 1 and t' < t - 1:
C(t,t-1)-
Z
rt0(cr(0))
f {dr162
ll..)]
cy(0)...cy(t-2)
t-3 ~1 [1 + cy(s + 1) tanh[13h(s I"")]], z tanh[13h(t- 21..)] 1-I
(143)
s=0
whereas for t' < t - 1 we have
C(t,t') -
~ cy(0)...cy(t- 1)
rt0(cy(0)) f{dr
tanh[13h(t - 1]..)]cy(t')
,/
xU2t-21 [1 + cy(s + 1)tanh[13h(s I..)]]. s=O
(144)
673
Statistical mechanics of recurrent neural networks H - dynamics
Let us finally work out explicitly the final macroscopic laws (141)-(144), with (135) and (136), for the first few time steps. For arbitrary times our equations will have to be evaluated numerically; we will see below, however, that this can be done in an iterative (i.e. easy) manner. At t = 0 we just have the two observables m(0) = m0 and c ( 0 , 0) = 1.
5.3.4. The first few time-steps The field at t - 0 is h(0 ..) - m0 + 0(0) + a 89 since the retarded self-interaction does not yet come into play. The distribution of ~(0) is fully characterized by its variance, which (138) claims to be (qb2(0)) -- C(0, 0) - 1. Therefore, with Dz m(1)
-
(2re)-1 e -zz2 2 dz, we immediately find (141)-(144) reducing to
/ Dz tanh[J3(m0 + 0(0) + zx/~)],
C(1,0) - m0m(1),
G(1,0)-13{1-/DztanhR[~(mo+O(O)+zv~)]}.
(145)
(146)
For the self-interaction kernel this implies, using (140), that R(1,0) = G(1,0). We now move on to t - 2. Here Eqs. (141)-(144) give us
'
m(2)-5~--~ ~,(o)
/
d*(0)dqb(1)P[qb(0),qb(1)]tanh[13h(ll..)][1 +~(0)mo],
C(2, 1) - ~ ~ ,~(o)
dqb(1) dqb(0)P[qb(0), qb(1)] tanh[j3h(l[..)]
x tanh[13h(OI..)][1+ ~(O)mo],
'
C(2,0) = ~ ~
/
{dqb}P[{qb}]tanh[~h(ll..)]~(0 )
cy(0)cy(1)
x~l [1 + r G(2, 1) - 13 1 - ~ Z ~,(0)
tanh[13h(0[..)]][1 + r dqb(0)dqb(1)P[qb(0), .(1)]
x tanhZ[[3h(ll..)][1 + r
}, )
G(2, 0) - 13 C(2, 1) - ~ Z ~(o)
dqb(0) dqb(1)P[qb(0), qb(1)] tanh[13h(ll..)]
x tanh[13h(O[..)][1 + ~(O)mo] ~ - O.
J
674
A.C.C. Coolen
We already know that (~2(0)) - l; the remaining two moments we need in order to determine P[~(0), ~(1)] follow again from (138): 1
( ~ ( 1 ) ~ ( 0 ) ) -- Z [ ~ l . s --t- ~ o . s R ( 1 , O ) ] C ( s , O )
--
C(1,0) + G(1,0),
S---0 1
1
-
Z[
ls +
+
0R(1,0)]
s=O st:l
= G2(1,0)+ 2C(0, I)G(1,0)+ 1. We now know P[~(0), ~(1)] and can work out all macroscopic objects with t - 2 explicitly, if we wish. I will not do this here in full, but only point at the emerging pattern of all calculations at a given time t depending only on macroscopic quantities that have been calculated at times t' < t, which allows for iterative solution. Let us just work out m(2) explicitly, in order to compare the first two recall overlaps re(l) and m(2) with the values found in simulations and in approximate theories. We note that calculating m(2) only requires the field ~(1), for which we found (~)2(1)) -- O2(1,0) + 2C(0, 1)G(1,0) + 1"
1 /
m(2) - ~ ~ o(0)
d , ( l ) P [ , ( 1 ) ] tanh[13(m(1) + 0(1)
+ ~G(1,0)cy(0) + ocl~(1))] [1 + cy(0)mo] I
1[1 + mo] f Dz tanh[~(m(1) + 0(1) 2 + ocG(1,0) + zv/cx[G2 (1, 0) + 2 mom(1)G(1,0) + 1])] +21 [1
-
too] f Dz tanh[[3(m(1 ) + 0(1) - aG(1 0)
+ zV/~[G2 (1, 0) + 2 mom(1)G(1,0) + 1])].
5.3.5. Exact results versus simulations and gaussian approximations I close this section on the fully connected networks with a comparison of some of the approximate theories, the (exact) generating functional formalism, and numerical simulations, for the case 0 ( t ) = 0 (no external stimuli at any time). The evolution of the recall overlap in the first two time-steps has been described as follows:
Naive Gaussian Approximation: m(1) -- / Dz tanh[13(m(0) + zv/-~)], m(2) - / Dz tanh[13(m(1) + zv~)].
Statistical mechanics of recurrent neural networks H - dynamics
675
Amari-Maginu theory: m(1) - f Dztanh[13(m(0) + zx/~)], m(2) = f Dztanh[J3(m(1) + zErO)I, Z 2 = l + 2 m ( 0 ) m ( 1 ) G + G 2, G - 1311- J" DztanhZ[13(m(0)+zx/~)]
9
Exact solution: m(1) - f Dztanh[13(m(0) + zx/~)], 1
m(2) = ~
'
I1 +
m0] i Dztanh[13(m(1) + aG + z]~ v/-~)]
+~[1 -m0]
/ Dz t a n h [ 1 3 ( m ( 1 ) - ~ O + z Z v ~ ) l ,
E 2 = l + 2 m ( 0 ) m ( 1 ) G + G 2,
[1-/Oztanh I /o(0 +z ll. We can now appreciate why the more advanced Gaussian approximation (AmariMaginu theory, [14]) works well when the system state is close to the target attractor. This theory gets the moments of the Gaussian part of the interference noise distribution at t = 1 exactly right, but not the discrete part, whereas close to the attractor both the response function G(1,0) and one of the two pre-factors 1 [1 + m0] in the exact expression for m(2) will be very small, and the latter will therefore indeed approach a Gaussian shape. One can also see why the non-Gaussian approximation of[17] made sense: in the calculation of m(2) the interference noise distribution can indeed be written as the sum of two Gaussian ones (although for t > 2 this will cease to be true). Numerical evaluation of these expressions result in explicit predictions which can be tested against numerical simulations. This is done in Fig. 8, which confirms the picture sketched above, and hints that the performance of the Gaussian approximations is indeed worse for those initial conditions which fail to trigger pattern recall.
5.4. Extremely diluted attractor networks near saturation Extremely diluted attractor networks are obtained upon choosing limN~oo c / N = 0 (while still c ---+co) in definition (100) of the Hebbian-type synapses. The disorder average now involves both the patterns with l-t > 1 and the realization of the 'wiring' variables c;j c {0, 1}. Again, in working out the key function (125) we will show that for N ---, oc the outcome can be written in terms of the macroscopic quantities (109)(111). We carry out the average over the spatial structure variables {c;j} first:
676
A.C.C.
1.0
1.0
1.0
m 0.5
Coolen
0.5
0.5 q I 4 q I
0.0
0
1
2
0.0
3
0
1
t
2
0.0
3
0
t
1
2
3
t
Fig. 8. The first few time steps in the evolution of the overlap rn(o)= N -1 ~--]icr/~] in a parallel dynamics Hopfield model with 0t = T = 0.1 and random patterns, following initial states correlated with pattern one only. Left: simulations (o) versus naive Gaussian approximation (o). Middle: simulations (o) versus advanced Gaussian approximation (AmariMaginu theory, o). Right: simulations (o) versus (exact) generating functional theory (o). All simulations were done with N -- 30,000.
~[...] - ~log
exp
--Zcij C
~i~
[~i(t)c~j(t)
ir
At this stage we have to distinguish between symmetric and asymmetric dilutions.
5.4.1. The disorder average First we deal with the case of symmetric dilution: cij over the c/j, with the distribution (101), is trivial:
i
(i
- - cij ~
9 .
C
--
~i~~j" Z[kti(t)cYj(t) + [~j(t)r
la
1+~ 9
+~-~:
(t)l
cji
for all i r j. The average
)
t
-c
.
-
rt /
~/0~
[/,;(t) ,~j (t) +/,j(t),~/(t)]
+ c0(c-~)
Statistical mechanics of recurrent neural networks H - dynamics
- . .H exp
-
i
~i~~)~ Z[f~i(t)cyj(t) + [Tj(t)eyi(t)]
~Z ~t
1
--
2cN Z
677
t
~i~t~)g Z
1
hi(t)cyj(t) -+- hj(t)cyi(t)
+(9
x~~
c
+ (9 ~
.
t
We separate in the exponent the terms where g - v in the quadratic term (being of the form }-~v.- .), and the terms with l a - 1. Note: p - ac. We also use the definitions (109)-(111) wherever we can:
~[...] = -iZa(t)k(t ) --2l otZ[q(s,t)Q(s,t) + K(s,t)K(t,s)] + (9(c- 89+ (9(c/N) t
st
+ll~
~ihi(t)][~j ~cyJ(t)J
)}
~i" ~)"g~iv~/v"[hi(s)cyj(s) + hj (s) cyi(s) ][kti(t) cyj(t) + hj( t) cyi(t) ]
,
4cNZZ~ i C j gT~v st
Our 'condensed ansatz' implies that for g > 1" N- 89 and N- 89~ i ~p/~i(t) - (9(1). Thus the first term in the exponent containing the disorder is (9(c), contributing (9(c/N) to ~-[...]. We therefore retain only the second term in the exponent. However, the same argument applies to the second term. There all contributions can be seen as uncorrelated in leading order, so that ~ir ~gr .... (9(Np), giving a nonleading (9(N -1) cumulative contribution to ~ [ . . . ] . Thus, provided limN__+~c-1 = limN~c/N = 0 (which we assumed), we have shown that the disorder average (125) is again, in leading order in N, of the form (112) (as claimed), with
Symmetric: 9 [a, k, q, Q,K] = - i a . k -
1
-~aZ[q(s,t)Q(s,t) + K(s,t)K(t,s)].
(147)
st
Next we deal with the asymmetric case (102), where Again the average over the cij is trivial; here it gives
i
= 9
,
cij
and
cji
are independent.
1]}
678
A.C.C. Coolen
= ~. .
{ ~tC I~~ 1-~
r162
• 1-~
r162
1 [~ r162~thi(t)(yj(t) ]2-~-(~(C-2)3] }
hi(t)(yj(t)-~--~c2
r162
hj(t)cri(t)
+e(cq/
(in which the horizontal bars of the two constituent lines are to be read as connected) -- H e x p i<j
i -
r C/Z[hi(t)~j(t) + (t)
-
la
] - ~1
r162
]li(t)(Yj(t)
t
1
1
c
Again we separate in the exponent the terms where g - v in the quadratic term (being of the form Y~'~,v-..), and the terms with g - 1, and use the definitions (109)(111): 1
~[. . .] -- -i Z a(t)k(t) - -~ Z q(s, t)Q(s, t) + C(c- 89+ C(c/n) d.,
t
st
+ll~
[~i ~ihi(t)l [~j ~ ~j(t)] , i#j O#v
-~i ~j -~i Cj E hi(s)(YJ(S)hi(t)(YJ(t) st
)}
.
The scaling arguments given in the symmetric case, based on our 'condensed ansatz', apply again, and tell us that the remaining terms with the disorder are of vanishing order in N. We have again shown that the disorder average (125) is, in leading order in N, of the form (112), with
Asymmetric: 1
SIq(s, t)Q(s, t).
(148)
9 [a, k, q, Q, K] - - i a - k - ~ at Z
5.4.2. Extracting the physicsfrom the saddle-point equations First we combine the above two results (147)-(148) in the following way (with A - 1 for symmetric dilution and A - 0 for asymmetric dilution): 1
9 [a, k, q, Q, K] - - i a . k - ~ot E [ q ( s , t)Q(s, t) + AK(s, st
t)K(t,s)].
(149)
Statistical mechanicsof recurrentneuralnetworks H- dynamics
679
We can now work out Eqs. (119) and (120), and use (123) to express the result at the physical saddle-point in terms of the trio {m(t), C(t, t'), G(t, t')}. For the single-time observables this gives (as with the fully connected system) h ( t ) = k ( t ) and /~(t) - a(t); for the two-time ones we find:
O(t,t I) - _li~c(t,t,),
g:(t, t') =
O(t, t') -0,
AG(t, t').
We now observe that the remainder of the derivation followed for the fully connected network can be followed with only two minor adjustments to the terms generated by Is and by Q(t,t I) a G ( l - G ) - I ~ sAG in the retarded self-interaction, and (1 - G)-IC(1 - GTj :1 ---. C in the covariance of the Gaussian noise in the effective single neuron problem. This results in the familiar saddle-point equations (133) for an effective single neuron problem, with state probabilities (134) equivalent to the dynamics Prob[~(t + 1) - + 1 ] - 1 [1 -t- tanh[13h(t)]], and in which rc0(o-(O)) -- 89[1 + cr(0)m0] and h(t]{cy}, {qb}) - m(t) + O(t) + aA E
G(t,t')cr(t') + u 89
d
e ~E,.,, 4,(t)c-'(t,t')4,(t')
(150)
(2 rt)(tm+l)/2detlC "
P[{(~}] -
5.4.3. Physics of networks with asymmetric dilution Asymmetric dilution corresponds to A = 0, i.e. there is no retarded self-interaction, and the response function no longer plays a role. In (150) we now only retain h ( t l . . . ) - m(t)+ O(t)+ a 89 with ( ~ 2 ( t ) ) - C(1, 1 ) - 1. We now find (141) simply giving
m(t + 1) -
f Z rc0(cr(0))/{d~}P[{q~}]tanh[13h(t]...)] J ~(0)...o(t) t-1 1
x I-I ~ [1 + cr(s + 1) tanh[13h(s[...)]] s=0
(151)
f Dztanh[~3(m(t) + O(t) + ZV/-~)].
Apparently this is the one case where the simple Gaussian dynamical law (95) is exact at all times. Similarly, for t > t' Eqs. (142)-(144) for correlation and response functions reduce to f d~a dqbbexp (
C(t,t')=
l doZ+dP~-2C(t-l't'-l)dOad~b'~
-~ 1-c2(t-l,t,-1) 2~tv/l -C-~--- ill---i)
j
x tanh[13(m(t' - 1) + 0(t' - 1) + qbbv/-~)],
tanh[13(m(t-1)+0(t-1) + qba~/-~)] (152)
680
A. C. C. Coolen
G ( t , t ' ) - [38t.t,+, { 1 - / D z t a n h Z [ f ~ ( m ( t
-
1)+ 0(t-1) +zx/~)]).
( 53)
Let us also inspect the stationary state re(t) = m, for 0(t) = 0. One easily proves that m = 0 as soon as T > 1, using m 2 - [3m fomdk[1 - fDztanh2[f3(k + zx/-~)]] ~< 13m2. A continuous bifurcation occurs from the m = 0 state to an m > 0 state when T = 1 - f D z t a n h 2 [ f 3 z v ~ . A paramerization of this transition line in the (a, T)plane is given by
T(x) - 1 - f Dz tanh 2(zx),
a(x) = x 2T 2(x),
x >/0.
For a = 0 we just get m = tanh(J3m) so Tc = 1. For T = 0 we obtain the m - - - e r f [ m / v / ~ ] , giving a continuous transition to m > 0 solutions 2/rt ~ 0.637. The remaining question concerns the nature of the m = 0 serting m(t) = 0(t) ---0 (for all t) into (152) tells us that C(t, t') = f [ C ( t for t > t' > 0, with 'initial conditions' C(t, O) - m(t)mo, where
d~ a d~b f[C] -
2 roy/1 --------~exp - C
(
equation at ac = state. In1, t ' - 1)]
1 , 2 + , 2 _ 2 C*a*b'~ tanh[IgV~*a] tanh[13v~*b].
In the m = 0 regime we have C(t, 0) = 0 for any t > 0, inducing C(t, t') = 0 for any t > t', due to f[0] = 0. Thus we conclude that C(t, t~) = St,t, in the m = 0 phase, i.e. this phase is paramagnetic rather than of a spin-glass type. The resulting phase diagram is given in Fig. 9, together with that of symmetric dilution (for comparison).
5.4.4. Physics of networks with symmetric dilution This is the more complicated situation. In spite of the extreme dilution, the interaction symmetry makes sure that the spins still have a sufficient number of common ancestors for complicated correlations to build up in finite time. We have h(tl{r
{~)) -- m(t) + O(t) + ot Z
G(t,t')cy(t') + ~ ( t ) ,
fl
P[{dp}] = e x p ( - 89 ~t,t' +(t) C-1 (t, ttll~)(tt)) .
(154)
(2 rt)(t'+l)/2det 89 The effective single neuron problem (134) and (154) is found to be exactly of the form found also for the Gaussian model in [1] (which, in turn, maps onto the parallel dynamics SK model [28]) with the synapses Jgj = Jo~g~j/N + Jzij/x/N (in which the zij are symmetric zero-average and unit-variance Gaussian variables, and Jig = 0 for all i), with the identification: J~v/~,
J0 ~ 1
(this becomes clear upon applying the generating functional analysis to the Gaussian model, page limitations prevent me from explicit demonstration here). Since one can
Statistical mechanics of recurrent neural networks H - dynamics
1.5
.,_
.
.
.
,. . . . .
,
.
1.5
.
1.0
1.0
0.5
0.5
681
9
-
.'
9
,
J
J
,/"
R
'/:,,I
SG
I"11 /
I I I I
/ I /
0.0
. 0.0
.
.
.
0.5
. . 1.0
.
.
.
.
~'~
0.0 1.5
0.0
,
i
,
0.5
1.0
Fig. 9. Phase diagrams of extremely diluted attractor networks. Left: asymmetric dilution, cij and cji are statistically independent. Solid line: continuous transition, separating a nonrecall (paramagnetic) region (P) from a recall region (R). The line reaches T = 0 at 0~c = 2/re ~ 0.637. Right: symmetric dilution, cij =cji for all i,j. Solid lines: continuous transitions, separating a nonrecall region (P) from a recall region (R), for cx < 1, and from a spin-glass region (SG), for ~ > 1. Dashed-dotted line: the AT instability. The R ~ S G line (calculated within RS) reaches T = 0 at czcRS = 2/re ~ 0.637. In RSB the latter is replaced by a new (dashed) line, giving a new storage capacity of uc.RSB c = 1. show that for J0 > 0 the parallel dynamics SK model gives the same equilibrium state as the sequential one, we can now immediately write d o w n the stationary solution of our dynamic equations which corresponds to the F D T regime, with q = lim~_~o~ l i m t _ ~ C(t, t + ~): q = I DztanhZ[13(m + z v / ~ ) ] '
(155)
m - I Dz tanh[13(m + z v / ~ ) ] .
These are neither identical to the equations for the fully connected Hopfield model, nor to those of the asymmetrically diluted model. Using the equivalence with the (sequential and parallel) SK model [28] we can immediately translate the phase transition lines as well, giving: SK model P-+F:
T=Jo
P---~ SG :
T=J
F ~ SG(in RS):
for J0 > J for J0 < J
T = J0(1 - q )
for T < J0
Symmetrically diluted model
T=I T=l-q
F--4 SG(inRSB) : J0 = J for T < J
a=l
AT -- line:
r 2 =
T2 =
j2
fDzcosh-4 ~[Jom + Jzv/q ]
for 0~< 1
T=v~
force> 1 for T < 1 for T < v/~
cx f D z c o s h
-4
~[m
+ Zv/-~]
where q = f Dz tanh 2 13[m + z v r ~ ]. N o t e that for T = 0 we have q = 1, so that the equation for m reduces to the one found for asymmetric dilution" m - e r f [ m / x / ~ ] . However, the phase d i a g r a m shows that the line F ~ SG is entirely in the RSB
682
A.C.C. Coolen
region and describes physically unrealistic re-entrance (as in the SK model), so that the true transition must be calculated using Parisi's replica-symmetry breaking (RSB) formalism (see e.g. [29]), giving here ac = 1. The extremely diluted models analyzed here were first studied in [30] (asymmetric dilution) and [23] (symmetric dilution). We note that it is not extreme dilution which is responsible for a drastic simplification in the macroscopic dynamics in the complex regime (i.e. close to saturation), but rather the absence of synaptic symmetry. Any finite degree of synaptic symmetry, whether in a fully connected or in an extremely diluted attractor network, immediately generates an effective retarded self-interaction in the dynamics, which is ultimately responsible for highly nontrivial 'glassy' dynamics.
6. Epilogue In this paper I have tried to explain how the techniques from nonequilibrium statistical mechanics can be used to solve the dynamics of recurrent neural networks. As in the companion paper on statics in this volume, I have restricted myself to relatively simple models, where one can most clearly see the potential and restrictions of these techniques, without being distracted by details. I have dealt with binary neurons and graded response neurons, and with fully connected and extremely diluted networks, with symmetric but also with nonsymmetric synapses. Similar calculations could have been done for neuron models which are not based on firing rates, such as coupled oscillators or integrate-and-fire type ones, see e.g. [31]. My hope is that bringing together methods and results that have so far been mostly scattered over research papers, and by presenting these in a uniform language to simplify comparison, I will have made the area somewhat more accessible to the interested outsider. At another level I hope to have compensated somewhat for the incorrect view that has sometimes surfaced in the past that statistical mechanics applies only to recurrent networks with symmetric synapses, and is therefore not likely to have a lasting impact on neuro-biological modeling. This was indeed true for equilibrium statistical mechanics, but it is not true for nonequilibrium statistical mechanics. This does not mean that there are no practical restrictions in the latter; the golden rule of there not being any free lunches is obviously also valid here. Whenever we wish to incorporate more biological details in our models, we will have to reduce our ambition to obtain exact solutions, work much harder, and turn to our computer at an earlier stage. However, the practical restrictions in dynamics are of a quantitative nature (equations tend to become more lengthy and messy), rather than of a qualitative one (in statics the issue of detailed balance decides whether or not we can at all start a calculation). The main stumbling block that remains is the issue of spatial structure. Short-range models are extremely difficult to handle, and this is likely to remain so for a long time. In statistical mechanics the state of the art in short-range models is to be able to identify phase transitions, and calculate critical exponents, but this is generally not the type of information one is interested in when studying the operation of recurrent neural networks.
Statistical mechanics of recurrent neural networks H - dynamics
683
Yet, since dynamical techniques are still far less hampered by the need to impose biologically dubious (or even unacceptable) model constraints than equilibrium techniques, and since there are now well-established and efficient methods and techniques to obtain model solutions in the form of macroscopic laws for large systems (some are exact, some are useful approximations), the future in the statistical mechanical analysis of biologically more realistic recurrent neural networks is clearly in the nonequilibrium half of the statistical mechanics playing field.
Acknowledgements It is my pleasure to thank Heinz Horner, David Sherrington and Nikos Skantzos for their direct and indirect contributions to this review.
References
1. Coolen, A.C.C. (2000) in: Handbook of Biological Physics IV: Neuro-Informatics and Neural Modelling. Elsevier Science, Amsterdam. 2. Hopfield, J.J. (1982) Proc. Natl. Acad. Sci. USA 79, 2554. 3. Riedel, U., Kfihn, R. and Van Hemmen, J.L. (1988) Phys. Rev. A 38, 1105. 4. Domany, E., Van Hemmen, J.L. and Schulten, K., eds (1991) Models of Neural Networks I. Springer, Berlin. 5. Khalil, H.K. (1992) Nonlinear Systems. MacMillan, New York. 6. Buhmann, J. and Schulten, K. (1987) Europhys. Lett. 4, 1205. 7. Coolen, A.C.C. and Ruijgrok, T.W. (1988) Phys. Rev. A 38, 4253. 8. Bernier, O. (1991) Europhys. Lett. 16, 531. 9. Van Kampen, N.G. (1992) Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam. 10. Gardiner, C.W. (1994) Handbook of Stochastic Methods. Springer, Berlin. 11. Kfihn, R., B6s, S. and Van Hemmen, J.L. (1991) Phys. Rev. A 43, 2084. 12. Castellanos, A., Coolen, A.C.C. and Viana, L. (1998) J. Phys. A: Math. Gen. 31, 6615. 13. Amari, S.I. (1977) Biol. Cybern. 26, 175. 14. Arnari, S.I. and Maginu, K. (1988) Neural Networks 1, 63. 15. Nishimori, H. and Ozeki, T. (1993) J. Phys. A: Math. Gen. 26, 859-871. 16. Okada, M. (1995) Neural Networks 8, 833. 17. Henkel, R.D. and Opper, M. (1990) Europhys. Lett. 11, 403. 18. Coolen, A.C.C. and Sherrington, D. (1994) Phys. Rev. E 49, 1921; Phys. Rev. E 49, 5906. 19. Coolen, A.C.C., Laughton, S.N. and Sherrington, D. (1996) in: Neural Information Processing Systems 8, eds D.S. Touretzky, M.C. Moser and M.E. Hasselmo. p. 252, MIT Press, Cambridge. 20. Rieger, H., Schreckenberg, M. and Zittartz, J. (1988) Z. Phys. B 72, 523. 21. Horner, H., Bormann, D., Frick, M., Kinzelbach, H. and Schmidt, A. (1989) Z. Phys. B 76, 383. 22. Kree, R. and Zippelius, A. (1991) in: Models of Neural Networks I, eds R. Domany, J.L. Van Hemmen and K. Schulten. p. 193, Springer, Berlin. 23. Watkin, T.UH. and Sherrington, D. (1991) J. Phys. A: Math. Gen. 24, 5427. 24. Dfiring, A., Coolen, A.C.C. and Sherrington, D. (1998) J. Phys. A: Math. Gen. 31, 8607. 25. Coolen, A.C.C. and Sherrington, D. (in preparation) Statistical Physics of Neural Networks. U.P., Cambridge. 26. Fontanari, J.F. and K6berle, R. (1988) J. Physique 49, 13. 27. de Almeida, J.R.L. and Thouless, D.J. (1978) J. Phys. A 11, 983. 28. Sherrington, D. and Kirkpatrick, S. (1975) Phys. Rev. Lett. 35, 1792.
684
A. C. C. Coolen
29. M+zard, M., Parisi, G. and Virasoro, M.A. (1987) Spin Glass Theory and Beyond. World Scientific, Singapore. 30. Derrida, B., Gardner, E. and Zippelius, A. (1987) Europhys. Lett. 4, 167. 31. Gerstner, W. and Van Hemmen, J.L. (1994) in: Models of Neural Networks II, eds R. Domany, J.L. Van Hemmen and K. Schulten. p. 1, Springer, Berlin.
CHAPTER 16
Topologically Ordered Neural Networks J.A. FLANAGAN Neural Network Research Center, Helsinki University of Technology, P.O. Box 5400, Fin-02015 HUT, Finland
O 2001 Elsevier Science B.V.
AN rights reserved
Handbook of Biological Physics Volume 4 , edited by F. Moss and S. Gielen
Contents
1.
The brain and topologically ordered maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
687
2.
M o d e l i n g the self-organizing process
690
3.
The self-organizing m a p (SOM)
4.
Physiological interpretation o f the S O M
..................................
..................................... ................................
5.
Theoretical analyses of t o p o l o g y preserving maps
6.
T o p o l o g y preserving maps and vector quantization
7.
Variants of the S O M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
718
8.
Generalizing the principle of self-organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
720
Abbreviations
726
References
..........................
696 701
................................................
.....................................................
686
.........................
704 715
726
I. The brain and topologically ordered maps The human brain is extraordinary in the way that it can create a perception of an external stimulus using only the electric impulses transmitted to it from the sensory organ being stimulated. In the case of a speech wave impinging on the ear we can perceive what is being said, or for light waves impinging on the retina we perceive an image. The complexity of the process of interpreting the sensory signals becomes obvious when it is realized that a sensory organ typically consists of a huge number of sensing cells. For example, there are about 100 million rods and 4.5 million cones spread over the surface of the retina in a human eye [1], each one of which can be transmitting sensory signals at the same time. What is more, it seems plausible that for the brain to form a true perception of the image impinging on the retina, then the spatial relations, which exist on the retinal surface between the rods and cones, must somehow be encoded into the sensory signals transmitted to the brain. It would follow that to correctly interpret the signals arriving from the retina, the brain itself should be able to robustly and efficiently decode these spatial relations. Understanding how such spatial relations are encoded and decoded, in an ordered manner, is an important part in trying to understand the principles of information processing in the brain. The example of the retina is only one of several, where information on the spatial relations between sensory cells is used. It has long been known that the brain is ordered into many functionally specific areas. This is especially true for the cerebral cortex in humans which is responsible for integrating sensory impulses and higher intellectual functions. Many cortical regions, such as the somatosensory areas of the cortex, are further divided into areas specific to different regions of the body, for example foot, finger, lips, etc. More recent research has shown that in areas such as the somatosensory and visual cortex there exists another form of ordering. This ordering is more sophisticated in that it is not just a case of mapping all the sensory signals from one sensory organ into the same region of the cortex, but rather the topology of the sensory cells in the sensory organ is mapped onto the same topology of receiving neurons in the brain. For example in the visual cortex, the two-dimensional retinal image is mapped to the visual cortex in such a way that spatial relationships present in the input stimulus are preserved when the image is transmitted to the visual cortex [2,3]. Such a connection is commonly referred to as topology preserving. This topological mapping is the means by which information on the spatial relations between sensory cells, is transmitted to, and decoded by the brain. How these topology preserving mappings can be formed in an unsupervised manner, and how they can be applied to processing information, form the basis of this chapter. The first question is, how does the brain achieve such a high level of unsupervised ordering of all its various structures, down as far as the formation of 687
688
J.A. Flanagan
topology preserving mappings between the sensory organs and the cortex. Historically the most widely studied case has been the formation of retino-tectal connections in fish [4]. Other studies on the retino-tectal system have shown that the first stage of forming connections is axonal outgrowth in a fiber bundle from the sensory organ towards the target area in the cortex. To form the topographic mapping then requires an initial innervation of the target cells, which is followed by the selective removal of certain synaptic connections and the selective strengthening or weakening of synaptic connections in the target structure [5,6]. Several theories have been put forward as to how the topographical order is maintained during this growth stage one of which is the control of the destination by chemical markers [7,8]. It is clear however that none of these mechanisms can fully account for the formation of topographic maps. First of all it is known that there is an important reorganization of the sensory signals before they reach the cortex. Secondly, because the mechanisms mentioned previously work independently of any external stimulation, they cannot account for the formation of topographic maps of a more abstract nature, which are ordered according to an external stimulus. For example in the auditory cortex there exists the tonotopic map in which the spatial order of cell responses corresponds to the pitch or acoustic frequency of tones perceived. The tonotopic map could be explained as an anatomical projection of the basilar membrane of the inner ear, but it should also be noted that in the auditory pathway, the order of the transmitting neurons is changed before reaching the cortex. It has also been shown that abstract topological maps can be formed according to sensory stimulation. For example, maps of the geographic surroundings have been measured in the hippocampus. The measurements were performed on a rat, which had been trained in a maze. After training it was found that the activity of cells in the hippocampus was correlated with the rats position in the maze [9]. It is believed that other kinds of topographic maps exist in the cortex, thalamus, hippocampus and other parts of the brain. Furthermore, there exist experimental results, which suggest that neuronal connections can depend on external stimulation. Experiments which involved the destruction of sensory organs, brain tissue or the deprivation of sensory stimulation at a young age, have resulted in neural connections which were not developed, and the corresponding area being occupied by other projections. The ability to form abstract topographical maps, stimulus-based topographical maps etc., suggest that there is a certain element of adaptability, or neural plasticity in the formation of these maps. Given that there is a reasonable degree of topography present at birth it would seem likely that the formation of topographic maps is not completely carried out through a combination of external stimuli and the plasticity of the neural connections. From the point of view of genetics it would seem unlikely that exact neural connections could be programmed for every neuron in the brain, given the huge number of connections. The most likely scenario would seem to be that genetic coding defines coarse topological mappings of neurons in the brain, which is then fine tuned by neural plasticity in combination with neural activity. This idea was suggested by v o n d e r Malsburg [10], and leads to the idea of self-organization of
Topologically ordered neural networks
689
topographic maps. In general, the notion of self-organization means that order can be created out of disorder without the use of a teacher or supervisor. The notion of emergent behavior is used when talking about the nonrandom, nonchaotic, complex behavior of very large spatio-temporal systems, comprising many interconnected simple units [11]. The formation of topographic maps can thus be considered as an emergent behavior in the brain. In what follows different aspects of self-organization in the brain will be described. First the initial attempts to model the formation of topographical maps in the brain will be presented. However, before presenting models for the formation of topographic maps it will be first necessary to examine the dynamical model of the neurons used. Following a discussion on different models of topographic map formation, one model in particular will be described in more detail, the self-organizing map (SOM). It just remains to justify why the brain has evolved into such a highly organized structure. In the beginning it was suggested that if the brain, based on sensory signals, is to create a true perception of the world, then the sensory signals must somehow encode spatial information. It would thus seem that at least in some cases the spatial information is not coded into the signals but is physically coded by the paths along which the sensory signals travel, by means of the formation of topology preserving maps. Why should such a physical encoding of the spatial relations between sensory signals be used over and above another form of coding? For example in the visual system, another means of mapping an image from the eye to the visual cortex could use a raster format as in television. It seems that there are other advantages to having adjacent neurons in the brain encoding adjacent sensory signals: 9 Topological ordering results in a form of collective responsibility through the fact that performance of the system as a whole does not depend on a single neuron. This ensures a graceful degradation of the system and gives a certain robustness. 9 During the processing of sensory information, it seems likely that signals from adjacent sensory neurons will be treated together. By having the neurons physically closer the interconnection distances between neurons can be reduced. 9 If spatially segregated neurons code sensory signals that are segregated in the signal space then the errors arising from any form of crosstalk would have less effect. 9 From an information processing point of view "clustering" is obtained for free. It has also been pointed out by Durbin and Mitchison [12] that the formation of topographic maps in the visual cortex can be understood in terms of dimensionreducing mappings from many-dimensional parameter spaces to the surface of the cortex. They postulate this function on the observation [13,14] that orientation selectivity and ocular dominance are mapped in a topology preserving fashion in patterns of stripes or patches in the visual cortex. Thus the mapping is from the four-dimensional space, whose dimensions are the retinotopic position, orientation and ocular dominance, onto what is considered to be a two-dimensional visual cortex. Once again the main reason why this should be desirable is that neighboring
J.A. Flanagan
690
points in parameter space are mapped close together on the cortex, and hence connection lengths between neurons are decreased.
2. Modeling the self-organizing process Before describing various attempts which have been made to model the mechanism, which leads to the formation of topology preserving maps in the brain, it is first necessary to describe models of the basic functional unit of the brain, the neuron. Describing the function of a neuron is in itself a problem. One approach is to analyze all the different phenomena in the neural cell, electrical polarization of the cell membrane, concentrations of different kinds of ions, active permeability of the membrane, etc, and all these factors are also dependent on the location in the cell. Taking all these factors into account results in a formidable array of equations involving many nonlinear differential equations, and all this just to model one neuron. With this approach in mind the Hodgkin-Huxley equations [15] describe the change of the neuron cell potential. In the other extreme the neuron is considered as a simple summing device with a thresholded output. A good overview of biologically plausible neuron models is given in the chapter by Gerstner in this book. For our purposes, we will start with the basic model of the neuron (Fig. 1) introduced by McCulloch and Pitts [16], which is the static one, where the neuron receives n input signals ~ j , j - 1 , . . . , n , which are then weighted and summed to give the input activation Ii as
Ii - ~_~ ~[ij~j,
(1)
j=l
where the la;j describe the synaptic efficiency or weights between neurons i and j. The output activity vii is given by a nonlinear function of the activity as vii = const, f(Ii - Oi),
(2)
rli = f(I i)
Fig. 1.
Static nonlinear model for a neuron, where f is a monotonically changing function with upper and lower saturation limits.
Topologically ordered neural networks
691
where f ( e ) could be the Heaviside function (i.e. f ( x ) = 1, x > O, f ( x ) = 0, otherwise), and 0i is a threshold. This computationally simple model of the neuron is used in many different artificial neural network (ANN) algorithms (e.g. the perceptron). This model represents the very basic function of the neuron, and ignores its dynamical behavior. Dynamical models of the neuron have been developed, many based on the so-called additive model, where the change in neural activity in its most basic form is given by dqi
_
at - -rig + ~
[excitatory i n p u t s ] - Z
[inhibitory inputs]
(3)
(see the chapters by Gerstner and Coolen in this book). Note the introduction of the inhibitory input term, which reflects a more complicated interaction of each neuron with other neurons than in the static neuron model. Grossberg [17] reviews a number of neural models that are versions of the additive equation. Kohonen [18] uses an even simpler model, where the neuron dynamics are treated as a form of integration of the neuron inputs, with nonlinear losses dn2 dt = Ii - y(qi),
(4)
where y(e) describes the sum of all nonlinear loss or leakage effects, and for large values of rli it should be convex. It should be kept in mind that the activity rli is in fact a frequency and as such cannot be negative. One interesting point to note about the formulation of Eq. (4) is in the stationary state (i.e. d q i / d t = 0), that, qi = v - l (Ii).
(5)
Given the assumptions on the shape of 7, it is seen that y-1 is compatible with the definition of the form of the thresholding function f ( e ) mentioned earlier. This last model will be used later on in discussing the physiological interpretation of the SOM. The models of neurons discussed above describe the dynamical behavior of their activities, where little reference has been made to the neuron weights ~tij and how they change with respect to different levels of activity. Now some examples of different learning laws are presented, which will be used later on to describe how the neuron weights change when subjected to a stimulus. Most learning laws are based on the hypothesis of Hebb [19] which expresses quite elegantly how the neural synapses are modified "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." This hypothesis is written down in analytical form as d P i j = arli~j, dt
(6)
692
J.A. Flanagan
where ~tij is the synaptic strength or neural weight between neuron j, which transmits a signal to neuron i, ~ a scalar parameter called the learning rate, 1]i the postsynaptic activity and ~j is the presynaptic activity. Since it was originally described, the Hebbian learning principle has been modified in many ways. One of the characteristics of Hebb's law as expressed in Eq. (6) is that the synaptic weight can only increase in value as the activity is always nonnegative, and naturally there must be some saturation level. To overcome this problem Grossberg [20] introduced a passive decay term to give dgij dt = urli~J - gi].
(7)
One important class of linear associative memories pioneered by Kohonen [21], Anderson [22] and Nakano [23] is based on a Hebbian learning rule. Two other learning laws based on the Hebbian law are Oja's PCA-type learning law [24,25] and Kohonen's Riccati-type learning law [25]. The Riccati-type learning law was introduced in order to account for the influences on the plasticity of a neuron from activity in neighboring cells. Kohonen expressed the law analytically as dBij dt = P(~j - Qgij).
(8)
The scalar value P is a plasticity control term that depends on many factors, such as activities of neighboring neurons, diffuse chemical control, etc., The factor P is in fact a generalization of the presynaptic activity term ~j in the Hebb law of Eq. (6). The factor Q is introduced as a forgetting-rate functional, which is a function of the synaptic activities at the membrane of cell i. This active forgetting term prevents the neuron weight from diverging to infinity, and in fact it would be preferable if [imil] remains constant, where mi = [P/l, P i 2 , " ' , ~.lin]T, the vector of neuron weights for neuron i. To avoid the situation where gij would converge to zero with passive inputs, the parameter P must control the total learning rate, hence the form of Eq. (8). The two parameters P, Q can also be considered as describing extracellular and intracellular effects, respectively. Kohonen [25] has shown how Eq. (8) is in fact a formulation of the Riccati equation and showed that for a stochastic -- (~1, ~ 2 , ' ' ' , ~n) T the most probable trajectories of the neuron weight vector mi converge to m *i , where
m~ = I1~11v ~
(9)
and ~ is the mean of the stochastic vector ~. The PCA-type learning rule by Oja [24] is very similar to that of the Riccati type. However, the output activation is given by the linear relationship ni -
(lo)
j=l
693
Topologically ordered neural networks
and the learning law is given by dlau - ~rl (~j - rlila/j), dt i
(11)
where the term ~ j - rligij can now be interpreted in the Hebb law of Eq. (6) as the effective presynaptic activity of the neuron. Oja shows that with this learning rule, the neuron tends to extract the principal component from a stationary input sequence vector. It is seen that the Riccati-type learning law learns first-order statistics while the PCA-type learns second-order statistics and as such carries out a feature detecting role. This previous discussion gives an idea as to how a single neuron can be modeled, which is important before considering how a model of the self-organizing mechanism in the brain can be developed. For the most part the models developed have been an attempt to model the formation of topology preserving maps in the visual cortex and the first attempt at simulating this was by v o n d e r Malsburg [10]. His aim was to reproduce by simulation the results of measurements made by Hubel and Wiesel in cats and monkeys. These measurements showed selective sensitivity, of neurons in a column perpendicular to the cortical surface, to light bars with a particular orientation, and that neighboring columns tend to respond to stimuli of similar orientation [26-28]. Von der Malsburg's simulations demonstrated a learning scheme for the local ordering of feature-selective cortical cells. This model served as an inspiration for many other models of orientation selectivity in the visual cortex and overviews of these and other models can be found in [29,30]. A model for the formation of retinotopic maps was presented in a more general framework in the Willshaw-von der Malsburg model [31]. The approach was to consider two lattices of neurons, a presynaptic layer and a postsynaptic layer, with each neuron in the presynaptic layer connected to each neuron in the postsynaptic layer by a synaptic weight. Fig. 2 shows the two layers, the presynaptic and
Synaptic Layer
Post
o
~
0 0
0
0 /
0 " Synaptic Layer
Fig. 2. Illustration of Willshaw-von der Malsburg Model, showing two lattices of neurons, represented by circles, and the synaptic connections between one neuron in the presynaptic layer and all neurons in the postsynaptic layer are represented by the lines.
J.A. Flanagan
694
postsynaptic layer, each with a lattice of equal numbers of neurons. Each neuron is represented by a circle and the lines from one neuron in the presynaptic layer to the postsynaptic layer represent the synaptic connections. In the model every neuron in the presynaptic layer has a connection with every neuron in the postsynaptic layer. When subjected to an input signal the neuron weights were adapted using a Hebbian learning law, followed by renormalization. The activity of a neuron was similar to Eq. (3). The exact form of the excitatory and inhibition interactions on the learning rate between neurons in the postsynaptic layer was such, that all neurons in the adjacent neighborhood of a neuron were excitatory (positive weight connection) while those further away were inhibitory (negative weight connection). Fig. 3 illustrates a single function which defines such an interaction, normally referred to as the Mexican hat function. The effect of this excitation/inhibition is to create competition between neurons for activity. The neurons, that respond best to an input, strengthen all their neighbor's responses while decreasing that of neurons further away. After repeatedly stimulating the neurons, clusters begin to form, and the neuron weights become organized. Organization occurs even when all the initial synaptic weight values have been set to approximately the same value. This organization is shown in Fig. 4, where the stimulation is two-dimensional and is randomly chosen from the area bounded by the outer square. The synaptic weights, also two-dimensional, are plotted as points in input space, and lines connect synaptic weights of immediately neighboring neurons in the postsynaptic layer. The fact that these lines do not cross except at the points of the synaptic weight vectors, is interpreted as the weights being organized. The lettered neurons were used as polarity markers, which break the symmetry of the map and ensure that the weights
Fig. 3. One-dimensional Mexican hat function, plotted against lattice distance, used to implement excitatory and inhibitory connections.
Fig. 4. Typical result of the Willshaw-von der Malsburg model for a 6 x 6 lattice. The outer box indicates the square which is the support from which the input samples were drawn. Each synaptic weight is plotted as a point, and the synaptic weight for each neuron is connected to the synaptic weight of each of its immediate neighbors in the postsynaptic layer.

This model explains clearly what is meant by self-organization and topology preserving mappings between neuron layers. The model does, however, exhibit shortcomings in terms of its self-organizing ability. For any stimulus more than one neuron may be highly active, and given that the lateral interactions are not long range, in very large maps with many neurons it is likely that the self-organization will not be global. Another model, by Amari [32,33], is based on the Willshaw-von der Malsburg model but differs in the sense that the neurons are not considered discrete but are continuous in a neuron field. Another difference is that the lateral interconnections are modifiable. Also, the inhibitory feedback in Amari's model extends over the whole network. Analyses of the associated differential equations showed that a well-organized map can be obtained from a roughly organized map. The models presented so far are seen to be limited in some sense. Apart from being computationally quite expensive during simulation, they are not very robust and self-organization is usually local. What these models do suggest, however, is the mechanism which performs self-organization. The first requirement is some form of competition between the neurons. The winning neurons, or the neurons that respond maximally to an input, increase the activity of their neighboring neurons in such a way that their responsiveness to this type of input is increased, while at the same time decreasing the response of neurons further away. In the models discussed so far, this competition and the changing of the synaptic weights take place simultaneously. Kohonen realized that the two mechanisms could work independently and be based on different phenomena (Section 4). This insight led to the development of what was formerly referred to as the Kohonen neural network (KNN) and is now referred to as the self-organizing map (SOM) [34,25].
The next section is devoted to this algorithm because, of all the models of a self-organizing process, the SOM has had by far the most influence on the field of artificial neural networks and information processing. Its self-organizing ability has opened up a whole new area of research and led to many new information processing techniques and applications. It is also explained in what way the algorithm may be used to understand how the self-organizing mechanism in the brain may actually operate. Since its conception in the early 1980s there have appeared over 3700 scientific publications in some way related to the SOM algorithm. A regularly updated bibliography of these publications is available at the following WEB addresses: http://www.cis.hut.fi/nnrc/refs, http://www.icsi.berkeley.edu/jagota/NCS/.

3. The self-organizing map (SOM)

The SOM algorithm describes a mechanism which allows for the formation of globally organized topology preserving maps. Originally presented as a simple numerical algorithm, it soon became clear that the algorithm could be stated in a much more general or abstract form and could be applied in many different settings. First consider a K-dimensional lattice of N neurons, where the position of each neuron j in the lattice is given by a coordinate vector i_j = (i_{j1}, i_{j2}, ..., i_{jK}). Define a metric space (I, d_a) and assume that i_j ∈ I for all j. Associated with each neuron j is a D-dimensional weight vector m_j = (μ_{j1}, μ_{j2}, ..., μ_{jD})^T, where D ≥ K. There is a D-dimensional input x = (ξ_1, ξ_2, ..., ξ_D) which is "presented" to each neuron. Fig. 5 shows an illustration of this structure for K = 2, where the neurons are represented by circles. Define a metric space (X, d), where x ∈ X and m_i ∈ X, 1 ≤ i ≤ N, and d is a measure which satisfies the usual requirements of a distance metric.
Fig. 5. Illustration of the structure of the SOM, with a two-dimensional lattice of neurons represented by circles. The input x is shown connected to each neuron by lines which represent the neuron weights m_i.
The SOM algorithm is carried out in a series of discrete time steps t; at each time step an input signal x(t) is taken and d(x(t), m_i) is evaluated for each i. This step can be interpreted as a measure of the activity of a neuron in response to the input: the smaller the measure for the neuron, the greater its activity. In the SOM algorithm the neuron with the greatest level of activity is called the winner v(t), formally given by
$$v(t) = \arg\min_{1 \le i \le N} d(x(t), m_i(t)). \qquad (12)$$
This is an important difference compared to the self-organizing models discussed in the previous section, where there could be multiple maximally responding neurons. In the SOM there is a single unique winner neuron for each input. This principle of one winning neuron is commonly referred to as winner take all (WTA). The winner neuron is then used to define the change in the values of the neuron weights. The general principle is to change the weight values such that d(x(t), m_i(t+1)) < d(x(t), m_i(t)). If self-organization is to occur, then some weighting of these updates, dependent on the distance d_a(i_{v(t)}, i_j) on the neuron lattice between the winning neuron and the other neurons j, must be used. Referring back to the excitation/inhibition Mexican hat function used in the models of the previous section, a similar type of function h is used in the SOM. However, in this case h acts as a weighting function in the control of the synaptic plasticity during learning. Unlike the Mexican hat function, it does not describe the feedback activity of the signals. The function h is commonly referred to as the neighborhood function. A typical function h is shown in Fig. 6, where the strongest weighting is given to the update of the weights whose neurons are closest to the winning neuron on the lattice. Note that, unlike the Mexican hat function of Fig. 3, the function h is never negative. The update of the neuron weights is formally given as
$$m_i(t+1) = m_i(t) - \alpha(t)\, h(d_a(v(t), i))\, \delta m_i(t), \qquad (13)$$
where α(t) → 0 as t → ∞ is a gain factor and
$$\delta m_i(t) = \varepsilon\, \nabla_{m_i} d(x, m_i), \qquad (14)$$
Fig. 6. Plot of a typical neighborhood function h(d_a(v, i)).
where ε is a constant, sufficiently small to ensure that d(δm_i(t), 0) ≤ d(x(t), m_i(t)) for all t, and where ∇_{m_i} represents the gradient with respect to m_i. Intuitively, it can be seen that the effect of the algorithm is to cluster the weights of neighboring neurons together, which eventually leads to a global self-organization of the weights. To understand what is meant by the self-organized state of the neuron weights, consider an SOM with K = D = 2 and x uniformly distributed on the unit square. The metrics d, d_a in this case are taken to be Euclidean distances. Fig. 7 shows a series of plots of the neuron weight vectors, plotted on the unit square, the support of the input signal. The lines in the plot join the weight vectors of adjacent neurons on the neuron lattice. Fig. 7a shows a random initialization of the neuron weights; after 10 iterations Fig. 7b shows the weights converging to the center of the support. In Fig. 7c, after 100 iterations, the weights are already approaching an organized state. Finally, after 100,000 iterations, Fig. 7d shows the weights in an organized configuration spreading out over the support of the input. In Fig. 7d the meaning of topographic order is quite clear, with none of the lines joining the weight values intersecting. This means that the neuron weights are organized in a similar fashion to the way the neurons are ordered on the lattice. This example represents a very simple case of the SOM. Fig. 8a shows a plot of the neuron weights after 50,000 training iterations for an SOM with a two-dimensional input and a one-dimensional neuron lattice.
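A minimal simulation of the case just described (K = D = 2, input uniform on the unit square, Euclidean metrics) can be written in a few lines. The sketch below is an illustrative toy rather than the exact experiment behind Fig. 7; the lattice size, the Gaussian neighborhood and the fixed schedules for the gain and the neighborhood width are all assumed values.

import numpy as np

rng = np.random.default_rng(1)
side = 10                                   # 10 x 10 neuron lattice (K = 2)
coords = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
m = rng.random((side * side, 2))            # weight vectors m_i in the unit square (D = 2)

n_iter = 100_000
for t in range(n_iter):
    x = rng.random(2)                                        # input uniform on [0,1]^2
    v = np.argmin(np.sum((m - x) ** 2, axis=1))              # winner, Eq. (12)
    alpha = 0.5 * (1.0 - t / n_iter)                         # assumed decreasing gain
    sigma = 1.0 + (side / 2.0 - 1.0) * (1.0 - t / n_iter)    # assumed shrinking width
    d_lat = np.sum((coords - coords[v]) ** 2, axis=1)        # squared lattice distances
    h = np.exp(-d_lat / (2.0 * sigma ** 2))                  # Gaussian neighborhood
    m += alpha * h[:, None] * (x - m)                        # update of the form of Eq. (27)

print(m[:5])   # after training the weights should spread over the unit square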
Fig. 7. Plot of the neuron weights on the square support of the input signal, for a two-dimensional SOM: (a) initial state; (b) after 10 iterations; (c) after 100 iterations; (d) after 100,000 iterations.
Fig. 8. (a) Plot of the neuron weights on the square support of the input signal, for an SOM with a two-dimensional input and a one-dimensional neuron lattice. (b) Two-dimensional SOM with a topological fault.

The curve shown in Fig. 8a approximates a Peano space-filling curve [25], and demonstrates the dimension-reducing ability of the SOM, mapping a two-dimensional space onto a one-dimensional lattice of neurons. Once again in this example the weights have reached an organized configuration, although in this case the organized state is more difficult to define in general terms than it is to understand intuitively. This problem of defining the organized state is discussed in more detail in Section 5. Fig. 8b shows an example of a topological defect for a two-dimensional SOM, where the weights can be considered locally organized but not globally organized, since there is a twist in the distribution of the weights. This effect is sometimes referred to as the butterfly effect. These examples show a second characteristic of the SOM algorithm: not only does it form an organized mapping, it also tends to "spread" itself out over, or regress onto, the probability distribution of the input signal. This characteristic is very important in applications and means that the SOM can be used as a vector quantizer (VQ) [35]. This application is analyzed in more detail in Section 6. For now it is enough to say that in the VQ application the neuron weight vector m_j is used as a codebook vector to quantize any input signal which belongs to its Voronoi tessellation Ω_j, where
$$\Omega_j = \{x : d(x, m_j) < d(x, m_k)\ \ \forall k \ne j\}. \qquad (15)$$
The organization and vector quantization tendencies of the SOM require somewhat contradictory conditions. To arrive at a stage where the SOM is globally organized and forms a good approximation of the input probability distribution requires knowing a few "rules of thumb" gained from experience. These rules are quite robust but not perfect. The most important factor influencing the ability of the SOM to form globally organized mappings is the neighborhood function. A typically used form of the neighborhood function is Gaussian in nature:
$$h(d_a(v,i)) = \begin{cases} \exp\!\left(-\dfrac{d_a^2(v,i)}{\sigma^2}\right) & \text{if } d_a(v,i) \le W,\\[4pt] 0 & \text{otherwise.} \end{cases} \qquad (16)$$
Some known effects of the neighborhood function will be discussed later on, but for now it is enough to say that if the width W of the neighborhood is too small, then the chances of global ordering are decreased. Another important factor in achieving an organized state is that the gain function α(t) does not decrease too rapidly. If its value becomes too small too quickly, then the SOM risks converging to a nonorganized state. On the other hand, the wider the neighborhood function, the stronger its clustering effect, which tends to pull the neuron weights together. If the neuron weights are to spread out to form a true representation of the input space, then the influence of the neighborhood function must be decreased, that is W → 0 as t → ∞. When W = 0 the SOM algorithm reduces to a standard VQ algorithm. Similarly, the gain function α(t) must reach small values to reduce the statistical variations in the value of the neuron weights, allowing them to converge to a state of optimal representation of the input distribution. Thus the two objectives, formation of a topologically ordered map and representation of the probability distribution of the input, require opposing conditions. Normally a compromise is reached, where the training of the SOM is divided into two phases (a sketch of one such schedule is given after Eq. (17)):
(a) The ordering phase, where large W and α are used to allow for topological ordering of the weights.
(b) The convergence phase, where W is decreased towards 0, and α is small and decreases slowly to 0.
This scheme works well in general because, once the neuron weights reach an organized state, there is a strong tendency for them to remain in this organized state, even during phase (b) of training. A semi-empirical rule has been given by Mulier and Cherkassky [36] for the average optimum gain ᾱ(t) as
$$\bar{\alpha}(t) = \frac{A}{t + B}, \qquad (17)$$
for A, B suitably chosen constants.
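As an illustration, the sketch below encodes one possible two-phase schedule of the kind just described: a wide neighborhood and large gain during an ordering phase, followed by a convergence phase in which the width shrinks and the gain decays as A/(t + B). The phase lengths, widths and the constants A and B are assumed values chosen only for the example.

def gain(t, A=100.0, B=200.0):
    # Decreasing gain of the form of Eq. (17).
    return A / (t + B)

def neighborhood_width(t, n_order=10_000, w_start=5.0, w_end=1.0):
    # Wide during the ordering phase, then shrinking towards w_end.
    if t < n_order:                                  # ordering phase: keep the neighborhood wide
        return w_start
    frac = min((t - n_order) / (5 * n_order), 1.0)
    return w_start + frac * (w_end - w_start)        # convergence phase: shrink linearly

for t in (0, 5_000, 10_000, 30_000, 60_000):
    print(t, round(gain(t), 4), round(neighborhood_width(t), 2))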
The idea is that earlier and later input values are taken into account with approximately similar average weighting. This condition satisfies the Robbins-Monro conditions to be discussed in Section 5. The SOM presented here was for a very general case, where the activity of a neuron in response to an input was calculated in terms of a distance between the input vector and the neuron weight vector. There is another method of determining the neuron which responds maximally to the input vector: measuring instead the correlation between the neuron weight vector and the input vector. The winner neuron is defined by
$$v(t) = \arg\max_{1 \le i \le N} x^T(t)\cdot m_i(t), \qquad (18)$$
and the dot-product is used as a measure of the correlation. Using the dot-product, however, means that the input needs to be normalized before being used. From the practical point of view this is a disadvantage, as it requires extra processing. However, it is an advantage from the numerical accuracy point of view, as the dynamic range is limited. Using the dot-product as a measure of activity also means that, to be compatible, the update rule for the neuron weights must be modified to
$$m_i(t+1) = \begin{cases} \dfrac{m_i(t) + \alpha^*(t)x(t)}{\|m_i(t) + \alpha^*(t)x(t)\|} & \text{if } |i - v(t)| < W,\\[6pt] m_i(t) & \text{otherwise,} \end{cases} \qquad (19)$$
where now the gain function satisfies 0 < α*(t) < ∞. While the matching criterion requires a normalization of the input, the dot-product can be computed very quickly in analog or optical systems. It may also have a connection to physiological processes, which is discussed in the next section.
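The dot-product variant of Eqs. (18) and (19) can be sketched as follows. The lattice size, the window width W and the constant gain are assumed values; inputs are normalized before use, as required above.

import numpy as np

rng = np.random.default_rng(2)
N, D, W, alpha = 20, 3, 3, 0.1
m = rng.random((N, D))
m /= np.linalg.norm(m, axis=1, keepdims=True)        # start from unit-length weights

for t in range(5000):
    x = rng.random(D)
    x /= np.linalg.norm(x)                            # normalize the input
    v = int(np.argmax(m @ x))                         # winner by correlation, Eq. (18)
    for i in range(N):
        if abs(i - v) < W:                            # neurons inside the window
            updated = m[i] + alpha * x
            m[i] = updated / np.linalg.norm(updated)  # normalized update, Eq. (19)

print(m.round(3))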
4. Physiological interpretation of the SOM

At this stage the fundamental mechanism of self-organization has been discussed and examples of different attempts to model this process have been given. Finally, the SOM algorithm was presented as an idealized theory of the self-organizing principle, and as an effective practical computing method, with no reference as to how it might be physiologically implemented. In this section an argument by Kohonen [37,38] for the possible existence of a physiological implementation of the idealized SOM algorithm is presented. As already discussed, the SOM algorithm was conceived to achieve what previous, biology-based models of the self-organizing mechanism could not, and that was a global, robust organization of the synaptic strengths or neuron weights. Given that the SOM achieves its goal of global self-organization, the next logical question is: given the SOM, what can this tell us about the physiology of the biological self-organizing mechanism? It has already been stated that the reason for the global self-organizing ability of the SOM is the fact that, compared to other models, there is a single neuron chosen which has maximum response to an input, and that the change of synaptic weights for each neuron depends only on this input and on the physical position of the neurons with respect to this winner. Can this mechanism help in the understanding of the biological self-organizing mechanism, and can it suggest mechanisms of learning in the brain hitherto unknown? To bridge the gap between the idealized model of the SOM and a biological mechanism, Kohonen [37] presented possible means by which the SOM algorithm could be implemented using known physiological components. The first step in the SOM process is to define a winner, or the winner take all (WTA) mechanism. The neuron model used by Kohonen is that of Eq. (4),
$$\frac{d\eta_i}{dt} = I_i - \gamma(\eta_i). \qquad (20)$$
To model the WTA function the neurons must be allowed to interact with each other when an input signal is present. As before, there are two kinds of input to each neuron: an external input and lateral feedback between neurons. The activity due to the inputs is written as
$$I_i = I_i^e + I_i^f, \qquad (21)$$
where I_i^e is due to the external inputs and can be simply described by
$$I_i^e = x^T m_i = \sum_{j=1}^{D} \mu_{ij}\,\xi_j, \qquad (22)$$
and I_i^f is due to the laterally connected neurons and is given by
$$I_i^f = \sum_{j=1}^{N} g_{ij}\,\eta_j. \qquad (23)$$
The coefficients g_{ij} ∈ ℝ are the effective lateral connection strengths of the cells. In Kohonen's analysis these were constrained such that g_{ii} > 0 and have the same value for all i; also, for all i, j with i ≠ j, g_{ij} < 0, |g_{ij}| > |g_{ii}|, and the g_{ij} are mutually equal. In a further study by Kaski and Kohonen [38] it was shown to be possible to implement these lateral interactions with interneurons whose dynamics are also described by Eq. (20). Using this model they have shown that, starting from arbitrary initial positive values of the synaptic weight vector m_i(0) and zero initial activity of all the neurons, the output activity η_v of the neuron for which x^T m_i is maximum (i.e. the winner neuron) converges to an asymptotically high value, whereas the activity η_i, i ≠ v, of all the other neurons converges to zero. This happens in a robust manner for a persistent input. Hence a unique winning neuron is obtained and this model performs a WTA operation. In the case of a biological neural network, however, the neurons must be able to respond to different inputs, and there must be a reset of the neuron activities before the presentation of a new input. In Kohonen's model this reset is carried out by local, slow inhibitory interneurons with output variable φ_i. The dynamic model of Eq. (20) can be used for these interneurons, but for simplification purposes it is written as
$$\frac{d\phi_i}{dt} = a\eta_i - \theta, \qquad (24)$$
where a, θ are scalar constants. This leads to a modification of the dynamic equation of the principal neurons in Eq. (20). Including the decay term φ_i it reads as
$$\frac{d\eta_i}{dt} = I_i - a\phi_i - \gamma(\eta_i). \qquad (25)$$
The result of this system of two coupled differential equations, which describes the dynamics of the neuron activity, is a WTA circuit with an automatic reset function. The system operates in cycles, where each cycle corresponds to a new input and is the equivalent of one iteration in the SOM algorithm. Hence this argument suggests that a biological WTA circuit is not physiologically infeasible. Further discussion on lateral connections and their use in modeling the self-organizing process can be found in [39].
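A toy numerical illustration of such a WTA stage is sketched below: Eqs. (21)-(23) are integrated with a simple Euler scheme for a handful of neurons. The quadratic loss chosen for γ, the values of the lateral coefficients g_{ij} (self-excitation positive, mutual inhibition stronger in magnitude) and the step size are all assumptions; the reset interneurons of Eqs. (24)-(25) could be added in the same way.

import numpy as np

rng = np.random.default_rng(3)
N, D = 5, 4
m = rng.random((N, D))
x = rng.random(D)                       # a persistent input pattern

g_self, g_cross = 0.5, -1.0             # |g_ij| > g_ii for i != j, as required above
G = np.full((N, N), g_cross) + (g_self - g_cross) * np.eye(N)

def gamma(eta):
    return eta ** 2                     # assumed convex loss term gamma(eta)

eta = np.zeros(N)                       # zero initial activities
dt = 0.01
for _ in range(5000):
    I = m @ x + G @ eta                 # external input plus lateral feedback, Eqs. (21)-(23)
    eta = np.maximum(eta + dt * (I - gamma(eta)), 0.0)   # Euler step, activities kept >= 0

print(np.argmax(m @ x), np.round(eta, 3))   # the best-matching neuron should dominate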
The next step in the SOM algorithm, after the selection of the winner neuron, is the adaptive modification of the synaptic weights. Once again Hebb's law is used as the basis for this update. However, as pointed out previously, Hebb's law in its simplest form is not entirely feasible from a biological point of view. The greatest drawback with Hebb's law for synaptic updates is that the output signal of a neuron must be propagated back to the synapses, which, if it is to happen intracellularly, would not be very efficient. Kohonen proposed a modification to Hebb's law such that the change of the synaptic weights is not directly effected by the postsynaptic activity but by another agent which is directly related to the neural activity. What he proposed was a chemical substance, rapidly diffusing extracellularly from the site of neuron activity to the synapses of adjacent neurons. These chemical agents then affect the level of adaptation of the synaptic strengths, in the same way that the neighborhood function is used in the adaptation of the weights in the SOM algorithm. These agents must be propagated extracellularly because only in that way can they reach the synapses quickly. As pointed out by Kohonen, it is necessary that such agents have a short decay time, approximately of the same order of magnitude as one WTA cycle, because they must disappear before a new input becomes effective. The existence of such extracellular agents is not completely out of tune with what is known in neurophysiology, as it has recently been reported that nitric oxide (NO) is produced in proportion to the postsynaptic potential and that it controls synaptic plasticity. It is also present extracellularly and is propagated quite quickly [40]. Although Kohonen never explicitly assumed that the chemical agent was NO, the effect of such a chemical agent can be included in the model for the update of the synaptic weights using the dot-product formulation of the SOM algorithm given in Eq. (19). Using a Taylor series expansion, Eq. (19) can be shown to reduce to
$$m_i(t+1) \approx m_i(t) + h(d_a(i, v(t)))\,[x(t) - m_i(t)\,m_i^T(t)\,x(t)]. \qquad (26)$$
It should be noted that this equation has a tendency to normalize the weight values m_i. This equation can now be compared to the Riccati-type learning law expressed in Eq. (8), which is a biologically inspired model based on Hebb's law. The plasticity parameter P in the Riccati-type learning law corresponds to the factor h(d_a(i, v(t))). It would thus seem that the idealized dot-product SOM rule reduces to a physiologically feasible model for the plasticity of the synapse, using the idea of a neighborhood function implemented by diffuse chemical agents. A "physiological SOM" based on Eq. (25) for the activity of a neuron and Eq. (8) for the synaptic weight update was simulated by Kohonen [25] for a two-dimensional lattice of neurons and a two-dimensional input. The neighborhood function was obtained from the solution of a diffusion equation, which modeled the extracellular diffusion of a chemical substance from the region of the winning neuron. The result was the formation of topographic maps similar to those presented in Fig. 7, suggesting that the SOM algorithm may be biologically feasible. This implementation can be summarized as [37]:
• A very simple nonlinear dynamic model for a neuron.
• A very simple laterally connected network model that implements an effective WTA function.
• The automatic resetting of the WTA function by local integrating inhibitory loops.
• A modified Hebb hypothesis that is physiologically possible and normalizes the synaptic vectors.
• The implementation of the neighborhood function h(d_a(i, v(t))) by diffuse chemical agents.
Despite the fact that it is possible to find a plausible physiological framework which could implement the idealized SOM algorithm, the theory was not meant to be definitive, but to show that the concept of the physiological SOM is plausible. There are some conflicts which need to be solved or explained in another way. For example, in the WTA function it has been assumed that the inhibitory lateral connections connect all neurons. This is not entirely true, as has been shown in [41,42]. However, it may also be possible that there are other inhibition mechanisms, for example a global inhibition carried out by a diffuse chemical inhibitory agent, whose effect is proportional to the sum of all excitations. In terms of the rule for adapting the synaptic strengths, chemical agents were proposed by Kohonen as extracellular agents capable of implementing a neighborhood function. The first problem with chemical agents implementing the neighborhood function is that, for example in the case of NO studied in [43], simulations show the range of diffusion of NO is constant at about 100 μm, which would not be consistent with the shrinking neighborhood of the SOM. A second problem with this model is that it inherently assumes the time scale of the dynamics of diffusion and decay of the chemical agents is of the same order as the time scale of the dynamics of the action potential and synaptic connections. However, the time necessary for known chemical agents to diffuse and decay is of the order of seconds [44], whereas the changes in the action potential and synaptic connections happen over a time scale of the order of tens of milliseconds. Clearly there is at least an order of magnitude which needs to be accounted for. In conclusion, it would seem that even if the idealized SOM algorithm can provide a theoretical starting point for understanding how the self-organizing mechanism operates in a biological system such as the cortex, there still remains plenty of work to determine what the real mechanisms are. The principles of the SOM have also been used in the receptive-field laterally interconnected synergetically self-organizing map (RF-LISSOM) model by Miikkulainen [45], which specifically aims to model self-organizing mechanisms in the visual cortex. In this model the adaptation of lateral connections is explicitly taken into account.
5. Theoretical analyses of topology preserving maps

The analysis of topology preserving maps is still somewhat in its infancy, and is far from being a unified theory. Grossberg has provided detailed analyses of his adaptive resonance theory (ART) [46], which is very much used as a model of self-organization in a biological setting, as well as having practical applications. However, the automatic emergence of topological ordering in the ART has never been fully demonstrated. As mentioned previously, Amari [32] also provided an analysis
of self-organization in his model of nerve fields. The model of self-organization which has received most analysis is perhaps the SOM. Reasons for this include the simplicity of the algorithm and its widespread practical use. For the most part, analyses of the SOM tend to disregard any biological factors which may be relevant. Over the years, variants of the SOM with no biological significance have appeared, because they either allow for an easier theoretical analysis or produce a behavior which is seen to be more advantageous than that produced by the standard SOM. Given that the SOM incorporates the fundamental mechanisms necessary for the formation of topology preserving maps, it would seem that if the information processing principles used in such areas of the brain as the cortex are to be understood, then understanding how the SOM processes information is a very good starting point. It is with this in mind that this section details some of the results coming from more theoretical analyses of the SOM algorithm. Until now the terms "self-organizing maps" and "topology preservation" have not been clearly defined. In the case of a one-dimensional input and one-dimensional neuron lattice it will be shown later that the definition of self-organization is quite trivial. However, for high-dimensional inputs with similar or lower-dimensional neuron lattices, the meanings of "self-organized" or "topology preserving" are much more difficult to define. In fact, the lack of general, workable definitions of these terms has probably acted as a brake on completing a general analysis of the self-organizing process. It must be said that in terms of biological self-organization, the definition of topology preservation could be limited to the two-dimensional case, although it was discussed in Section 1 how in such areas as the visual cortex the mapping is from a four-dimensional input space to a two-dimensional neuron lattice. However, from the general information processing point of view the general dimension case is more interesting. Many measures of topology preservation have been proposed, not all of which are just for the SOM. Some examples include:
(a) The topographic product P [47], which relates the sequencing of the input space neighbors to the output space neighbors, for each neuron.
(b) Spearman's ρ ∈ [-1, 1], which has been used in [48], where ρ = 1 indicates perfect topology preservation (a simple sketch in this spirit is given below).
(c) Zrehen [49] uses a measure based on a geometrical argument.
(d) The Goodhill measure [50] applies a cost functional, which is a function of the products of the distance metrics between neighboring weight vectors and their positions in the neuron lattice.
(e) In [51,52] the definition of topology preservation is based on the continuity of a mathematical mapping between two topological spaces, which are appropriately defined in terms of the SOM.
In what follows, in the discussion of the results of different analyses of the dynamics and behavior of the SOM algorithm, it will be noticeable that these definitions and measures of topology preservation are not used. This suggests that there is some way to go before a complete understanding is achieved of how the dynamics of self-organization and topology preservation are related.
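As a simple illustration of a measure in the spirit of (b), the sketch below rank-correlates the pairwise distances between weight vectors with the corresponding pairwise distances on the neuron lattice; a value near 1 then indicates good topology preservation. This is only one of many possible choices and is not claimed to be the exact measure used in [48]; the lattice layout, the use of all neuron pairs and the assumption of no tied distances are choices made for the example.

import numpy as np

def spearman_topography(m, coords):
    # Spearman rank correlation between weight-space and lattice-space
    # pairwise distances (assumes no exactly tied distances).
    iu = np.triu_indices(len(m), k=1)
    dw = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)[iu]
    dl = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)[iu]
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return np.corrcoef(rank(dw), rank(dl))[0, 1]

# usage: 'm' (N x D weights) and 'coords' (N x K lattice positions) from a trained SOM
coords = np.array([(r, c) for r in range(5) for c in range(5)], dtype=float)
m = coords / 4.0 + 0.01 * np.random.default_rng(4).standard_normal(coords.shape)
print(spearman_topography(m, coords))    # close to 1 for a well-organized map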
The learning process which occurs in the brain is a dynamic process: stimuli are presented and the synaptic weights are adjusted accordingly. Thus all models of learning are dynamical in nature and can be described by time-dependent differential equations. The SOM is no exception, but there are two factors which add to the complexity of its behavior. First, its dynamics are nonlinear in that there is a WTA function. Second, there is a spatial component which is incorporated into the process through the neighborhood function. For linear dynamical systems there exist many well-established, general techniques which allow the dynamics of the system to be understood by simply analyzing its stationary points. For nonlinear dynamical systems this is not true, as the most interesting behavior occurs far from equilibrium. This poses problems, first of all for building models of a self-organizing process and secondly for analyzing them. The SOM algorithm is no exception to this rule. In most cases the form of the SOM algorithm which has been analyzed is the one where the metric d(·) is the Euclidean metric and the update Eq. (13) can be written as
$$m_i(t+1) = m_i(t) + \alpha(t)\, h(d_a(v(t), i))\,(x(t) - m_i(t)). \qquad (27)$$
The time series of stimuli x(t) used to train the map is assumed to be random and i.i.d. (independent and identically distributed), described by a probability density function (pdf) p. The form of Eq. (27) means that the neuron weight vector
$$M(t) = (m_1(t), m_2(t), \ldots, m_N(t))^T \qquad (28)$$
can be treated as a stochastic process [53]. Because its value at time t + 1 depends only on its current value M(t) and the current input x(t), it can be considered as a first-order Markov process [54]. For the purposes of analyzing the SOM algorithm, it is generally viewed as consisting of two phases, the ordering phase and the convergence phase. These two phases are normally analyzed separately and in different ways. In some respects the analysis of the convergence phase is easier than that of the ordering phase, as it is assumed that the weights are already in an organized configuration. It is perhaps interesting to describe the different techniques which have been used to analyze the SOM and then mention what results have been obtained using each technique. Where possible it is explained how the basic SOM algorithm may have been varied to allow for a more complete analysis of a self-organizing process. The three main techniques which have been used with varying degrees of success to analyze the SOM are:
(1) the Markov chain method,
(2) stochastic approximation [55,56],
(3) the Fokker-Planck equation [57].
The Markov chain method can be used as a tool to prove self-organization, by showing that there exist, with positive probability, sets of samples of inputs (x(0), x(1), ...) which take the neuron weights from all initial conditions M(0) to an organized configuration Θ in a finite time. For the one-dimensional SOM with a one-dimensional input (i.e. K = D = 1) the organized configuration is given by
$$\Theta = \{M : m_1 < m_2 < \cdots < m_N\} \cup \{M : m_1 > m_2 > \cdots > m_N\}, \qquad (29)$$
and it is known to be absorbing, which means that once the weights are in this configuration they can never leave it.
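The organized set Θ of Eq. (29) is easy to test for numerically. The sketch below runs a one-dimensional SOM (K = D = 1, uniform input on [0, 1], neighborhood width W = 1, constant gain, in the spirit of the setting analyzed by Cottrell and Fort) and records the first step at which the weights become monotone and whether they ever leave that configuration afterwards; the lattice size and the gain are assumed values.

import numpy as np

def organized(m):
    d = np.diff(m)
    return bool(np.all(d > 0) or np.all(d < 0))       # membership of Theta, Eq. (29)

rng = np.random.default_rng(5)
N, alpha, W = 10, 0.05, 1
m = rng.random(N)

first_ordered, left_after = None, False
for t in range(200_000):
    x = rng.random()
    v = int(np.argmin(np.abs(m - x)))
    lo, hi = max(0, v - W), min(N, v + W + 1)         # winner and its immediate lattice neighbors
    m[lo:hi] += alpha * (x - m[lo:hi])
    if organized(m) and first_ordered is None:
        first_ordered = t
    if first_ordered is not None and not organized(m):
        left_after = True                             # would contradict the absorbing property

print(first_ordered, left_after)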
The implication is that if the sets of samples of positive probability exist, then the Markov chain reaches an absorbing configuration in a finite time and hence the neuron weights organize with probability one. Unfortunately, for any other values of K, D there is no known absorbing configuration for the neuron weights, and it is widely accepted that none exists, which means the Markov chain is irreducible. This in turn means it is only possible to show that the weights can reach an organized configuration in a finite time with positive probability, but it is not possible to state that they reach an organized configuration with probability one. Cottrell and Fort [58] produced the first theoretical proof of self-organization in the SOM, using the Markov chain method. Their proof was for a one-dimensional SOM with a one-dimensional input x (i.e. K = D = 1) uniformly distributed on [0, 1] and for a neighborhood function whose width W = 1 (see Eq. (16)), with a constant gain factor α. There was also a restriction on the initial conditions in that it was assumed m_i(0) ≠ m_j(0), i ≠ j. This initial proof has since been generalized, also using the Markov chain method, for different conditions, but still with K = D = 1. Erwin et al. [59], assuming that the neighborhood function was monotonically decreasing, i.e. that
$$h(d_a(i,j)) < h(d_a(i,k)) \quad \text{for} \quad d_a(i,j) > d_a(i,k), \qquad (30)$$
and with W = N, were able to outline a similar type of proof of organization for a continuous distribution of inputs. Bouton and Pagès [60] extended Cottrell and Fort's proof to the case of a nonuniform input probability distribution. However, there is still the condition that the diffuse component p_c of the probability distribution P of the input x must be such that its support has a nonempty interior. This in effect means that the proof does not cover the case where the support of P is a set of discrete points. Flanagan [61] has taken a different approach to the same problem, where a set of general conditions is stated which the support of P must satisfy along with the neighborhood function. In [61] it is shown that for W = N and for a monotonically decreasing neighborhood function, the neuron weights self-organize, even if the support of P only consists of two or more discrete points. This proof also includes a proof of the case of diffuse P, and by using a monotonically decreasing neighborhood function the proof even applies to the case where m_i(0) = m_j(0), i ≠ j. This problem of m_i(0) = m_j(0), i ≠ j was overcome by Sadeghi [62,63] in a different manner, when he redefined the winner neuron. By redefining the winner he was able to prove, with probability one, self-organization of the neuron weights from all initial conditions and for any decreasing neighborhood function. There is a restriction on the probability distribution P, which must be continuous with respect to the Lebesgue measure, and once again this excludes the case of the support of P being a set of discrete points. Flanagan in [64,65] generalized the proof of [61] for any W and showed that if P is not Lebesgue continuous and its support consists of a set of m discrete points, then sufficient conditions for self-organization to occur require at least that
$$N \le \log_2 m. \qquad (31)$$
Note that the proofs referred to so far all apply to the K = D = 1 case. What about the cases K ≠ 1 and/or D ≠ 1? There are several problems in analyzing the self-organizing process in this situation. The first is that there is no definition of an organized configuration in higher dimensions which can be easily used in the framework of the Markov chain method. Secondly, even if there does exist a well-defined organized configuration, as mentioned earlier, it is probably not an absorbing configuration and hence the Markov chain is not reducible. Flanagan [61] has shown in a particular case with K = D ≥ 1 that, by defining what might be considered an intuitively satisfying organized configuration, the neuron weights will reach this organized configuration in a finite time with positive probability, which may be less than 1. Fort and Pagès [66] have shown in the K = D = 2 case that, for W = 1, the exit time of the weights from this configuration is finite with positive probability. These two analyses of self-organization in the SOM for the multidimensional case using the Markov chain method are the only ones so far, to our knowledge. The ergodicity of the Markov process, as given by the multidimensional SOM, and its convergence to an invariant probability distribution have been analyzed by Sadeghi [63]. Fort and Pagès [66] have described different "strengths" of self-organization for the general SOM, where they consider a map to be organized if and only if the Voronoi tessellations of the closest neighboring neurons are connected. Their definitions are as follows.

Definition 1 (Strong organization). There is a strong organization if there exists a set of organized states S such that:
• S is an absorbing class of the Markov chain M(t).
• The entering time in S is almost surely finite, starting from any random weight vectors.

This definition implies, in the K = D = 1 case, that there is strong organization. The next definition uses ideas from stochastic approximation theory, which will be discussed next.

Definition 2 (Weak organization). There is a weak organization if there exists a set of organized states S such that all the possible attracting equilibrium points of the associated stochastic approximation ODEs belong to the set S.

A second method used in the analysis of stochastic processes, and which has been applied with some success to the analysis of the SOM, is the stochastic approximation method [55,56], which here will also be referred to as the ordinary differential equation (ODE) method. As the name suggests, there is a set of ODEs which are generated from the stochastic process to be analyzed. The general approach of the ODE method is to average, in an appropriate manner, the stochastic recursive equations over all possible input patterns, which results in a set of ODEs. If the process satisfies certain conditions, then the stable stationary points of the ODEs represent the asymptotic limit points of the process. This means that once the ODEs for the stochastic process have been obtained (which is not always a trivial matter),
then analyzing the convergence properties of the stochastic process becomes an exercise in the analysis of a set of deterministic ODEs. In summary, given the stochastic process
$$M(t+1) = M(t) + \alpha(t)\,\delta M(M(t), x(t)), \qquad (32)$$
then a set of ODEs associated with this process is given by
$$\frac{dM}{dt} = f(\delta M), \qquad (33)$$
where the function f is an ensemble average of δM and t is a pseudo-time variable. It has been shown by Ljung [55] that all locally asymptotically stable stationary points M^∞ of the set of ODEs in Eq. (33) are possible stable stationary points of the stochastic process M(t). One of the most useful results is embodied in the Kushner-Clark theorem [56], which states that, given that M(t) is bounded and equicontinuous, and given that the gain function α(t) satisfies the Robbins-Monro conditions
$$\sum_{i=0}^{\infty} \alpha(i) = \infty, \qquad \sum_{i=0}^{\infty} \alpha^2(i) < \infty, \qquad (34)$$
then M(t) → M^∞ as t → ∞, with probability one, given that M(t) visits a compact subset of the basin of attraction of M^∞ infinitely often. It is now interesting to examine the difference between the ODE method and the Markov chain method used previously when it comes to analyzing self-organization. The main difference probably is that the Markov chain method requires a priori knowledge of what one wants to show. For example, to prove self-organization, it is first necessary to know what an organized configuration is. Using the ODE method no such a priori knowledge is required, as it is known that the process can only converge to stable stationary points of the associated ODEs. This would suggest that the definitions for self-organization and topology preservation would result naturally from knowledge of the configuration of the stationary points of the ODEs associated with the SOM. This sort of idea is used in Definition 2 of weak organization. A second difference is that the ODE method deals with the average paths followed by the stochastic process, while the Markov chain method deals with a subset of possible paths of the process. In some sense this means that the ODE method gives a more general picture of how the stochastic process evolves. Finally, the ODE method deals with convergence and the result of eliminating statistical variations by letting the gain function α(t) → 0, t → ∞, whereas for the Markov chain method, at least in the analysis of the self-organization phase of the SOM, the gain function must be assumed to be constant or lower bounded to achieve a result.
No analysis of the self-organization phase of the SOM for α(t) → 0, t → ∞ exists in the context of the Markov chain method. The principle of applying the ODE method is quite straightforward, although its implementation can be quite difficult because the averaging usually involves an integration or summation. For the SOM the ODEs can be written more specifically as
$$\frac{dm_i}{d\tau} = \sum_{j=1}^{N} \int_{\Omega_j} h(d_a(i,j))\,(x - m_i)\,dP(x), \qquad (35)$$
where Ω_j is the Voronoi tessellation of neuron j.
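The right-hand side of Eq. (35) is an average over the input distribution and can be estimated by simple Monte Carlo sampling, which is one way the ODE picture can be explored numerically. The sketch below is an assumed illustration, not taken from the references: the Gaussian form of h, the sampler interface and the sample size are choices made for the example.

import numpy as np

def ode_drift(m, coords, p_sampler, sigma=1.0, n_samples=20_000, rng=None):
    # Monte Carlo estimate of the averaged ODE right-hand side of Eq. (35).
    rng = rng or np.random.default_rng(0)
    drift = np.zeros_like(m)
    for _ in range(n_samples):
        x = p_sampler(rng)
        j = np.argmin(np.sum((m - x) ** 2, axis=1))            # x falls in the Voronoi cell of j
        d_lat = np.sum((coords - coords[j]) ** 2, axis=1)
        h = np.exp(-d_lat / (2 * sigma ** 2))                   # neighborhood weighting
        drift += h[:, None] * (x - m)                           # integrand of Eq. (35)
    return drift / n_samples

coords = np.array([(r, c) for r in range(4) for c in range(4)], dtype=float)
m = np.random.default_rng(6).random((16, 2))
print(ode_drift(m, coords, lambda rng: rng.random(2))[:3])      # drift of the first weights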
For a one-dimensional input, the Voronoi tessellation corresponds to an interval on the line. For a higher-dimensional input it corresponds to a polytope, and the bounding hyperplanes are functions of the weights m_i. For each configuration of the neuron weights there is a different formulation of the ODEs associated with the process. These factors complicate the analysis, but some general results have been obtained for the convergence phase of learning in the SOM. The following theorem, based on results obtained for a K = D = 1 SOM using the ODE method and taken from [67], combines the results of [58,66,68].

Theorem 1. Assume that:
• α(t) ∈ (0, 1) satisfies the inequalities of Eq. (34).
• The neighborhood function is such that h(i + 1) < h(i) for some i < (N - 1)/2.
• The input distribution P has a density p such that p > 0 on ]0, 1[ and ln(p) is strictly concave (or only concave, with lim_{0^+} p and lim_{1^-} p positive).
Then,
• The mean function f(δM) has a unique zero M^∞ in Θ^+.
• The dynamical system dM/dτ = f(δM) is cooperative on Θ^+ (i.e. the nondiagonal elements of ∇f(δM) are positive).
• M^∞ is attracting. So if M(0) ∈ Θ^+, M(t) → M^∞ almost surely.

Here Θ^+ ⊂ Θ is the configuration Θ^+ = {M: m_1 < m_2 < ··· < m_N}. The condition of log-concavity of the distribution includes all the usual (truncated) probability distributions (uniform, exponential, gamma distribution with parameter ≥ 1). The nature of cooperative dynamical systems is discussed by Hirsch [69]. The ODE method has also been applied to the analysis of higher-dimensional SOMs, but the results are more restricted. In [70] the ODEs for an SOM with K = D = d ≥ 1 have been analyzed, where the probability distribution of the inputs is independent in each coordinate and P = P_1 ⊗ P_2 ⊗ ··· ⊗ P_d, with [0, 1] as the support of each P_j. The neighborhood function is also a product function and corresponds, for example, to the eight nearest neighbors when d = 2. The d-dimensional neuron lattice is defined by I = I_1 × I_2 × ··· × I_d, a d-dimensional lattice with I_j = {1, 2, ..., N_j}, 1 ≤ j ≤ d, where N_j is the number of neurons along the jth dimension of the lattice. If, for 1 ≤ j ≤ d, m^∞_{i_j}, 1 ≤ i_j ≤ N_j, is a stationary point for the
one-dimensional case, and M^∞ = (m^∞_{i_j j}, 1 ≤ i_j ≤ N_j, 1 ≤ j ≤ d), then the following was shown in [70].

Theorem 2.
• M^∞ are stationary points of the ODEs in the d-dimensional case.
• For d = 2, if P_1, P_2 have strictly positive densities p_1, p_2 on [0, 1], and if the neighborhood function is strictly decreasing, then M^∞ is not stable if N_1 is large enough and if the ratio N_1/N_2 is large (or small) enough (i.e. N_1 → +∞ and N_1/N_2 → +∞ or 0).
• For d = 2, if P_1, P_2 have strictly positive densities p_1, p_2 on [0, 1], and the neighborhood function is such that W = 0, then M^∞ is stable if N_1, N_2 ≤ 2, and is not stable in any other case.

What in effect this theorem shows is that the ratio of the number of neurons along each dimension of the lattice can affect the stability of the stationary points. From the vector quantization point of view it shows that the product of one-dimensional quantizers does not give the correct vectorial quantization. Using the ODE method, the dimension selection effect found by applying the Fokker-Planck equation (to be discussed later) to the SOM was proved mathematically [70]. Other results have been obtained in the W = 0 case, but these will be treated later in the general context of using the SOM as a vector quantizer. The ODE method, as already stated, is most suited to analyzing the convergence phase of training in the SOM. By modifying the SOM algorithm, Flanagan [71] has shown that it is possible to completely analyze a self-organizing process using the ODE method. Similar algorithms and less complete analyses have been proposed in [72,73]. The algorithm analyzed in [71] uses the exact same structure as the SOM, the only difference being that there is no input signal x. Rather, each neuron i has a constant probability p_i of being chosen as winner. The neuron weights are then updated as
$$m_i(t+1) = m_i(t) + \alpha(t)\,h(d_a(i, v(t)))\,(m_{v(t)}(t) - m_i(t)), \qquad (36)$$
where v(t) is the index of the neuron chosen as winner at time t. Note that the difference between this form of update and that of the standard SOM in Eq. (13) is that the value of the weight of the winner neuron replaces the input of the standard SOM. To avoid complete collapse of the map it is necessary, for example in the K = D = 1 case, that m_1(t) = 0, m_N(t) = 1 for all t. The most important point of this algorithm is that its ODEs are linear, and for the K = D = 1 case they take the form
$$\frac{dm_j}{d\tau} = \sum_{i=1}^{N} p_i\, h(d_a(i,j))\,(m_i - m_j). \qquad (37)$$
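The modified, input-free algorithm of Eq. (36) is easy to simulate directly, as sketched below for the K = D = 1 case with the endpoint weights pinned to 0 and 1 as described above. The uniform winner probabilities p_i, the Gaussian choice for h and the constant gain are assumptions of the example.

import numpy as np

rng = np.random.default_rng(7)
N, alpha, sigma = 10, 0.05, 1.5
m = rng.random(N)
m[0], m[-1] = 0.0, 1.0                        # pin the endpoints to avoid collapse
p = np.full(N, 1.0 / N)                       # constant winner probabilities p_i

for _ in range(50_000):
    v = rng.choice(N, p=p)                    # neuron v chosen as winner, no input x
    h = np.exp(-((np.arange(N) - v) ** 2) / (2 * sigma ** 2))
    m += alpha * h * (m[v] - m)               # update of Eq. (36)
    m[0], m[-1] = 0.0, 1.0

print(np.round(m, 3))                         # the weights should end up monotonically ordered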
For this case, if a unique solution exists, then it has been shown that M^∞, the stable stationary point of the ODEs, is such that m_1 < m_2 < ··· < m_N, which corresponds to an organized configuration for the SOM. This same algorithm has also been analyzed in the K = D ≥ 1 case.
Now some results which come from the averaging of the neuron weight vectors are described, as they provide a useful insight into the self-organizing process. The results to be discussed make use of the averaging principle of the ODE method to obtain ODE expressions for the trajectories of the neuron weights in the SOM. One such study, by Erwin et al. [74], showed how the existence of metastable states, or stationary points of the ODEs not in an organized configuration (e.g. Fig. 8b), of the K = D = 1 SOM depends on the type of neighborhood function. They showed that if the neighborhood function has a convex form, then all stationary points of the ODEs belong to an organized configuration, whereas if the neighborhood function has a concave form, there can be metastable states. Fig. 9 shows an illustration of these two types of convex and concave neighborhood functions. The presence of these metastable states was shown to be important during the training of the SOM: even if they do not prevent global organization of the neuron weights, they can significantly slow down the rate of convergence to an organized configuration. The effect of the presence of metastable states on the trajectories followed by the neuron weights is illustrated in Fig. 10, taken from [75]. The figures show plots of the averaged trajectories of two neuron weights, here denoted x_1, x_2, in a K = D = 1 SOM, with N = 3 and a uniformly distributed input. The average trajectories were obtained by starting the simulation from the same initial condition several times, and then at each time the averages of x_1, x_2 were evaluated. This procedure was carried out for several different initial points. Fig. 10a shows the trajectories converging on two stationary points p, q which correspond to the two different possible organized configurations. Fig. 10b is for the same SOM, except that this time the neighborhood function has been changed to induce (nonorganized) metastable states, which appear on the plot as points r, s, t, v along with the stationary points which correspond to the organized configurations p, q.
Fig. 9. The form of a convex neighborhood function (plotted with + symbols) and of a concave neighborhood function, as functions of the lattice distance d_a(v, i).
Fig. 10. (a) Plot of the average trajectories for weights x_1, x_2 with two organized stationary states p, q. (b) Plot of the average trajectories for weights x_1, x_2 with two organized stationary states p, q, and the four points r, s, t, v corresponding to nonorganized metastable states.

Note also the trajectories from t, r towards p, and from s, v towards q. Based on averages of the weight vectors, another approach has been used to prove self-organization in the SOM. The idea is to find an energy function which describes the average evolution of the neuron weights. If this is possible, then the SOM could be viewed as performing stochastic gradient descent on an energy function. This allows powerful tools from stochastic approximation theory to be used to analyze the convergence properties of the algorithm. Tolat [76] described an energy function for each neuron weight, which is then shown to be minimized under certain constraints in an organized configuration. However, this approach was shown to be an approximation, which is no longer valid in the case of highly disordered maps and a steep neighborhood function. Erwin et al. [59] have shown that, based on the average ODEs of the weights, even in a quite simple case they cannot be the gradient of a general energy function. This is not totally surprising given that the SOM algorithm is heuristic, based on a principle rather than being derived mathematically from any law. Kohonen [25] has shown that the SOM algorithm is quite closely associated with the gradient of an energy function of the form
$$V_N(M) = \sum_{j=1}^{N} \int_{\Omega_j} \sum_{i=1}^{N} h(d_a(j,i))\,\|x - m_i\|^2\, p(x)\,dx. \qquad (38)$$
However, the SOM algorithm only derives exactly from this energy function when the neighborhood function is such that W = 0. In the special case where the probability density function p(x) is discrete valued, the original SOM algorithm can be derived from an energy function [77]. Another approach has been to redefine the standard SOM algorithm so that it can be derived from an energy function, while
still behaving in a similar manner. One such example was given by Heskes [78,79], where local errors e_i were defined for each neuron i as
$$e_i = \frac{1}{2} \sum_{j} h(i,j)\,\|x - m_j\|^2. \qquad (39)$$
The associated energy function is defined as
$$E = \left\langle \min_{i} e_i \right\rangle, \qquad (40)$$
where ⟨·⟩ denotes the average over the input. The derivative of this function with respect to the weight vectors gives the update Eq. (13) for the neuron weights, but there is a need to redefine the winner neuron as
$$v(t) = \arg\min_{1 \le i \le N} e_i. \qquad (41)$$
While it is possible to carry out a thorough analysis of this algorithm, from the implementational point of view the calculation of the winner neuron is more complex than in the standard SOM algorithm. A third technique for analyzing Markov chains, which has been applied to the analysis of the SOM algorithm, is the Fokker-Planck equation. The Fokker-Planck equation [57] is a partial differential equation describing the spatio-temporal evolution of the probability distribution P(M) of a stochastic process M. It usually takes the general form
$$\frac{\partial P(M)}{\partial t} = -\nabla\big(f(M,t)\,P(M)\big) + \sum_{i,j} \frac{\partial^2}{\partial m_i\,\partial m_j}\Big(\tfrac{1}{2}Q_{i,j}\,P(M)\Big). \qquad (42)$$
The function f is associated with the "drift" of the stochastic process, while the function Q_{i,j} is associated with the diffusion component of the stochastic process. If it were possible to write and solve the Fokker-Planck equation for the SOM algorithm, it would probably give the most complete description of what happens during the self-organizing process. It would give a solution for the probability distribution of the neuron weights at every instant in time; we would expect the probability density of the neuron weights to be maximum in organized configurations. To generalize and solve such an equation for the SOM would be very complicated. Once again it is informative to compare an analysis of a stochastic process using the Fokker-Planck equation to an analysis by the Markov chain method or the ODE method. Compared to the Markov chain method, what the Fokker-Planck equation would give is the probability of reaching any configuration from any initial state of the weights, whereas in the Markov chain method only a small set of possible trajectories is analyzed. The ODE method can be seen as an analysis of the drift component of the stochastic process, or the evolution of the average value of the neuron weights in time, which is one part of the Fokker-Planck equation. The Fokker-Planck equation not only uses information on the drift of the stochastic process, but also uses extra information on the diffusion parameter of the stochastic
process. The Fokker-Planck equation has been applied to the analysis of the SOM in one case, by Ritter and Schulten [80]. They examine the behavior of the SOM in the vicinity of stationary points. They also pointed out an effect referred to as the SOM's ability to automatically select feature dimensions. This means that in the case of dimension reduction (i.e. K < D) the weights spread out along hyperplanes of the dimensions of the input space which have the highest variance. This characteristic of the SOM resembles the function of principal component analysis (PCA).
6. Topology preserving maps and vector quantization

Most of the discussion so far has concentrated on the topology preserving and self-organizing ability of the SOM algorithm. However, another important aspect of the algorithm is its ability to form a nonparametric regression, or skeleton structure, of the probability distribution of the input signal. This function corresponds to vector quantization, a technique which is very important in the area of digital signal processing and communication, where compression of information is required. The basic idea is that the instantaneous value of a time signal is represented, or coded, by one of a set of vectors taken from a codebook. In the SOM, the set of N neuron weight vectors constitutes the codebook. Vector quantization necessarily introduces a distortion into the signal, and the codebook is generally designed to minimize some distortion criterion. In this sense it is interesting to analyze the vector quantization function of the SOM algorithm, to better understand what distortion criterion the SOM algorithm optimizes. It is known that the SOM algorithm with no neighborhood function (i.e. W = 0) belongs to a family of vector quantization algorithms based on competitive learning (CL). Given an input signal x, quantized with the best matching codebook vector m_i, one possible measure of the average distortion is given by the potential function V_N(m), defined as
$$V_N(m) = \frac{1}{2} \int \min_{1 \le i \le N} \|x - m_i\|^2\, dP(x). \qquad (43)$$
If the distribution P is continuous and has a density p, then V_N(m) can be written as
$$V_N(m) = \frac{1}{2} \sum_{i=1}^{N} \int_{\Omega_i} \|x - m_i\|^2\, p(x)\,dx, \qquad (44)$$
where once again Ω_i corresponds to the Voronoi tessellation of neuron i. This particular distortion function has been subject to much analysis, as evidenced in [81]. If all the codebook vectors are different, with inputs falling on the Voronoi borders with probability 0, then the previous expression holds and V_N(m) is differentiable with a gradient function ∇V_N(m). The stochastic gradient descent algorithm associated with this potential can be written as
- -
mi(t)),
(45)
716
J.A. Flanagan
where
1~;- {1 0
if X E ~"~i(X),
(46)
otherwise.
Note that these two equations correspond to the equations for the standard SOM algorithm without a neighborhood function. Some of the results obtained from an analysis of the zero neighborhood function in the K = D = 1 case include, the almost surely convergence of M to one of the local minima of VN(m), if there is a finite number of minima [81-83]. If the density function p is positive on ]0,1[ and ln(p) is strictly concave, then there is a unique stationary point of 27Vx(m) and it is stable. Finally it is easily shown in the uniform density case on [0, 1] that the stationary point is given by 2i-1
ml = 2N '
l <~i<,U.
(47)
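As a small numerical illustration (not part of the original text), the following sketch runs the zero-neighborhood competitive learning rule of Eqs. (45)-(46) on the uniform density on [0, 1] and compares the result with the stationary point of Eq. (47). The number of neurons and the learning-rate schedule are illustrative assumptions.

```python
# Competitive learning (SOM with W = 0) on uniform inputs; the codebook should
# approach m_i = (2i - 1)/(2N), the stationary point of Eq. (47).
import numpy as np

rng = np.random.default_rng(0)
N = 5                                   # number of neurons (codebook vectors)
m = np.sort(rng.uniform(0.0, 1.0, N))   # initial weights on [0, 1]

for t in range(200_000):
    x = rng.uniform(0.0, 1.0)           # sample from the uniform input density
    i = np.argmin(np.abs(m - x))        # winner: x falls in the Voronoi cell of m_i
    alpha = 0.5 / (1.0 + 0.001 * t)     # decreasing learning rate
    m[i] += alpha * (x - m[i])          # update of Eq. (45)

print("learned codebook :", np.round(np.sort(m), 3))
print("Eq. (47) solution:", np.round((2 * np.arange(1, N + 1) - 1) / (2 * N), 3))
```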
In the higher-dimensional case (i.e. K, D > 1) the main results can be summarized as follows: if \nabla V_N(m) has finitely many zeros in an organized configuration, and if all these zeros have all their components pairwise distinct, then M(t) converges almost surely to one of these local minima. A point of confusion in the literature on the analysis of the SOM as a vector quantizer has been raised by Cottrell [67]. In a general treatment of vector quantizers, Zador [84] presents an analysis of two different types of vector quantizer, the "optimal quantizer" and the "random quantizer". Associated with a vector quantization of a signal is a discrete probability measure P_N defined for each codebook vector as

P_N = \sum_{i=1}^{N} P(\Omega_i(m^{*}))\, \delta_{m_i^{*}},   (48)

a probability density function with a support consisting of discrete points at the codebook vectors m_i^{*}, weighted by the probability of the Voronoi tessellation associated with codebook vector i. It is known that P_N converges in distribution to the probability measure P as N \to \infty. P_N defines the vector quantization property, and shows how to reconstruct the input distribution from the N codebook vectors after convergence. If F_N and F denote the distribution functions of P_N and P, respectively, then an "optimal quantizer" for a given N is a solution m_N^{*} which minimizes both the distortion measure V_N(m) of Eq. (44) and the quadratic norm \| F - F_N \|^2. These properties have been used by Pagès to numerically compute integrals [82]. The second type of quantizer is the "random quantizer", where the codebook vectors (Y_1, Y_2, \ldots, Y_N) are i.i.d. random variables with a density function g. A measure A(p, g) is defined as

A(p, g) = \lim_{N \to \infty} N^{2/D} E_g\!\left[ \sum_{i=1}^{N} \int_{\Omega_i} \| Y_i - x \|^2 p(x)\, dx \right],   (49)
where E_g is an ensemble average with respect to g. For a random quantizer, given p and g and assuming some weak conditions, A(p, g) is minimized by the density g^{*} with

g^{*}(x) = \frac{p(x)^{\gamma}}{\int p(x')^{\gamma}\, dx'}.   (50)

The exponent \gamma in this case is D/(D+2), the inverse of which is called the magnification factor. Note that as D \to \infty this factor \gamma \to 1. Several studies have been carried out to determine the magnification factor of the SOM. However, some of these studies refer to Zador's result [84], which according to Cottrell [67] is inappropriate, as in fact the SOM with W = 0 is an optimal vector quantizer based on the minimization of the distortion, while Zador's result refers to a random quantizer. The studies carried out on the SOM have defined the magnification factor as the inverse of the quantization density of the input space, or the number of quantization units per unit interval, which is not what Zador's result was derived for. However, a result in the form of Eq. (50) has been obtained by Ritter and Schulten [85], as an expression for the quantization density of the weights, with a value of \gamma = 1/3 in the one-dimensional SOM with W = 0, for a large number of neurons. In the case of W > 0 they derived [86] a limit for \gamma as N \to \infty of

\gamma = \frac{2}{3} - \frac{1}{3\left( W^2 + (W+1)^2 \right)}.   (51)
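Assuming the reconstruction of Eq. (51) above, the following small fragment (not from the original text) evaluates the limiting exponent for a few neighborhood widths; W = 0 recovers the value 1/3 quoted for the zero-neighborhood SOM.

```python
# Magnification exponent gamma of Eq. (51) as a function of the neighborhood width W.
def gamma(W: int) -> float:
    return 2.0 / 3.0 - 1.0 / (3.0 * (W**2 + (W + 1) ** 2))

for W in (0, 1, 2, 5, 20):
    print(W, round(gamma(W), 4))   # gamma(0) = 1/3, tending towards 2/3 as W grows
```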
A similar type of result for a monotonically decreasing neighborhood can be found in [87]. The results of some simulations of the one-dimensional SOM by Kohonen have shown that the weights in the SOM simulations converged to points different from those calculated from the distortion measure in Eq. (44) [88]. For the modified SOM algorithm, where the winner neuron is chosen as in Eq. (41), a value of \gamma = 1/3 has been found [89], which is independent of the value of W. In terms of biological networks the magnification factor, as defined by the inverse of the quantization density, has been interpreted by Kohonen [25] as different parts of the receptive surface, for example of the retina, being mapped at different scales to the brain. In terms of the SOM theory he speculates that this is translated in the brain as the area allocated to the representation of a feature in a brain map being somehow proportional to the statistical frequency of occurrence of that feature in observations. Purely from the information-theoretic point of view, and leaving aside concerns about system robustness, these results on the discretization of the input space suggest that the SOM does not form the most efficient regression of the probability distribution of the input signal. The most efficient quantization of the input space would be one where the probability of the Voronoi tessellation for each codebook vector was 1/N, that is, an equiprobable distribution of the probabilities of the Voronoi tessellations. The information-theoretic entropy, or channel capacity, of the quantizer is given by

I = -\sum_{i=1}^{N} P(\Omega_i) \log_2\!\big( P(\Omega_i) \big),   (52)
which is maximized, and equal to \log_2(N), when P(\Omega_i) = P(\Omega_j) for all i \neq j. A vector quantizer designed to achieve this goal is called a maximum entropy quantizer. A topology preserving map which achieves this goal has been proposed by van Hulle [90], and is called the maximum entropy learning rule (MER). One of the major drawbacks of this algorithm compared to the SOM algorithm is that it is only defined for K = D.
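As an illustration of the entropy criterion of Eq. (52), the following sketch (not part of the original text) estimates the probabilities P(\Omega_i) of the Voronoi cells of a codebook from samples and compares the resulting entropy with the log_2(N) value of a maximum entropy quantizer. The codebook and the Gaussian input data are illustrative assumptions.

```python
# Empirical channel capacity of a quantizer: estimate P(Omega_i) by assigning
# samples to their nearest codebook vector and evaluate Eq. (52).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(10_000, 2))                 # input samples (D = 2)
m = rng.normal(size=(16, 2))                     # N = 16 codebook vectors

# assign every sample to its Voronoi cell (nearest codebook vector)
d = np.linalg.norm(x[:, None, :] - m[None, :, :], axis=-1)
counts = np.bincount(np.argmin(d, axis=1), minlength=len(m))
p = counts / counts.sum()                        # empirical P(Omega_i)

p_nonzero = p[p > 0]
entropy = -np.sum(p_nonzero * np.log2(p_nonzero))
print(f"entropy = {entropy:.3f} bits, upper bound log2(N) = {np.log2(len(m)):.3f}")
```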
7. Variants of the SOM
In this section some modifications that have been made to the standard SOM algorithm are described. Unlike some of the variations discussed in Section 5, which were introduced to facilitate the analysis of a self-organizing or vector quantizing mechanism, the variants discussed here are for the most part intended to improve the performance of the standard SOM in practical situations. A biological justification for any of these variants has not been established. They are presented to further highlight the different factors important in the self-organizing mechanism, and perhaps to raise questions as to how a real biological system could cope with these deficiencies, if they really are deficiencies in terms of neural information processing. Some illustrative examples of variants of the SOM are presented here; a more detailed list and discussion can be found in [25]. The various factors of the self-organizing process as defined in the SOM which could be varied include the type of input data, the neuron lattice, the neuron weights, the definition of the winner neuron, the neighborhood function, and the manner in which the neuron weights are updated. Variations in the input data, the neuron weights and the updating of the neuron weights are discussed in Section 8, as they are considered more than just variations of the SOM algorithm, but rather generalizations or abstractions of the self-organizing mechanism. While there exists no rigorous proof, it seems clear that if, in a standard SOM, the neuron weights are to reach an organized configuration independent of the initial conditions, then the parameters of the SOM should be such that during training every neuron weight is updated at some time. If S is the support of the probability distribution of the input and it is finite and bounded, then, as x(t) ∈ S for all t, and given the nature of the update Eq. (13), the weights can only converge to some set B such that S ⊂ B. While it has not been proved in general, it is conjectured here that in an organized configuration
m_i \in \mathrm{Cvx}(S), \qquad \forall\, 1 \le i \le N,   (53)
where Cvx(S) is the convex hull of S, or the smallest convex set containing S. It is found that in the case where S ≠ Cvx(S) the standard SOM can be inefficient in terms of its vector quantization properties. For example, if S is composed of two separated regions, then training the SOM with such data can leave neuron weights in the inter-region area. Although the neurons may have reached an ordered configuration, from the vector quantization point of view an inter-region neuron i
may be such that P(\Omega_i) = 0, and thus serves no purpose. Such a neuron is commonly referred to as a dead neuron. To avoid this situation occurring in the SOM, the neighborhood relations have been redefined by Kangas et al. [91], not in terms of the position of the neurons on the lattice, but in terms of the relative magnitudes of the differences between the neuron weight vectors m_i. The neighborhood relations are defined along the minimal spanning tree (MST) [92], an algorithm which assigns arcs between the nodes so that all the nodes are connected through single linkages and the total sum of the lengths of the arcs is minimized. In this case the lengths of the arcs are defined as the unweighted Euclidean distances between the neuron weight vectors. The neighborhood of a neuron is then defined by the arcs originating from the neuron. As in the original SOM algorithm, learning starts with a wide neighborhood, and as learning progresses the width of the neighborhood decreases. This process does not lead to spatially ordered mappings, but simulations [91,25] show that it leads to a faster and more stable vector quantization of the input with few if any dead neurons. A similar approach has been described by Martinez and Schulten [93] in what is referred to as the "Neural Gas" algorithm. It requires maintaining a variable C_{ij} ∈ {0, 1} for each pair of neurons i, j, which describes an adaptively changing topological relation between the two neurons. If C_{ij} = 0 then i and j are not topological neighbors; if C_{ij} = 1 then i and j are topological neighbors. For an input x a sequence (i_0, i_1, \ldots, i_{N-1}) can be formed, where
\| x - m_{i_0} \| < \| x - m_{i_1} \| < \cdots < \| x - m_{i_{N-1}} \|.   (54)
In this case i_0 corresponds to the winner neuron in the standard SOM. Each weight is then updated as

m_{i_k}(t+1) = m_{i_k}(t) + \alpha(t)\, e^{-\sigma k}\, (x(t) - m_{i_k}(t))   (55)

for an appropriate constant \sigma. Using \exp(-\sigma k) in this equation means that the neighborhood function is a function of the distance between the neuron weight vectors, as in the MST. However, unlike the MST, the topographic relation between neurons i_0 and j is adapted by changing the value of C_{i_0 j}, which also depends on an aging variable t_{i_0 j}. If C_{i_0 i_1} = 0, then set C_{i_0 i_1} = 1 and t_{i_0 i_1} = 0. If C_{i_0 i_1} = 1, then set t_{i_0 i_1} = 0. The age of all connections to i_0 is increased by setting t_{i_0 j} = t_{i_0 j} + 1 for all j with C_{i_0 j} = 1. If t_{i_0 j} > T then set C_{i_0 j} = 0, where T is some predefined constant. This means that only neurons with the closest neuron weight vectors are considered to be topographic neighbors, and the topographic ordering is given after learning by the variables C_{ij} (see the sketch after the next paragraph). Through simulation it has been found that, in addition to a form of topographical order, a good vector quantization is achieved as well. It must be noted, however, that the algorithm is computationally expensive and requires extra memory compared to the standard SOM. Further variations of the SOM and the neural gas algorithms have been proposed by Fritzke [94], where the number of neurons is increased or decreased according to different requirements as learning progresses. Another variant of the SOM algorithm, intended to speed up the search for the winner neuron, is the tree-structured SOM. This algorithm has a hierarchical
searching scheme that involves several SOMs trained with the same input, organized in a pyramidal structure. The idea is to start with a "top", one-neuron SOM, which is trained and then held fixed. Successively lower and larger SOMs are then trained and held fixed. To locate the winner in a SOM that is under formation, the previous higher fixed levels are used as a search tree. For a given x the winner at a lower level will be found amongst the descendants of the winner neuron in the current level. Fig. 11 shows an illustration of the descendants from one level to the next joined by continuous dark lines; neighbors on the same level are joined by dotted lines. Unlike a standard tree search, the winner neuron is not only searched for amongst the direct descendants of a winner at a higher level, but also among the neighbors of each descendant at the same level.
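Returning to the neural gas variant, the ranking, weight update and connection-aging steps of Eqs. (54) and (55) can be sketched as follows. This is a minimal illustration under assumed parameter values, not the authors' implementation; the handling of the age counters follows the textual description above, and the original algorithm [93] may differ in detail.

```python
# Neural gas: rank all units by distance to the input, update with e^{-sigma*k},
# and maintain the adaptive topology C_ij with aging of connections.
import numpy as np

rng = np.random.default_rng(2)
N, D = 20, 2
m = rng.uniform(size=(N, D))          # neuron weight vectors
C = np.zeros((N, N), dtype=bool)      # C_ij = 1: i and j are topological neighbors
age = np.zeros((N, N), dtype=int)     # ages t_ij of the connections
alpha, sigma, T_max = 0.1, 0.5, 50    # learning rate, decay constant, maximum age

def neural_gas_step(x):
    order = np.argsort(np.linalg.norm(m - x, axis=1))                  # ranking, Eq. (54)
    k = np.arange(N)
    m[order] += alpha * np.exp(-sigma * k)[:, None] * (x - m[order])   # update, Eq. (55)

    i0, i1 = order[0], order[1]
    C[i0, i1] = C[i1, i0] = True      # connect winner and second winner
    age[i0, i1] = age[i1, i0] = 0     # (re)set the age of that connection
    age[i0, C[i0]] += 1               # age all connections of the winner
    age[C[i0], i0] += 1
    too_old = age[i0] > T_max         # remove connections older than T
    C[i0, too_old] = C[too_old, i0] = False

for _ in range(5_000):
    neural_gas_step(rng.uniform(size=D))
print("number of surviving connections:", int(C.sum()) // 2)
```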
8. Generalizing the principle of self-organization

The theoretical analysis of the SOM algorithm has led to a better understanding of the dynamics of self-organization, and gives an insight into the vector quantization capabilities of the SOM. However, the standard SOM algorithm is restricted in the type of information it can be applied to. In this section it is shown that the principle of self-organization can be generalized to include a much broader type of data set. From the biological point of view such a generalization may help in understanding how the brain can process higher-level information such as language. The main restriction on the types of data and neuron weight sets referred to so far is the fact that they are vectors defined on a metric space. Generalizations of the neuron weights and data sets have been made to broaden the concept of self-organization in the adaptive subspace SOM (ASSOM) and the batch SOM applied to symbol strings and to fast evolutionary training. The SOM algorithm, through its self-organized state, allows for the extraction of features from the input signal. However, this extraction, as described so far, is not optimal in that it is not an invariant feature detector. This means that if the source feature is subjected to a form of transformation, such as a rotation or translation in
Fig. 11. The tree structure of SOMs; continuous lines show direct descendants, dotted lines join neighboring neurons on the same level.
space or time, then the SOM would not recognize it as the source feature, or what will also be referred to as the input pattern. In this sense the SOM is not an optimal feature extractor. From the brain-processing point of view it would seem that, for higher levels of abstraction, the ability to perform feature-invariant extraction from sensory signals would be a fundamental requirement. One such feature map based on the SOM has been described by Kohonen [25] and is referred to as the ASSOM. The main idea is to replace the neuron weights with the concept of a "neural unit", which in fact defines a manifold such as a linear subspace. The basis vectors which span this subspace are adaptable, and the measure of similarity between an input pattern and a neural unit is obtained from the projection of the pattern onto the different subspaces associated with each unit. The different stages of the self-organizing mechanism as applied in the ASSOM are described using an example from Kohonen [25]. The example is for a time signal f(t) which is sampled at discrete time intervals t_1, \ldots, t_n, and results in an input vector x(t),

x(t) = \big( f(t - t_1), f(t - t_2), \ldots, f(t - t_n) \big).   (56)
Assume that each node i in the lattice has been initialized by a set of basis vectors b_{ij}, j = 1, 2, \ldots, k, which span the subspace \mathcal{S}_i. The input vector x(t) is then decomposed into two orthogonal components x_i^{\parallel} \in \mathcal{S}_i and x_i^{\perp} \perp \mathcal{S}_i such that x(t) = x_i^{\parallel} + x_i^{\perp}. The winner neuron is then defined as

v(t) = \arg\max_{1 \le i \le N} \| x_i^{\parallel} \|.   (57)
Compared to the definition of the winner in Eq. (12) for the standard SOM, the winner in the ASSOM does not involve a direct distance measure between the input and the neuron unit; rather, the subspace spanned by the sample vectors is compared with the subspace defined by the neural units. There are several ways in which the input vector x(t) can be decomposed into orthogonal components, for example with the Gram-Schmidt process or, as suggested by Kohonen, the novelty filter [95]. The next step is to update the neuron units in the neighborhood of the winner neuron. The update involves a rotation of the current basis vectors towards the input x(t). This is carried out in a similar way to that of Kohonen's learning subspace method [25], and the update of the basis vectors is given explicitly by

b_{ij}(t+1) = \left[ I + \alpha(t)\, x(t) x(t)^{T} \right] b_{ij}(t).   (58)
The samples x(t) should preferably be normalized. In principle the b_{ij} need not be normalized or orthogonalized, but the learning is stabilized if orthonormalization is carried out after every few learning steps. In practice \alpha(t) is also made to depend on the magnitude of the projections, in order to make the rotation angles proportional to the angle between x(t) and \mathcal{S}_i. It is explained in [25] how the ASSOM has been used to generate wavelet filters for time-domain speech signals. However, in this case an extra concept of "representative winner" is used. The idea is that the nodes
learn the general linear combination of consecutive sequences of x(t), which is referred to as an episode E. This in some sense modifies the meaning of the neighborhood used in the standard SOM. This approach of course assumes that the process which generates these sequences during one episode changes very little or not at all. If the episode E consists of the consecutive sequences x(t_1), \ldots, x(t_q), then there are the corresponding projections x_i^{\parallel}(t_1), \ldots, x_i^{\parallel}(t_q) onto the subspace \mathcal{S}_i. The next problem is to decide how the representative winner neuron should be chosen. Kohonen has proposed that a winner neuron v_l(t) be chosen for each sequence l in the episode as in Eq. (57), and then the "representative winner" is chosen as

v(t) = \arg\max_{1 \le l \le q} \| x_{v_l}^{\parallel}(t_l) \|.   (59)
This measure is not unique, and the definition could be based on average projections or on the majority of matches over the episode. The adaptation of the basis vectors is quite similar to that of Eq. (58), but this time all the sequences in the episode are used in the update as

b_{ij}(t+1) = \prod_{t_k \in E} \left[ I + \alpha(t)\, x(t_k) x(t_k)^{T} \right] b_{ij}(t).   (60)
Note that if the ASSOM is to be applied in a practical situation or in a biological framework, it would necessitate a memory function capable of holding all sequences of the episode, so that the sequences used to choose the winner are available for updating the basis vectors. In practical applications this is not so much of a problem, especially if the signal is slowly varying and periodic. The ASSOM algorithm shows how the self-organizing mechanism can be generalized to the case of an invariant feature detector. Another algorithm capable of organizing nonvectorial data is based on the batch map algorithm [96], which is a computational shortcut version of the SOM. As the name suggests, the data are processed in batches, which means that all the data, or a representative set of the data, must be available during training. The inspiration for the batch map SOM algorithm can be found in an analysis of the stationary point of the neuron weight vectors of the standard SOM algorithm. Considering the ODEs for the SOM given in Eq. (35), the stationary point value of m_i is given by

m_i^{*} = \frac{\sum_{j=1}^{N} \int_{\Omega_j} h(d_A(i,j))\, x\, p(x)\, dx}{\sum_{j=1}^{N} \int_{\Omega_j} h(d_A(i,j))\, p(x)\, dx}.   (61)

It should be noted that the right-hand side of this equation is a function of the m_i through the Voronoi tessellations \Omega_j. The steady-state solution of this equation can be found using an iterative process whose generalization is of the form

z_{n+1} = f(z_n)   (62)

and is referred to as a contraction mapping. For Eq. (61), assume that the neighborhood function is such that h(d_A(i,j)) = 1 for d_A(i,j) \le W and is 0 otherwise. The
final solution m^{*} is seen to correspond to the centroid of p(x) over the Voronoi tessellations of the neurons belonging to the neighborhood of neuron i. The form of Eq. (61) inspires the batch algorithm, as there is no incremental update of the neuron weights involved. Rather, the new value of the neuron weight is given as the average value of the inputs over all the Voronoi tessellations in the neighborhood of the neuron. This average value can be calculated directly. The batch map algorithm can be summarized as follows:
1. For the initial values of the neuron weights, take a set of N samples of the learning set.
2. For each neuron i, collect all the input samples which belong to the Voronoi tessellations of each neuron j in the neighborhood of neuron i.
3. The mean of the collection associated with each neuron i is the new value of the weight for neuron i.
4. Repeat from step 2 a sufficient number of times, until the neuron weights no longer change.
It has been shown by Kohonen [97,98] that the batch map principle is also applicable to nonvectorial data, such as symbol strings, as long as some distance measure between the items is defined. In the above algorithm the mean used at step 3 is replaced by the "generalized mean" of the collection. The generalized mean of the data set can be calculated using the distance measure between the strings. For example, the mean is the string m which minimizes
\sum_{j} d^2(x_j, m),   (63)
where d(\cdot) is the measure of distance between two strings. Another area where the self-organizing mechanism of the SOM has been generalized is evolutionary programming [25]. In this case the input is not defined on a metric space and, as such, there is no metric to measure the distance between data samples. However, ordered models can emerge based on a fitness function f. Polani [99] has shown that the topological ordering in the SOM can be used to improve the efficiency of training in genetic algorithms. This idea has been applied in the use of evolutionary training for signal analysis [100,101]. The batch map algorithm can be used in an even more abstract setting than that of clustering of symbol strings; it has also been used in the case of evolutionary learning. Evolutionary learning is associated with genetic algorithms (GA) [102]. The basic idea in applying evolutionary programming to the batch algorithm is that there is no distance measure defined for the input data, but rather a fitness function f, which is used to define the input data associated with each neuron. The neuron models are then updated by using a probabilistic variation of the models to increase the fitness. In the case of evolutionary training in the SOM the winner model v is defined as the one for which the input x maximizes the fitness function,

v(t) = \arg\max_{1 \le i \le N} f\big( x(t), m_i(t) \big),   (64)
in a way similar to the definition of the winner neuron in the standard SOM. The next step is the variation of the models m_i(t) in the neighborhood of the winner model in such a way that

f\big( x(t), m_i(t+1) \big) > f\big( x(t), m_i(t) \big).   (65)
In simple evolutionary learning this increase in fitness is achieved by the replacement of some model parameter by a randomly selected value, or by copying a randomly selected subset of parameters from another model in the same population. More advanced GA techniques could also be introduced into this kind of learning. This form of evolutionary learning is, however, computationally very expensive, and Kohonen [103] has suggested using evolutionary learning with the batch map algorithm to decrease the computational load. He defines it in general terms as follows:
1. Initialize the models m_i, for example by a random choice of their parameter values from a set of possible values.
2. Input a number of x values and list each of them under their respective winner unit, that is, the m_i for which the fitness function f(x, m_i) is maximized.
3. Find a new value of each m_i such that the sum of the fitness functions f(x, m_i), for each x in the list of every neuron in the neighborhood of i, is increased.
4. Repeat from step 2.
Kohonen [103] presents a simple example of using the evolutionary learning batch map. In this section it has been shown that the principles of self-organization expressed in the SOM algorithm can be successfully generalized, first by generalizing the concept of the winner using different measures of similarity between the inputs, which are not necessarily vectorial in nature, and secondly by changing the update rule from an incremental one to an update based on the mean or median of some subset of the data. It will surely be possible to extend these abstractions to even more general situations in the future. From the biological point of view, the generalization of the self-organizing mechanism in the case of the SOM can be paralleled and used as a tool to provide an understanding of how the brain can form abstract feature maps. The formation of abstract feature maps in the brain has been discussed by Kohonen and Hari [104]. Two examples of the formation of abstract feature maps using the SOM are now described. The first refers to the formation of a phonemotopic map and the second to the formation of a word-category map. In the case of phonemotopic maps, first simulated by Kohonen [105,106], continuous Finnish speech was analyzed and the input vectors x were formed by taking the average signal powers in 15 different frequency bands, regularly sampled every 20 ms. The SOM was trained over several tens of thousands of input samples and the model vectors began to represent models of phonemes in an ordered fashion. An illustration of the map obtained is shown in Fig. 12, where the circles denote the neurons, and the symbols denote the phoneme, or its short-term spectrum, to which each is best tuned. While pure phoneme maps have never been found in the human brain, there is indirect evidence of their existence. For example, phonemic
Fig. 12. A phonemotopic map of Finnish phonemes self-organized on the basis of short-time spectra of speech; # means /k/, /p/ or /t/ taken as one broad phonemic class. The pronunciation guide is as follows: /a/ like the 'u' in cut, /ä/ like the 'a' in cat, /e/ like the 'e' in bet, /i/ like the 'i' in bit, /o/ like the 'o' in pot, /u/ like the 'u' in put, and /y/ like the 'u' in duc (French).
[Fig. 13 shows a rectangular map in which the word labels sells, visits, works, buys, speaks, runs, phones, drinks, walks, likes, Mary, Jim, Bob, bread, meat, dog, horse, water, poorly, well, little, often, fast, seldom and slowly occupy separate, manually outlined regions.]
Fig. 13. Self-organization of words in the SOM based on their co-occurrence with previous words in the text. The round symbols of neuronal elements have been left out and the symbols are written in the best matching locations (rectangular grid). The curves that separate the word classes have been drawn manually, but similar classes can also be found by automated clustering analysis. Adapted with permission from [107].
representations in children are modified early on by exposure to the native language, which agrees with the result from the simulation that the map is organized with respect to the sensory input it receives. Also, magnetoencephalographic evoked-response studies indicate that the auditory cortex responds differentially to stimuli exceeding the vowel-category boundaries, and that these boundaries depend on the language exposure of the subject. The formation of word-category maps in the SOM was described in [107], which demonstrated that it is possible to identify semantic aspects of words from texts in an unsupervised way. Words in written text are symbolic items that were converted into numerical input vectors x as follows: each word in the text was replaced by a random vector r_i consisting of seven random components, where i is the word position in the text. The random vectors were unique but randomly assigned for each word in the vocabulary. This randomness ensures that the results do not depend on the appearance of the words. To take into account the co-occurrences of neighboring words, the input to the SOM consisted of a pair of code vectors of adjacent words of the type
x = (r_{i-1}, r_i).   (66)
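A minimal sketch (not from the original text) of this coding scheme is given below: each distinct word is assigned a fixed random seven-component code r_i, and each training vector concatenates the codes of a word and its predecessor as in Eq. (66). The toy text is an illustrative assumption.

```python
# Random word codes and context pairs x = (r_{i-1}, r_i) for training the SOM.
import numpy as np

rng = np.random.default_rng(3)
text = "Mary likes meat Jim speaks well Bob buys bread dog drinks water".split()

codes = {w: rng.normal(size=7) for w in set(text)}        # unique random code per word
X = np.array([np.concatenate((codes[text[i - 1]], codes[text[i]]))
              for i in range(1, len(text))])              # pairs x = (r_{i-1}, r_i)
print(X.shape)   # (number of word pairs, 14)
```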
The SOM was then trained using the x vectors from the complete text. When training was completed, each neuron was labeled with the word whose x was the best matching input for that neuron model. Fig. 13 shows the resulting semantic map. It is noticeable that there is a segregation of the words into word classes, and a fine structure that distinguishes, for example, the animate objects from the inanimate ones. Such experiments show that, using even the simple ideas of self-organization expressed in the SOM algorithm, it is possible to form ordered mappings of more abstract data sets.

Abbreviations

ANN, artificial neural network
ASSOM, adaptive subspace SOM
KNN, Kohonen neural network
MST, minimal spanning tree
ODE, ordinary differential equation
PCA, principal component analysis
SOM, self-organizing map
RF-LISSOM, receptive field laterally interconnected synergetically self-organizing map
VQ, vector quantization
WTA, winner-take-all
References

1. Hart, W., ed. (1992) Adler's Physiology of the Eye, 9th Edn. Mosby Year Book.
2. Daniel, P. and Whitteridge, D. (1961) J. Physiol. (Lond.) 159, 203-221.
3. Gattas, R., Sousa, A. and Gross, C. (1988) J. Neuroscience 8, 1831-1845.
4. Hope, R., Hammond, B. and Gaze, R. (1976) Proc. R. Soc. Lond. B194, 447-466.
5. Gaze, R. and Keating, M. (1972 June) Nature 237, 375-378.
6. Easter, S., Burrill, J., Marcus, R., Ross, L. and Taylor, J. (1994) Prog. Brain Res. 102, 79-93.
7. Sperry, R.W. (1963) Proc. Nat. Acad. Sci. USA 50, 703-710.
8. Gottlieb, D. and Glaser, L. (1980) Annu. Rev. Neurosci. 3, 303-318.
9. Olton, D.S. (1977 June) Scient. Ameri. 236(6), 82-98.
10. von der Malsburg, C. (1973) Kybernetik 14, 85-100.
11. Nicolis, G. and Prigogine, I. (1989) Exploring Complexity. W.H. Freeman, New York.
12. Durbin, R. and Mitchison, G. (1990 February) Nature 343, 644-647.
13. Hubel, D. and Wiesel, T. (1969 February) Nature 221(5182), 747-750.
14. Blasdel, G. and Salama, G. (1986) Nature 321, 579-585.
15. Hodgkin, A. and Huxley, A. (1952) J. Physiol. 117, 500-544.
16. McCulloch, W. and Pitts, W. (1943) Bull. Math. Biophys. 9, 127-147.
17. Grossberg, S. (1988) Neural Networks 1, 17-61.
18. Kohonen, T. (1988) Neural Networks 1, 3-16.
19. Hebb, D.O. (1949) The Organization of Behavior, Chapter 4. Wiley, New York.
20. Grossberg, S. (1968) Proc. Nat. Acad. Sci. USA 59, 368-372.
21. Kohonen, T. (1972) IEEE Trans. Comput. C-22, 701-702.
22. Anderson, J. (1972) Math. Biosci. 14, 197-220.
23. Nakano, N. (1972) IEEE Trans. Syst. Man Cybern. SMC-2, 381-388.
24. Oja, E. (1982) J. Math. Biol. 15, 267-273.
25. Kohonen, T. (1995) Self-Organizing Maps, 2nd extended Edn., 1997. Springer, Berlin, Heidelberg.
26. Hubel, D. and Wiesel, T. (1962) J. Physiol. (London) 160, 106-154.
27. Hubel, D. and Wiesel, T. (1963) J. Neurophysiol. 26, 994-1002.
28. Hubel, D. and Wiesel, T. (1968) J. Physiol. (London) 195, 215-243.
29. Erwin, E., Obermayer, K. and Schulten, K. (1995) Neural Comput., pp. 425-468.
30. Swindale, N.V. (1996 May) Network: Comput. Neural Syst. 7(2), 161-247.
31. Willshaw, D. and von der Malsburg, C. (1976) Proc. R. Soc. Lond. B194, 431-445.
32. Amari, S.I. (1980) Bull. Math. Biol. 42, 339-364.
33. Amari, S.I. (1983 September/October) IEEE Trans. Syst. Man Cybern. SMC-13(5), 741-748.
34. Kohonen, T. (1990 September) Proc. IEEE 78(9), 1464-1480.
35. Gersho, A. and Gray, R. (1991) Vector Quantization and Signal Compression. Kluwer Academic Publishers, Boston, Dordrecht, London.
36. Mulier, F. and Cherkassky, V. (1994) in: Proceedings of the International Conference on Pattern Recognition, Vol. II, pp. 224-228. IEEE Service Center, Piscataway, NJ.
37. Kohonen, T. (1993) Neural Networks 6, 895-905.
38. Kaski, S. and Kohonen, T. (1994) Neural Networks 7(6/7), 973-984.
39. Morasso, P., Sanguineti, V. and Frisone, F. (1999) in: Kohonen Maps, eds E. Oja and S. Kaski, pp. 267-278. Elsevier, Amsterdam.
40. Fazeli, M. (1992) Trends Neurosci. 15, 115-117.
41. Gilbert, C.D. and Wiesel, T. (1983) J. Neurosci. 3, 1116-1133.
42. Kisvárday, Z. and Eysel, U. (1992) Neuroscience 46, 275-286.
43. Wood, J. and Garthwaite, J. (1994) Neuropharmacology 33, 1235-1244.
44. Ford, P., Wink, D. and Stanbury, D. (1993) FEBS Lett. 326, 1-3.
45. Miikkulainen, R., Bednar, J.A., Choe, Y. and Sirosh, J. (1999) in: Kohonen Maps, eds E. Oja and S. Kaski, pp. 243-252. Elsevier, Amsterdam, Lausanne, New York.
46. Grossberg, S. (1976) Biol. Cybern. 23, 187-202.
47. Bauer, H.U. and Pawelzik, K. (1992) IEEE Trans. Neural Networks 3(4), 570-579.
48. Bezdek, J. and Pal, N. (1995) Pattern Recognition 28, 381-391.
49. Zrehen, S. (1993) in: Proc. ICANN93, pp. 609-612.
50. Goodhill, G., Finch, S. and Sejnowsky, T. (1995) in: Proceedings of the Second Joint International Symposium on Neural Computation, Vol. 5, pp. 191-202. La Jolla, CA.
51. Villmann, T. (1999) in: Kohonen Maps, eds E. Oja and S. Kaski, pp. 279-292. Elsevier, Amsterdam, Lausanne, New York.
52. Villmann, T., Der, R., Herrmann, M. and Martinez, T. (1997) IEEE Trans. Neural Networks 8(2), 256-266.
53. Parzen, E. (1962) Stochastic Processes. Holden-Day, San Francisco, London, Amsterdam.
54. Gardiner, C. (1985) Stochastic Methods, 2nd Edn. Springer, New York, Berlin.
55. Ljung, L. (1977 August) IEEE Trans. Autom. Control AC-22(4), 551-575.
56. Kushner, H. and Clark, D. (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences, Vol. 26. Springer, Berlin.
57. Risken, H. (1984) The Fokker-Planck Equation. Springer, New York, Berlin.
58. Cottrell, M. and Fort, J.C. (1987) Ann. Inst. Henri Poincaré 23(1), 1-20 (in French).
59. Erwin, E., Obermayer, K. and Schulten, K. (1992) Biol. Cybern. 67(1), 47-55.
60. Bouton, C. and Pagès, G. (1993) Stochastic Proc. Appl. 47, 249-274.
61. Flanagan, J.A. (1996) Neural Networks 9, 1185-1197.
62. Sadeghi, A. (1998) in: Proceedings ESANN98, pp. 173-178. D Facto, Brussels.
63. Sadeghi, A. (1999) A mathematical study of self-organizing neural networks. Ph.D. thesis, University of Kaiserslautern, D386.
64. Flanagan, J.A. (1999) in: ICANN99, pp. 156-161.
65. Flanagan, J.A. (2000) Self-organisation in the SOM with a finite number of possible inputs, pp. 261-266.
66. Fort, J.C. and Pagès, G. (1995) Neural Networks 9(5), 773-785.
67. Cottrell, M., Fort, J.C. and Pagès, G. (1997) in: Proceedings of the WSOM97, pp. 246-267. Espoo, Finland.
68. Benaim, M., Fort, J.C. and Pagès, G. (1998) Adv. Appl. Prob. 30, 850-869.
69. Hirsch, M. (1985) SIAM J. Math. Anal. 16, 423-439.
70. Fort, J.C. and Pagès, G. (1995) Ann. Appl. Probab. 5(4), 1177-1216.
71. Flanagan, J.A. (1997 July) Neural Networks 10(5), 875-883.
72. Cottrell, M. and Fort, J.C. (1986) Biol. Cybern. 53, 405-411.
73. Yang, H. and Dillon, T. (1992) Neural Networks 5, 485-493.
74. Erwin, E., Obermayer, K. and Schulten, K. (1992) Biol. Cybern. 67(1), 35-45.
75. Flanagan, J.A. (1994) Self-organising neural networks. Ph.D. thesis, Swiss Federal Institute of Technology, Lausanne (EPFL).
76. Tolat, V. (1990) Biol. Cybern. 64, 155-164.
77. Ritter, H., Martinez, T. and Schulten, K. (1992) Neural Computation and Self-organizing Maps: An Introduction. Addison-Wesley, Reading, MA.
78. Heskes, T. and Kappen, B. (1993) in: ICNN IEEE, Vol. 3, pp. 1219-1223. New York.
79. Heskes, T. (1999) in: Kohonen Maps, eds E. Oja and S. Kaski, pp. 303-315. Elsevier, Amsterdam, Lausanne, New York.
80. Ritter, H. and Schulten, K. (1988) Biol. Cybern. 60, 59-71.
81. Lloyd, S. (1982) IEEE Trans. Infor. Theory IT-28(2).
82. Pagès, G. (1993) in: Proceedings of the ESANN 93, pp. 221-228. Quorum, Brussels.
83. Lamberton, D. and Pagès, G. (1996) in: Proceedings of the ESANN 96, ed. M. Verleysen. Editions De Facto, Bruges.
84. Zador, P. (1982 March) IEEE Trans. Infor. Theory IT-28(2), 139-149.
85. Ritter, H. and Schulten, K. (1986) Biol. Cybern. 54, 99-106.
86. Ritter, H. and Schulten, K. (1991) IEEE Trans. Neural Networks 2, 173-175.
87. Dersch, D. and Tavan, P. (1995) IEEE Trans. Neural Networks 6(1), 230-236.
88. Kohonen, T. (1999) Neural Comput. 11, 2171-2185.
89. Luttrell, S. (1991) IEEE Trans. Neural Networks 2, 427-436.
90. van Hulle, M. (1997) Neural Comput. 9, 595-606.
91. Kangas, J. (1990) IEEE Trans. Neural Networks 1(1), 93-99.
92. Sedgewick, R. (1983) Algorithms. Addison-Wesley, Reading, MA.
93. Martinez, T. and Schulten, K. (1991) in: Artificial Neural Networks, eds T. Kohonen, K. Mäkisara, O. Simula and J. Kangas, pp. 397-402. Elsevier, North-Holland.
94. Fritzke, B. (1999) in: Kohonen Maps, eds E. Oja and S. Kaski, pp. 131-144. Elsevier, Amsterdam, Lausanne, New York.
95. Kohonen, T. and Oja, E. (1976) Biol. Cybern. 21, 85-95.
96. Kohonen, T. (1992) in: SYNAPSE'92 Symposium on Neural Networks; Alliances and Perspectives in Senri. Osaka, Japan.
97. Kohonen, T. (1996) Self-organizing maps of symbol strings. Technical Report A42, Helsinki University of Technology.
98. Kohonen, T. and Somervuo, P. (1997) in: Proceedings of the WSOM'97, pp. 2-7. Espoo, Finland.
99. Polani, D. and Uthmann, T. (1992) in: Parallel Problem Solving from Nature, pp. 421-429. Elsevier, Amsterdam.
100. Hyötyniemi, H., Nissinen, A. and Koivo, H. (1997) in: Proceedings of the Third Nordic Workshop on Genetic Algorithms and their Applications, pp. 135-152. Helsinki, Finland.
101. Nissinen, A. and Hyötyniemi, H. (1998) in: Proceedings of the WCCI'98. Alaska.
102. Mitchell, M. (1996) An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA.
103. Kohonen, T. (1999) Neural Process. Lett. 9, 153-162.
104. Kohonen, T. and Hari, R. (1999 March) Trends Neurosci. 22(3).
105. Kohonen, T. (1982) in: Sixth International Conference on Pattern Recognition, pp. 114-128. Munich, Germany.
106. Kohonen, T., Mäkisara, K. and Saramäki, T. (1984) in: Proceedings of the Seventh International Conference on Pattern Recognition, pp. 182-185. Montreal, Canada.
107. Ritter, H. and Kohonen, T. (1989) Biol. Cybern. 61, 241-254.
CHAPTER 17

Geometry of Neural Networks: Natural Gradient for Learning

K. FUKUMIZU
The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 106-8569 Japan
Contents
1. Introduction
2. Neural networks as parametric estimation
   2.1. Functions of multilayer networks
   2.2. Probabilities of neural networks
   2.3. Learning in neural networks
   2.4. Examples
3. Riemannian metric, information geometry, and natural gradient
   3.1. Statistical asymptotic theory
   3.2. Riemannian metric on a statistical manifold
   3.3. Natural gradient - gradient of a function on a Riemannian manifold
4. Natural gradient of neural network models
   4.1. Natural gradient learning for various models
   4.2. Approximation by empirical data
   4.3. Natural gradient and Newton method
   4.4. Natural gradient and Gauss-Newton method
   4.5. Adaptive natural gradient learning
5. Singularity of multilayer neural networks
   5.1. Smaller networks embedded in a larger model
   5.2. Singularities of the Riemannian structure
   5.3. Topics related to the singularity of Riemannian structure
6. Natural gradient learning in multilayer networks
   6.1. Saddle points and plateaus
   6.2. Natural gradient escapes plateaus
7. Conclusion
Abbreviations
Acknowledgements
Appendix A. Dual vector space
Appendix B. Proof of Theorem 2
Appendix C. Proof of Theorem 3
Appendix D. Approximation of the Fisher information around the critical point
References
1. Introduction
One of the important characteristics of neural systems is their capability for learning. A biological neural system is a network of a large number of neurons. The basic structure of a neuron includes dendrites that receive the input signal from other neurons, a cell body that integrates the signals and generates a response, and a branching axon that conveys information to other neurons. The point of contact between an axon and a dendrite is known as a synapse. Learning in a neural system involves plastic changes in the effectiveness of synaptic connections (see also chapters by Segev and Meunier, and by Gerstner in this book). Artificial neural networks are defined based on a simplification of this biological structure. In artificial neural networks, the processing units are called neurons or units by analogy. They are linked by weighted connections, which represent the effectiveness of synaptic connection. A neuron integrates the signals from other neurons, and generates a response value according to some nonlinear input-output relation. Learning in an artificial neural network is achieved by adjustment of the connection weights. A neural network can realize various functions according to the choice of the weights. From the viewpoint of machine learning, an artificial neural network is a parametric model for estimation and approximation. The connection weights are the variable parameters of a machine. By choosing a specific connection topology and optimizing the connection weights, one can obtain a network that approximates the relation between input and output vectors, or the probability of a random variable. The desired relation or probability is usually represented by a set of training data, and the criterion of learning is expressed by a loss function defined with the training data. Learning of an artificial neural network is a process of finding the optimum parameter that attains the minimum value of the loss function. To obtain a desirable network, the weight parameters are updated according to a learning rule. Although artificial neural networks are much simpler than real neural systems, they show diverse and complex behaviors, which are of interest to many researchers, but sometimes difficult to analyze because of their nonlinearity and parallel computation. In this chapter, the geometric structure of multilayer networks and the dynamics of learning caused by their structure are discussed. A multilayer neural network has several layers, including the input and output layer, and connections between the neighboring layers. It realizes a function from the input vector to the output vector. Famous examples are multilayer perceptrons [1] and the radial basis functions [2]. Consider now all of the functions from the space of input vectors to the space of output vectors. They form a functional space, which is infinite dimensional in general. In the space of all the functions, a model of multilayer neural networks
forms a finite-dimensional subspace, which is parameterized by a finite number of connection weights. The geometry of this space is not so simple. It has singular subsets caused by the layered structure, which is often seen also in a biological system. This singular geometry explains interesting aspects of learning, like local minima and plateaus, in multilayer neural networks. The geometry of multilayer networks will be the topic of Section 5. As an effective method for optimization, the natural gradient was introduced in [3]. Its major applications are multilayer neural networks [4,5] and blind source separation/deconvolution [6,7]. The natural gradient is a very general concept, which can be introduced in optimization in a Riemannian space. However, this chapter focuses only on the natural gradient in parametric estimation with a statistical model, in particular, learning in neural networks. The general explanation of the natural gradient in parametric estimation will be given in Section 3, and the application to multilayer neural networks will be explained in Section 4. One of the main topics of this chapter is why the natural gradient works well in multilayer neural networks. Section 6 will describe it in connection with the singular geometric structure of neural networks.

2. Neural networks as parametric estimation

2.1. Functions of multilayer networks
In many artificial neural network models, a neuron or a unit processes the input signals from other neurons and generates a response according to

\varphi\!\left( \sum_{l=1}^{L} w_l x_l + h \right),   (1)

where x_1, \ldots, x_L are real-valued input signals from the other L neurons, w_1, \ldots, w_L and h are the weight and the bias parameters, respectively, and \varphi(t) is a function defined on the real line. The input-output relation \varphi(t) is sometimes called the activation function. In many cases, \varphi(t) is a bounded, monotonically increasing function. A typical example is the sigmoidal function (Fig. 1) defined by

\varphi(t) = \frac{1}{1 + e^{-t}}.   (2)
The weights w_1, \ldots, w_L represent the effectiveness of the connections between the neurons, and the bias parameter h specifies how easily the neuron is activated. However, the bias h is omitted in this chapter for simplicity. This assumption is not restrictive, because the bias can be replaced by a weight parameter by assuming a constant input x_{L+1} = 1. The variable parameter within a single neuron is the weight vector w = (w_1, \ldots, w_L)^T, where T denotes the transpose. For a general expression of a neuron, the function with a parameter w,

\varphi(x; w) = \varphi(w^T x),   (3)

is used hereafter.
Fig. 1. Artificial neuron model (left), and the sigmoidal function (right).
Fig. 2. Multilayer neural network.
A multilayer neural network is defined as a system that consists of neurons of the above type and connections with a layered structure. There is an input layer, which receives an input signal from outside the system, and an output layer, which gives an output signal from the system. There are also hidden layers between the input and output layer. A unit in each layer receives signals only from the units in the previous layer. The number of hidden layers is arbitrary in general, and not all of the units in the previous layer are necessarily connected to a unit in the next layer. However, this chapter focuses on the model with only one hidden layer and full connections with the units in the previous layer (Fig. 2). This type of model is called a three-layer neural network model¹ (or sometimes multilayer neural network model, if confusion is avoided). More precisely, if the hidden layer has H units, the model is defined by

o_i = \varphi\!\left( \sum_{j=1}^{H} v_{ij}\, \varphi(x; w_j) \right) \qquad (1 \le i \le M),   (4)

where x = (x_1, \ldots, x_L)^T is an input vector given to the input layer, w_j = (w_{j1}, \ldots, w_{jL}) the weight vector of the connection from the input units to the jth hidden unit, v_j = (v_{1j}, \ldots, v_{Mj})^T the weights of the connection from the jth hidden unit to the units in the output layer, and \theta = (v_1^T, \ldots, v_H^T, w_1^T, \ldots, w_H^T)^T summarizes all the parameters of the network. The space of all the parameters for networks with H hidden units is denoted by \Theta_H; it is an H(L+M)-dimensional space.

¹ This model is sometimes called two-layer. This chapter calls it three-layer following the terminology of [1].
In many cases in this chapter, the nonlinear function \varphi(t) is not used for the output units. Although the nonlinearity is natural for a model of neural networks, it just causes a scalar transformation of the output space if \varphi(t) is monotonic. Theoretically, this nonlinearity can be omitted by converting the output space. The new definition of the multilayer neural network model is given by

f(x; \theta) = \sum_{j=1}^{H} v_j\, \varphi(x; w_j).   (5)
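As a concrete illustration (not part of the original text), the following sketch implements the model of Eqs. (2)-(5) with H sigmoidal hidden units and linear output units; the array shapes and the random initialization are illustrative assumptions.

```python
# Three-layer network f(x; theta) = sum_j v_j * phi(w_j^T x), cf. Eq. (5).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))          # Eq. (2)

def forward(x, V, W):
    """x: (L,) input, W: (H, L) input-to-hidden weights, V: (M, H) hidden-to-output."""
    hidden = sigmoid(W @ x)                   # phi(x; w_j) for j = 1..H
    return V @ hidden                         # (M,) output vector

rng = np.random.default_rng(0)
L, H, M = 3, 4, 2
W = rng.normal(size=(H, L))
V = rng.normal(size=(M, H))
print(forward(rng.normal(size=L), V, W))
```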
This sometimes makes a theoretical treatment easier. As Eq. (4) or (5) shows, a parameter \theta defines a function from the L-dimensional input vector x to the M-dimensional output vector. Hereafter, the spaces of possible input vectors and output vectors are denoted by \mathcal{X} and \mathcal{Y}, respectively. All of the functions defined by the multilayer network model span a finite-dimensional space in the infinite-dimensional functional space of all the functions from \mathcal{X} to \mathcal{Y} (Fig. 3). The geometric structure of the space of the neural networks is very important in learning. This topic will be explained in Sections 5 and 6. The purpose of learning in multilayer neural networks is to construct a desired function by optimizing \theta. In many cases, the desired relation between the input vector and the output vector is given by a set of examples \{(x^{(\nu)}, y^{(\nu)}) \mid \nu = 1, \ldots, N\}, which are called training data. Given a loss function \ell(y, z), which is a two-variable function evaluating the difference between y and z, learning is a process that minimizes the empirical loss function

\ell_{\mathrm{emp}}(\theta) = \sum_{\nu=1}^{N} \ell\big( y^{(\nu)}, f(x^{(\nu)}; \theta) \big).   (6)
Fig. 3. Space of the functions given by the neural network model.
A popular choice of the loss function \ell(y, z) is the square error function

\ell(y, z) = \frac{1}{2} \| y - z \|^2,   (7)
which gives the least mean square criterion for learning.

2.2. Probabilities of neural networks

In many problems, the output of the training data includes stochastic factors, which reflect noise of observation or ambiguity of pattern classification. What is expressed in training data is not a deterministic relation, but a probabilistic rule from x to y. More precisely, the true conditional probability density p(y|x) is assumed, and the output y^{(\nu)} in the training data is a sample from p(y|x) given an input vector x^{(\nu)}. It is natural to formulate a neural network model as a probabilistic machine. Let f(x; \theta) be a neural network, and r(y|f(x; \theta)) be a conditional probability density that specifies the statistical model of a probabilistic output y given a deterministic output f(x; \theta) of a network. The statistical model of neural networks \{p(y|x; \theta)\} is defined by a parametric family of conditional probabilities

p(y|x; \theta) = r\big( y \,|\, f(x; \theta) \big).   (8)
As the simplest example, consider Gaussian noise in the output data. Given a multilayer neural network model \{f(x; \theta) \mid \theta \in \Theta\}, the underlying statistical model is a family of conditional probability density functions \{p(y|x; \theta) \mid \theta \in \Theta\} defined by

p(y|x; \theta) = \frac{1}{(2\pi\sigma^2)^{M/2}} \exp\!\left( -\frac{1}{2\sigma^2} \| y - f(x; \theta) \|^2 \right),   (9)
(9)
where cy2 is the variance of a noise in each component of the output y. One standard way of estimating the optimum 0 to express the true probabilistic rule p(ylx) is the maximum likelihood method. Given the training data { (x (~), y(~))}, the maximum likelihood estimator 0 is defined by N
t } - argmaxL(0)
where L ( 0 ) - ~-'~ logp(y(~/[x(~)" 0).
(10)
v=l
The function L(0) is called the log likelihood function. In the terminology of Section 2.1, the loss function g(y,z) is given by -logr(y]z), and Eemp(0)--L(0). For example, in the case of the Gaussian noise model given by Eq. (9), the maximum likelihood estimator is equal to the minimizer of the empirical loss for the square error g(y, z) = 89 [lY - zll 2A formal foundation of the maximum likelihood estimator is derived from the Kullback-Leibler divergence, which measures the discrepancy between two probability distributions. Let p(z) and q(z) be positive probability density functions on a measurable space (~, ~, g), where ~ is a set, N' is a cy-algebra, and ~t is a measure on (~, N). The Kullback-Leibler divergence from q to p is defined by
K. Fukumizu
738
/
q(z) lx(dz). K(q]]p) - fj ~ q(z)log (p--~j
(11)
The Kullback-Leibler divergence resembles a distance in the sense that K(q]]p)>~ 0 is always satisfied and the equality holds if and only if q(z) = p(z) almost everywhere ix. However, it is not a distance in a mathematical sense, because it is not symmetric; that is, K(pllq) :fi K(ql[p). As a special case of the above definition, let us consider the true probabilistic rule p(ylx) and a probabilistic rule given by a neural network p(y]x; 0). It is necessary to assume some input probability density function q(x), which is not estimated in learning. The Kullback-Leibler divergence from the joint distribution p(y]x)q(x) to p(ylx; 0)q(x) is given by f p(yix)
=O (ylx)q(x), lp(ylx; e)q(x)) -
log tp-{7i<-7, t)}dy dx
= - J~ i~ P(YIx)q(x)I~
0)dy dx + Const.
(12)
The log likelihood function L(0) is a replacement of l ] J~/P(Ylx)q(x) logp(yix; 0)dy dx
(13)
by its sample mean. The law of large numbers asserts that L(O)/N converges to Eq. (13). The maximum likelihood estimation (Eq. (10)) is, therefore, an approximation of minimization of the Kullback-Leibler divergence. This approximation can be formally expressed as follows. Consider the Kullback-Leibler divergence from the empirical joint distribution 1
m
pro(x, y) = ~ E
5(x - x(V))5(y - y(V)),
(14)
v:l
where 5(z) is Dirac's delta function. Then, the divergence is given by 1
N
g(pN(x, y)] [p(YiX; 0)q(x)) --- - - ~ ~ 1ogp(y(~)Ix(~); 0) + Const.,
(15)
v=l
which means that the maximum likelihood estimator is the minimizer of the Kullback-Leibler divergence from the empirical distribution to the model. Similar to the case of the functional space, a neural network model defines a finite-dimensional subspace {p(yix; 0) } in the space of all the conditional probability of y given x or the space of all the joint probability of x and y (Fig. 4). Although it is difficult to define a useful topology in a mathematically rigorous way, a naive explanation is possible using this geometric view. Each of the true joint distribution p(ylx)q(x) and the empirical distribution pN(X,y) defines a point in the space of all the probabilities. The difference of two points in the space is measured by the Kullback-Leibler divergence. The nearest point from p(y[x)q(x) is the optimal
Geometry of neural networks: natural gradient for learning
739
Space of all the probabilities (infinite dimensional)
)q(x) The parameter space (finite dimensiona
/P(Y x ; ( ) ) q ( x ) ~ ~ ' ~ " ~
I
Space of the probabilities given by multilayer networks (finite dimensional) Fig. 4.
Space of the probabilities given by the neural network model embedded in the space of all the probabilities.
parameter in the model to express the true probabilistic rule. The maximum likelihood estimator p(y]x; tJ) corresponds to the nearest point in the model from the empirical distribution pN(X, y). It is important to note that the finite-dimensional space {p(y]x; 0)} of the neural network model, the optimum probability p(y]x; 0,), and the maximum likelihood estimator p(ylx; I}) are not dependent on specific choices of parameterization of the model, but are defined intrinsically in the space of all the probability densities. Any reparameterization of 0 does not change these spaces and points. This viewpoint of intrinsic geometry leads to the Riemannian structure of the model and the natural gradient, which will be explained in Section 3.
2.3. Learning in neural networks It is very difficult to solve the maximum of the log likelihood or the minimum of the empirical loss function in general. An iterative learning rule using the gradient of the empirical loss function is applied to obtain the numerical solution. If the steepest descent method is applied, the learning rule is given by 0(t + 1) -- 0(t) - ]3~Eemp(0(t)) ~0
(16) '
where 13 is a learning coefficient. If the model is the multilayer network with the sigmoidal activation function, the above learning rule gives the well-known error back propagation [1]. One can easily derive it using Eq. (16) and the chain rule of the derivatives.
K. Fukumizu
740 Updates of the parameters according to Eq. (16) stop if c~gemp(O(t)) a0
=0
(17)
is satisfied. This is a necessary condition for a global minimum of Eemp(0); that is, if the empirical loss function Eem p takes the global minimum at 0(t), Eq. (17) is satisfied. However, the inverse is not necessarily true. A point that satisfies Eq. (17) is called a critical point of Eem p. There are three kinds of critical point: minimum, maximum, and saddle point. 2 Therefore, the learning rule Eq. (16) stops also at local minima and saddle points of the empirical loss function. The problem of local minima is inevitable if a gradient-based method is applied. One cannot distinguish whether a minimum point is local or global only from the local information of the function. The learning rule given by Eq. (16) is often called batch learning, because all the training data are presented each time. If a new pair of training data (x (t), y(t)) is generated and used for the update at time t, such a method is called online learning. In online learning, the loss function of the current training data is used for calculating gradient; the update rule is given by 0 ( t + 1) - - 0 ( t )
-
13t
~g(y(t), f(x(t);O(t))) 80
'
(18)
where 13t is a decreasing function that converges to zero. The online learning rule is sometimes applied also to a fixed set of training data. Training data are then randomly or cyclically chosen from the data set, and presented to the neural network.
2.4. Examples 2.4.1. Example 1: regression One extension of the Gaussian noise model (Eq. (9)) is a regression model with an additive noise. The true probabilistic rule is given by y = f ( x ) + n,
(19)
where f ( x ) is a deterministic function, which is called the true function, and n is an M-dimensional random variable expressing noise. Suppose that each component of n is independently and identically distributed with the density r(n). Then, the conditional probability density of y given x is M P(YIX) - 1-I r(yi i=1
2
-
-
f-(X)).
(20)
A point 0, is called a critical point of a function F(0) if all the partial derivatives OF/O0 i are zero at 0,. A critical point 0, of F is a minimum (maximum) if there exists an open neighborhood of 0 such that F(0) >~(~<)F(0,) for any 0 in the neighborhood. A saddle points is a critical point that is neither a maximum or minimum, that is, a critical point 0, is a saddle if and only if for any open neighborhood of 0 there exist 01 and 02 in the neighborhood such that F(01) < F(0,) and F(02) > F(0,) hold.
Geometry of neural networks." natural gradient for learning
741
If one has a neural network model {f(x; 0)} and knows the noise model r(n), the statistical model of neural networks is given I by M
p(ylx; 0) - IX r(yi - f.(x; 0)).
(21)
i=1
It is easy to see that the batch learning rule is given by
N M r,(y}V) _ j~(x(V); 0(t))) aJ~(X(V); 0(t)) 0(t + 1) -- 0(t) -- 13~v-1 ~i=1 r(Y}v) -- J~(x(V) ",0(t))) ~0
(22)
2.4.2. Example 2: classification- two classes
When a learning machine is used for classification problems, the output of training data takes a value in a finite set, whose elements represent the classes, in the case of two-class classifications, the simplest method is to use a leaning machine with binary output y c o~ = {0, 1}: the code for one class is " l " and the other "0" [8, Section 6]. For a given input vector x, the true probabilistic rule returns "1" with probability f ( x ) and "0" with 1 - f ( x ) , which can be written by p(y]x) : f(x)Y(1 - f(x))1-y.
(23)
In the binary classification problem, the optimum classification boundary is given by {x E 5flf(x) = 0.5}. Then, the estimation of f ( x ) is essential to the classification problem. A usual method of estimating f ( x ) is to prepare a parametric family of functions {f(x;0) : Y" ---+ ~ = [0, 1]10 E t9}, assume the probability p(ylx; 0) - f(x; 0)Y(1 - f ( x ; 0)) l-y,
(24)
and estimate the optimal 0 for given training data. The log likelihood function is given by N
L(0) -- Z { y (v) logf(x(V); 0) + (1 -y(V))log(1 - f(x(V); 0))}.
(25)
v=l
This is equal to the empirical loss function for g(y, z) - - y log z - (1 - y) log(1 - z),
(26)
which is called the cross-entropy. 2.4.3. Example 3." multiple attributes
As an extension of the cross-entropy for two-class classification, the model for multiple independent attributes is defined [8, 6.8]. The output is an M-dimensional binary vector, y c {0, 1}M, and the conditional probability of y given x is defined by M
p(y]x; 0) - I I f . ( x ; o)y/( 1 - f ( x ; 0)),-yi, i:1
(27)
K. Fukumizu
742
where {f(x; 0) } is a neural network model with M output units whose output values range within [0, 1]. In this model, the output variables {yi} are mutually independent, because the joint probability density function p(ylx; 0) is defined by their product. Each output component is considered to represent an attribute. The probability that the ith attribute is taken for a given input x is equal to f.(x; 0). This is not the model of multiple-class classification, because all of yi can take 1 or 0. 3. Riemannian metric, information geometry, and natural gradient
3.1. Statbtical asymptotic theory Although the maximum likelihood estimator t) is a random vector, which takes various values depending on the random sample, its asymptotic behavior follows some general rules for an infinite number of samples. This subsection gives a brief review of the statistical asymptotic theory of a general parametric estimation problem. Let {p(z; 0)]0 E E m} be a parametric family of the probability density functions on a measurable space ( ~ , ~, ~t). A probability that generates observable samples is called the true probability. Assume here that the density function of the true probability is included in the model {p(z; 0) 10 c Em}, and is given by p(z; 00). The parameter 00 is called the true parameter. A sample {z(1),z(2),... ,z (N)} is independently distributed according to the true probability. First, it is known that the maximum likelihood estimator 6 converges almost everywhere 3 to the true parameter 00 under some regularity conditions when N goes to infinity [9,10]. This is called the asymptotic consistency of the maximum likelihood estimator. The simplest example is the estimation of the mean value of a normal distribution. Let p(z; a) be the density of the one-dimensional normal distribution with mean a and variance 1
p ( z ; a ) _ lv/~-~exp { - ~ 1( z -
} a) 2 .
(28)
The maximum likelihood estimator is given by the sample average: 1 N ^ - - - Z z (v). a Nv-1
(29)
Another example is the estimation of the variance. Let
q(z;v)=12v~_~ exp { - ~ v1z
q(z; v) be
2} .
defined by (30)
The maximum likelihood estimator ? is the sample variance 3 We say a sequence of random variables Xn converges to a random variable X almost everywhere if there is a measurable set A such that the probability of A is one and Xn(m) ~ X(m) for all m E A.
Geometry of neural networks: natural gradient for learning
743
V ~Nv=l
In these cases, the asymptotic consistency is a rephrase of the law of large numbers. It is possible to discuss a more minute behavior of the maximum likelihood estimator. Because the difference between t} and 00 converges to zero, it should be enlarged for a closer look. The variable v/N(t}- 00) presents the appropriate order for this purpose. It is known [10, Section 6] that under some regularity conditions the random vector x/N(l~- 00) converges in law 4 to a multidimensional normal distribution; v @ ( 0 - 00)
>N(0, G(00) -1)
in law,
(32)
where N ( m , V) shows a random vector subject to the normal distribution with mean m and variance-covariance matrix V, and G(0) is a Fisher information matrix defined by Gab(O)
_
_
f ~ 0log p(z; 0) 0log p(z; 0)
00a
00 b
p(z; 0)d~(z).
(33)
In the example of Eq. (30), the Fisher information (scalar in this case) is given by G(v) =
2v 2 - 2 v
1
p(z; v)dz - 2 v 2"
(34)
Then, the asymptotic theory asserts 1
u
x/N Z ( (z(v))2 - v0)
~N(0, 2 v2)
in law.
(35)
v=l
This is a rephrase of the central limit theorem, because the variance of z2 is equal to 2v 2 .
3.2. Riemannian metric on a stat&tical manifold
The limiting covariance in the asymptotic theory essentially depends on the parameterization. To see this, consider a new parameterization of the example of Eq. (30) by introducing cy = v/~. The model with the new parameterization is v/~
{
exp --~G-gG2 .
(36)
In this parameterization, the Fisher information is given by (7(o)
4
2 -
cr 2 .
(37)
We say a sequence of random variables X. converges to a random variable X in law if for any continuous, bounded function ~, E[~(X.)] ~ E[~(X)] holds.
K. Fukumizu
744
Assume that the true parameter cy0 is equal to 1. Then, the true parameter v0 is also 1. The asymptotic variance of v/N(~- 1) is 2, while that of v/N(~ - 1) is 1. It would be nonsense, if one should consider the former estimator superior to the latter. This difference is just caused by the parameterizations, and the behavior of ~ coincides with v/~ (= ~). As is shown above, the absolute value of the asymptotic variance of a specific parameterization is not so meaningful. Because the definition of the maximum likelihood estimator does not depend on the parameterization, an intrinsic description, which does not use parameters, must be possible. The Kullback-Leibler divergence works as an intrinsic measure for two different probabilities. Consider a point p(z; 0) in the functional space of the model and another point p(z; 0 + A0) which is very close to p(z; 0). The Taylor expansion gives an approximation of the Kullback-Leibler divergence by
p,O! } ,~,Ep(z;o)[logp(z;0)]
I
g(p(z; O)llp(z; o + A0)) -- Ep(z:O) log i,p(z; 0 + A0)
[
-Ep(z:O) logp(z;
=!
2
O)+ Za =ml
Ep(z;o) [ (a=~l ol~
~0 a
0)
~logp(z; 0) A00+
~0 a
la~lO21ogp(z;O) AoaAOb1 ~oa~o b , =
-2
P(z; 0) A0b) ] A0a) (b=~l ol~ ~0b
(38)
m
= 2 Z
Gab(0)A0aA0b"
(39)
a,b=l Tangent space at p(z;00) 11
Tangent vector y = Ea(0a _0g) c3 00a ./: .: .-
Space of the parametric model embedded in all the probabilities Fig. 5. Tangent space of a statistical manifold.
Geometry of neural networks." natural gradient for learning
745
To obtain Eq. (38), the equalities (40)
Jfz p(z; 0) 81ogp(z; ~0a 0) dz -- ~ ~ Jfz p(z;0)dz = 0 and
fp(
,;0)
~2 logp(z; 0) dz ~0~0 b
/, =
-
/
0) 8logp(z; O) 8logp(z; O) dz
p(z;
~0 a
Jz
-
-Gab(O)
80 b
(41)
are used. To formulate the asymptotic theory using intrinsic quantities, it is necessary to express Eq. (39) with variables independent of parameterizations. An infinitely small amount of deviation from a point p(z; 00) can be approximated locally by a tangent vector at p(z; 00), which is a vector tangential to the model in the space of all the probabilities (Fig. 5). The tangent space is the set of all the tangent vectors, and is represented as an affine space in the space of all the probabilities. 5 The small deviation A0 can be formally considered to be a tangent vector
(42)
"~2~ 8 A0 a 80-----a,
X =
a=l
where {O/~0a}m_l defines a basis of the tangent space with respect to the specific parameterization 0. It is important to note that the definition of X is independent of the parameterization, if infinitesimal values are considered. In fact, if r is another parameterization, the tangent vector X can be rewritten as m
X =
Z A0a ~
~
m
~
a=l
m
ZZ a=l
i=1
~0a
~
m
Am'
~0)/"
~
m
~0 a ~m/ "=
A~ i=1
(43) ~0)i"
Thus, the tangent vector X is defined intrinsically. The cotangent space is defined as the dual vector space of the tangent space. A cotangent vector is also independent of parameterizations. Consider the differential of logp(z; 0); m
d(logp(z; 0)) = ~
~ logp(z; ~0a O) dO~.
(44)
a=l
The basis {d0a}~ml in the cotangent space is a dual basis of {~/~0a}, which means the two bases satisfy
/d0.
, ~-~
- 8ab
(45)
F o r the rigorous definitions of a tangent space and a tangent vector, see a textbook on manifolds. [11] or [12], for example. F o r a naive understanding, think of a statistical model as a smooth surface embedded in a Euclidean space.
K. Fukumizu
746
for the coupling (,) of the dual spaces (see Appendix A for the fundamentals of a dual space). In the expansion of Eq. (38), the term
zm
~logp(z;~0 a 0)
A0 a
a=l
can be rewritten as (d(logp(z; 0)),X).
(46)
This form is independent of the parameterization if A0 is very small. From Eq. (39), the small change X from p(z; 0) causes the difference K(p(z; e)Ilp(z; e + All)) ,~ ~1 Ep(,.;o) [(d(logp(z; e)) ,X)2]
(471
to a kind of "distance" of the densities. Therefore, the norm of the tangent vector X should be measured by Ep(,.;0)[(d(logp(z; 0)),X}2], not by a usual Euclidean norm in a specific parameterization. In other words, a metric or an inner product
g(X, Y) = Ep(z;o)[(d(logp(z; 0)),X)(d(logp(z; 0)), Y)]
(48)
is naturally introduced in the tangent space. In terms of tensors, the symmetric tensor
g - Ep(,;o)[d(logp(z; 0)) | d(logp(z; 01)]
(49)
in the tensor product of the cotangent space defines this metric. If a specific parameterization 0 is taken, the inner product of m S~ZSa
m
~~
and
Y- ZY
a=l
a ~0---~
a=l
is expressed by m
gO(, Y) = ~
Gab(O)Xay b.
(50)
a,b=l
This shows that the Fisher information matrix G(0) is an expression of the intrinsic metric # with respect to the parameterization 0. The metric tensor # induces a metric 9" in the cotangent space, which is defined by
g*(g(.,X),g(.,Y))
(51)
for two cotangent vectors g(.,X) and g(., Y). If one uses a parameterization 0 and selects two cotangent vectors 11 - ~ a qa dOa and ~ - ~-'~b ~bd0b, their inner product is written by m
-
c a,b-1
(52)
Geometry of neural networks: natural gradient for learning
747
where Gab(O) denotes the (a,b) component of G-l(0), the inverse matrix of G(0). Thus, G -1 (0) is an expression of 9" with respect to the parameterization 0. In general, a metric in the tangent space of a manifold is called a Riemannian metric, and a manifold with a Riemannian metric is called a Riemannian manifold [12]. A parametric model with the metric Eq. (48) is a Riemannian manifold, and it is called a statistical manifold. This Riemannian structure can describe local properties, or the asymptotic behavior of the first-order, of the maximal likelihood estimator ^[13,14,15]. Let 0 be the maximum likelihood estimator, and m
y
~ ( ~ a _ 0~)~0a a=l
be a tangent vector reflecting a deviation from the true probability p(x; 00). Consider a cotangent vector m
-
Go
(00)(O
-
(53)
b=l This is independent of the parameterization if N goes to infinity. The asymptotic theory means that the limiting distribution of x/~g(.,X) is the normal distribution with mean 0 and variance-covariance identity, where the variance and covariance are calculated by an arbitrary orthonormal basis with respect to the induced metric g* in the cotangent space. Further geometric formulation of a parametric model is possible using a connection, which is an important notion in differential geometry. This geometric treatment of parametric estimation is called information geometry [13,14,15]. However, these topics are far beyond the scope of this chapter.
3.3. Natural gradient- gradient of a function on a Riemannian manifold The empirical loss function is defined on the Riemannian manifold of a neural network model. In the usual learning rule Eq. (16), the gradient with respect to a specific parameterization is used. However, such a gradient is not necessarily meaningful, because its direction can change according to the parameterization. For example, let F be a function on N2 defined by
F(x,y) = x + (y + 1) 2.
(54)
Consider two parameterizations 0 and o~ defined by (02 02) --(X Y!2 and (0)1,0) 2) = (x, 2y). Then, the function F is given by F ( 0 ) - 0 ] + (02 ~-'~j and F(o~) = co~ + ( 89 2 + 1)2. Let p be the origin of ~2. The gradient of F at p with respect to 0 and r are given by ~F/80= (1,2) and ~F/8o = (1, 1) respectively. Considering that y-coordinate is the half of co2, 8F/8o corresponds to (1, 89 in the tangent space of R 2 at p. Therefore, the gradient of F differs depending on the parameterizations (Fig. 6).
748
K. Fukumizu (a) Gradient i n (01,02)
(b) Gradient i n ((01,032)
2.5
2
1.5
% ,
-
%
1
0.5
0
-0.5
-1
-0.5
0
0.5
1
01
1.5
Fig. 6.
2
2.5
3
-0.5
0
0.5
1.5
2
2.5
3
Gradients in different parameterizations.
If the gradient depends on the parameterization of the space, what is the most to descend the surface of a function? The Riemannian structure on a manifold gives the answer. Let S be a Riemannian manifold with a Riemannian metric g, and F be a function defined on S. To consider the steepest direction, the change of F caused by a small deviation of a point of the same amount and different directions should be evaluated. The norm of the small deviation or the tangent vector is measured by the Riemannian metric. The natural definition of the steepest direction of a function F is as follows. Let be a small positive number, and
natural direction
m X = ZA0i /=z
00;
be a tangent vector at a point p in S. The steepest ascent direction is a vector that achieves the maximum of AF = F(0 + A0) - F(0)
(55)
in all of X satisfying g(X,X) = ~2 for infinitesimal e. The circle must be defined by the Riemannian metric g (Fig. 7). Note that a tangent vector is identified with a point in the manifold S in a small neighborhood ofp. In the coordinate 0, the Taylor expansion of AF gives the approximation m
OF
m
AF~ Z-~6AOa= ~-~ Gba0-0-a OF GbcAOC' a=l a=l
(56)
where (Gbc) is the matrix that represents the Riemannian metric with respect to the coordinate 0, and (G ha) is the inverse of (Gba), which represents the induced metric in the cotangent space. In other words, this approximation is
Geometry of neural networks." natural gradient for learning
~q,, ~ , ~ '
749
~,.........
~'~ ~!',i.i,i)? ' ''
F(O)
~i~!i?i:i?, ,;
Unit circle around p
Fig. 7.
Natural gradient of a function on a Riemannian manifold. (57)
AF ,~ (X, dF) - g(X, g* (., dF)), where ~F dF - ~--~--0-h- d0 a a
is the cotangent vector given by the total differential, and g* (', dF) is a fixed tangent vector defined by the duality. Then, under the restriction of g ( X , X ) - ~2, the variation AF takes the maximum if and only ifX is in the same direction as g* (', dF), that is, AO = ocG-1 (0) OF ~0
(58)
for some constant 0r This is called the natural gradient 6 in [3]. It is important to note that the infinitesimal change of the point p according to Eq. (58) is independent of the coordinate system or parameterization. If Eq. (58) is applied to the learning rule of a neural network, Eq. (16) should be modified as; 0(t + 1) - 0(t) - 13G(0(t)) -1
~Eemp ( 0 ( t ) )
8O
(59)
The term "natural" sounds a little strange in mathematics. Mathematicians think it as a usual gradient in Riernannian geometry. However, this term has been widespread in the field of neural networks, and this chapter follows this convention of terminology.
750
K. Fukumizu
where G(0) is a Fisher information matrix or the Riemannian metric of the neural network model. This is called the natural gradient learning.
4. Natural gradient of neural network models
If the natural gradient is applied to specific models of conditional probability density functions p(ylx; 0), the integral in the Fisher information matrix can be sometimes calculated further, and different learning rules appear for various statistical models. This section describes the explicit forms of the natural gradient in neural network models, and explains how the natural gradient learning is similar to and different from the Newton and quasi-Newton methods. There are computational problems in applying the natural gradient: one is the integral in the definition of a Fisher information matrix, and the other is the inversion of a matrix. This section discusses also the solutions of these problems. 4.1. Natural gradient learning f o r various models 4.1.1. Regression
For the model defined by Eq. (21), the Fisher information matrix is given by Gab(O)-
M S q(x)S H r ( y i -fi(x; 0)) i=1 •
9 --
= Er
r(yj - j~(x; 0))
I, r(yk - A(x; 0))
80a
-
/ 0f(x;00a0)T0f(x;00 b0) q(x)dx.
-
e0 b
dy dx
(60)
Neglecting the constant factor, the natural gradient learning rule is
0(t + 1) - 0(t) - 1~ -~ (0(t))
i~Eemp(0(t)) ~0
(61)
where the matrix (~ab is given by
~.b(o)
f ~f(x; o) ~ Of(x; o) q(x)dx ~0b " J ~0"
(62)
The rule has the same form for all regression models, independent of the noise model
r(n). 4.1.2. Classification
For two-class classification problems formulated by Eq. (24), the Fisher information is given by
751
Geometry of neural networks: natural gradient for learning
O) + (1 - y ) Gab(O)--yE{0,1} Z /{Y ~logf(x; ~0a x
y
~log(1 ~0 a -f(x; 0))}
Ologf(x; O) alog(1 - f(x; 0))) ~0b + (1 - y ) ~0b
• (y f(x; O)+ (1 - y ) ( 1 - f(x; O)))q(x)dx 1 Of(x; O) ~f(x; O) q(x)dx. f ( x ; O ) ( 1 - f ( x ; O ) ) ' - a~0 ~ ~0b
-
(63)
The natural gradient of two-class classification has a different form from that of regression.
4.1.3. Multiple attributes In a similar manner to classification, the Fisher information matrix of this model is given by
Gab(O)--
i•lf
1 Off(x; O) ~fi(x; O) q(x)dx, f.(x; O) (1-- f.(x; O)) ~ ~0 ~0b
(64)
and the natural gradient is calculated using this formula.
4.2. Approximation by empirical data A Fisher information matrix is defined using the integral over the input and output space in general. However, if the number of dimensions is large, the numerical integration is computationally intractable. In addition, the input density q(x) must be known for calculation of the integral. As Section 4.1 shows, the Fisher information matrix of useful neural network models can be written by an integral with input probability q(x). One practical solution of the computational problem is to use the given input data for the approximation of the integral. A Fisher information matrix approximated with empirical data is called an empirical Fisher information matrix. The empirical Fisher information matrix for regression models is
1 ~ ~f(x(V); O)T ~f(x(V)" O) Gab(O) -- ~ v=l ~0a o0b'
(65)
and the empirical Fisher information matrix for two-class classification is t.
1
X
l
Gab(O) - - ~ ~ f ( x ( V ) ; O ) ( 1 _f(x(V);O) )
~f(x(v);o) ~f(x (v)"O) aOa
aOb'
9
(66)
The empirical Fisher information matrix for multiple attributes can be also obtained similarly. It can work as a substitute of the exact Fisher information in natural gradient learning.
752
K. Fukumizu
It is important to note that the integral of the general definition (Eq. (33)) cannot be replaced by a sample average using {(y(V)x(V))}. The integral over the output space should be calculated using the conditional density p(y]x; 0), which is given by the current parameter 0, while the output data y(~) is a sample from the true probabilistic rule p(ylx). An experimental result on the natural gradient with empirical Fisher information matrices is shown in Fig. 8. In these simulations, a multilayer network with the sigmoidal activation function is used. The network has 4 input units, 2 hidden units, and 1 output unit. Given 100 training data, the network is trained using the steepest descent (error back-propagation) in (a) and the natural gradient in (b). Training data are selected cyclically one by one, and used in the online algorithm Eq. (18). The learning coefficient is given by ]3t = [3/t. Because the convergence speed is influenced critically by the choice of the learning coefficient, many values have been tested for [3, and some of them are shown in the graphs. In the steepest descent, 13= 0.14 shows the fastest convergence, and 13 = 0.004 in the natural gradient. Fig. 8 shows efficiency of the natural gradient. The best convergence in (b) outperforms the best of (a), which shows the effectiveness of the natural gradient. This is only a very simple application, but the natural gradient shows a better performance also in more practical problems, as shown by the experimental results in Park et al. [5]. 4.3. Natural gradient and Newton method The natural gradient uses the matrix which includes the derivatives of the secondorder. This may remind a reader of the Newton method, which is the standard second-order method for optimization, and is applied also to neural networks [16,17]. These two methods are closely related, but are different, as described below. Let F(0) be an arbitrary function for minimization. The parameter update rule of the Newton method is given by
'
-9
'"
0-1
(b) natural gradient
(a) steepest descent
10 ~
-
--
9
"
"
--
..
10 ~
~ = 0.001 1 3 = 0 . 0 1
~- "
- 13=0.1
...
13 = o.14
.\. . . . --. .
_ o l 0-1
\
0
0
\
=O l 0-2 .,,.~
\\
.. \
\\ '..
~\( -
" ~
..
"'\
1~=0.001
0.002
I --
13 =
L---
~ : 0.004
\ \
\\ ..
N .,..~
13 = o.oool
I '
\ \
.~ 10 "2
\
J- -
.\
k \
=1 10 -3
t3
t3
7" : ~ - - ~ 10
-4 10 o
102
104
Number of iterations (log)
Fig. 8.
0.a
"~_. - ~_. \ . . . . . .
10 -4 10 ~
10 2
Number of iterations (log)
The comparison of learning curves.
1 O"
Geometry of neural networks: natural gradientfor learning
0(t -+- 1) = 0(t) - 13H(0(t))-I 5F(0(t)) 80 '
753
(67)
where H is the Hessian matrix of F, defined by
82F H.b(0) = 50ao0b (0).
(68)
The convergence of the Newton method is excellent, much faster than the first-order method like the steepest descent, when the parameter is close to the optimal one. If the function for minimization is minus the log likelihood function, the Hessian is given by
N
Hab(O) = E
1 5p(x(V) Ix(V); 0) 8fl(x(V)Ix(V); 0) v=l {p(y(~)Ix(v); 0)} 2 9 9 N
1
- Zp(y(V)Ix(~). O) v=l
~2p(y(v)Ix(V).0 ) ~oa~o b '
_ ~N 8 logp(y (~)ix(V) ;0) 8 logp(y(~) [X(V)., 0) 80 a 80 b v=l X
1
- ~p(y(V)Ix
O2p(y(~t Ix(V/; 0 ) 80a80 b
(69)
The first term is the general form of the empirical Fisher information matrix. Assume that the true density p(y]x) is realizable by the model and p(y]x; 00) = p(y[x). The expectation of the second term is zero when the parameter 0 is equal to 00. If 0 is close to the optimum parameter and N is very large, the first term is dominant from the law of large numbers. Therefore, the natural gradient is an approximation of the Newton method around the optimum point. This also assures very fast convergence of the natural gradient learning around the optimum parameter. It is important to note that the second term in Eq. (69) is not negligible at a parameter far from 00. Moreover, if the true conditional density p(ylx) is not included in the model, this term is not zero even when 0 is equal to the best possible parameter 0,. The Fisher information matrix is different from the Hessian in these cases. Thus, the natural gradient and the Newton method give different learning rules except near the neighborhood of the true parameter which gives the true conditional probability density. Another difference between the natural gradient and the Newton method appears when a specific statistical model is assumed. The Hessian o f - L ( 0 ) is defined using training data, including both the output and input. On the other hand, the Fisher information matrix is defined by the expectation with respect to the probability at the current parameter. If further theoretical calculation of the expectation is possible, the simpler form should be used for calculating the empirical Fisher information. The models of regression, two-class classification, multiple attributes
754
K. F u k u m i z u
are such examples. Their Fisher information can be written without the output variable y (Eqs. (62),(63), and (64)). The empirical Fisher information based on these formula is calculated only by input data.
4.4. Natural gradient and Gauss-Newton method Consider the least squares loss function g ( y , z ) _ l [lY- z[[2, or equivalently, the regression model with an additive Gaussian noise. The Hessian matrix o f - L ( 0 ) is given by N
N
Hab(O) = Z 8f(x(~); 0)T ~f(x(V); O) -- Z ( y (v) -- f(x(~)" 0)) T o2f(x(~); o) v=l
o0a
~0b
v=l
'
~oa~ob
(70) "
The Gauss-Newton method approximates the Hessian by using only the first term of the Eq. (70); N
/~ob(0) -- ~
v=l
8f(x(~); 0)T 8f(x(V); 0) ~0a
80b
(71) "
This modification has two advantages. One is to overcome the computational cost to calculate the second derivatives. The other is to make the matrix positive definite. The Hessian matrix is not always positive definite in general when the parameter is far from the optimum. Because the first term is always semi-positive definite, this approximation prevents the unstability of the algorithm. The application of the Gauss-Newton method to neural networks has been also studied [16,17]. The natural gradient learning of regression models with the empirical Fisher information matrix is exactly the same as the Gauss-Newton method, as is shown by Eq. (65). However, if the statistical model is different from regression, the natural gradient gives a different learning rule from the Gauss-Newton method. In this sense, the natural gradient is a wider methodology than the Gauss-Newton method, and presents a theoretical basis for the Gauss-Newton method in regression problems.
4.5. Adaptive natural gradient learning To use the natural gradient learning, it is necessary to calculate the inverse of a Fisher information matrix. This is computationally very expensive if the number of the parameters is large. It sometimes makes the learning rule impractical. To overcome this problem, an adaptive method has been proposed in online learning of neural networks [3]. The method calculates the matrix inversion by an iterative rule. For simplicity, assume that a neural network model has only one output unit. Consider a statistical model expressed by p(y[x; 0) - r(ylf(x; 0)),
(72)
where f(x; 0) is a neural network and r(y[z) be a conditional density function of y given z. As Section 4.1 shows, many statistical models are written in this form. The Fisher information matrix is given by
755
Geometry of neural networks: natural gradientfor learning
Gab(O )
f
- -
r(/(YIf(x;
,]
Ep(ylx;O)Lk r ( ~
~f 0(X;a)0)a~f0a0 (X;b
q(x) dx.
(73)
Let m be the dimensionality of 0. If an m-dimensional vector F(x; 0) is defined by Fa(x; 0 ) - ~EpCvlx;0)[(r'(Ylf(x;O))) 2 ] r ( y ] f ( x0)) ;
~f(x; 0 ) ~a 0
,
(74)
the empirical Fisher information matrix can be written by ^ 1 N G(0) - ~ Z F(x(V); 0)FT (x(V); 0)" V--1
(75)
The left-hand side of Eq. (75) is the sum of matrices of the form vvf for a vector v. The inverse of such a matrix can be calculated according to the formula: (A +
vvT) -1
A -1 -
--
1 A_lvvTA_ 1 1 + vTA-lv
(76)
Suppose 0 is fixed for a while, and let G, be the empirical Fisher information matrix up to n training data. Because of the equality (
Gn=
1-
!)(
Gn-l+
1 n--1
F,F T
}
(77)
for F, = F(x('); 0), the application of Eq. (76) leads to
Gnl _
-1
1
•
Gnl 1
n
-1
G;11F~F~G~.11
n
1
1 --n(1
_ Fn" --n)1 + nl E Tn G~ll
(78)
Using Eq. (78) as a basis, the adaptive natural gradient learning is defined by 0(t + 1) -- 0(t) + ~3tK(t ) 8l~
1i(,
K(t + 1) - 1 - e------~t
(79)
80 1
1 - et (1 - et) + e,tFtXK(t)ft K(t)FtFtXK(t)'
(80)
where Ft = F(x(t); 0(t)). The coefficient ~t decides a forgetting rate. If the parameter 0 converges to the optimum one, the matrix K(t) converges to the inverse of the Fisher information matrix at the optimal parameter. This adaptive method remarkably reduces the computational cost necessary to the inversion of matrices. Park et al. [5] apply the adaptive natural gradient method to various problems including artificial and practical ones. Although the computational cost of adaptive natural gradient for one parameter update is more expensive than the ordinary gradient descent, their experiments show that the convergent speed of the former is about 10 times faster than the latter. In the same paper, they also give the general formula of adaptive natural gradient for networks with multiple output units. In such cases, Ft is an m x M matrix and an M x M matrix ( ( 1 - e,t)IM + e,tFtTK(t)Ft) must be inverted.
K. Fukumizu
756
However, it still reduces the computational cost drastically, because in the original natural gradient, a matrix of the dimensionality of the parameter 0, which is much larger than the number of output units, must be inverted. The control of the coefficient et is important in this method. When the parameter 0 changes, the effect of the past 0 should be forgotten to some extent to approximate the inverse matrix at the current 0. A small ct results in slow convergence of K(t), and a too large ct causes unstability of learning. In the experiments of [5], the order c t - O(+) shows a good performance. Scarpetta et al. [18] propose another interesting online method to approximate the inverse of the Fisher information matrix. They use the matrix momentum, a momentum term with a matrix coefficient, which was first introduced in Orr and Leen [19], and analyze the dynamics of learning using the framework of statistical mechanics.
5. Singularity of multilayer neural networks A statistical model of multilayer neural networks does not perfectly admit the structure of a Riemannian manifold. There are some parameter points where the Fisher information is not positive definite. This section explains the singularities of the Riemannian structure, which is very important to understand the dynamics of gradient learning and the effect of the natural gradient. The singularities are caused by the layered structure of the model. 5.1. Smaller networks embedded in a larger model
Consider a multilayer neural network model with linear output units. In the following, the function and the parameter of a network with H hidden units are written by f(n/ and 0 (#), respectively, to emphasize the number of hidden units. As explained in Section 2.1, all the functions given by the parameters in On form a function space, which is denoted by 5pu _ {f(n)(x; O(n)) . ~t __~ ~M[o(H) E OH}.
(81)
If a parameter 0 (H) is specified, one function f(m (x; 0 (H)) in 5PH is determined. The correspondence from OH onto 5PH is written by ~zn" |
~
5pn,
0 (n) ~ f(x;0(n)).
(82)
Fig. 3 depicts this map. Note again that the parameter space On and the functional space 5Pn are different spaces. It is important to note that the mapping ~n is not one-to-one, that is, different 0 (n) may give the same input-output function. It is easy to see that the interchange between (vj,, wj, ) and (vj:, wj:) does not alter the function f(x; 0), because the sum ~-~j vjq)(x; wj) remains the same by the interchange. In addition, if an odd function like tanh is used for the activation function, the sign flip (vj, wj) ~ ( - v j , - w j ) , does not change the resulting function f(x; 0) either. These two kinds of maps, hidden-
Geometry of neural networks." natural gradientfor learning
757
unit interchanges and sign flips, define transforms of OH, which do not change a function f ( x ; 0) (Fig. 9). Conversely, Chen et al. [20] showed that for tanh activation function any analytic transform T of OH such that f(H)(x;T(0(/4))) = f(/4)(x; 0 (/4)) for all 0 (/4) is given by a composition of hidden-unit interchanges and sign flips. Therefore, the same function f E 5P/4 is given by 2/4H! different parameters in | caused by these transforms. They define the symmetric structure in the parameter space. Sussmann [21] focused on how the functions given by a smaller number hidden units are realized in the parameter space | when the activation function is tanh. The functional space 5e/4_1 is trivially included in 5p/4, because a function of a network with H - 1 hidden units can be realized also by a network with H hidden units. However, the set of parameters 0 (/4), which give functions in 5p/4_~, is not necessarily trivial. Let f~H denote this set, that is, aH-
{0 (/4) ~ |
]3~o(H-l) ~ OH_I,f(H)(x;O (H)) --f(H-1)(X;~(H-1))}.
(83)
It is easy to see that there are at least three cases in f~/4 (Fig. 10);
c OHIv -
- o)
c Oz, lwj - o )
c ~J1J2 . _ {0(H) E |
(1 <~j<~H),
(84)
(1
(85)
A -- -t-wj2 }
(1 ~<jl < j 2 ~ H )
(86)
The first one is a parameter where vj - 0 so that the jth hidden unit plays no role. The second one is given by wj - 0 so that it responses tanh(0) - 0. In the third case, the jl th hidden unit and jzth hidden unit have the same (or opposite) weight vector, so that their behaviors are the same (opposite). These two hidden units can be merged into one unit without changing the overall function f . According to Sussmann [21], these cases cover all the possibilities that give a function in 5P/4_1 in the parameter space | Therefore, the subset ~/4 is given by a union of sJj-, N'j. and ~,j2Fukumizu and Amari [22] analyzed the embedding of a parameter in | into | in detail, and proved that there are many critical points of Eemp in fl/4 and some of them can be local minima under one condition. The part of the work will be shown in Section 6. 1~'2~W2
I -
Fig. 9.
The symmetric structure of the parameter space.
K. Fukumizu
758
A.
J
Fig. 10.
B~
J
,j2
A network given by a parameter in d j , ~j and ~.t. JlJ2"
5.2. Singularities of the Riemannian structure For any point 0 in the set ~ , which gives functions of smaller networks, the function f ( x ; 0 ) gives a singularity of the Riemannian structure on the manifold of neural networks. To see this, let 0, = (0, v 2 , , . . . , v/4,, W l , , . . . , w/4,) be a point in all. For an arbitrary Wl, the point 0 - (0, v 2 , , . . . , v/4,, Wl, w 2 , , . . . , w/-/,) gives the same function. This means that an L-dimensional affine space, which contains 0,, defines the same function. Similarly, a point in ~ j is included in an M-dimensional affine space that gives the same function in common. If the parameter 0 , - - ( V l , , . . . , v / 4 , , W l , , . . . , WH,) satisfies Wl, = -]-W2, , any point 0 such that u -'[- u = u ~ u gives the same function as 0,. In all cases, for any point in flu, the partial derivative o f f ( x ; 0) along an affine space vanishes. Therefore, for a statistical model Eq. (72), the Fisher information matrix at a point in flH is singular. Fukumizu [23] discussed whether these three cases cover all of the possibilities that a Fisher information matrix is singular. He showed that for the sigmoidal (or tanh) activation function the three cases are the only causes of singular Fisher information matrices. Theorem 1. Suppose that the probability density function on the input space q(x) is positive for all x c ~, and the activation function in the hidden layer is 1/(1 + e-t). Assume the statistical model of regression. Then, the Fisher information G(0) is positive definite if and only if 0 is in O H - ~II. From the viewpoint of statistical estimation, a parameter in ~ / is called unidentifiable. If the true function is given by such a parameter, the maximum likelihood estimator does not converge to a point but to the subset giving the same function. The asymptotic normality does not hold, of course. In the functional space 5PH, the parameters that give the same function are shrunk to a single point, which is
759
Geometry of neural networks: natural gradientfor learning
a singular point in 5PH. Fig. 11 illustrates this kind of singularities on the subspace all. For a parameter 0 ~ OH -- f~H, any small movement of the parameter induces a small change of the function in 5e/_/. However, for a point 0 E d ~ , the direction of w~-coordinate is shrunk to a single point, while the other coordinates Vl, ..., w2, ... remain the same. 5.3. Topics related to the singularity of Riemannian structure
The unidentifiability or singularity of neural networks causes various interesting and unusual properties. The property related to the gradient learning will be addressed in Section 6. Another important topic is the generalization error. Model selection criterions such as AIC [24] and M D L [25] are standard tools to achieve better generalization. However, many of the criterions including AIC and M D L assume the regularity of the Fisher information matrix at the true parameter. They are not applicable to multilayer neural networks if the true function is assumed to belong to
w1
A set giving the
/
V1
! ! iI
V2~ ..., VH,W2, . . . , W H
(a) Parameter space with unidentifiability Singularities
~1
)
V1
V2, ..., VH,W 2, ..-,W H
(b) Functional space with singularity Fig. 11.
The parameter space of neural networks, OH, has unidentifiable parameters, which correspond to singular points in the functional space 5ell.
760
K. Fukumizu
a smaller model in model selection [26]. Some interesting behaviors are known about the generalization error of the maximum likelihood estimator for an unidentifiable true parameter [27], and a general result on the generalization error of Bayesian estimation is obtained [28], while the correct asymptotic property of the maximum likelihood estimator is not known yet in general. It is also known that there is a different feature about overtraining, the increase of generalization after long learning, depending on whether the true parameter is identifiable or not [29].
6. Natural gradient learning in multilayer networks As explained in Section 4, the natural gradient learning shows good convergence speed around the optimum parameter. The natural gradient applied to multilayer neural networks has another advantage in the transient phase of learning [30]. The aim of this section is to give an explanation of this property based on the singularity of geometry of neural networks, discussed in Section 5.
6.1. Saddle points and plateaus During the learning of a neural network, a period of little decrease of the empirical loss is sometimes observed. After this period, a sudden decrease of the loss occurs. Such a flat interval in a learning curve is called a plateau (Fig. 12). In the learning curves of the steepest descent in Fig. 8, plateaus are actually observed. Analysis of online learning using the framework of statistical physics describes the dynamics of learning with a small number of order parameters, and find the plateau to be a common property of learning in neural networks [31]. This chapter does not go into the details on these studies, but one of the important facts is that a reason of plateau is the symmetry of hidden units explained in Section 5.1. A plateau appears when some of the vectors wj have the same value. Fukumizu and Amari [22] discussed the existence of plateaus from the mathematical viewpoint. They showed that there always exist saddle points of the empirical loss function in the parameter set, where two hidden units have the same weight vector. A saddle point can cause a plateau, because a parameter approaches to the saddle from the hillside, moves very slowly with a small gradient around it, and leave it by finding the decreasing direction at last (Fig. 12).
h_*
Plateau
0
h_ S..
W
,,
_
(a)
Fig. 12.
~
time
(b)
(a) Plateau in learning. (b) Dynamics around a saddle point.
Geometry of neural networks." natural gradientfor learning
761
While Fukumizu and Amari [22] considered only the case of one-dimensional output to discuss local minima, the results on the saddle points can be extended to the case of multi-dimensional outputs. Let EH(O(H)) be the empirical loss function of neural networks with H hidden units. To distinguish the model size, a different notation is used here for the parameter of f(H-1)(X; 0(H-l));
H f(H-1)(X; 0(H-l)) = Z CJq)(uTX)'
(87)
j=2
where 0 (H-l) -- (r 0!--'>-
cT, u2T,..., uT)T. Note that the indices run from 2 to H. Let
.
,.. , , 2,,. , H,) T C | be a critical point of E,_I, that is, OEH-1 ( 0 ? - l ) ) / a 0 (H-l) -- 0. The function f(H-1)(x; 0~H-l)) can be realized in the parameter space OH in many ways. According to the discussion in Section 5.1, for any v c ~a4 and w r ~L, the points defined by
]tv(0~H-l)) --- (V, ~2. -- V, ~ 3 . ' ' ' ' , ~H*' W2., W2., W3,,---, WH.)
(88)
C~w(0~H-l)) -- (O, ~ 2 . , ' ' ' , ~H., W, W2.,... , WH.)
(89)
and
realize f(x; 0~H-1)). Critical points of E (H) can be found as Theorem 2. (1) I f v = )~g2, for )~ E [~, then 7v(0!H-l)) is a critical point of EH. (2) The point ct0(0!H-~)) is a critical point of EH. For the proof, see Appendix B. In the parameter set giving the same function as f(x; 0!H-l)), the critical points of EH appear in the specific locations. All the points giving f ( x ; 0!H-l)) in OH are not critical points. The critical points given by Theorem 2 can be saddle points or local minima. In the case that the number of the output units is larger than 1, all of them are saddle points; Theorem 3. The critical point a0(0!H-l)) is a saddle point. I f the number of output units M is larger than 1, the critical point 7~r (0~H-l)) is a saddle point. For the proof, see Appendix C. In the case of M = 1, some of the critical points {7~r (0!H-l))] k C ~} can be local minima [22]. Theorem 3 asserts that there always exist saddle points in the symmetry of hidden-unit permutations and of the sign flips.
6.2. Natural gradient escapes plateaus In comparison with other second-order methods, Rattray and Saad [32] showed that the natural gradient works effectively to shorten plateaus, in addition to the rapid convergence around the optimum point. They use the framework of statistical
762
K. Fukumizu
mechanics to analyze the dynamics of learning. This subsection presents another explanation of the dynamics by showing the effect of the natural gradient around the saddle points discussed in Section 6.1. In this section, it is assumed that the loss function g is the least square g(y,z)= 89 2, and the activation function satisfies r and 5q0(x; 0)/Ow ~ 0. For simplicity, the discussion below is limited to networks with two hidden units. However, the following results can be easily extended to a general case, because the critical points in Section 6.1 are defined using only a relation between a network with one hidden unit and a network with two hidden units. Let 0! l) = (~,, u,) be a critical point of El, the empirical loss function of networks with one hidden unit, such that the Hessian matrix of E1 is not singular. By the condition of a critical point, the equalities N
E ( y (~) -f(')(x(~); 0~')))q~(x(~); u,) - 0 (90)
N
E(Y(V) - f(1)(X(V)" 0~I)))T ~* ~q)(X(V); U,) = 0T v=l
~
~lll
hold. A derivative of q~ is considered to be a row vector here. The critical point 0~2)-a0(0! 1)) is given by (Vl,V2,Wl,W2)=(0,~,,0,u,). Assume a point 0 (Vl, Wl, v2, w2) is close to a0(0~l)). Here, a different order of the coordinate components is used for simplicity. Write the components as Vl--ep,
Wl =~q,
vz--~,+~r
and
w2-u,+es,
(91)
where e is a small positive number, and h = (pX, qX,rX,sX)X is an arbitrary vector. The gradient of E2 around 0~2) is given by 5
~ =
p e
~
5
(o!
+
q
r
5
S
= ell(2) (0~2))h,
(92)
where H(2)(0! 2)) is the Hessian matrix of E2 at the critical point 0!2). It is easy to check that the Hessian matrix is 0 A AT 0
where
0 0
0 0 c
0
0
s
0
0
CT D
'
(93)
763
Geometry of neural networks." natural gradient for learning
A -- -- Z ( y (v) -- f(1)(X(v); o!l))) ~(P(X(v); O) ~w
v
'
v
c - ;, E ~( x(~) u~)
&p(x("); u, )
~W
v
D
f(1) (x(V); 0!1))) T
Z(y(V ) -
-
-
v
~2q)(X(v) r
awaw
r162 x-" a~(~(~); ",) a~(x(V); ",)
+
~w
v
aw
; u,)
"
This Hessian is nonsingular in general, and the gradient around the critical point can be written as aE2
~0 (0) - O ( s ) .
(94)
The Fisher information at 0!2) is singular. Intuitively, the multiplication of an almost singular matrix is expected to lengthen the gradient around the critical points. This can be proved as follows. Because the first derivative of the function f(2) is given by a ~1 a b-w71 a f(2) (X; 0) --
~
q~(x; wl) a~(X;Wl)
V1 ~w ~(x; w2) V2 Oq~ww2)
~
aq,(,,;0) sa aw aq,(x;O) gC aw
~(x; u,) ' ~, &p(x;U,o)w
(95)
the Fisher information matrix around the critical point 0!2) has the form
G(O)--(O(82)O(s)
O(1)O(s))"
(96)
The inverse is given by
G(0)-I-
(O(~2) 0(1)
O(1))
o(1)
9
(97)
Therefore, the natural gradient around the critical point has the form
O(0)_l ~g2 --~-(0)--
o(~) 0(1) O(1)
"
(98)
o(1) Eqs. (94) and (98) show a clear difference of the dynamics between the ordinary gradient and the natural gradient learning. As the parameter approaches to a critical
764
K. Fukumizu
point given by 0t0(0,), the change of the parameter becomes very small in the ordinary steepest descent, which causes a very slow decrease of the empirical loss function. On the other hand, in the natural gradient learning the change of the parameter becomes very large in the direction apart from the subset {01wl = 0 or vl = 0}. This strong repulsion avoids the slow dynamics caused by the saddle point. Similar analysis is possible on the critical point 0~2,) -- ~tXr (0~1)) if the networks have a single output unit. Suppose to = (~,1, ~,2,b, 11) be a new parameterization of 0092 -- {f(2)(x; 0(2)) -- 101(p(X; Wl) nt- 102(p(X; W2)}, which is defined by r
-- 101 - - 02,
1
1II - - ~ ( W I
-- W2),
vl + v2 r
--
Vl + 10"~ _~
b
~
(99)
~ WVl I
- 4 - ~ 02 W 2 .
Vl + V2
101 -+- 102
This is well-defined as a parameterization unless vl + v2 = 0. Indeed, the inverse of this transform is given by 101 - -
89( r
-+- r
U2 __ 1(__r
+ r
W1 - - b + -~"2'r W2 -- b
q,
(100)
r162
- -_-T=q"
Let to = (~1,1], ~2, b) be a point close to the critical point to, which corresponds to 0 (2/ Write to as **
9
~1 = (2~,- 1)~, + 8p,
q = 8q,
~2 = ~, + 8r
and
b = u, + 8s,
(101)
where ~ is a small positive number, and h - (pT,qT, FT, sT)T is an arbitrary vector. With respect to this parameterization, the gradient and the Fisher information matrix around the critical point to, have the form <
0(82 )
E~(to) --
0(8)
(102)
-
and
G(to)--
0(8 4)
0 ( 8 3)
0 ( 8 2)
0 ( 8 2)
0 ( 8 3) 0 ( 8 2)
0(82) O(8)
0(8) O(1)
0(8) O(1)
O(a 2)
0(8)
0(1)
0(1)
(103) '
respectively. The proof is given in Appendix D. From the above equations, the natural gradient is of the order
765
Geometry of neural networks: natural gradient for learning
0@4) o@ o@ o@ o@ o @ o @ o @ (,o) -
o@ _
o@
0(~)
0(1)
0(1)
O(e)
0(1)
o@
0( 88 0(1)
0(1)
O(e)
0(1)
' (104)
which shows a similar dynamics to the case of 0~0(0!1)). The Fisher information matrix is not invertible at the exact critical point. However, when the update rule by a discrete time step is considered, the parameter in the natural gradient learning around the saddle points leaps away from them. This dynamics is expected to shorten the plateau phase. 7. Conclusion
In this chapter, the natural gradient has been discussed based on the geometric structure of the manifold of multilayer neural networks. It has been explained that the statistical formulation of neural networks enables one to introduce a Riemannian metric on the neural network model. This Riemannian structure is essentially an intrinsic quantity of the statistical manifold, which is defined in the space of all the probabilities. The natural gradient is the direction that gives the steepest descent for a change of small, fixed length measured by the Riemannian metric. In introducing the natural gradient, this chapter has emphasized the viewpoint that the metric of a parameter is not necessarily Euclidean but that a more natural metric can be introduced in many problems. In the case of parametric estimation, the Fisher information gives this. Another important point is that the Riemannian metric can be singular on some subsets in the manifold of neural networks. This has not been focused so much in the literature yet, but it presents interesting issues in learning of multilayer neural networks. In this chapter, the singularities of the Riemannian structure have been explained, and the effect of the natural gradient learning in the transient phase has been discussed based on the singularities. This presents a new explanation for the dynamics of natural gradient learning, which has been analyzed before with the framework of statistical mechanics (see also the chapters by Coolen in this book). What has not been emphasized in this chapter is an engineering viewpoint of multilayer neural networks. There are many sophisticated second-order optimization techniques applied to neural networks. See [16, Section l] or [17], for example. More experimental and theoretical studies, including comparison with other secondorder methods, are needed to verify the practical effectiveness of the natural gradient learning. The natural gradient is a very general and useful concept in optimization problems. It requires reconsideration of the metric, which is sometimes regarded as natural without a good reason. This concept would be helpful to look for natural metrics and natural learning schemes in many learning systems, including artificial and biological ones.
K. Fukumizu
766
Abbreviations AIC, Akaike's Information Criterion MDL, Minimum description length
Acknowledgments The author would like to thank Prof. Shun-ichi Amari for many useful and interesting discussions, and Dr. Masami Tatsuno for his careful reading of the manuscript of this chapter. Appendix A. Dual vector space
A summary on dual spaces is described here. Let V be an m-dimensional vector space. The dual space of V, denoted by V*, is defined by all the linear functionals on V; that is V*= {~: V
>~ l ~ :
linear}.
(A.1)
Note that V* is also a vector space with usual scalar multiplication and addition. Let {vl,..., Vm) be a basis of the vector space V. The linear functionals defined by f~i(Vj) = 8ij
(1 <, i <~m)
(A.2)
form a basis of V*. This is called the dual basis of { v l , . . . , Vm}. Evaluation ~(w) for C V* and w c V is often written by the coupling (,,v)
(a.3)
to emphasize the duality. Let O be a metric on V. For any functional ~ there is a unique vector w c V such that 9(', w) = ~. Therefore, the metric 9 induces a metric 9" on V*, which is defined by 9* (9(', Wl), 9(', w2)) = 9(Wl, w2)
(a.4)
for Wl, W2 C V.
Appendix B. Proof of Theorem 2
At the critical point, the following equations hold for 2 ~<j ~
~Uj (o!H-1))=v/-~'~'I~---Z'Z
'
; (H-I)
~q)(x(V);llJ*)=oT
(B.1)
767
Geometry of neural networks: natural gradient for learning
The derivatives by a vector variable are considered to be row vectors here. The partial derivatives of E/-/are given by ~EH~vj( 0 ) - ZzN~g(y(v),f(I4)(x(v)'O))q~(x(v);wj) ~ v=l
~E~I
N ~g (y(V), f(,) (X(v). O) )vj ~q~(X(V); Wj)
ewj (0) - Z ~ z v=l
'
ew
(1 ~ j ~ H ) ,
(1 <~j<~H).
(B.2)
Note that f(H)(X; 0) -- f(H-1)(X; 0!H-l)) for 0 -- 7~;2,(0~H-l)) or 0 - at0(0!H-1)). It is easy to check that the conditions (109) make all the above derivatives zero. D
Appendix C. Proof of Theorem 3 If M >i 2, the set of critical points, {7k;2, (0! H-l)) E OH [)l, C ~}, does not cover the set {Tv(0~/-/-1)) E | c EM}, which takes the same value of E/-/ as the critical points. Then, if M>~2, for any neighborhood of the critical point 7~;2,(0!/4-1)) or a0(0!/-/-1)), there exists a point at which E/-/takes the same value as the critical point but does not take a critical point. In any small neighborhood of such a point, E/-/ takes both of larger and smaller values than the value of the critical point. D
Appendix D. Approximation of the Fisher information around the critical point From the definition of q and ~1, the following lemma holds; Lemma D. 1.
For any to C
~f (x; to) - 0x
q : o},
and
~f (x; to) - 0.
(D. 1)
Let m represent one of the coordinate components in to = (~1, qx, ~2, bT). From Lemma D.1. at any point to c {q = 0}, the second derivative ~2f O~lOm (0) - 0
~2f and
~q~o3 (0) - 0
unless co = qj(1 ~<j ~
For any to C {to[ q
~2f (X, to) -- Vl V2
~q~q
~2f ~b~b
--- 0},
O2q~(x; b ) (X, to) -- Vl V2~2 - - .
~w~w
(D.2)
In a similar manner to the second derivatives, at any point to c {q = 0}, the third derivative O3f ~ 1 ~fDa~f-Ob (to) -- 0
and
-
o
768
K. Fukumizu
unless ma = rlj and cob - rl, (1 <~j,k ~L). Further calculation shows Lemma D.3. For any to E {to[q - 0}, (x; to) -- 1
i33f
~1 ~lq~~~'~
82qo(x; b)
(D.3)
2 ~1~2 ~W~~"
From Lemmas D. 1-D.3, the gradient of E2 around the critical point to, is given by <
89
P E (2) (to) ~ oK(to,)
q
+ ~;2
r
,
s
'
(D.4)
9
where K(to,) is the Hessian matrix of E2 at the critical point to,, and the matrix A is given by
A - ~*E v:l
- f ( X(0!l))) ~2 '
The Hessian K(to,) is calculated as K(to,) -
(o
o
~)w~w
o )
0
89(2)v - 1)A
0
0
0
H(I)(0! l))
"
,
(D.5)
(D.6)
where H(1)(0~ ~)) is the Hessian matrix of El with respect to the coordinate (~,u) at the critical point 0~1). Thus, the gradient of E2 around to, is of the form Eq. (102). From Lemma D.1-D.3, the derivative of f ( x ; to) around to, is of the form
~q f ( x ; r
~
which gives Eq. (103).
O(e)
'
(D.7)
[-I
CHAPTER 18

Theory of Synaptic Plasticity

J.L. VAN HEMMEN
Physik Department der TU München, D-85747 Garching bei München, Germany

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
© 2001 Elsevier Science B.V. All rights reserved
Contents

1. Introduction
2. Spike response model
3. Hebbian learning in a network of formal neurons
   3.1. Representation of neuronal activity
   3.2. Hebbian learning
   3.3. Spatiotemporal patterns and ±1 coding
   3.4. Hebbian unlearning
   3.5. Spatiotemporal patterns and 0/1 coding
4. Time-resolved Hebbian learning: looking at synapses through a learning window
5. Disentangling synaptic inputs: the Poisson neuron
   5.1. Poisson neuron: definition and properties
   5.2. Relation to rate-based Hebbian learning
   5.3. Synaptic dynamics and self-normalization
   5.4. Asymptotics and structure formation
   5.5. Simple example of structure formation
6. Short-term synaptic plasticity
   6.1. The problem
   6.2. Modeling short-term synaptic plasticity
   6.3. Modeling short-term depression
   6.4. Periodic input
   6.5. Modeling short-term facilitation
7. Conclusion and open problems
Abbreviations
Acknowledgements
Appendix A. Laws of large numbers
Appendix B. Inhomogeneous Poisson processes
References
1. Introduction
What is synaptic plasticity? If something changes, the first question is: what changes, where, and how? Apparently, we have to focus on 'synapses' [1-3] but what happens there and why? These questions, though natural and, as we will see, fascinating, do not have simple answers and they need some biophysical background to be fully understood. They are important since in general a neuronal network stores its information in the synapses. In the present chapter we first provide the necessary background, then turn to the theory of information storage, ranging from simple activity patterns of formal neurons to long-term processes involving spatiotemporal patterns of realistic neurons, and finish the argument with a detailed account of short-term plasticity. References will be provided as we go along. Since we concentrate on storage of spatiotemporal patterns the reader has to consult the literature [4, Chapter 13; 5] for other learning rules and related results such as Sejnowski's seminal work [6].

First of all, what is synaptic plasticity and where precisely does it occur? Since we are concerned with biological neural nets we will now review the essentials of neuronal anatomy as far as they are relevant to our present purpose, viz., developing learning theory; see Koch [4, Chapters 4, 13] for biological details that are skipped here. Stated simply, a neuron consists of three parts: the dendritic tree gathers the input, the soma with the axon hillock is the "CPU" generating action potentials or spikes (Fig. 1) on the basis of the input provided, and the axon conveys this output to other neurons. It is important to realize that spikes are the only output of a cortical neuron. A spike lasts typically about one millisecond (1 ms). Synapses are the axonal terminals on the dendritic tree of other neurons. When an axon bifurcates, the amplitude of a spike does not decrease but remains constant, about 0.1 V. This is due to an active propagation process. Fig. 2 is a picture of a pyramidal cell, a typical cortical neuron. An axon in general bifurcates several times but, as we have seen, spikes look the same everywhere. At the axonal ends one has synapses, which contact the dendritic trees of other neurons. It is here that neuronal information is stored.

Most of the postsynaptic potentials (Fig. 1) rely on chemical transmission from a presynaptic axonal terminal to a postsynaptic dendritic spine. A cleft, 20-40 nm wide, separates the two parts. When a spike arrives, calcium ions enter the axonal terminal, where vesicles with 30-40 nm diameter and filled with a few thousand neurotransmitter molecules are waiting. As a consequence of the Ca²⁺ influx, these then move to the membrane bordering the synaptic cleft. In cortical synapses at most a few of them will fuse with the membrane at one of a small number n (say, 1, 2, or 3) of fixed release sites and emit their contents into the cleft, a process called exocytosis [13,14]. Because of the transmitter-mediated opening of postsynaptic ion channels this kind of synapse is called 'chemical'.
Fig. 1. Synaptic transmission at a chemical synapse in a neuromuscular junction. An action potential (upper left) arrives at the presynaptic terminal (A), induces an influx of Ca²⁺ ions that cause the vesicles, round and filled with neurotransmitter molecules, to fuse with the membrane (B, exocytosis) and release their contents into the synaptic cleft separating the presynaptic membrane from the postsynaptic one. The neurotransmitter molecules diffuse across the cleft and bind to the receptors at the postsynaptic side. In this figure of an excitatory synapse, they induce the opening (C) of Na⁺ channels, leading to an excitatory postsynaptic potential at the axon hillock of the soma (lower left); see also Fig. 2. In cortical synapses a similar process occurs, though it is more stochastic - cf. Eq. (1) - and the EPSPs have a smaller amplitude. Reprinted by permission [7].
Fig. 2. This wonderful picture of a pyramidal cell from the motor cortex of a 30-year-old man stems from Ramón y Cajal [10], who drew it more than a century ago; whole-cell staining after Golgi. (a) Axon via which action potentials (spikes) leave the "pyramidal" soma of the neuron (hence its name) to synapses on dendritic trees of other neurons, (c) dendrites that gather the postsynaptic currents from synapses, terminals of axons coming from elsewhere, (d) axonal collaterals that branch off.

In contrast, some synapses have electrical transmission; e.g., that of the Mauthner cell [15-17]. The released neurotransmitter molecules diffuse across the synaptic cleft, which takes about 10 μs, and dock into receptors of ion channels, which are then opened so that ions enter the postsynaptic dendritic spine, if any. If the ions are positive, e.g., Na⁺, then the dendritic tree of the receiving neuron experiences a positive influx, which reappears as an excitatory postsynaptic potential (EPSP) at the soma; see Fig. 1. This is what we will concentrate on, though most of the considerations below are much more general. For the sake of simplicity we describe an EPSP stemming from a spike arriving at a synapse connecting an axon from neuron j to the dendritic tree of neuron i by J_ij ε(t). Here ε(t) ≥ 0 is a fixed response function with maximum 1, t denotes time, and J_ij is called the "strength" or efficacy of a synapse connecting neuron j to neuron i. In cortex we typically have one, at most two connections [4, Section 4.2], [9, Sections 20 and 33]. To simplify the notation and without loss of generality we therefore assume that for each pair {i, j} there is at most one synaptic
connection. Synaptic plasticity refers to changes of J_ij as time proceeds. All this looks simple and straightforward but there is a stochastic complication, which can be eliminated. We have three ingredients of a synaptic response [4,8]:
• n presynaptic release sites (or active zones); for cortical synapses, n is a small number near to 1.
• the probability 0 ≤ p ≤ 1 that, upon arrival of a spike, a release site releases a vesicle, i.e., one 'quantum' of neurotransmitter. The number k of quanta released at the n sites is then governed by the binomial distribution

p(n, k) = [n!/(k!(n − k)!)] p^k (1 − p)^(n−k),   0 ≤ k ≤ n.   (1)

One can verify that the mean number of quanta is

⟨k⟩ = Σ_{k=0}^{n} k p(n, k) = np.   (2)
Consequently, J_ij as determined by (1) is a stochastic variable itself.
• A quantum induces a postsynaptic response. A succinct notation is Q. We take Q to be the maximum of an EPSP generated by a single quantum, i.e., we look for the maximal response as a function of time. For inhibitory synapses the learning ability is at the moment less clear but it seems that, if they can change their efficacy, they do so in a way analogous to their excitatory counterparts so that their effect can be measured by the minimum of an inhibitory postsynaptic potential (IPSP). Throughout what follows we will focus on excitatory synapses, leaving the realization of mutatis mutandis for the inhibitory ones to the reader.

Altogether the mean postsynaptic response induced by a spike is J̄ = npQ but this is of no help when a specific spike arrives since its effect is in general never the mean. Why then compute the mean response? The input of a cortical neuron is provided by many synapses; a typical number is 10⁴. As for the simple stochastic process (1), it can be taken to be independent for different synapses. At time t the potential v_i(t) at the axon hillock, the "CPU" of neuron i, is to first approximation a sum of the different postsynaptic potentials,
v_i(t) = Σ_{j(≠i), f} J_ij ε(t − t_j^f),   (3)
where f in t_j^f labels all spikes of neuron j; the postsynaptic potential ε(t) is causal in that it vanishes for t < 0 so that future events do not influence the present. For the time being, axonal delays are incorporated in ε. Let us suppose that neuron i has N 'synaptic' neighbors j and let us consider v_i(t)/N for a moment,
v_i(t)/N = N⁻¹ Σ_{j(≠i), f} J_ij ε(t − t_j^f) ≈ N⁻¹ Σ_{j(≠i), f} ⟨J_ij⟩ ε(t − t_j^f),   (4)
where the synaptic strengths J_ij are independent stochastic variables and the interspike intervals are assumed to be in the millisecond range, an every-day fact. The last equality is exact in the limit N → ∞ and, as such, a consequence of the strong law of large numbers [11,12]. A sum that can be replaced by its average is called self-averaging - a most valuable property. In our case N is large so that, by the central limit theorem in conjunction with the law of the iterated logarithm (see Appendix A), the approximate equality in (4) is an equality up to an error of order O(1/√N) that is the deviation from the mean and has a Gaussian distribution; the contents of the "laws of large numbers" as referred to above have been listed in Appendix A. In fact, we can allow ε to depend on both i and j while the J_ij may have any distribution with finite second moment. Then we have that, for i fixed, the f_j := J_ij ε_ij are independent but not identically distributed random variables. Nevertheless the strong law of large numbers and the central limit theorem still hold [11,12]. They even do so when the f_j are weakly dependent in the sense that, given two positions j and k, their correlation functions ⟨(f_j − ⟨f_j⟩)(f_k − ⟨f_k⟩)⟩ decrease fast enough as a function of |j − k|. For (3) the above statements hold as well, provided one multiplies everything in sight by N and realizes that the deviation from the (nonzero) mean can be estimated by the law of the iterated logarithm [11,12]; cf. Appendix A. At a first sight one might object that the J_ij govern the dynamics of the system as a whole and, therefore, in the long run the latter induces dependencies among the former. That may well be but is completely irrelevant since the local 'gambling' we are considering here is that of vesicles fusing with the cellular membrane, i.e., exocytosis, a process lasting for a millisecond or less [13,14]. On this time scale fusion processes are independent both inside and outside a synapse. Despite the randomness, we arrive at the pleasant result that the leading contribution to v_i(t) is deterministic and given by

v_i(t) = Σ_{j(≠i), f} ⟨J_ij⟩ ε(t − t_j^f)   (5)
as long as most of the expectation values ⟨J_ij⟩ are nonzero. That is to say, ⟨J_ij⟩ = n_ij p_ij Q_ij ≠ 0, where {ij} labels the synapse; a generic one will carry no label. If nonzero, we can stick to studying synaptic plasticity in its dependence upon npQ. If a synaptic strength vanishes, the synapse can, and will, be dropped. According to the present state of the art [4], long-term synaptic plasticity lasting for hours means that both p and Q change, whereas short-term synaptic plasticity lasting for seconds is equivalent to saying that p alone changes. In both cases, n seems to be fixed, though Bonhoeffer et al. [18] have found that in LTP n may change as well. We will present a theory of synaptic plasticity that incorporates both long- and short-term
effects. As a side remark we note there is also channel noise, which is generated by random gating of voltage-gated ion channels. It is different from synaptic noise, as has been explained in detail elsewhere [20], and can in principle be handled similarly to (4).

The synaptic action considered so far is called ionotropic as it is governed by postsynaptic ionic channels. It is fast and implements computations underlying, for example, rapid perception and motor control. Here, then, are two classes of (glutamate) receptors determining the state (open/closed) of their underlying ionic channels: N-methyl-D-aspartate (NMDA) and non-NMDA. The name NMDA is that of an agonist absent from the brain itself but used by neurobiologists to discern them. The NMDA receptors do need a strong depolarization to become active, e.g., through a positive potential change stemming from a postsynaptic spike [21,4, Chapter 19]. NMDA receptors are 10 times slower than their neighboring non-NMDA counterparts. Furthermore, they are important to long-term potentiation (LTP) since they allow Ca²⁺ ions to enter the cell - in addition to Na⁺ and K⁺. On the other hand, the non-NMDA receptors convey the fast excitatory traffic that has to pass in a few milliseconds. Their typical EPSP is that of an alpha function,

ε(t) := (t/τ) exp(1 − t/τ),   (6)
having its maximum 1 at t = τ, with τ ≈ 5 ms. Of course ε is causal in that it vanishes for t < 0. For an extensive discussion of this and other types of response function the reader is referred to Gerstner's chapter in this book [27]. In addition to ionotropic receptors, which open ionic channels that permit a certain type of ion to cross the postsynaptic membrane, there are also metabotropic receptors where binding of a neurotransmitter leads to the activation of a second messenger such as Ca²⁺ ions. The messenger molecules then have to diffuse to particular ionic channels, which is a relatively slow process. The action of metabotropic receptors can, and usually will, extend over a long distance both in space and in time. We will not treat them here but refer to the literature [4,22] for further details concerning both types of receptor. Instead we turn to a simplification of the mathematical description of spike generation, the spike response model (SRM).
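As a concrete illustration of Eqs. (1)-(6), the following Python sketch draws synaptic strengths from the quantal (binomial) release model, builds the membrane potential (3) from alpha-function EPSPs, and checks the self-averaging property (4)-(5). The parameter values (n = 2, p = 0.4, Q = 0.1, a 20 ms input window) and all function names are illustrative assumptions, not quantities taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsp_alpha(t, tau=5.0):
    """Alpha-function EPSP of Eq. (6): maximum 1 at t = tau, causal (zero for t < 0)."""
    return np.where(t > 0.0, (t / tau) * np.exp(1.0 - t / tau), 0.0)

def draw_strength(n=2, p=0.4, Q=0.1, size=None):
    """Quantal model, Eqs. (1)-(2): k ~ Binomial(n, p) quanta, each contributing Q,
    so the mean strength is n*p*Q."""
    return Q * rng.binomial(n, p, size=size)

N = 10_000                                    # number of presynaptic 'neighbors'
t = np.arange(0.0, 50.0, 0.1)                 # time axis in ms
spike_times = rng.uniform(0.0, 20.0, size=N)  # one presynaptic spike per neighbor
J = draw_strength(size=N)                     # stochastic efficacies J_ij

# Membrane potential at the axon hillock, Eq. (3)
v = sum(Jij * epsp_alpha(t - tf) for Jij, tf in zip(J, spike_times))

# Self-averaging, Eqs. (4)-(5): replace every J_ij by its mean <J_ij> = n*p*Q = 0.08
v_det = sum(0.08 * epsp_alpha(t - tf) for tf in spike_times)

rel_err = np.max(np.abs(v - v_det)) / np.max(v_det)
print(f"relative deviation from the deterministic limit: {rel_err:.3f}")  # ~ 1/sqrt(N)
```

Running the sketch shows a relative deviation of the order 1/√N, which is the self-averaging behavior the text appeals to.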
2. Spike response model

Spikes require a mathematically intricate description that can be summarized by the following system of Eqs. (7)-(9). The key variable is the membrane potential V, which can be measured,

I_m = C_m dV/dt + I_channel + I_ext.   (7)
Here I_m is the total current, C_m a membrane capacitance, I_ext an externally applied current, e.g., due to synaptic input, while the currents I_ℓ through the ion channels of type ℓ ≥ 1 add up to

I_channel = Σ_ℓ I_ℓ   with   I_ℓ = g_ℓ(m)(V − E_ℓ).   (8)
The conductances g_ℓ = g_ℓ(m) depend on a vector m. The heart of nearly any differential-equation model producing spikes is a set of auxiliary variables m = (m_v), each of the components m_v satisfying an ordinary differential equation of the form

dm_v/dt = [m_{v,∞}(V) − m_v]/τ_v(V).   (9)
For constant V, the variable m_v relaxes to m_{v,∞}(V) at the rate 1/τ_v(V). In general the functions m_{v,∞}(V) and τ_v(V), which both depend on the membrane potential V, are the result of an extensive fitting procedure, the most famous one being that of Hodgkin and Huxley (1952), who got the Nobel prize for their ingenious fit [4,23,24] of function sets {g_ℓ(m), m_{v,∞}(V), τ_v(V)} describing two active ionic channels (Na⁺ and K⁺) and three auxiliary variables. In their notation, m = (m, n, h) and g_Na(m) = ḡ_Na m³h, g_K(m) = ḡ_K n⁴, where ḡ_Na and ḡ_K are constants, as is the third conductance describing a 'leak' current. It is good to realize that only V, I_channel, and I_ext are accessible to experiment whereas the auxiliary variables m_v are not.

It is typical of real neurons and also of all these auxiliary-variable models that they produce spikes when, to excellent approximation, the potential V exceeds a threshold ϑ. A glance at the equations suffices to convince any reader that this statement is not evident. In fact, it is the result of the careful fit alluded to in the previous paragraph. A second glance at the upshot of what a spike with a 1 ms width induces at a postsynaptic neuron, viz., an EPSP such as the alpha function in (6) with a several-ms width, may then suffice to let the beholder wonder whether the precise form of a spike is really important to what it induces. Most of the time it is not and one can stick to a simplification, the SRM [25-27]. This is what we now focus on.

Let us discretize time and break the continuous time axis into parts of length Δt = 1 ms. We then write t = 1, 2, ... and specify the state of neuron i by a Boolean variable n_i(t) ∈ {0, 1}: n_i(t) = 1 when the neuron fires, n_i(t) = 0 when it does not. Looking at synapse {i, j} with synaptic strength J_ij, we note that it induces an EPSP J_ij ε(t − t_j^f − Δ_ij^ax) at neuron i for a spike that arose at neuron j at time t_j^f and was delayed by Δ_ij^ax ms, the axonal delay that occurs when the spike travels along the axon from neuron j to the synaptic terminal {i, j}. Throughout what follows we write J_ij instead of ⟨J_ij⟩ since the membrane potential v_i(t) at the axon hillock (our "CPU") is given by (5), a sum of many terms. Once a neuron has fired it refuses, so to speak, to do so again directly afterwards; this is the absolute refractory period. Furthermore, it takes some time to recover. Then it is rather reluctant to fire; this is the relative refractory period. It may fire but needs some extra input as compared to the original threshold ϑ. All this is taken into account by a refractory potential η that is added to v once a neuron has fired; it is taken to be causal so that η(t) = 0 for t < 0. If η(t) < 0 for t > 0, then the neuron needs more v to fire. In this way we can keep the threshold ϑ fixed and arrive at the simple dynamics

n_i(t + Δt) = θ[v_i(t) − ϑ]   (10)

with

v_i(t) = Σ_f η(t − t_i^f) + Σ_{j, f} J_ij ε(t − t_j^f − Δ_ij^ax).   (11)
Here θ is the Heaviside step function with θ(x) = 1 for x ≥ 0 and θ(x) = 0 for x < 0. A sum over f is always a sum over firing times t^f of whatever neuron, here neuron i and its 'neighbors' j. Neuron i 'fires' as soon as its membrane potential v_i reaches the firing threshold ϑ from below,

lim_{t↑t_i^f} v_i(t) = ϑ   and   lim_{t↑t_i^f} dv_i(t)/dt > 0.   (12)
The potential v_i being in general a continuous function, the second condition is a mathematical formulation of the fact that there is no spike appearing when v_i returns from being above ϑ. In passing we note that the dynamics (10) need not be based on discrete time. With a few, trivial, modifications it works equally well for continuous time. Throughout what follows we will work with the above dynamics where during each time step all neurons are updated. This is the so-called parallel dynamics, for biological neurons the natural one. Absolute refractory behavior means η assuming the value −∞ while relative refractory behavior is equivalent to saying η is negative but finite. Though bursts can be described easily by allowing η to be positive during an appropriate period of time, we will not study this explicitly. For arbitrary axonal delays Δ_ij^ax there is no hope of obtaining an exact solution of the network dynamics. As Eqs. (10) and (11) show explicitly, the SRM incorporates the three essential ingredients of neuronal spike generation, viz., a variable threshold, spikes, and their effect, the postsynaptic potentials. All three are a response to external input - hence the name of the model. It incorporates the integrate-and-fire model as a special case [27]. To finish this section, it may be well to face the question: Why does a time-discrete dynamics such as (10) make sense? After all, it treats the effect of a spike as a Kronecker delta. The answer is that, as we have already seen, the width of a postsynaptic potential in general greatly exceeds that of a spike so that we can treat the latter as an approximate delta function. Discretizing time we then end up with a simple Kronecker delta, as advertised.
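To see the parallel dynamics (10) and (11) at work, here is a minimal discrete-time simulation (Python) of a single SRM neuron driven by random presynaptic spike trains. The exponential refractory potential η, the threshold ϑ = 0.6, the synaptic strengths, the axonal delays, and the input rate are illustrative assumptions chosen only to make the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(1)

dt = 1.0          # time step = assumed spike duration (1 ms)
T = 300           # number of time steps
N = 100           # presynaptic neurons
theta = 0.6       # firing threshold (illustrative value)

def eps(s, tau=5.0):
    """EPSP kernel, the alpha function of Eq. (6); causal."""
    return (s / tau) * np.exp(1.0 - s / tau) if s > 0 else 0.0

def eta(s, eta0=-2.0, tau_ref=4.0):
    """Refractory potential: strongly negative just after a spike, causal."""
    return eta0 * np.exp(-s / tau_ref) if s > 0 else 0.0

J = rng.uniform(0.0, 0.1, size=N)        # synaptic strengths J_ij
delay = rng.integers(1, 5, size=N)       # axonal delays Delta_ij^ax in ms
pre = rng.random((T, N)) < 0.03          # presynaptic spikes, roughly 30 Hz per neuron

out_spikes = []                          # firing times t_i^f of the postsynaptic neuron
for t in range(T):
    # Eq. (11): refractory contribution plus delayed, weighted EPSPs
    v = sum(eta(t - tf) for tf in out_spikes)
    for j in range(N):
        for tf in np.flatnonzero(pre[:t, j]):
            v += J[j] * eps(t - tf - delay[j])
    # Eq. (10): fire if and only if v_i(t) reaches the threshold (Heaviside step)
    if v >= theta:
        out_spikes.append(t + dt)

print(f"{len(out_spikes)} output spikes in {T} ms, first ones at t =", out_spikes[:10])
```

The refractory term keeps the neuron from firing in consecutive time steps, while the broad EPSPs justify treating each presynaptic spike as a single discrete event, as argued above.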
3. Hebbian learning in a network of formal neurons

Encoding and decoding are two sides of the same medal. Encoding means asking a twofold question: how do we represent the neurons' activity and how do we store spatiotemporal activity patterns in the synapses connecting the neurons? Hebbian learning is a prominent and very efficient way of information storage. Decoding means reading out stored information. We will treat both.
3.1. Representation of neuronal activity

Formal neurons are described by a Boolean variable indicating their activity. (A Boolean variable assumes only two values.) We have already met n_i(t) ∈ {0, 1}. Neuron i is active at time t ∈ ℤ·Δt when n_i(t) = 1 and it is quiescent when n_i(t) = 0. Here ℤ represents the integers. It is convenient to take Δt = 1 ms as the duration of a spike. For what follows we also introduce a pseudo-spin S_i = 2n_i − 1 ∈ {−1, 1}. Now S_i(t) = 1 when neuron i fires a spike at time t and S_i(t) = −1 when it does not. The above distinction leads to two ways of encoding neuronal activity, viz., the 0/1 and the ±1 representation (coding). Each of them needs a specific context, to which we now turn.

Apparently there are at least two ways of coding a neuron's activity: through {0, 1} and through ±1. Their choice is dictated by the global activity of a network. In a theoretical analysis, an activity pattern is a set of independent, identically distributed (iid) random variables. Different patterns are also taken to be independent. There are at least three reasons for doing so. First, in this way we avoid any special assumption concerning the state of the network. Second, iid random variables are easy to generate on a computer by means of a random number generator [19]. Third, one can use laws of large numbers from the theory of probability [11,12] to analyze collective behavior. This is what we are going to do.

Suppose about half of the neurons in a network are active during each time step. Then a pattern μ is a set of iid random variables {ξ_i^μ; 1 ≤ i ≤ N}, the ξ_i^μ assuming the value +1 with probability p_N and −1 with probability 1 − p_N. The mean activity is then

a_N := N⁻¹ Σ_{i=1}^{N} ξ_i^μ → p_N − (1 − p_N) = 2p_N − 1   as N → ∞,   (13)
by the strong law of large numbers [11] applied to the iid random variables {ξ_i^μ; 1 ≤ i ≤ N}. If, on the other hand, the overall activity of the network is low, activity is the exception and inactivity is the rule so that n_i(t) ∈ {0, 1} is just what we are looking for since the active sites with n_i(t) = 1 carry the information. Though inactive neurons are the majority, they carry the label n_i(t) = 0 and, thus, do not count - as they should. We will see that mathematically all this fits together quite nicely in decoding neuronal information.

3.2. Hebbian learning
Donald Hebb's classic The Organization of Behavior - A Neurophysiological Theory [28,29] appeared in 1949. On p. 62 of this book one can find the now famous "neurophysiological postulate": "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." One may then wonder, of course, where the above "metabolic change" might take place. Hebb directly continued by suggesting that "synaptic knobs develop" and on p. 65 he states very explicitly: "I have chosen to assume that the growth of synaptic knobs, with or without neurobiotaxis, is the basis of the change of facilitation¹ from one cell on another, and this is not altogether implausible". No, as we now know, it is not. It is just a bit more complicated.

Sloppily formulated, Hebbian learning is learning by repetition: "Practice makes perfect". The organism repeats the very same pattern or sequence of patterns several times and in so doing trains the synapses. (A nice example is the barn owl learning to perform azimuthal sound localization, an example we will consider later on.) It has turned out that Hebbian learning is robust, faithful, and a key to understanding map formation in the cortex.

Hebb also formulated a second idea having nearly an equally big impact as his learning rule, viz., that of an assembly (or ensemble) of neurons. If neuronal behavior should code a synapse, then a postsynaptic neuron should fire at the right moment. To attain its firing threshold, a neuron needs well-timed input in a narrow time window from many other neurons. The 'assembly' should then fire more or less simultaneously. This is its distinguishing feature. The activity patterns we will be analyzing in learning theory are often concrete examples of Hebbian assemblies.

Hebb's postulate has been formulated in plain English - but not more than that - and the main question we are facing here is how to implement it mathematically. From a higher point of view, one might define Hebbian learning to be long-term synaptic plasticity induced by pre- and/or postsynaptic activity and local in space and time. Most of the information which is presented to a network, then, varies in space and time. So what is needed is a common representation of both the spatial and the temporal aspects. As a pattern changes, the system should be able to measure and store this change. How can it do that?
¹ Webster's Ninth New Collegiate Dictionary says (to those who do not belong to the in-crowd): facilitation = the increasing of the ease or intensity of response by repeated stimulation.
3.3. Spatiotemporal patterns and ±1 coding

As in real life, a network may, but need not, learn. Suppose then it does and let us imagine a spatiotemporal pattern of duration T_stp, i.e., a sequence of patterns {ξ_i^t; 1 ≤ i ≤ N, 0 < t ≤ T_stp}; for fixed i but different times t the patterns ξ_i^t may be identical. We simply describe such a pattern by {S_i(t); 1 ≤ i ≤ N, 0 < t ≤ T_stp} := {S(t); 0 < t ≤ T_stp} and store it through the Hebbian learning rule

ΔJ_ij = ε_ij(Δ_ij^ax) (1/T_l) Σ_{t=1}^{T_l} S_i(t + Δt) S_j(t − Δ_ij^ax),   (14)
(14)
t=l
which is to be added to the existing synaptic efficacy. Here T1 is the learning time that may, but need not, be identical with the duration Tstp of the spatiotemporal pattern under consideration; for instance, because the pattern is repeated ("practice makes perfect"). The rationale of (14) is the following. We look at synapse {ij} at time t. ax What we see there is the activity of neuron j at time t - A;j since the axonal delay ax lasts Agj ms. We correlate this with the activity of neuron i at t + At. It is important to realize that this is one time step later. With the benefit of hindsight this is perfectly reasonable since the synapse at time t should tell the postsynaptic neuron what to do next. In technical terms, the retrograde effect of a neuronal action potential as noticed by the synapse is taken to be instantaneous so that a delta function is a fair approximation [21]. The prefactor ~ij(A~.~) ~>0 is still at our disposal. For the sake of simplicity we assume no self-interaction is present (Jig = 0). It seems plain that, in a +1 coding, Eq. (14) is not quite what Hebb [28] had in mind: if neurons i and j are active, AJ;j >i 0, but the same holds true, if i and j are inactive. The former would be fine to Hebb, the latter somewhat weird. With hindsight this is, however, perfectly reasonable since the states 'active' and 'inactive' (p = 1/2) are equivalent: what's in a name? (Representation is all.) On the other hand, l~ij ~ 0 if one of the neurons is active and the other quiescent. In (14) Sg and S~ are treated on an equal footing. Symmetry with respect to an interchange of i and j reigns, if the pattern is a stationary one, i.e., Si(t) : - ~i for all 0 < t ~<7i. Apart from the high activity, a condition that will soon be relaxed, a constant firing might happen in neurons with negligible refractory behavior; in view of our present choice of At = 1 ms, it is equivalent to saying that the maximal firing rate is 1000 Hz, which is not completely off. Let us now study the effect of a stationary pattern in conjunction with q >> 1 similar patterns, the Hopfield model [33]. The network has no delays (A ax = 0). In practice this means that we have (q + 1)N independent, identically distributed random variables ~ which assume the values +1 with probabilities p for + 1, 1 - p for - 1 , and mean a - 2 p - 1. Furthermore, p - 0 corresponds to the pattern ~ we started with. Hopfield took a fully connected neural network with p - 1/2 so that a - 0 - by good reason, as we will soon see. The patterns are presented to the
784
J.L. van Hemmen
network one after the other and learned through (14). Altogether we obtain, putting ~ij --- 1/N, q Z ~ila~j. ~=0
Jij_U-I
(15)
The dynamics being given by
Si(t + At) = sgn[vi(t) - O]
(16)
the threshold is taken to vanish, i.e., 9 = 0, and so is the average (Jis) = 0; here sgn is the sign function. The rationale of these two requirements, which belong together, will soon become clear. Following Hopfield, we now imagine that EPSPs are also instantaneous, i.e., delta functions, while refractory behavior lasts as long as a spike (so that q =-0), and compute the potential vi with the original pattern ~ as input to our neural network. Using (15) and (16) in conjunction with (11) we then find, as N ~ c~,
~,~j
=
~j
=
~, + N -
j(r
Y ~ i ~j~j ~' j(r ~t(#0)
-- ~i + q a3 + N - 1 Z
( ~~i ~~j ~ j - a3 ).
(17)
j(~i) ~(~0)
By the central limit theorem in conjunction with the law of the iterated logarithm (see Appendix A), the last sum on the right is Gaussian (with mean zero) and of the order O ( ~ ) . The above argument is a signal-to-noise ratio analysis with ~i being the signal and the last sum representing the noise. If, then, a ~ 0 but ~) = 0 in (16), there is no hope for storing anything but a few patterns since qa 3 will wash out the signal (l~i] = l) for q large enough. That is why Hopfield took p = 1/2 so that a = 0. Then a faithful retrieval is possible only if q/N < 0.138; for q/N beyond this fraction no pattern can be retrieved. It is easy to see that an inequality of this kind must exist; determining the precise number 0.138 is a completely different story [34-36]. We simply return to (17) and note that the standard deviation of the sum is v/q/N. When it becomes too big, it will wash out the signal, as did qa 3 in the case a r 0. In fact, according to the law of the iterated logarithm we get as an upper bound qmax/N = 0.5; apparently it has to be less. Hopfield [33, pp. 2556-2557] already found 0.15 numerically, which was surprisingly close to 0.138. Of course one could adapt the threshold, if a r 0, and take 0 = -qa 3 but what should tell a synapse that q patterns have been stored? For spatiotemporal patterns with a = 0 the learning rule (14) in combination with the dynamics (16) and ~ = 0 works well, provided the system has a broad distribution of delays. It is easy to see why, Vi(t) -- Z
JijSj(t - A~jX).
J
(1 S)
Theory of synaptie plasticity
785
The Ji] look back into the near past and tell neuron i at time t what to do next - in agreement with (14) and (16). That is to say, the delay A~.~ that has been taken into account during learning plays the very same role during retrieval. If, then, a certain activity pattern keeps constant during a time 8p, there should be delays A~x > ~p to " t h r o w " the system out of this state into the next. A 'broad' distribution then means A~x > 8p for 'enough' j. Fig. 3 illustrates the potency of learning rule (14). For biased signals with a # 0 it has to be modified; see Section 3.5.
3.4. Hebbian unlearning In the +1 representation, we can define a spatial average a(t) = N -1 ~'~iSi(t) ~ la (13) for each discrete pattern but in practice there is no hope that it will vanish. If, however, it is small enough, viz., la[ < lacl = 0.54, then unlearning [41] removes the
A o~o
00
0.5
~
"
0.5
0.0
Jt_~
0.~0
0.5
[1.5
o.o
A
O.O
i.O
10
0.5
05
0.0
"~
O0
0
t [MCS)
100
200
o
t
[MCS)
150
Fig. 3. (A) Space warp. The overlaps m~(t)= N -1 ~i~iSi(t) with patterns 1,2,3, and 4 (from top to bottom, +1 representation with p~ ~ 1/2) have been plotted as a function of time. The network with maximal delay A m a x - - 40 has learned the cycle 1,2, 3, 4 (or BACH), where each pattern lasts 10 time units; the network size is N = 512. After it has been presented the faulty pattern sequence 1,4,3,4 (or BHCH) as initial condition (space warp) for -Amax ~
786
J.L. van H e m m e n
correlations, restores the patterns, and greatly increases the storage capacity. Though at a first sight unlearning looks a bit weird, it is a very powerful algorithm. Its motivation stems from neurobiology. In the late 1970s Hobson and McCarley [38] suggested that there exists a dream state generator in the pons that, during R E M sleep, produces a series of pulses in the forebrain, the ponto-geniculo-occipital (PGO) bursts. These pulses provide frequent and semi-random stimuli to the cortex and might thus function as the driving force of rapid-eye-movement (REM) dreams. The Hobson-McCarley idea was taken up by Crick and Mitchison [39], who assumed that during R E M sleep the cortical network, once it has been excited by a PGO burst, relaxes to a parasitic or spurious state, which is then weakened or, as they called it, 'unlearned'. The proposal of Crick and Mitchison found an immediate implementation as a three-step procedure for constant patterns in the work of Hopfield et al. [40], while the present version for general, spatiotemporal, patterns is due to van Hemmen et al. [41]: (i) Random shooting, corresponding to a PGO burst in the brain and giving a random initial state. (ii) Relaxation to a limit state ~; - (zd(t); 1 ~< i ~< N), where we assume (in many cases this can be proven) that the limit state X is either stationary or a limit cycle. (iii) Unlearning through Jij --+ Jij - ~l~KJij
(19)
with 0 < e <<1 and AJij, pure Hebbian learning of • after (14) but now being multiplied by -~. The minus sign in front of ~ has led to the name unlearning. The unlearning parameter ~ must be small, say, two orders of magnitude smaller that the available learning parameters in sight. The three steps constitute a single loop, which in the present context is defined to be a "dream". It is repeated D times so that 0 < d ~
3.5. Spatiotemporal patterns and 0/1 coding As a rule, biological neural systems, i.e., neuronal networks, are characterized by a low activity, meaning that a relatively low percentage of neurons per unit of time is active. That is to say, a ~ - 1 so that the above formalism breaks down completely. Since stationary patterns are the exception and spatiotemporal ones the rule, we are looking for an appropriate generalization of (14). There is an evident one: replace ~ by a random variable that also has mean zero and variance one, ~ ---, ( ~ - a)/x/1 - a 2. This looks reasonable but it is not. Already for stationary p a t t e r n s - cf. (15) - this symmetric rule is a lousy one since the storage capacity goes
Theory of synaptic plasticity
787
to zero as a ~ - 1 . A way out [36, Sectionl.6.5] is adding an extra + 1 so as to restore the original storage capacity of the Hopfield model but there is a smarter solution that has turned out to work for spatiotemporal patterns as well. The 'symmetric' substitution by itself did not. Though we now work with the 0/1 representation of neuronal activity we stick to the pseudo-spin S for specifying our asymmetric learning rule for a synapse with axonal delay A ax = A ~ x [32],
1 vl
z~Jij -- ~ij(Aax)~ Z Si(t + At)[Sj(t- Aax) - a ] .
(20)
t=l
This is to be added to the existing synaptic efficacy. It reduces to (14) for a - 0. The rationale of (20) is as before, but modified since we now have [ S j ( t - A ax) - a ] instead of S j ( t - Aa• the - a being crucial. In the low-activity limit with a - - - 1 , [Sj(t- A ax) - a] equals 2 if at time t - Aax the presynaptic neuron j is active and vanishes if the neuron is quiescent. That is perfectly reasonable since j cannot activate i if it is not active itself. After a spike has been generated by j it needs AaX= A~• ms to reach the synapse {ij} at time t. Then AJij >0 if Si(t + A t ) = +1 corresponding to i being "told" to fire or keep quiet. In words, the presynaptic neuron is gating. Given presynaptic activity, the synaptic efficacy increases ( potentiation) when during the next time step the postsynaptic neuron is active whereas it decreases (=depression) when the postsynaptic neuron does not fire. As Fig. 4 illustrates, the asymmetric learning rule (20) has proven to be extremely efficient for storing spatiotemporal patterns [32,42,43]. It was also a key to devising the 'learning window' [45] which is instrumental in describing long-term synaptic plasticity for temporally highly resolved activity patterns. Experiments of Markram et al. [48] were the first to confirm (20) as neurobiological learning rule; a more complete list will be given once we turn to explaining the notion of 'learning window' in the next section. Additional theoretical support through combinatorial and energy-saving arguments has been provided as well [49]. The dynamics is that of (10) with vi(t) = ~-~.jJijnj(t- A(alx.). To see the effect of Si(t + At)[Sj(t- A~f.) - a] as it appears in (20), we perform a simple signal-to-noise ratio analysis for a network with no delays and N as the number of neighbors each neuron is connected with while ~ij = 1IN. Furthermore, the network has been taught q + 1 stationary patterns so that Jij - N -1 ~J-~.~~i~ (~j~ - a). The pattern ~t - 0 with ~0 --: ~i is presented to the network while the others generate the 'noise'. We assume a N = a for the sake of simplicity; cf. (13). Focusing on the low-activity limit a ---+ - 1 , we take nj :-- ( ~ j - a)/2 as input and find, as N ~ e~,
j(r -- ~i (1 -- 02) l -~2 + = - : : " ~ - " ~i~(~j~ - a ) ( ~ j - a) -AN
~..a J(r ~(#0)
(1 +
a)[~i -+- O(v/q/S)].
(21)
788
J . L . van H e m m e n
o
o
9
~
9
~
9
9
o
o 9
9
~
9
o.
~
o ~ .
Fig. 4. Motion of a 'phase boundary', a string of black pixels, through a 20 x 20 storage layer with 0/1 representation; 1 is black and 0 is white, i.e., invisible 9The system starts with a single point in the upper left-hand corner and the string develops as time proceeds (top to bottom; first left, then right). During the motion, the number of black pixels varies between 1 and 20 but a in (20) does not: it is -1. Taken from [32]. Except for the common factor (1 + a) on the right, which we can forget about, we have the same terms as in (17) with a = 0 . One might have argued that ( ~ - a ) ( ~ - a) (symmetry) were nicer. If so, (~i - a) would replace the signal term ~i in (21) so that the signal for the inactive neurons, i.e., the big majority, would be strongly weakened. Because of the asymmetry the signal term ~i now appears alone, without - a or qa 3, which greatly improves the signal-to-noise ratio. Even worse for the "symmetric" term, it does not allow spatiotemporal low-activity patterns to evolve. For Aa• > 0 the argument becomes more complicated but the gist does not change.
4. Time-resolved Hebbian learning: looking at synapses through a learning window The barn owl (Tyto alba) is able to determine the prey direction in the dark by measuring interaural time differences (ITDs) with an azimuthal accuracy of 1-2 ~ corresponding to a temporal precision of a few microsecond, a process of binaural sound localization. The first place in the brain where binaural signals are combined to ITDs is the laminar nucleus. A temporal precision as low as a few microsecond
Theory of synaptic plasticity
789
was hailed by Konishi [50] as a p a r a d o x - and rightly so since at a first sight it contradicts the slowness of the neuronal "hardware", viz., membrane time constants of the order of 200 las. In addition, transmission delays from the ears to laminar nucleus scatter between 2 and 3 ms [44] and are thus in an interval that greatly exceeds the period of the relevant oscillations (100-500 gs). The key to the solution [45] is a Hebbian learning p r o c e s s - cf. Section 3.5 - that tunes the hardware so that only synapses and, hence, axonal connections with the right timing survive. Genetic coding is implausible because three weeks after hatching, when the head is fullgrown, the young barn owl cannot perform azimuthal sound localization. Three weeks later it can. So what happens in between? The solution to the paradox involves a careful study of how synapses develop during ontogeny [45-47]. The inputs provided by many synapses decide what a neuron does but, once it has fired, the neuron determines whether each of the synaptic efficacies will increase or decrease, a process governed by the synaptic learning window, a notion that will be introduced shortly. It is a generalization of what we have seen in Eq. (20). Each of the terms below in (22) has a neurobiological origin. The process they describe is what we call infinitesimal learning in that synaptic increments and decrements are small. Consequently it takes quite a while before the organism has built up a 'noticeable' effect. As for the mean response 5~ = npQ studied in Section 1, only the presynaptic probability of release p and the postsynaptic response Q can change 'continuously' whereas the number n cannot. What happens in the long run is not known yet [4,8,18]. For the sake of definiteness we are going to study waxing and waning of synaptic strengths associated with a single neuron, which therefore need not carry a label; cf. Fig. 5. The 1 4 i ~< N synapses are providing their input at times ft. The firing times of the neuron are denoted by tn, it being understood that n is a label like f . Given the firing times, the change AJ/(t):= J/(t) - J i ( t - T1) of the efficacy of synapse i (synaptic strength) during a learning session of d u r a t i o n / i and ending at time t is governed by several factors,
F
1
k
J
(22)
Here the firing times tn of the postsynaptic neuron may, and in general will, depend on Ji. We now focus on the individual terms. The prefactor 0 < rl << 1 reminds us explicitly of learning being slow on a neuronal time scale. 2 Throughout what follows we refer to this condition as the the adiabatic hypothesis. It holds in numerous biological situations and has been a mainstay of computational neuroscience ever since. It may also play a beneficial role in an applied context. If it does not hold, a numerical implementation of the learning rule (22) is straightforward, but an analytical treatment is not.
2
Sincethe Greek alphabet is finite and there is no ambiguity between the present learning parameter and the refractory potential of Section 2, there is no harm in using 11here as well.
790
J . L . van H e m m e n
output
input
~ si, n
Fig. 5. Singleneuron. We study the development of synaptic weights Ji (small filled circles, 1 ~
Each incoming spike and each action potential of the postsynaptic neuron change the synaptic efficacy by 1"1win and qw ~ respectively; see the literature [51-54] for experimental evidence. The last term in (22) represents the learning window W(s), which indicates the synaptic change in dependence upon the time difference s - ( - t" between an incoming spike tf and an outgoing spike t". When the former precedes the latter, we have s < 0 r ~ < t", and the result is W(s) > 0, implying potentiation. This seems reasonable since N M D A receptors (see Section 1), which are important for longterm potentiation (LTP), need a strongly positive membrane voltage to get 'accessible' by loosing the Mg e+ ions that block their 'gate'. A postsynaptic action potential induces a fast retrograde 'spike' doing exactly this [21]. Because the presynaptic spike arrived slightly earlier, neurotransmitter is waiting for getting access, which is allowed after the Mg 2+ ions are gone. The result is Ca e+ influx. On the other hand, if the incoming spike comes "too late", then s > 0 and W(s) < O, implying depression- in agreement with a general rule in politics, discovered a decade ago: "Those who come too late shall be punished". In neurobiological terms, there is no neurotransmitter waiting for being admitted. The learning rule (22) is a direct extension of (20), its time-discrete predecessor. There is meanwhile extensive neurobiological evidence [48,55-59] in favor of this time-resolved Hebbian learning. An illustration of what a learning window does is given in Fig. 6. If other (infinitesimal) learning algorithms are discovered, one can simply adapt W accordingly. For instance, for inhibitory synapses one has found infinitesimal growth processes [60] that can be described qualitatively by putting W : - - W in Fig. 7; the latter shows a typical learning window for an excitatory synapse [45,47]
791
Theory o f synaptic plasticity
siin(t) I s~ I
ti3: t~ t4
W(s) [ 0
ol
~S
J
L/
i
Ji(t)
wOUt
win-
_
win-
-
w
'
I w~
i -
._t
-__s
::win+W(t 4- t 2)
,, in_
-_t
I
W(ti3- t z)
_t
Fig. 6. Hebbian learning and spiking n e u r o n s - schematic. In the bottom graph we show the time course of the synaptic weight J~(t) evoked through input and output spikes (upper graphs, vertical bars). An output spike, e.g., at time t l, induces the weight Jr, to change by an amount w~ which is negative here. To show the effect of correlations between input and output spikes, the learning window W(s) (center graphs) has been indicated around each output spike; s = 0 matches the output spike times (vertical dashed lines). The three input spikes at times ff = t], t2 and t3 (vertical dotted lines)increase Z- by an amount Win each. There are no correlations between these input spikes and the output spike at time t 1. This becomes clear once we look at them "through" the learning window W centered at tl: the input spikes are too far away in time. The next output spike at t 2, however, is close enough to the previous input spike at t3. The weight Ji is changed by w~ < 0 plus the contribution W(t3i - t2) > 0, the sum of which is positive (arrowheads). Similarly, the input spike at time t4 leads to a change win + W(t 4 - t 2) < 0. Taken from [47].
W(s) - q
exp(s/'csyn)[A+(1 - s/F+) + A_(1 - s/~_)]
for s ~ 0,
A+ e x p ( - s / % ) + A_ e x p ( - s / z _ ) )
f o r s > 0.
(23)
Here, as before, s - ( - t" is the time difference between presynaptic spike arrival and postsynaptic firing, r I is our small learning parameter, ~+ : = "csyn"c+/('c syn -Jr- 1;+), and ~_ : = "csyn'c_/(1; syn -Jr- I - ) . Parameter values as used in numerical simulations [47] are q = 10 -5, A+ = 1, A_ = - 1 , -csyn = 5 m s , T+ = 1 ms, and ~_ = 20 ms. Spike generation is (nearly) always a local process in time and so are the 1 ~
J . L . van H e m m e n
792
A
B
o
.
.
.
~
o =
40
"
"
"20
"
" ~'
d
-,
.
o
0-1. . . . . .
~ t o ~
-60
.
.
.
........................
.1o0 ~o
4 ~i....-.',.-%,--o---..- ........
"oU
o
U
. A!
E <4o = s [ms]
.
80
-o
21"1
I
~" lOO
.6o 4o ~o
i ~
o
2o
~,
4o
,~
so
8o loo
Time of Synaptic Input (ms)
Fig. 7. (A) The learning window W in units of the learning parameter 1"1as a function of the delay s - - - t f - t~ between presynaptic spike arrival at synapse i at time ~ and postsynaptic firing at time : . If W(s) is positive (negative) for some s, the synaptic efficacy J~ is increased (decreased). The increase of Ji is most efficient, if a presynaptic spike arrives a few milliseconds before the postsynaptic neuron starts firing (vertical dashed line at s = s*). For Isl ~ we have W(s) ~ O. The form of the learning window and parameter values are as described in Eq. (23). Taken from [47]. (B) Experimentally obtained learning window of a cell in rat hippocampus; reprinted by permission [55]. The similarity with the left figure is evident. It is important to realize that the width of the learning window is to be in agreement with other neuronal time constants. In the auditory system, for instance, these are nearly two orders of magnitude smaller so that the learning window's width scales accordingly. A Poisson process operating in a learning w i n d o w of finite width (of a few milliseconds) emulates that input frequencies are never fixed but belong to a finite frequency range. The time interval [ t - ~q,t) is taken to be big since, due to the adiabatic hypothesis, learning is so slow that we can safely a s s u m e / ] to greatly exceed neuronal times such as interspike intervals and the width of the learning window. Nevertheless we will arrive at a relatively small change of the Ji's so that the assumption concerning ~q is self-consistent (otherwise we do not see anything). W e can divide the time interval I t - ~q, t) into m a n y small intervals that are, stochastically, independent of each other - apart from a minuscule overlap at their borders. Hence the sum (22) is self-averaging; cf. the discussion following (4), a process that is inc o r p o r a t e d here. The above averaging was one over the randomness. We are now going to perform a n o t h e r one over time. To fully appreciate what is going to happen, we turn to a differential-equation problem, d
dt x - qF(x, t),
(24)
where q is 'small' and F ( x , t ) for fixed x is a periodic function of t, i.e., F(x, t + T) = F(x, t). After one period x has hardly changed so that, f o r f i x e d x, we can average F over t. T h a t is to say, instead of (24) one studies [61,62]
Theory of synaptic plasticity
793
ft d F(x)- "- -fl t-r dt'F(x,t') ~ ~ x - riF(x).
(25)
Here the integral over time, viz., t', is performed with x, the argument of F, fixed; the integration boundaries t - T and t of the integral in (25) can be replaced by 0 and T, respectively. Hence the differential equation we arrive at is an autonomous one since /~ does not depend explicitly on t. It is plain that the whole argument hinges on ri being small. In fact, under suitable conditions the 'method of averaging' [61,62] can be generalized to nonperiodic F. Here we will simply average over a period of duration 7] and often use an overbar to indicate this. We now return to our problem, viz. (22) averaged over the randomness, and average over time as well. This sounds quite harmless (it is) but we will soon see the effect is beneficial. To simplify the notation, we first introduce two spike flows, 3 sin(t) = Z 5 ( t - if), tfi <<.t
S~
- ~ 5 ( t - t"), tn <~t
(26)
and rewrite (22), introducing angular brackets to indicate an average over the randomness, AJi(t )
1
dtt[winis~n(t,)) -+- wOUt(sout(tt))].
+~
rl
a,-r~-t, dsW(s)(Siin(t' +s)S~
"
(27)
It is evident that both types of averaging, over randomness and over time, have been taken into account. So far so good. The first term on the right, the time average (s}n (t)) of the rate function (sin(t')) for times t' in the interval [t - 7], t), is a mean which we call vin i (t). For an inhomogeneous Poisson process (see Appendix B) this is nothing but the mean intensity ;~(t) where the probability of finding one spike in an interval of length At near t is )~(t)At. If )~i is a periodic function (in the auditory system often for frequencies in the kHz range), then its time average is a constant so that the time dependence is gone and vin i (t) - vi. The second term, the time average of (S~ which is to be called v~ (t), is harder to compute since it entails both the outgoing and all the incoming processes, the latter '"deciding" together when an action potential will be generated. For later reference we summarize the above two definitions, yiin(t ) "--(sin(tt)),
v~
"--
(S~
(28)
The former refers to the input only, the latter takes the output by itself. The truly hard nut is the double integral in (27), explicitly correlating input and o u t p u t - a distinguishing property of Hebbian learning. Let us take a "typical" t ~, say t' - t - 7] + xT1 with 0 < x < 1. Then the lower bound of the integral over s is 3
Since there is no fear of confusion we are using the same notation S for spins and spike flows.
J.L. van Hemmen
794
effectively - x / ] while the upper bound is ( 1 - x)Ti. The learning window W is something local in time; for the auditory system of the order of milliseconds, for most of the cortex seconds - anyway, much, much shorter than 7]. Hence for our "typical" t' the lower bound of the integral over s is - o c whereas the upper bound is +oc so that, up to a negligible error, we are left with
! 7]
at'
dsW(s)(Sln(t ' +
7i
s)S~
= f s~ dsW(s)~l fjr~ dt'(sin(t' +s)S~
(29)
Returning to (27), we note that we can transform it into a differential equation since AJ;(t) = J;(t) - J,(t - Tl) and, due to the adiabatic hypothesis, the change of J/is so slow that AJi(t)/Tl can be replaced by dJ;/dt. In other words, we choose 7] so large that it greatly exceeds all neuronal times, e.g., interspike intervals and the width of the learning window W, but on the other hand is much smaller than q-1 _ all in all, a condition fully consistent with the Hebbian philosophy "practice makes perfect". That is to say, T! separates neuronal and learning time scales. Then we find, using (27)-(29),
[
':t r~ dt' (sin(t' + s/xout(t')) ] .
--dtJi - 1"1 w ininvi + w~176 +
oc dsW(s) ~
(30)
This equation is exact and describes the time evolution of infinitesimal synaptic plasticity for a neuron with given inputs. It is a nice aspect of (30) that the final integral over t' is nothing but the timeaveraged correlation function. The correlation function itself is (s~n(ttt)S~ We may interpret it as the joint probability density for observing an input spike at synapse i at the time t" and an output spike at time t'. Hence we write
1
Ci(s,t) := TII
f/
TI
dt'(sin(t ' + s)S~
= (s)n(t + s)S~
(31)
the second equality being just a definition. Altogether we get a synaptic dynamics of appealing simplicity, d
r in in
-~tJi -- rl[w vi +
wOUtyOUt f o c
+ J-
dsW(s)Ci(s, t)].
(32)
OG
In this form the learning equation is easy to remember: the input rate v in i modifies the synaptic efficacy through w in, the output rate v ~ does so through w ~ and the Hebbian correlation function 6',. favors or disfavors it through the learning window W. Appearances are deceiving, however. Not only do v ~ and Ci depend on J; but also, through S ~ on all the other Jj with j -r i. Moreover, neuronal firing is intrinsically nonlinear. Hence synaptic dynamics is an intricate collective process. Fig. 8 gives an illustration of what may, and often does, happen. Inspired by the
Theory of synaptic plasticity
795
~i i
i.
1,
t~
]l]tlllt]]]t][]lli~
,,n,.,.,.,,,,,,,,, twllllilnlJlWUlllllltlllWliliWt ,
- I ...................................................................................... l ............................................................................................................. I
tt
Wil!IIH!iJW!ItltlIUWlUi!t~!WW!U!!!UIIIlllt~! Fig. 8. Bottom to top, selection of synapses during pattern formation of the synaptic connectivity on an integrate-and-fire neuron. As shown by the panel at time tl, a set of axonal delay lines with a uniform distribution of synaptic strengths and delays in the interval [0, T] gets Poissonian input with frequency co = 2Tc/T. The panel at time t2 exhibits symmetry breaking and exponential growth; the latter is a simple consequence of linearizing (32). At time t3 saturation sets in and at t4 we have reached full saturation, a stationary state characterized by "survival of the fittest". barn-owl case [45], we imagine a set of axonal delay lines contacting a neuron. They exhibit a uniform distribution of delays and of synaptic strengths (>0) to start with. The neuron is of the integrate-and-fire type and the solution as shown is numerically exact. As one sees, there is first a symmetry breaking where certain synapses "grow" faster than the rest. The fact that certain delays are favored makes that their 'auto'correlation function exceeds that of more 'distant' axonal delays. In this way they grow faster and the others deteriorate in their role of "those that come too late". This initial stage is characterized by an exponential growth (or decrease), the rationale of which will be illustrated by the Duhamel formula (49). The next stage where a few synapses are favored and the rest is eliminated is hard to characterize since it is governed by a nonlinear dynamics due to the integrate-and-fire neuron. The final stage is a simple saturation at an upper bound determined by finite synaptic resources or a lower bound, zero, where nothing is left. W h a t we see is a kind of evolutionary process where only a few axonal delay lines, the "fittest", survive. In the next section we will study an exactly soluble neuronal model that allows a disentanglement of the different inputs and, in this way, provides a more precise feeling for what is going to happen.
J.L. van Hemmen
796 5. Disentangling synaptic inputs: the Poisson neuron
Eqs. (30) and (32) tell us that "all we need" for deriving the time evolution of the synaptic efficacies is v ~ and the function C~ correlating S)n and S ~ In a threshold model, such as the SRM (Section 2), the S~n together determine S ~ in a nonlinear way because of the threshold. Disentangling input and output and obtaining exact solutions is thus prohibitively difficult. We therefore introduce a model, the Poisson neuron, that allows for an exact solution [47] of the synaptic dynamics (32) by circumventing the threshold but keeping the firing rate.
5.1. Poisson neuron: definition and properties Spikes originate, so to speak, from the potential v(t) as given by (3). We now define the Poisson neuron to be the inhomogeneous Poisson process (see Appendix B) with rate function, or intensity,
vo +
ltl - vo + EJ i.f
ltf l
lt- tf l
/33
where v0 is a spontaneous firing rate. A Poisson process is defined by three properties: (i) the probability of finding a spike between t and t + At is )~out(t)At, (ii) the probability of finding two or more spikes there is o(At), and (iii) the process has independent increments, i.e., events in disjoint intervals are independent. When the potential v(t) in (33) is high/low, the probability of getting a spike is high/low too. The input processes (26) are taken to be Poisson as well, a reasonable, often even realistic, assumption. For those who like an explicit nonlinearity better, the clipped Poisson neuron with
~out clipped -- V1~)[t~(t) -- ~1]
(34)
and | as the Heaviside step function of (11) is a suitable substitute that also allows an exact disentanglement [63]. In fact, practically any function of v will do [64]. For the sake of convenience we require that the integral over e (instead of e's maximum) be one. We start by noting that (S ~ = (~~ where the former average is over both the output process, i.e., the Poisson neuron, and over the input processes whereas the latter is over the input processes only; hence the lower index 'in' served here as a reminder. Using (33) we then get
/S ~
N ~OOC N = v0 + Z Ji(t) ds g(s)~,i.n(t- s) =: vo + E Ji(t)Aiin(t ). i=1 i-1
(35)
The first equality in (35) is as in the transition from (78) to (80) in Appendix B. In agreement with the previous section, Ji has been treated as an adiabatic variable and, thus, is taken to be constant on the time scale of/]. In (35) it can therefore be evaluated at time t. The final equality defines AI.n as the convolution of e and )~in. To compute (Si~.n(t+ s)S~ in (30) and disentangle input and output, we exploit the properties of a Poisson process. For the moment we put v0 -- 0 and define
Theoryofsynapticplasticity
797
hi(t) " - J i ( t ) ~ f e ( t - ~ ) .
Performing the average associated with our Poisson
neuron, viz. (33), we find
(sin(t +s)S~
- (S~n(t +s)[hi(t) + Z hj(t)]~.
\
j(7s
/
(36)
The hj with j r i and Si n being independent, the average of their product is a simple computation as it factorizes so that we can use (35). The result is
(sin(t-Jr- s) ~ hj(t)) -
j(~=i)
)~iin(t
+ S) ~ A~n(t)Jj(t).
j(r
(37)
It it will be recollected in (41) below. The average (s~n(t + s)hi(t)) is over the input process (26),
I [~f ~(t-'~'S-- 4")I X [Ji(t)~f F~(t--t~i)l ).
(38)
The correlations are explicitly present in the arrival times tf/' and t{i of the spikes as they hit synapse i. The disentanglement that is to come is exactly as in the transition from (83) to (84) in Appendix B. We approximate the delta function in (38) by the normalized indicator function (At)-ll]{spikes in[tl,tl+At)}((O)with At--+ 0. Here co is sampling our probability space, i.e., the collection of random events, so that averaging means integrating over e0. Furthermore, we discretize the time axis by breaking it into intervals [tk, t~ + At) of length At and take tl := t -+- S (for k := l) as an end point of one of the intervals. Keeping an eye on (38) and Ji(t) on ice, we have to evaluate averages of the form
(At)--l ll] {spikesin[tz,tl+At)}(O))Z ~{spikesin[tk,tk+At)}(O))~(t -- tk) ) .
(39)
k As long as At > 0 we have to take into account that more than one spike may occur in an interval; 'no spike' is easy because it gives nothing. As At ---, 0, the probability of getting more than one spike in an interval of length At is o(At), in perfect agreement with neuronal refractoriness, and thus may be neglected. In the sum occurring in (39) we separate the term k = I from the rest and note that for k = I we get 4 2 - I whereas the events k r l are independent so that expectation values factorize. Remembering tt = t + s, we then end up with
)~iin(tl)[e'(t- tl) -k-Z )~iin(tk)e'(t--tk)At] k(r
= )~iin(t-+-S)[~,(--S)d-~ )~iin(tk)f.(t--tk)At].
(40)
k(r In
the
limit
At 7 0 , the above Riemann sum converges to its integral which is nothing but Aiin(t). Collecting terms and reinstalling (37), v0 > 0, and Ji(t), we obtain
f dt';~iin(t')e(t - t'),
798
J.L. van Hemmen
(s)n(t -+- s)s~
- )viin(t q- S) V0 q- Ji(t)f,(--s) Jr-
Jj(t)m~(t)
,
(41)
where, except for j = i, the sum stems from (37). We note that a ( - s ) r 0 only if s < 0. In view of causality, this makes sense since S~n(t + s) can influence S~ only if s < 0. In fact, the term Liin(t + s)Ji(t)a(-s) incorporates the way in which an input spike at synapse i at time t + s influences the neuronal output at time t through e ( - s ) and, hence, is correlated with itself. The sum represents the influence of 'other' times (j = i) and other synapses (j r i). Time-averaging (41) is trivial. We insert (41) in (31) and (32) and invoke the adiabatic hypothesis of the previous section so as to find Ci(s, l) -- )viin(t -+- s)[v0 -~- J i ( l ) g ( - s ) ]
N q- ~ Jj(t)~iin(t Jr- s)mSn (t). j=l
(42)
By definition, )vl.n(t) -- Vini (t). Combining (32) and (42) we obtain for 1 ~< i ~< N,
d [
[
_.~ji_~_.r I win Viin + wOUt V0 + ~-'~JjAj in (t) j=l
+
dsW(s)
] + s) Ajin (t)] ~~ " ~ J j ~in(t i j=l
~,iin(t+s)[Vo-+-JiE(--s)]q-
~c
. (43)
Functions of time that carry no argument are to be taken at time t. A single glance suffices to convince us that (43) is a linear differential equation. The time averages, though slowly varying, might still depend on time. If so, the solution is standard [78,79] but very hard, and explicit expressions can only be obtained numerically. If, on the other hand, the 9~/are periodic functions of time, as in the auditory system, the time averages are (practically) constant and an analytic approach is within reach, at least for the mean activity. Throughout what follows we drop the prefactor 1"1 from the dynamics by rescaling time through the substitution tit := t and redefining all functions in sight that depend on time; to this end we use (43) and bring r I to the left. Alternatively, we measure everything in units of size q. We define a few quantities,
[ S
ai - w v i _Jr_VO w ~ _+_
dsW(s))~iin(t nt- s
~C
,]
,
b/=
Ci =
dsW(s)~(-s))~ii"(t + s), ~X2
Qij(t) -
dsW(s))~in(t + s)A~(t), O(3
(44)
Theory of synaptic plasticity with A~(t) = f ds d
799
e (s))viin(t- S) as in (35), and find for 1 ~
N
Ji -- ai -Jr-Z j=l
N
bjJj + ciJi -t- Z
QijJj.
(45)
j=l
W e can rewrite (45) in terms of the vector ,I = (Ji) E ~N by defining the diagonal matrix C = diag{cl, c 2 , . . . , Cu}, using Dirac's bra-ket n o t a t i o n 4 a n d i n t r o d u c i n g the N-vectors 11) = ( 1 , . . . , 1), a = (a~), and b = (bi) so that (45) reappears in the form
d dt a - a + (ll)(b I + C + Q)J.
(46)
As it will turn out in Section 5.5 that C does n o t d o m i n a t e the a s y m p t o t i c b e h a v i o r of (46) we simplify it by taking ci =- c so that C = c11, the spectral theory o f C + Q is reduced to that of Q, a n d we are left with d dt J - a + (11> (bl + c~ + Q)J.
(47)
I n p u t channels i and j whose delays are fixed give rise to a specific matrix element Q~j that takes the delay structure into account. F o r input processes that are Poissonian with periodic intensity, the vln a n d v ~ as defined by (28) are constants w h e n / ] is large e n o u g h , and so are the Q~j. It is i m p o r t a n t to realize that this statement is true on the time scale o f / ] b u t need not hold on that o f the total learning time. Finally, the learning e q u a t i o n (47) allows the analysis o f the influence of noise on long-term synaptic plasticity. L e a r n i n g results f r o m stepwise, infinitesimally small weight changes: "Practice m a k e s perfect". W i t h noise, each weight p e r f o r m s a r a n d o m walk whose expectation value is described by the ensemble-averaged e q u a t i o n (47). F o r an analysis o f noise as a deviation f r o m the m e a n the reader is referred to the literature [47].
5.2. Relation to rate-based Hebbian learning In neural n e t w o r k theory, H e b b ' s ideas [28] have usually been f o r m u l a t e d as learning rules where the change o f a synaptic efficacy Ji d e p e n d s on the correlation between the m e a n firing rate viin o f the ith presynaptic n e u r o n a n d the m e a n firing rate v ~ of a postsynaptic n e u r o n , viz., d ~ . in.v out "-~ d4(viin) 2 + ds (v~ d-'-~tJi "-- do -k dl Vin i + d2 v ~ + u3vi
4
2,
(48)
Dirac's imagination has led to an appealingly simple notation. The idea is this. A Hilbert space a/t~is a vector space with inner product ~ x ;4P 9 {x, y} H (xly) E C, which is taken to be linear in the righthand side, viz., y. The resulting inner product looks like a bracket so that Dirac called (xI a 'bra' and lY) a 'ket'. Vectors in our Hilbert space are kets and written lY)- Then the operator P = Iz><xl is projector-like and bound to operate on vectors lY) in such a way that PlY) = Iz)(xly) c< Iz). See the literature [88, Section 14.4] for additional information.
800
J.L. van Hemmen
do < 0 and d~,..., d5 being proportionality constants. Apart from the decay term do and the "Hebbian" term v iinV~ proportional to the product of input and output rates, there are also synaptic changes which are driven by the pre- and postsynaptic rates separately. The parameters d o , . . . , d5 may depend on Ji. Eq. (48) is a general ansatz with terms up to second-order in the rates; see, e.g., [80-82]. In case do = d4 = d5 = 0 and under the (strong) assumption that S in and S ~ are independent, it is straightforward to derive (48) from (32) directly. Alternatively, one can obtain (48) from (45). Linsker [80] has derived a mathematically equivalent equation to (47) by starting from (48) and using a linear graded-response neuron, a rate-based model. The difference between Linsker's equation and (45) is, apart from a slightly different notation, the term cl] and the fact that (45) has been derived from underlying processes, viz., spikes, whereas Linsker's equation is the result of the rate ansatz (48). The present approach is far more comprehensive. Correlations between spikes on time scales down to milliseconds or below can therefore enter the driving term Q for structure formation. They may be, and I expect are, essential for information processing in neuronal networks such as auditory and electro-sensory systems [83]. The mathematics of (47) has been analyzed extensively by MacKay and Miller [84] in terms of eigenvectors and eigenfunctions of the matrix ]l)(b] + Q with b - bl and c = 0. The matrix (Qij + b + cS;j) in (47) contains e times the unit matrix and thus has the same eigenvectors as (Qij + b) while the eigenvalues are simply shifted by c.
5.3. Synaptic dynamics and self-normalization Normalization of the average synaptic efficacy or of the mean output activity is a very desirable property for any synaptic dynamics. After all, the mean output rate should not blow up during learning but converge to a finite value in an acceptable amount of time. Standard rate-based Hebbian learning, however, can lead to unbounded growth. Several methods have been designed to control this unbounded growth, such as subtractive and multiplicative rescaling of the weights after each learning step so as to impose, e.g., ~ j . J j . - const, o r ~ - ~ j j 2 _ const. [85]. Most of these methods make use of the Jj dependence of the parameters d l , . . . , d5 in the learning equation (48). Mathematically they do what they ought to do but, from the point of view of biological physics, it is unclear where they come from. Hence we will derive self-normalization from scratch. We are going to show under what conditions the arithmetic mean jay__ N-1 }-~ij/. of the synaptic efficacies, and hence the mean neuronal output, converges to a finite limit as t ---, oc. In other words, we focus on the question of how we can get self-normalization. The first result in this direction, an even more pronounced form for integrate-and-fire neurons (see Section 2) in the general context of (32), was found numerically by Gerstner et al. [45]. We will see that, for a realistic scenario, we need ai > 0 and hi < 0, whatever i. The former condition is evident once we realize that naive synaptic efficacies start at J~ m 0; after all, where else? Then (47) is nothing but d J / d t - a. For excitatory synapses starting at Ji = 0 it
Theory of synaptic plasticity
801
would be good to increase. Hence we cannot but require a / > 0. We will assume the excitatory case throughout what follows; inhibition can be treated analogously. The linear equation (47) is of the form d J / d t = M J + a with M being the matrix (]l)(b I + cll + Q). As long as M is fixed, the synaptic dynamics can be solved explicitly through Duhamel's formula, J(t) --- exp[(t - to)M]J(to)+
ft0t ds e x p [ ( t -
s)M]a.
(49)
The right-hand side of (49) satisfies (47) and equals J(t0) at time t = to. Hence it is the solution J(t) we are looking for. In the asymptotic limit t ~ oo, the vector a may depend on s, i.e., time, so that a = a(s). Duhamel's formula is as valid as it was for constant a. To incorporate a time dependence of M one has to replace exp[(t - t0)M] by the corresponding solution operator U(t, to). Since both a(s) and M(t) in general preclude any analytic solution, we will not pursue the issue here but take both constant. Let us, then, suppose first that all eigenvalues of M, the so-called spectrum cr(M), have strictly negative real parts. Accordingly we get e x p ( t M ) J ( t o ) ~ 0 as t ~ oc while in the very same limit the integral gives - M - l a , the fixed point of the differential equation. So it all fits, provided - M - l a has all components nonnegative, which may well happen. We need to keep in mind, though, that the key assumption on or(M) remains a bit hard to verify. We did not require M to be diagonable. In fact, except for or(M), we did not assume anything yet. To get sharp analytic results we now suppose (i) bi = b for all 1 <,i<<,N, which implies ]b) = b]l), and (ii) a specific commutator vanishes, [Q, I1><11] = o .
(50)
Thus all row and column sums of Q equal a single number qN since I1)(11 is the matrix whose elements all equal 1. The matrix Q being N x N, q tells us how big/ small a "typical" matrix element is. Phrased differently, the sum qN has the right scaling behavior as N becomes large. In passing we note that one can do with slightly less [86]; say, [Q, ]l)(b]] = 0 and variations thereof. There are at least two consequences. First, the spectral theory of P := I1)(11 and Q refers to two different things that can be sorted out separately and together determine the effect of M = bP + cl] + Q in that exp(tM) = exp(ct)exp(tbP)exp(tQ). The only eigenvector of the Hermitian P with nonzero eigenvalue (= N) is 11) = ( 1 , . . . , 1); the eigenvalue N is nondegenerate so that 5 11) is also an eigenvector of Q. Alternatively, it is a direct outcome of (50) that the corresponding eigenvalue is qN. Second, the differential equation governing the dynamics of jav is d J av/dt = a av + (Nb + c + Nq)J av.
(51)
Suppose two matrices A and B commute, i.e., [A,B] := AB - BA = 0. Let a be an eigenvector of A with nondegenerate eigenvalue a. That is, Aa = ~a. Then BAa = A(Ba) = ~(Ba) so that, ~ being nondegenerate, Ba -- 13a.In other words, a is also an eigenvector of B. For degenerate eigenvalues of A the present argument breaks down.
802
J.L. van H e m m e n
with a av " - N -1 ~-~j aj > 0. Let us now put m "-- Nb + c + Nq. Then the solution of (51) is again given by Duhamel's formula (with to = 0), j a v (t) -
e t m j av (0) + m -1 (e tm -
1 ) a av.
(52)
To get a finite result we simply require m < 0. Then jav (t) approaches the fixed point j~v := _aaV/rn > 0 of the differential equation. The fixed point is asymptotically stable if and only if m < 0. If (50) does not hold, not even approximately, things become a bit harder. 5.4. Asymptotics and structure formation
By now there is no harm in starting with a matrix M that has a few eigenvalues with a strictly positive real part and calling those with largest and second-largest real part k~ and Z~; for the sake of convenience, we also assume they are nondegenerate. We return to Duhamel's formula (49), viz., its upshot for constant a and constant M, J(t) = e x p ( t M ) J ( 0 ) + M - l [ e x p ( t M ) -
1]a,
(53)
which is the matrix version of (52) with to = 0. If the matrix M is diagonable and the real part ! ~ l of )~l is appreciably bigger than ~R~,2, we need only know the normali:~.Li eigenvectors el and gl with (gl l e l ) - 1 belonging to the "largest" eigenvalues El and ~,~ of M and its Hermitian conjugate (i.e., adjoint) M~ as they determine the leading contribution )~llel)(gll in the biorthogonal expansion [87, Section 11.23] of M in structure formation. 6 Eq. (53) then tells us that, after an initial phase with t ,,~ 0, there is exponential growth along el. Since ~ , l > 0 the nonzero components of J(t) are bound to blow up or decrease to - ~ as t becomes large. This is of course unrealistic since synaptic resources are finite. For excitatory synapses we therefore assume an upper bound ju, with 0 < j u < oo, and a lower bound 0. If the efficacy of synapse i has reached ju, it will stay there as long as its time derivative J[(t) is positive. On the other hand, once J[ (t) < 0 it may decrease. For the lower bound the argument is just the opposite. We thus see that sooner or later, with the timing depending on ~)~, we get saturation of (53), i.e., of J(t) ~ exp(t)~l)[(gl [J(0)) + ~11 (gl la)]el - m - l a .
(54)
I f M is diagonable, then it has N independent eigenvectors ei that constitute the columns of a matrix T with T - I M T = diag()~l,... ,)~X) (i). Hence T t M t ( T t ) -l = diag(~,~,... ,~'N) (ii) and the columns gi of the matrix (Tt) -1 are eigenvectors of M t with eigenvalues k~. For nondegenerate eigenvalues it is a simple argument to show (gj]ei) = 0 for i r j: (Mtgj]ei) = (gjlMei) so that ~,j(gj]ei) -- (gj]ei)~,i with ki r )~j. Now (gi]ei) r 0 because otherwise gi = 0, so that we can put the inner product equal to 1 and find M = ~,i~,ilei)(gi[. In addition, e x p ( t M ) = y~iexp(t~,i)lei)(gi[. For self-adjoint M = m t we are back at the ordinary spectral representation. The reader may consult Merzbacher [88, Section 14.4] for a detailed account of Dirac's convenient bra-ket notation. In fact, the only condition on M that is needed for a l~iorthogonal expansion is that M be diagonable. Then: (i) says T = ( e l , . . . , ex) and (ii) asserts (T-1)T __ ( g l , . . . , gx). Hence biorthogonality is equivalent with (T -1)T = 4, which is evident. The expansion itself can be verified on a complete set of eigenvectors of M, viz., {ei; 1 <~ i ~< N}.
Theoryofsynapticplasticity
803
Once a component of J reaches the upper or lower bound the problem becomes nonlinear. We then take it out, fix it, and continue with the remaining problem, which is again linear; and so on. Though implausible, a 'fixed' component is allowed to return to the interior of [0,J"] once its time derivative points into the 'right' direction. With the benefit of hindsight we can now formulate "self-normalization" to be a limit state where 0 < j a v = N-1 ~ - ~ i j i < ju., for inhibitory synapses, or mixtures, the statement is to be modified accordingly.
5.5. Simple example of structure formation We finish this section by studying a simple, exactly soluble, case. To this end we divide the N statistically independent synapses into two groups, Y l and ~f2 with N1 and N2 synapses, respectively, and N1 + N2 = N while N1, N2 >> 1. Since each group contains many synapses, we may assume that N1/N and N z / N are of the same order of magnitude. The spike input at synapses in group A~I is generated by a Poisson process with a constant intensity )~lin(t) - - v in, which for i c ~/~1 is taken to be time independent. Using the definition (44), we therefore get Qij(t)=-Qll for i and/or
j E JU1. The synapses i E A/~2 are driven by some time-dependent input, )~iin(t)- )~in(t) with the same mean input rate )~in(t)= Vin as in group A#I. Without going into details about the dependence of )~in(t) upon the time t we simply assume )~in(t) to be such that Q~j(t) - Q22 for i,j E A/~2 and regardless of t while Qij(t) =- Qll in all other cases. Here Qll and Q22 are constants and we have used (44). For the sake of simplicity we require in addition that Q22 > Qll. In summary, we suppose in the following:
Qij(t)
__ ~ Q22 > t Qll
Qll
~/'2, otherwise.
(55)
for i,j c
We recall that Qij is a measure of the correlations in the input arriving at synapses i and j; cf. (44). Eq. (55) tells us that at least some synapses receive more positively correlated input than the rest, a rather natural assumption. As one may expect in the animal kingdom, some synapses are "more equal" than others. We now examine the evolution of the average weight in each of the two groups iV" 1 and ~ r 2 and put jpv
1 ~ - N 1 LT Ji,
1 j~v =N22 ~
i 1
,/l.
(56)
iGAr2
As long as lower and upper bounds do not influence the dynamics, the corresponding rates of change are determined by (47),
d (J~ 'v) dt
j~,v
(bNl+C
(1) - a
1
+
bN1
bN2 (b + Q)N2 + c
)(jpv) \ J~V ,
(57)
804
J . L . van H e m m e n
where we have put b " - b + Qll and Q " - Q 2 2 - Qll > 0; the inequality is by assumption. Obtaining an explicit solution to the above equation is straightforward once we realize its relation to a q u a n t u m spin 1/2. The matrix M appearing on the right in (57) is a linear combination M - n01] + n. ~ of the unit matrix 11 and the Pauli spin matrices [88, Section 13.6], (Yx--
(0,) 1
0
'
CYv--
(0 i ) ( , i
0
'
(Y:--
0
-
0)
"
Here i - ~ 1 , the center dot in n. o denotes a scalar product, and n E ~3 if and only if the matrix M is Hermitian. If we want to use the Duhamel formula (53) - we d o - we need to compute exp(tM). This is easy since simple algebra based on ~ c y v - ic~: et cycl. or any decent q u a n t u m mechanics book [88, Section 13.6] shows that (n. g)2 _ (n. n)l] and consequently exp(tM)
-
[ sinh( nx/-fi-~ t) e""' cosh( nv/n~n t) + v/ft. n
1.
(59)
Once M is diagonable the rest of the game is computing two eigenvalues and the corresponding eigenvectors; biorthogonality as treated in Footnote 6 is helpful. It is however simpler, and also more physical, to exploit the fact that both N 1 / N and N 2 / N are O(1) and take them equal, i.e., Nl = N 2 - N / 2 . Then M is Hermitian (here real and symmetric) and n c ~3 so that we can write n - nfi with n being the length v/ft 9n of the vector n and fi being a unit vector. In physical terms, ft. o is the projection of the spin onto the direction ft. Furthermore, n o - ( b + Q/2)NI + c, n - (bN1, O , - Q N 1 / 2 ) , and the eigenva|ues of M are m~ " - no 4- n. As Eq. (59) shows explicitly, the latter result also determines the asymptotics for complex n. Keeping (53) and (54) in mind, we can now exploit (59). The eigenstates of M are identical with those of fi-~, the projection of the 'spin' ~ onto the direction fi; the corresponding eigenprojections are le+)(e• = (1/2)(11 4- ft. ~). A simple computation gives m~: " - no 4- n - (b + Q/2)N1 + c • Nl [b2 + Q2/4]1/2. Since Q > 0 by assumption, both eigenvalues m+ are positive, if b > 0 and N~ >> 1 so that c is subdominant; we can use c for fine-tuning, however. If on the other hand b < 0, then m_ < 0 but m+ > 0. In both cases the eigenstate e+ belonging to m+ > 0 is dominant. As we are given [e+)(e:~[, the source term a, and the initial condition d(0), we know what the asymptotics looks like; cf. (54). Fig. 9 shows the result of a realistic simulation, a nice academic exercise: 0 < J;(0) - J" for all i. Synapses that are "more equal" than others, win. We can now easily understand why. In the present case a = a l , J(0)-J~l, and j u + a/m+ > 0 so that the vector 1+ := ]e+)(e+ll ) - (1 + (N1/n)(b - Q/2), 1 + ( N 1 / n ) ( b + Q / 2 ) ) T tells us what will happen; (N1/n) does not depend on N1. A nontrivial structure occurs only if b < 0 since the first component of 1+ is then negative whereas the second is positive. This is the case in Fig. 9, where b < 0. For b > 0 both components of the vector 1+ are positive and a trivial saturation occurs.
Theory of synaptic plasticity jU
805
t=103S
104s
2.93xl 04
7xl 04s
Ji
0.08
0 lllll'llllllllllllllllllllll'llllllllllll~l[lllillil:iif~[~iill~lll[l[~]~ 7TTT:I[[[ I 50
0.06
==i'7:7"=i
jav
0.04
0.02
0
2X104 i
.
-
,
4X104 t[s]
6X104
Fig. 9. Temporalevolution of the average synaptic efficacies j~v and j~v as defined in (56), and jay = (j~tv + J~ )/2 as a function of the learning time t in units of 10 4 s. This is a fictitious time to keep the computational time finite. It is in general too fast for biology but can be adapted to it by a simple rescaling without changing the picture. The quantity jav is the average weight of all synapses, jpv and j~v are average weights of the groups Y l and Y 2 , respectively. Synapses i in group ~/~1, where 1 ~ 0 and is happening on a time scale that is two orders of magnitude slower than that of the fast relaxation. Now synaptic efficacies saturate at the upper bound (j~v) or at the lower bound (jpv). The insets show the weight distributions at times t = 103, 104, 2.93 x 104, and 7 x 104 s (arrows). Taken from [47].
A careful l o o k at the right inset o f Fig. 9 reveals, h o w e v e r , t h a t the a g r e e m e n t b e t w e e n t h e o r y a n d e x p e r i m e n t is n o t perfect. W h y is that? T o see why, we r e t u r n to (45) a n d a d a p t it to the p r e s e n t situation,
dt Ji - a + b
JJ
+ cJi + QSi,~2 Z
JJ
(60)
jEJU2
with 8;,w 2 = 1 if i E #[/'2 a n d 8i,w2 = 0 otherwise. W e s t a r t e d with N1 = N2 = N / 2 , which h o l d s as long as n o n e o f the Ji has a t t a i n e d the u p p e r or the lower b o u n d , viz., ju or 0. As s o o n as this h a p p e n s , say for i = io E ~U1, we t a k e i = i0 o u t o f (60) b u t not o u t o f ~U1; the p r o c e d u r e m u s t be r e p e a t e d e a c h time a J / t o u c h e s o n e o f the b o u n d a r i e s . F u r t h e r m o r e , N1 a n d N2 n o w b e c o m e d y n a m i c variables c o u n t i n g the
806
J.L. van Hemmen
number of active Ji in A/~l and Y2, respectively, with in general N1 ~ N2. Eqs. (57) and (59) still hold as long as none of the active Ji hits one of the boundaries, but n and no change continuously, as do the Ng in both (56) and (57). Moreover, 1_ (the analog of 1+) as well as a :/: 0 may, and in general will, influence the dynamics. Hence the exact dynamics given by (60) approximates but is not identical with the one given a b o v e - as advertised. In fact, the exclusion process continues until N1 : N2 : 0, beyond which nothing changes any more.
6. Short-term synaptic plasticity Despite being of'short' duration, short-term synaptic plasticity may have profound effects on network behavior and is, in fact, closely correlated with it. We therefore start by outlining the problem and analyzing a simple model of short-term plasticity that is an adaptation of the model of Tsodyks and Markram [65,66] to the SRM (see Section 2). We then specify how the synaptic efficacies change as a function of presynaptic i n p u t - and time. The resulting setup allows a full-blown study of network behavior. 6.1. The problem
Short-term synaptic plasticity is to be contrasted with its long-term counterpart in that it refers to a change in the synaptic efficacy on a time scale of milliseconds up to seconds. It is therefore natural to inquire whether and to what extent this has functional consequences, and to elucidate the underlying mechanisms [65-71]. The experimental observation underpinning short-term synaptic plasticity is the fact [72,89-91] that the transmission of an action potential across a synapse can have a significant influence on the amplitude of the postsynaptic potential (PSP) evoked by subsequently transmitted spikes. In some synapses, the height of the postsynaptic potential is increased by spikes that have arrived previously (short-term facilitation, STF; also called paired-pulse facilitation). In other synapses, the postsynaptic potential is decreased by previously arrived action potentials (short-term depression, STD; also called paired-pulse depression). Short-term synaptic plasticity, or simply short-term plasticity, is different from its well-known counterpart "long-term plasticity" in at least two crucial points. First, nomen est omen, the time scale on which short-term plasticity operates is much shorter than that of long-term plasticity and may be well comparable to the time scale of the network dynamics. Second, short-term plasticity of a given synapse is driven by correlations in the incoming spike train (presynaptic correlations), whereas classical long-term plasticity is driven by correlations of both pre- and postsynaptic activities; a prominent example of the latter is Hebbian learning as studied in the previous sections. 6.2. Modeling short-term synaptic plasticity
Modeling short-term plasticity is based on the idea that some kind of 'resources' is required to transmit an action potential across the synaptic cleft [73,74,66-68]. The
Theory of synaptic plasticity
807
term 'resource' can be interpreted as the available amount of neurotransmitter, some kind of ionic concentration gradient, or postsynaptic receptor availability; cf. Fig. 1. We assume that every transmission of an action potential affects the amount of available synaptic resources and, on the other hand, that the amount of available resources determines the efficiency of the transmission and therefore the maximum of the postsynaptic potential. There is meanwhile considerable evidence [4,89,91] that short-term plasticity is due to presynaptic effects and, hence, to presynaptic correlations only. We can think of presynaptic 'resources' as, e.g., the number of vesicles that determines the release probability [89]. The relevant notion is then the probability p as it occurs in the mean synaptic response N? -- npQ; see Section 1. We are going to discuss short-term plasticity in the context of the SRM (see Section 2). In so doing we closely follow Ref. [77]. It will turn out that the spikeresponse formalism is very convenient in deriving closed expressions for synaptic efficacies as a function of spike arrivals and time. The time-dependent synaptic efficacy Ji;(t) is a function that depends both on time and on the moments of arrival of the spikes from neuron j. This function will be computed in the next subsections.
6.3. Modeling short-term depression The model of Tsodyks and M a r k r a m [66] assumes three possible states for the "resources" of a synaptic connection: effective, inactive, and recovered. Whenever an action potential arrives at a synapse a fixed portion R of the recovered resources becomes first effective, then inactive, and finally recovers. Transitions between these states are described via first-order kinetics using time constants Tinact and ~rec. The actual postsynaptic current is proportional to the amount of effective resources. In the context of the SRM the above three-state model can be simplified further since the time course of the postsynaptic current, as it is described by the transition from the effective to the inactive state, is already taken care of by the form of the PSP given by the response function J~. Focusing on a specific synapse {ij}, we drop its label. The only relevant quantity is the maximum (minimum) J determined by the charge delivered by a single action potential. As we have seen in Section 1, a synaptic efficacy J can be interpreted as its mean (J). We henceforth drop the alternative 'minimum' that takes care of an inhibitory postsynaptic potential and assume an excitatory one, the modifications for inhibition being evident. Transitions from the effective and the inactive to the recovered state are described by linear differential equations. The maximum of a PSP only depends on the amount of resources that are actually activated by the incoming action potential. We therefore summarize the 2-step recovery of effective resources (inactive and recovered) by a single step and end up with a 2-state model of active (Z) and inactive (Z) resources; see Fig. 10. Each incoming action potential instantaneously switches a proportion 0 ~
J.L. van Hemmen
808
A
PS(t)
Z
STD
Z
1/ x
B
11~
A
S'I'F
A
R S(t)
Fig. 10. Schematic representation of the present model of short-term depression (A) and short-term facilitation (B). With short-term depression, every incoming action potential instantaneously switches a proportion 0 ~
dZ
d---t = - P Z S ( t ) + "c-lZ,
2-
1 -Z
(61)
with S(t) = ~ _ , / 8 ( t - tf) as the incoming spike train. This differential e q u a t i o n is well defined, if we declare Z(t) to be c o n t i n u o u s f r o m the left, i.e., Z(tf) := Z(tf - 0). The solution Z(t) is in the interval [0, 1]. The a m o u n t o f charge that is released in a single transmission and therewith the m a x i m u m o f a PSP d e p e n d s on the a m o u n t o f resources that were switched to the inactive state, or, equivalently, on the a m o u n t o f active resources immediately before the transmission. The strength o f the synapse at time t is then a function o f Z(t) and we simply put J(t) = J~ where j 0 is the m a x i m a l m e a n synaptic efficacy. Let us n o w suppose that the first spike arrives at a synapse at time to. Immediately before the spike arrives, all resources are in their active state and Z(to) = 1. The action potential switches a fraction P of the resources to the inactive state so that Z(to + O ) - 1 - P. After the arrival o f the action potential the inactive resources recover exponentially fast in t, a n d we have
Z(t > to) = 1 - P e x p [ - ( t -
t0)/~].
(62)
At the arrival time tl o f the s u b s e q u e n t spike there are only Z(tl) resources in the active state and the PSP is depressed accordingly. T o see h o w to proceed, we integrate (61) between t f - At a n d tf + At so as to obtain Z(tf § At) - Z(tf - At) = - P Z(t/) + O(At), and take the limit At ~ 0. Since Z(t) is c o n t i n u o u s f r o m the left we find Z(tf + O) - Z ( t f ) = - P Z ( t f ) a n d hence
z(t:
+ o) -
(1 -
P)z(t:).
Between two spikes Eq. (61) reads d Z / d t = - d ( 1 - Z ) / d t = x - l ( 1 - Z ) , (1 - Z)(t2) - exp[-(t2 - tl)/r](1 - Z)(q). If, now, tl a p p r o a c h e s (which will also be called tl) f r o m above, then we get
(63) whence a firing time
Theory of synaptic plasticity
809
Z(t2) - 1 - [1 - Z(tl + 0)] exp[-(t2 - tl)/~] = 1 - [1 - (1 - P ) Z ( t l ) ] e x p [ - ( t 2
- t,)/'c].
(64)
In the transition from the first to the second line we have exploited (63). F r o m the first few examples we can easily read off a recurrence relation that relates the a m o u n t of active resources immediately before the nth-spike to that of the previous spike,
Z(to)-
1,
Z(tl) -- 1 - P e x p [ - ( t l - to)/~], Z(t2)
-
-P)Z(tl)]exp[-(t2-
1 -[1 -(1
-
Z(tn) -- 1 - [ 1 - (1 - P ) Z ( t n _ , ) ] e x p [ - ( t n
tl)/'~],
(65)
- tn-,)/'C].
In passing we note that, instead of Z(to) - 1 we could have taken any desired initial condition 0 < Z0 ~< 1. The ensuing a r g u m e n t does not change. The recurrence relation (65) is of the form
Z(t,) -- a, + bnZ(t,_,)
(66)
with
a,-
1-exp[-(t,
- t,_,)/'c],
b,-
(1-P)exp[-(t,-t,_l)/'c].
(67)
Recursive substitution and a short calculation yield the following explicit expression for the a m o u n t of active resources"
z(t,) - . ,
+ b,Z(tn_ )
=.,
+ b,[a,_. +
= a~ + b,a,_l + b,b,_la,_2 + . . . cx~
k-1
k=0
j=0
:
(3O
: Z
an-k(1 -- p)k exp[-(t~ - t,-k)/~]
k=O p
= 1
oo
1 -P Z(1
--p)kexp[--(t" -- t~-k)/~].
(68)
k=l
The synaptic efficacy at time 9.. < t,_2 < t,_l < t is given by
J(t; tn-1, t,-2, . . .) -- jO { 1
t as
a function
P
1 - P Z(1 k--1
This is a key result to w h a t follows.
of the
spike
- p)k e x p [ - - ( t - tn_k)/T.]
arrival
} 9
times
(69)
810
J . L . van H e m m e n
6.4. Periodic input The synaptic efficacy J is a nonlinear function of the spike arrival times tf. We can give a simplified expression for J in the case of a sudden onset of periodic spike input. Let t , - nT for n>~0 and t , - - ~ for n < 0. We obtain from (68) for n>0, n
Z(t.) = 1
P Z( 1 _ p)kexp[_kT/r] 1 - P k=l
P --1-er/~-(1-P)
{1-
[(1-P)e-
r/t]
"}.
(70)
The behavior of Z(tn) for large n can be read off easily from the above equation. Since 0 < e-r/~(1 - P ) < 1, the braced expression converges to unity exponentially fast and the rest, which is independent of n, gives the asymptotic value of Z(t,) as /'/ -----~ O O .
6.5. Modeling short-term facilitation In a similar fashion to Section 6.3, we can devise a model that accounts for shortterm facilitation instead of depression. To this end, we assume that in the absence of presynaptic spikes the fraction A(t) of active synaptic resources decays with time constant ~. Each incoming spike recruits a fraction, or ratio, 0~
A = 1 -A
(71)
with S(t) - ~ f 8(t - tr) as the incoming spike train and A(t) being continuous from the left. Magleby and Zengle [74] used a similar model to describe synaptic potentiation at a frog neuromuscular junction. For a discrete set of spike arrival times t f - to, t~,.., the amount of effective synaptic resources immediately before the nth spike as a function of that before the previous spike is A(t,) = a, + bnA(t,_l),
(72)
where a. = R e x p [ - ( t , . -
t._,)/r],
b. = (1 - R) exp[-(t,. - t._,)/'c]
(73)
are nearly identical with their companions in (67). In a similar way to (68), we obtain an explicit expression for the amount of effective resources. We adopt a simple linear dependence of the synaptic efficacy J upon the amount of effective resources A of the form J - - J ~ 0 ~
Theory of synaptic plasticity
A
811
STD, P-0.1
B
1.5
STD, P=0.9
1.5
~" 1.
~
"~ 0.5
"~ 0.5 i
i
i
1.
i
0
20 40 60 80 100 120 t/ms C
t/ms D
STF, R=0.2
1.5 <
20 40 60 80 100 120
STF, R-0.8
1.5
1.
~
1.
,,..r,
> 0.5
> 0.5 0
20 40 60 80 100 120
t/ms
0
20 40 60 80 100 120 t/ms
Fig. 11. Membrane potential v (solid line) and synaptic resources Z or A (dashed line) as a function of time in case of short-term depression (A and B) and facilitation (C and D). Spikes arrive at t - 0, 8, 16,..., 56 ms, and finally at t - 100 ms. In all figures, the time constant of the synaptic recovery is ~ = 50 ms and the rise time of the EPSP equals 5 ms; cf. (61), (71), and (6). Both Z and A are numbers between 0 and 1; Z starts at 1.0, A at A0 = 0.1. In (A), only a small portion P = 0.1 of all available resources is used during a single transmission so that the synapse is only slightly affected by transmitter depletion. In (B), the parameter P is increased to P = 0.9. This results in a pronounced short-term depression of the synaptic strength. Short-term facilitation is illustrated in the lower two diagrams for R = 0.2 (C) and R = 0.8 (D). Taken from [77].
J(t;tn-l,tn-2,...) __jo Ao + (1 - A 0 ) . 1 _ R Z ( 1
-R) k exp[-(t-t._~)/~]
.
k=l
(74) In the case of periodic input with tn = nT for n ~>0 a n d tn = - o c for n < 0 the a b o v e e q u a t i o n reduces to the facilitation a n a l o g of (70), R
+ l1
Ill
1}
This implies t h a t as n ~ oc the synaptic efficacy converges exponentially fast f r o m below to the a s y m p t o t i c value jSTF _ j 0 [A
(1 - A0)R
0+
il
]
(76)
812
J.L. van H e m m e n
Short-term plasticity introduces a second time scale into the dynamics of a neuronal network. An analysis of its implications for a homogeneous network of excitatory neurons, the simplest possible case, and simulations showing intricate network behavior despite the apparent structural simplicity, can be found elsewhere [77]. In a similar vein, Buonomano [92] presents numerical evidence suggesting that shortterm plasticity plays a role in neuronal decoding of temporal information.
7. Conclusion and open problems
We began this chapter with a provocative question, "What is synaptic plasticity?". We end it with a more practical one, "What induces synaptic plasticity?", knowing that time scale and order of neuronal events may, and often do, play a key role. Hebbian learning is like a game involving three contestants: a presynaptic neuron, a postsynaptic neuron and a synapse between them. The two neurons interact via synaptic events induced by all-or-none depolarizations of their membranes, beginning in the presynaptic cell and propagating along an axon to synaptic sites on the postsynaptic neuron. This takes a finite amount of time, a delay in the millisecond range. Eventually, the postsynaptic neuron reaches firing threshold, producing an action potential that backpropagates [21] from the site of spike initiation into the synapse. Thus, pre- and postsynaptic cells interact at the synapse through a learning window that relates their spike timing to an increase or decrease in synaptic efficacy. If precise timing is not important, the learning window is broad, and rate coding is a simple consequence. Axonal delays naturally appear as elements of the learning window. The notion of learning window was first conceived in theory [45] but has since been extensively confirmed by experiment; see for instance Fig. 7. This experimental evidence emphasizes milli- and sub-millisecond timing as being of fundamental importance in the three-partner game. Conceptually, we can often think of learning as something that occurs in infinitesimal steps, incremental increases and decreases in synaptic efficacy that, as we have shown, lend themselves to a rigorous mathematical treatment culminating in the learning Eq. (32). Though single synapses may behave quite erratically, we have seen that an ensemble of them provides a neuron with a practically deterministic input due to the strong law of large numbers, a mainstay of stochastic analysis. The ensemble of synapses decides whether the postsynaptic neuron will generate an action potential and, subsequently, a backpropagating spike. Since spike generation is highly nonlinear, we have introduced the notion of 'Poisson neuron' to linearize the dynamics of the ensemble of synaptic efficacies. By applying the central limit theorem to the presynaptic input, one can replace the Poisson neuron by a Poissonian counterpart of arbitrary nonlinearity [64] and none the less solve the dynamics nearly exactly; cf. the Berry-Esseen estimate in Appendix B. The key to unraveling the synaptic dynamics of infinitesimal learning is recognizing that, under normal circumstances, Eq. (32) is self-averaging. Nevertheless, it remains a challenge to use a deterministic, hence nonlinear, neuron model to solve
Theory of synaptic plasticity
813
the learning Eq. (32). It may well be that neuronal dynamics in conjunction with synaptic dynamics is an insoluble problem, but the prospect of discovering a full theoretical understanding of both long- and short-term synaptic plasticity is a prize worthy of the attempt.
Abbreviations Ca 2+, Calcium ion CPU, Central Processing Unit EPSP, Excitatory Post-Synaptic Potential IPSP, Inhibitory Post-Synaptic Potential ITD, Interaural Time Difference kHz, kilo Herz K +, Potassium ion LTP, Long-Term Potentiation Mg 2+, Magnesium ion ms, millisecond nm, nanometer Na+, Sodium ion NMDA, N-methyl-D-aspartate PGO, ponto-geniculo-occipital PSP, Post-Synaptic Potential REM, Rapid Eye Movement SRM, Spike Response Model STD, Short-Term Depression STF, Short-Term Facilitation V, Membrane Potential las, micro-second
Acknowledgments It is a great pleasure to thank Wulfram Gerstner, Richard Kempter, and Werner Kistler for an enjoyable collaboration over the years and for all I have learned from them. I also thank Philip Brownell, Moritz Franosch, Richard Kempter, Christian Leibold, and Heather Read for a critical reading of parts of the manuscript and helpful suggestions; it is the author who is to be blamed for the remaining errors. Finally, I gratefully acknowledge constructive discussions with Frank den Hollander and Reinhard Lang concerning simple but optimal formulations of laws of large numbers.
Appendix A. Laws of large numbers The textbook by Durrett [98] is a general, though advanced, background for various formulations of the laws of large numbers listed below. To begin with, let us suppose that the j~ are independent, identically distributed random variables with mean zero.
814
J . L . van H e m m e n
If the mean (f) is nonzero, we subtract it and consider Z := f - (f) instead. There is no harm in taking the ~ to be real variables. Furthermore, we require the second moment (f2) to be finite. By Cauchy-Schwarz, (]fl) ~< (f2)1/2 < oe, and the variance o2 ._ ( ( f _ (f))2) is finite too. Let
&-~j5 i=1
be the sum of the random variables f.. Then the following three theorems hold: 9 Strong law of large numbers: l i m , _ ~ n-lSn = 0 with probability 1. Since the fare sampled from a probability distribution, this means that, as n ~ oc, the configurations where the above equality does not hold have probability zero. In plain English, they do not occur. One also says that the above equality holds 'almost surely' (a.s.). All that is needed is (Ifl) < oc. 9 Central limit theorem: As n ~ ec, n-1/Zgn has a Gaussian distribution with mean zero and variance cy2. 9 Law of the iterated logarithm: lim sup ]&] = 1 ,~ cyx/2 n In Inn
(a.s.).
Etemadi [99] has given an "elementary" proof of the strong law of large numbers for pairwise independent, identically distributed random variables under the minimal condition (If I) < ~ . Slick proofs (occasionally with some extra conditions, say, finite fourth moment) have been given by Lamperti [11]. Breiman [12] treats the first two theorems in their full generality. The law of the iterated logarithm is an extension of the central limit theorem. Its proof is tricky. All three theorems also hold for independent, not necessarily identically distributed random variables [11,12,101]. The first two even allow a weak dependence. For example, let Rij := (f,-fj) - ~ ) ( f j ) , and suppose the f- do not have too wide a distribution, e.g., supi]R, I < ec. Then the strong law of large numbers holds [96, p. 265; 102], provided Rij ~ 0 as ]i - Jl ---* oe; that is to say, the correlations between f. and s should not have too long a range. For the central limit theorem to hold, trickier conditions are required, e.g., stationarity of the sequence fl,f2, 999and some kind of mixing [98, Chapter 7.7c] so that, say, ~_,j IRij] < oc. Then the variance of the Gaussian limit distribution is given by cr2 -
lim 1
~: ij
k=2
Dropping stationarity, the reader may consult Scott [103] for an advanced account. A generalization of the law of the iterated logarithm to independent but not necessarily identically distributed random variables is this [100, p. 241]. Let cy2 be the variance off~, B 2 - - }--~<,, c~2, and f•/B,, - o ( 1 / v / l n l n B 2 ). Then we have ISnl = 1 lim sup n - ~ B~ v/2 In In B 2
(a.s.).
Theory of synaptic plasticity
815
Appendix B. lnhomogeneous Poisson processes In this appendix, which is identical with Appendix A of Kempter et al. [46] and reproduced here for convenience of the reader, we define and analyze the inhomogeneous Poisson process. This notion has been touched upon by Tuckwell [93, pp. 218-220] and others, e.g., Ash and Gardner [94, pp. 28-29], but neither of them explains the formalism itself or the way of computing expectation values. Since both are used extensively, we treat them here, despite the fact that the issue is considered by Snyder and Miller [95, Sections 2.1-2.3]. Our starting assumptions in handling this problem are the same as those of Gnedenko [96, Section 51] for the homogeneous (uniform) Poisson process but the mathematics is different. Neither does our method resemble the Snyder and Miller approach, which starts from the other end, viz., Eq. (B.11). In the context of theoretical neurobiology an analysis such as the present one, focusing on the local behavior of a process, seems far more natural. We proceed by evaluating the mean and the variance and finish by estimating a third moment that is needed for the Berry-Esseen inequality, which tells us how good a Gaussian approximation to a finite sum of independent random variables is.
B.1. Definitions Let us suppose that a certain event, in our case a spike, occurs at random instances of time. Let N(t) be the number of occurrences of this event up to time t. We suppose that N ( 0 ) = 0, that the probability of getting a single event during the interval It, t + At) with At ~ 0 is Pr{N(t + A t ) - N ( t ) = 1} = k(t)At,
~.~>0
(B.1)
and that the probability of getting two or more events is o(At). Finally, the process has independent increments, i.e., events in disjoint intervals are independent. The stochastic process obeying the above conditions is an inhomogeneous Poisson process. Under conditions on k to be specified below, there are only finitely many events in a finite interval. Hence the process lives on a space E~ of monotonically nondecreasing, piece-wise constant functions on the positive real axis, having finitely many unit jumps in any finite interval. The expectation value corresponding to this inhomogeneous Poisson process is simply an integral with respect to a probability measure !~ on ~, a function space whose existence is guaranteed by the Kolmogorov extension theorem [97, Section 4.4.3]. A specific realization of the process, a function on the positive real axis, is a 'point' o in ~. The discrete events corresponding to co are denoted by tf(co) with f labeling them. As we have seen in Eq. (3), spikes generate postsynaptic potentials ~. We now compute the average, denoted by angular brackets, of the postsynaptic potentials generated by a specific neuron during the time interval [to, t),
816
J.L. van H e m m e n
Here it is understood that tf = tf(co) depends on the realization co and to <~tf(o~) < t. We divide the interval [t0, t) into L subintervals [tt, tl+l) of length At so that at the end At ~ 0 and L ~ ec while LAt = t - to. We now evaluate the integral (B.2) exploiting the fact that ~ is a continuous function. Let --/r <~tf(o~) < tl+l } denote the number of events (spikes) occurring at times t~(o~) in the interval [tl, tt+t) of length At. In the limit At ~ 0 the expectation value (78) can be written (B.3) so that we are left with the Riemann integral
j,}
' ds )~(s)e(t- s).
(B.4)
We spell out why. The function 1]{...} is to be the indicator function of the set {...} in f~; that is, 11{...}(~o) = 1, if o~ c {...} and it vanishes, if co does not belong to {...}. So it 'indicates' where the set {...} lives. With the benefit of hindsight we single out mutually independent sets in f~ with indicators ll{t,~tr(o,)
f
d~t(o~)~-~l{t,<<.tr(o~)
(B.5)
l
Each indicator function in the sum equals "~{tl <~ti(o~)~2} 9
(B.6)
In view of (B.2) and (B.5) we multiply this by ~(t-tl)--/C{tl<<.tf(o3)< tt+l}, interchange integration and summation in (B.5), and integrate with respect to ~t. The first term on the right contributes nothing, the second gives e ( t - tl)X(tt)At and thus produces a term in the Riemann sum leading to (B.4), and the last term can be neglected since it is of order o(At). So the eating of the pudding is that only a single event in the interval [tl, tt+l) counts as At ~ 0. Since e(t) is a function which decreases at least exponentially fast as t ---, oo there is no harm in taking to = - ~ .
B.2. Second moment and variance

It is time to compute the second moment

\[
  \Big\langle \Big[ \sum_f \varepsilon(t - t_f) \Big]^2 \Big\rangle.
\tag{B.7}
\]

Proceeding as before, we write the square as a double sum over the subintervals, exploit the independence of disjoint subintervals for the terms with l ≠ m, and keep for l = m only the single-event contribution, which carries ε²(t − t_l),

\[
\begin{aligned}
  \Big\langle \Big[ \sum_f \varepsilon(t - t_f) \Big]^2 \Big\rangle
  &= \int \mathrm{d}\mu(\omega) \sum_{l,m}
     \mathcal{N}\{t_l \le t_f(\omega) < t_{l+1}\}\,
     \mathcal{N}\{t_m \le t_f(\omega) < t_{m+1}\}\,
     \varepsilon(t - t_l)\,\varepsilon(t - t_m) \\
  &= \sum_{l \ne m} [\lambda(t_l)\Delta t]\,[\lambda(t_m)\Delta t]\,
     \varepsilon(t - t_l)\,\varepsilon(t - t_m)
     + \sum_{l} \lambda(t_l)\Delta t\, \varepsilon^2(t - t_l) + o(1) \\
  &= \int_{t_0}^{t} \mathrm{d}t_1 \int_{t_0}^{t} \mathrm{d}t_2\,
     \lambda(t_1)\,\lambda(t_2)\, \varepsilon(t - t_1)\,\varepsilon(t - t_2)
     + \int_{t_0}^{t} \mathrm{d}s\, \lambda(s)\, \varepsilon^2(t - s) \\
  &= \Big[ \int_{t_0}^{t} \mathrm{d}s\, \lambda(s)\, \varepsilon(t - s) \Big]^2
     + \int_{t_0}^{t} \mathrm{d}s\, \lambda(s)\, \varepsilon^2(t - s).
\end{aligned}
\tag{B.8}
\]
Hence the variance is the last term on the right in (B.8). It is a simple exercise to verify that, when λ(t) ≡ λ and ε(t) ≡ 1 in (B.4) and (B.8), we regain the mean and variance of the usual Poisson distribution [96, Section 51]. We finish the argument by computing the probability of getting k events in the interval [t_0, t). For the usual, homogeneous Poisson process it is

\[
  \Pr\{N(t) - N(t_0) = k\}
  = \exp[-\lambda(t - t_0)]\, \frac{[\lambda(t - t_0)]^k}{k!}.
\tag{B.9}
\]
We now break up the interval [t_0, t) into many subintervals [τ_i, τ_{i+1}) of length Δt and condition with respect to the first, second, ... arrival. The arrivals come one after the other, and the probability of a specific sequence of events in [t_1, t_1 + Δt), [t_2, t_2 + Δt), ..., [t_k, t_k + Δt) is made up of elementary events such as

\[
\begin{aligned}
  \Pr\{\text{first spike in } [t_1, t_1 + \Delta t)\}
  &= \Pr\{\text{no spike in } [t_0, t_1)\}\,\Pr\{\text{spike in } [t_1, t_1 + \Delta t)\} \\
  &= [1 - \lambda(\tau_1)\Delta t]\,[1 - \lambda(\tau_2)\Delta t] \cdots
     [1 - \lambda(t_1 - \Delta t)\Delta t]\; \lambda(t_1)\Delta t \\
  &= \exp\Big[ -\int_{t_0}^{t_1} \mathrm{d}\tau\, \lambda(\tau) \Big]\, \lambda(t_1)\Delta t.
\end{aligned}
\tag{B.10}
\]
Here we have exploited the independent-increments property and taken the limit Δt → 0 to obtain the last equality. Repeating the above argument for the following events, including the no-event tail in [t_k + Δt, t), multiplying the probabilities, and summing over all possible realizations we find

\[
\begin{aligned}
  \Pr\{N(t) - N(t_0) = k\}
  &= \exp\Big[ -\int_{t_0}^{t} \mathrm{d}\tau\, \lambda(\tau) \Big]
     \int_{t_0}^{t} \mathrm{d}t_k\, \lambda(t_k) \cdots
     \int_{t_0}^{t_3} \mathrm{d}t_2\, \lambda(t_2)
     \int_{t_0}^{t_2} \mathrm{d}t_1\, \lambda(t_1) \\
  &= \exp\Big[ -\int_{t_0}^{t} \mathrm{d}\tau\, \lambda(\tau) \Big]\,
     \frac{1}{k!} \Big[ \int_{t_0}^{t} \mathrm{d}s\, \lambda(s) \Big]^k.
\end{aligned}
\tag{B.11}
\]
In other words, N(t) − N(t_0) has a Poisson distribution with parameter ∫_{t_0}^{t} ds λ(s). If λ(s) ≡ λ, one regains (B.9). We now see two things. First, the appropriate condition on λ is that it be locally integrable. Then Pr{N(t) − N(t_0) < ∞} = 1, as the sum of (B.11) over all finite k adds up to one. Furthermore, N(t) − N(t′) with t_0 < t′ < t has a Poisson distribution with parameter ∫_{t′}^{t} ds λ(s). Second, by rescaling time through t ↦ ∫^{t} ds λ(s) one obtains [93,94] a homogeneous Poisson process with parameter λ = 1. This also follows more directly from (77). It is of no practical help, though. For instance, in the case of the barn owl, λ(t) is taken to be a periodic function of t, with the period determined by external sound input. The cochlea produces a whole range of frequency inputs, whereas time can be rescaled only once.
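The Poisson character of the counts stated in (B.11) is also easy to verify numerically. The following minimal sketch, again with an arbitrary periodically modulated rate, generates counts by thinning and compares their empirical distribution with the Poisson law of parameter ∫_{t_0}^{t} ds λ(s).

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(1)

def lam(t):
    # an arbitrary periodically modulated rate (Hz), as for the barn-owl input
    return 20.0 * (1.0 + 0.5 * np.sin(2.0 * np.pi * 5.0 * t))

t0, t, lam_max, trials = 0.0, 1.0, 30.0, 20000

counts = np.empty(trials, dtype=int)
for k in range(trials):
    # thinning: the number of accepted candidates is N(t) - N(t0)
    n_cand = rng.poisson(lam_max * (t - t0))
    cand = rng.uniform(t0, t, size=n_cand)
    counts[k] = np.sum(rng.uniform(0.0, 1.0, size=n_cand) < lam(cand) / lam_max)

# the Poisson parameter of (B.11): the integral of lam over [t0, t)
s = np.linspace(t0, t, 20001)
mu = np.sum(lam(s[:-1])) * (s[1] - s[0])

for k in range(15, 26):
    emp = np.mean(counts == k)
    theo = exp(-mu) * mu ** k / factorial(k)
    print(f"k={k:2d}   empirical {emp:.4f}   Poisson from (B.11) {theo:.4f}")
```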
B.3. Berry-Esseen estimate

Eq. (3) tells us that the neuronal input is a sum of independent, though not necessarily identically distributed, random variables corresponding to 'neighboring' neurons j. Neither independence nor a common distribution is necessary, but both are quite convenient. The point is that, according to the central limit theorem (cf. Appendix A), a sum of N independent random variables* has a Gaussian distribution as N → ∞. In our case N is definitely finite, so the question is: How good is the Gaussian approximation? The answer is provided by a classical, and remarkable, result of Berry and Esseen [11, Section 15]. We first formulate the Berry-Esseen result. Let X_1, X_2, ... be independent random variables with a common distribution having variance σ² and finite third moment. Furthermore, let S_N = Σ_{j=1}^{N} (X_j − ⟨X_j⟩) be the total input, the X_j stemming from neighboring neurons j as given by the right-hand side of (5) with N as the number of synapses, and let Y_σ be a Gaussian with mean 0 and variance σ². Then there is a universal constant C such that

\[
  \sup_x \Big| \Pr\{ S_N/\sqrt{N} \le x \} - \Pr\{ Y_\sigma \le x \} \Big|
  \le \frac{C\, \big\langle |X_1 - \langle X_1 \rangle|^3 \big\rangle}{\sigma^3 \sqrt{N}}.
\tag{B.12}
\]
In the present case, σ² directly follows from (B.8). Computing ⟨|X_1 − ⟨X_1⟩|³⟩ is a bit nasty, but it is simpler, and also more insightful, to estimate the third moment directly by Cauchy-Schwarz so as to get rid of the absolute value,

\[
  \big\langle |X_1 - \langle X_1 \rangle|^3 \big\rangle
  \le \big\langle (X_1 - \langle X_1 \rangle)^2 \big\rangle^{1/2}
      \big\langle (X_1 - \langle X_1 \rangle)^4 \big\rangle^{1/2}.
\tag{B.13}
\]
The first term on the right equals σ, the second is given by

\[
  \big\langle (X_1 - \langle X_1 \rangle)^4 \big\rangle
  = \int_{t_0}^{t} \mathrm{d}s\, \lambda(s)\, \varepsilon^4(t - s) + 3\sigma^4,
\tag{B.14}
\]
* This N directly corresponds with the number of synapses that provide the neuronal input. There is no need to confuse it with the stochastic variable N(t) of the previous section.
where σ² = ∫_{t_0}^{t} ds λ(s) ε²(t − s). Collecting terms we can estimate the right-hand side of (B.12), the precision of the Gaussian approximation being determined by 1/√N as N becomes large.

References

1. Sherrington, C.S. (1897) in: Textbook of Physiology, ed M. Foster. p. 60.
2. Shepherd, G.M. and Erulkar, S.D., Centenary of the synapse: from Sherrington to the molecular biology of the synapse and beyond. Trends Neurosci. 20, 385-392.
3. Shepherd, G.M. ed (1998) The Synaptic Organization of the Brain, 4th Edn. Oxford University Press, New York.
4. Koch, C. (1999) Biophysics of Computation: Information Processing in Single Neurons. Oxford University Press, New York.
5. Abbott, L. and Sejnowski, T.J. eds (1999) Neural Codes and Distributed Representations. MIT Press, Cambridge, MA.
6. Sejnowski, T.J. (1977) Storing covariance with nonlinearly interacting neurons. J. Math. Biol. 4, 303-321; Statistical constraints on synaptic plasticity. J. Theor. Biol. 69, 385-389.
7. Jessell, T.M. and Kandel, E.R. (1993) Synaptic transmission: a bidirectional and self-modifiable form of cell-cell communication. Neuron 10 (Suppl.), 1-30.
8. Faber, D.S., Korn, H., Redman, S.J., Thompson, S.M. and Altman, J.S. eds (1998) Central Synapses: Quantal Mechanisms and Plasticity. HFSP, Strasbourg.
9. Braitenberg, V. and Schüz, A. (1991) Anatomy of the Cortex. Springer, Berlin.
10. Ramón y Cajal, S. (1911) Histologie du Système Nerveux de l'Homme et des Vertébrés. Maloine, Paris. This is L. Azoulay's definitive French translation of the Spanish original. There is meanwhile also an English translation by N. & L.W. Swanson (Oxford University Press, 1995), based on the French one.
11. Lamperti, J. (1966) Probability. Benjamin, New York; a 2nd Edn. (1996) appeared with Wiley, New York.
12. Breiman, L. (1968) Probability. Addison-Wesley, Reading, MA; a 2nd Edn. appeared with SIAM, Philadelphia, PA (1996). Lamperti and Breiman are both classics, the latter being much more advanced.
13. Levitan, I.B. and Kaczmarek, L.K. (1997) The Neuron: Cell and Molecular Biology, 2nd Edn. Oxford University Press, Oxford.
14. Kandel, E.R., Schwartz, J.H. and Jessell, T.M. (2000) Principles of Neural Science, 4th Edn. McGraw-Hill, New York; see, in particular, Ch. 14 on neurotransmitter release.
15. Faber, D.S. and Korn, H. eds (1978) Neurobiology of the Mauthner Cell. Raven Press, New York.
16. Korn, H. and Faber, D.S. (1996) Escape behavior - brainstem and spinal cord circuitry and function. Curr. Opinion Neurobiol. 6, 826-832.
17. Kruk, P.J., Korn, H. and Faber, D.S. (1997) The effects of geometrical parameters on synaptic transmission: a Monte Carlo simulation study. Biophys. J. 73, 2874-2890, and references quoted therein.
18. Engert, F. and Bonhoeffer, T. (1999) Dendritic spine changes associated with hippocampal long-term synaptic plasticity. Nature 399, 66-70; Bonhoeffer, T., private communication.
19. Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1986) Numerical Recipes. Cambridge University Press. There are numerous editions, also for different programming languages.
20. White, J.A., Rubinstein, J.T. and Kay, A.R. (2000) Channel noise in neurons. Trends Neurosci. 23, 131-137.
21. Stuart, G., Spruston, N., Sakmann, B. and Häusser, M. (1997) Action potential initiation and backpropagation in neurons of the mammalian CNS. Trends Neurosci. 20, 125-131.
22. Hille, B. (1992) Ionic Channels of Excitable Membranes, 2nd Edn. Sinauer, Sunderland, MA.
23. Keener, J. and Sneyd, J. (1998) Mathematical Physiology. Springer, New York.
24. Cronin, J. (1987) Mathematical Aspects of Hodgkin-Huxley Neural Theory. Cambridge University Press, Cambridge.
25. Gerstner, W. and van Hemmen, J.L. (1992) Associative memory in a network of 'spiking' neurons. Network: Comput. Neural Syst. 3, 139-164.
26. Kistler, W., Gerstner, W. and van Hemmen, J.L. (1997) Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Comput. 9, 1015-1045.
27. Gerstner, W. (2000) A Framework for Spiking Neuron Models: The Spike Response Model, Chapter 12 in this book.
28. Hebb, D.O. (1949) The Organization of Behavior. Wiley, New York.
29. Orbach, J. (1998) The Neuropsychological Theories of Lashley and Hebb. University Press of America, Lanham, MD. This book gives a fascinating account of the indirect interaction between Hebb and K.S. Lashley, his thesis advisor.
30. Herz, A.V.M., Sulzer, B., Kühn, R. and van Hemmen, J.L. (1988) The Hebb rule: storing static and dynamic objects in an associative neural network. Europhys. Lett. 7, 663-669.
31. Herz, A.V.M., Sulzer, B., Kühn, R. and van Hemmen, J.L. (1989) Hebbian learning reconsidered: Representation of static and dynamic objects in associative neural nets. Biol. Cybern. 60, 457-467.
32. Van Hemmen, J.L., Gerstner, W., Herz, A.V.M., Kühn, R. and Vaas, M. (1990) Encoding and decoding of patterns which are correlated in space and time. in: Konnektionismus in Artificial Intelligence und Kognitionsforschung, ed G. Dorffner. pp. 153-162, Springer, Berlin.
33. Hopfield, J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554-2558.
34. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1985) Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett. 55, 1530-1533.
35. Amit, D.J., Gutfreund, H. and Sompolinsky, H. (1987) Statistical mechanics of neural networks near saturation. Ann. Phys. (NY) 173, 30-67.
36. Van Hemmen, J.L. and Kühn, R. (1991) Collective phenomena in neural networks, in: Models of Neural Networks I, eds E. Domany, J.L. van Hemmen, and K. Schulten. pp. 1-105, Springer, Berlin; 2nd Edn. (1995), pp. 1-113.
37. Kühn, R. and van Hemmen, J.L. (1991) Temporal association, in: Models of Neural Networks I, eds E. Domany, J.L. van Hemmen, and K. Schulten. pp. 213-280, Springer, Berlin; 2nd Edn. (1995), pp. 221-288.
38. Hobson, J.A. and McCarley, R.W. (1977) The brain as a dream state generator: an activation-synthesis hypothesis of the dream process. Am. J. Psychiatry 134, 1335-1348.
39. Crick, F. and Mitchison, G. (1983) The function of dream sleep. Nature 304, 111-114.
40. Hopfield, J.J., Feinstein, D.I. and Palmer, R.G. (1983) 'Unlearning' has a stabilizing effect in collective memories. Nature 304, 158-159.
41. Van Hemmen, J.L. (1997) Hebbian learning, its correlation catastrophe, and unlearning. Network: Comput. Neural Syst. 8, V1-V17, and 9 (1998) 153.
42. Gerstner, W., Ritz, R. and van Hemmen, J.L. (1993) Why spikes? Hebbian learning and retrieval of time-resolved excitation patterns. Biol. Cybern. 69, 503-515.
43. Ritz, R., Gerstner, W., Fuentes, U. and van Hemmen, J.L. (1994) A biologically motivated and analytically soluble model of collective oscillations in the cortex: II. Application to binding and pattern segmentation. Biol. Cybern. 71, 349-358.
44. Carr, C.E. and Konishi, M. (1990) A circuit for detection of interaural time differences in the brain stem of the barn owl. J. Neurosci. 10, 3227-3246.
45. Gerstner, W., Kempter, R., van Hemmen, J.L. and Wagner, H. (1996) A neuronal learning rule for sub-millisecond temporal coding. Nature 383, 76-78.
46. Kempter, R., Gerstner, W., van Hemmen, J.L. and Wagner, H. (1998) Extracting oscillations: Neuronal coincidence detection with noisy periodic spike input. Neural Comput. 10, 1987-2017.
47. Kempter, R., Gerstner, W. and van Hemmen, J.L. (1999) Hebbian learning and spiking neurons. Phys. Rev. E 59, 4498-4514. Several considerations in Sections 4 and 5 stem from this paper.
48. Markram, H., Lübke, J., Frotscher, M. and Sakmann, B. (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213-215.
49. Heerema, M. and van Leeuwen, W.A. (1999) Derivation of Hebb's rule. J. Phys. A: Math. Gen. 32, 263-286.
50. Konishi, M. (1993) Listening with two ears. Sci. Am. 268 (4), 34-41.
51. Buonomano, D.V. and Merzenich, M.M. (1998) Cortical plasticity: from synapses to maps. Annu. Rev. Neurosci. 21, 149-186.
52. Fitzsimonds, R.M. and Poo, M.-M. (1998) Retrograde signaling in the development and modification of synapses. Physiol. Rev. 78, 143-170.
53. Urban, N.N. and Barrionuevo, G. (1996) Induction of Hebbian and non-Hebbian mossy fiber long-term potentiation by distinct patterns of high-frequency stimulation. J. Neurosci. 16, 4293-4299.
54. Christofi, G., Nowicky, A.V., Bolsover, S.R. and Bindman, L.J. (1993) The postsynaptic induction of nonassociative long-term depression of excitatory synaptic transmission in rat hippocampal slices. J. Neurophysiol. 69, 219-229.
55. Zhang, L.I., Tao, H.W., Holt, C.E., Harris, W.A. and Poo, M.-m. (1998) A critical window for cooperation and competition among developing retinotectal synapses. Nature 395, 37-44.
56. Bi, G.-Q. and Poo, M.-M. (1998) Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18, 10464-10472.
57. Debanne, D., Gähwiler, B.H. and Thompson, S.M. (1998) Long-term synaptic plasticity between pairs of individual CA3 pyramidal cells in rat hippocampal slice cultures. J. Physiol. 507, 237-247.
58. Bi, G.-Q. and Poo, M.-M. (1999) Distributed synaptic modification in neural networks induced by patterned stimulation. Nature 401, 792-796.
59. Linden, D.J. (1999) The return of the spike: postsynaptic action potentials and the induction of LTP and LTD. Neuron 22, 661-666. This is a nice review of the neurobiology underlying the present theory of time-resolved Hebbian learning.
60. Caillard, O., Ben-Ari, Y. and Gaiarsa, J.-L. (1999) Mechanisms of induction and expression of long-term depression at GABAergic synapses in the neonatal rat hippocampus. J. Neurosci. 19, 7568-7577.
61. Sanders, J.A. and Verhulst, F. (1985) Averaging Methods in Nonlinear Dynamical Systems. Springer, New York.
62. Verhulst, F. (1996) Nonlinear Differential Equations and Dynamical Systems, 2nd Edn. Springer, Berlin. Chapter 11 presents an excellent introduction to the method of averaging.
63. Kistler, W.M. and van Hemmen, J.L. (2000) Modeling synaptic plasticity in conjunction with the timing of pre- and postsynaptic action potentials. Neural Comput. 12, 385-405.
64. Leibold, C., Kempter, R. and van Hemmen, J.L. (2000) How synapses of spiking neurons learn a temporal-feature map. Preprint TUM, Garching.
65. Markram, H. and Tsodyks, M. (1996) Redistribution of synaptic efficacy between neocortical pyramidal neurons. Nature 382, 807-810.
66. Tsodyks, M. and Markram, H. (1997) The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proc. Natl. Acad. Sci. USA 94, 719-723.
67. Abbott, L.F., Varela, J.A., Sen, K. and Nelson, S.B. (1997) Synaptic depression and cortical gain control. Science 275, 220-224.
68. Varela, J.A., Sen, K., Gibson, J., Fost, J., Abbott, L.F. and Nelson, S.B. (1997) A quantitative description of short-term plasticity at excitatory synapses in layer 2/3 of rat primary visual cortex. J. Neurosci. 17, 7926-7940.
69. Senn, W., Segev, I. and Tsodyks, M. (1998) Reading neuronal synchrony with depressing synapses. Neural Comput. 10, 815-819.
70. Senn, W., Tsodyks, M. and Markram, H. (2001) An algorithm for modifying neurotransmitter release probability based on pre- and post-synaptic spike timing. Neural Comput. 13, 35-67.
71. Tabak, J., Senn, W., O'Donovan, M.J. and Rinzel, J. (2000) Modeling of spontaneous activity in developing spinal cord using activity-dependent depression in an excitatory network. J. Neurosci. 20, 3041-3056.
72. Zucker, R.S. (1989) Short-term synaptic plasticity. Annu. Rev. Neurosci. 12, 13-31.
73. Liley, A.W. and North, K.A.K. (1953) An electrical investigation of effects of repetitive stimulation on mammalian neuromuscular junctions. J. Neurophysiol. 16, 509-527.
74. Magleby, K.L. and Zengel, J.E. (1975) A quantitative description of tetanic and post-tetanic potentiation of transmitter release at the frog neuromuscular junction. J. Neurophysiol. 245, 183-208.
75. Gerstner, W. and van Hemmen, J.L. (1993) Coherence and incoherence in a globally coupled ensemble of pulse emitting units. Phys. Rev. Lett. 71, 312-315.
76. Gerstner, W., van Hemmen, J.L. and Cowan, J.D. (1993) What matters in neuronal locking? Neural Comput. 8, 1689-1712.
77. Kistler, W.M. and van Hemmen, J.L. (1999) Short-term synaptic plasticity and network behavior. Neural Comput. 11, 1579-1594.
78. Hale, J.K. and Koçak, H. (1991) Dynamics and Bifurcations. Springer, Berlin; Section 8.6 gives a simple introduction.
79. Hale, J.K. (1963) Oscillations in Nonlinear Systems. McGraw-Hill, New York; republished by Dover, Mineola, NY, 1992.
80. Linsker, R. (1986) From basic network principles to neural architecture: emergence of spatial-opponent cells. Proc. Natl. Acad. Sci. USA 83, 7508-7512.
81. Sejnowski, T.J. and Tesauro, G. (1989) The Hebb rule for synaptic plasticity: algorithms and implementations, in: Neural Models of Plasticity: Experimental and Theoretical Approaches, Chapter 6, eds J.H. Byrne and W.O. Berry. pp. 94-103, Academic Press, San Diego, CA.
82. Kohonen, T. (1984) Self-Organization and Associative Memory. Springer, Berlin.
83. Carr, C.E. (1993) Processing of temporal information in the brain. Annu. Rev. Neurosci. 16, 223-243. This wonderful review has been written with great didactic care. It is strongly recommended.
84. MacKay, D.J.C. and Miller, K.D. (1990) Analysis of Linsker's application of Hebbian rules to linear networks. Network: Comput. Neural Syst. 1, 257-297.
85. Miller, K.D. and MacKay, D.J.C. (1994) The role of constraints in Hebbian learning. Neural Comput. 6, 100-126.
86. Kempter, R., Gerstner, W. and van Hemmen, J.L. (2000) Intrinsic rate normalization in spike-based and rate-based Hebbian learning. Neural Comput., submitted.
87. Bellman, R. (1970) Introduction to Matrix Analysis, 2nd Edn. McGraw-Hill, New York.
88. Merzbacher, E. (1961) Quantum Mechanics. Wiley, New York. The chapter on the spin, where one can find everything on Pauli spin matrices, is strongly recommended reading.
89. Dobrunz, L.E. and Stevens, C.F. (1997) Heterogeneity of release probability, facilitation, and depletion at central synapses. Neuron 18, 995-1008. This paper contains a nice appendix on underlying stochastic notions such as the relation between the readily releasable pool size and the release probability.
90. Dobrunz, L.E., Huang, E.P. and Stevens, C.F. (1997) Very short-term plasticity in hippocampal synapses. Proc. Natl. Acad. Sci. USA 94, 14843-14847.
91. Dobrunz, L.E. and Stevens, C.F. (1999) Response of hippocampal synapses to natural stimulus patterns. Neuron 22, 157-166.
92. Buonomano, B.V. (2000) Decoding temporal information: A model based on short-term synaptic plasticity. J. Neurosci. 20, 1129-1141.
93. Tuckwell, H.C. (1988) Introduction to Theoretical Neurobiology, Vol. 2. Cambridge University Press, New York.
94. Ash, R.B. and Gardner, M.F. (1975) Topics in Stochastic Processes. Academic Press, New York.
95. Snyder, D.L. and Miller, M.I. (1991) Random Point Processes in Time and Space, 2nd Edn. Springer, New York.
96. Gnedenko, B.V. (1968) The Theory of Probability, 4th Edn. Chelsea, New York; p. 265 (Bernstein's theorem, for Appendix A) and Section 51 (for Appendix B).
97. Ash, R.B. (1972) Real Analysis and Probability. Academic Press, New York.
98. Durrett, R. (1996) Probability: Theory and Examples, 2nd Edn. Duxbury Press, Belmont, MA.
99. Etemadi, N. (1981) An elementary proof of the strong law of large numbers. Z. Wahrscheinlichkeitstheorie verw. Geb. 55, 119-122.
100. Prohorov, Yu.V. and Rozanov, Yu.A. (1969) Probability Theory. Springer, Berlin.
101. Gnedenko, B.V. and Kolmogorov, A.N. (1968) Limit Distributions for Sums of Independent Random Variables, 2nd Edn. Addison-Wesley, Reading, MA.
102. Halmos, P.R. (1956) Ergodic Theory. Chelsea, New York. For stationary processes there is a huge literature under the name "ergodic theorem". The theorem asserts that lim_{n→∞} n^{-1} S_n exists with probability one. Only if the process is ergodic do we get the answer 0 a.e. Halmos gives a succinct and elegant introduction; see also Khinchin, A.I. (1949) Mathematical Foundations of Statistical Mechanics, Dover, New York, for excellent physical background information.
103. Scott, D.J. (1973) Central limit theorems for martingales and for processes with stationary independent increments using the Skorohod representation approach. Adv. Appl. Probab. 5, 119-137.
CHAPTER 19

Information Coding in Higher Sensory and Memory Areas
A. TREVES
SISSA, Cognitive Neuroscience, Trieste, Italy
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Understanding neural codes requires information measures ..... 827
2. Sampling with limited populations, correlates and repetitions ..... 828
3. What code is used by single cells? ..... 830
4. Is a neuron conveying information only when it fires? ..... 831
5. Are neighboring neurons telling the same story? ..... 834
6. Can the effect of correlations be quantified? ..... 837
7. Synergy and redundancy in large populations ..... 840
8. Parameters that matter in neuronal representations ..... 842
9. Quantifying the structure of neuronal representations ..... 844
Abbreviations ..... 849
Acknowledgements ..... 849
References ..... 849
1. Understanding neural codes requires information measures
How do you communicate? A moderately bright extra-terrestrial (ET), set to investigate the codes you use, would reasonably conclude that you mainly communicate verbally. Our ET might further describe verbal codes as strings of chunks of variable length, that you appear to call words, which can be uttered as collections of phonemes or else written as nearly isomorphic collections of letters, and so on with further details. If however ET were exceptionally bright, or needed to write a grant application on this investigation, it would probably "discover" that in some situations you also communicate a lot just with your grimace, or with the clothes you have chosen to wear, or in a thousand other ways. Neurons are vastly simpler than human beings, but the metaphor is not completely silly, because it illustrates the volatility of the notion of neural codes. Nobody in his or her right mind would think that nature has designed a unique way for neurons to communicate, and in fact they interact, or affect each other, in a thousand different ways. In certain specific situations neurons may tell each other a lot with the way they compete for peptides, for example, or with the way they couple in ephaptic interactions. Yet a first understanding of the operation of neural networks in the brain requires that we try to describe the main, usual form (or forms) of communication. We should take the approach of the moderately bright investigator, and leave the discovery of exceptional facts for later on. Further, we should try to quantify how much is communicated in each situation, because only a quantitative comparison allows one to assess different codes, especially if they share part of the content of what is being communicated. Information theory [1] has been developed precisely to quantify communication, and is therefore quintessential to an appraisal of neural codes. Applying information theory to neural activity (rather than to the synthetic communication systems for which it was developed) is however riddled with practical problems and subtleties, which must be clarified before reporting experimental results. In this chapter, we do not consider means of neuronal communication other than the emission of action potentials, or spikes, and regard them as self-similar all-or-none events whose only distinctive features are the time of emission and the identity of the emitting neuron. Thus we restrict ourselves to information being represented, across a given population of neurons, by strings {t_{i,k}}, where i = 1, ..., C labels the emitting neuron and k indexes successive spikes in a prescribed time window. This is still a rather general and potentially very rich language, which in many situations can be reduced to considerably simpler forms. For example, most of the information might be carried simply by the total number of spikes, n_i, emitted by each neuron in the window, irrespective of their timing within the window. Although the spike count is an integer, it becomes a (positive) real number when averaged over several
repetitions or even when calculated, in more general terms, by convolving the spike train with a given time kernel. Therefore, it is more convenient to consider, instead, the firing rate, r_i, which, being divided by the time window, or by the integral of the kernel, is also relatively invariant across window lengths for quasi-stationary processes. The extent to which the firing rates of a population of neurons may or may not carry most of the information represented in the complete list of spike emission times is, of course, a question to be addressed experimentally, in any given situation. This has been done, with some success, mostly at the level of single neurons, as will be discussed later. First, we must consider what information we can set out to measure, given the difficulties and subtleties of measuring it in neuronal activity. For simplicity of notation, we shall think of such information as being represented in the rates, although the arguments of the next section apply equally to information represented in the complete list of spike emission times.
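For concreteness, a rate of this kind can be obtained along the following lines; in this minimal sketch the causal exponential kernel, its time constant and the spike times are arbitrary choices made for the illustration.

```python
import numpy as np

def firing_rate(spike_times, t_grid, tau=0.020):
    """Convolve a spike train with a causal exponential kernel of unit integral;
    the result is in spikes per second and, for a quasi-stationary train and a
    window much longer than tau, approximates count divided by window length."""
    rate = np.zeros_like(t_grid)
    for tf in spike_times:
        u = t_grid - tf
        rate += np.where(u >= 0.0, np.exp(-u / tau) / tau, 0.0)
    return rate

spikes = np.array([0.012, 0.045, 0.051, 0.130, 0.132, 0.135, 0.300])  # seconds
t_grid = np.linspace(0.0, 0.4, 401)
r = firing_rate(spikes, t_grid)
print(f"peak rate {r.max():.1f} spikes/s from {len(spikes)} spikes in 0.4 s")
```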
2. Sampling with limited populations, correlates and repetitions

The activity of a population A of neurons represents both information and noise. The part that can be considered, in general, to contain information is the part that varies together with something else, such as the activity of another population B, or some external correlate. It is measured by what is usually called mutual information
\[
  I(\{r_{i \in A}\}; \{r_{j \in B}\})
  = \int \prod_{i} \mathrm{d}r_i \int \prod_{j} \mathrm{d}r_j\,
    p(\{r_i\}, \{r_j\})\,
    \log_2 \frac{p(\{r_i\}, \{r_j\})}{p(\{r_i\})\, p(\{r_j\})}.
\tag{1}
\]
Note that {r_{j∈B}} could stand for the activity of the same population A, but at a different time. Or, it could stand for the parameters of some external correlates. Whatever the case, for the present discussion it is useful to consider that neither set of variables may be easily manipulated by the experimenter. Then, to evaluate the transmission of information from B to A (or vice versa: mutual information is symmetric and does not reflect causality) one should let the coupled system do what it normally does, in "ecological conditions", for the very long time needed to sample accurately the joint probability distribution p({r_{i∈A}}, {r_{j∈B}}). Since we are usually interested in the way the neural system operates, and not just in individual neurons, ideally we would need to record the activity of all cells in A and B, C_A and C_B. The time required would be exponential in C_A + C_B (times, effectively, the logarithm of the number of discriminable firing levels of each cell), i.e. much longer than the age of this and all preceding universes. Practical limitations on the number of cells that can be recorded simultaneously (a few hundred, now) make the time required for a single measure less apocalyptic, but still far from affordable. In practice, a direct measure of mutual information has to be based on the recording of the activity of only a handful of cells. This implies that we are forced to hope that those cells are in some sense "typical", but it also forces us to overlook potentially important coding schemes that could only be revealed, quantitatively, by taking into account the simultaneous activity of many neurons. Substituting computer simulations for real
recording experiments only alleviates the constraint by a tiny bit, while an analytical evaluation of mutual information is sometimes possible with formal models of large populations, in which however the result is in-built in the structure of the models. Thus, a first restriction on the applicability of information measures to neural activity is in the size of the population that can be sampled, and effectively in the dimensionality of the codes that can be investigated. A second restriction emerges when considering the content of the information being represented. The content is determined by the set of external (or internal) correlates of the activity in populations A and B. Ideally, they should reproduce the ecological working condition of the neural system being studied. In practice this is hard to do, in the lab, in a reasonable time, also because what this "ecological condition" is may be unknown or ill-determined (except perhaps for peripheral neural systems tightly coupled to specific dimensions of the sensory environment [2]). A common strategy, when studying the CNS, is instead to select a discrete set of elements, 𝒮, representative of interesting correlates, and to quantify the mutual information not between A and B, but between A and 𝒮,

\[
  I(\{r_{i \in A}\}; \{s \in \mathcal{S}\})
  = \int \prod_{i \in A} \mathrm{d}r_i \sum_{s \in \mathcal{S}}
    p(\{r_i\}, s)\,
    \log_2 \frac{p(\{r_i\}, s)}{p(\{r_i\})\, P(s)},
\tag{2}
\]
where the capital P denotes a real probability, rather than a probability density. This is conceptually a different quantity, which, since 𝒮 is an object of much reduced complexity compared with the activity in B, is much easier to measure. In particular, if 𝒮 includes S equiprobable elements, its entropy log₂ S will be an upper limit on the mutual information, no matter how large C_A, the population of neurons encoding the set of correlates. Eq. (2) quantifies how much the activity of population A allows us to discriminate between elements in 𝒮. Eq. (1) quantifies how much it tells us about the activity of another population B. One may wonder, then, what the measurable quantity in Eq. (2) tells us about the impractical-to-measure quantity of Eq. (1). Curiously, this question seems to have been disregarded in the literature, with the exception of Frolov and Murav'ev [3], who consider two quantities analogous to (1) and (2), which they denote as I₂ and I₁, and conclude that the total information that can be extracted from neural activity is the sum I₁ + I₂. Our recent analyses [4] lead to a rather different conclusion. I({r_{i∈A}}; {s ∈ 𝒮}), far from being a term to be added to I({r_{i∈A}}; {r_{j∈B}}), provides a good estimate of it, at least when many different correlates are used, and few enough cells are sampled that one is far from the regime approaching the saturation value at log₂ S. While the numbers, and the quality of the estimate, depend on the exact details of the network to be analyzed, our result justifies a posteriori the common practice of extracting measures of I({r_{i∈A}}; {s ∈ 𝒮}). We shall see later on how, when sampling more than a handful of cells, the common practice is to adopt a further simplification along this path, and to extract yet another, distinct information measure, the information about 𝒮 recovered by a decoding procedure. The third major sampling limitation with information measures is intimately (but inversely) related to the other two, and touches directly on a core concern of any
scientific measure, that of reproducibility. It is the limitation arising from the limited availability of repetitions of the same observation. Mutual information measures, as they depend on the joint probability of two variables, always require many repetitions. To sample adequately a set of S elements and a firing rate vector which can take of the order of R = (max spikes per cell)^C values, one needs of the order of S × R repetitions [5]. In recording experiments, especially in the CNS of mammals, this requirement is difficult to meet. Since mutual information depends nonlinearly on joint probabilities, a measure based on insufficient repetitions is not only imprecise, but also, in principle, biased, that is, affected by systematic errors. Usually the procedure is to simply substitute observed frequencies for the underlying probabilities (a so-called "frequentist" approach), and usually the effect of undersampling the joint probability (upstairs in the logarithm of Eq. (2)) is much more serious than that of undersampling its marginals (downstairs in the log). Since mutual information is supralinear in the joint probability, its undersampling typically leads to an upward bias, or mean (systematic) error in the measure. Various techniques [6,7] have been developed to estimate and subtract, or otherwise neutralize, this bias, but their limited efficacy makes limited repetition sampling the most stringent constraint, in practice, on measuring information carried by neural activity in the CNS of mammals. Since the problem is exacerbated when many cells and large sets of correlates are considered, the most reliable measures so far have been obtained with very limited sets of correlates and at the single cell level, and it is to these that we turn next.
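The upward bias just described is easy to reproduce. The following minimal sketch computes the plug-in ("frequentist") estimate of the information of Eq. (2) for discretized responses that are in fact independent of the correlate, so that the true information is zero and whatever the estimator returns is pure limited-sampling bias; the numbers of correlates, response levels and trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
S = 10            # number of equiprobable correlates (arbitrary)
R = 6             # discretized response levels: spike counts 0..5 (arbitrary)

def plugin_information(trials_per_correlate):
    """Plug-in estimate of Eq. (2) from observed frequencies. The responses
    below do not depend on s at all, so the true information is zero."""
    joint = np.zeros((S, R))
    for s in range(S):
        for c in rng.integers(0, R, size=trials_per_correlate):
            joint[s, c] += 1.0
    joint /= joint.sum()
    ps = joint.sum(axis=1, keepdims=True)
    pr = joint.sum(axis=0, keepdims=True)
    nz = joint > 0.0
    return np.sum(joint[nz] * np.log2(joint[nz] / (ps @ pr)[nz]))

for n in (10, 40, 160, 640):
    est = np.mean([plugin_information(n) for _ in range(20)])
    print(f"{n:4d} trials per correlate: spurious information {est:.3f} bits")
```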
3. What code is used by single cells?

Animals interact, via their sensory and motor nervous systems, with a continuously changing world, and it is obvious that activity in their central nervous system should also reflect, in the time dimension, this continuous change. Some investigators have been curiously excited by finding evidence of such strict coupling between the CNS and the outside world. Richmond, Optican and colleagues, instead, have addressed the question of temporal coding, at the single cell level, in the correct conceptual framework. They have asked whether individual neurons in the CNS make nontrivial use of the time dimension, by recording the responses of cortical visual neurons to static visual stimuli. A stimulus that is, after its sudden onset, constant in time may elicit in a given neuron activity that varies in time only in a generic fashion [8], or that varies in time in a way specific to the stimulus itself. In the latter case, the neuron has used time to code something about the static stimulus, something which was not its time dependence. Quantitatively, this would appear as mutual information, between the stimulus used and a descriptor of the response that is sensitive to the timing of spikes, much higher than the information present in a descriptor insensitive to spike timing, like the firing rate or spike count. Note that the quantitative difference would have to be substantial, because a higher-dimensional descriptor will always be able to convey something that any single, prescribed low-dimensional descriptor misses.
The approach taken by Optican and Richmond [9] quickly gained acceptance as a sound basis for revealing temporal codes, and their claim that the time course of single neuron activity carries between 2 and 3 times more information than the spike count had a considerable impact. It was unfortunate for them to discover, in the following years, that this early result was entirely an artifact of the limited number of trials per stimulus they had used. Limited sampling affects differentially the information extracted from descriptors of different dimensionality, and with the time course descriptor it resulted in an upward bias much larger than with the spike count. Having introduced some form of correction for limited sampling [10], the evidence for temporal coding weakened and eventually all but evaporated [11-13]. A replication with a similar experiment in Edmund Rolls' lab [14] has further suggested that part of the residual difference in mutual information could be due to differential onset latency, which could still be called temporal coding, but of a less interesting nature. To date, no report has appeared that demonstrates substantial nontrivial usage of time by single cortical neurons [15,16]. The one apparent exception is the so-called phase precession in rat hippocampal place cells [17]. The firing of these (principal) cells is modulated by the Theta rhythm, which is expressed mainly in the firing of local interneurons. When a rat runs through the place field of a given cell, this cell tends to fire towards the end of a Theta period as it enters its field, and progressively earlier in phase as it goes through it. The effect can be understood as a simple emergent property, whereby a cell that needs more recurrent activation to complement a weakish afferent input tends to fire later than cells with a stronger extrinsic drive [18]. On a linear track, place fields tend to be directional, that is, to be associated with only one of the two directions in which the field can be traversed. Therefore, the phase of firing can be used to extract some additional information on the exact location of the rat, on top of what is available from, say, the number of spikes emitted over a Theta period. In an open field, however, in which the rat can traverse the same place field along an arbitrary trajectory, and elicit firing in the same cell, the phase information cannot be used for absolute localization, independent of the trajectory, and the postulated temporal coding through phase precession reveals itself as a mere epiphenomenon [19]. Nevertheless, the body of experiments addressing temporal coding at the single cell level has stimulated the development of information extraction procedures, among them those addressing limited sampling [6,7,10,12,20], that turn out to be crucial also in measuring the information conveyed by populations of cells.
4. Is a neuron conveying information only when it fires?
The intuition of many neurophysiologists is that central neurons transmit information simply when they fire. In the extreme, a spike is regarded as a quantum of information, and even confused with a bit (which in fact is just a unit, and implies no quantization at all). No matter how crude, this intuition is reinforced by the lack of evidence for sophisticated coding schemes: cortical neurons appear uninterested in
the game of transcribing stationary signals into fancy temporal waveforms. Yet, one could think of other nontrivial coding schemes, which do not involve the time dimension, but just clever manipulation, by the neuron, of its conditional firing probability. For example, certain connectionist models assume that a unit active at its maximum level reports the presence of its own preferred correlate (e.g., the sight of one's grandmother), while any intermediate level of activation would be elicited by other correlates, with partially shared attributes [21]. A neuron behaving according to such a model might be expected to reliably fire at top rate, say 10 spikes in 100 ms, when detecting the grandmother, and to fire between 0 and 9 spikes when detecting other senior ladies. Each of them in some of the repetitions of the experiment may resemble the true grandmother more, and thus evoke more spikes, than in other repetitions. Then P(r|grandma) would be strongly peaked at 100 Hz, while P(r|lady X) would be more broadly distributed between 0 and 90 Hz. The meaning of, say, 3 spikes in close succession would be rather different depending on whether there are 7 more close by, or just 6. Nothing of this sort has ever been observed with neurons. Neurons appear to use spikes in a simple-minded fashion. Moreover, neurons can be as informative when they fail to fire as they are when they do fire. Roughly speaking, the only information they provide is in the extent to which their current firing level is above or below their mean firing level. One way to confirm this is to compute the quantity
\[
  I_1(s) = \int \mathrm{d}r\, p(r|s)\, \log_2 \frac{p(r|s)}{p(r)},
\tag{3}
\]
which depends on the probability of each firing rate conditional on the correlate s and which, when averaged over correlates, yields the mutual information. We have mistakenly called this quantity 'information per stimulus', or 'stimulus-specific information' [22], along with others. Recently DeWeese and Meister [23] have correctly pointed out that I₁(s) is not additive, as any information quantity should be, while the similar quantity

\[
  I_2(s) = \int \mathrm{d}r\, p(r|s)\, \log_2 p(r|s)
         - \int \mathrm{d}r\, p(r)\, \log_2 p(r)
\tag{4}
\]
in fact is additive. I₂(s) also averages to the mutual information, which is positive definite, but as a function of s, I₂(s) also takes negative values. I₁(s) is not additive but positive definite, and should be called the 'stimulus-specific surprise', as proposed by DeWeese and Meister [23]. In any case, the interest in I₁(s) is not so much in quantifying information, but rather in illustrating the simplicity of the firing rate code. This can be appreciated by first taking the limit of a very brief time window Δt [24]. In such a window the cell may emit at most a single spike, with probability r_s Δt = Δt ∫ dr p(r|s) r, and I₁(s) reduces to its limit, the 'stimulus-specific surprise per spike' χ(s) = (1/r̄) dI₁(s)/dt. Relative to the overall mean rate r̄ = Σ_s P(s) r_s, χ(s) is a universal curve,

\[
  \chi(x) = x \log_2 x + \frac{1 - x}{\ln 2},
\tag{5}
\]
where x = r_s/r̄ (see Fig. 1 and Ref. [25]). This universality is intimately related to the availability of a single 'symbol', the spike, in the neural alphabet, at least in the limit of short times, when the emission of more spikes has negligible probability. Over a longer time window, instead, I₁(s) is not constrained to follow the universal χ(s) curve, and a departure from it could reveal a more sophisticated code. For example, DeWeese and Meister [23] remind us that for an optimal code that saturates the channel capacity, the specific surprise should be constant across different correlates. This is far from what has been observed in the very few cases when this issue has been probed. The specific surprise appears to follow the universal curve [22,26,27], indicating that the firing rate code is likely to remain as simple as it is forced to be for short times. In agreement with this, typically firing rates elicited by repetitions of the same stimulus, or correlate, have a variance monotonic in the mean rate r_s, and a simple distribution around the mean, between Poisson and normal, again not hinting at any clever manipulation of the conditional probabilities P(r|s) [28].

Fig. 1. The stimulus-specific surprise from real data follows the universal curve valid in the t → 0 limit. Real data from an inferior temporal cortex cell responding to 20 face stimuli over 500 ms [22]. The curve is the surprise rate, expressed as bits per 100 ms, for a mean firing rate of 50 Hz. The main difference between limit curve and real data is just a rescaling, roughly by the factor 5 implicit in the graph.

Related evidence, though not in terms of conditional probabilities, comes from the observation of spike count distributions produced by cortical neurons in their normal operating regime. It has been suggested by Levy and Baxter [29] that an exponential spike count distribution would reveal optimal coding, subject to a metabolic constraint on the energy consumption associated with each spike. This would be an example of a clever design principle implemented in the brain. An attempt to search for such exponential distributions by subjecting visual neurons to more or less ecological stimulation has only shown exponential tails [30], not fully exponential distributions, while it has been shown that the observed distributions can be explained as the result of an elementary random process [31], which has nothing to do with optimizing the neural code. Currently available evidence on single neurons thus indicates that the simple neurophysiologists' intuition is, essentially, accurate. Cortical neurons appear not only unable to make creative use of time, but also unable to alter the mapping between the input they receive and the spikes they produce on the basis of any coding optimization principle. If this is correct, the equivalent of the old 'tuning curve', that is, the distribution of mean firing rates to each correlate, is all that is necessary to characterize adequately the activity of a single cortical neuron. If the relevant correlates are simple one-dimensional parameters, such as orientation in V1, then the tuning curve is simply described by giving, e.g., preferred orientation, width, baseline and peak value (and the informational aspects are usually quantified by just the Fisher information, whose relation to mutual information is discussed by Brunel and Nadal [32]; see also the chapter by Fukumizu in this book). If the relevant correlates are embedded in a less transparent domain set, such as faces or fractals [33], then in principle the mean rate r_s to each correlate should be given (the variance being largely determined by the mean [28]). However, the gross features of the distribution of rates can still be conveniently described with fewer parameters, such as overall mean rate, spontaneous rate and sparsity [34] of the distribution. To such parameters we turn at the end of this chapter; before that, we should consider the possibilities offered by population coding.
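The universal curve of Eq. (5), and the sense in which it is only the short-time limit of I₁(s), can be checked directly. In the following minimal sketch the mean rate, the window length and the test rates are arbitrary choices made for the illustration.

```python
import numpy as np

def chi(x):
    """Universal specific surprise per spike, Eq. (5): chi(x) = x log2 x + (1 - x)/ln 2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xlogx = np.zeros_like(x)
    xlogx[x > 0] = x[x > 0] * np.log2(x[x > 0])
    return xlogx + (1.0 - x) / np.log(2.0)

def surprise_exact(r_s, r_mean, dt):
    """Exact I1(s) for a window so short that at most one spike can occur."""
    p1, q1 = r_s * dt, r_mean * dt
    out = (1.0 - p1) * np.log2((1.0 - p1) / (1.0 - q1))
    if p1 > 0:
        out += p1 * np.log2(p1 / q1)
    return out

r_mean, dt = 50.0, 0.001                     # 50 Hz mean rate, 1 ms window (arbitrary)
rates = np.array([0.0, 10.0, 50.0, 100.0, 150.0])
limit = dt * r_mean * chi(rates / r_mean)    # t * rbar * chi(r_s / rbar)
for r_s, lim in zip(rates, limit):
    print(f"r_s={r_s:5.1f} Hz   exact {surprise_exact(r_s, r_mean, dt):.6f}   "
          f"limit {lim:.6f} bits")
```

For Δt of a millisecond the exact value and the limit already agree to within terms of order (Δt)², illustrating why the universal curve is such a tight constraint at short times.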
5. Are neighboring neurons telling the same story?
The studies cited above show that, in roughly ecological conditions, single cortical neurons typically can transmit up to a fraction of a bit, about stationary correlates, over a few hundred ms (with instantaneous information rates occasionally a bit higher). This is clearly way below the behavioral discrimination capability of the animal. Therefore, we are brought to consider the transmission of information by populations of neurons. One crucial question is the extent to which the information provided by different neurons is the same, that is, redundant. This issue has been addressed, perhaps for the first time at a quantitative level, by Gawne and Richmond [35]. Recording from pairs of inferior temporal cortex neurons in the monkey, responding to a set of 32 simple visual stimuli (Walsh patterns), they have compared the information obtained by considering both responses to the sum of that obtained for each response alone. On average across several pairs, they have found an information 'overlap' y ≈ 0.2 shared by a pair; e.g., a single-cell information I(1) ≈ 0.23 bits and for the pair I(2) ≈ 0.41 bits ≈ I(1) + I(1)(1 − y). This seems to imply that as much as 80% of what the second cell has to say is fresh information, not yet reported by the first cell: not much redundancy. Gawne and Richmond have, however, noted that even such limited redundancy would have drastic effects if it held among arbitrary pairs of cells in a local population. They have considered a simple model, which assumes that if a fraction 1 − y of the information conveyed by the second cell is novel, then a third cell would on average
convey a fraction (1 − y)² of novel information (and a fraction y(1 − y) shared with each preceding cell, and y² with both); the ith cell recorded would contribute a fraction (1 − y)^{i−1} of novel information, and adding up all contributions one ends up with I(∞) = I(1)/y, or just 1.15 bits in their experiment. Since 5 bits are necessary to discriminate 32 stimuli, they have concluded that even an infinitely large population of cells with that apparently limited level of redundancy would not be able to code for their small stimulus set; and therefore that the mean redundancy among neurons farther away than those they recorded from should decrease considerably, towards zero, to account for the fact that behaviorally the animal is obviously able to discriminate. A similar warning, that even small redundancies can have drastic effects on the representational capacity of a population, was put forward by Zohary, Shadlen and Newsome [36]. They looked at the correlated discharge of MT neurons to randomly moving dots, where a single (unidimensional) parameter was used as correlate, the average motion of the dots. Their perspective was different from that of Gawne and Richmond, but the result seemed to imply, again, that adding more and more cells adds little to the accuracy of neural codes. What appeared important, then, was to go beyond what could be extrapolated from the shared information between pairs of cells, and measure directly the information that could be extracted from large populations. Going for large populations requires two changes in the approach. The first, experimental, is that many cells have to be recorded simultaneously, and thus with multiple electrodes. Alternatively, separately recorded cells can be considered, but the results should be later checked against those obtained with simultaneous recordings, because these are needed to record trial-to-trial correlations, and their possible effects on information. The second change, in the analysis, is brought about by the exponential explosion of the 'response space' spanned by {r_{i∈A}}, when i = 1, ..., C and C becomes large. The explosion makes it impossible to sample the space adequately, and thus to measure directly the mutual information in Eq. (2). A standard procedure is to use a decoding algorithm, which converts the vector r_{i∈A} into a prediction of which correlate s′ elicited it, or else assigns probabilities P(s′|r_{i∈A}) to each possible correlate. The result is that one measures the decoded information
\[
  I(\{s \in \mathcal{S}\}; \{s' \in \mathcal{S}\})
  = \sum_{s, s'} P(s, s')\, \log_2 \frac{P(s, s')}{P(s)\, P(s')}.
\tag{6}
\]
In as much as decoding is done correctly, that is, the functions s′(r_{i∈A}) or P(s′|r_{i∈A}) do not contain any a priori knowledge on the actual correlate s, the decoded information is less than or equal to the original mutual information; just in the same way that any mapping from a variable (here r_{i∈A}) to another, e.g. to a regularized variable, can only degrade, or at most preserve, but not improve, the correlation between the original variable and a third one. Decoding can be done in a variety of ways, and it is not possible to quantify the information loss resulting from each particular algorithm. Still, experience with most commonly used algorithms, and comparisons, when possible, with direct measures, suggest that in many cases the information loss
is minor. In particular, very simple decoding algorithms, like those that may conceivably be implemented in the brain, appear often to lose only slightly more information than sophisticated algorithms based on Bayesian models [37]. Using nonsimultaneous recordings from up to 58 cells in the monkey temporal cortex, S = 5 stimuli, and a simple decoding algorithm, Gochin et al. [38] proposed to investigate the scaling of I({s ∈ 𝒮}; {s′ ∈ 𝒮}) with the number of cells used for decoding. They expressed their result in terms of the novelty in the information conveyed by C cells, defined as the ratio of such information to the sum of that provided by each cell alone. They found that the novelty scaled as 1/√C, intermediate between the 1/C behavior corresponding to no new information being provided by additional cells, and the trend to a constant, if at least a finite part of what each cell contributed were novel. The 1/√C behavior seemed appealing in that it vaguely matched noise suppression by C independent processes carrying the same signal; unfortunately, it was likely an artefact, generated by determining a curve on the basis of just 3 points, by failing to correct for limited sampling, and most importantly by neglecting to consider the ceiling effect, at I = log₂ S = 2.32 bits, just 9 times above the average single cell information, I = 0.26 bits. Our replications of this type of measure, with decoding algorithms based simply on the firing rates of simultaneously or nonsimultaneously recorded cells, have exposed a different scaling behavior, in all cases investigated [27,37,39] (cf. Fig. 2).
Fig. 2. The information extracted from a population of cells saturates at the entropy of the stimulus set. Real data from up to 14 non-simultaneously recorded inferior temporal cortex cells responding to 20 face stimuli over 500 ms, adapted from Ref. [37]. The intermediate and lower data points correspond to reduced sets of 9 and 4 stimuli, respectively. The curve is the simple exponential saturation model of Eq. 7.
This is a linear increase I(C) ∝ C, eventually saturating towards the ceiling at I_max = log₂ S. The crucial point is that the saturation level depends on the set of correlates used, and mainly on their number, and it has nothing to do, in principle, with the coding capacity of the population of cells. A simple empirical model describing rather well the whole scaling from linear to saturating is an extension of the Gawne and Richmond model, which in addition assumes that their 'overlap' y also represents the average fraction of I_max conveyed by single cells (as an overlap it would presumably be lower, when measured across distant pairs, than when measured, as in their experiment, only across pairs of nearby cells). Two cells then overlap over a fraction y of the fraction y·I_max each conveys; that is, the essential assumption of the model is that overlapping areas are randomly distributed across 'information space'. The information carried by C cells is found, with an easy derivation, to be, according to this random model
\[
  I(C) = I_{\max}\big[ 1 - (1 - y)^C \big],
\tag{7}
\]
or, in words, a simple exponential saturation to the ceiling. This simple scaling has now been derived analytically, and found to apply exactly in quite general cases [4,40]. The important element, assumed true in the analytical derivation and apparently approximately holding also in the experimental recordings, is the lack of correlation in the activity of different cells. Correlations can be of two main types, sometimes referred to as 'signal' and 'noise' correlations: those appearing across repeated trials with the same correlate (denoted γ_ij(s) in the following section); and those between the average 'tuning curves' of several cells, that is, between their activity distribution across correlates, once averaged over many repetitions of each (denoted ν_ij). Neither type of correlation is considered in the analytical derivation. In the experiments, signal correlations would indeed have an effect, if substantial, while noise correlations would be unlikely ever to produce a departure from the behavior described by Eq. (7): even with simultaneous recordings, decoding algorithms based just on firing rates would likely miss any influence of such correlations. What is needed then, in order to go beyond Eq. (7) and address the potential role of correlations, is an alternative approach that does not rely on decoding.
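The saturation described by Eq. (7) is easily evaluated. The sketch below assumes, for illustration only, a set of 20 equiprobable correlates and an average single-cell information of 0.35 bits, from which the 'overlap' y is obtained as the fraction of the ceiling conveyed by one cell; both numbers are arbitrary.

```python
import numpy as np

def info_random_overlap(C, S, y):
    """Eq. (7): information conveyed by C cells when each carries, on average,
    a fraction y of I_max = log2(S) and overlaps are placed at random."""
    return np.log2(S) * (1.0 - (1.0 - y) ** np.asarray(C, dtype=float))

S = 20                                   # number of equiprobable correlates
i_single = 0.35                          # assumed average single-cell information (bits)
y = i_single / np.log2(S)                # fraction of the ceiling carried by one cell

for C in (1, 2, 5, 10, 20, 40):
    print(f"C={C:2d}  I(C)={info_random_overlap(C, S, y):.2f} bits  "
          f"(linear extrapolation {C * i_single:5.2f}, ceiling {np.log2(S):.2f})")
```

The printout makes the two regimes explicit: for small C the model tracks the linear extrapolation C times the single-cell information, while for large C it bends over towards log₂ S, exactly as in Fig. 2.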
6. Can the effect of correlations be quantified?
The role of correlations in producing redundancy, or alternatively synergy, among neural signals has been investigated both outside [41] and within [16] the context of population coding. While redundancy is, in common intuition, the default outcome of correlated signals, it is easy to devise situations in which correlations lead to synergy. Consider the toy case of Fig. 3 with 2 cells responding to 3 stimuli. Synergy may result from a positive noise correlation (in the trial-to-trial variability), if the mean rates to different stimuli are anticorrelated, and vice versa from a negative noise correlation, if the mean rates to different stimuli are positively correlated.
(Figure 3 panels: joint probability distributions of the spike counts of cells 1 and 2, for correlated and anticorrelated noise, across stimuli A-D.)
Fig. 3. A toy case illustrating possibilities for synergy and redundancy, adapted from Fig. 4 of Ref. [42]. A quick calculation shows that signal and noise both correlated, or both anticorrelated, result in redundancy, while the other two situations produce, given the responses indicated in the figure, synergy. When signal and noise correlation are of the same sign, the result is always redundancy. The impact of correlations on redundancy is probably minimal when the mean responses are weakly correlated across the stimulus set (perhaps the 'natural condition'?).

Given this realm of possibilities, it is desirable to take an approach, applicable to real data, that enables separating out the information transmitted by individual spikes, emitted by single neurons within an ensemble, from positive or negative contributions due to correlations in firing activity among neurons. One such approach focuses on short time windows [42,43]. We shall see now how in the limit of what is transmitted over very short windows, a simple formula quantifies the corrections to the instantaneous information rate (determined solely by mean firing rates) which result from correlations in spike emission between pairs of neurons. Positive corrections imply synergy, while negative corrections indicate redundancy. The information carried by the population response can be expanded in a Taylor series [24,42]
$$I(t) = t\,I_t + \frac{t^{2}}{2}\,I_{tt} + \cdots \tag{8}$$
The first time derivative depends only on the mean rates:
$$I_t = \sum_{i=1}^{C}\left\langle \bar{r}_i(s)\,\log_2\frac{\bar{r}_i(s)}{\langle \bar{r}_i(s')\rangle_{s'}}\right\rangle_{s} \tag{9}$$
and it is purely a sum of all single-cell contributions, each of the form described earlier by Skaggs and McNaughton [44] and Bialek et al. [2] for single cells. The formula clarifies how misguided it is to link a high information rate to a high signal-to-noise ratio, which is the conceptual framework tacitly implied in Refs. [36,38]. The rate, that is the first derivative of the mutual information, only reflects the extent to which the mean responses of each cell are distributed across stimuli; it does not reflect anything of the variability of those responses, that is of their noisiness, nor anything of the correlations among the mean responses of different cells. The effect of (pairwise) correlations begins to be felt in the second derivative instead, and it is convenient then to introduce appropriate measures of such correlations. Pairwise correlations in the response variability ('noise' correlations) can be quantified by
$$\gamma_{ij}(s) = \frac{\overline{r_i(s)\,r_j(s)}}{\bar{r}_i(s)\,\bar{r}_j(s)} - 1, \tag{10}$$
i.e. the amount of trial by trial concurrent firing, compared to that expected in the uncorrelated case. The degree of similarity in the mean response profiles of the cells to different stimuli ('signal' correlation) can instead be quantified by
$$\nu_{ij} = \frac{\langle \bar{r}_i(s)\,\bar{r}_j(s)\rangle_s}{\langle \bar{r}_i(s)\rangle_s\,\langle \bar{r}_j(s)\rangle_s} - 1. \tag{11}$$
The second derivative, I_tt, breaks into three components. The first term of I_tt depends only on the mean rates and on their correlations:
$$I_{tt}^{(1)} = \frac{1}{\ln 2}\sum_{i=1}^{C}\sum_{j=1}^{C}\langle \bar{r}_i(s)\rangle_s\,\langle \bar{r}_j(s)\rangle_s\left[\nu_{ij} + (1+\nu_{ij})\,\ln\!\left(\frac{1}{1+\nu_{ij}}\right)\right], \tag{12}$$
the second term is nonzero only when correlations are present in the noise, even if stimulus-independent,
$$I_{tt}^{(2)} = \sum_{i=1}^{C}\sum_{j=1}^{C}\left\langle \bar{r}_i(s)\,\bar{r}_j(s)\,\gamma_{ij}(s)\right\rangle_s\,\log_2\!\left(\frac{1}{1+\nu_{ij}}\right), \tag{13}$$
the third term contributes only if correlations are stimulus-dependent,
$$I_{tt}^{(3)} = \sum_{i=1}^{C}\sum_{j=1}^{C}\left\langle \bar{r}_i(s)\,\bar{r}_j(s)\,\bigl(1+\gamma_{ij}(s)\bigr)\,\log_2\!\left[\frac{\bigl(1+\gamma_{ij}(s)\bigr)\,\langle \bar{r}_i(s')\,\bar{r}_j(s')\rangle_{s'}}{\langle \bar{r}_i(s')\,\bar{r}_j(s')\,\bigl(1+\gamma_{ij}(s')\bigr)\rangle_{s'}}\right]\right\rangle_s. \tag{14}$$
This decomposition still has to be applied extensively to simultaneously recorded neural data. The limited evidence in our hands has not yet revealed a situation in which correlations clearly play a prominent role. Extensive data produced in the laboratories of Eckhorn [45] and Singer [46], which qualitatively point at the importance of correlations, have not, to our knowledge, been analyzed in these terms, despite pioneering applications of information theory [47-49]. A very interesting recent finding [50] could not be quantified properly in terms of information due to the limited sampling available, and it could be reanalyzed with the help of this expansion. The expansion has recently been refined so that it now also allows assessing the importance of timing relations in pairwise correlations [43]. The crucial question, however, is how soon the expansion, which is based on the short-time limit, breaks down. When this occurs, higher-order terms in the expansion (starting from those dependent on three-way correlations, and so on) can no longer be neglected. The time range of validity of the expansion is thus limited by the requirement that second-order terms be small with respect to first-order ones, and successive orders be negligible. Since at order n there are C^n terms with C cells, the applicability of the short-time limit obviously contracts, in practice, for larger populations. This can be seen in the example from the rat barrel cortex, in Fig. 4. Still, one may ask whether the expansion, at least restricted to second-order terms, may afford some insight on neural coding as expressed by large populations of cells.
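A sketch of how the leading terms of this expansion might be estimated from a stimuli x trials x cells array of spike counts: the rate term I_t of Eq. (9) and the correlation measures γ_ij(s) and ν_ij of Eqs. (10) and (11). It ignores the limited-sampling corrections mentioned above, the toy data are invented, and it is not the analysis code used for the recordings discussed in the text.

```python
import numpy as np

def rate_and_correlation_terms(counts, window):
    """counts: array (n_stimuli, n_trials, n_cells) of spike counts in a short window.
    Returns the first-order rate term I_t of Eq. (9) (bits/s), the noise
    correlations gamma_ij(s) of Eq. (10) and the signal correlations nu_ij of Eq. (11).
    Assumes every cell fires at least once for every stimulus."""
    rates = counts / window                        # instantaneous rate estimates
    rbar = rates.mean(axis=1)                      # mean rate per stimulus, shape (S, C)
    grand = rbar.mean(axis=0)                      # <rbar_i(s)>_s, shape (C,)

    # Eq. (9): I_t = sum_i < rbar_i(s) log2( rbar_i(s) / <rbar_i(s')>_s' ) >_s
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(rbar > 0, rbar / grand, 1.0)
        i_t = np.mean(rbar * np.log2(ratio), axis=0).sum()

    # Eq. (10): gamma_ij(s) = trial-average of r_i r_j / (rbar_i(s) rbar_j(s)) - 1
    S, T, C = rates.shape
    gamma = np.empty((S, C, C))
    for s in range(S):
        cross = rates[s].T @ rates[s] / T          # trial-averaged product r_i r_j
        gamma[s] = cross / np.outer(rbar[s], rbar[s]) - 1.0

    # Eq. (11): nu_ij = <rbar_i(s) rbar_j(s)>_s / (<rbar_i>_s <rbar_j>_s) - 1
    nu = (rbar[:, :, None] * rbar[:, None, :]).mean(axis=0) / np.outer(grand, grand) - 1.0
    return i_t, gamma, nu

# toy data: 4 stimuli, 50 trials, 3 cells, 20 ms window (all values illustrative)
rng = np.random.default_rng(0)
counts = rng.poisson(lam=rng.uniform(0.5, 3.0, size=(4, 1, 3)), size=(4, 50, 3))
print(rate_and_correlation_terms(counts, window=0.02)[0])
```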
7. Synergy and redundancy in large populations

Obviously, with a few cells all cases of synergy or redundancy are possible if the correlations are properly engineered - in simulations - or the appropriate special case is recorded - in experiments. The outcome of the information analysis will simply reflect the particularity of each case. With large populations, one may hope to have a better grasp of generic, or typical, cases, more indicative of conditions prevailing at the level of, say, a given cortical module. One may begin by considering a 'null' hypothesis, i.e. that pairwise correlations are purely random, and small in value. In this null hypothesis, the signal correlations ν_ij have zero average, while ⟨ν_ij²⟩ could still differ from zero if the ensemble of stimuli used is limited, since a random walk would typically span a range of size √S. The mean ν_ij² would then decrease with S as 1/S. The noise correlations might be thought to arise from stimulus-independent terms, γ̄_ij, which need not be small, and stimulus-dependent contributions δγ_ij(s), which might be expected to get smaller when more trials per stimulus are available, and which on averaging across stimuli would again behave as a random walk. The effect of such null-hypothesis correlations on information transmission can be gauged by further expanding I_tt in the small parameters ν_ij and δγ_ij(s), i.e. assuming |ν_ij|² ≪ 1 and |δγ_ij(s)|² ≪ 1. We consider here a simplified case, in which, for example, all cross-terms like ν_ij δγ_ij are taken to vanish; the full derivation will be
(Figure 4 panels: total information together with its rate-only, stimulus-independent correlation and stimulus-dependent correlation components, plotted against the time window in ms, for each of the four population sizes.)
Fig. 4. The short-time limit expansion breaks down sooner, the larger the population considered. Cells in rat somatosensory barrel cortex responding to 2 stimuli to the vibrissae. Components of the transmitted information with 3 (top, left), 6 (top, right), 9 (bottom, left) and 12 cells (bottom, right). The first three cases are averaged over 4 sets of cells. Time window: 5-40 ms. The initial slope (i.e., I_t) is roughly proportional to the number of cells. The effects of the second-order terms, quadratic in t, are visible over the brief times between the linear regime and the breakdown of the expansion. Among several similar data sets analyzed, this is close to the worst case, in terms of how soon in time the expansion breaks down.
published elsewhere (Bezzi, Diamond and Treves, cond-mat/0012119). In this case the leading terms in the expansion of I_tt^(1) + I_tt^(2) are those quadratic in ν_ij,
$$I_{tt}^{(1)} + I_{tt}^{(2)} = -\frac{1}{2\ln 2}\sum_{i=1}^{C}\sum_{j=1}^{C}\bigl(2+\bar{\gamma}_{ij}\bigr)\,\langle \bar{r}_i(s)\rangle_s\,\langle \bar{r}_j(s)\rangle_s\,\nu_{ij}^{2}, \tag{15}$$
i.e., contributions to the mutual information which are always negative (indicating redundancy). The leading terms in the expansion of I_tt^(3), if we denote by
$$\langle f[\delta\gamma_{ij}(s)]\rangle_{\{i,j\},s} \equiv \left\langle \frac{\bar{r}_i(s)\,\bar{r}_j(s)}{\langle \bar{r}_i(s')\rangle_{s'}\,\langle \bar{r}_j(s')\rangle_{s'}}\,f[\delta\gamma_{ij}(s)]\right\rangle_s \tag{16}$$
the average over stimuli weighted by the product of the normalized firing rates r̄_i(s) r̄_j(s), and take ⟨δγ_ij(s)⟩_{{i,j},s} to vanish, are
$$I_{tt}^{(3)} = \frac{1}{2\ln 2}\sum_{i=1}^{C}\sum_{j=1}^{C}\langle \bar{r}_i(s)\rangle_s\,\langle \bar{r}_j(s)\rangle_s\,\left\langle\frac{[\delta\gamma_{ij}(s)]^{2}}{1+\nu_{ij}}\right\rangle_{\{i,j\},s}, \tag{17}$$
that is, contributions to the mutual information which are always positive (indicating synergy). Thus the leading contributions of the new Taylor expansion are of two types, both coming as C(C - 1)/2 terms proportional to ⟨r̄_i(s)⟩_s⟨r̄_j(s)⟩_s: the first type, Eq. (15), induces redundancy, and might scale as 1/S in our null hypothesis; the second type, Eq. (17), induces synergy, and might scale inversely with the number of trials per stimulus in our null hypothesis. These leading contributions to I_tt can be compared to first-order contributions to the original Taylor expansion in t (i.e., to the C terms in I_t) in different time ranges. For times t ≈ ISI/C, that is t⟨r̄⟩ ≈ 1/C, first-order terms sum up to be of order one bit, while second-order terms are negligible, provided enough stimuli are used and enough trials are available. This occurs however over a time range that becomes shorter as more cells are considered, and the total information conveyed by the population remains of order 1 bit only! For times of the order of the mean interspike interval, t ≈ ISI, first-order terms are of order C, while second-order ones are of order C²⟨ν²⟩ (with a minus sign, signifying redundancy) and C²⟨(δγ)²⟩ (with a plus sign, signifying synergy), respectively. If ⟨ν²⟩ and ⟨(δγ)²⟩ are not sufficiently small to counteract the additional factor of C, these 'random' redundancy and synergy contributions will be substantial. Moreover, over the same time ranges the leading contributions to I_ttt and to the next terms in the Taylor expansion in time may also be expected to be substantial. We are therefore led to a surprising conclusion, applying to what is likely the minimum meaningful time range for information transmission, that is the time it takes the typical cell to emit a spike. The conclusion is that a large population of cells, which has not been designed to code stimuli in any particular cooperative manner, may still show large effects of redundancy or synergy, arising simply from random correlations among the firing of the different cells. Such a conclusion reinforces the need for careful experimental studies of the actual correlations prevailing in the neural activity of different parts of the brain. However, it also indicates the importance of considering information decoding along with information encoding: real neurons may not care much for the synergy and redundancy encoded in a multitude of variables they cannot read out, such as the ν_ij's and γ_ij's.

8. Parameters that matter in neuronal representations
What are, then, the variables that real neurons are directly affected by? Clearly, the firing rates r_i of the neurons they receive inputs from comprise an important group. Most theoretical analyses of neural networks are grounded on the assumption that the quintessential processing carried out by a single neuron is a dot product operation between the vector of input firing rates and the vector of synaptic weights (cf. [25]). It is the modifiability of individual synaptic weights, and the consequent variance among the synaptic weight vectors of different processing units, that makes
individual firing rates important, as even very simplified formal models illustrate. If synaptic weights were taken to be uniform across inputs, the enormous fan-in of cortical connectivity would reduce to a mere device for large sampling. If they were taken to be nonuniform but fixed in time, no new input-output transforms could be established for a given population of cells, so in practice the connectivity would, again, subserve just sampling, except for a few in-built operations. The modifiability of individual synaptic weights, according to so-called Hebbian rules [51] or otherwise, is the cornerstone of the theory of neuron-like parallel distributed processing. Thus, quantitative constraints on memory storage are set by the number of synapses available for individual modification, and in fact they are usually expressed in terms of bits/synapse. In the real brain, neurons and synapses operate in vastly more complicated ways than is summarized by notions like dot products and synaptic weights. Still, maximizing memory storage through maximal synaptic density has been considered by Braitenberg [52] a crucial principle of cortical design.

Individual firing rates are therefore central to neuronal coding because of (long-term) synaptic plasticity, but just as there is more to cortical plasticity than synaptic plasticity, so there is likely more to neural codes than individual firing rates. Recently, for example, much attention has been devoted to the exquisite refinement of local inhibitory circuitry in the neocortex (Henri Markram, personal communication, and see [53]). Inhibitory interneurons appear to cluster into some 15 different classes, discriminable on the basis of a combination of morphological, electrophysiological and short-term plasticity properties. Synapses to and from inhibitory interneurons are found to demonstrate long-term plasticity as well. Their connectivity patterns are differentiated also in terms of cortical layers. Although the total number of inhibitory neurons and the number and location of their synapses appear unsuitable to make them individually involved in information processing, there is no doubt that they provide for a modulation of cortical dynamics that turns certain collective variables into important parameters of neuronal codes. For example, the average degree of synchronization of an afferent volley to a given cortical patch might be crucial in determining the dynamics of feedforward and feedback inhibition, and consequently the time-course of activation of the pyramidal cells in the patch. This is in contrast to the exact degree of synchronization between any two axons (the dynamical equivalent of a single ν_ij), which would itself be relevant only if there were a corresponding modifiable parameter capable of modulating its effects.

At present, our theoretical understanding of neural networks is too underdeveloped to deal with such cortical complexities, which are themselves still in the process of being investigated, especially in their dynamical aspects. Despite some promising attempts [54], these are still early days for the elaboration of the appropriate conceptual tools and the identification of the crucial mechanisms and most relevant quantities. At a very basic, and non-dynamical, level, however, it is already clear that the gross statistical features of the distribution of neuronal activity have a direct bearing on the efficiency of neuronal codes and of information storage.
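A toy illustration of the dot-product view of a processing unit with a modifiable synaptic weight vector, using a simple Hebbian rule; the learning rate, threshold and pattern statistics are arbitrary choices for illustration and do not model any specific cortical circuit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_inputs = 200
w = rng.uniform(0.0, 0.1, n_inputs)          # modifiable synaptic weight vector

def unit_output(rates, weights, threshold=0.0):
    """Dot product of input firing rates and synaptic weights, thresholded at zero."""
    return max(0.0, rates @ weights - threshold)

def hebbian_update(weights, rates, post, lr=0.01):
    """Simple Hebbian rule: weight change proportional to pre- times post-synaptic activity."""
    return weights + lr * post * rates

pattern = rng.random(n_inputs)               # an arbitrary input pattern to be stored
for _ in range(20):                          # repeated presentations strengthen the response
    post = unit_output(pattern, w)
    w = hebbian_update(w, pattern, post)

# the stored pattern now evokes a larger response than an unrelated one
print("response to stored pattern :", unit_output(pattern, w))
print("response to a novel pattern:", unit_output(rng.random(n_inputs), w))
```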
In the late eighties, a considerable debate between neurophysiologists and modelers centered on the issue of the observed mean level of activity in the cortex, and whether this would make popular models of memory storage totally inappropriate as models of
cortical networks [55]. This issue, touching on the first and most basic moment of an abstract 'typical distribution of cortical activity', eventually evaporated when it appeared to be closely linked to the modeling of neurons as binary or sigmoidal units. The second moment of such a typical distribution, instead, has a relevance which does not simply stem from crude modeling technology. It was long recognized that the so-called sparseness of the firing, roughly the proportion of cells highly activated at any one time, is a primary determinant of the capacity for memory storage [56,57]. For nonbinary units, in particular for real neurons, a generalized measure of the sparseness of their activity can be defined as
$$a = \frac{\langle r_i\rangle^{2}}{\langle r_i^{2}\rangle} \tag{18}$$
[25,34]. The more sparse a set of representations expressed by a population of cells (a → 0), the less the representational capacity and the larger the memory capacity of that population, and consistently a is generally found to decrease approaching central memory systems from the sensory periphery [58]. Sparseness is thus a basic and important statistic of neuronal representations, which however does not reflect their interrelationships. To probe the ways in which different representations relate to one another, one must consider other statistics, that go beyond sparseness, and that in fact are linked to information measures.
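A direct transcription of the sparseness measure of Eq. (18), evaluated on two invented rate vectors to show its limiting behaviours.

```python
import numpy as np

def sparseness(rates):
    """Eq. (18): a = <r>**2 / <r**2>; a -> 1/N when one unit is active, -> 1 for uniform rates."""
    rates = np.asarray(rates, dtype=float)
    return rates.mean() ** 2 / np.mean(rates ** 2)

dense  = np.ones(100)                          # every cell equally active
sparse = np.zeros(100); sparse[:5] = 10.0      # only 5 of 100 cells active
print(sparseness(dense), sparseness(sparse))   # 1.0 and 0.05
```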
9. Quantifying the structure of neuronal representations

The structure of neural representations of the outside world has been studied in detail in some simple situations. Typically these are situations in which a well-defined correlate of neuronal activity (i.e. a stimulus, a response, or even a behavioural state) is characterized by one or a few parameters that are made to vary continuously or in steps. Examples are the Hubel and Wiesel [59] description of orientation selectivity in cat visual cortex, the O'Keefe [60] finding of place cells in the rat hippocampus, the mitral cell coding of n-aliphatic acid hydrocarbon length in the olfactory system [61], and the coding of the direction of movement in 3D space in the primate motor cortex [62]. In many interesting situations, though, especially in those parts of the brain which are more remote from the periphery, external correlates, or, for simplicity, stimuli, do not vary (either continuously, or in steps) along any obvious physical dimension. Often, in experiments, the set of stimuli used is just a small ensemble of a few disparate individual items, arbitrarily selected and difficult to classify systematically. Examples for the ventral visual system are faces [63], simple or complex [33] abstract patterns, or the schematic objects reached with the reduction procedure of Tanaka et al. [64]. In such situations, the resulting patterns of neuronal activity across populations of cells can still provide useful insight into the structure of neuronal representations of the outside world, but such insight has to be derived independently of any explicit correlation with a natural, physical structure of the stimulus set.
The only obvious a priori metric of the stimulus set, in the general case, is the trivial categorical metric of each element s being equal to itself, and different from any other element in the set. A posteriori, the neuronal firing patterns embed the stimulus set into a potentially metric structure defined by the similarities and differences among the patterns, or response vectors, corresponding to the various elements. A truly metric structure can be extracted by quantifying such similarities and differences into a notion of distance (among firing patterns) that satisfies the three required relations: positivity, symmetry and the triangle inequality. At a more basic level, though, the overall amount of structure, i.e. the overall importance of relations of similarity and difference among firing patterns, can be quantified even independently of any notion of distance, just from a matrix Q(s|s') characterizing the similarity or confusability of s' with s, a matrix which need not be symmetrical. Q(s|s') can be simply derived from neuronal recordings, after decoding the firing patterns, as the conditional probability P(s, s')/P(s'). Whatever the decoding procedure used, Q(s|s') is essentially a measure of the similarity of the current response vector to s' with the mean response vector to s. It is however important to notice that Q(s|s') can also be derived from other measures, for example from behavioral measures of error or confusion in recognition or classification. Behavioral measures of the similarity or confusability of s' with s do not access the representation of the two stimuli directly, but indirectly they reflect the multiplicity of neural representations that are important in generating that particular behavior. If some of these representations are damaged or lost, as in brain-damaged patients, the resulting behavioral measures can be indicative of the structure of the surviving representations [65]. The amount of structure can be quantified by comparing the mutual information, which in terms of the matrix Q(s|s') reads
$$I = \sum_{s,s' \in S} Q(s|s')\,P(s')\,\log_2\frac{Q(s|s')}{\sum_{s''}Q(s|s'')\,P(s'')} \tag{19}$$
with its minimum and maximum values I_min and I_max [66] corresponding to a given percent correct f_cor = Σ_s Q(s|s)P(s). The lowest information values compatible with a given f_cor are those attained when equal probabilities (or equal frequencies of confusion) result for all stimuli s ≠ s'. In this case one finds
$$I_{\min} = \log_2 S + f_{\mathrm{cor}}\,\log_2 f_{\mathrm{cor}} + (1 - f_{\mathrm{cor}})\,\log_2\!\left[\frac{1 - f_{\mathrm{cor}}}{S-1}\right]. \tag{20}$$
Conversely, maximum information for a given f_cor is contained in the confusion matrix when stimuli are confused only within classes of size 1/f_cor, and the individual stimuli within the class are allocated on a purely random basis (for analytical simplicity we consider only unbiased decoding, such that Q(s|s') ≤ Q(s'|s'), and assume that each class may contain a noninteger number of elements). It is easy to see that then
$$I_{\max} = \log_2 S + \log_2 f_{\mathrm{cor}}. \tag{21}$$
Interpreting the similarity, or probability of confusion, as a monotonically decreasing function of some underlying distance (e.g. as discussed above), the first
situation can be taken to correspond to the limit in which the stimuli form an equilateral simplex, or equivalently the stimulus set is drawn from a space of extremely high dimensionality. In the Euclidean d → ∞ limit, points drawn at random from a finite, e.g. hyperspherical, region tend to be all at the same distance from each other, and from the point of view of the metric of the set this is the trivial limit mentioned above. The second situation can be taken to correspond to the ultrametric limit, instead, in which all stimuli at distance less than a critical value from each other form clusters such that all distances between members of different classes are above the critical value. This is a non-Euclidean structure (although it could be embedded in a Euclidean space of sufficiently large dimension), and it is a first example of the possible emergence of non-Euclidean aspects from a quantitative analysis that does not rely on a priori assumptions. Intermediate situations between the two extremes are easy to imagine, and can be parametrized in a number of different ways. A convenient parameter that simply quantifies the relative amount of information in excess of the minimum, without having to assume any specific parametrization for the Q(s|s') matrix, is
$$\lambda = \frac{I - I_{\min}}{I_{\max} - I_{\min}}, \tag{22}$$
which ranges from 0 to 1 (for unbiased confusion; it can be above 1 if confusion is biased) and can be interpreted as measuring the metric content of the matrix. What is quantified by λ can be called the metric content not in the sense that it requires the introduction of a real metric, but simply because it gives the degree to which relationships of being close or different (distant), among stimuli, emerge in the Q(s|s') matrix. For λ = 0 such relationships are irrelevant, to the point that if confusion occurs, it can be with any (wrong) stimulus. For λ = 1 close stimuli are so similar as to be fully confused with the correct one, whereas other stimuli are 'maximally distant' and never mistaken for it. In summary, the metric content index λ quantifies the dispersion in the distribution of 'errors', from maximal, λ = 0, to minimal, λ = 1. The 'errors' may be actual behavioral errors in identifying or categorizing stimuli or in producing appropriate responses, or simply calculated from the similarity in the response vectors of a population of cells to different stimuli. Two examples of application of the metric content index, in the second situation, are illustrated in Fig. 5. The analyses summarized in the graphs of Fig. 5 point at two important aspects of the metric content index: its being a relatively intrinsic property of a representation (invariant across the number of cells sampled, within sampling precision) and its variation from one population or cortical area to another. The neuronal recordings are described elsewhere (continuous simultaneous recordings of 42 rat hippocampal CA1 cells, with the rat running a triangular maze, divided in windows 250 ms long [39]; and continuous but not simultaneous recordings of 27 monkey cells from the 4 regions indicated, with the monkey freely locomoting in the laboratory, divided in windows 100 ms long [27,67]). It should be noted that the similarity matrix is based on response vectors quite different from Georgopoulos' population vectors [62], which live in the physical 3D
(Figure 5 panels: 'Metric Content - Places' for the rat data and 'Metric Content - Spatial Views' for the monkey data, with decoded information plotted against percent correct.)
Fig. 5. The information decoded from different cell populations vs. the corresponding percent correct, in the rat (left) and monkey (right) hippocampus. In both cases different data points with the same symbol correspond to increasing the number of cells included in each population, thus raising percent correct and information. I_min and I_max are indicated. The rat example illustrates how metric content is a relatively invariant measure (the third curve is for λ = 0.36) across population sizes. The monkey example indicates quantitative differences among neighboring populations (the 2 curves are for λ = 0.25 and λ = 0.15): datapoints are for populations of CA3 (*), CA1 (triangles), parasubiculum (squares) and parahippocampal gyrus cells (diamonds).
or 2D movement space rather than in the space of dimensionality equal to the number of cells included, and which correspond to a continuous rather than a discretized correlate. One can see from the figure the extent to which metric content, considering the imprecision with which cells are sampled, their activity is recorded and the information measures are extracted, is still a relatively stable index. This allows some comparisons to be made even among the metric content values characterizing vectors of different dimensionality. For each given cortical area, as more cells are considered, both percent correct and decoded information grow, and the relation between the two, expressed as metric content, varies somewhat, but within a limited band of values characterizing each cortical area.

These data, particularly those obtained in the monkey, are not fully adequate, on at least two accounts. First, the number of cells recorded and the number of trials available for each cell and each spatial view were not sufficiently large to safely avoid limited sampling effects. Second, the monkey recordings were not simultaneous. Both inadequacies can be removed with parallel recording from several cells at once, as has now become standard practice in a number of laboratories. Within these limits, one possible interpretation of the different metric content in the CA3 area, with respect to the other 3 areas sampled, lies in the different pattern of connectivity, whereby in CA3 recurrent collateral connections are the numerically dominant source of inputs to pyramidal cells, and travel relatively long distances, to form an extended network connected by intrinsic circuitry. Considerations based on simplified network models suggest that such a connectivity pattern would express memory representations with a different metric structure from those expressed by networks of different types. The difference could be further related to the qualitative nature of the memory representation, which might be characterized as being more episodic in CA3 and more structured in the other areas. The metric content depends also on the average sparseness of these representations, though, and further analyses are required to dissociate the effects of connectivity (and of representational structure) from those purely due to changes in sparseness. In particular, it has been shown that in the short-time limit the metric content becomes a transparent function of sparseness [68], and it is possible that even over the 250 ms windows used for the rat, the structure revealed reflects mainly the sparseness of the coding.

The monkey recordings were from neighboring areas in the temporal lobes, and it is possible that any difference among memory representations will be more striking when more distant areas are compared. In addition, it is possible that any difference may be more striking when the correlate considered does not have its own intrinsic metric, as spatial views do, but instead lives in a high-dimensional space, as e.g. with faces, thereby leaving more room for arbitrary metric structures to be induced in the neural representations by the learning process. For both reasons, it is interesting to extend this analysis to entirely different experiments, sharing with these only the generic requirement that different populations of cells are recorded in their response to the same set of stimuli, or, more generally, correlates.
It is also interesting to deepen the analysis of the structure of representations by looking at subtler aspects, such as the ultrametric content [66], that depends on the mutual relations of triplets, rather than pairs, of representations.
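A sketch of the metric content computation of Eqs. (19)-(22): mutual information, I_min, I_max and λ derived from a decoding (confusion) matrix Q(s|s'). The matrix below is invented purely for illustration and is unrelated to the recordings discussed above.

```python
import numpy as np

def metric_content(Q, P=None):
    """Q[s, s2] = probability of decoding stimulus s when s2 was shown (columns sum to 1).
    Returns (I, I_min, I_max, lambda) following Eqs. (19)-(22)."""
    Q = np.asarray(Q, dtype=float)
    S = Q.shape[0]
    P = np.full(S, 1.0 / S) if P is None else np.asarray(P, dtype=float)

    joint = Q * P                                  # P(decoded s, true s2)
    p_dec = joint.sum(axis=1, keepdims=True)       # marginal of the decoded stimulus
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(Q / p_dec)
    I = np.nansum(terms)                           # Eq. (19)

    f_cor = float(np.sum(np.diag(Q) * P))          # percent correct
    I_min = (np.log2(S) + f_cor * np.log2(f_cor)
             + (1 - f_cor) * np.log2((1 - f_cor) / (S - 1)))   # Eq. (20)
    I_max = np.log2(S) + np.log2(f_cor)                         # Eq. (21)
    lam = (I - I_min) / (I_max - I_min)                         # Eq. (22)
    return I, I_min, I_max, lam

# invented confusion matrix: 4 stimuli, errors concentrated within the pairs {0,1} and {2,3}
Q = np.array([[0.60, 0.30, 0.05, 0.05],
              [0.30, 0.60, 0.05, 0.05],
              [0.05, 0.05, 0.60, 0.30],
              [0.05, 0.05, 0.30, 0.60]])
print(metric_content(Q))
```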
Finally, possible changes in the representations that develop with time can be examined by recording from the same populations - not the same cells - over periods during which some behaviourally relevant phenomenon may have occurred, such as new learning, forgetting, or a modulation of the existing representations. One specific such modulation of interest for the case of human patients is the one resulting from localized lesions to another cortical area, which may affect the structure of the representations in surviving areas of the cortex.
Abbreviations
CNS, central nervous system
msec, millisecond
MT, medial temporal

Acknowledgements

The analyses and procedures discussed in this chapter have been developed together with several colleagues, as evident from the citations, among them Stefano Panzeri, Edmund Rolls and William Skaggs. Collaborations were supported by the European Commission and the Human Frontier Science Program.
References 1. Shannon, C.E. (1948) A mathematical theory of communication. AT&T Bell Labs. Tech. J. 27, 379-423. 2. Bialek, W., Rieke, F., de Ruyter van Steveninck, R.R. and Warland, D. (1991) Reading a neural code. Science 252, 1854-1857. 3. Frolov, A.A. and Murav'ev, I.P. (1993) Informational characteristics of neural networks capable of associative learning based on hebbian plasticity. Network 4, 495-536. 4. Samengo, I. and Treves, A. (2001) Representational capacity of a set of independent neurons. Phys. Rev. E 63, (in press). 5. Treves, A. and Panzeri, S. (1995) The upward bias in measures of information derived from limited data samples. Neural Comp. 7, 399-407. 6. Panzeri, S. and Treves, A. (1996b) Analytical estimates of limited sampling biases in different information measures. Network 7, 87- 107. 7. Golomb, D., Hertz, J.A., Panzeri, S., Treves, A. and Richmond, B.J. (1997) How well can we estimate the information carried in neuronal responses from limited samples? Neural Comp. 9, 649-655. 8. Oram, M.W. and Perrett, D.I. (1992) Time course of neuronal responses discriminating different views of face and head. J. Neurophysiol. 68, 70-84. 9. Optican, L.M. and Richmond, B.J. (1987) Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex: Iii information theoretic analysis. J. Neurophysiol. 57, 162-178. 10. Optican, L.M., Gawne, T.J., Richmond, B.J. and Joseph, P.J. (1991) Unbiased measures of transmitted information and channel capacity from multivariate neuronal data. Biol. Cybern. 65, 305-310. 11. Eskandar, E.N., Richmond, B.J. and Optican, L.M. (1992) Role of inferior temporal neurons in visual memory: I. temporal encoding of information about visual images, recalled images, and behavioral context. J. Neurophysiol. 68, 1277-1295. 12. Kjaer, T.W., Hertz, J.A. and Richmond, B.J. (1994) Decoding cortical neuronal signals: networks models, information estimation and spatial tuning. J. Comput. Neurosci. 1, 109-139.
13. Heller, J., Hertz, J.A., Kjaer, T.W. and Richmond, B.J. (1995) Information flow and temporal coding in primate pattern vision. J. Comput. Neurosci. 2, 175-193. 14. Tov6e, M.J., Rolls, E.T., Treves, A. and Bellis, R.J. (1993) Information encoding and the responses of single neurons in the primate temporal visual cortex. J. Neurophysiol. 70, 640-654. 15. Mechler, F., Victor, J.D., Purpura, K.P. and Shapley, R. (1998) Robust temporal coding of contrast by vl neurons for transient but not for steady-state stimuli. J. Neurosci. 18, 6583-6598. 16. Oram, M.W., Wiener, M.C., Lestienne, R. and Richmond, B.J.M (1999) Stochastic nature of precisely timed spike patterns in visual system neuronal responses. J. Neurophysiol. 81, 3021-3033. 17. O'Keefe, J. and Recce, M.L. (1993) Phase relationship between hippocampal place units and the eeg theta rhythm. Hippocampus 3, 317-330. 18. Tsodyks, M.V., Skaggs, W.E., Sejnowski, T.J. and McNaughton, B.L. (1996) Population dynamics and theta rhythm phase precession of hippocampal place cell firing: a spiking neuron model. Hippocampus 6, 271-280. 19. Skaggs, W.E., McNaughton, B.L., Wilson, M.A. and Barnes, C.A. (1996) Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences. Hippocampus 6, 149-172. 20. Chee-Orts, M.-N. and Optican, L.M. (1993)Cluster method for analysis of transmitted information in multivariate neuronal data. Biol. Cybern. 69, 29-35. 21. Page, M. (2000) Connectionist modeling in psychology: a localist manifesto. Behavioral Brain Sciences 23, 443-512. 22. Rolls, E.T., Treves, A., Tov6e, M.J. and Panzeri, S. (1997b) Information in the neuronal representation of individual stimuli in the primate temporal visual cortex. J. Comput. Neurosci. 4, 309-333. 23. DeWeese, M.R. and Meister M. (1999) How to measure the information gained from one symbol. Network 10, 325-340. 24. Panzeri, S., Biella, G., Rolls, E.T., Skaggs, W.E. and Treves, A. (1996a) Speed, noise, information and the graded nature of neuronal respones. Network 7, 365-370. 25. Rolls, E.T. and Treves, A. (1998) Neural Networks and Brain Function. Oxford University Press, Oxford. 26. Rolls, E.T., Critchley, H.D. and Treves, A. (1996) Representation of olfactory information in the primate orbitofrontal cortex. J. Neurophysiol. 75, 1982-1996. 27. Rolls, E.T., Treves, A., Robertson, R.G., Georges-Francois, P. and Panzeri, S. (1998) Information about spatial views in an ensemble of primate hippocampal cells. J. Neurophysiol. 79, 1797-1813. 28. Gershon, E.D., Wiener, M.C., Latham, P.E. and Richmond, B.J. (1998) Coding strategies in monkey V1 and inferior temporal cortices. J. Neurophysiol. 79, 1135-1144. 29. Levy, W.B. and Baxter, R.A. (1996) Energy efficient neural codes. Neural Comp. 8, 531-543. 30. Baddeley, R.J., Abbott, L.F., Booth, M., Sengpiel, F., Freeman, T., Wakeman, E.A. and Rolls, E.T. (1997) Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lon. Ser. B 264, 1775-1783. 31. Treves, A., Panzeri, S., Rolls, E.T., Booth, M.C.A. and Wakeman, E.A. (1999) Firing rate distributions and efficiency of information transmission of inferior temporal cortex neurons to natural visual stimuli. Neural Comp. 11, 611-641. 32. Brunel, N. and Nadal, J.P. (1998) Mutual information, Fisher information and population coding. Neural Comp. 10, 1731-1757. 33. Miyashita, Y. and Chang, H.S. (1988) Neuronal correlate of pictorial shortterm memory in the primate temporal cortex. Nature 331, 68-70. 34. Treves, A. 
and Rolls, E.T. (1991) What determines the capacity of autoassociative memories in the brain. Network 2, 371-397. 35. Gawne, T.J. and Richmond, B.J. (1993) How independent are the messages carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758-2771. 36. Zohary, E., Shadlen, M.N. and Newsome, W.T. (1994) Correlated neuronal discharge rate and its implication for psychophysical performance. Nature 370, 140-143.
37. Rolls, E.T., Treves, A. and Tov6e, M.J. (1997a) The representational capacity of the distributed encoding of information provided by populations of neurons in the primate temporal visual cortex. Exp. Brain Res. 114, 149-162. 38. Gochin, P.M., Colombo, M., Dorfman, G.A., Gerstein, G.L. and Gross, C.G. (1994) Neural ensemble encoding in inferior temporal cortex. J. Neurophysiol. 71, 2325-2337. 39. Treves, A., Skaggs, W.E. and Barnes, C.A. (1996b) How much of the hippocampus can be explained by functional constraints? Hippocampus 6, 666-674. 40. Samengo, I. (2001) Independent neurons representing a finite set of stimuli: dependence of the mutual information on the number of units sampled. Network 12 (in press). 41. Brenner, N., Strong, S.P., Koberle, R. and Bialek, W. (2000) Synergy in a neural code. Neural Comp. 12, 1531-1552. 42. Panzeri, S., Schultz, R., Treves, A. and Rolls, E.T. (1999) Correlations and the encoding of information in the nervous system. Proc. Roy. Soc. B 266, 1001-1012. 43. Panzeri, S. and Schultz, S.R., (1999) A unified approach to the study of temporal, correlational and rate coding. Physics arXiv.org, 9908027. 44. Skaggs, W.E. and McNaughton, B.L. (1992) Quantification of what it is that hippocampal cell firing encodes. Soc. Neurosci. Abs. 508.9, 18, 1216. 45. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M. and Reitboeck, H.J. (1988) Coherent oscillations: a mechanism of feature linking in the visual cortex? Biol. Cybern. 60, 121-130. 46. Gray, C.M., Konig, P., Engel, A.K. and Singer, W. (1989) Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338, 334-337. 47. Eckhorn, R. and P6pel, B. (1974) Rigorous and extended application of information theory to the afferent visual system of the cat. i. basic concepts Kybernetik 16, 191-200. 48. Eckhorn, R. and P6pel, B. (1974) Rigorous and extended application of information theory to the afferent visual system of the cat. ii. experimental results 17, 7-17. 49. Eckhorn, R., Grtisser, O.-J., Kr611er, J., Pellnitz, K. and P6pel, B. (1976) Efficiency of different neural codes: information transfer calculations for three different neuronal systems. Biol. Cybern. 22, 49-60. 50. Maynard, E.M., Hatsopoulos, N.G., Ojakangas, C.L. and Acuna, B.D., Sanes, J.N., Normann, R.A. and Donoghue, J.P. (1999) Neuronal interactions improve cortical population coding of movement direction. J. Neurosci. 19, 8083-8093. 51. Hebb, D.O. (1948) The Organization of Behavior. Wiley, New York. 52. Braitenberg, V. and Shtiz, A. (1991) Anatomy of the Cortex: Statistics and Geometry. Springer, Berlin. 53. Wang, Y., Gupta, A. and Markram, H. (1999) Anatomical and functional differentiation of glutamatergic synaptic innervation in the neocortex. J. Physiol. (Paris) 93, 305-317. 54. Douglas, R.J. and Martin, K.A. (1991) A functional microcircuit for cat visual cortex. J. Physiol. (London) 440, 735-769. 55. Amit, D.J. and Treves, A. (1989) Associative memory neural network with low temporal spiking rates. Proc. Nat. Acad. Sci. USA 86, 7871-7875. 56. Tsodyks, M.V. and Feigel'man, M.V. (1988) The enhanced storage capacity in neural networks with low activity level. Europhys. Lett. 6, 101-105. 57. Buhmann, J., Divko, R. and Schulten, K. (1989) Associative memory with high information content. Phys. Rev. A 39, 2689-2692. 58. Rolls, E.T. and Treves, A. 
(1990) The relative advantages of sparse versus distributed encoding for associative neuronal networks in the brain. Network 1, 407-421. 59. Hubel, D.H. and Wiesel, T.N. (1974) Sequence regularity and geometry of orientation columns in the monkey striate cortex. J. Comp. Neurol. 1582, 267-294. 60. O'Keefe, J. (1979) A review of the hippocampal place cells. Prog. Neurobiol. 13, 419-439. 61. Sullivan, S.L. and Dryer, L. (1996) Information processing in mammalian olfactory system. J. Neurobiol. 30, 20-36. 62. Georgopoulos, A.P., Schwartz, A. and Kettner, R.E. (1986) Neural population coding of movement direction. Science 233, 1416-1419.
63. Rolls, E.T. (1992) Neurophysiological mechanisms underlying face processing within and beyond the temporal cortical visual areas. Philos. Trans. R. Soc. (London) B 335, 11-21. 64. Tanaka, K. (1993) Neuronal mechanisms of object recognition. Science 262, 685-688. 65. Lauro-Grotto, R., Piccini, C., Borgo, F. and Treves, A. (1997) What remains of memories lost in Alzheimer and herpetic encephalitis. Soc. Neurosci Abst. 734.2, 23, 1889. 66. Treves, A. (1997) On the perceptual structure of face space. Biosystems 40, 189-196. 67. Treves, A., Georges-Francois, P., Panzeri, S., Robertson, R.G. and Rolls, E.T. (1998) The metric content of spatial views as represented in the primate hippocampus, in: Neural Circuits and Networks, NATO Asi Series F, Computer and Systems Sciences, Vol. 167, eds. V. Torre and J. Nicholls. p. 239-247, Springer, Berlin. 68. Panzeri, S., Treves A., Schultz, S.R. and Rolls, E.T. (1999) On decoding the responses of a population of neurons from short time windows. Neural Comp. 11, 1553-1577.
CHAPTER 20

Population Coding: Efficiency and Interpretation of Neuronal Activity

C.C.A.M. GIELEN
Department of Medical Physics and Biophysics, University of Nijmegen, Geert Grooteplein Noord 21, NL 6525 EZ Nijmegen, The Netherlands

© 2001 Elsevier Science B.V. All rights reserved

Handbook of Biological Physics, Volume 4, edited by F. Moss and S. Gielen
Contents
1. Introduction 855
2. Mathematics of (un)biased estimators and their variance 857
   2.1. Maximum likelihood and maximum a posteriori estimators 857
   2.2. Fisher information 858
   2.3. Mutual information 859
3. Probabilistic interpretation of population codes 860
4. Models for population codes 862
   4.1. Simple version of population coding 862
   4.2. Poisson model 865
   4.3. Optimum linear estimator (OLE) 868
   4.4. Summary 869
5. Overlap of receptive fields and correlated noise in neural responses 869
   5.1. Optimal receptive fields: broad or narrow? 869
   5.2. The effect of correlated noise on the information content of neuronal activity 873
6. Transformation of neural activity by the brain 876
7. Neurobiological data on neuronal population coding 878
   7.1. Neuronal population coding in the auditory nerve 878
   7.2. Neuronal population coding of movement direction 880
   7.3. A neuronal population code for sound localization 881
   7.4. Ensemble coding of saccadic eye movements 881
   7.5. Inhomogeneous representation of receptive fields 883
8. Discussion 884
Abbreviations 885
References 885
1. Introduction
A fundamental question in neuroscience concerns the understanding of the neural code. The standard doctrine is that information is transmitted by action potentials. Since the shape and size of action potentials are almost uniform, the common belief is that the information is stored in the timing of the sequence of action potentials. In order to determine how information is represented by the nervous system, we need to understand two steps of neuronal information processing. First, we have to understand the transformation of a sensory signal into the sequence of action potentials of a single neuron. This aspect of neuronal coding has been an object of study for several decades and has resulted in a good insight into the response properties of neurons in sensory pathways (e.g. the visual pathways from retina to cortex and from retina to superior colliculus, and the auditory, vestibular and somatosensory pathways). With regard to motor control, we have to understand how neuronal activity in the final motor pathways is related to movement-related parameters, such as movement direction, movement velocity and force. Second, there is the neurophysiological finding that a single stimulus or movement is encoded in the activity of a large number of neurons. This has raised the question of how neural activity in a population of cells can be interpreted in terms of external stimuli and actions, i.e. in terms of sensory input and motor output. This problem has been recognized for many years, but it is only in the last decade that considerable theoretical progress has been made in dealing with it. The two problems mentioned above are crucial first steps before more complex issues like information processing and information storage in the brain can be addressed satisfactorily.

Neural encoding of information can be studied by measuring neural responses to external stimuli and during movements. Our understanding of the neural code can be tested by solving the inverse problem, inferring sensory input or motor output from a given set of neuronal activity. Solving this problem involves many other problems. For example, the evaluation of the firing rate of a single cell raises several problems. In order to extract the continuous probability density of neuronal activity and its variance from the discrete spike data of a single cell, we must count the number of spikes that occur within some fixed time interval. Since a rapidly and regularly firing cell might fire some 100 spikes per second, we would need to count over at least 1 s in order to have an estimated error of less than 1% if we had to deal with a single cell only. If the cell's firing is described by a Poisson process with an average rate of 100 spikes per second, we will need to count over a significantly longer interval to make an accurate estimate. This situation becomes even worse for chaotic firing at low firing rates.

Since the generation of action potentials is a stochastic process (see the chapters by Meunier and Segev and by Gerstner in this book), the same sensory stimulus will
never generate precisely the same neuronal activity pattern. This raises the question of how the activity in a population of neurons should be interpreted. For a long time, the traditional view held that information is coded in the recruitment and firing rate of neurons. However, firing rate is a continuous signal which can be obtained only by averaging over time. Obviously, averaging over time reduces the temporal resolution, which would be detrimental for time-critical processes such as those required for sound localization. Another solution might be averaging over many neurons. Instead of averaging over time, averaging of the activity of an ensemble of responding neurons has been proposed both to obtain accurate estimates by averaging out noise in the population activity, and to combine information from many cells with different receptive fields and response properties. Obviously, this approach makes it possible to use the precise timing of each individual action potential, which has advantages not only from the point of view of temporal resolution, as mentioned above, but also from a theoretical point of view, since it avoids the problem of selecting the appropriate time window to determine firing rate accurately.

Theoretical considerations and an increasing body of experimental findings suggest that information is encoded not only in the recruitment and firing rate of activated neurons, but also in the temporal relations between their discharges. It has been suggested that synchronization of neuronal responses on a time scale of milliseconds could serve to bind spatially distributed cells into coherently active neural assemblies representing particular components of the environment [1,2]. Therefore, synchrony of firing is thought to provide a major contribution, in addition to recruitment and firing rate, to the coding of sensory and motor events (see also the chapter by Golomb et al. in this book).

Population codes, where information is represented in the activities of whole populations of neurons, are ubiquitous in the brain. There have been a number of theoretical analyses of population decoding in a variety of contexts. The theoretical studies often used methods that are optimal in some statistical sense, usually based on probability distributions of the neuronal firing rates. However, these methods are sometimes highly implausible from a neurobiological point of view. On the other hand, experimental studies typically employed simple methods that may not be optimal from a statistical point of view, but were intuitively clearer in providing insight into the underlying mechanisms of neuronal coding and information processing.

In this chapter we will focus on the efficiency of coding sensory and motor events in the activity of a large number of neurons, both from a theoretical and from a neurobiological point of view. In Section 2 we present some general theoretical concepts which deal with the variance of statistical estimators. We will provide a general probabilistic framework that can be used to understand how the activity of a population of neurons can be considered as encoding information about the world and, concomitantly, the way that this information can be decoded into a theoretical interpretation under rather simplifying assumptions. In Section 3, we will explore how various neuronal properties, like receptive field size or preferred movement direction, can be inferred from correlation studies between neuronal activity and perceptual or motor responses. In
Section 4, we will discuss various models and algorithms which have been proposed for the interpretation of neuronal population activity. Most of the studies in the literature so far have made simplifying assumptions, which certainly violate biological complexity and function. In Section 5 we will discuss what happens if these simplifying assumptions have to be relaxed. In particular we will deal with the role of correlated activity between neurons and overlap of receptive fields in the interpretation of population activity. In Section 6 we will discuss how algorithms to interpret neuronal activity can be implemented in neuronal connectivity. Finally, in Section 7 we will review some of the results of theoretical approaches applied to experimental data.
2. Mathematics of (un)biased estimators and their variance

Before dealing with the interpretation of neuronal activity, we will first discuss various mathematical techniques to "measure" the information content of a neuronal signal. The problem that we have to face is how to interpret the action potentials of a large number of cells in a neuronal population as a function of time. Since the generation of an action potential is a stochastic process, we will have to rely on statistical techniques and we will have to develop probabilistic estimators. Estimators can be divided into biased and unbiased estimators. An unbiased estimator has the property that its output approximates the true value of the parameter for a large number of data. One could then wonder why people sometimes rely on biased estimators. The answer is that it may take quite some effort to obtain an unbiased estimate of some quantity and that a biased estimator may be easier to obtain.

2.1. Maximum likelihood and maximum a posteriori estimators
A prototypical statistical problem is to estimate the value of some parameter θ from a finite set {x_i} of data. In the context of sensory coding, θ is a stimulus in the stimulus domain Θ, and the information about this stimulus is contained in the activities {r_i, i = 1, ..., N} of a population of a large number of N neurons. Since θ is described as a parameter, this implies the existence of a family of probability densities p(r; θ) for θ ∈ Θ. When we make the assumption that the observations r_i are independent samples from an unknown density, then the likelihood is a product of a set of conditional probability density functions of θ, defined by
$$\mathcal{L}(\theta;\mathbf{r}) = p(\mathbf{r}|\theta) = \prod_i p(r_i|\theta),$$
where p(r_i|θ) represents the probability of measuring neuronal activity r_i given the stimulus θ. The maximum likelihood estimator (MLE) associates to each set of data a value of θ which maximizes ℒ(θ; r):
$$\hat{\theta}(\mathbf{r}) = \arg\max_{\theta}\,\mathcal{L}(\theta;\mathbf{r}).$$
Instead of maximizing the likelihood, it is easier to find the maximum of the logarithm of the likelihood, since the logarithm maps the product of conditional probabilities into a sum of logarithms of conditional probabilities. The maximum likelihood (ML) estimate is then found by solving the equation
$$\nabla_\theta \log \mathcal{L}(\theta;\mathbf{r}) = \nabla_\theta \sum_{i=1}^{N}\log p(r_i|\theta) = 0.$$
Therefore, the MLE is the value of θ that maximizes the likelihood p(r|θ). According to Bayes' relation the posterior density of θ can be found by
$$p(\theta|\mathbf{r}) \propto p(\mathbf{r}|\theta)\,p(\theta) = \mathcal{L}(\theta;\mathbf{r})\,p(\theta).$$
The maximum a posteriori (MAP) estimator of θ maximizes p(θ|r), or equivalently, ℒ(θ; r)p(θ). Thus the MLE is a MAP estimator for a "flat" prior p(θ).
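A minimal ML decoding sketch for a hypothetical population of independent Poisson neurons with Gaussian tuning curves; the tuning parameters, window length and grid search are illustrative choices and are not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_pref = np.linspace(-40, 40, 21)               # preferred stimuli of 21 neurons
sigma, r_max, T = 10.0, 50.0, 0.1                   # tuning width, peak rate (Hz), window (s)

def tuning(theta):
    return r_max * np.exp(-0.5 * ((theta - theta_pref) / sigma) ** 2)

def log_likelihood(theta, counts):
    """log p(counts | theta) for independent Poisson neurons (constant terms dropped)."""
    f = tuning(theta) * T
    return np.sum(counts * np.log(f) - f)

theta_true = 12.3
counts = rng.poisson(tuning(theta_true) * T)        # one observed population response

grid = np.linspace(-40, 40, 801)                    # maximize the likelihood on a grid
theta_ml = grid[np.argmax([log_likelihood(th, counts) for th in grid])]
print(theta_true, theta_ml)
```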
2.2. Fisher information

A natural framework to study how neurons communicate, or transmit information, in the nervous system is information theory. Suppose we have a set of N neurons, whose activity is represented by the vector r. This vector r then codes a specific signal θ. The Fisher information is a functional of p(r|θ) and can be interpreted as the amount of information in r about the stimulus θ. The Fisher information is defined by
$$J[\mathbf{r}](\theta) = E\left[-\frac{\partial^{2}}{\partial\theta^{2}}\log p(\mathbf{r}|\theta)\right]. \tag{1}$$
It is important to stress that the Fisher information itself is not an information quantity. Rather, the Fisher information gives a measure of the accuracy with which different values of the stimulus near θ can be discriminated, given the signals r (see [3]). The terminology comes from an intuitive interpretation of the bound: our knowledge ("information") about a stimulus θ is limited according to this bound. Since the generation of action potentials by each neuron is an independent process, the responses r_i can be assumed to be independent. As a result J[r](θ) = Σ_{i=1}^{N} J[r_i](θ), so that J is of the order of N, implying that the typical fluctuations of the ML estimate scale as N^{-1/2}. This is in contrast to the bias of the ML estimate, which is of the order of N^{-1}. Hence, the variance is the dominant contribution to the error in the estimate in the limit of large N. One of the reasons for the importance of the Fisher information in neuronal information processing is found in the Cramér-Rao inequality, which states that the inverse of the Fisher information J(θ) provides a lower bound for the mean squared error of any unbiased estimator:
((o-
J[r](O) "
This means that the Fisher information is a measure of how well one can estimate a parameter from an observation with a given probability distribution. Since the variance of the MLE approaches the inverse of the Fisher information, the MLE is asymptotically optimal.
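A quick numerical check of the Cramer-Rao bound is sketched below, under assumed Gaussian tuning and Poisson spiking (illustrative choices, not taken from the text): the empirical variance of the ML estimate over many trials should approach 1/J[r](θ).

```python
import numpy as np

rng = np.random.default_rng(1)

N, sigma, f_max = 64, 10.0, 30.0
centers = np.linspace(-90.0, 90.0, N)
theta0 = 12.0

def f(theta):
    return f_max * np.exp(-0.5 * ((theta - centers) / sigma) ** 2)

def f_prime(theta):
    return f(theta) * (centers - theta) / sigma ** 2

# Fisher information for independent Poisson neurons: J = sum f'^2 / f  (cf. Eq. (14) below)
J = np.sum(f_prime(theta0) ** 2 / f(theta0))

grid = np.linspace(-90.0, 90.0, 3601)
F = f(grid[:, None])                 # mean rates on the grid, shape (grid, N)
logF = np.log(F)
sumF = F.sum(axis=1)

estimates = []
for _ in range(2000):
    r = rng.poisson(f(theta0))
    estimates.append(grid[np.argmax(logF @ r - sumF)])   # grid-based ML estimate

print(f"empirical ML variance: {np.var(estimates):.4f}")
print(f"Cramer-Rao bound 1/J : {1.0 / J:.4f}")
```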
2.3. Mutual information

Consider an observable r and some stimulus θ. The information about the stimulus θ in the observable (response) r is given by

\int d^N r \; p(r|\theta) \log \frac{p(r|\theta)}{p(r)}.

A frequently used measure to express the information shared by two variables is the mutual information, which is the only quantity (up to a multiplicative constant) satisfying a set of fundamental requirements [4]. For an observable r and a stimulus θ, the mutual information is defined by

I(\theta, r) = \int d\theta \, d^N r \; p(\theta)\, p(r|\theta) \log \frac{p(r|\theta)}{p(r)}   (2)
and can also be interpreted as the average information in r over all stimuli θ. The mutual information is closely related to the concept of entropy. Entropy is a measure for the information required to code a variable with a certain probability distribution, characterizing how many states the variable can assume and the probability of each. The entropy H(θ) = -\int d\theta \, p(\theta) \log p(\theta) corresponds to the number of bits required to specify all stimuli. Similarly, the entropy H(r) = -\int d^N r \, p(r) \log p(r) corresponds to the number of bits required to specify all possible neuronal responses. The entropy of the neural response r given the stimulus θ is defined by H(r|θ) = -\int d\theta \, d^N r \, p(\theta) p(r|\theta) \log p(r|\theta). The mutual information, which is the information about the stimulus preserved in the neural response, is given by H(θ) - H(θ|r) = H(r) - H(r|θ), which is equivalent to Eq. (2). Suppose there exists an unbiased efficient estimator \hat{\theta} with mean θ and minimal variance (according to Cramer-Rao!) equal to the inverse of the Fisher information J(θ). With the definitions given above, the mutual information (i.e. the amount of information gained about θ in the computation of the estimate \hat{\theta}) is

I(\theta, \hat{\theta}) = -\int d\hat{\theta}\, p(\hat{\theta}) \log p(\hat{\theta}) + \int d\theta\, p(\theta) \int d\hat{\theta}\, p(\hat{\theta}|\theta) \log p(\hat{\theta}|\theta).

This is truly an information measure, since the first term represents the entropy of the estimator \hat{\theta} and I(\theta, \hat{\theta}) represents the gain of information about θ in the computation of that estimator. The term -\int d\hat{\theta}\, p(\hat{\theta}|\theta) \log p(\hat{\theta}|\theta) is the entropy of \hat{\theta} given θ, which for each θ is smaller than or equal to the entropy of a Gaussian distribution with the same variance J^{-1}(θ). Since processing cannot increase information, the information I(θ, r) conveyed by r about θ is at least as large as that conveyed by the estimator. This gives
I(\theta, r) \geq I(\theta, \hat{\theta}) \geq -\int d\theta\, p(\theta) \log p(\theta) - \int d\theta\, p(\theta)\, \frac{1}{2} \log\!\left(\frac{2\pi e}{J(\theta)}\right),   (3)

where the last term on the right-hand side follows straightforwardly from the entropy of a Gaussian distribution with variance J^{-1}(θ). When the distribution of the estimator is sharply peaked around its mean value (which implies J(θ) ≫ 1), the entropy of the estimator becomes identical to the entropy of the stimulus. When the estimator has a non-Gaussian distribution, the inequality will be strict.
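A small numerical illustration of the lower bound in Eq. (3), under assumed choices (uniform stimulus prior, Gaussian tuning, Poisson spiking): the bound grows with the number of neurons because the Fisher information does.

```python
import numpy as np

# Illustrative population: Gaussian tuning, Poisson spiking (assumptions, not from the text)
sigma, f_max = 10.0, 30.0

def fisher_info(theta, centers):
    f = f_max * np.exp(-0.5 * ((theta - centers) / sigma) ** 2)
    fp = f * (centers - theta) / sigma ** 2
    return np.sum(fp ** 2 / f)          # Poisson Fisher information (cf. Eq. (14))

# Uniform prior over [-90, 90) deg: H(theta) = log(180) nats
H_theta = np.log(180.0)

for N in (16, 64, 256):
    centers = np.linspace(-90.0, 90.0, N, endpoint=False)
    thetas = np.linspace(-89.0, 89.0, 179)
    J = np.array([fisher_info(t, centers) for t in thetas])
    # Right-hand side of Eq. (3): H(theta) - <0.5 * log(2*pi*e / J)>
    bound = H_theta - np.mean(0.5 * np.log(2 * np.pi * np.e / J))
    print(f"N = {N:4d}:  lower bound on I(theta, r) = {bound:.2f} nats")
```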
3. Probabilistic interpretation of population codes

The starting point for almost all work on neural population codes is the neurophysiological finding that many neurons respond to a particular variable underlying a stimulus (such as the sensitivity of neurons in visual cortex to the orientation of a luminous line) according to a unimodal tuning function. For neurons involved in sensory perception, the set of variables which affect the response of a neuron is usually referred to as the receptive field. However, for neurons involved in movements a better terminology would be "movement field". In order to cover both types of neurons, and especially neurons in the sensory-motor pathway where neural responses have both sensory and motor components, we will use the term "response field". The value or set of values of the variables underlying the response field which produces a peak in the tuning function will be called the "preferred value". The response field plays an important role in interpreting neuronal population codes. For many brain structures, the response fields of neurons are not known. Only for neurons in rather peripheral sensory pathways (such as retina, lateral geniculate nucleus, area V1 in visual cortex) or motor pathways (for example motor cortex [5]) is it possible to determine the response field. For neurons in more central brain structures, the relevant sensory and motor features which underlie the response field may be very hard to discover. Several authors have used Gaussian white noise as stimulus [6]. The reason for using white noise is that the characteristics of a dynamical system are hard to determine, because what happens now depends on what happened before. Thus all possible stimuli and neural responses have to be considered for a full characterization of the system. The use of Gaussian white noise (GWN) stimuli is attractive, since a GWN signal has the largest entropy for a given variance and as such contains all possible combinations of stimulus values in space and time. As a first-order (linear) approximation, the response field R_i(t) of neuron i can be defined by the cross-correlation of the Gaussian white noise stimulus x(t) and the neuronal response r_i(t) [6]. This cross-correlation can be shown to be equal to the averaged stimulus preceding an action potential [7] or to the averaged response following a spike. We will refer to this as the averaged peri-spike-event (PSE):
R_i^{PSE}(\tau) = \frac{1}{T} \int x(t - \tau) \sum_n \delta(t - t_n)\, dt   (5)

= \frac{1}{T} \sum_n x(t_n - \tau),   (6)
where the response r_i(t) of neuron i is represented by a sequence of δ-pulses, t_n is the time of occurrence of action potential n, and T is the duration of the observation interval. As we will see later, this cross-correlation technique can provide a first step to characterize the conditional probability p(r|θ). However, for neurons with complex properties, the complexity of the GWN stimulus increases exponentially with the number of dimensions of the stimulus. Therefore, this approach to characterizing the response field is only useful for neurons with simple, low-dimensional response fields. The characteristic properties of the response field can provide information to answer the question "How is an external event x(t) in the world encoded in the neuronal activity r(t) of the cells?". A full characterization of the response field of a neuron (both spatial and temporal properties!) implies that the density function p(r|θ) is known. The response fields are also indispensable for answering the question about the sensory or motor interpretation of neural activity. The response fields allow the mapping from the set of activities in a neural population r(t), with r_i(t) representing the activity of neuron i at time t, to the events in the external world by Bayes' relation: p(θ|r_i) = p(θ)p(r_i|θ)/p(r_i). Since the generation of action potentials is a stochastic process, the problems described above have to be addressed in a probabilistic way. We will define p(r|x) as the probability of the neuronal activity r given the stimulus x. The simplest models assume that neuronal responses are independent, which gives p(r|x) = \prod_i p(r_i|x). For the time being, we will assume independence of firing. The case of correlated firing between neurons will be discussed later. A Bayesian decoding model specifies the information in r about x by

p(x|r) \propto p(r|x)\, p(x),   (7)
where p(x) gives the prior distribution of x. Note that starting with a specific stimulus x, encoding it in the neural activity r, and decoding it results in a probability distribution over x. This uncertainty arises from the stochasticity of the spike-generating mechanism of neurons and from the probability distribution p(x). As explained in Section 2.1, the most likely or most plausible stimulus x given a response r is given by the MAP estimator. However, it can be shown that under some (rather restrictive!) assumptions, a simpler and more intuitive interpretation of neuronal activity can be obtained. Suppose that the response of neuron i depends on the projection of the stimulus (or response) onto a preferred stimulus (or response) X_i of that neuron (as in visual cortex; see [8] for an overview, or in motor cortex [5]), such that
p(r_i|x) = \frac{1}{Z} \exp\{G(x \cdot X_i)\},   (8)

where G(x · X_i) is a continuous, symmetric, bell-shaped function and Z is a normalization factor. If X_i is known (for example, it may be the averaged PSE in Eq. (6)) and if neurons fire independently, then

p(x|r) = p(r|x)\,\frac{p(x)}{p(r)} = \frac{1}{Z^N} \prod_i \exp\{G(x \cdot X_i)\}\, \frac{p(x)}{p(r)}.   (9)
The most plausible stimulus is then found by setting the gradient of p(x|r) with respect to x to zero:

\nabla_x p(x|r) = \frac{1}{Z^N} \prod_i \exp\{G(x \cdot X_i)\} \sum_i \nabla_x G(x \cdot X_i)\, X_i\, \frac{p(x)}{p(r)} + \frac{p(r|x)}{p(r)} \nabla_x p(x)

= \sum_i \big[ p(r_i|x)\, \nabla_x G(x \cdot X_i)\, X_i \big] \frac{p(x)}{p(r)} + \frac{p(r|x)\, \nabla_x p(x)}{p(r)} = 0.   (10)
The first term in Eq. (10) gives the most plausible stimulus based on the response fields of the responding neurons. The second term gives a correction for the probability density of stimuli. For a flat distribution, the second term equals zero and the MAP estimator becomes equal to the ML estimator. When all X_i are distributed homogeneously in a stimulus subspace, such that the correlations between neighboring response fields are the same for all neighboring neurons, the most plausible stimulus is proportional to \sum_i p(r_i|x)\, X_i, which is the well-known population vector (see Section 4).
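The following Python sketch illustrates this population-vector style readout for a hypothetical 2D stimulus: each model neuron has a preferred vector X_i, responses are weighted and summed, and the resulting vector is compared with the true stimulus direction. The cosine-like tuning and the Poisson noise are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population with preferred 2D directions spread uniformly on the circle
N = 100
phi = rng.uniform(0.0, 2 * np.pi, N)
X = np.stack([np.cos(phi), np.sin(phi)], axis=1)   # preferred vectors X_i (unit length)

def responses(x):
    """Cosine-like tuning with baseline, plus Poisson noise (illustrative encoding)."""
    rate = 5.0 + 15.0 * np.clip(X @ x, 0.0, None)  # only positive projections drive firing
    return rng.poisson(rate)

x_true = np.array([np.cos(0.7), np.sin(0.7)])      # true stimulus direction
r = responses(x_true)

# Population vector: sum of preferred vectors weighted by firing rate
pop_vec = (r[:, None] * X).sum(axis=0)
x_hat = pop_vec / np.linalg.norm(pop_vec)

print("true direction (deg)    :", np.degrees(np.arctan2(*x_true[::-1])))
print("population estimate(deg):", np.degrees(np.arctan2(*x_hat[::-1])))
```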
4. Models for population codes
4.1. Simple version of population coding

The most simple and straightforward interpretation of neuronal population activity is obtained by a simple summation of the response fields R_i of all neurons i, weighted by the firing rate r_i of each neuron:

\hat{x}_{CG} = \frac{\sum_{i=1}^{N} r_i(x)\, R_i}{\sum_{i=1}^{N} r_i(x)}.   (11)
This choice corresponds to the so-called center-of-gravity (CG) estimate [9]. CG coding can be statistically optimal. This is the case for perfectly regular arrays of sensors with Gaussian tuning profiles that have an output described by independent Poisson statistics, and for arrays of sensors with a sinusoidal tuning profile for the parameter estimated. However, there are many cases in which CG decoding is highly
inefficient. This includes the important case, observed in nearly all parts of the brain, in which sensor positions or response fields are not regularly spaced. We will come back to this topic later. Moreover, the CG approach assumes a homogeneous distribution of response fields in the event space and a homogeneous distribution of stimuli x for sensory neurons. Given these assumptions, any deviations between the CG result and the true parameter value are small, provided that the noise is small and that the neurons sample the parameter space sufficiently densely. Moreover, the question arises whether the estimate of this population coding scheme is optimal in the sense that it is unbiased and that the variance of its estimate is small. A good estimator should be unbiased, which is the case when the estimator gives the (expectation value of the) true stimulus x. Baldi and Heiligenberg [10] demonstrated that the CG method is virtually bias free. However, this simplistic version of the population vector is inefficient in the sense that the variance of its estimate is much larger than the smallest possible variance. Some of the first experimental data demonstrating the importance of the concept of population coding were obtained from motor cortex [5,11]. Neurons in the arm area of primate motor cortex are broadly tuned, in the sense that they increase their firing rate for a broad range of arm movement directions. Each neuron appears to have a preferred movement direction (i.e. the movement direction which corresponds to the largest response modulation of the neuron), and the preferred movement directions are approximately homogeneously distributed in 3D space. Georgopoulos et al. [5,11] interpreted the population activity as
M(r) = \sum_{i=1}^{N} r_i M_i,
where M_i is the preferred movement direction of neuron i and M represents the estimated movement direction of the arm. Quite remarkably, the movement direction estimated by the population vector was very close to the actual measured movement direction of the monkey's arm. For a simple array of N independent sensors with unit spacing between consecutive sensors, with a Gaussian tuning function

f_n(\theta) = \exp\!\left( -\frac{(\theta - n)^2}{2\sigma^2} \right)

and with Gaussian noise W_n of variance \sigma_N^2 superimposed on the response of neuron n (R_n = f_n(\theta) + W_n), the Fisher information is given by (see e.g. [9])

J(\theta) = \frac{1}{\sigma_N^2} \sum_n \big( f_n'(\theta) \big)^2.

According to the Cramer-Rao bound, the minimal variance is given by \sigma_N^2 / \sum_n (f_n'(\theta))^2. When the summation is replaced by integration, which is a good approximation for large N and sufficiently large σ, the minimal variance reduces to 2\sigma\sigma_N^2/\sqrt{\pi}.
Note that the minimum attainable variance increases with the sensor tuning width σ, a result which is similar to the ML result (see Section 4.2 and Fig. 1). The results above show that the minimal variance of the center-of-mass model is proportional to σ, i.e. the minimal variance increases as a function of the tuning width σ.
Fig. 1. (A) Fisher information (J/(N f_max)) for the population of neurons with tuning functions according to Eq. (15), for the ML estimate (solid and broken lines) and for the population vector (dotted and dash-dotted lines), for ratios of f_min to f_min + f_max of 0.1 (broken and dash-dotted lines) and 0.01 (solid and dotted lines), as a function of the tuning width a in degrees (modified after [14]). (B) Variance of the population vector estimate for N = 10^3 neurons (dashed line) and N = 10^4 neurons (solid line) as a function of the tuning width (in radians) of the same set of neurons as in panel A (modified after Fig. 1 in [12]). Note that the variance is related to the inverse of the Fisher information.
Hence, it is advantageous to use narrowly tuned sensors. If we compare the variance of the CG model with that of the Cramer-Rao lower bound, Snippe [9] obtained the result

\frac{\mathrm{Var}(\hat{\theta}_{CR})}{\mathrm{Var}(\hat{\theta}_{CG})} \leq \frac{6\sqrt{\pi}\,\sigma^3}{\frac{N-1}{2}\cdot\frac{N+1}{2}\cdot\frac{N+3}{2}}.
This illustrates that the efficiency of CG coding is low when the number of neurons is large. This is easily explained. When the number of neurons is large relative to the tuning width, many neurons do not respond to a stimulus but still contribute to the population average through their noise, since sensor noise is independent of the response. Therefore, neurons which do not respond to the stimulus contribute to the noise in the population average. The analysis so far was for regular arrays of neurons. It can be shown [9] that when the receptive fields of neurons are highly irregularly distributed, the largest contribution to errors in the CG method originates from these irregularities rather than from neuronal noise. As we will show below, the ML estimate does not suffer from irregularities. Some linear estimators have been proposed which do not suffer from irregularities in the distribution of receptive fields either. However, these models come at a price. The regular CG estimator only needs to know the optimal stimulus parameter of each neuron, whereas the models that have been proposed to compensate for irregularities in the distribution also require knowledge of the distribution of neuronal tuning [12] or of the overlap of tuning functions in order to invert a covariance matrix of neuronal activities (see e.g. [13]).

4.2. Poisson model
Under the Poisson encoding model, the neuronal activities r_i(t) are assumed to be independent, with

p(r_i|x) = \frac{e^{-f_i(x)} \big(f_i(x)\big)^{r_i}}{r_i!},

where f_i(x) is the tuning function of neuron i and r_i(t) represents the firing rate or the number of action potentials in a particular time interval. With regard to decoding, several authors [14,15,9] have used ML for the Poisson encoding model. The ML estimate gives the stimulus x which maximizes the likelihood p(r|x). It is defined as

\hat{x}_{ML} = \arg\max_x \, p(r|x).

The ML estimate can be obtained by differentiating the logarithm of the response probability distribution:

\frac{\partial}{\partial x} \log p(r|x) = \sum_n \frac{\partial}{\partial x} \big[ r_n \log f_n(x) - f_n(x) \big] = \sum_n \left[ \frac{f_n'(x)}{f_n(x)}\, r_n - f_n'(x) \right].   (12)
For neurons with a Gaussian tuning profile

f_n(\theta) = \exp\!\left( -\frac{(\theta - n)^2}{2\sigma^2} \right)

and with a regular, homogeneous distribution, the ratio f_n'(\theta)/f_n(\theta) equals (n - \theta)/\sigma^2. For sufficiently dense neuron distributions, Eq. (12) reduces to \sum_n (n - \theta)\, r_n. The optimal estimate is obtained when the derivative in Eq. (12) is set to zero, which gives

\hat{\theta}_{ML} = \frac{\sum_n n\, r_n}{\sum_n r_n}.
This result is identical to the CG estimate for a regular homogeneous array of neurons. It illustrates that for a regular, homogeneous distribution of neurons with Gaussian tuning functions and independent Poisson noise, the CG method is optimal from a statistical point of view. The full probability distribution over the quantity x from this Poisson model is

p(x|r) \propto p(x) \prod_i \frac{e^{-f_i(x)} \big(f_i(x)\big)^{r_i}}{r_i!}.
For independent noise between the neurons, finding the ML estimate implies maximization of the likelihood p(r|x). For a large number of neurons, the estimate is unbiased and the variance is given by E[(\hat{x} - x)^2] = 1/J[r](x), where J[r](x) is the Fisher information as defined in Eq. (1). With the assumption of independent noise across units, the expression for the Fisher information becomes

J[r](x) = \sum_i E\left[ -\frac{\partial^2}{\partial x^2} \log p(r_i|x) \right].
When the stochastic behavior of neuronal firing is modeled by normally distributed noise on the response with variance \sigma^2, the Fisher information is given by

J[r](x) = \sum_{i=1}^{N} \frac{f_i'(x)^2}{\sigma^2},   (13)

where f_i'(x) = \partial f_i(x)/\partial x. For Poisson-distributed noise, the Fisher information for the MLE is given by

J[r](x) = \sum_{i=1}^{N} \frac{f_i'(x)^2}{f_i(x)}.   (14)
The Cramer-Rao inequality [16] states that the average squared error for an unbiased estimator is greater than or equal to the inverse of the Fisher information. Hence, the ML estimator is asymptotically optimal for the Poisson model, since its variance approximates the lower bound for a large number of neurons.
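A short simulation, under the assumptions above (regular array, Gaussian tuning, independent Poisson noise), comparing the closed-form estimate derived from Eq. (12) with a full grid-based ML search; for this homogeneous case the two agree closely, as stated in the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Regular, homogeneous array of Poisson neurons with Gaussian tuning (unit spacing)
N, sigma = 60, 3.0
n = np.arange(N, dtype=float)              # preferred values, unit spacing
theta0 = 27.4

def f(theta):
    return 10.0 * np.exp(-0.5 * ((theta - n) / sigma) ** 2)

r = rng.poisson(f(theta0))

# Closed-form estimate derived from Eq. (12): theta_ML = sum(n * r_n) / sum(r_n)
theta_cg = np.sum(n * r) / np.sum(r)

# Full grid-based ML for comparison
grid = np.linspace(0.0, N - 1.0, 5901)
F = f(grid[:, None])
theta_ml = grid[np.argmax(np.sum(r * np.log(F) - F, axis=1))]

print(f"true {theta0:.2f}, closed form {theta_cg:.2f}, grid ML {theta_ml:.2f}")
```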
These ideas are illustrated in Fig. 1, which shows the Fisher information (the inverse of the variance of the ML estimate) for a hypothetical population of neurons in visual cortex. Each neuron is thought to have an optimal orientation sensitivity \theta_i, and the mean response of neuron i to a stimulus θ is given by

f(\theta - \theta_i) = \begin{cases} f_{min} + (f_{max} - f_{min}) \cos^2\!\big( \tfrac{\pi}{a} (\theta - \theta_i) \big) & \text{if } |\theta - \theta_i| < a/2, \\ f_{min} & \text{otherwise}, \end{cases}   (15)
where a is the width of the receptive field of the neuron. When the stimulus θ is close to the preferred orientation of the neuron, the probability of a large response is high. When the stimulus is outside the receptive field, the response is small, with mean firing rate f_min. For the ML estimator the Fisher information (Eq. (14)) is proportional to N f_max a^{-1}, which demonstrates that the Fisher information diverges when the width a approaches zero. The Fisher information for the ML estimator decreases gradually for larger values of a, approaching zero (infinite variance!) for very large values of a. Seung and Sompolinsky [14] demonstrated that for the population vector model with the tuning function of Eq. (15), the variance (i.e. the inverse of the Fisher information) is given by (\tilde{f}_0 - \tilde{f}_2)/(2N\tilde{f}_1^2), where \tilde{f}_n is the nth Fourier component, defined by \tilde{f}_n = \frac{1}{2\pi}\int_0^{2\pi} e^{in\theta} f(\theta)\, d\theta. Obviously, the Fisher information for the population vector model mainly depends on the width a of the tuning function and on the background activity f_min. For small values of a the Fisher information is zero; it increases with a, reaching a maximal value for finite values of a (see Fig. 1), after which it decreases again for larger values of a. It can be shown [14] that the optimal width a_max is proportional to the ratio of background activity f_min to peak activity (f_min + f_max) raised to the power 1/3. For the simple population vector (CG vector), the Fisher information is zero for very small and very large values of a, and therefore the variance is infinite. This can be understood from the fact that for small receptive fields a most neurons are below threshold and contribute noise with variance f_min without contributing to the signal. In contrast, the Fisher information for ML increases for smaller values of a, because ML is based on the gradient of the response (see Eq. (14)), which approaches infinity for small a. As the tuning curve becomes narrower, the increase in signal |f'| more than offsets the decrease in the number of neurons above threshold. In addition, the neurons below threshold are completely ignored by the ML estimator. Both for ML and for the population vector, the information decreases (and the variance increases) for large receptive fields, since a single stimulus will then excite many neurons by the same amount, such that an accurate discrimination between the responses of different neurons becomes impossible. As pointed out by Zemel et al. [17], the ML model has several problems. First of all, the ML estimator assumes that a single stimulus x (for example, one single visual bar at a given orientation for neurons in V1) caused the neuronal activity. If multiple stimuli are present, the Poisson model will fail. Moreover, the estimation of the optimal decoding may sometimes require the whole probability distribution p(x|w) over all values of the variable x, where w represents all available
information. The Poisson model will not be able to provide such a distribution in many cases. For example, when the tuning function f_i(x) is Gaussian with an optimal stimulus x_i for neuron i, then

\log p(x|r) \propto \log\!\Big[ p(x) \prod_i e^{-f_i(x)} \big(f_i(x)\big)^{r_i} \Big]   (16)

\propto \log p(x) - \sum_i f_i(x) - \frac{1}{2\sigma^2} \sum_i r_i (x - x_i)^2 + \text{const}.   (17)

For a flat prior and a dense, homogeneous array (for which \sum_i f_i(x) is approximately constant), this distribution has a mean \mu = \sum_i r_i x_i / \sum_i r_i and a variance \sigma^2/\sum_i r_i. Taking the mean of the distribution would give a single value, which is the same as that of the CG estimate, even when the neuronal response was elicited by multiple stimuli. Therefore, the distribution p(x|r) for this Poisson model with Gaussian tuning is unimodal. In addition, the variance will always be smaller than the variance of the Gaussian tuning function, since \sum_i r_i \geq 1 for reasonably effective stimuli. Thus the Poisson model is incapable of representing distributions that are broader than the tuning function, which points to a second problem for the Poisson model. Obviously, the proper way to find the true (set of) stimuli is to estimate the full conditional probability p(x|r).
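The following sketch makes this limitation concrete under assumed Gaussian tuning and Poisson spiking: even when two well-separated stimuli drive the population, the posterior of Eqs. (16)-(17) stays unimodal, with mean near \sum_i r_i x_i / \sum_i r_i and width near \sigma/\sqrt{\sum_i r_i}.

```python
import numpy as np

rng = np.random.default_rng(4)

N, sigma = 80, 4.0
x_i = np.linspace(0.0, 80.0, N)                 # preferred stimuli of the neurons

def f(x):
    return 8.0 * np.exp(-0.5 * ((x - x_i) / sigma) ** 2)

# Two simultaneous stimuli drive the population (rates add; illustrative assumption)
r = rng.poisson(f(25.0) + f(55.0))

# Posterior over a single stimulus value x, Eqs. (16)-(17), flat prior
grid = np.linspace(0.0, 80.0, 1601)
dx = grid[1] - grid[0]
F = f(grid[:, None])
log_post = np.sum(r * np.log(F) - F, axis=1)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx

mean = (grid * post).sum() * dx
width = np.sqrt(((grid - mean) ** 2 * post).sum() * dx)
print(f"posterior mean {mean:.1f}, width {width:.2f}")
print(f"predicted mean {np.sum(r * x_i) / np.sum(r):.1f}, "
      f"predicted width {sigma / np.sqrt(np.sum(r)):.2f}")
```

The posterior peaks between the two true stimuli and is far narrower than the tuning curve, illustrating why the full conditional probability cannot be represented this way.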
4.3. Optimum linear estimator (OLE)

The simplest possible estimator is an estimator that is linear in the activities r of the neurons, which suggests a solution x_{est} = W^T r, where the problem is to find the optimal matrix W which minimizes the mean squared distance between the estimate x_{est} and the true stimulus x:

W_{OLE} = \arg\min_W E\big[ (W^T r - x)^2 \big].

One can think of the linear estimator as the response of a two-layer perceptron-like neural network with a set of output units, where output unit i has weights w_i to the input r, and W is the matrix with columns w_i. The OLE is known to be unbiased for a large number of units [15]. Its variance given x is given by

E\big[ (x_{OLE} - E\{x\})^2 \big] = \sum_{i=1}^{N} w_i^2 \sigma_i^2,

where \sigma_i^2 = \sigma_n^2 for normally distributed noise with variance \sigma_n^2, and \sigma_i^2 = f_i(x) for Poisson-distributed noise. Note that the OLE model suffers from the same problem as the CG estimate, in the sense that many neurons contribute their noisy output to the population estimate, whereas only few neurons may respond to a stimulus. Therefore, a compromise
has to be made between small tuning widths for a high resolution and broad tuning widths that eliminate noise by averaging responses, thereby increasing the signal-to-noise ratio of the estimate.
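A minimal sketch of an OLE, under assumed Gaussian tuning and Poisson noise: the weight matrix is fitted by least squares on simulated (response, stimulus) pairs and then used to decode held-out responses.

```python
import numpy as np

rng = np.random.default_rng(5)

N, sigma = 50, 8.0
centers = np.linspace(-90.0, 90.0, N)

def rates(x):
    return 12.0 * np.exp(-0.5 * ((x[:, None] - centers) / sigma) ** 2)

# Training set: random stimuli and noisy population responses
x_train = rng.uniform(-80.0, 80.0, 5000)
R_train = rng.poisson(rates(x_train)).astype(float)

# Least-squares fit of the linear read-out (a bias column is added for convenience)
A = np.column_stack([R_train, np.ones(len(x_train))])
w, *_ = np.linalg.lstsq(A, x_train, rcond=None)

# Decode fresh responses with the fitted weights
x_test = rng.uniform(-80.0, 80.0, 1000)
R_test = rng.poisson(rates(x_test)).astype(float)
x_hat = np.column_stack([R_test, np.ones(len(x_test))]) @ w

print(f"OLE rms error: {np.sqrt(np.mean((x_hat - x_test) ** 2)):.2f} deg")
```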
4.4. Summary

In the theoretical approaches discussed so far, the result of the estimation procedure used to interpret the neuronal activity has been either a single parameter or feature of a sensory stimulus or movement (as for the ML estimator), or a distribution over all possible parameters and features. A distribution of probable features is compatible with experimental findings that neuronal activity is not simply related to a single feature, but rather to a set of stimuli presented simultaneously (see e.g. [18]) or to the expectation about a particular stimulus or movement (see e.g. [19]). The available experimental evidence, and the fact that usually several features are represented in the neuronal activity within a particular part of the brain, suggest that we should focus on probability densities rather than on single features. Also, as long as we do not know precisely what role a particular group of neurons plays in the complex sequence of information processing, it might be better to resort to probability densities rather than to a single feature or parameter.
5. Overlap of receptive fields and correlated noise in neural responses
In the analysis so far, we have made the assumption of independent noise in neighboring neurons. Also, we have demonstrated that the optimal tuning of neurons depends on the type of noise in the neural responses. In this section we will explore this in more detail, in particular in relation to optimal tuning width of neurons and to optimal information content of neuronal activity for various types of (correlated) noise.
5.1. Optimal receptive fields: broad or narrow?

One of the central problems in population coding is how the neuronal code can be made as efficient and as accurate as possible. It is a common belief that sharper tuning in sensory or motor pathways improves the quality of the code, although only up to a certain point; sharpening beyond that point is believed to be harmful. This was already illustrated in Fig. 1, which shows the Fisher information as a function of the receptive field width of model neurons with an orientation specificity similar to that of neurons in visual cortex. Fig. 1 shows that sharp tuning (small receptive fields) is not efficient for the population coding model, since for very small receptive fields the number of neurons that respond to a narrow bar of light is too small to reduce the noise in the neuronal responses. For broader tuning, more neurons will respond to the narrow bar, which allows noise reduction and an improvement of the signal-to-noise ratio. Obviously, the optimal receptive field size depends on several parameters, such as the noise in the neuronal responses, the
number of neurons, and the distribution of receptive fields (homogeneous versus non-homogeneous; see [12]). The best way to proceed is to start with the Fisher information

J = E\left[ -\frac{\partial^2}{\partial\theta^2} \log p(r|\theta) \right],

where p(r|θ) is the distribution of the activity conditioned on the encoded variable θ and E[·] is the expected value over the distribution p(r|θ). Instead of the Fisher information, one could also have chosen the Shannon information, which is simply and monotonically related to the Fisher information in the case of population coding with a large number of units. Let us first consider the case in which the noise distribution is fixed. For instance, for the population of neurons from the example in Section 4.2, where we had a population of N neurons with bell-shaped tuning curves and independent Gaussian white noise with variance \sigma^2, the Fisher information reduces to

J = \sum_{i=1}^{N} \frac{f_i'(\theta)^2}{\sigma^2},   (19)
where f_i(θ) is the mean activity of unit i in response to the stimulus with orientation θ, and f_i'(θ) is the derivative with respect to θ. Eq. (19) illustrates that as the width of the tuning curve decreases, the derivative f_i'(θ) becomes steeper and thus the information increases, up to infinity for infinitely small receptive fields. Clearly, this corresponds to the ML estimate discussed in Section 4.2, where narrow tuning is better than broad tuning. Note that for the same noise the minimal detectable change, which is inversely proportional to the square root of the Fisher information, reveals that narrow tuning may not be optimal for the population coding model (see Fig. 1). It should be noted that the results on the optimal tuning of receptive fields have been a cause of much confusion, because the results depend critically on the type of noise and on the model used. For example, Fitzpatrick et al. [20] reported the opposite result (i.e. that sharp, narrow tuning curves allow better discrimination for the center-of-mass model than broad tuning); however, these authors used a different, biologically implausible noise model. Pouget et al. [21] demonstrated that when the noise distribution is not fixed, the results become different. They considered a two-layer network with an input layer and an output layer, with feedforward connections from input to output neurons and with lateral inhibitory connections in the output layer to sharpen the tuning curves (see Fig. 2). This case is particularly relevant for neurophysiologists. Since the output neurons can never contain more information than the input neurons, this model provides an example in which broad tuning contains more information than narrow tuning. The sharpening is done by lateral interactions, which induce correlated noise between neurons; the loss of information has to be attributed to this correlated noise. The results above demonstrate that the answer to the question whether broad or narrow tuning is best depends on the noise.
Fig. 2. Two-layer neural network with feedforward excitatory connections between input layer and output layer and with lateral connections in the output layer. For visibility, only one representative set of connections is shown in each layer. The tuning of the input units was chosen broad, whereas lateral "Mexican-hat"-like connections in the output layer create narrowly tuned neurons in the output layer. Since information cannot increase, this provides an example where broad tuning in the input layer provides more information (or at least as much) as narrow tuning in the output layer (adapted with permission from [22]).

In most neurophysiological experiments measuring single-unit activity, it is impossible to detect correlated noise, and in most cases it is not even possible to make a good estimate of the type of noise in the neuronal response. Therefore, independent noise is usually assumed. In the example above, this would lead to the erroneous conclusion that the output layer contains more information than the input layer. This simple example demonstrates that a proper characterization of the noise distribution is essential for a proper estimation and interpretation of the neuronal activity in a population. Multi-unit recording techniques may be an excellent tool for this purpose. Many studies have convincingly demonstrated that noise in a population of neurons is correlated. If the fluctuations of individual neurons about their mean firing rates were uncorrelated, the variance of their average would decrease like 1/N for large N. In contrast, correlated fluctuations cause the variance of the average to approach a fixed limit as the number of neurons increases (see Section 5.2). The inverse of the Fisher information is the minimum averaged squared error of any unbiased estimator of an encoded variable. It thus sets a limit on the accuracy with which a population code can be read out by an unbiased decoding method. The analysis above has illustrated how the optimal coding (broad versus narrow tuning) depends on the noise in the neuronal responses. However, another relevant parameter in this context is the dimension of the encoded variable. Suppose that the encoded variable is a D-dimensional vector x. Under the
assumption of independence of the components of the vector x, the Fisher information J(x) is defined by

J(x) = \sum_{d=1}^{D} E\left[ -\frac{\partial^2}{\partial x_d^2} \log p(r|x) \right].

For a set of N neurons with radially symmetric response functions, which are assumed to have the same shape for all neurons, and where the neuronal activity is assumed to be independent, with tuning function

f_i(x) = F\, \phi\!\left( \frac{|x - x_i|^2}{2\sigma^2} \right),   (20)

where x_i is the tuning center of neuron i, F scales the peak firing rate, and \phi is the common tuning shape, the Fisher information is given by

J(x) = \eta\, \sigma^{D-2} K_\phi(F, \tau, D),
where \eta is the number of neurons whose tuning centers fall into a unit volume of the D-dimensional space of the encoded variable and \tau is the time window for the neuronal information under consideration [22]. This illustrates how the Fisher information scales with the tuning width in arbitrary dimension D. Sharpening the tuning width helps only for D = 1, has no effect for D = 2, and reduces the information encoded by a fixed set of neurons for D ≥ 3. Although sharpening makes individual neurons appear more informative, it reduces the number of simultaneously active neurons, a factor that dominates in higher dimensions, where broad tuning functions give rise to more substantial overlap between neighboring units. One could ask how the information content of an action potential is affected by tuning. This can be addressed using the Fisher information per spike. If all neurons have the same tuning parameters, the total number N_{spikes} of spikes within a time window \tau is

N_{spikes} = \eta \int \tau f(x)\, dx_1 \cdots dx_D = \eta\, \sigma^D F \tau\, Q_\phi(D),

where f(x) is the mean firing rate in Eq. (20). When the neurons have Gaussian tuning functions, independent Poisson spike distributions, and independent distributions of peak firing rates and tuning widths, the Fisher information per spike therefore scales as

\frac{J(x)}{N_{spikes}} \propto \frac{1}{\sigma^2}.
This illustrates that sharpening the tuning curves (making σ smaller) gives rise to a larger information per spike, irrespective of the dimension D. However, we should keep in mind that this result is only valid for uncorrelated noise. The results are therefore not applicable to the information content per spike in the output layer of the toy network in Fig. 2.
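The scaling J ∝ η σ^{D-2} can be checked numerically. The sketch below places Gaussian-tuned, independent Poisson neurons on a regular D-dimensional grid and sums the Fisher information of Eq. (14) along one coordinate; the grid spacing, rates and widths are illustrative assumptions.

```python
import numpy as np
from itertools import product

def fisher_info_at_origin(D, sigma, spacing=1.0, half_width=12, F=10.0):
    """Fisher information about x_1 at x = 0 for Gaussian tuning and Poisson noise (Eq. (14))."""
    axis = spacing * np.arange(-half_width, half_width + 1)
    centers = np.array(list(product(axis, repeat=D)), dtype=float)
    d2 = np.sum(centers ** 2, axis=1)
    f = F * np.exp(-0.5 * d2 / sigma ** 2)          # mean rates at the origin
    df = f * centers[:, 0] / sigma ** 2             # derivative with respect to x_1
    return np.sum(df ** 2 / f)

for D in (1, 2, 3):
    for sigma in (1.5, 3.0):
        J = fisher_info_at_origin(D, sigma)
        print(f"D={D}  sigma={sigma:.1f}  J={J:10.2f}  J/sigma^(D-2)={J / sigma**(D - 2):10.2f}")
```

For each D the ratio J/σ^{D-2} is approximately constant across σ, reproducing the scaling quoted above.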
5.2. The effect of correlated noise on the information content of neuronal activity

Let us consider a simple example of N neurons with firing rates r_i with mean values f_i, identical variances \sigma^2, and correlated variabilities, so that

\langle (r_i - f_i)(r_j - f_j) \rangle = \sigma^2 \big[ \delta_{ij} + c(1 - \delta_{ij}) \big],   (21)

with the correlation coefficient c satisfying 0 ≤ c < 1. In this case, the variance of the average of the rates,

\bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i,

is

\sigma_{\bar{r}}^2 = \frac{\sigma^2}{N} \big[ 1 + c(N - 1) \big].

This illustrates that the variance increases as a function of the correlation c for fixed N, and that for large N the variance approaches a fixed limit c\sigma^2. A typical correlation among the activities of neurons in area MT, which is involved in the processing of moving visual scenes, has been estimated at about 0.1-0.2 [23]. This leads to the conclusion that coding accuracy will not improve for populations of more than about 100 neurons. In order to obtain more basic insight into the effect of correlated noise, let us assume a population of N neurons which respond to a stimulus with firing rates that depend on a variable x that parameterizes some stimulus attribute [24]. When the average activity of neuron i in response to stimulus x is f_i(x), its activity in a given trial is

r_i = f_i(x) + \eta_i,
with \eta_i representing Gaussian noise with zero mean and covariance matrix Q(x). We will consider three different types of variability: additive noise, multiplicative noise, and correlation of noise for neurons within a limited range of each other. For additive noise, the covariance matrix is given by Eq. (21). For the limited-range correlation model [24] the covariance matrix is given by

Q_{ij} = \sigma^2 \rho^{|i - j|},
where the parameter \rho (0 < \rho < 1) determines the range of correlations between neurons in the population. The parameter \rho can be expressed in terms of a correlation length L by writing \rho = \exp(-\Delta/L), where \Delta is the distance between the peaks of adjacent tuning curves. For multiplicative noise, the covariance matrix is scaled by the average firing rates:

Q_{ij}(x) = \sigma^2 \big[ \delta_{ij} + c(1 - \delta_{ij}) \big] f_i(x) f_j(x).
See [25] for more detailed information.
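A quick simulation of Eq. (21), under an assumed shared-factor noise model that realizes exactly that covariance: the variance of the population average saturates at cσ² instead of falling as 1/N.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, c = 1.0, 0.2     # noise variance and pairwise correlation (illustrative values)
trials = 20000

for N in (10, 100, 1000):
    # Correlated fluctuations with the covariance of Eq. (21):
    # a shared factor z0 plus private noise gives Var = sigma^2, Cov = c*sigma^2
    z0 = rng.normal(size=(trials, 1))
    zi = rng.normal(size=(trials, N))
    r = np.sqrt(sigma2) * (np.sqrt(c) * z0 + np.sqrt(1.0 - c) * zi)
    var_mean = np.var(r.mean(axis=1))
    theory = (sigma2 / N) * (1.0 + c * (N - 1))
    print(f"N={N:5d}  Var(mean)={var_mean:.4f}  theory={theory:.4f}  limit={c * sigma2:.4f}")
```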
The Fisher information J(x) is the best measure to estimate the effect of correlated noise on the population coding of stimulus x, since the discriminability d', which quantifies how accurately discriminations can be made between two slightly different values x and x + \Delta x on the basis of the response r, is related to the Fisher information by

d' = \Delta x \sqrt{J(x)}.
The larger the Fisher information, the better the discriminability and the smaller the minimum unbiased decoding error. When the random noise \eta is drawn from a Gaussian probability distribution, the probability P[r|x] that a given response r is evoked by the stimulus x is given by

P[r|x] = \frac{1}{\sqrt{(2\pi)^N \det Q(x)}} \exp\!\left( -\tfrac{1}{2} [r - f(x)]^T Q^{-1}(x)\, [r - f(x)] \right),

which results in the Fisher information

J(x) = f'(x)^T Q^{-1}(x)\, f'(x) + \tfrac{1}{2} \mathrm{Tr}\!\left[ Q'(x) Q^{-1}(x) Q'(x) Q^{-1}(x) \right],   (22)

where

Q'(x) = \frac{dQ(x)}{dx} \quad \text{and} \quad f'(x) = \frac{df(x)}{dx}.
When Q is independent of x, as it is for additive noise and for limited-range correlations, only the first term in Eq. (22) survives. This also illustrates that when the covariance matrix Q is independent of the stimulus (which includes the case that the neural noise is the same for all neurons), the second term in Eq. (22) vanishes and the remaining variance is identical to that for ML.

5.2.1. Additive noise

For the additive noise case and for large N, the Fisher information reduces to [25]
J(x) = \frac{N \big[ F_1(x) - F_2(x) \big]}{\sigma^2 (1 - c)},

where

F_1(x) = \frac{1}{N} \sum_{i=1}^{N} \big( f_i'(x) \big)^2 \quad \text{and} \quad F_2(x) = \left( \frac{1}{N} \sum_{i=1}^{N} f_i'(x) \right)^{\!2}.
This explains that the variance of the estimate (i.e. the inverse of the Fisher information) decreases as 1/N for large N, and also decreases as a function of the
correlation c. The minimal error goes to zero as the correlation approaches one: any slight difference in the tuning curves can then be exploited to calculate the noise exactly and to remove it. The Fisher information grows to infinity for correlations approaching one only as long as F_1(x) - F_2(x) is not zero and does not approach zero for large N. When F_1(x) differs from zero for any x, a fraction of the neurons will always respond to any x; this eliminates the case that F_1(x) - F_2(x) goes to zero for large N. The other case, F_1(x) - F_2(x) = 0, requires that f_i'(x) is independent of i. A derivative f_i'(x) that is independent of i implies that all cells have the same tuning apart from a constant bias on the firing rate, which would be a pathological situation.
5.2.2. Multiplicative noise

For the multiplicative noise model, the Fisher information for large N is given by [25]

J(x) = \frac{N \big[ G_1(x) - G_2(x) \big]}{\sigma^2 (1 - c)} + \frac{N \big[ (2 - c) G_1(x) - c\, G_2(x) \big]}{(1 - c)},   (23)

where

G_1(x) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{d \log f_i(x)}{dx} \right)^{\!2} \quad \text{and} \quad G_2(x) = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{d \log f_i(x)}{dx} \right)^{\!2}.
The second term in the Fisher information, which does not depend on the noise variance \sigma^2, arises because with multiplicative noise the encoded variable can also be estimated from second-order quantities, not merely from measurements of the firing rates themselves. The Fisher information in Eq. (23) is proportional to N (just as in the additive noise model) and is an increasing function of the correlation c, provided that G_1(x) > G_2(x). Since G_1(x) ≥ G_2(x) by the Cauchy-Schwarz inequality, the only way the Fisher information can become zero is when G_1(x) = G_2(x), i.e. when d log f_i(x)/dx is independent of i. In other words, except for contrived artificial neuronal networks, the Fisher information increases with the correlation c and with the number of neurons N.
5.2.3. Limited-range correlations

For limited-range correlations, the Fisher information is given by

J(x) = \frac{N (1 - \rho) F_1(x)}{\sigma^2 (1 + \rho)} + \frac{N^{1 - 2/D} \rho\, F_3(x)}{\sigma^2 (1 - \rho^2)},   (24)
where

F_1(x) = \frac{1}{N} \sum_{i=1}^{N} \big( f_i'(x) \big)^2,

D is the number of encoded variables, and

F_3(x) = N^{2/D - 1} \sum_{i=1}^{N} \big( f_{i+1}'(x) - f_i'(x) \big)^2
(provided that the stimulus x is sufficiently far away from the boundaries of the stimulus domain [25]). For fixed N the Fisher information is a nonmonotonic function of the parameter \rho that determines the range and degree of the correlations. The first term in Eq. (24) is a decreasing function of \rho and hence of the correlation length L. The second term has the opposite dependence. For fixed N, the first term dominates for small L, and the second term dominates for large L. In the limit of large N, Eq. (24) approaches

J(x) \approx \frac{N (1 - \rho) F_1(x)}{\sigma^2 (1 + \rho)},

which illustrates that, unlike in the additive and multiplicative cases, increasing correlation decreases the Fisher information. However, the Fisher information still increases linearly with N for any \rho < 1. This clearly illustrates that correlated noise can lead either to a decrease or to an increase of the Fisher information, depending on the underlying model. The reader should be aware that the models discussed in this chapter are rather simple and all assume Gaussian noise. For more complex models (as will be the case in biology) and for other types of noise, the results may not be valid.
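To see the opposite trends concretely, the sketch below evaluates the first term of Eq. (22), J = f'(x)^T Q^{-1} f'(x), for an assumed Gaussian-tuned population under the uniform (additive) and limited-range covariance models, as a function of the correlation parameter.

```python
import numpy as np

N, sigma_tc, sigma2 = 100, 8.0, 1.0
centers = np.linspace(-50.0, 50.0, N)
x0 = 0.0

f = 10.0 * np.exp(-0.5 * ((x0 - centers) / sigma_tc) ** 2)
fp = f * (centers - x0) / sigma_tc ** 2            # f_i'(x0)

def J(Q):
    """First term of Eq. (22): f'(x)^T Q^{-1} f'(x)."""
    return fp @ np.linalg.solve(Q, fp)

i = np.arange(N)
for corr in (0.0, 0.2, 0.5, 0.8):
    Q_add = sigma2 * ((1 - corr) * np.eye(N) + corr * np.ones((N, N)))     # Eq. (21)
    Q_lim = sigma2 * corr ** np.abs(i[:, None] - i[None, :])               # limited range
    print(f"c = rho = {corr:.1f}:  J_additive = {J(Q_add):9.2f}   J_limited = {J(Q_lim):9.2f}")
```

The additive-correlation Fisher information grows with c, whereas the limited-range case shrinks with ρ, in line with the analytical results above.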
6. Transformation of neural activity by the brain

Most procedures discussed so far to estimate the neuronal activity are algorithmic approaches, suitable for off-line analysis by theoreticians. Some of the methods discussed require the complete sequence of stimuli and neural responses; others require complex analyses which do not seem to be biologically plausible. One might therefore wonder how the brain is able to transform neuronal activity so as to make it easier to interpret. It might do so by mapping neuronal information into another format or to another frame of reference. Therefore, we will discuss possible neural architectures for this purpose. Consider a two-layered feedforward network, which is fully connected from the input to the output layer and has lateral connections in the output layer (Fig. 3). First we assume a linear activation function in the output layer, with dynamics governed by the following difference equation:

z_t = \big( (1 - \lambda) I + \lambda W_{lat} \big)\, z_{t-1},
Population coding: efficiency and interpretation of neuronal activity
877
Zlat
output
layer
9~
qP
9P
input
layer
Fig. 3. Example of a two-layer feedforward neural network with feedforward connections W_for from input layer to output layer and with lateral interactions W_lat in the output layer. For visibility, only one representative set of connections is shown in each layer.
where z_t represents the output of the network at time t, λ is a real-valued positive number in the interval [0,1], I is the identity matrix, and W_lat is the matrix of lateral connections between the output units. At t = 0 the output z_0 is initialized to W_for r, where r is an input pattern and W_for is the feedforward matrix. The dynamics of this network can be solved analytically. When the feedforward connections are equal to the lateral connections (i.e. W_for = W_lat), the network converges to a state corresponding to the eigenvector of the matrix (1 - λ)I + λW_lat with the largest eigenvalue. A similar neural network was proposed by Ben-Yishai et al. [26] to explain the orientation selectivity in visual cortex. In their model they incorporated a nonlinearity, which acts as a gain control preventing the activity from growing to infinity; this would occur for eigenvalues larger than 1. A slightly more complicated situation arises when the output neurons have nonlinear activation functions. As shown by Zhang [27], the weights W_lat can be set in such a way that a hill of activity arises centered around the optimal state x_opt. Sufficient conditions for this to occur are excitatory connections to neighboring neurons and inhibitory connections to more distant units, such as in the well-known "Mexican-hat" profile of lateral connections (see the "winner-take-all" mechanism in the chapter by Flanagan in this book). It can be shown [20] that this recurrent network is able to provide a coarse-code estimate of a stimulus x which is almost as efficient as the ML estimate for a large number of neurons. However, the method is in general suboptimal when the activity of the input neurons is correlated. These results show that it is possible to perform an efficient, unbiased estimation with coarse coding using a biologically plausible neural architecture like the two-layered recurrent neural network. The coarse coding and the lateral interactions serve to eliminate uncorrelated noise within a neural population and to obtain a more accurate estimate. In general, this recurrent network does not only preserve Fisher information. It can also change the format of the information to make it more easily decodable. Whereas ML is a way to decode the input pattern efficiently, a complex estimator, or even a linear estimator, is sufficient to decode the stable hill while reaching the Cramer-Rao lower bound for the variance. One can therefore think of the relaxation of activity in the nonlinear recurrent network in two ways: as
a clean-up mechanism of uncorrelated noise, or as a processing mechanism that makes information easier to decode.
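A minimal sketch of the linear relaxation described above, under assumed connection matrices: iterating z_t = ((1 - λ)I + λW_lat) z_{t-1} from z_0 = W_for r drives the (normalized) state toward the leading eigenvector of W_lat.

```python
import numpy as np

rng = np.random.default_rng(7)
N, lam = 60, 0.3
idx = np.arange(N)

# Assumed "Mexican-hat"-like lateral connectivity: short-range excitation, longer-range inhibition
d = np.abs(idx[:, None] - idx[None, :])
W_lat = np.exp(-0.5 * (d / 3.0) ** 2) - 0.5 * np.exp(-0.5 * (d / 9.0) ** 2)
W_for = W_lat.copy()                      # feedforward equal to lateral, as in the text

# Noisy input pattern from a broadly tuned input layer (illustrative)
r = rng.poisson(5.0 + 10.0 * np.exp(-0.5 * ((idx - 20) / 4.0) ** 2)).astype(float)

z = W_for @ r                             # z_0 = W_for r
for _ in range(300):
    z = (1 - lam) * z + lam * (W_lat @ z)
    z /= np.linalg.norm(z)                # normalization stands in for a gain control

# The linear network should converge to the eigenvector of W_lat with the largest eigenvalue
w, V = np.linalg.eigh(W_lat)
v1 = V[:, np.argmax(w)]
print("overlap with leading eigenvector:", abs(float(z @ v1)))
```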
7. Neurobiological data on neuronal population coding

Considering the many theoretical papers on the efficiency of neuronal coding and on the interpretation of neuronal activity, the number of studies dealing with real experimental data is rather limited. This is certainly related to the fact that application of the theoretical ideas to experimental data requires the simultaneous recording of many neurons under the same experimental conditions (i.e. in response to the same stimulus or during the same behavioral response). Simultaneous recording of action potentials from more than 10 neurons seems prohibitively difficult. An approximate solution to this problem is to make the best possible choice from various bad solutions: recording sequentially from single neurons under (as much as possible) the same experimental conditions. The assumption then is that the neuronal system is time invariant and that the response of a population of neurons can be obtained by combining the responses of all individual neurons obtained in the individual recording conditions. Evidently, correlations in activity due to direct neuronal interactions in the population are lost.

7.1. Neuronal population coding in the auditory nerve
One of the very first studies on the interpretation of neuronal activity was by Johannesma and his group (see e.g. [28]). These authors presented a theoretical framework for a probabilistic interpretation of neuronal activity, which provided the basis for most subsequent studies. They made a distinction between the most probable response given a stimulus (which is simply related to the mean response f(x) to stimulus x), and the most plausible stimulus x given the neuronal response r, which follows from Bayes' relation

p(x|r) = \frac{p(x)}{p(r)}\, p(r|x).
As explained in Sections 2.1 and 3, the simple population vector (Eq. (11)) is the maximum a posteriori estimator given the measured neuronal activity r, under the assumptions of a homogeneous distribution of independently firing neurons with independent noise and a flat prior on the stimulus space. Based on this result, these authors proposed that each action potential might be "substituted" by the most probable stimulus that generated this action potential, and that the complete stimulus could be approximated by the summation of all most probable stimuli at the times of the action potentials of the individual neurons. Note that this substitution is fully equivalent to the construction of the population vector. This theoretical framework was applied to provide a sensory interpretation of the activity in the auditory nerve [29]. Neuronal activity in this study was obtained from a simulation of a set of 64 neurons with stochastic firing in the auditory nerve
with the frequency selectivity equidistantly distributed on a logarithmic frequency scale in the range between 200 and 2000 Hz. For more details about the model neurons, see the legend of Fig. 4. Fig. 4 shows the stimulus in the upper panel (a frequency sweep from 100 to 1500 Hz within 30 ms). The lower panel shows the reconstructed stimulus based on the neuronal population activity recorded from the 64 model neurons. The reconstructed activity is noisy and small at the beginning of the sweep because of the low density of neurons with a characteristic frequency at low frequencies. Moreover, the sweep starts at 100 Hz, whereas the lowest characteristic frequency of the neurons is at 200 Hz. For more details, see [29].
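The substitution idea can be sketched in a few lines: estimate each neuron's peri-spike-event average (Eq. (6)) from a GWN segment, then reconstruct a novel stimulus by adding that average at every spike time. The simple linear-nonlinear-Poisson model neurons below are illustrative stand-ins for the band-pass units described in the legend of Fig. 4.

```python
import numpy as np

rng = np.random.default_rng(8)
dt, L = 0.001, 40                         # time step (s), kernel length in samples
t = np.arange(L) * dt

# Illustrative model neurons: damped oscillations at different frequencies
n_cells = 16
freqs = np.linspace(20.0, 80.0, n_cells)
kernels = np.exp(-t / 0.01)[None, :] * np.sin(2 * np.pi * freqs[:, None] * t)

def encode(stim):
    """Binary spike trains: linear filtering, half-wave rectification, Poisson spiking."""
    drive = np.array([np.convolve(stim, k)[: len(stim)] for k in kernels])
    return rng.random(drive.shape) < 20.0 * np.clip(drive, 0.0, None) * dt

# 1) Estimate each neuron's peri-spike-event average from a GWN stimulus
gwn = rng.normal(size=200_000)
spikes = encode(gwn)
psa = np.zeros((n_cells, L))              # psa[i, tau] = mean stimulus tau bins before a spike
for i in range(n_cells):
    times = np.nonzero(spikes[i])[0]
    times = times[times >= L]
    psa[i] = np.mean([gwn[j - L + 1: j + 1][::-1] for j in times], axis=0)

# 2) Reconstruct a novel stimulus by substituting the PSA at every spike time
test = np.sin(2 * np.pi * 40.0 * np.arange(0.0, 0.5, dt)) * np.hanning(500)
recon = np.zeros(len(test))
for i, train in enumerate(encode(test)):
    for j in np.nonzero(train)[0]:
        if j >= L - 1:
            recon[j - L + 1: j + 1] += psa[i][::-1]

print("stimulus/reconstruction correlation:", round(np.corrcoef(test, recon)[0, 1], 2))
```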
Fig. 4. A frequency sweep starting at 100 Hz and rising to 1500 Hz within a time interval of 30 ms (upper panel) was presented to 64 model neurons. The tuning curve of each model neuron had a band-pass characteristic with slopes of 48 dB/octave. The central frequency ("characteristic frequency") of the neurons was distributed equidistantly on a logarithmic scale in the range between 200 and 2000 Hz. The output of the band-pass filter was supplemented by random GWN, one-sided rectified, and subsequently low-pass filtered by a lowpass filter with a 3-dB cut-off frequency at 1000 Hz and with a slope of 6 dB/octave. This signal was fed into a leaky-neural integrator (time-constant 10 ms) and a spike-generating mechanism, which generated an impulse whenever a threshold was exceeded in positive direction. The spike generating mechanism had an absolute refractory period of 1 ms. The reconstruction (lower panel) of the neuronal activity was obtained by substituting for each action potential the first order cross-correlation between a GWN auditory signal and the neuronal response of each model neuron to this stimulus.
7.2. Neuronal population coding of movement direction

The most influential study, which triggered both experimental and theoretical research on population coding, was the study by Georgopoulos et al. [5]. These authors recorded from neurons in primary motor cortex of the monkey (and later also in parietal cortex and premotor cortex) during arm movements in various directions in 3D space. Each neuron appeared to show its largest activity for movements in a particular, neuron-specific direction, called the "preferred movement direction". The preferred movement directions of all neurons appeared to be uniformly distributed in 3D space. The directional tuning of the neurons was broad and bell-shaped. The kindness of nature, which provided uniformly distributed, unimodal, bell-shaped tuning curves, and the assumption of independent firing led to the use of the population vector defined by Eq. (11): the summation of the preferred direction vectors of cortical neurons, each weighted by the firing rate of that particular neuron. The movement direction estimated by the population vector appeared to be similar to the actual movement direction within the confidence intervals. The use of the population vector to estimate movement direction from the responses of a population of cells was extended by Schwartz [30] to conditions in which speed and movement direction changed as a function of time. Schwartz recorded from cells in motor cortex of a monkey that was instructed to trace sinusoids with the index finger on a planar surface. The movement velocity estimated from the population vector varied in a consistent relation to the tangential velocity of the drawing movement. Each increment of the trajectory was predicted by the population vector that preceded it by about 120 ms. This time lead of about 120 ms corresponds well with the mean difference in time between the onset of activity in primary motor cortex and the onset of limb movements in rhesus monkeys for similar task instructions. These findings suggest that trajectory information is encoded in an ongoing manner in the motor cortex, using a relative coordinate system that moves in conjunction with the finger. Although the correlation between the actual movement direction and the movement direction predicted by the population vector was quite high (typically above 0.95), this does not necessarily imply that motor cortex is explicitly and exclusively involved in the coding of movement direction. Later studies (see e.g. [31]) have reported that motor cortical cells also have a "preferred direction" for isometric force production in 3D space. In these studies, monkeys were tested in a force task with an external load, such that three force variables could be dissociated: the force exerted by the subject, the net force, and the change in force. The directional tuning was invariant across different directions of a bias force. Cell activity appeared not to be related to the direction of the force exerted by the subject, which changed drastically as the bias force changed. In contrast, the direction of net force, the direction of force change, and the visually instructed direction could all be directional variables, alone or in combination, to which cell activity might be related. Obviously, this illustrates that the interpretation of population activity depends critically on a proper characterization of the neuronal response characteristics. In this context it is important to note that Kalaska et al. [32] reported that for
many motor cortical cells, and for the neuronal population as a whole, movement-related activity was dramatically modified by steady force generation. These observations do not violate the concept of a population vector, but they indicate that an accurate and reliable interpretation of the population vector is possible only when the response properties of single neurons are known in great detail for many experimental conditions (see Eq. (7)).
7.3. A neuronal population code for sound localization

Sound localization is based on differences in arrival time, intensity and spectral composition of the signals at the two ears. With regard to the neurophysiology of sound localization, the best-studied aspect is the interaural time difference (ITD) of the signals arriving at the two ears. The representation of ITDs in neuronal activity has been demonstrated in various brain nuclei. For example, in rabbits, ITD sensitivity has been reported in the superior olivary complex (SOC), which projects to the inferior colliculus and from there to the auditory thalamus. The ITD is directly related to the position of the auditory stimulus relative to the head. Therefore, sensitivity to a small range of ITDs corresponds to sensitivity to auditory stimuli in a small part of space relative to the head. Data by Fitzpatrick et al. [20] indicated that neurons in the auditory pathway have a broad spatial tuning, which becomes sharper as the information ascends through the auditory system (see Fig. 5). These authors suggested that one of the effects of the sharpening of ITD tuning is to increase the efficiency of the population code. For example, using a population of 40 neurons, the authors argued that an interaural time difference of 147 μs could be detected in the SOC, of 39 μs in the inferior colliculus, and of 16 μs in the thalamus. In other words, these authors suggest that sharper tuning higher up in the auditory pathway means that fewer and fewer responding neurons are needed to achieve a given acuity. However, this conclusion is at odds with the notion that no information gain can be obtained by a neuronal mapping of coarsely coded information into a neuronal representation with smaller receptive fields (see Section 5.1). The analysis by Fitzpatrick et al. may be incorrect, as they assumed independent noise for all neurons at each level of the auditory pathway. For a feedforward neural pathway, this assumption does not seem valid, unless additional noise is added at each level. However, that would certainly not be beneficial for improving the accuracy of detecting interaural time differences.
7.4. Ensemble coding of saccadic eye movements

Saccadic eye movements are fast eye movements directed to a particular target in the environment. The superior colliculus plays an important role in the generation of saccades. In primates, the superior colliculus contains a topographically organized representation of visual targets [33]. It has three layers, with the visual targets represented in the upper layer and the neural activity related to a saccade towards the selected target represented in the deeper layers. Since the receptive fields (upper layer) and movement fields (deeper layers) in the superior colliculus are quite large (typically several degrees and more), a considerable region of the colliculus is active whenever a visual target appears and whenever a saccade is made.
whenever a visual target appears and when a saccade is made. Van Gisbergen et al. [34] modeled the collicular role in saccade generation based on the same idea proposed by Georgopoulos: each movement cell causes a movement tendency in the direction of the center of its movement field in the collicular map. In contrast to primary motor cortex, the distribution of movement fields is not homogeneous across the visual field, nor is the receptive field size. Therefore, one of the basic assumptions underlying the notion of the population vector was not met. This was solved by an anisotropic logarithmic coordinate transformation which maps the inhomogeneous movement fields into a homogeneous "collicular map". In this new homogeneous 2D space, the receptive fields have a Gaussian shape and the total saccade vector is the vector sum of the contributions of the neurons, weighted by their firing rates. The predicted population vector in the "collicular map" was then transformed back to retinal coordinates in order to estimate the saccade generated by the neuronal activity in the population of collicular cells. It is interesting to note that the accuracy of the coding of saccade direction and amplitude is the same throughout the "collicular map". However, because of the nonlinear mapping between visual field and collicular map, the accuracy of saccade amplitude is not constant in retinal coordinates: accuracy decreases for larger saccade amplitudes, which is in good agreement with experimental observations [35].
7.5. Inhomogeneous representation of receptive fields

It is well known that receptive field size is not constant within a sensory system. For example, within the visual system, receptive fields increase in size toward the periphery. The density of receptive fields is not constant either: within the visual system, it is much higher near the fovea than in the peripheral visual field. In Section 4.2 we explained that the Fisher information and the optimal tuning width depend on the density of neurons: the higher the density, the smaller the optimal receptive field size. Up to now, no quantitative studies have investigated whether the mean receptive field size at a given visual eccentricity is optimal given the local density of receptive fields. Some studies suggest that the sizes of receptive fields scale almost linearly with neural density in the visual system (see for example visual cortex [36] and superior colliculus [33]). This is in agreement with suggestions based on theoretical arguments about the efficiency of neuronal coding (see [12,37], and Section 5.1).

However, there is another argument which should be considered. One could wonder why tuning tends to become sharper in more central brain structures. There
can be no gain in information and broader tuning allows averaging over a larger number of neurons, improving the signal-to-noise ratio. A possible reason could be, that narrow tuning allows the representation of multiple auditory targets simultaneously. Unlike for movements, where there is usually only one arm movement or one saccade at a time, there can be several visual or auditory targets at the same time. Sharper tuning provides a higher resolution to distinguish between multiple targets.
8. Discussion
The aim of this chapter was to present an overview of theories about the coding of sensory or motor events by neuronal activity and about the interpretation of neuronal activity in a population of neurons. For a broad range of neuronal properties, quantitative predictions can be made. The main hurdle for further progress is the experimental ability to make simultaneous recordings from many neurons. Such recordings will provide more information about important aspects related to correlated and/or independent neuronal activity due to common input, neuronal interactions and intrinsic noise in the membrane and spiking mechanism of cells. In this context it is relevant to discuss recent observations [19,38] about synchrony of firing. Synchrony of firing has been hypothesized as a way to solve the binding problem (for a good review, see [39]). Whatever the functional role of synchronous firing between neurons, synchrony indicates a violation of the assumption of independent firing, which poses some challenges for theoretical analyses that interpret neuronal activity.

The majority of studies on neuronal activity in sensory and motor pathways dealt with the problem of how the activity of a neuron is related to a sensory stimulus or a motor response, i.e. with the problem of encoding sensory and motor events into neuronal activity. Various theoretical [40,41] and experimental [42] studies have shown that the variability of the firing rate increases as a function of the number of excitatory and inhibitory inputs. A reasonable estimate is that the response variance is about 1.5 times the mean response and is fairly homogeneous throughout the cerebral cortex [42]. Because of this variability, an accurate estimate of the firing rate of a single cell can only be obtained by averaging over time, which would eliminate fast temporal information transfer. Instead, it is thought that averaging takes place over an ensemble or population of cells, which suggests another role for synchronous firing and an alternative for the binding hypothesis. Based on the studies mentioned above, it has been concluded [42] that an ensemble of about 100 neurons might provide a reliable estimate of rate in just one spike interval (10-50 ms). Because neurons share common input, resulting in a certain amount of common noise that ultimately limits the fidelity of signal transmission, little or no improvement is gained with larger pools. These considerations illustrate that much more work has to be done in order to reveal the neuronal code which underlies action, perception and cognitive processes.
Abbreviations

CNS, Central Nervous System
CG, center of gravity
CR, Cramer-Rao
GWN, Gaussian white noise
ITD, inter-aural time difference
MAP estimator, maximum a posteriori estimator
ML, maximum likelihood
MLE, maximum likelihood estimator
msec, millisecond
MT, Medial Temporal
OLE, optimal linear estimator
SOC, superior olivary complex
References

1. von der Malsburg, C. (1981) The Correlation Theory of Brain Function. Max-Planck-Institute for Biophysical Chemistry, Göttingen.
2. Abeles, M. (1982) Local Cortical Circuits. An Electrophysiological Study. Springer, Berlin.
3. Kullback, S. (1959) Information Theory and Statistics. Wiley, New York.
4. Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL.
5. Georgopoulos, A.P., Schwartz, A.B. and Kettner, R.E. (1986) Science 233, 1416-1419.
6. Marmarelis, P.Z. and Marmarelis, V.Z. (1978) Analysis of Physiological Systems. Plenum Press, New York.
7. Eggermont, J.J., Johannesma, P.I.M. and Aertsen, A.M.H.J. (1983) Q. Rev. Biophys. 16, 341-414.
8. Sompolinsky, H. and Shapley, R. (1997) Curr. Opinion Neurobiol. 7, 514-522.
9. Snippe, H.P. (1996) Neural Comp. 8, 511-530.
10. Baldi, P. and Heiligenberg, W. (1988) Biol. Cybern. 59, 313-318.
11. Georgopoulos, A.P., Kettner, R.E. and Schwartz, A.B. (1988) J. Neurosci. 8, 2928-2937.
12. Glasius, R., Komoda, A. and Gielen, S.C.A.M. (1997) Neural Networks 10, 981-992.
13. Salinas, E. and Abbott, L.F. (1996) J. Neurosci. 15, 6461-6474.
14. Seung, S. and Sompolinsky, H. (1993) Proc. Natl. Acad. Sci. USA 90, 10749-10753.
15. Salinas, E. and Abbott, L.F. (1994) J. Comput. Neurosci. 1, 89-107.
16. Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory. Wiley, New York.
17. Zemel, R.S., Dayan, P. and Pouget, A. (1998) Neural Comput. 10, 403-430.
18. Newsome, W.T., Britten, K.H. and Movshon, J.A. (1989) Neuronal correlates of a perceptual decision. Nature 341, 53-54.
19. Riehle, A., Grün, S., Diesmann, M. and Aertsen, A. (1997) Science 278, 1950-1953.
20. Fitzpatrick, D.C., Batra, R., Stanford, T.R. and Kuwada, S. (1997) Nature 388, 871-874.
21. Pouget, A., Deneve, S., Ducom, J.-C. and Latham, P.E. (1999) Neural Comput. 11, 85-90.
22. Zhang, K. and Sejnowski, T.J. (1999) Neural Comput. 11, 75-84.
23. Shadlen, M.N., Britten, K.H., Newsome, W.T. and Movshon, J.A. (1994) J. Neurosci. 16, 1486-1510.
24. Snippe, H.P. and Koenderink, J.J. (1992) Biol. Cybern. 67, 183-190.
25. Abbott, L.F. and Dayan, P. (1999) Neural Comput. 11, 91-101.
26. Ben-Yishai, R., Bar-Or, R.L. and Sompolinsky, H. (1995) Proc. Natl. Acad. Sci. USA 92, 3844-3848.
27. Zhang, K. (1996) J. Neurosci. 16, 2112-2126.
28. Johannesma, P.I.M. (1981) in: Advances in Physiological Sciences, Vol. 30: Neural Communication and Control, pp. 103-126. Pergamon Press, New York.
29. Gielen, C.C.A.M., Hesselmans, G.H.F.M. and Johannesma, P.I.M. (1988) Math. Biosci. 88, 15-35.
30. Schwartz, A.B. (1993) J. Neurophysiol. 70, 28-36.
31. Georgopoulos, A.P., Ashe, J., Smyrnis, N. and Taira, M. (1992) Science 256, 1692-1695.
32. Kalaska, J., Cohen, D.A.D., Hyde, M.L. and Prud'homme, M. (1986) J. Neurosci. 9, 2080-2102.
33. Capuano, U. and McIlwain, J.T. (1981) J. Comp. Physiol. 196, 13-23.
34. van Gisbergen, J.A.M., van Opstal, A.J. and Tax, A.A.M. (1987) Neurosci. 21, 541-555.
35. Carpenter, R.H.S. (1988) Movements of the Eyes. Pion, London.
36. Hubel, D.H. and Wiesel, T.N. (1962) J. Physiol. 160, 106-154.
37. Gielen, C.C.A.M., Glasius, R. and Komoda, A. (1996) Neurocomputing 12, 249-266.
38. Engel, A.K., König, P. and Singer, W. (1991) Proc. Natl. Acad. Sci. USA 88, 1936-1940.
39. Singer, W. and Gray, C.M. (1995) Annu. Rev. Neurosci. 18, 555-586.
40. Feng, J. and Brown, D. (2000) Neural Comput. 12, 711-732.
41. Gerstner, W. (2000) Neural Comput. 12, 43-89.
42. Shadlen, M.N. and Newsome, W.T. (1998) J. Neurosci. 18, 3870-3896.
43. Kalaska, J., Cohen, D.A.D., Hyde, M.L. and Prud'homme, M. (1986) J. Neurosci. 9, 2080-2102.
CHAPTER 21
Mechanisms of Synchrony of Neural Activity in Large Networks

D. GOLOMB
Zlotowski Center for Neuroscience and Department of Physiology, Faculty of Health Sciences, Ben Gurion University of the Negev, Be'er-Sheva 84105, Israel
e-mail: [email protected]
D. HANSEL*
Laboratoire de Neurophysique et de Physiologie du Système Moteur, EP 1848 CNRS, Université René Descartes, 45 rue des Saints-Pères, 75270 Paris Cedex 06, France
* to whom correspondence should be addressed
G. MATO
Comisión Nacional de Energía Atómica and CONICET, Centro Atómico Bariloche and Instituto Balseiro (CNEA and UNC), 8400 San Carlos de Bariloche, R.N., Argentina
e-mail: [email protected]
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction
2. Synchronous neuronal activity in the CNS
   2.1. Synchronized oscillatory activity in vivo
   2.2. Synchronized oscillatory activity in vitro
3. Models of large neuronal networks
   3.1. Single neuron dynamics
   3.2. Models for synaptic interactions
   3.3. Network architecture and inputs
4. Nature of the network state
   4.1. Thermodynamic limit
   4.2. Classification of degree of synchrony
5. Synchronous and asynchronous states in a model conductance-based network
   5.1. Asynchronous states
   5.2. Synchronous oscillatory states: synchrony of spikes
   5.3. Synchronous oscillatory states: synchrony of bursts
6. Synchrony in one population of spiking neurons - the weak coupling limit
   6.1. The weak coupling limit: principles of phase reduction
   6.2. Examples of phase interactions
   6.3. Stability of fully synchronized state at weak coupling
   6.4. Stability of the asynchronous state in fully connected, heterogeneous networks
   6.5. Approximate theory for randomly and weakly coupled neurons
   6.6. Sparse networks of integrate-and-fire neurons
   6.7. Sparse networks of conductance-based neurons
7. Synchrony in one population of spiking neurons - beyond weak coupling
   7.1. Fully connected, heterogeneous networks
   7.2. Finite-size effects in sparse networks
   7.3. Sparse networks beyond weak coupling
8. Stability of the asynchronous state of integrate-and-fire networks: theory at all coupling strengths
   8.1. The method
   8.2. The phase diagram of two-population integrate-and-fire networks
9. Synchronization of bursts in thalamic oscillations
   9.1. Thalamic spindle oscillations in vivo and in vitro
   9.2. The model
   9.3. Spatio-temporal patterns of synchrony in the RE-TC network model
10. Discussion
   10.1. Mechanisms of synchrony
   10.2. Robustness of the mechanisms for synchrony
   10.3. The origin of irregular firing of cortical neurons
   10.4. Consequences for neuronal modeling
Abbreviations
Acknowledgements
Appendix A. Single neuron conductance-based models
Appendix B. Synaptic dynamics and network architecture
References
1. Introduction
One of the central issues in Neuroscience is how the brain represents information about its environment and past experience. Initially, the working hypothesis of many researchers was that the firing rates of the neurons code for this information [1]. More recent studies have suggested that relevant information may be contained in the spike-to-spike correlations between neurons [2]. Currently, it is becoming increasingly clear that it is the details of the complex spatio-temporal patterns of activity exhibited by the brain which subserve its function. These activity patterns are extremely rich; they include stochastic weakly correlated local firing, synchronized oscillations and bursts [2,3], as well as propagating waves of activity [4-6].

Synchronous electrical activity has been known to occur in the nervous system since the discovery of electroencephalography in the 1930s [7] and the first studies on the α rhythm in mammals. Following these pioneering works, many other examples of synchronous activity have been discovered in various species and areas of the nervous system. These examples differ greatly in their power spectra, spatio-temporal patterns and conditions of occurrence, which can be pathological or physiological. The tremendous progress in experimental techniques over the last fifteen years, especially the possibility to record extracellularly from many neurons simultaneously and, more recently, to make intracellular pair recordings in vivo [8], has opened new horizons for establishing the relationship between synchronous electrical activities and cellular properties.

Synchrony can be defined in several ways. In its broader sense, the activities of two neurons are said to be synchronized if some kind of temporal correlation exists between them. In this sense, some level of synchrony should exist in any system of interacting neurons, and observing synchrony between directly interacting neurons should not be surprising. As a matter of fact, several studies in the 1960s and 1970s were devoted to finding out what the shape of the cross-correlations of a pair of neurons can tell us about their interactions [9]. The implicit assumption in this approach was that correlations are dominated by direct interactions between the neurons, i.e., they do not result from a cooperative effect. One important contribution of physicists to the field of neuronal dynamics has been to apply concepts of statistical physics to clarify the very notion of "collective effects" in neuronal dynamics. Most of the recent research on the dynamics of large neuronal systems has focused on patterns of synchrony which emerge from the collective dynamics of the system.

The ubiquity of synchronous patterns of activity leads to several theoretical issues: understanding the relationship between the structure of activity patterns observed in some brain areas and the cellular properties of the neurons, the nature of their interactions, the local and global architectures of the areas, and their
interconnections with other brain areas. Furthermore, understanding how noise, heterogeneities and spatial fluctuations of connectivity patterns affect synchrony is a fundamental question for neuroscientists, and more specifically for theoreticians.

A suitable framework to address these issues is the physics of extended dynamical systems. One possible approach is to study the cooperative dynamical states in simplified network models of the dynamics and architecture of local neuronal populations in the brain. Within some limits, these models allow in-depth analytical study based on the statistical mechanics of pure and disordered systems, the theory of stochastic systems and the physics of nonlinear dynamical systems. Subsequently, one extends the analytical investigations by numerical simulations beyond the limits which can be treated analytically. This approach has led to substantial progress in our understanding of general mechanisms of neuronal synchrony.

Theoreticians have also applied modeling techniques to investigate the collective dynamics of specific structures of the central nervous system (CNS). These include the olfactory bulb [10,11], the visual cortex [12-16], the hippocampus [17,18], the thalamus [19-27], and more recently the circuits of the basal ganglia [28]. Due to their complexity, these models, even in their more schematic form, cannot generally be studied analytically. However, their investigation can rely on concepts introduced in more abstract modeling approaches. For instance, this is the case in recent research on clustering [21,24] or on transient synchrony (propagating synchronizing waves) [29] in large neuronal networks.

This paper focuses on the study of the conditions under which a large neuronal network settles into attractors of the dynamics which are synchronized states. Therefore, the mechanisms underlying transient correlated firing, as described for instance in [30], are beyond the scope of this paper. In the networks investigated in most of this paper, the neurons are connected at random, two neurons being connected with a probability which is constant across the network. Such networks can be thought of as a model for a local cortical circuit, e.g. a functional cortical column in a sensory area. Hence, propagating synchronous patterns of neuronal activity are also beyond the scope of the present paper.

This paper is organized as follows. In the next section, we review some of the recent experimental results concerning patterns of synchrony in the CNS. In Section 3, we introduce the neuronal network models. Section 4 is largely inspired by [13]. It shows how dynamical states of a large network can be classified as either asynchronous or synchronous. This classification is used in Section 5, where we describe different network states in numerical simulations of a two-population conductance-based system. Sections 6 and 7 are a synthesis of [31-33]. They are devoted to an extensive study of mechanisms of synchrony in one-population networks. Combining analytical and numerical approaches, we focus on the inhibitory scenario which has been studied by several authors in the last years. The content of Section 8 is, to a large extent, new. It is based on [34], and presents a general approach for studying analytically the stability of the asynchronous states of heterogeneous networks of integrate-and-fire neurons. We use this approach to clarify the role of excitation and inhibition in the emergence of synchrony in two-population networks. In Section 9
we summarize a series of modeling works on the thalamus [22-24], showing how numerical simulations of a computational model of this structure can help to understand the mechanisms underlying spindle oscillations. Section 10 is devoted to a discussion of some of the results and of several open questions.
2. Synchronous neuronal activity in the CNS

In this section we present a short and nonexhaustive review of the rich variety of spatio-temporal patterns of synchrony which have been found experimentally in the nervous system.
2.1. Synchronized oscillatory activity in vivo

Synchronized neuronal activity occurs in virtually all forebrain structures of the mammalian brain; it has been suggested that this reflects a fundamental operational mode of cortical networks [35,36]. Synchrony may occur at the spike-to-spike level or at the bursting level and is often, but not always, related to oscillatory activity. Several examples of coherent activity with oscillatory characteristics have attracted much interest over the past few years, as described in review articles such as [3,37-41].

• It has been suggested that cortical oscillations in the gamma frequency band (20-70 Hz) might be involved in object representation, using the temporal structure to perform binding [40] (for various points of view, see [42-47] and a series of reviews in Neuron [48]). Sustained gamma oscillations are generated by an intracortical mechanism [49]. Phase lags may or may not appear: when cats respond to a sudden change of a visual pattern, neuronal activity in the visual and parietal cortical areas exhibits synchrony without time lag [50]. In area 17 of the cat visual cortex, phase lags depend linearly on the stimulus orientation [51].

• Gamma rhythm has also been recorded in the sensorimotor cortex of awake monkeys [52-54], where local field potential (LFP) oscillations near the surface of the cortex were 180° out of phase with oscillations in the deep cortical layers, and in the motor and association areas of cat cortex [55,56]. In the prefrontal motor cortex of behaving monkeys, these oscillations are not related to the details of movement execution but may be related to aspects of movement preparation [57]. Coherent oscillations in the 20-30 Hz range are present in the monkey motor cortex. They are discernible in the EMG of active muscles, and show a consistent task-dependent modulation [58,59].

• Oscillations at 8-12 Hz (spindles) and 4-7 Hz (delta) are common in cortex and thalamus during various stages of sleep [60]. Cortical, thalamic, and brainstem neurons exhibit widespread 7-12 Hz synchronous oscillations which begin during attentive immobility and reliably predict the imminent onset of rhythmic whisker twitching [61]. During such low-frequency oscillations, neurons tend to fire in bursts. In the hippocampus, 200 Hz "sharp waves" are associated with alert immobility [62].
• Cells in the hippocampus, entorhinal cortex and the medial septal nucleus participate in 4-10 Hz theta rhythms during particular behavioral states. In rats, the spike activity of a CA1 cell advances to earlier phases of the theta cycle as the animal passes through the cell's place field [63,64].

2.2. Synchronized oscillatory activity in vitro
Despite the large differences between in vivo and slice networks (e.g., only part of the circuitry is left after the slicing, and the level of external synaptic "noise" is strongly reduced), understanding the factors involved in turning the rhythm on and off, and in determining its frequency, synchrony and temporal stability, should provide us with tools for manipulating the rhythm in vivo, and thereby a better assessment of its functional role.

Many experiments performed on hippocampal slices shed light on the mechanisms that generate synchronous oscillations in systems of excitatory pyramidal cells and inhibitory cells. Following the hypothesis by Buzsáki and colleagues [65] that networks of interneurons may synchronize and entrain the pyramidal cells, Whittington and colleagues [66,67] examined hippocampal slices in which the AMPA and NMDA excitation was blocked with CNQX and D-APV, respectively, whereas the interneurons were excited by the activation of (very slow) metabotropic glutamate receptors with pressure injection of glutamate. The interconnected network of interneurons produced locally synchronized 40 Hz oscillations, mediated by GABAA synaptic inhibition. Long-range excitatory connections between pyramidal cells were shown to be necessary for maintaining synchrony with almost zero phase lag over long distances [68-70].

The cholinergic agonist carbachol induces gamma rhythm in hippocampal area CA3 in vitro [71]. Pyramidal cells fire in a phase-locked manner, but only in a small proportion of the cycles. Carbachol and kainate elicit persistent gamma rhythm in all layers of the somatosensory cortex [72], which is synchronized with no apparent phase lag over nearby sites (<1 mm). However, phase reversal of both extra- and intracellularly recorded activity was demonstrated near the border between layers 4 and 5, corroborating in vivo findings [53]. High-frequency oscillations (~200 Hz) are also present in hippocampal slices, but are synchronized over short distances (a diameter of 6-8 cells) [73]. They are probably mediated through electrical coupling. Carbachol also produces theta rhythm in hippocampal slices (e.g. [74-76]). During these oscillations, bursts of different neurons are correlated, but individual spikes are not.

These experimental observations raise several questions that can be addressed theoretically regarding the patterns of synchrony observed in the central nervous system and the neuronal intrinsic and synaptic mechanisms that generate these patterns: (I) What are the roles of recurrent excitatory connections, recurrent inhibitory connections and reciprocal excitatory-inhibitory connections in determining the patterns of synchrony? (II) How are the patterns of synchrony affected by the intrinsic ionic properties of the neurons? (III) Which conditions determine whether synchronous activity is at the level of spikes, bursts or firing rate, and in which frequency range? (IV) What determines the time scales of activity
modulation? (V) Under which conditions does external input synchronize an interconnected neuronal system? We describe here several theoretical and computational approaches to address these questions.

3. Models of large neuronal networks

3.1. Single neuron dynamics

3.1.1. Conductance-based neurons
Conductance-based models account for spiking by incorporating the dynamics of voltage-dependent membrane currents (see for instance [77]). In this framework the simplest way to model the neurons in a network is to use a mono-compartmental model of the form
$$C\,\frac{\mathrm{d}V_i}{\mathrm{d}t} = I_i - g_L(V_i - V_L) - \sum_{\mathrm{ion}} g_{\mathrm{ion}}(X_{\mathrm{ion},i})\,(V_i - V_{\mathrm{ion}}) + I_{\mathrm{syn},i}(t). \qquad (1)$$
The integer i is an index which labels the neuron in the network, V_i is its membrane potential, and C its membrane capacitance. The first term on the right-hand side of this equation is a bias current I_i that determines the firing rate of the neuron in the absence of interactions. In the case of a homogeneous population, I_i = I. The second term corresponds to the leak current. It has a reversal potential V_L and a voltage-independent conductance g_L. The sum in the third term extends over all the active ionic currents. The gating variables of the various channels are the components of the vector X_ion,i. For the dynamics to be completely defined, one also has to specify the relaxation dynamics of these gating variables. The corresponding voltage-dependent conductances and their reversal potentials are denoted g_ion,i and V_ion, respectively.

The two conductance-based models considered in this paper are the Hodgkin-Huxley model [78] and the model of Wang and Buzsáki [79]. Both models incorporate a leak current, a delayed-rectifier potassium current and a sodium current. They differ in the details of the parameters of the activation and inactivation functions of the channels as well as in their maximum conductances. Consequently they display some substantial differences in their dynamics. Parameters for both models are given in Appendix A.

The Hodgkin-Huxley model. In this model, introduced to describe the firing properties of the squid axon [78], periodic firing occurs through an inverted Hopf bifurcation. Neurons with this type of behavior are often said to be of "type II" excitability (type IIE). Therefore, the firing rate varies discontinuously. This is shown in Fig. 1A for the Hodgkin-Huxley model (with standard parameters), which cannot fire periodically below 50 spikes/s. In the repetitive firing regime, the action potentials of this model are of moderate amplitude and are broad (see Fig. 8B, first panel).

The Wang and Buzsáki (WB) model. This conductance-based model has been introduced recently to describe hippocampal interneurons [79]. Action potentials in
this model are higher and thinner than in the standard HH model (Fig. 8A,B, first panels). It displays a saddle-node bifurcation at the repetitive-firing current threshold, I_c. Neurons with this property are often said to be of "type I" excitability (type IE, hereafter) [80]. For these neurons, the relationship between the current I injected into the neuron and its firing rate behaves like f ∝ √(I − I_c) for I > I_c. Therefore, they are able to fire at an arbitrarily small rate. The f-I curve of the WB model is shown in Fig. 1B. In the following we will use this model to represent inhibitory neurons. To describe excitatory neurons, we will modify this model by adding a slow potassium current, which induces spike adaptation, a phenomenon known to be widespread among excitatory cortical neurons. Other parameters will be kept the same. The f-I curve of this modified WB model is also shown in Fig. 1B.

Fig. 1. The firing rate of a neuron as a function of the external current I for (A) the Hodgkin-Huxley model and (B) the Wang-Buzsáki model. Solid line: without spike adaptation; dotted line: with spike adaptation. Parameters are as in Appendix A.
3.1.2. Integrate-and-fire neurons
Integrate-and-fire models constitute another class of models frequently used in modeling [77]. These models do not rely on a biophysical description of firing, and their simplicity makes them more easily amenable to analytical studies than conductance-based models [81-83]. In the simplest integrate-and-fire model, the Lapicque model, the membrane potential of a neuron satisfies the differential equation

$$C\,\frac{\mathrm{d}V_i}{\mathrm{d}t} = -g_L(V_i - V_L) + I_i + I_{\mathrm{syn},i}(t) \qquad (2)$$

for V_i < θ. As before, I_i is an external current and I_syn,i is the synaptic current due to the interactions with all the other neurons in the network. A spike is
fired whenever the membrane potential reaches the threshold θ. If at time t_0 a spike is fired, the membrane potential is immediately reset to V_r, namely

$$V_i(t_0^+) = V_r \quad \text{if } V_i(t_0) = \theta. \qquad (3)$$
For simplicity we choose V_r = V_L. One can also introduce a refractory period in this model, if necessary, by imposing that V_i(t) remains equal to V_r for a time T_r after the firing of a spike. If the neurons are not interacting, they emit spikes periodically with a period

$$T_i = T_r - \tau_0 \ln\!\left(1 - \frac{g_L(\theta - V_L)}{I_i}\right) \quad \text{for } I_i > g_L(\theta - V_L),$$

where the passive time constant of the membrane is denoted by τ_0 = C/g_L. In the following we choose C = 1 μF/cm², g_L = 0.1 mS/cm² and θ − V_L = 20 mV. This corresponds to a passive membrane time constant τ_0 = 10 ms and a current threshold I_th = 2 μA/cm². The refractory period will be a parameter whose influence on the network properties will be discussed. Without loss of generality, we assume that V_L = 0.

It is often convenient to write the I&F model using the reduced dimensionless variables

$$\tilde{V}_i = \frac{V_i - V_L}{\theta - V_L}, \qquad \tilde{I}_i = \frac{I_i}{C(\theta - V_L)}, \qquad \tilde{I}_{\mathrm{syn},i} = \frac{I_{\mathrm{syn},i}}{C(\theta - V_L)}. \qquad (4)$$

In terms of these reduced variables the dynamics can be written

$$\frac{\mathrm{d}\tilde{V}_i}{\mathrm{d}t} = -\tilde{V}_i + \tilde{I}_i + \tilde{I}_{\mathrm{syn},i}, \qquad (5)$$

where the time is measured in units of τ_0 and the resetting conditions read

$$\tilde{V}_i(t_0^+) = 0 \quad \text{if } \tilde{V}_i(t_0) = 1. \qquad (6)$$
We will use this formulation in Section 8.
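As a check on these definitions, the sketch below integrates the Lapicque model of Eq. (2) for a single, non-interacting neuron with the parameter values quoted above (C = 1 μF/cm², g_L = 0.1 mS/cm², θ − V_L = 20 mV, and T_r = 0), and compares the simulated inter-spike interval with the period formula. The forward-Euler scheme, time step and the chosen current are illustrative choices, not part of the original model specification.

```python
import numpy as np

C, g_L, V_L, theta, V_r = 1.0, 0.1, 0.0, 20.0, 0.0   # uF/cm^2, mS/cm^2, mV
I_ext = 3.0                                          # uA/cm^2, above the 2 uA/cm^2 threshold
dt, t_max = 0.001, 200.0                             # ms

def simulate_lif(I):
    """Forward-Euler integration of C dV/dt = -g_L (V - V_L) + I, with reset at theta."""
    V, spikes, t = V_r, [], 0.0
    while t < t_max:
        V += dt / C * (-g_L * (V - V_L) + I)
        t += dt
        if V >= theta:
            spikes.append(t)
            V = V_r
    return np.array(spikes)

spikes = simulate_lif(I_ext)
isi = np.diff(spikes).mean()
tau0 = C / g_L
T_theory = -tau0 * np.log(1.0 - g_L * (theta - V_L) / I_ext)   # T_r = 0 here
print(f"simulated ISI: {isi:.3f} ms, period formula: {T_theory:.3f} ms")
```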
3.2. Models for synaptic interactions

The synaptic current, I_syn,i(t), which the ith cell receives is the sum of the contributions I_syn,ij(t) from all the presynaptic cells with index j impinging on the ith neuron:

$$I_{\mathrm{syn},i}(t) = \sum_j I_{\mathrm{syn},ij}(t). \qquad (7)$$
The current I_syn,ij(t) is modeled by
$$I_{\mathrm{syn},ij}(t) = -\hat{G}_{\mathrm{syn},ij}(t)\,(V_i - V_{\mathrm{syn}}), \qquad (8)$$

where Ĝ_syn,ij(t) is the total conductance at time t of the synapse from cell j to cell i, with a reversal potential V_syn. The value Ĝ_syn,ij(t) is given by

$$\hat{G}_{\mathrm{syn},ij}(t) = G_{\mathrm{syn},ij}\, s_j(t), \qquad (9)$$

where G_syn,ij is a constant which measures the strength of the synapses that neuron j makes on neuron i, and s_j evolves with time. Various models are used for the dynamics of the synaptic conductances. They differ in their faithfulness to the details of the biophysics. One possibility is to assume that the variable s_j obeys the kinetic equation [84,85]
$$\frac{\mathrm{d}s_j}{\mathrm{d}t} = k_f\, S_\infty(V_j)\,(1 - s_j) - k_r\, s_j, \qquad (10)$$

where V_j is the presynaptic potential and

$$S_\infty(V) = \left\{1 + \exp[-(V - \theta_s)/\sigma_s]\right\}^{-1}, \qquad (11)$$

with θ_s = 0 mV the presumed presynaptic threshold for transmitter release and σ_s = 2 mV. After termination of the presynaptic spike, the decay time of s is 1/k_r. The parameter k_f affects both the synaptic strength and its tendency to saturate. If 1/k_f is much smaller than the spike width, the variable s saturates to the value k_f/(k_f + k_r).

In a more phenomenological description, the variable s is represented by
$$s_j(t) = \sum_{\mathrm{spikes}} f(t - t_{\mathrm{spike},j}), \qquad (12)$$
the summation being performed over all the spikes emitted by the presynaptic neuron with index j at times t_spike,j. The synaptic interaction is usually classified according to whether V_syn is larger or smaller than the threshold potential, V_th, at which the postsynaptic neuron generates spikes. For V_syn > V_th the interaction is called excitatory, while for V_syn < V_th it is called inhibitory. The function f is normalized such that its integral is 1. Then G_syn,ij is the total synaptic conductance induced by one presynaptic spike. Several forms can be used for the function f(t). A standard choice is

$$f(t) = \frac{1}{\tau_1 - \tau_2}\left[\exp\!\left(-\frac{t}{\tau_1}\right) - \exp\!\left(-\frac{t}{\tau_2}\right)\right]\Theta(t). \qquad (13)$$
Here, Θ is the Heaviside function and the normalization of f(t) has been chosen so that the integral of f(t) is one. The characteristic times τ_1 and τ_2 are the rise and decay times of the synapse, respectively. In case τ_1 = τ_2 = τ one obtains the so-called "alpha function" [86]:
$$f(t) = \frac{t}{\tau^2}\exp\!\left(-\frac{t}{\tau}\right). \qquad (14)$$
Analytical studies of integrate-and-fire networks are often easier if one adopts a more simplified model for the synaptic current, in which one neglects the effect of the driving force V_i − V_syn. In that case, the synaptic current has the form

$$I_{\mathrm{syn},ij}(t) = G_{\mathrm{syn},ij}\, s_j(t), \qquad s_j(t) = \sum_{\mathrm{spikes}} f(t - t_{\mathrm{spike},j}), \qquad (15)$$
where the function f(t) is given by Eq. (13) or (14) and the summation is done over all the spikes emitted prior to time t by the presynaptic neurons. Note that in this formulation G_syn,ij has the dimension of a current density. In this model, excitatory (resp. inhibitory) interactions correspond to G_syn,ij > 0 (resp. G_syn,ij < 0). In such a model, one neglects the fact that the synapses change the effective integration time constant of the neurons. This approximation is better justified for excitatory interactions (G_syn,ij > 0), since no description of the spike is incorporated into the model and the driving force V_syn − V_i remains approximately constant in the subthreshold regime. For inhibitory neurons, where shunting effects are more important, this approximation is cruder.
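To make the phenomenological description concrete, the following sketch builds the synaptic variable s_j(t) of Eqs. (12)-(14) from a list of presynaptic spike times. The spike times and the rise and decay constants are arbitrary illustrative values, not parameters of the networks studied later.

```python
import numpy as np

def double_exp_kernel(t, tau_rise, tau_decay):
    """f(t) of Eq. (13): normalized difference of exponentials, zero for t < 0."""
    if np.isclose(tau_rise, tau_decay):
        # The limit tau_rise -> tau_decay gives the alpha function of Eq. (14)
        return np.where(t >= 0.0, t / tau_rise**2 * np.exp(-t / tau_rise), 0.0)
    return np.where(
        t >= 0.0,
        (np.exp(-t / tau_rise) - np.exp(-t / tau_decay)) / (tau_rise - tau_decay),
        0.0,
    )

def synaptic_variable(t, spike_times, tau_rise=1.0, tau_decay=3.0):
    """s_j(t) of Eq. (12): sum of kernels over the presynaptic spike times (ms)."""
    return sum(double_exp_kernel(t - t_sp, tau_rise, tau_decay) for t_sp in spike_times)

t = np.arange(0.0, 60.0, 0.1)                 # ms
spike_times = [5.0, 12.0, 14.0, 40.0]         # illustrative presynaptic spikes
s = synaptic_variable(t, spike_times)
# Each kernel integrates to one, so the time integral of s is close to the number of spikes.
print(f"peak of s_j(t): {s.max():.3f}, time integral: {np.trapz(s, t):.3f}")
```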
3.3. Network architecture and inputs

In its most general form, the network model we consider in this chapter consists of two populations of neurons, one excitatory (E) and the other inhibitory (I). The synaptic current which a cell i of the αth population receives from the cells j of the βth population is

$$I_{\mathrm{syn},i}^{\alpha\beta}(t) = \sum_j I_{\mathrm{syn},ij}^{\alpha\beta}(t), \qquad (16)$$
if neuron j from population 13 is presynaptic to neuron i from population 0t,
0
otherwise.
w~~ -
(17) We will assume that the network has no spatial structure (except for the segregation into two populations), and define M ~ to be the average number of synaptic inputs a cell from the 0tth population receives from the cells from the [3th population. A neuron, j, ]3 is making one synapse on neuron i, 0t at random with a probability M~/Nf~. Therefore, Prob(w~f - - 1 ) _ M ~ ,
Prob(w;~- 0)-
1
M~I3
(is)
The number of synaptic inputs from population 13onto population at fluctuates from neuron to neuron in population at with an average M~. In part of the examples
Mechanisms of synchrony of neural activity in large networks
899
treated below we further simplify the architecture by assuming all-to-all connectivity. In that case, M~fi = N~. For the sake of simplicity, we assume that all the existing synapses between the neurons of two populations have the same strength
Gsyn,ij
Gsyn w~~.
(19)
The synaptic rise time and decay time will depend only on the nature, excitatory or inhibitory, of the synapses and will be denoted by Zl~,Z2~, 13- E,I. We neglect axonal propagation delays. In many cases it is convenient to model the external input to the network as an external current. In the analytical studies presented below, we adopt this approach since it significantly simplifies the calculation without modifying substantially the specific results presented in this chapter. However, one should keep in mind that depending on the issues which one wants to address, such a description may or may not be suitable. A more plausible description of the external input is in term of conductances. This is the approach we adopt for the simulations presented in Section 5. In that case, the input network consists of No excitatory neurons. Since we are not modeling the architecture and the dynamics of this network in detail, we assume that the input neurons are firing spikes independently, with a Poisson statistics characterized by a rate v0. We also assume that the number of afferent synapses and their strength are the same for all the neurons in the same population. The number of afferent synapses on the excitatory (resp. inhibitory) neurons will be denoted by n~E (resp. nIa) and their strength by gae (resp. g~). Under these assumptions the external input on the neuron i, ( i - 1,... ,N~) in population ~ - - E , I is written
I~(t) = g~ Z
f (t - tspike)[VE -- Via(t)l ,
(20)
spikes
where the sum is extended over all the spike times of the na~ afferent input neurons to neuron (a, i) which have occurred before time t. Here the function f is given by Eq. (13) with rise time and decay time which for simplicity will be taken to be the same as for the recurrent synapses of the excitatory population. Note that the time average conductance of the external input on the population ~ is simply :
n a g a V 0.
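As an illustration of this architecture, the sketch below draws a random connectivity matrix according to Eq. (18) and scales the synaptic strengths as in Eqs. (19) and (22). The population sizes, in-degrees and the value of g_syn are arbitrary illustrative numbers, not the parameter sets used in the simulations of Section 5.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_connectivity(n_post, n_pre, m_mean):
    """Connectivity matrix w of Eq. (18): each entry is 1 with probability m_mean / n_pre."""
    return (rng.random((n_post, n_pre)) < m_mean / n_pre).astype(int)

# Illustrative two-population network: E and I cells
N = {"E": 800, "I": 200}
M = {("E", "E"): 100, ("E", "I"): 50, ("I", "E"): 100, ("I", "I"): 50}
g_syn = {("E", "E"): 0.2, ("E", "I"): 0.2, ("I", "E"): 0.2, ("I", "I"): 0.2}  # mS/cm^2

w, G = {}, {}
for (alpha, beta), m in M.items():
    w[(alpha, beta)] = make_connectivity(N[alpha], N[beta], m)
    # Sparse-network scaling of Eq. (22): G_syn = g_syn / M, applied to existing synapses (Eq. (19))
    G[(alpha, beta)] = (g_syn[(alpha, beta)] / m) * w[(alpha, beta)]

in_degree = w[("E", "I")].sum(axis=1)
print(f"mean number of I inputs per E cell: {in_degree.mean():.1f} (target {M[('E', 'I')]})")
```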
4. Nature of the network state
4.1. Thermodynamic limit

The dynamic behavior of a network may be sensitive to the precise details of the parameters that characterize its dynamics and connections. One of these parameters is the size of the network. Studying the size dependence of the results is extremely important for several reasons. First, systematic numerical simulations of networks, such as those described below, are limited to sizes of up to several thousands of
neurons. This limitation is more severe for more complex networks, e.g., networks with spatially extended cells. Although the relevant scale of local circuits in cortex is not known, it is quite possible that it involves a larger number of neurons. More importantly, understanding the size dependence is crucial for understanding the qualitative nature of the dynamical states in networks and the phase transitions between them. This is because, frequently, these states and the transitions between them can be defined in a rigorous way only in the thermodynamic limit, i.e., when the number of neurons goes to infinity.

In order to discuss the size dependence of the dynamic behavior of the network, we have to specify how the network parameters vary when the numbers of neurons of the two populations, N_α, are increased. Here we assume that the single-cell properties, including the time constants, the resting and threshold potentials, as well as the external current, are independent of N_α. This implies that the total synaptic input I_syn must remain of order unity as N_α, α = E, I, grow. Two extreme cases are frequently considered:

(1) Massively connected network. In this case the average number of synaptic inputs per neuron varies proportionally to the size of the system. Excluding precise cancelations, the amplitudes of most of the individual synaptic inputs are kept fixed. In our case this means that the synaptic strengths have to vary in inverse proportion to the system size, such that

$$G_{\mathrm{syn}}^{\alpha\beta} = \frac{g_{\mathrm{syn}}^{\alpha\beta}}{N_\beta}, \qquad (21)$$
where the normalized conductances g_syn^{αβ} are independent of the N_α.

(2) Sparse network. In this case the average numbers of synaptic inputs per neuron, M_{αβ}, are O(1) compared to the size of the network, and they should not be scaled in the thermodynamic limit; they remain finite in this limit. We want, however, the dynamics of the network to remain well defined also in the limit in which the values M_{αβ} vary and become large. That is why, in most of this chapter, we scale the conductances between cells such that

$$G_{\mathrm{syn}}^{\alpha\beta} = \frac{g_{\mathrm{syn}}^{\alpha\beta}}{M_{\alpha\beta}}. \qquad (22)$$
We will come back to this issue in Section 7.2. Note also that if a cancelation is expected (balanced state) between the excitatory and the inhibitory inputs, the synaptic strengths should be scaled with the square root of the connectivity (see [87-90] for more details on that issue).
4.2. Classification of degree of synchrony

4.2.1. Definitions of synchrony
As already mentioned in the introduction, roughly speaking, the activities of two neurons are synchronized if their spike trains display some level of correlation. According to this definition, in the simplest case, the experimental signature of
synchrony is a nonflat cross-correlogram of the spike trains or the membrane potentials of the two neurons. This definition does not assume anything regarding the sign of these correlations, i.e., neurons with positive or negative correlations in their firing times are considered as synchronized. However, in this sense, in any system of interacting neurons, some synchrony exists. In a more restrictive way, two neurons are said to fire synchronously if they tend to fire spikes with positive correlations at zero or small time delay. An example of synchronous firing following this definition would be a pair of identical neurons, periodically firing in a phase-locked manner with a small phase-shift. Full synchrony is the extreme, ideal case in which the two neurons fire simultaneously. Note that some authors are restricting the definition of synchrony to this last situation and call ~-synchrony the more general case where the firing times of the two neurons differ slightly [91]. With these definitions, synchronous firing requires synaptic interactions between neurons with suitable properties [92,93,31,84,94] (see also Section 6.3). These definitions are well suited to the study of spatio-temporal patterns of firing in a small system of neurons which depend on the detailed pattern of connections between the neurons. However, when dealing with large systems of neurons as found in the central nervous system, these definitions are not sufficient. Therefore, a more appropriate definition of neuronal synchrony is needed. To be rigorous, this definition requires the thermodynamic limit. It allows us to differentiate between synchronous activity which would result from the anatomical connectivity and synchronous states which, independent of the detailed connectivity patterns, are cooperative in their origin.
4.2.2. Cooperative synchrony in large networks
Massively connected systems. Given the above scaling of the synaptic conductances, there are several ways in which the system size can affect the dynamics of the network. However, as discussed in Ref. [95], there are two simple generic cases. These cases are termed asynchronous and synchronous states, and they differ in the way the temporal fluctuations of the total synaptic conductance change with the size of the system, N.

In asynchronous states the total synaptic excitatory and inhibitory conductances generated by the network on a neuron approach a time-independent limit as N → ∞. This reflects the fact that the action potentials of the individual neurons are very weakly synchronized. Summing N temporally uncorrelated (or weakly correlated) contributions results in a total synaptic conductance whose fluctuations have an amplitude of the order of 1/√N. Such a state can be self-consistent because the weak temporal variation in the "common input" to the different neurons may be insufficient to synchronize them. Characterizing this state can be done in two ways. One way is by evaluating a global variable, e.g., the spatially averaged instantaneous activity. In the asynchronous state the variance of such a quantity vanishes as N increases, typically as 1/N. Alternatively, one may evaluate the cross-correlation (CC) functions between the activity of pairs of neurons [9]. The magnitude of the typical CCs will be of the order of 1/N.
A specific type of asynchronous state is the "splay state" [96], in which neurons fire consecutively with a phase difference of 2π/N between the firing of two successive neurons. In this state, χ scales like 1/N. This state, however, is hardly seen even in simulations of homogeneous, all-to-all networks.

In contrast to asynchronous states, in synchronous states there are temporal fluctuations on a global scale. The variance of the global activity, as well as the variance of the total synaptic conductance of a neuron, remain of order unity even for large N. This implies, of course, that the degree of synchrony generated by the common input from the rest of the network is itself of order unity even in the limit of large N.

Sparsely connected systems. In this case one cannot discriminate between asynchronous and synchronous states on the basis of the temporal fluctuations of the total synaptic current received by one neuron. Indeed, even if the action potentials of the individual neurons are weakly synchronized, their total effect on the synaptic current received by a given neuron is inversely proportional to M, the number of inputs this neuron receives. Therefore, the temporal fluctuations of the synaptic input on a specific neuron remain finite in the thermodynamic limit. However, one can still define synchronous states in a sparse network as states in which coherent temporal fluctuations of the synaptic inputs occur on a global scale. In such a state, the variance of the global activity, as well as the variance of the temporal fluctuations of the synaptic currents averaged on a macroscopic spatial scale, remain of order unity even for large N. This implies that the degree of synchrony generated by the common input from the rest of the network is itself of order unity, and that the CC of a pair of neurons is dominated by this common input. This behavior differs from what happens in the absence of coherence in the temporal fluctuations of the inputs. Indeed, in that case the temporal fluctuations of the macroscopic spatial averages of the synaptic currents vanish in the thermodynamic limit as 1/N, and the CC of the activity of a pair of neurons is strongly dependent on their direct interaction.
4.2.3. Population averages in asynchronous and synchronous states
The above criteria of synchrony are difficult to check directly in experimental systems, since this requires reliable estimates of parameters such as the size of the network, the connectivity and the strength of connections. An alternative criterion, which works for massively connected as well as for sparsely connected systems, is based on the behavior of population averages [13]. Let us denote by x_i(t) a local observable, e.g., the instantaneous rate of the ith neuron. Let us suppose that we can measure the mean of this quantity over a subpopulation of size K, where K << N, yielding

$$X_K(t) = \frac{1}{K}\sum_{i=1}^{K} x_i(t). \qquad (23)$$
Asynchronous states can be distinguished from synchronous states according to the K dependence of the variance of X,
$$\Delta(K) = \left\langle \big(X_K(t) - \langle X_K\rangle\big)^2 \right\rangle, \qquad (24)$$
where ⟨...⟩ denotes averaging over time. In an asynchronous state the local variables are weakly correlated, hence

$$\Delta(K) \propto \frac{1}{K}, \qquad 1 \ll K \ll N. \qquad (25)$$

On the other hand, in synchronous states

$$\Delta(K) = O(1) \qquad (26)$$
even for large K. The advantage of this criterion is that it does not rely on the absolute scale of Δ, but on its dependence on K, which, unlike N, can be varied experimentally. The limitation of this criterion is that the sampling of the x_i's and the value of K should be such that the sums are not dominated by unusually strongly correlated variables. Also, the choice of the variable x_i must be made with care in cases where destructive interference between different types of neurons, which fluctuate out of phase with each other, is likely to occur.
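The following sketch illustrates this criterion on surrogate data: it generates weakly correlated ("asynchronous") and globally modulated ("synchronous") rate traces and shows how the variance Δ(K) of the subpopulation average depends on K. The signal model and all numbers are illustrative assumptions, not the network model of Section 5.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4000, 2000           # number of neurons, number of time samples

def delta_K(x, K):
    """Variance over time of the average of K randomly sampled traces (Eq. (24))."""
    idx = rng.choice(x.shape[0], size=K, replace=False)
    return x[idx].mean(axis=0).var()

# "Asynchronous" surrogate: independent fluctuations around a constant rate
x_async = 10.0 + rng.normal(size=(N, T))
# "Synchronous" surrogate: a common global modulation plus independent noise
common = np.sin(2 * np.pi * 0.02 * np.arange(T))
x_sync = 10.0 + common[None, :] + rng.normal(size=(N, T))

for K in (10, 100, 1000):
    print(f"K={K:4d}  Delta_async={delta_K(x_async, K):.4f}  Delta_sync={delta_K(x_sync, K):.4f}")
# Delta_async shrinks roughly as 1/K, while Delta_sync saturates near the variance
# of the common modulation (0.5 for a unit-amplitude sinusoid).
```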
4.2.4. Measure of synchrony in large neuronal networks
The above analysis suggests a way to quantify the degree of synchrony in the steady state of the network [19,20,97,95], based on the study of the temporal fluctuations of macroscopic observables of the network, such as the instantaneous activity of the system or the spatial average of the membrane potential. To be specific, we consider the latter. One evaluates, at a given time t, the quantity

$$V(t) = \frac{1}{N}\sum_{i=1}^{N} V_i(t). \qquad (27)$$
The variance of the time fluctuations of V(t) is

$$\sigma_V^2 = \left\langle [V(t)]^2 \right\rangle_t - \big[\langle V(t)\rangle_t\big]^2, \qquad (28)$$

where ⟨...⟩_t = (1/T_m)∫_0^{T_m} dt ... denotes time averaging over a large time T_m. After normalization of σ_V by the population average of the single-cell membrane-potential fluctuations,

$$\sigma_{V_i}^2 = \left\langle [V_i(t)]^2 \right\rangle_t - \big[\langle V_i(t)\rangle_t\big]^2, \qquad (29)$$

one defines a synchrony measure, χ(N), for the activity of a system of N neurons by

$$\chi^2(N) = \frac{\sigma_V^2}{\frac{1}{N}\sum_{i=1}^{N}\sigma_{V_i}^2}. \qquad (30)$$
This synchrony measure, χ(N), is between 0 and 1. The law of large numbers implies that in the limit N → ∞ it behaves as

$$\chi(N) = \chi(\infty) + \frac{a}{\sqrt{N}} + O\!\left(\frac{1}{N}\right), \qquad (31)$$
904
D. Golomb et al.
where a > 0 is a constant. In particular, z(N) = 1, if the system is fully synchronized (i.e., V ( t ) - - V ( t ) for all i), and ~ ( N ) = O ( 1 / v ~ ) if the state of the system is asynchronous. In the asynchronous state, ~ ( ~ ) = 0. By contrast, in synchronous states, g ( ~ ) > 0. In homogeneous fully connected networks, an important class of partially synchronous states, ( 0 < g ( ~ ) < 1), are the "cluster states" [98,20,21,99,31] in which the system segregates into several clusters that may fire alternately. In heterogeneous or sparse networks, the disorder can smear the clustering. For instance, in a "smeared 1-cluster state", the population voltage, V(t), (Eq. (27)) oscillates with time and has one peak as a function of time in each time period. More generally, in a "smeared n-cluster state", V(t) has n peaks as a function of time in each time period. In networks with more than one population of neurons, one can define a synchrony measure Z for each population separately.
5. Synchronous and asynchronous states in a model conductance-based network

In this section we describe numerical simulations of a two-population model network. For the sake of simplicity we assume all-to-all connectivity both within and between the two populations. We also assume that N_E = N_I = N. Depending on the network parameters, the dynamical state can be either asynchronous, or synchronous and oscillatory. The details of the model are given in Appendices A and B.

5.1. Asynchronous states
If the external input conveys substantial noise, the network settles into an asynchronous state. This is demonstrated in Fig. 2A, where the traces of two excitatory neurons and one inhibitory neuron are plotted. Because of the strong local noise, the membrane potentials of the neurons display variability and pairs of neurons fire asynchronously. As a result, the total synaptic conductances from the network interactions display small fluctuations (Fig. 2B), which vanish in the limit of large network size. The autocorrelation function (AC) of the contribution from the network to the total synaptic excitatory conductance on an excitatory neuron is plotted for three different network sizes in Fig. 2C. This is in contrast to the fluctuations of the external synaptic conductances, which remain strong (not shown). An equivalent signature of the asynchronous firing is the scaling of χ with 1/√N. This is also shown in Fig. 2C (inset), from which it is clear that χ_E and χ_I decrease proportionally to 1/√N.
Opposite: Fig. 2. A. Traces of 2 excitatory neurons (E) and one inhibitory neuron (I) in the asynchronous state when the noise in the external input is strong. B. Traces of excitatory and inhibitory conductances. Solid line: network excitatory feedback on excitatory neurons; dotted line: excitatory external input; dashed line: network inhibitory input on excitatory neurons. C. AC of excitatory conductances for different network sizes: N = 200 (solid line), N = 400 (dotted line) and N = 800 (dashed line). Inset: synchronization parameters χ_E (circles, solid line), χ_I (diamonds, dotted line) as a function of the network size. Parameters are as in Appendices A and B. Other parameters: g_syn^EE = g_syn^EI = g_syn^IE = g_syn^II = 0.2 mS/cm²; g_a^E = g_a^I = 0.015 mS/cm²; ν_0 = 10 Hz; n_a^E = n_a^I = 100.
[Fig. 2 appears here; caption above. Panels: A, voltage traces (scale bars 30 mV, 50 ms); B, synaptic conductances (scale bar 0.005 mS/cm², 50 ms); C, AC of excitatory conductances vs. t (ms) for N = 200, 400, 800, with inset χ vs. N^(-1/2).]
It is important to note that for this set of parameters, the size of the local noise controls the global (finite size) as well as the local properties of the network. This is demonstrated in Fig. 3 for a network with the same synaptic parameters as in Fig. 2, except that the fluctuations of the external input have been reduced while keeping the same average. Fig. 3A shows the AC of the total conductance of the network inhibitory synapses on an excitatory neuron for two sizes of the system. As above, the amplitude of the central peak decays to zero at a rate inversely proportional to the size, signaling that the state is asynchronous. However, although the average firing rate has hardly changed, the amplitude of the central peak of the AC of the inhibitory conductance is now substantially bigger than in Fig. 2C. This is shown clearly in Fig. 3B. Thus, decreasing the level of local noise increases the level of residual (finite size) synchrony.
Fig. 3. (A) AC of the inhibitory synaptic conductance for moderate level of noise in the external input for N = 100 (solid line) and N = 800 (dashed line). Parameters as in Fig. 2 except for g_a^E = g_a^I = 0.005 mS/cm²; ν_0 = 30 Hz. (B) AC of the synaptic conductances for moderate level of noise (same parameters as in A; dashed lines) and strong noise (same parameters as in Fig. 2, solid lines); N = 100. I: inhibitory conductance; II: excitatory conductance.
Another effect of reducing the noise is that the firing pattern of the inhibitory population becomes substantially more oscillatory. Note, however, that the firing pattern of the excitatory population remains very similar to what is found for stronger noise; see Fig. 3BII. This difference in behavior suggests that when the noise is reduced, synchronous oscillations appear first in the inhibitory population. This is confirmed when the noise level is further reduced.
5.2. Synchronous oscillatory states: synchrony of spikes

Eventually, if the local noise is sufficiently small, collective synchrony emerges, as demonstrated in Fig. 4. In Fig. 4A, the ACs of the average conductances of the network synaptic excitatory and inhibitory inputs on an excitatory neuron are displayed. Note that because of the particular choice of the parameters, the excitatory and inhibitory inputs on the inhibitory population are the same. Both ACs are oscillating at the same frequency, ν ≈ 60 Hz, with an amplitude which is almost independent of the system size. This reveals that the network activity is now in an oscillatory and synchronous state. Note also that these oscillations have a much larger modulation for the I population than for the E population. The inhibitory network is therefore much more strongly synchronized than the excitatory one. This is also confirmed by the fact that χ_E = 0.068 << χ_I = 0.59. In order to gain more insight into the pattern of synchrony of the two populations, we display in Fig. 4B a "raster plot" of the spike trains of the excitatory and the inhibitory neurons. Each neuron is represented by a dot at the time it fires a spike. The average firing rates of the excitatory and inhibitory populations are 34 and 64 Hz, respectively. The inhibitory neurons fire spikes at almost every period. However, the excitatory neurons fire only seldom in coincidence with the oscillation. This suggests that the synchrony of the network is due to the inhibitory neurons, which become synchronized because of their reciprocal interactions. Subsequently, they entrain the excitatory population. However, the excitatory neurons, which are firing at a different rate, follow the rhythm imposed by the inhibitory neurons only poorly. This explains the fact that χ_E << χ_I. This interpretation is confirmed by the fact that if the interaction between the inhibitory neurons is switched off, the network becomes asynchronous (results not shown).
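The oscillation frequency and the degree of population synchrony visible in such raster plots can also be read off the autocorrelation of the instantaneous population activity, as in Fig. 4A. The following sketch (Python) is a generic illustration of that computation; the spike-time arrays, bin width and lag range are assumed inputs, not quantities defined by the model of Appendices A and B.

```python
import numpy as np

def population_activity_ac(spike_times, t_max, bin_ms=1.0, max_lag_ms=100.0):
    """Autocorrelation of the instantaneous population activity.

    spike_times : list of 1-D arrays, the spike times (ms) of each neuron.
    t_max       : duration of the run (ms).
    An oscillatory autocorrelation signals a synchronous oscillatory state;
    a flat one signals asynchrony.
    """
    edges = np.arange(0.0, t_max + bin_ms, bin_ms)
    counts = np.zeros(edges.size - 1)
    for st in spike_times:
        counts += np.histogram(st, bins=edges)[0]
    a = counts - counts.mean()                   # de-meaned population activity
    n_lags = int(max_lag_ms / bin_ms)
    ac = np.array([np.mean(a[: a.size - k] * a[k:]) for k in range(n_lags + 1)])
    return np.arange(n_lags + 1) * bin_ms, ac

# Toy usage with hypothetical data: 50 neurons firing near a 16 ms period.
rng = np.random.default_rng(3)
spikes = [np.arange(0.0, 1000.0, 16.0) + rng.normal(0.0, 2.0, size=63)
          for _ in range(50)]
lags, ac = population_activity_ac(spikes, t_max=1000.0)
```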
5.3. Synchronous oscillatory states: synchrony of bursts

If the cross-talk between the two populations and the recurrent excitation in the excitatory population are sufficiently strong, the system can settle in a bursting and synchronous state. For simplicity we consider here the case where the firing of the excitatory neurons is only weakly adapting. The case of stronger adaptation has been studied in detail by Van Vreeswijk and Hansel [100]. An example of a bursting synchronous state is shown in Fig. 5A. The external input has been chosen such that the average firing rate of the excitatory (resp. inhibitory) population is around 35 Hz (resp. 90 Hz). Most of the time, the spikes fired by excitatory neurons come in short bursts of two action potentials which are synchronized across the network (χ_E ≈ 0.3).
[Fig. 4 appears here; see caption below. Panels: AI, AII, autocorrelations of the average excitatory and inhibitory synaptic conductances vs. t (ms); B, raster plots of the excitatory and inhibitory populations (neuron index vs. time, 50 ms scale bar).]
They coincide with the global oscillations of the membrane potential of the excitatory neurons, which occur at an approximate frequency of 16 Hz; see Fig. 5B. Note that the bursts are synchronized, but that this is not the case for the spikes. The inhibitory neurons also fire in synchronized bursts. These bursts contain about 5-6 spikes which are not synchronized across the network. Interestingly, the rhythmic oscillations of the two populations occur almost in phase. The peak of the cross-correlation of the population activities is slightly shifted, signaling that the excitatory population fires about 3 ms earlier than the inhibitory population (result not shown). In contrast with the synchronous state described above, this bursting state is destroyed when the strength of the inhibitory coupling, g_syn^II, is increased. This is shown in Fig. 6. In this simulation, the external inputs have been modified so that the average firing rates of both populations remain almost unchanged when the coupling is increased. The difference in the effect of the intra-inhibitory interaction indicates an essential difference between the bursting synchronous state and the spiking synchronous state described above. We will come back to this point in Section 8.2.
6. Synchrony in one population of spiking neurons - the weak coupling limit
As demonstrated above, synchrony can emerge in a two-population network because the inhibitory neurons synchronize as a result of their mutual interactions and, consequently, entrain the excitatory population. In order to understand this mechanism, one has to first understand how synchrony emerges in a network composed of only one population of spiking neurons, either excitatory or inhibitory. This issue has been extensively studied in the last 10 years using different complementary approaches [66,68,82,101-115]. In general, the dynamical equations of large networks of spiking neurons cannot be solved analytically, and the study of synchronization in such networks relies on numerical computations. In this section, we describe the technique of reduction to phase models, which enables the analytical study of synchrony under several limiting conditions. We apply this technique to both fully connected heterogeneous networks and sparse networks.
6.1. The weak coupling limit: principles of phase reduction

We consider a network of N neurons with synaptic coupling in the form of Eq. (1). Under the following three conditions the state of each neuron can be described by a phase variable, φ_i, which characterizes, at time t, the location of neuron i along its limit cycle [116].
Opposite: Fig. 4. (A) Autocorrelations of the average conductances of the network synaptic excitatory (I) and inhibitory (II) inputs on an excitatory neuron are displayed in a synchronized state. Results for N = 100 (solid line) and N = 800 (dashed line) are plotted in the two cases and are almost indistinguishable. (B) Raster plots in both populations, N = 100. Same parameters as in Fig. 2 except for g_a^E = g_a^I = 0.001 mS/cm²; n_a^E = n_a^I = 500.
Fig. 5. The synchronous bursting state. (A) Traces of 2 excitatory (E) neurons and one inhibitory (I) neuron. (B) Average potential of the two populations. Same parameters as in Fig. 4 except for g_syn^EE = 0.45 mS/cm², g_syn^EI = 0.6 mS/cm², g_syn^IE = 0.6 mS/cm², g_syn^II = 0 mS/cm², g_a^E = g_a^I = 0.005 mS/cm², ν_0 = 30 Hz, n_a^E = 87, n_a^I = 60.
Fig. 6. (A) Traces of 2 excitatory (E) neurons and one inhibitory (I) neuron in an asynchronous state obtained by increasing g_syn^II from the parameters of the previous figure. (B) Average potential of the two populations. The residual fluctuations in these traces are a finite size effect. Same parameters as in Fig. 5 except for g_syn^II = 0.4 mS/cm²; ν_0 = 30 Hz; n_a^E = 67; n_a^I = 90.
1. The uncoupled neurons display a periodic behavior (limit cycle) when the external current is above some threshold.
2. Their firing rates all lie in a narrow range, i.e., the heterogeneities in the neuronal intrinsic properties and in the external inputs are small.
3. The coupling is weak, and therefore it modifies the firing rate of the neurons only slightly.

If these conditions are satisfied, the original system of equations for the dynamics of the N oscillators can be replaced by a simpler set of N differential equations that governs the time evolution of the φ_i:
\[
\frac{d\phi_i}{dt} = 2\pi f_{0i} + \frac{1}{M}\sum_{j \neq i} W_{ij}\,\Gamma(\phi_i - \phi_j),
\tag{32}
\]
where f_{0i} is the firing rate of neuron i when uncoupled. Here, Γ is the effective interaction between any two neurons,
\[
\Gamma(\phi_i - \phi_j) = \frac{1}{2\pi}\int_0^{2\pi} Z(\psi + \phi_i)\, I_{\rm syn}(\psi + \phi_i,\, \psi + \phi_j)\, d\psi.
\tag{33}
\]
The function Z is the phase resetting curve [116,92,93,80] of the neuron in the limit of vanishingly small perturbations of the membrane potential, and I_syn(φ_i, φ_j) is the synaptic interaction, which is a function of the phases of the presynaptic and postsynaptic neurons. The phase resetting curve gives the proportionality between the amplitude of a small current pulse and the phase advance (or delay) it induces in the neuron's firing. The function Z depends only on the single-neuron dynamics. For simple integrate-and-fire neurons it can be computed analytically. For conductance-based neuronal models, numerical methods have to be used, as described in [117,118,92]. Once this effective phase interaction is determined, it can be used to analyze networks of arbitrary complexity. Reduction to a phase model assumes that the coupling is small enough for amplitude effects to be neglected. The more stable the limit cycle, the weaker this constraint will be, but it is difficult to derive quantitative a priori estimates of the validity of the phase reduction. However, predictions of phase models often remain valid, at least qualitatively, for moderate values of the coupling. The Fourier series representation of the Γ function can be written as
\[
\Gamma(\phi) = g_{\rm syn}\left[\Gamma_0 + \tilde\Gamma(\phi)\right]
= g_{\rm syn}\left[\Gamma_0 + |\Gamma_0|\sum_{n=1}^{\infty} c_n \sin(n\phi + \alpha_n)\right],
\tag{34}
\]
where Γ_0 is the contribution coming from the zeroth Fourier component of Γ(φ), Γ̃(φ) includes all the phasic components of Γ, c_n is the normalized amplitude of the nth Fourier component and is therefore always nonnegative, and α_n is the angle of that mode. This parameterization is chosen in order to be consistent with previous works on networks of phase oscillators [116,119].
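In practice, once Γ(φ) has been evaluated on a grid of phases (analytically or by the numerical procedure of [117,118,92]), the coefficients Γ_0, c_n and α_n of Eq. (34) can be extracted by a discrete Fourier transform. The sketch below (Python) illustrates this; the sampled interaction and the value of g_syn are assumed to be supplied by the user.

```python
import numpy as np

def gamma_fourier(gamma_samples, g_syn, n_max=5):
    """Decompose an effective phase interaction into the form of Eq. (34).

    gamma_samples : values of Gamma(phi) on a uniform grid over [0, 2*pi).
    Returns (Gamma0, c, alpha), where c[n-1] and alpha[n-1] are the
    normalized amplitude and phase of the nth Fourier mode.
    """
    g = np.asarray(gamma_samples, dtype=float) / g_syn   # Gamma / g_syn
    L = g.size
    Gamma0 = g.mean()
    G = np.fft.rfft(g) / L              # complex Fourier coefficients
    c, alpha = [], []
    for n in range(1, n_max + 1):
        p = 2.0 * G[n].real             # coefficient of cos(n*phi)
        q = -2.0 * G[n].imag            # coefficient of sin(n*phi)
        amp = np.hypot(p, q)            # amplitude of the nth harmonic
        c.append(amp / abs(Gamma0))     # normalized, nonnegative amplitude c_n
        alpha.append(np.arctan2(p, q))  # phase alpha_n of sin(n*phi + alpha_n)
    return Gamma0, np.array(c), np.array(alpha)
```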
6.2. Examples of phase interactions

There are only a few models in which the phase response function Z can be computed analytically. This is the case for the standard leaky integrate-and-fire (Lapicque) model and some of its "avatars" [31]. Neuronal models in which a saddle-node bifurcation occurs at the threshold to periodic firing are another example. In the latter case, Z can be computed analytically at the bifurcation only [120]. In all the other cases, the function Z has to be computed using numerical methods. The software package XPPAUT [80] includes a function which allows one to compute Z in a precise and convenient way.

Lapicque integrate-and-fire model. The synaptic coupling term (Eq. (15)) in our version of the integrate-and-fire model depends only on the dynamics of the presynaptic neuron via the variable s(t). Therefore, the effective interaction, Γ(φ), for this model is (Eq. (33))
\[
\Gamma(\phi) = \frac{1}{2\pi}\int_0^{2\pi} d\psi\, Z(\psi + \phi)\, I_{\rm syn}(\psi).
\tag{35}
\]
The right-hand side is almost a convolution integral (except for the plus sign in the function Z(ψ + φ)). The function Z(φ) for the integrate-and-fire model is [31]
\[
Z(\phi) =
\begin{cases}
0, & 0 < \phi \leq \phi_r, \\[1mm]
\dfrac{1}{I}\, e^{(\phi - \phi_r)/(\omega_0 \tau_0)}, & \phi_r < \phi < 2\pi,
\end{cases}
\tag{36}
\]
where the phase coordinates are obtained from the time coordinate according to φ = ω_0 t and φ_r = ω_0 T_r. Here, ω_0 = 2πf_0, where f_0 is the firing rate of the neurons. In the simplest model (Eq. (15)), the synaptic coupling term depends only on the dynamics of the presynaptic neuron. It can easily be computed by summing over all the spikes prior to time t. One finds
\[
I_{\rm syn}(\phi) = \frac{g_{\rm syn}\,\tau_2}{\tau_0\,(\tau_2 - \tau_1)}
\left(\frac{e^{-\phi/(\omega_0 \tau_2)}}{1 - e^{-2\pi/(\omega_0 \tau_2)}}
- \frac{e^{-\phi/(\omega_0 \tau_1)}}{1 - e^{-2\pi/(\omega_0 \tau_1)}}\right).
\tag{37}
\]
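As a quick consistency check of the periodic-sum identity underlying Eq. (37), the sketch below (Python) sums a double-exponential synaptic kernel over all past spikes of a periodically firing presynaptic neuron and compares the result with the corresponding closed form. A unit-area kernel is assumed here for simplicity; the overall normalization of the chapter's Eq. (15) may differ.

```python
import numpy as np

def ker(t, tau1, tau2):
    """Unit-area double-exponential synaptic kernel (assumed normalization)."""
    return (np.exp(-t / tau2) - np.exp(-t / tau1)) / (tau2 - tau1)

def isyn_periodic_sum(t, T, tau1, tau2, n_terms=2000):
    """Direct sum over all presynaptic spikes at times t - k*T, k = 0, 1, ..."""
    k = np.arange(n_terms)[:, None]
    return ker(t + k * T, tau1, tau2).sum(axis=0)

def isyn_closed_form(t, T, tau1, tau2):
    """Closed form of the periodic sum (cf. Eq. (37), up to normalization)."""
    return (np.exp(-t / tau2) / (1.0 - np.exp(-T / tau2))
            - np.exp(-t / tau1) / (1.0 - np.exp(-T / tau1))) / (tau2 - tau1)

if __name__ == "__main__":
    tau1, tau2, f0 = 1.0, 3.0, 0.05        # ms, ms, spikes/ms (50 Hz), assumed
    T = 1.0 / f0
    t = np.linspace(0.0, T, 200, endpoint=False)
    assert np.allclose(isyn_periodic_sum(t, T, tau1, tau2),
                       isyn_closed_form(t, T, tau1, tau2))
```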
Using the fact that it is a convolution integral, it is straightforward to compute the Fourier expansion of Γ, defined in Eq. (34), based on Eq. (35). One obtains:
\[
c_n = 2\left[(1 + n^2\omega_0^2\tau_1^2)(1 + n^2\omega_0^2\tau_2^2)\right]^{-1/2}
\left(1 + n^2\omega_0^2\tau_0^2\right)^{-1/2}
\left[1 + \frac{2I(I - \theta)}{\theta^2}\bigl(1 - \cos(n\omega_0 T_r)\bigr)\right]^{1/2},
\tag{38}
\]
\[
\alpha_n = -\pi + \frac{\pi}{2}\,\mathrm{sgn}(g_{\rm syn})
+ \arctan(n\omega_0\tau_1) + \arctan(n\omega_0\tau_2) + \arctan(n\omega_0\tau_0)
+ \arctan\!\left(\frac{\sin(n\omega_0 T_r)}{[I/(I - \theta)] - \cos(n\omega_0 T_r)}\right).
\tag{39}
\]
The first factor in Eq. (38) depends only on the synaptic properties, whereas the second factor reflects the intrinsic response properties of the neurons. The functions V(φ), s(φ), Z(φ) and Γ(φ) for a specific case are plotted in Fig. 7. The function Z(φ) is discontinuous at φ = 0 and φ = φ_r, but the function Γ(φ) is continuous for all φ.

The Wang-Buzsáki (WB) model. As we have seen above, this model displays a saddle-node bifurcation. Fig. 8A shows the response function, Z, of this model for a current I = 1 μA/cm². The corresponding firing rate is f = 59.7 Hz. A remarkable property of the function Z is that it is strictly positive except in a tiny region during the spike or right after it. In this region, Z is very small. This is the refractory period of the neuron, where it is very insensitive to external perturbations. Neurons with this property will be said to display "type I_R" response. The fact that the response function of type I_E neurons is positive in most of their limit cycle (and therefore these neurons have type I_R response) is a rather general result.
Fig. 7. The functions V(φ) (normalized to θ), s(φ), Z(φ) (multiplied by I) and Γ(φ) (normalized to |g_syn|) for the integrate-and-fire model with no refractory period (T_r = 0) and inhibitory coupling (g_syn < 0). Parameters: τ_1 = 5.9 ms, τ_2 = 6 ms, I = 2.24 μA/cm², f_0 = 44.8 Hz. Adapted from [33].
Fig. 8. Phase reduction of conductance-based models. The neuronal voltage time course V, the phase resetting curve Z, and the phase interaction function Γ are plotted as a function of φ in the top, middle and bottom panels, respectively. (A) Wang-Buzsáki model. Parameters as in Appendices A and B except for k_f = 2 ms⁻¹. The external current is such that the time period is T_0 = 16.75 ms. (B) Hodgkin-Huxley model. The parameters are given in Appendix A. The current is I = 10 μA/cm² so that T_0 = 14.64 ms. In both figures, the synaptic coupling is inhibitory. Parameters are the same as in the case of the Wang-Buzsáki model (A).
Actually, it can be proven analytically that for any type I_E neuron, the response function at the bifurcation has the form [120,121]
\[
Z(\phi) = A(1 - \cos\phi),
\tag{40}
\]
where A is a normalization factor.

Hodgkin-Huxley (HH) model. In contrast with the WB model, the HH model is type II_E. The difference in the bifurcation has interesting consequences. Indeed, the response function of the HH model displays a substantial negative region. This is shown in Fig. 8B. The negative response occurs right after a broad refractory period where Z is very small. Neurons with this property will be said to display "type II_R" response. As we show now, at weak coupling, the synchronization properties of type I_R and type II_R neurons are substantially different.
6.3. Stability of the fully synchronized state at weak coupling

Let us first consider a system of two identical neurons symmetrically coupled. The differential equations are:
\[
\frac{d\phi_1}{dt} = 2\pi f_0 + \Gamma(\phi_1 - \phi_2),
\tag{41}
\]
\[
\frac{d\phi_2}{dt} = 2\pi f_0 + \Gamma(\phi_2 - \phi_1).
\tag{42}
\]
Taking the difference between these two equations, the phase difference φ⁻ = φ_1 − φ_2 evolves according to
\[
\frac{d\phi^-}{dt} = \Gamma^-(\phi^-),
\tag{43}
\]
where Γ⁻ is twice the odd part of Γ. This last equation always has the solution φ⁻ = 0, which corresponds to a fully synchronized state. Expanding around φ⁻ = 0, one finds that if the derivative of Γ at 0 is negative, this state is stable; otherwise it is unstable. A similar analysis can be performed for a system of N identical and all-to-all coupled neurons. In this case the same conclusion is reached: a fully synchronized state always exists, but it is stable if Γ'(0) < 0. The stability of this fully synchronized state depends on both the synaptic interaction and the response function Z. In [92,93,31] the conditions under which the synaptic interaction leads to a stable or unstable fully synchronized state were analyzed. The answer depends on the form of the response function. If the response function displays a strong negative region after the refractory period ("type II_R" response), and the synaptic time constant is short enough, the derivative of Γ at 0 is negative for an excitatory interaction and positive for an inhibitory one. This is because in the integral of Eq. (33), as the phase difference φ_i − φ_j increases, the peak of the interaction function moves closer to the negative region of the response function. The opposite result is found if there is no substantial negative region in the response function ("type I_R" response): inhibition will lead to stable in-phase synchrony, while excitation will lead to an unstable fully synchronous state. An example of this fact is shown in Fig. 8, where the function Γ is plotted for the WB and the HH models with inhibitory coupling. One sees that, for τ_s = 2 ms, the slope of Γ at the origin is negative for the WB model but positive for the HH model. Note that for slower synapses, this slope is negative for both models (result not shown).
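The criterion Γ'(0) < 0 can be checked numerically for any given response function and synaptic waveform. The sketch below (Python) does this for the illustrative type-I_R choice Z(φ) = 1 − cos φ (cf. Eq. (40)) and a periodic exponential synaptic waveform; the decay constant β (in phase units), the assumption I_syn = g_syn s(ψ), and the value of g_syn are choices made for this example, not the chapter's model.

```python
import numpy as np

# Evaluate Gamma(phi) of Eq. (33) for Z(phi) = 1 - cos(phi) and a periodic
# exponential synaptic waveform, then test Gamma'(0) < 0 (stability of the
# fully synchronized state).
L = 4096
phi = 2 * np.pi * np.arange(L) / L
Z = 1.0 - np.cos(phi)                                # type-I_R resetting curve
beta = 0.5                                           # synaptic decay, phase units (assumed)
s = np.exp(-phi / beta) / (1.0 - np.exp(-2 * np.pi / beta))
g_syn = -1.0                                         # inhibitory coupling (assumed)

# Gamma(phi_k) = (g_syn / 2 pi) * integral of Z(psi + phi_k) s(psi) dpsi,
# computed for every phi_k as a circular average.
gamma = np.array([g_syn * np.mean(np.roll(Z, -k) * s) for k in range(L)])

dphi = 2 * np.pi / L
slope0 = (gamma[1] - gamma[-1]) / (2 * dphi)         # central difference for Gamma'(0)
print("Gamma'(0) =", slope0,
      "->", "stable" if slope0 < 0 else "unstable", "in-phase state")
```

With the parameters assumed here the slope is negative, in line with the statement that inhibition synchronizes neurons with a type-I_R response when the synapse is fast enough.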
6.4. Stability of the asynchronous state in fully connected, heterogeneous networks

6.4.1. General formalism
In the preceding section we have shown that the phase reduction approach can help to clarify how the stability of fully synchronized states of homogeneous networks depends on cellular and synaptic properties. This is a first step towards clarifying under which conditions strongly synchronized states can occur in large networks. Another approach to investigating the mechanisms of synchrony is to study the
stability of the asynchronous states of large networks. Now we show how the phase reduction technique allows one to perform such a study in heterogeneous networks in the weak coupling limit. We sketch here the general approach. More details can be found in [32]. Let us consider a network in which the natural discharge frequencies, ω = 2πf, of the neurons are distributed according to a probability h(ω). Let ρ(φ, ω) be the probability that the phase of neurons with natural frequencies in [ω, ω + dω] is at time t in [φ, φ + dφ]. This probability, ρ, satisfies the continuity equation
\[
\frac{\partial \rho}{\partial t} = -\frac{\partial}{\partial \phi}(\rho v),
\tag{44}
\]
where
\[
v = \omega - \sum_{n=0}^{+\infty}\left\{ r_n \sin[n(\phi - \psi_n)] + s_n \cos[n(\phi - \psi_n)] \right\}
\tag{45}
\]
and the coefficients r_n, s_n, and ψ_n can be expressed in terms of the coefficients c_n and α_n of the interaction function, Γ:
\[
r_n e^{i n \psi_n} = |\Gamma_0|\, c_n \cos\alpha_n \iint \rho(\phi, t, \omega)\, h(\omega)\, e^{i n \phi}\, d\omega\, d\phi,
\tag{46}
\]
\[
s_n e^{i n \psi_n} = |\Gamma_0|\, c_n \sin\alpha_n \iint \rho(\phi, t, \omega)\, h(\omega)\, e^{i n \phi}\, d\omega\, d\phi.
\tag{47}
\]
The asynchronous state corresponds to the distribution
\[
\rho_{\rm as}(\phi, t, \omega) = \frac{1}{2\pi}.
\tag{48}
\]
To analyze the linear stability of this state one writes ρ(φ, t, ω) as
\[
\rho(\phi, t, \omega) = \frac{1}{2\pi} + \epsilon\, \eta(\phi, t, \omega).
\tag{49}
\]
Here, ε << 1. Substituting Eq. (49) into Eqs. (44), (46) and (47) and keeping only the terms of order ε, one finds the partial differential equation that η(φ, t, ω) satisfies. Expanding η(φ, t, ω) in Fourier modes,
\[
\eta(\phi, t, \omega) = \sum_{n=-\infty}^{+\infty} C_n(t, \omega)\, e^{i n \phi},
\tag{50}
\]
one looks for a solution
\[
C_n(t, \omega) = D_n(\omega)\, e^{\lambda_n t},
\tag{51}
\]
where λ_n = μ_n + i n Ω_n. We then obtain two equations that determine the bifurcation point and the value of Ω_n:
\[
\int_{-\infty}^{+\infty} \frac{h(\omega)\, \mu_n}{\mu_n^2 + n^2(\omega + g_{\rm syn}\Gamma_0 + \Omega_n)^2}\, d\omega
= \frac{2\cos\alpha_n}{\pi g_{\rm syn}|\Gamma_0| c_n},
\tag{52}
\]
\[
\int_{-\infty}^{+\infty} \frac{h(\omega)\, n(\omega + g_{\rm syn}\Gamma_0 + \Omega_n)}{\mu_n^2 + n^2(\omega + g_{\rm syn}\Gamma_0 + \Omega_n)^2}\, d\omega
= -\frac{2\sin\alpha_n}{g_{\rm syn}|\Gamma_0| c_n}.
\tag{53}
\]
When the stability of the mode n changes, the associated eigenvalue is purely imaginary: λ_n = i n Ω_n. Taking the corresponding limit, μ_n → 0⁺, in Eqs. (52) and (53) we get:
\[
h(-g_{\rm syn}\Gamma_0 - \Omega_n) = \frac{2\cos\alpha_n}{\pi g_{\rm syn}|\Gamma_0| c_n},
\tag{54}
\]
\[
\lim_{\epsilon \to 0^{+}} \int_{\epsilon}^{+\infty}
\frac{h(\omega - g_{\rm syn}\Gamma_0 - \Omega_n) - h(-\omega - g_{\rm syn}\Gamma_0 - \Omega_n)}{\omega}\, d\omega
= -\frac{2\sin\alpha_n}{g_{\rm syn}|\Gamma_0| c_n}.
\tag{55}
\]
If we keep a constant coupling strength, g_syn, these equations give the critical disorder for which the mode, n, of the perturbation, η, becomes unstable. Alternatively, we can keep the disorder constant and obtain the critical coupling. The value of the index, n, that controls the instability gives us information about the kind of dynamical state that appears in the synchronized phase near the transition. The state, in the vicinity of the transition, is a smeared 1-cluster state if the transition is controlled by the n = 1 mode. This is because the corresponding unstable eigenvector corresponds to a perturbation of the phase distribution which performs only one oscillation when φ varies over 2π. If it is controlled by the n = 2 mode, the eigenvector performs two oscillations over 2π; therefore, the instability tends to develop a smeared 2-cluster state. More generally, the mode n corresponds to a smeared n-cluster state. In the case of a Gaussian distribution of frequencies,
\[
h(\omega) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(\omega - \bar\omega)^2}{2\sigma^2}\right),
\tag{56}
\]
Eqs. (54) and (55) lead to
\[
-2\int_0^{(\Omega_n + g_{\rm syn}\Gamma_0 + \bar\omega)/\sigma_c} dx\; e^{x^2/2} = \tan\alpha_n,
\tag{57}
\]
\[
\sigma_c = g_{\rm syn}\sqrt{\frac{\pi}{8}}\; c_n |\Gamma_0| \cos\alpha_n\,
\exp\!\left(-\frac{(\Omega_n + g_{\rm syn}\Gamma_0 + \bar\omega)^2}{2\sigma_c^2}\right),
\tag{58}
\]
gsy"F~+ 6')2) 2 ~c 2
where c~c is the critical value of the disorder at which the mode, n, becomes unstable. To be rigorous, the distribution Eq. (56), has to be truncated to exclude negative firing rates. However, in practice, since at the instability transition point ~ is small,
Mechanisms of synchrony of neural activity in large networks
919
O(gsyn), the contribution of the negative frequency is exponentially small and therefore one can neglect it.
6.4.2. Integrate-and-fire model and conductance-based models For phase models the critical disorder is strictly proportional to 9syn. One defines, for any given value of the coupling strength, 9syn, the robustness of synchrony, R(gsyn), as the critical disorder where the transition to the asynchronous state occurs. This critical disorder is always expressed in what follows as a fraction of the average angular frequency, 6~ in the uncoupled network. The robustness, R, is the maximal relative spread of the firing rates in the network such that all the stable attractors of the dynamics are synchronous states. Meaningful comparisons of the robustness for different neuron models require that analytical calculations or numerical simulations be performed in equivalent conditions. In such a situation, the same value of the coupling strength for different models, as the effect of a given synaptic current, will largely depend on the membrane properties of the postsynaptic neuron. Therefore, one should set the coupling strength at different values for the different models, chosen so that a single synaptic event leads in both cases to a postsynaptic potential of, e.g., 0.5 mV in a neuron at rest ( V - - 6 5 mV). This guarantees that the effects of synaptic interactions between neurons are similar in the different models. Fig. 9A and B, respectively, display the robustness of synchrony, RE, (resp. RI) in an excitatory (resp. inhibitory) integrate-and-fire network, as a function of the average firing rate, f . These results were obtained in the weak coupling limit, for a Gaussian distribution of firing rates, using Eqs. (38), (39), (56)-(58). Both RE and R~ go to infinity as the firing rate approaches zero, which means that synchrony can always be achieved at very low rates in a network of integrate-and-fire neurons, even if heterogeneities are strong. This result is also valid for a Lorentzian distribution of frequencies. In this simpler case, this can be analytically derived as shown in [32]. In the excitatory case, the robustness of synchrony, RE, monotonically decreases with increasing frequency (see Fig. 9A), and eventually vanishes at a finite frequency. This means that the asynchronous state is stable at a high firing rate, even in an homogeneous network. On the contrary, in the inhibitory case, RI is a nonmonotonic function of the firing rate and never vanishes, as can be seen on Fig. 9B. This entails that a mildly heterogeneous inhibitory network always displays some degree of synchrony. The discontinuities in the derivative visible on all the curves correspond to the frequencies at which changes in the order of the critical mode, nc, occur. As can be seen on Fig. 10A and B, the order of the critical mode monotonically decays with increasing firing rate, both in the excitatory case and in the inhibitory case, and eventually becomes 1. By contrast, it diverges at low frequency, as shown above. This means that, at low frequency, destabilization of the asynchronous state will lead to cluster states where the system splits into several groups of in-phase locked neurons [98]. Note, however, that for excitatory coupling, the rates for which clustering occurs are much lower than for inhibitory coupling.
Fig. 9. Weakly coupled integrate-and-fire neurons: robustness of synchrony, R, vs. the average firing rate f̄. Firing rates in the uncoupled network are spread according to a Gaussian distribution (Eq. (56)). Results are displayed for three sets of synaptic time constants: τ_1 = 1 ms, τ_2 = 3 ms (solid line); τ_1 = 1 ms, τ_2 = 7 ms (dashed line); τ_1 = 6 ms, τ_2 = 7 ms (dotted line). (A) Excitatory network. (B) Inhibitory network. Adapted from [32].

To compare the efficiency of inhibitory and excitatory couplings at synchronizing neuronal activity in integrate-and-fire networks for given values of the synaptic time constants, the ratio R_E/R_I is plotted in Fig. 11 as a function of the average firing rate, f̄. At high frequency, R_E/R_I vanishes because the asynchronous state is stable for excitatory interactions, but not for inhibitory ones. When the frequency is decreased, R_E/R_I rapidly becomes larger than 1. This means that at low firing rates, excitation is much more efficient than inhibition at synchronizing neuronal activity in heterogeneous integrate-and-fire networks. Note that R_E/R_I remains finite in the limit of vanishing firing rates, in spite of the low-frequency divergence of both R_E and R_I.
Fig. 10. Weakly coupled integrate-and-fire neurons: order of the critical mode vs. the average firing rate f̄. (A) Excitatory network. (B) Inhibitory network. Same parameters and conventions as in Fig. 9. Adapted from [32].
One can also analyze how synchrony depends on the synaptic time course (Fig. 12). For the sake of simplicity, we assume that τ_1 = τ_2 = τ_syn. The synaptic interaction is then described by an "α function",
\[
s(t) = t\, e^{-t/\tau_{\rm syn}}\, \Theta(t),
\tag{59}
\]
where Θ is the Heaviside step function.
Comparing Figs. 11 and 12, we see that the ratio R_E/R_I diverges when f̄τ_syn approaches 0 at fixed f̄, while it tends to a constant value when f̄τ_syn approaches 0 at fixed τ_syn. Thus, f̄τ_syn is not a scaling variable for R_E/R_I. This is because R_E/R_I depends not only on the temporal modulation of the synaptic current (controlled by f̄τ_syn), but also on the phase response of the neurons, which explicitly depends on f̄. This can be shown analytically for a Lorentzian distribution [32].
f" (Hz) Fig. 11. Weakly coupled integrate-and-fire neurons: relative robustness RE/RI vs. the average firing rate f . Same parameters and conventions as in Fig. 9. Adapted from [32].
Fig. 12. Weakly coupled integrate-and-fire neurons: relative robustness R_E/R_I vs. the synaptic time τ_syn for a fixed average firing rate f̄ = 20 Hz and an "alpha function" synaptic interaction. Adapted from [32].

Are the above results modified when active properties of neurons are taken into account? To investigate this issue Neltner et al. [32] have studied networks of WB
neurons. The response function was computed for various values of the external drive, I, leading to different firing rates of the neurons. Using the Fourier expansion of the response function, one can analytically compute the robustness of synchrony. The results are shown for excitatory and inhibitory couplings in Fig. 13A and B, respectively. Qualitatively, these results are similar to those obtained for the integrate-and-fire model, except that both R_E and R_I remain finite in the limit of vanishing firing rates for the WB model.
Fig. 13. Weakly coupled WB neurons: robustness R of synchrony vs. the average firing rate f. (A) Excitatory network. (B) Inhibitory network. Same parameters and conventions as in Fig. 9. Adapted from [32].
In particular, R_I never vanishes, R_E is zero above some critical frequency, f_c, and R_E is larger than R_I at low frequency (compare Fig. 13A and B).
6.5. Approximate theory for randomly and weakly coupled neurons

We now consider a network consisting of N randomly connected neurons with average connectivity M (see Sections 3.3 and 4.1). Randomness in the connectivity tends to oppose synchrony of neuronal activity for two reasons. The first reason is that the number of synaptic inputs fluctuates from neuron to neuron. A second reason is that neurons receive inputs from different presynaptic sources. If the coupling between the neurons is sufficiently weak, it is possible to show, using dynamical mean field methods, that this second source of disorder is equivalent to a Gaussian, time-correlated multiplicative noise [122]. Substituting Eq. (34) in (32) and using Eq. (17) yields the following dynamics:
\[
\frac{d\phi_i}{dt} = 2\pi f_{0i} + g_{\rm syn}\frac{m_i \Gamma_0}{M}
+ \frac{g_{\rm syn}}{M}\sum_{j=1}^{N} W_{ij}\,\tilde\Gamma(\phi_i - \phi_j),
\qquad i = 1, \ldots, N,
\tag{60}
\]
where
\[
m_i = \sum_{j=1}^{N} W_{ij}
\tag{61}
\]
is the number of synaptic inputs the ith cell receives from other cells. The second term in this equation is a major source of disorder in the system, which tends to desynchronize it. It reflects the tonic component of the disorder that stems from the fact that different neurons receive different numbers of synaptic inputs. For example, if Γ_0 < 0, a cell that receives a larger number of inputs, m_i, will, in general, tend to delay the time of occurrence of its next spike in comparison to another cell that receives a smaller m_i. Here we assume that all the neurons are identical, i.e., f_{0i} = f_0 for i = 1, ..., N. In that case the first term in Eq. (60) can be gauged out of the equations. The distribution of the number of neurons that receive m inputs is binomial. In the limits N → ∞ and 1 << M << N, this distribution (Eqs. (17) and (61)) becomes a Gaussian distribution, P(δm, M),
\[
P(\delta m, M) = \frac{1}{\sqrt{2\pi M}}\exp\!\left(-\frac{(\delta m)^2}{2M}\right),
\tag{62}
\]
where δm = m − M. As shown in [33], an approximate theory can be developed to study the stability of the asynchronous state in this network. The analysis is based on the assumption that the state of the network at time t can be characterized by the density function ρ(φ, t, δm) dφ, which is the fraction of neurons with M + δm inputs whose phase lies between φ and φ + dφ at time t. It is normalized such that
\[
\int_{-\pi}^{\pi} \rho(\phi, t, \delta m)\, d\phi = 1.
\tag{63}
\]
In other words, we assume that the dynamics of a neuron depends only on the number of inputs it directly receives from other neurons. Under this assumption, we effectively map the study of sparse networks onto a study of fully connected networks with heterogeneities in the number of synaptic inputs, and therefore the analysis of the sparse case becomes similar to the analysis of the heterogeneous case discussed above, with a distribution of natural frequencies given by Eq. (56) with
\[
\bar\omega = \omega_0 + 2\pi\Gamma_0 g_{\rm syn}
\tag{64}
\]
and
\[
\sigma = \frac{2\pi\Gamma_0 g_{\rm syn}}{\sqrt{M}}.
\tag{65}
\]
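The mapping of Eqs. (64) and (65) can be checked directly: drawing the in-degrees m_i of an Erdős-Rényi-like connectivity and inserting them in the tonic term of Eq. (60) reproduces, for large M, a Gaussian spread of effective frequencies with the stated mean and width. The sketch below (Python) does this with placeholder values of ω_0, g_syn and Γ_0.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 10000, 200
omega0, g_syn, Gamma0 = 2 * np.pi * 40.0, -0.05, 0.3     # assumed values

# In-degrees m_i of Eq. (61) for a random connectivity with mean M.
m = rng.binomial(N, M / N, size=N)

# Effective natural frequencies induced by the tonic term, written with the
# same prefactor as Eqs. (64)-(65).
omega_eff = omega0 + 2 * np.pi * Gamma0 * g_syn * (m / M)

print("mean :", omega_eff.mean(),
      "  Eq.(64):", omega0 + 2 * np.pi * Gamma0 * g_syn)
print("width:", omega_eff.std(),
      "  Eq.(65):", abs(2 * np.pi * Gamma0 * g_syn) / np.sqrt(M))
```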
As shown in [33,122], the results derived under this assumption provide a very good approximation of the exact behavior of the system if c_n is sufficiently small (typically c_n < 1). It becomes exact in the limit c_n → 0 for all n. Following this approach, one finds the critical number, M_c. It is the minimum of the numbers M_{c,n}, over all positive integers n, for which it is finite:
\[
M_c = \min_n M_{c,n},
\tag{66}
\]
where M_{c,n} is determined by the amplitude, c_n, and the phase, α_n, of the nth Fourier term of Γ(φ) (Eq. (34)):
\[
M_{c,n} =
\begin{cases}
\dfrac{8}{\pi c_n^2\, \beta(\alpha_n)}, & -\pi/2 < \alpha_n < \pi/2, \\[2mm]
\infty, & \text{otherwise}.
\end{cases}
\tag{67}
\]
6.6. Sparse networks of integrate-and-fire neurons Using Eq. (67) together with Eqs. (38) and (39) one can study in a systematic way the stability of the asynchronous state as a function of the network parameters [33].
926
D. Golomb et al.
1.0-
0.5
0.0 n
n
2
4
Fig. 14.
0 O~
~
4
n
2
A graph of the function ]3(a). Adapted from [33].
Here we will describe some of the results for inhibitory coupling. We focus on the way the critical connectivity required to have an unstable asynchronous state depends on the firing rate of the neurons, on the synaptic time constants and on the refractoriness.

No refractoriness, T_r = 0: In this case, one can show that M_c varies nonmonotonically as f_0 decreases; see Fig. 15A. At high f_0, n_c = 1, and M_c decreases with decreasing f_0. At a specific value of f_0 it reaches its minimum M_{c,1}(f_0) and then increases again. As f_0 continues to decrease, a value of f_0 for which n_c = 2 is reached, and M_c decreases again until it reaches a second minimum. Then, it increases until n_c = 3, decreases again, and so on. More generally, M_c oscillates infinitely many times as f_0 decreases towards 0, i.e., as I decreases toward the threshold θ. It is also possible to analyze how M_c depends on the synaptic kinetics. If τ_1 = 0, and in the absence of refractoriness, α_n < −π/2 for all n. Therefore, the asynchronous state is stable for every M. As τ_1 increases from 0 (note that τ_1 ≤ τ_2), a lower frequency is needed for M_{c,1} to be finite. Since c_1 is larger at lower frequencies, the minimal value of M_{c,1} as a function of f_0 is smaller at higher τ_1. For the parameters of Fig. 15A, it is 418 for τ_1 = 5.9 ms and 2825 for τ_1 = 1 ms. The dependence of M_c on the synaptic rise and decay times at high frequency (f_0 = 102 Hz) is shown in Fig. 15B. In the left panel, M_{c,1} varies with τ_1 for τ_2 = 6 ms. At small τ_1, M_{c,1} is infinite because α_1 < −π/2. It increases with τ_1 for large τ_1 because c_1 decreases. Higher modes have M_{c,n}'s that are out of the scale of the figure. In the right panel, M_{c,n} varies with τ_2 for τ_1 = 1 ms. Again, it has a minimum with respect to τ_2. At small τ_2, n_c = 2. The minimal number of synaptic inputs M required to destabilize the asynchronous state, over all the parameters of the integrate-and-fire model with inhibition and T_r = 0, is obtained for τ_1 = τ_2 = τ_0, where M_c = 363.8; for τ_0 = 10 ms, f_0 at the minimum is 31.2 Hz.
Fig. 15. The first three critical numbers of inputs M_{c,n} for an integrate-and-fire model with inhibitory coupling and T_r = 0. The graphs of M_{c,n} are denoted by the solid, the dotted and the dashed lines for n = 1, 2, 3, respectively. (A) M_{c,n} versus f_0 for τ_2 = 6 ms and τ_1 = 5.9 ms (left) or τ_1 = 1 ms (right). Both figures demonstrate the nonmonotonicity of M_c, the minimal M_{c,n}, as a function of I. (B) M_{c,n} versus τ_1 for τ_2 = 6 ms (left) and versus τ_2 for τ_1 = 1 ms (right). In both cases, I = 3.2 μA/cm², f_0 = 102 Hz. In the left panel, M_{c,2} and M_{c,3} are too large to be plotted; in the right panel, M_{c,3} is not plotted for the same reason. The scale for τ_2 starts from 1 ms because τ_2 ≥ τ_1 by definition. Adapted from [33].
This shows that integrate-and-fire neurons are hard to synchronize with inhibitory coupling, in the sense that a large number of inputs is necessary to destabilize the asynchronous state of a sparse network of such neurons.
The effect of refractoriness: In the integrate-and-fire model with no refractoriness, M_c is on the order of several hundreds or more. Can M_c be reduced by the inclusion of refractoriness? To investigate this issue, we analyze the effect of T_r for a fixed firing rate, f_0. A finite T_r contributes a factor that is larger than 1 to c_n (Eq. (38)), and a term to α_n (Eq. (39)). This factor and term depend explicitly on ω_0 T_r. They are small at low firing rates, but they can have a major effect at high firing rates. The dependence of M_{c,n}, n = 1, 2, 3, on f_0 is presented in Fig. 16A for T_r = 2 ms, τ_2 = 6 ms, and two values of τ_1: 5.9 and 1 ms. As in the case T_r = 0, the stability is determined by the first mode at large f_0 and by higher modes as f_0 decreases. Comparing Fig. 16A to Fig. 15A, one sees that the effect of the refractory period is indeed more important at high frequencies. The dependence of M_c on T_r is demonstrated in Fig. 16B for τ_2 = 6 ms, τ_1 = 1 ms. The level of external current, I, is tuned such that the frequency is fixed: f_0 = 102 Hz. At small T_r, n_c = 1. M_c decreases considerably with T_r (from 2941 at T_r = 0 to 141 at T_r = 4.1 ms), mainly because of an increase of c_1. At T_r = 5.5 ms, α_1 = −π/2, and hence, n_c = 2 for larger T_r. In summary, if the mode n = 1 destabilizes the asynchronous state for T_r = 0, refractoriness decreases M_c considerably at high firing rates (as long as it is not too large), but only slightly at low firing rates.

6.7. Sparse networks of conductance-based neurons

We have seen that networks of integrate-and-fire neurons have M_c on the order of several hundreds and that the refractory period can substantially reduce the minimum M_c which can be achieved. Moreover, if the refractory period is sufficiently large, the second mode is the more unstable mode and the network state is a smeared 2-cluster. Changing the absolute refractory period is a minimal and crude way to manipulate the active properties of the neurons. One can therefore wonder whether manipulating the kinetics of active ionic currents in conductance-based models can have similar effects. In order to shed light on this issue, we applied the phase reduction technique to study the Wang-Buzsáki model [79] with different values of the kinetic factor, φ. The results for the critical numbers, M_c, as a function of the neuronal firing rate are shown in Fig. 17A. Except for very low firing rates, the asynchronous state is destabilized by the first mode. Consequently, for M > M_c, the system is in a smeared 1-cluster state, as demonstrated in Fig. 17B. The values of M_c in this case are on the order of a few tens, much smaller than for an integrate-and-fire model at the same firing rate. Slowing down the kinetics of the active currents (the factor φ in the Wang-Buzsáki model) changes the dynamics remarkably. With φ reduced to 1 (from its reference value φ = 5), the index of the unstable mode changes from n = 1 to n = 2; see Fig. 17C. As expected, the system exhibits a smeared 2-cluster state for M > M_c (Fig. 17D). The values of M_c are now larger, on the order of a few hundreds. When φ decreases in the WB model, the spikes become broader and the refractory period of the neuron increases.
Fig. 16. Effects of refractory period, T_r. The first three critical numbers of inputs M_{c,n} are plotted for an I&F model with inhibitory coupling and τ_2 = 6 ms. The graphs of M_{c,n} are denoted by the solid, the dotted and the dashed lines for n = 1, 2, 3, respectively. (A) M_{c,n} versus f_0 for T_r = 2 ms and τ_1 = 5.9 ms (left) or τ_1 = 1 ms (right). (B) M_{c,n} versus T_r for τ_1 = 1 ms, f_0 = 102 Hz. Adapted from [33].
Therefore, these results for the WB model go along with the corresponding behavior found above for the integrate-and-fire model. However, in contrast to the behavior of integrate-and-fire networks, M_c for conductance-based networks is not very sensitive to the value of τ_1 (result not shown).
Fig. 17. Synchronization properties of the Wang-Buzsáki model. Parameters are given in Appendices A and B except that in C and D, the kinetics of the active currents are slowed down by reducing the parameter φ from 5 to 1 [79]. (A) and (C) The critical number M_c as a function of the neuronal firing rate, calculated using the phase reduction technique. For φ = 5, the asynchronous state is destabilized by the first mode, and for φ = 1 it is destabilized by the second mode. (B) and (D) Simulations of a network of inhibitory Wang-Buzsáki neurons. Rastergrams of 200 neurons out of N = 1000 are shown in the upper panel; M = 100 in (B) and M = 400 in (D). A bar denoting the time period, T_0, is shown above each rastergram. In each lower panel, P, the number of firing neurons within a time bin of 1 ms, is displayed as a function of time. For φ = 5 (B), the system is in a smeared one-cluster state, whereas for φ = 1 (D) it is in a smeared two-cluster state.
7. Synchrony in one population of spiking neurons - beyond weak coupling
7.1. Fully connected, heterogeneous networks

In order to assess the range of validity of the conclusions reached in the weak coupling limit, one has to perform numerical simulations. In Fig. 18A and B, respectively, we show R_E and R_I as a function of the coupling strength for integrate-and-fire networks, at a fixed value of the average firing rate in the uncoupled network, i.e., for fixed external inputs. For sufficiently small coupling strength, g_syn, the robustness varies linearly with g_syn, with a slope which is correctly predicted by the phase model limit. WB networks behave in a similar way, although the asynchronous state is unstable in heterogeneous excitatory networks only in a much more restricted range of firing rates. Therefore, for both models and both types of interactions, when the external input is kept fixed and the strength of the interaction is varied, the conclusions derived using the phase reduction method remain qualitatively valid over a finite range of coupling strengths. In this range, excitation is more efficient than inhibition at synchronizing heterogeneous networks at low firing rates. The opposite is true at high firing rates. If the coupling strength is too strong, deviations from the weak coupling behavior for fixed firing rate are observed. The robustness R_E starts to decrease for larger values of g_syn. The robustness R_I also decreases with increasing g_syn (although it may eventually increase again at larger coupling strength). This is expected since, for fixed external input, increasing the excitatory interaction tends to increase the firing rate whereas the inhibitory interaction tends to decrease it. This change has an important effect on synchrony. An increase of the firing rate makes it more difficult to synchronize the system because the synaptic interaction is less temporally modulated. For this reason, strong excitation (which leads to high firing rates) is not useful to synchronize the network. One can also ask the question of the relative efficacy of strong excitation and inhibition at a given firing rate, where the firing rate is kept constant independently of the coupling strength by the injection of a suitable bias current. When the synaptic interaction is modeled according to Eq. (15), this can be achieved by making the drive I_i on neuron i depend on g_syn as
\[
I_i(f, g_{\rm syn}) = I_i^{*}(f) - g_{\rm syn}\,\bar f,
\tag{68}
\]
where I_i^*(f) is the constant drive required for neuron i to fire at frequency f and f̄ is the average frequency of neurons across the uncoupled network. In the asynchronous state, the total synaptic input on a neuron due to the rest of the network is then exactly balanced by the coupling-dependent term of the applied current, so that the frequency remains that of the single neuron. Let us first consider the case of a network of excitatory neurons for a fixed average firing rate f̄. Let us suppose that the average firing rate and the synaptic time constants are such that the asynchronous state is unstable in the homogeneous case. When the coupling strength is increased and the external input decreased according to Eq. (68), the robustness increases.
Fig. 18. Integrate-and-fire neurons. The robustness vs. the coupling strength g_syn (in nC/cm²). The average free firing rate is f̄ = 50 Hz. Filled squares: numerical simulations. Solid line: results from the phase model. (A) Excitatory interactions, τ_1 = 3 ms, τ_2 = 1 ms. (B) Inhibitory interactions, τ_1 = 7 ms, τ_2 = 1 ms. The straight line in each figure denotes the prediction of the weak-coupling theory. Adapted from [32].
However, if the coupling becomes too large, an instability occurs because the inputs become subthreshold for too many neurons. The activity of the network cannot be controlled in spite of the fact that the condition, Eq. (68), is fulfilled. This is due to the fact that the asynchronous state with average firing rate f̄ is unstable through a "firing rate" instability. This instability will be analyzed in the following section. The smaller the average rate f̄, the smaller the
excitatory coupling for which this instability occurs. Therefore, strengthening the interactions can improve the robustness R_E to some extent, but this is limited by the fact that, on the one hand, the firing rate must be sufficiently small, and, on the other hand, small firing rates cannot be controlled when the coupling strength is increased. For an inhibitory network, the situation is different since, when the coupling is increased, the external input has to be increased to compensate for the reduction in firing rate. Moreover, when the coupling is sufficiently strong, a precise tuning of the external (depolarizing) input parameters and inhibitory synaptic strength is not essential to control the firing rate. In fact, as shown in [32], if the external and synaptic currents are large enough, it is sufficient to assume that both currents are proportional in order to control the firing rate without fine tuning of the parameters. In this balanced state the firing rate of the neurons depends only on the current-frequency relation, and is independent of the detailed dynamics of the model. The constant of proportionality will be denoted by K. The robustness, R_I, of an inhibitory WB network in the balanced state is shown in Fig. 19A and B as a function of K for two values of the average frequency in the large-K limit, f̄ = 50 Hz (A) and f̄ = 200 Hz (B). In both cases a much larger robustness than at weak coupling can be achieved by sufficiently increasing K. This is because, in spite of the averaging effect of the high firing rate, the synaptic current remains strongly modulated because the coupling strength is large. The conclusion of this simulation study beyond weak coupling is that, as a general trend, inhibitory interactions are more advantageous than excitatory ones for neuronal synchrony in fully connected networks, except perhaps in some intermediate range of firing rates which depends on the details of the model.
7.2. Finite-size effects in sparse networks

In general, for finite N, even in the asynchronous state, χ(N) is not equal to 0. However, for finite but large N, according to Eq. (31), synchronous states differ from the asynchronous state in the way χ(N) scales with N, i.e., in the way the finite size of the system affects the level of synchrony of the neuronal activity. As explained above (Section 4.1), these finite size effects can be used, in numerical simulations, to characterize the regimes of parameters in which the network states are asynchronous or synchronous. If, in the thermodynamic limit, a continuous transition occurs between asynchronous and synchronous regimes when a parameter is changed (e.g., M), this transition is replaced for finite N by a crossover between the two different scalings of χ with N. These finite size effects occur no matter what the connectivity. In networks with finite M and N, sparseness is another source of finite-size effects. The reason is that the desynchronizing effect due to the sparseness depends not only on M, but also on the relative values of M and N. For instance, in Eq. (60), the synchrony level in the system is determined by the competition between the term that includes Γ̃, which may be synchronizing, and the distribution of m_i/M, which tends to desynchronize the system. If M is sufficiently large, this binomial distribution of m_i/M is approximated by a Gaussian with a variance 1/M (Eq. (62)).
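In practice, the distinction between the two scalings is made by measuring χ(N) for several network sizes and extrapolating with Eq. (31). A minimal sketch (Python) is given below; the sample values in the usage line are hypothetical numbers used only to illustrate the fit.

```python
import numpy as np

def extrapolate_chi(Ns, chis):
    """Fit chi(N) = chi_inf + a/sqrt(N) (Eq. (31)) and return (chi_inf, a).

    A chi_inf compatible with 0 indicates an asynchronous state; a clearly
    positive chi_inf indicates a synchronous state.
    """
    x = 1.0 / np.sqrt(np.asarray(Ns, dtype=float))
    a, chi_inf = np.polyfit(x, np.asarray(chis, dtype=float), 1)
    return chi_inf, a

# Hypothetical measurements, e.g. obtained with the synchrony measure of
# Section 4.2.4; here chi roughly follows 1/sqrt(N), so chi_inf ~ 0.
print(extrapolate_chi([200, 400, 800, 1600], [0.071, 0.050, 0.036, 0.025]))
```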
Fig. 19. Inhibitory network of WB neurons. Robustness, R_I, vs. the scaling factor K (see text); τ_1 = 7 ms, τ_2 = 1 ms, f̄ = 50 Hz (A) and 200 Hz (B). Same conventions as in Fig. 18. Adapted from [32].
(say, N > 30), and N - M is also large enough, the binomial distribution of m i / M is well a p p r o x i m a t e d by a G a u s s i a n with a variance
Var
- M'
' N"
/69)
This suggests t h a t the level of synchrony should depend on M and N t h r o u g h the effective variable, Meff, defined as
Mechanisms of synchrony of neural activity in large networks
1
1
1
Mefr
M
N
935
(70)
R e l y i n g o n this h e u r i s t i c a r g u m e n t , we expect t h a t Mofr is a m o r e a p p r o p r i a t e par a m e t e r t h a n M to r e p r e s e n t s i m u l a t i o n results. N u m e r i c a l s i m u l a t i o n s c o n f i r m this c o n c l u s i o n as s h o w n in Fig. 20.
1.0
A
II
0.8 0.6
I
,"'
i
.'"'"
,,,,
9
" x
0.2
I #x
N=800 N=1600 N=3200
X
0.0 0.0
N=200
x N=400
0.4
1.0-
,,"
I
1
'
1000.0
B
I
'
2000.0 M
I
'
3000.0
"1
4000.0
0.8 0.6 0.4 9
0.2
,~i-~'
~_x x ~" ~ ' ~ "
0.0 0.0
'
~'
o~/
|
I
5000.0
|
u
!
u
I
10000.0
Meff Fig. 20. Simulations of the I&F model with Tr = 0, zl = 1 ms, Z2 = 3 ms, I = 4 ~tA/cm 2 (f0 = 144.3 Hz), 9syn = - 0 . 4 ~tA/cm 2. Simulations are carried out with N = 200, averaged over 100 realizations of the network connectivity and random initial conditions (dots), N = 400, 50 realizations (crosses), N = 800, 25 realizations (dashed line), N = 1600, 20 realizations (dotted line) and N = 3200, 10 realizations (solid line). The integration time step At - 0.25 ms; The runs were performed over 4 x 104 time steps. The synchrony measure was computed after discarding a transient of 2 • 104 time steps. (A) The synchrony measure Z versus M. (B) The synchrony measure Z versus Mort (Eq. (70)). The lines for N - 800, 1600, 3200 almost coincide with each other, and the deviations of the line with N = 400 and N = 200 is small. The analytical estimate for Mc is denoted by the arrow. Note that for Meff ~ 2500, Z decreases with N " Z (x 1/v/N. Adapted from [33].
936
D. Golomb et al.
7.3. Sparse networks beyond weak coupling Here we are interested in how Mc depends on the coupling strength in the integrateand-fire model beyond the weak coupling regime. In Fig. 21, we plot results from simulations showing the synchrony measure g vs. Meff for a fixed value of the external input, I - 4 gA/cm 2, and different coupling strength, 9syn- F o r sufficiently low values of 9syn, z(M) depends weakly on 9syn, and Mc remains close to the weak coupling estimate, although it starts to increase. For larger -gsyn (e.g., 1.0 laA/cm2), the limit of stability of the asynchronous state becomes substantially different from its weak-coupling limit. Moreover, the system displays multistability: 1-cluster state (upper branch) coexists with asynchronous or 2-cluster state (lower branch). Therefore, increasing gsyn opposes synchrony instead of helping synchrony, as one would expect at first sight. This is also clear from Fig. 22A (solid line). Here we have
1.0 / /
oO
-gsyn
(l.tNcm2)
~
0.2
o
0.5
- -9- " --'"
0.4 0.6 0.8
---
J
0.0
0
~'
.
/
/
/
, 1000
1.0 1.2
J
/
" .........
J
,
, 2000
Me. Fig. 21. Beyond weak coupling: simulations of the inhibitory I&F network. The synchrony measure Z is plotted vs. Merf for different synaptic strength. Parameters are: Tr = 2 ms, Zl - 1 ms, 1:2 - 3 ms, I = 4 gA/cm 2, (f -- 112 Hz) N - 800. Simulations with 4 • 104 time steps with At -- 0.25 ms. Time averages were computed after discarding a transient of 2 x 104 time steps. The results were averaged over 100 realizations. The arrow denotes the analytical estimate for Mc in the weak coupling limit. In the asynchronous regime, the average firing rate in the network changes from f ~ 105 Hz for g s y n - - -0.2 gA/cm 2 to f ~ 80 Hz for gsyn -1.2 gA/cm 2. Adapted from [33]. =
Mechanisms of synchrony of neural activity in large networks
937
plotted Z versus gsyn for a given value of M and I. Not only does one find that the synchrony of the system decreases when the coupling increases, but one also sees that there exists a critical coupling, go(M), above which the network settles into an asynchronous state. In these simulations, I was kept fixed. Therefore, the firing rate decreased when the inhibitory coupling was increased. In order to compare networks with comparable average firing rates but different coupling strength, we have performed another set of simulations in which I and gsyn increase in such a way that the average firing rate, f , in the asynchronous state was kept fixed. The results are shown on Fig. 22A (broken dashed line). Here also, Z is a decreasing function of-gsyn, although it varies more slowly than when the firing rate is not fixed. For a sufficiently strong coupling, Igsyn[ > gc(M), the network settles into a state in which neurons fire in an asynchronous and (almost) periodic fashion. By further increasing gsyn, one can keep the population average firing rate constant by increasing I. However, the neuronal activity becomes more and more irregular. This is shown in Fig. 23 where the population average of the coefficient of variation (CV) of the interspike distribution of the neurons is plotted as a function of the synaptic strength. Interestingly, it varies monotonously. Note that it varies very smoothly and that there is no apparent transition between a regime in which the neurons are firing periodically and another one in which they are firing irregularly. When the coupling is sufficiently strong, the neurons are firing with a very high CV. Moreover, the distribution across a network of the time average firing rates of the neurons becomes broad. An example of the traces of the neurons in the strong coupling regime is plotted in Fig. 24. The firing pattern of the neurons is irregular and the four neurons represented have very different firing rates. The firing pattern in the strong coupling regime is investigated further in Fig. 25 for f = 100 Hz, and gsyn - - 5 0 laA/cm 2. In this figure, we present results from the simulations of networks of different sizes (N - 400, N = 800 and N = 1600), but the same Meff (Meff = 800). Fig. 25A shows the distribution of the firing rates of the neurons. The coefficient of variation of the interspike distribution (CV) of the neurons versus their firing rates is displayed in Fig. 25B. Remarkably, the curves for the three sizes superimpose. In Fig. 25B, the three curves are indistinguishable. This means that for M and N sufficiently large, in the strong coupling regime, the dynamic state of the network depends, on M and N, through Meff. Note that the variability in the firing pattern is stronger for the neurons with the lower rate. For fixed 9syn, the lower the population average firing rate or the lower Meff, the larger the variability of the neuronal activity (result not shown). This is in agreement with results obtained by previous authors [87,88,123].
8. Stability of the asynchronous state of integrate-and-fire networks: theory at all coupling strength
8.1. The method For the integrate-and-fire networks with all-to-all connectivity and arbitrary number of populations, it is possible to analyze the stability of the asynchronous state at
938
D. Golomb et al.
A
,.
1.0-
....
fixed
I
fixed
f
0.8 0.6 0.4 0.2 0.0 0.0
'
210
-gsyn
4.'0
'
(i Ncm2)
6'.0
95=0.15 ~Ncm 2
5=0.3 gNcm
2
0.4 0.2 0.0 0.0
,
,
2.' 0
'
4.0 '
610
-gsyn (gNcm2) Fig. 22. The synchrony measure X is plotted vs. the coupling strength gsyn for M - 400, N = 800 (Merr = 800). Parameters are: Tr = 2 ms, zl = 1 ms, z2 -- 3 ms, N = 800. Simulations were carried out with 4 x 104 time steps and At = 0.25 ms. (A) N o heterogeneity. Solid line: The external current is fixed at I = 3.632 laA/cm 2. The firing rate decreases with increasing -gsyn from 100 Hz for gsyn = 0 to 56 Hz for gsyn - - 2 pA/cm 2. Dashed line: The external current increases when -gsyn increases such that the average firing rate in the asynchronous state remains constant (f = 100 Hz). (B) Heterogeneity. the external current varies from neuron to neuron and is homogeneously distributed between [ - 6 and [ + 6. The average, i,
Mechanisms of synchrony of neural activity in large networks
939
0.80.6
>
o 0.4 0.2
0.0
,
0
,
.....
,
[
t
4O
'-
~
I
8O
-gsyn (1aNcm2) Fig. 23. The coefficient of variation of the interspike distribution as a function of the synaptic strength. Parameters are: Tr = 2 ms, ~l - 1 ms, "C2 - - 3 ms, f - 100 Hz. The level of external current is varied together with the synaptic strength such that the population-averaged firing rate remains constant: f = 20 Hz. Averaged are over 160,000 time steps (At = 0.125 ms). any coupling strength. This was shown by A b b o t t and van Vreeswijk [81] for a network consisting of one population of fully connected identical neurons. In the following, we present a generalization of this approach for the case of heterogeneous fully connected networks comprising both excitatory and inhibitory neurons. For simplicity, we first consider a one-population network consisting of N interacting neurons. For convenience we use the formulation in terms of reduced dimensionless parameters, see Eq. (4). To simplify the notation we drop the bar in the notations; time is measured in units of t0 = 10 ms. The firing threshold of the neurons is t? = 1. We assume that the external input is an external current, constant in time, but different from one neuron to the other. We denote by Ii, (i = 1 , . . . ,N) the external current received by neuron i. We consider the case where N is very large and where the external current is drawn at r a n d o m from a distribution P(I). The synaptic interactions will be given by Eq. (15). They will be characterized by a
is determined such that the population average of the firing rate, f , in the asynchronous state is kept constant, f = 100 Hz. Solid line: the width of the distribution is 8 = 0.15 laA/cm2; The network activity is synchronized for 1.0gA/cm2 <-gsyn < 4.4gA/cm2. Dashed line: the width of the distribution is 8 - 0.3 gA/cm 2. The network activity is always asynchronous. Simulations with larger N show that the residual nonzero value of ~ are finite-size effects (result not shown). Simulations indicate that in the fully connected network with the same parameters, the asynchronous state becomes unstable at -gsyn ~ 1.21aA/cm2 (result not shown). Adapted from [33].
D. Golomb et al.
940
Z2_ J
y
120 mV
100 ms
Fig. 24. Voltage time courses of 4 neurons in the strong coupling regime. Parameters are: M = 240, and N = 400 Tr = 2 ms, xl = 1 ms, x2 = 3 ms, f = 20 Hz, Ysyn - - --50 gA/cm 2. The network size is N = 400; M = 240. Parameters are: ~l = 1 ms, ~2 • 3 ms, Tr = 2 ms. dimensionless parameters, g s y n which measure their strength in units of the current threshold. The instantaneous firing rate of neuron i at time t is denoted as f.(t). In the asynchronous state the firing rate foi is given by 1
~0"1
dx -= -log(1 - x + 1i + 9synf
- ~.)
(71)
where f is the firing rate averaged over the all population
1Zfo J
j
(72)
and
Hi - li + gsynf
(73)
is the total input received by neuron i. N o t e that the contribution of one synapse to this input is on average gsynf/N. It is clear that Eq. (71) applies only for neurons for which the total input H / i s suprathreshold. N e u r o n s for which this is not the case are not firing. The distribution, Q(T), of the firing periods of the neurons in the asynchronous state can be related to the distribution of external current. One finds
941
Mechanisms of synchrony of neural activity in large networks
.......... N=400
A X (b
o
] I
N=800 N=1600 .,
,~. ,,I
0.005
-:;
O (b ...Q
E
Z
0.000
|
,
,,
!
,~L_
B
1.00.8 >
O
0.60.4 0.2 0.0
0.0
100.0 200.0 Firing Rate (s-~)
Fig. 25. The distribution histogram of the firing rates (A) and the coefficient of variation of the interspike distribution (B) for Moff = 800, and N = 400,800, 1600. Parameters are: Tr = 2 ms, ~1 = 1 ms, ~2 = 3 ms, f = 100 Hz, 9syn = - 5 0 pA/cm 2. Simulations are done with 3.2 x 10 6 time steps and At = 0.125 ms. Results for N = 400,800, 1600 are averaged over 40, 20, 10 realizations respectively and a bin width of 2 Hz. In both figures the silent neurons (neurons with firing rate < 2 Hz which are about 6.2% of the population) are not represented. In (B), the three curves superimpose almost perfectly and are hardly distinguishable. Adapted from [33].
Q(T)-P
1 1 -exp(-T)--gsy
nf )
exp(-T) (1 - e x p ( - T ) ) 2
O(T) + /__~gsyn/p(i)dI6(1/ T), (74)
where | is the H e a v i s i d e function. In the following we a s s u m e t h a t the d i s t r i b u t i o n P is such t h a t all the n e u r o n s are firing, i.e., the integral in the r i g h t - h a n d side o f Eq. (74) is zero. M o r e o v e r , for r e a s o n s w h i c h will b e c o m e clear later we s u p p o s e t h a t for T ~ ec, Q ( T ) ~ 0 faster
942
D. Golomb et al.
than exponential. This imposes some constraints on the distribution P(I). In the discussion, we will evoke briefly what happens if this constraint is relaxed. Since we have assumed that all the neurons are active in the AS state, one can define a variable yi as y; -
fo xi
f oi dx _, --Xi + Ii + gsynf
(7s)
which evolves in time according to
dyi = foi + K(yi) ~1 Z ~j(t), dt J
(76)
where
ej(t) - f j.(t) - foj
(77)
and K ( y i ) ~.
gsyn fog _. --Xi -'[-Ii -+- gsynf
(78)
The probability, p(y,f, t), that a neuron with firing rate f is at point y at time t, P(Y,f, t) satisfies the continuity equation c~p(y,f , t)
OJ(y, f , t)
~t
Oy
(79)
'
where
J ( y , f , t) --
(
1
)
f + gsynK(y)~ Z
e.j(t) p ( y , f , t).
(80)
J
In the asynchronous state J ( y , f , t ) f . Perturbation around this state leads to a slightly modified J ( y , f , t) and the quantity j ( y , f , t) - J ( y , f , t) - f satisfies
8j(y, f , t) 1 Ot = K(y) -~ Z
j
~j(t) - f
8j(y, f , t) . Oy
(81)
For synaptic interactions there are a difference of two exponentials (Eq. (13)) with the time constants ~l - 1/Y1 and ~2 - l/Y2, Eq. (81)can be represented by the equations d~j(t) = -Y1 ej(t) + hi(t), dt
dhj(t) d - - - ~ = -Y2hj(t) + Yl Y2 Z J ( 1 , f , .t~
(82) t).
(83)
Assuming for all the quantities j ( y , f , t), e9(t), hi(t) a temporal dependence of the form exp()~t), and integrating Eq. (81) we obtain the equation for the eigenvalues
943
Mechanisms of synchrony of neural activity in large networks
()k, --]- ]tl)()L if- 72) -- ~'1 ]_f______~_22 f l dyiK(yi) exp()~yi/foi) N i=1 exp()~/foi)- 1 "
(84)
Replacing xi by yi in the definition of K(yi) using Eq. (76) we finally obtain
(85)
()~ + y1)()~ + y2)( 1 + )~) _ gsyn)~'yl'Y2~ f o i exp((1 + ~,)/foi) - 1 N i=1 Hi exp(L/foi)- 1
Since N is large, one can replace the discrete sum of Eq. (85) by an integral. One finds ()~ + y1)()~ + y2)(1 + L)
=
(86)
gsyn~71~/2(A1 q-A2),
where
A1 --
A2 --
dT Q(T) (1 - e x p ( - T ) ) ,
(87)
C(T)
(88)
T
~0 ~
fo
dT 1 - exp(-)~T)
with G ( T ) - 2 Q ( T ) ( s i n h ( T ) - 1)
(89)
T
Since we have assumed that the distribution Q(T) decays faster than exponentially for T ---+ oc, the integrals in Eq. (88) are convergent at their upper bounds. The asynchronous state is stable if all the eigenvalues, solutions of the spectral equation (86) have a negative real part. If the instability occurs continuously when some parameter is changed, this will happen when one of the eigenvalues crosses the imaginary axis. At this point, ;~ = iv and singularities appear in the calculation of A2. These singularities can be isolated and the result for the real and imaginary part of this integral are given by
,
~(A2) - v
~(A2) =
+
1
en=l
dy
2 v .=l
1 2v
G(2 Ten~v)
G
) + -~1/0 dTG(T),
+ 2 mr v
dy G
sin(y) 1 - cos(y)"
- G - y + 2 nrc v
(90)
sin(y) 1 -
cos(y)
'
(91)
As G(y/v) goes as Q(O)y/v in the limit of small y, these integrals are nonsingular and they can be numerically evaluated. Finally, the equations for the transition are obtained taking the real and imaginary part of Eq. (86)
944
D. Golomb et al.
V2( - 1 -- ~1 -- ~2) -l- ]tly 2 -- -gsynVY172~(A2),
(92)
--V2 + ]tl -+- 72 nt- 713t2 -- gsyn]t172(Al -t- ~ ( A 2 ) ) .
(93)
The simultaneous solutions of these equations give the values of v at the synchronization transition and the critical coupling for a given value of disorder. This critical coupling can be positive (excitatory) or negative (inhibitory) according to the value of v. The sign of the coupling will be the same as the sign of --V2 + ~1 -~- ~2 '~ ~l]t2 because A~ + N(Az) is always positive. At small disorder the solutions appear in pairs, each one near vn = 2 ~nf. These solutions correspond to an instability of the AS toward n-cluster states. As the disorder increases, the values of v inside each pair tend to become closer until eventually they coalesce and the solution disappears. In Fig. 26 we show results derived from Eq. (86) for a Gaussian distribution of periods it-r0 )2
Q(T) - z exp
2~2 |
(94)
where z is a normalization factor such that 1 - foX dT Q(T). It represents the stability boundaries of the asynchronous state in the plane o - - gsyn for two values of the average firing rate. Different branches correspond to different solutions with a different number of clusters (1, 2, 3 and 4 clusters). Beyond some value of o-, o-max the asynchronous state is always stable, no matter the coupling. This is the value corresponding to the maximum over all the cluster states of the point where the solutions coalesce. The index of the cluster state that controls this critical value depends on the parameters of the system. For instance, for a mean firing rate of 60 Hz and synaptic time constants equal to 5 ms, the relevant state is a 3-cluster state, see Fig. 26A, while for 60 Hz and 20 ms the relevant state is a 1-cluster state; see Fig. 26B. In general, as the firing rate is lower and the synaptic time constants faster, the higher-order cluster states become the relevant ones. Interestingly, as the synapses become slower, O'max becomes larger. Therefore, slow inhibition is more powerful to synchronize than fast inhibition.
8.2. The phase diagram of two population integrate-and-fire networks A technique similar to the one introduced in the previous version can be applied to networks of two populations, one excitatory and one inhibitory with ArE and N~ neurons, respectively. Neurons from the two populations have the same intrinsic properties. We will not show the details of the calculations but only the final result for the spectral equations. Denoting the firing rate and the total input current of neuron i in population a in the asynchronous state by f0~ and H~, respectively, one finds
=
Y1EY2EYlIY21 )~2
( 1 + )~)2
UEI U,tE,
(95)
Mechanisms of synchrony of neural activity in large networks
945
A. =~'2=5ms
E >,
- - - - - nc=l ............ nc=2 . . . . no=3 - - - - no=4
-5
-10 .
0.1
0.0
B.
.
.
.
I
0.2
-
-
,
------3
0.3
: = 2=20ms
-50 t'q'l
-100 -150 ! 0.0
. . . . .
|
0.1
0.2
0.3
Fig. 26. Phase diagram of a heterogeneous population of inhibitory I&F neurons. The average firing rate is fixed to be 60 Hz. The period of the neurons is distributed according to Eq. (94). In the shaded region the AS is unstable. Outside this region, the AS is stable. At strong coupling and weak disorder the stable AS coexists with a stable strongly synchronizes oscillatory state (not shown). (A) ~1 = "c2 = 5 ms, (B) ~ = ~z = 20 ms.
where Yi~ = 1/xi~,i S~13
gsyn = ~
1,2, cx - E , I and
f ~ exp((1 + )~)/foi) -- 1 i=l H7 exp(X/f~)- 1 "
(96)
As in the previous section, the a s y n c h r o n o u s state will be stable if all the eigenvalues have a negative real part, and synchronization transitions occur where the real
D. Golomb et al.
946
part of one of the eigenvalues vanishes continuously. This condition, ~(~) = 0, is equivalent to a pair of equations which determines the location of the instabilities of the asynchronous state as a function of the firing rates distribution of the neurons and of the synaptic parameters. In general, these equations must be solved numerically. There are two limits for which one can proceed further analytically. One is the limit of slow synapses which will be addressed below. Another analytically solvable case is the limit of weak coupling in which g s~y n , 0~ - - E, I, ]3 - E, I are of the same order and small. The limit of slow synapses: The spectral equation, (95), can be further simplified if one assumes that the imaginary part of the eigenvalues is going to zero in the limit of infinitely slow synapses. For simplicity we assume here that 3'lE = '{2E = 71t = 72~ = '{- Assuming that in the small 0~ limit, ;L scales like - ~,7,
(97)
where ;L is finite, one finds that ~, is a solution of the equation (~ + 1)2
- - g st;E yn
E1 IE (j EU) 0"E) ((~ + 1) 2 - g"syn O1) = gsyngsyn
(98)
where
-
U~-
~~
Q~(T)
(99)
dTT2I(T)[I(T)- 1]'
where Q~(T) is the distribution of intrinsic periods of the population 0r and I(T) = 1/(1 - e x p ( - T ) ) . It is easy to check that this equation can also be derived from the stability analysis of the fixed point solutions of dh~ = -yh~ + i~, dt
(lO0)
di~
dt = -7i~ + r~(hE, h,),
(101)
where a t - E,I, and the functions r~(hE, hl) are defined by
r~(hE, h,) --
/o
(
dlP~(I)F I + Z gsynh ~13 ~=E.I
)
(102)
with F(x) = - 1 / l n ( 1 - 1/x) and P~(I) is the distribution of injected current on the population Qr Eqs. (100) and (102) are nothing more than the dynamical equations of the heterogeneous rate model corresponding to the two-population integrate-and-fire network. Note that the "rate"-like dynamical variables are actually the excitatory
Mechanisms of synchrony of neural activity in large networks
947
and inhibitory synaptic conductances and that the time constants of the "rate" dynamics are the synaptic time constants (see also [124-126]). Therefore, in the limit of slow synapses, the instabilities of the asynchronous state of the 2-population integrate-and-fire model are the same as the instability of the fixed point of the corresponding rate model. This is not valid if the synapses are not slow. The general structure of the phase diagram. The imaginary part of the unstable eigenvalue at the transition, v, gives us information about the type of dynamical state in which the system settles in the synchronized phase. A complete study of the spectral equation shows that, in general, three types of transitions can occur: 9 Transitions to synchrony of spikes: These transitions exist only if the synapses are not too slow. If the firing rates of the two populations are the same and the coupling is weak, v is proportional to the firing rate of the neurons in the asynchronous phase. As the disorder increases v changes continuously, but for a given level of disorder v depends mainly on the firing rate and only very weakly on the synaptic time constants. The unstable mode performs oscillations with a frequency which is on the order of the firing frequency or one of its multiples. The neurons fire spikes. If v is integer multiple of the firing rate, there is clustering. If the populations have different firing rates the behavior is more complex, depending on both rates, but still it will be proportional to the firing rates of one of the populations at weak coupling. 9 Transitions to synchrony of bursts: These transitions remain when the synapses are very slow. In this case, v is proportional to the inverse of the synaptic time constant. The unstable mode performs oscillations with a period which is controlled by the synaptic time constant. In the limit where the synapses are infinitely slow, these transitions can be found by analyzing the stability of the fixed point solution of a corresponding rate model. If the synapses are sufficiently slow, these instabilities make the neurons fire bursts of spikes. Since the neurons in isolation can fire only spikes, these bursts are a network effect. 9 Instability of the network level of activity v = 0: In this case the perturbation, j(y, E, t) grows exponentially without oscillating in time. It corresponds to a change of the firing rate of the system. If there was another asynchronous state with a different firing rate the system could settle in that state. If the asynchronous state is unique, the system settles in a quiescent state in which there is no network activity. These instabilities remain if the synapses are very slow and they have their counterpart in a corresponding rate model. Phase diagram of the full model, an example. Several options can be taken to discuss the properties of such a multiparameter problem. Here we establish the phase diagram of the system fixing the firing rate of the two populations and taking as parameters the strength of the four interactions. Therefore, when the interactions are changing, the external inputs have to be modify accordingly. We further simplify the discussion by assuming that ZlE = I l I , T2E = ~21 and that the firing rates of the two populations, averaged over each population, are the same in the asynchronous phase. In this case, remarkably, one can show that only two relevant coupling parameters remain, namely,
948
D. Golomb et al.
~-.-_
EE
and
I1
gsyn + gsyn
~_
EE
II
E1
IE
gsyngsyn -- gsyngsyn"
(103)
These p a r a m e t e r s are the trace and the d e t e r m i n a n t of the matrix gsyn,a[3respectively. In Fig. 27, as an example, we show the phase d i a g r a m when the firing rate of each p o p u l a t i o n in the a s y n c h r o n o u s phase is 50 Hz, the synaptic time constants are all 1
3.0
~
I
~
i
,
, 1.0
AS Unstable
AS Unstable
~.
1.0 s
Burst nsition
.,J
am, ~
-1.0
t
1-Cluster AS Unstable
AS Unstable -3.0
F -2.0
.
, -1.0
,
, 0.0
-D Fig. 27. Phase diagram of a system of integrate-and-fire neurons with two fully connected populations. The mean firing rate of both populations is 50 Hz. The distribution of intrinsic periods is a Gaussian with a dispersion of 10% of the mean value. The rise and decay time of the interactions are equal to 10 ms. The shaded area represents the region where the asynchronous state is stable. The long-dashed line correspond to the instability with Im(X) beyond which a firing rate of 50 Hz cannot be controlled. On the solid line and the short-dashed lines spike-to-spike synchrony emerges in the form of a smeared 1-cluster state. The dotted line corresponds to the burst transition. The dashed-dotted line is the burst transition line in the limit of infinitely slow synaptic kinetics. The narrow-long-dashed line corresponds to the trajectory in the phase diagram when gs//n[ increased for fixed gsynEE=1.2 and 1 - l _ l E:
IE gsyn gsyn I "-- 1.6.
949
Mechanisms of synchrony of neural activity in large networks
equal to 10 ms, and there is a Gaussian distribution of intrinsic periods with a width of 10% for both populations. One of the remarkable features of the phase diagram of the two population system is that the dynamical behavior depends in a complex way on the coupling parameters. However, as shown in Fig. 27, three generic mechanisms exist for the emergence of synchronized activity. 9 I n h i b i t o r y m e c h a n i s m . For sufficiently strong gsvnlI and not too large level of heterogeneities, the asynchronous state can become unstable through the synchronization of the network activity on the time scale of the spikes. This requires that gsynlIis larger than some critical value gsyn,c which is an increasing function of heterogeneity level. If the level of heterogeneities is too large, this transition disappears. This scenario for the emergence of synchrony generalizes to the two-population model the mechanism studied in Section 6.4. 9 C r o s s - t a l k m e c h a n i s m . The asynchronous state loses stability through this E/ gsyn] IE is increased. It mechanism, for instance, when gsynEE_ gsynlI= 0, and F I - [gsy corresponds to the occurrence of synchronous activity on the time scale of the spikes. This synchronized state can be destroyed if " g~yn ~1 is9 too large. However, it is more robust to heterogeneities than the previous mechanism. 9 R e c u r r e n t e x c i t a t i o n . This scenario requires a substantially strong cross-talk. An important difference with the two other mechanisms is that it leads to a state in which the neurons can fire bursts of spikes. This mechanism requires a sufficiently strong excitatory feedback within the excitatory population. Moreover, inhibitory ii feedback within the inhibitory population plays against this state. Indeed, as gsyn increases, the asynchronous state becomes stable again. Note that if is sufficiently large (and the heterogeneity level is not too big), the system resynchronizes through the inhibitory mechanism. When gsynII is changed, the trajectory of the network in the phase diagram is a segment of a straight line; see Fig. 27. The slope of the trajectory, as well as its starting point, depends on gseyn. The bigger" gsyn,EEthe smaller the slope and the higher in the diagram is the starting point. The position of the line is controlled by II. When H increases, the trajectory moves to the left. Therefore, the precise sequence of states in which the network settles when inhibition increases depend on H and gseven. This mechanism is more robust to heterogeneities than the two previous mechanism. For comparison, we have plotted the transition line corresponding to the instability which occurs in the rate model. One sees that the true transition to bursting is located beyond this line. We have found that this is the generic situation. Therefore, in the rate model the amount of excitatory feedback required to destabilize the asynchronous state through the recurrent excitation mechanism is underestimated. A transition exists between the bursting synchronized state and the cross-talk induced synchronous state. It cannot be determined analytically. Numerical simulations show that it is a very sharp transition which prolonged the burst transition line in the region where the AS is unstable (not shown). Note also that the "rate instability" extends into the region where the AS is unstable. We did not plot this
gsynlI
950
D. Golomb et al.
extension since it does not correspond to an instability of the asynchronous state but of the synchronized bursting state. 9. Synchronization of bursts in thalamic oscillations 9.1. Thalamic sp&dle oscillation in vivo and in vitro
In the previous Sections, we have studied how synchronous states emerge in large networks in which the neuronal activity is induced by an external input. In this section we consider a different situation in which oscillatory synchronous patterns of activity arise from network mechanisms, although the cells (or at least most of them) are quiescent, when isolated, in the absence of external input. This situation corresponds, for instance, to the spindle oscillations at 7-14 Hz, a highly coherent rhythm which occurs spontaneously at the onset of sleep or drowsiness throughout the entire thalamocortical system [128-131,60]. It has been known since the 1940s that the thalamus is at the origin of this rhythmic phenomenon [132]. It has been proposed that this rhythm involves principally two types of thalamic neurons (see Fig. 28): the thalamocortical (TC) relay cells that excite the reticular thalamic (RE) cells, which in turn send back synaptic inhibition to TC cells and elicit postinhibitory rebound (PIR) bursts of action potentials in these cells [133,131,60]. The rebound response to inhibitory postsynaptic potentials (IPSPs) is produced by a low-threshold T-type Ca 2+ current ICa-T [134-137], together with a hyperpolarization-activated cation "sag" current Ih [138,139]. The discovery of spindle-like oscillations in ferret thalamic slice preparations, which contain the dorsal lateral geniculate nucleus (LGN) in reciprocal connections with the perigeniculate nucleus (PGN) [140-142], strongly enhanced the possibilities to explore the mechanisms that generate this rhythm. These ferret slice experiments can be well controlled and they provide systematic and detailed observations regarding the neuronal and synaptic mechanisms underlying spindle generation. These experiments reveal that single RE and TC cells are generally quiescent, and the oscillations, which can be evoked, or even start spontaneously at one point along the slice, are a collective phenomenon of the network. Under normal conditions, each RE cell bursts almost every cycle, whereas each TC cell bursts almost every 2-3
GABAA
GABAA
RE_
GABAB J
and
AMPA
1
TC
Fig. 28. Synaptic architecture of the thalamic network. RE cells receive TC excitation via AMPA synapses and intra-RE inhibition via GABAA synapses. TC cells receive both fast GABAA and slow GABA~ synapses from RE cells. Adapted from [24].
Mechanisms of synchrony of neural activity in large networks
951
cycles (2:1 or 3:1 bursting mode). When GABAA inhibition is blocked, "bicucullineinduced slow oscillations" can be generated. In these slower oscillations, at about 4 Hz, each RE or TC cell bursts almost every cycle (1:1 mode). Excitation is crucial for the rhythm, as blocking AMPA excitation terminates the activity. During each episode, the activity is generated at one region and propagates along the slice. Thalamic spindle oscillations are different than other types of synchronized oscillations discussed above. Indeed, here the oscillations emerge as a collective, network phenomenon as an interplay between the intrinsic properties of the cell, in particular, the postinhibitory rebound mechanism of the TC cell, and the synaptic and network properties. This activity does not require an external input. Interestingly, the fact that TC cells burst every second or third cycle, with the possibility of skipping bursts, is an experimental manifestation of a "smeared" cluster state. The T C - R E system has been modeled recently by theoreticians in order to shed light on the mechanisms of emergence of spindle oscillations. In the following, we sketch the modeling approach of Golomb et al. [22-24] and describe briefly the main results they have obtained. 9.2. The model
During spindle oscillations, neuronal activity is synchronized on the slow time scale of the bursts of activity, whereas on the much faster time scale, corresponding to the firing of the single sodium spikes, the system remains asynchronous. Therefore, to study the emergence of this rhythm it is possible to model the single neurons dynamics taking into account only the ionic currents that are important for the neuronal bursting, such as the T-type low-threshold calcium current and the h-current ("sag") in TC cells. Due to the fact that on the fast time scale of the sodium spikes the network state is asynchronous, one can represent the synaptic interaction in an effective way by averaging over the fast time scale of the spikes. These simplifications improve the stability of the numerical schemes used when integrating the differential equations of the network dynamics. This allows one to choose reasonably large time steps, and hence to reduce substantially the typical CPU time of the simulations, and to separate conceptually the issue of the generation and synchronization of bursts from the issue of spike generation. The current-balance equations for the RE cells in the model are dV C dt
:
ABA-A --ICa-T -- IAHP -- IL -- I ~RR
-- IAMPA,
(104)
where ICa-T is the low-threshold calcium current, IAHp is a calcium-activated potassium current, IL is the leak current, IGABA-A RR is the GABAA current generated in response to burst of other RE cells, and IAMPAis the AMPA current generated by TC cells. Similarly, the TC cells are modeled as dV dt
C~
= - - / C a - T -- Ih -- IL -- IGABA-A - - / G A B A - B ,
(105)
952
D. Golomb et al.
where Ih is the slow "sag" current and the other intrinsic ionic currents are of the same types as in the RE cell, but with different gating kinetics, conductance strengths and reversal potentials. The GABAA and GABAB currents received by the TC cells from the RE cells are denoted by IGABA-A and IGABA-B, respectively. The equations describing the ionic channels and the synaptic conductances are given in Appendices A and B. 9.3. Spatio-temporal patterns o f synchrony in the R E - T C network model
In order to shed light on the dynamical properties of the RE-TC system it is useful to consider three types of architecture. We consider first the simplest pattern of connectivity in which all the neurons are connected to all the others [22]. With this architecture, one can find a wide domain of network parameters in which without synaptic blockade, each RE cell bursts every cycle, whereas each TC cell bursts every second cycle (2:1 cluster state). An example of this behavior is shown in Fig. 29. Blockade of GABAA while keeping all other parameters unchanged, makes both RE and TC cells burst every cycle, and therefore, destroys this clustering. The sag current of the TC cell provides the ionic mechanism for this clustering phenomena [143]. Due to its slow time scale at hyperpolarized potential, it needs enough time to build up and to enable the cell to burst. Hence, the bursting rate of the TC cell cannot be too high, and if the network frequency is too large, it bursts only every few cycles. In large regimes of parameter space, attractors with different TC bursting ratio (the ratio between the population frequency and the average bursting rate of TC cells) coexist together, and initial conditions select the pattern. Reasonable levels of stochastic noise or heterogeneity in the cell's intrinsic parameters do not cause cells to skip bursts. However, skipping of bursts occurs if sparseness is introduced in the connectivity pattern of the network [23]. Sparseness of the network connectivity is quantified by three parameters, MRR, MRr and MrR, where the first subscript denotes the postsynaptic population and the second subscript denotes the GABA A and GABA 8 RE ~ TC ~ TO ~
Fraction of cells in cluster
GABA A blocked
Fraction of cells in cluster
1.o ~
0.66 ........~--~-=~------L 0.34
> E
ol
,
1.0s
-60mV ...................................................... Fig. 29. Membrane potential time courses of RE and TC neurons in an all-to-all coupled network of identical neurons. With both GABAA and GABAB synapses, RE cells burst together every cycle at 7.7 Hz, while TC cells segregate into two clusters bursting alternately. When GABAA synapses are blocked, the system oscillates coherently at 3.4 Hz. Adapted from [22].
Mechanisms of synchrony of neural activity in large networks
953
presynaptic population. In order to get synchrony, MRr and MrR need to be above some critical value. In the simulations of [23] these critical numbers were found to be on the order of 2-10 (Fig. 30). Above these critical values, in the regime of parameters in which the clustering is found for all-to-all coupling, the neurons are found to fire with synchrony but also to skip bursts. Therefore, provided that the connectivity is not too dense, the firing pattern of the R E - T C network is characterized by population oscillations in which individual neurons display a broad distribution of firing rates. The probability of connection between two neurons decays with their distance. In a model, this can be taken into account by assuming that the synaptic coupling between two neurons decays with their mutual distance; see Fig. 31. Simple functional dependencies of the coupling on the distance are a step function or an exponential decay. A one-dimensional R E - T C network with such an architecture has been studied in [24]. Three epochs of the activity can be observed with this architecture. The first epoch is the recruitment of the neurons into the discharge, followed by a discharge period. Eventually, the discharge terminates. Because of the recruitment and the termination processes, the activity in a one-dimensional system has the form of a propagating pulse. If the termination process is not included in the model, the activity looks like a propagating front, that separates a quiescent regime from a regime of periodic bursting.
o 0.60
i--
.I /'/f /
o~
o E
tO L__ tO r
ii
0.40
Jf
//
I
a
/
la 0.20
b ............................................................
iI
III b
0 F--
|
0.00 0.0
i
i
.....
0.5
0.5s
t
I
t
,
,
,
I
.
.
10.0 RE to TC connectivity MTR 5.0
.
.
,
15.0
Fig. 30. Dependence of the TC synchrony measure Zvc on the RE-to-TC convergence factor Mre with (solid line) and without (dashed line) intra-RE GABAA inhibition. There is a sharp but continuous transition from an asynchronous state at low MTR to a synchronized state at higher MrR values. The quantity Xxc was computed by averaging over a time interval of 10 s and over five realizations of connectivity patterns and initial conditions, for each parameter value. Insert: time courses of population activity pvc(t) (defined as the fraction of the cells with a membrane potential above -45 mV) for MrR = 4 with (a) and without (b) intra-RE inhibition (scales apply to both). The dotted line denotes the zero-level. Met = MRR - - 10. Adapted from [23].
954
D. Golomb et al.
A
One Dimensional Architecture t-
2~,TR
"'
""
I
RE ,'
,,'
', ',, \ ',
\//
/
', \ \ 1 / /
~TC I ~
2~RT
I!
Footprint
Footprint Shapes
Synaptic Coupling Function,w
7 - -,~'
- "l .~I ~~,~....., i
i. . . .
~
-
-
Distance Between Cells Fig. 31. (A) The model has a one-dimensional architecture. The RE-to-TC coupling strength (maximal synaptic conductance) decays with the distance between the pre- and postsynaptic cells. The typical decay length, XrR, is called the synaptic footprint length. Similarly, the TC-to-RE coupling strength decays with a typical length XRT,and the RE-to-RE coupling strength with a typical length XRR (not shown). (B) For an exponential footprint shape (solid line) the typical decay length ~ (which is either XRT, XRT or XRR) is the spatial distance for which the coupling strength reaches 1/e of its maximal value. For a step footprint shape (dashed line), the coupling strength has a constant value if the inter-neuronal distance is between -X and ~, and it is zero otherwise. Adapted from [24]. Recruitment is often carried out in a noncontinuous, "lurching" manner as shown in Fig. 32. At each cycle, a group of TC cells is recruited into the activity by RE cells that were already active, and this group helps to recruit a new group of TC cells into the discharge. With strong excitatory conductance and coupling which decays as a square function, continuous recruitment can also occur [144,145].
Mechanisms of synchrony of neural activity in large networks
955
B. G A B A A B l o c k e d
A. G A B A a a n d G A B A B 1.0
RE Cells
"*"'(i
=~;f ":.
0.5 9
7 . c ' , f V " ~ ' : .o e.~ ~176 oo~
9~
="
o
9I==f///i!iillli;iii-iii~. =s =z s= sz st ss s =. o s. =,
st
co 9
0.0
s9 =
st .
s t9
t:
-:
st
".: ".: --
-:
9 *I i i 9- ",,"
1.0
-
,i
.,:t{iltil,
..| -=
~.
r
o
.(
,ii!
8 1 8 5 5 : : ; : : : ' | I 711111:1:1:1:1 .- . t . l t $ ' . $ i I / : : : : : l 9 " ; l # l l l t l t l : : : : : l 9 " l tl t l t S t l ~ : : : = : ' = t I t l t t l l l l l l l t l t = : $ 7 . l . t t l l l ll II l: :l :::l
.
!
I
.
I
I
I
/
"s
TC Cells
l 99*'~o 9;
0.5
9
,~7I
..'...
g
:
.:.
::
:
I
" " i - " -,i ; i - "
9
;,'
- ..",";~, 9
.~
0.0 0
,.~.;-,-,,
"'='* "I"9","'", :":;I*' ,i,'' i' " "I.I " ""' = " 1 Time
=-=..-, "I :i ." i'.
2 (s)
," i 9 I
s
L'" "-" 9 0
1 Time
I
I
8
ll
!i! !
~l { i I9 9
2
(s)
Fig. 32. Propagation of spindle oscillations: rastergram of RE cells (upper panels) and TC cells (lower panels). Simulations were performed with N - 512 cells from each type. Only the burst times from every eighth cell are shown. The position x is measured in units of the system length L. Initially 16 RE cells at the left (small x) are depolarized (V = 0) and all the others are at rest. (A) In the presence of both GABAA and GABAB synapses, RE cells fire twice as often as TC cells (the 2:1 bursting mode), with a population rhythmic frequency f = 10.1 Hz. (B) When GABAA synapses are blocked, the network oscillates at a lower frequency (f = 4.15 Hz), RE and TC cells now fire in full synchrony and at the same frequency (the 1:1 bursting mode). Adapted from [24].
During the discharge period, the dynamics of this R E - T C network is similar to what is observed with all-to-all coupling, i.e., the network is in a 2:1 cluster state which is destroyed into a 1:1 synchronous state when GABAA inhibition is blocked; see Fig. 32. In the one-dimensonal case, however, if there is multistability, the discharge which is propagating selects the pattern of synchrony. Extensive numerical study of the model has revealed that the experimental results, especially the transition from a 2:1 clustering to the 1:1 mode when GABAA inhibition is blocked, can be reproduced in the model only if the kinetics of GABAB operate far from saturation, and GABAB receptors respond much more strongly to a prolonged burst than to a brief one. This prediction has been later confirmed experimentally [146].
956
D. Golomb et al.
Destexhe et al. have attributed the termination (waning) of the oscillation to the up-regulation of the h-current in the TC cells [25,27]. Indeed, in their computer models of thalamic networks, prolonged activity of the cell shifts the activation curve of the h-current to more depolarized levels and increases the h-conductances. As a result, the TC cells are too depolarized, their postinhibitory rebound property is temporarily suspended, and the network activity terminates. This mechanism, suggested by the computational models, has been later confirmed experimentally in ferret slice preparation [ 147,148]. 10. Discussion
10.1. Mechanhsms of synchrony Experimental studies have demonstrated that large neuronal systems can display a large variety of activity patterns including asynchronous highly irregular firing, asynchronous periodic firing, spiking or bursting, oscillatory or nonoscillatory synchronous state, stationary or traveling hot spots of activity. In the present paper we have shown how the studies of simplified network models which can be investigated analytically or numerically can illuminate the dependence of these spatio-temporal features on the cellular, synaptic and architectural properties of the system where they occur. Mechanisms of neuronal synchrony have been addressed in recent years, following two different and complementary theoretical approaches. One way, followed in Section 6.3, has been to study the conditions under which fully synchronized states or more general phase locked states are stable in fully connected networks of identical neurons [93,31,101,108,109,114,84]. Another approach has been to investigate the conditions of the stability of the asynchronous state of a large neuronal system. Most of the results presented in this paper have been obtained following this second approach. The fact that the synaptic kinetics have a crucial influence on the emergence of synchrony is, in our opinion, an important outcome of these theoretical investigations. The studies summarized above, as well as other works performed in the last several years, have revealed that in single populations of neurons, inhibitory couplings are more efficient than excitatory ones to induce spike-to-spike synchrony. As shown above, this is a general property which results in the noninstantaneous time course of synaptic interactions. It should be stressed that this mechanism of synchrony through inhibition is different from the mechanisms studied in [84,19,20,21] which rely on postinhibitory rebound and which requires sufficiently strong and slow synapses. In some of the models investigated here, the excitatory neurons displayed spike frequency adaptation. However, we have not discussed how adaptation can affect synchrony. As a matter of fact, it has been recently shown that adaptation can induce type II PRC, [114,149]. Therefore a network of such neurons, weakly coupled through fast excitation, can synchronize their spikes in-phase; see Section 6.3. When the coupling becomes sufficiently strong the spike adapting neurons fire synchronous bursts of spikes [100].
Mechanisms of synchrony of neural activity in large networks
957
The studies reported here provide examples of two other mechanisms for synchrony, namely the "cross-talk" and the recurrent excitation mechanisms. In contrast to the inhibitory mechanism, they involve more than one population of neurons. Using both analytical and numerical methods we have been able to characterize these mechanisms. In particular, we have seen that one of them leads naturally to bursting synchronous states and that strengthening of intra-inhibitory interactions opposes these states.
10.2. Robustness of the mechanisms for synchrony Recent studies have pointed out that because of the heterogeneities, synchrony in inhibitory networks can be a problem [108,109]. However, as demonstrated above, the desynchronizing effect of heterogeneities of fully connected networks can be overcome to some extent, provided that the external input and the synaptic interactions are sufficiently strong. The situation in sparse networks is more problematic. Indeed, as shown above in the example of a sparsely connected inhibitory integrateand-fire network, increasing the inhibitory interaction too much is desynchronizing. Hence, the capability of compensating for the heterogeneities in the strong coupling regime is opposed by the amplification of the spatial fluctuations due to sparseness in this regime. To characterize this limitation in a more quantitative manner, more specific information is needed about the intrinsic active properties of the interneurons and about the degree of connectivity of networks of interneurons in the CNS. However, these results indicate a fundamental limitation for the mechanism of synchrony relying solely on inhibition. An important issue is, therefore, whether or not more robust synchrony in the CNS can be achieved through recurrent inhibition between excitatory and inhibitory populations. As a matter of fact the two other scenarios described in this chapter, which involve a cooperative effect between excitatory and inhibitory neurons are much more robust to heterogeneities than the scenario which relies only on inhibition [34]. It would be interesting to investigate whether these scenarios are also robust when connectivity is sparse.
10.3. The origin of irregular firing of cortical neurons In all the examples of synchronized states discussed above, the synchronous neuronal activity is dominated, (in the large N limit), by a strong oscillatory component. Therefore, in this state the cross-correlations of neuronal pairs are strongly oscillatory. However, in many situations in the cortex, cross-correlations do not display such oscillations. In spite of the fact that a significant peak is present around zero delay, side peaks are absent, or at least of much smaller amplitude than the central peak. This means that in these situations neurons are firing synchronously but irregularly. The strongly irregular firing of neurons in vivo [150] is difficult to explain. The challenge that this phenomenon poses is highlighted by the fact that single neurons in isolation are highly regular and that each cell is connected to many other cells. Therefore fluctuations in individual inputs should be strongly suppressed at the level of the recipient cell (the law of large numbers). In general two mechanisms
958
D. Golomb et al.
may maintain strong temporal fluctuations at the network level. One mechanism that has been investigated both numerically [151,152] and analytically in a simplified model of neuronal dynamics [87-90] is that although the many inputs to a cell are uncorrelated their fluctuations are amplified by the fact that the total excitatory input is roughly canceled by the total inhibitory one, so that the neuron is always close to its threshold. This mechanism has the advantage of being robust to the details of the neuronal dynamics. The simplest model implementation of this mechanism is a sparsely connected inhibitory integrate-and-fire network. As we have seen above, when the inhibitory coupling strength and the excitatory external inputs are sufficiently strong, the neurons fire asynchronously and irregularly. In particular, this mechanism can explain why in many observed situations, cortical neurons in vivo are firing with Poisson statistics. However, in this scenario, the network state is asynchronous. Therefore, at least in its simplest form, it cannot explain irregular and synchronous neuronal activity. In addition, this scenario requires very large synaptic conductances, and therefore very short passive time constants. A second way to explain irregular neuronal activity in vivo is to rely on a synchronized state. In this scenario, synchrony and fluctuations are both emerging simultaneously as a collective effect. Due to their correlations, the fluctuations in the inputs to a cell are not averaged out. Such a state has been shown to exist in highly connected model networks which contain both excitatory and inhibitory neurons [97,13]. An example is displayed in Fig. 33 where the voltage traces of 6 neurons are plotted. All the neurons are firing irregularly and the fluctuations in their activity are strongly correlated. The difference in the average firing rates of the neurons is due to the fact that they receive external inputs of different strength. The firing rates of the neurons ranges from a few Hz to more than 100 Hz. Therefore, this synchronous state is very robust to heterogeneities. Synchronous chaos can provide a mechanism for explaining highly irregular correlated firing in cortex. Moreover, Hansel and Sompolinsky [13] have also shown that such a collective state could have interesting functional properties. However, several important difficulties remain with this scenario. For instance, the statistics of the spikes are not Poissonian and the neurons are too strongly synchronized compared to what is observed experimentally in most of the physiological situations.
10.4. Consequences for neuronal modeling 10.4.1. Scaling the size of sparse network In many cases, modeling large neuronal systems of the CNS in their full size would require simulating networks of ten thousands of neurons, interacting through several millions of synapses. Simulations of such large systems are both time- and memory consuming, and systematic studies of their dynamical properties using numerical simulations are hard and frequently impossible due to limitations in computer capabilities. In order to avoid these problems, systems of reduced sizes are frequently studied. We have seen above that a simple scaling rule can be applied to chose the parameters of the reduced network so that its dynamical properties are
Mechanisms of synchrony of neural activity in large networks
959
-70 mV _
,,,
130 mV
50 ms Fig. 33. Membrane potential of six excitatory neurons in a network model of an hypercolumn in the primary visual cortex. The network is in the synchronized-chaotic state. The details of the model can be found in [13]. representative of those of the real one. Assume that the system one wants to study consists of N neurons connected at random, with an average number of synapses per neuron, M, and that all the synapses have the same strength, Gsyn. The synchronization properties of this system can be studied by simulating a model network consisting of NR randomly connected neurons, with average connectivity, MR, and synaptic strength GR, satisfying the following scaling relation: M
GR -- ~-~RGsyn,
(106)
1
1
1
1
MR
NR
M
N
(107)
The first equation insures that the average synaptic input both in the real system and in the simulated system are the same. The second relation guarantees that the spatial fluctuations of this input are the same in both systems. Using this scaling allows one to investigate the behavior of very large networks by simulating a network with a relatively small NR, on the order of a few hundreds. Assuming that in the real system 1 << M << N, NR and MR may be not very different. In spite of this fact, our simu-
960
D. Golomb et al.
lation results indicate that the synchronization properties in the two systems are similar.
10.4.2. Integrate-and-fire vs. conductance-based models Synchrony in integrate-and-fire neurons displays two characteristics that can be considered as nongeneric. The first one is that synchrony in inhibitory networks is very sensitive to disorder. In the context of the present work, this is expressed in the fact that Mc is large for these models or that substantially strong coupling is required to destabilize the asynchronous state in heterogeneous networks. The second is that for this model excitation leads to synchrony for firing rates which are as large as 50 Hz provided the synapses are not too slow. Moreover, the robustness of these synchronized states is large. Both features are a consequence of the purely passive integration of the synaptic and external inputs. This is a consequence of the exponential behavior of Z(O) in Eq. (36), at weak coupling, and more generally of G(T) in Eq. (89) which reflects the fact that near the threshold I&F neurons are very sensitive to synaptic inputs. More generally, the contribution of neurons at low rates to the integral in Eq. (88) is exponentially large. This exponential behavior can give rise to divergences in the integral in Eq. (88) if the distribution probability of the period decays as an exponential function or more slowly. In particular, this will be the case for a Gaussian distribution of external currents. To avoid this problem we have chosen a Gaussian distribution of period. Let us remark that another way for avoiding the divergences is the introduction of noise in the dynamics. A fixed level of noise will generate a minimal firing rate for all the neurons in the networks and the integral in Eq. (88) will become convergent. In conductance-based models which takes the biophysics of neurons into account in a more realistic way, reasonably small Mc's, of the order of few tens, are required to get synchronous states in inhibitory networks. The behavior of conductancebased models is different because just before the threshold, the sodium conductance is activated and governs the neuronal dynamics, and therefore, external disturbances are less effective. This is reflected in the phase-response curve, that decreases just before the spike onset. This raises an important issue for neuronal modeling: under which conditions can conductance-based dynamics be reduced to integrate-and-fire type dynamics without losing synchronization properties? Is it possible to modify integrate-and-fire dynamics in order to take into account in a more faithful way the effects of voltagedependent conductances and such that the model can still be studied analytically? We intend to address this issue in forthcoming works. Abbreviations AC, Autocorrelation AHP, afterhyperpolarization AMPA, amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (glutamate receptor subtype) AS, asynchronous
Mechanisms of synchrony of neural activity in large networks
961
Ca 2+, Calcium ion CA1, Region of Ammon's horn in the hippocampus, that receives input from neurons in CA3 CA3, Region of Ammon's horn in the hippocampus, that receives input from neurons in dentate nucleus CC, cross-correlation cm, centimeter CNS, Central nervous system CPU, central processing unit CV, coefficient of variation EMG, Electro myographic activity f-I curve, Firing rate-current relation GABA, gamma-aminobutyric acid HH model, Hodgkin-Huxley model Hz, Herz IE, type I-excitability IIE, type II-excitability I&F, Integrate and fire IR, insensitive refractory LFP, Local field potential LGN, Lateral Geniculate Nucleus ms, millisecond mV, milliVolt ~tA, micro-Ampere NMDA, u-methyl-D-aspartate (glutamate receptor subtype) PGN, peri geniculate nucleus PIR, postinhibitory rebound PRC, phase response curve RE-cells, reticular thalamic cells TC, thalamo-cortical WB-model, Wang-Buzsfiki model
A ckno wledegements A large part of this paper is based on researches the authors have done in the last years in different collaborative works with C. Meunier, L. Neltner, J. Rinzel, H. Sompolinsky, C. van Vreeswijk and X.-J. Wang. We thank Y. Loewenstein and M. Shamir for many useful comments on the manuscript. D.H. acknowledges the warm hospitality of the Center for Neural Computation of the Hebrew University in Jerusalem where part of this paper was written. The research of D.G. was supported by a Grant no. 9800015 from the United States-Israel Binational Science Foundation (BSF). D.H. has benefited from support by the PICS-CNRS 236, Minist~re des Affaires Etrang6res Francais, AFIRST and PICS-CNRS 867. The work of G.M. was partially supported by grant PICT97 03-00000-00131 from ANPCyT and by Fundaci6n Antorchas.
D. Golomb et al.
962
Appendix A. Single neuron conductance-based models In the following, voltages are measured in mV, conductances are measured in mS/ cm 2, and rate constants in ms -~. The Hodgkin-Huxley model. The dynamical equations of the H H model read [78]: dV C-~-
I-
g N a m 3 h ( V - VNa) -- g K n 4 ( V - VK) -- g l ( V - V1),
dm dt =
m~c (V) - m ~m(V) '
dh
h
dt--
(A.1)
(A.2)
(v) - h
zh(V)
'
(A.3)
dn n~c(V) - n dt-~,(V) "
(A.4)
where I is the external current injected into the neuron which determines the neuron's firing rate. The parameters gNa, gK and g~ are the m a x i m u m conductances per surface unit for the sodium, potassium and leak currents, respectively, and VNa, VK and V1 are the corresponding reversal potentials. The capacitance per surface unit is denoted by C. For the squid's axon typical values of the parameters (at 6.3~ are: VNa = 50, VK = --77, Vl = - - 5 4 . 4 , gN~ = 120, gI< = 36, gl = 0.3, and C = 1 ~tF/cm 2. The functions m~ (V), h~ (V), and n~ (V) and the characteristic times (in milliseconds) Zm, Z,, Zh, are given by
x ~ ( V ) = ax/(a~ + bx), 1:.~= 1/(ax + b~)
with x = m , n , h and:
am = 0.1(V + 40)/(1 - e x p ( ( - V - 40)/10)),
(A.5)
bm = 4 e x p ( ( - V - 65)/18),
(a.6)
ah - 0.07 e x p ( ( - V - 65)/20),
(A.7)
bh -- 1/(1 + e x p ( ( - V - 35)/10)),
(a.8)
a, - 0.01(V + 55)/(1 - e x p ( ( - V -
55)/10)),
b,, = 0 . 1 2 5 e x p ( ( - V - 65)/80).
(a.9) (A.10)
The Wang-Buszdlki model. This model, introduced by W a n g and Buszfiki [79] to describe the firing of hippocampal inhibitory interneurons is a modification of the standard H o d g k i n - H u x l e y model, as follows. The leak current parameters are gl = 0.1 and V1 = - 6 5 . The sodium current has the same form as in the H H model with: am = - 0 . 1 ( V + 3 5 ) / ( e x p ( - 0 . 1 ( V + 35)) - 1),
(A.11)
bm = 4 e x p ( - ( V + 60)/18),
(a.12)
Mechanisms of synchrony of neural activity in large networks
963
ah = 0.35 e x p ( - ( V + 58)/20), (A. 14)
bh = 5 / ( e x p ( - 0 . 1 ( V + 28)) + 1).
For simplicity the activation variable m instantaneously adjusts to its steady-state value m~. The other parameters of the sodium current are: 9Na = 35 and VNa = 55. The delayed rectifier current is described in a similar way as in the H H model with: an = - 0 . 0 5 ( V + 34)/(exp(-0.1 (V + 34)) - 1),
(A.15)
b, = 0 . 6 2 5 e x p ( - ( V + 44)/80)
(A.16)
and 9K = 9 and VK = --90. The model we have used for excitatory neurons incorporates, in addition to these currents, a slow potassium current modeled as
VK),
Iz = gz4t) • ( v -
where z(t) satisfies:
d z / d t = (z~ (V)
-
(A.17)
z)/'cz,
(A.18)
z~ = 1/(1 + e x p ( - 0 . 7 ( V + 30))).
We have chosen 9z = 0.75 and ~z = 50 to have firing properties which fit well with typical physiological data. Thalamic neurons [24]. In Section 9.2 the RE and TC cells are modeled, respectively, as in Eqs. (104) and (105) with C = 1 g F / c m 2. The low-threshold T-type calcium current, Ica-T is given by I C a - T -- g c a m 2 h ( V -
Vca)
with m
(V) --
1
{1 + exp[-(V-
(A.19)
and (A.20)
d h / d t = (ho~(V) - h)/Th(V), h~(V) = For RE cells,
1
(A.21)
{1 + e x p [ - ( V - 0h)/cYh]} " 9Ca =
9h = 23.8 +
1.5,
VCa =
120,
0m =
119 {1 + exp[(V + 70)/3]}"
-52,
~m =
7.4, Oh = - 7 8 , ~h = - 5 , and (A.22)
D. Golomb et al.
964
For TC cells, gca - 2.0, Vca - 120, 17h(V)
--
7.14 +
0 m
--
-59,
(3" m
--6.2,
Oh -- --81, Crh -- --4.4 and
52.4 {1 + exp[(V + 74)/3]}"
(A.23)
The leak current is IL = g L ( V -- VL) with VL = --85 and gL = 0.035 (resp. gL = 0.03) for RE (resp. TC) cells. A calcium-dependent afterhyperpolarization current is added for in the RE cells. It is defined by IAHP - - 0. l m A H p ( V -+- 90.0)
with d m A H p / d t - - 0.02[Ca](1 - mAHP) -- 0 . 0 2 5 m A H P ,
(A.24)
d[Ca]/dt - --0.01ICa_T -- 0.08[Ca].
(A.25)
The TC cells has also an hyperpolarization-activated cation current ("sag") Ih parameterized" Ih -- 0.04 r (V + 40), where the gating variable r is given by:
dr/dt - (r~c(V) - r)/rsag(V), r~(V) -
rsag
(A.26)
1
(A.27)
{1 + e x p [ - ( V + 75)/5.5]} '
2O +
1000 [exp((V + 71.5)/14.2) + e x p ( - ( V + 89.0)/11.6)]"
(A.28)
Appendix B. Synaptic dynamics and network architecture
The two-population networks (Section 5). In Section 5 the neurons of both populations are fully connected. The synaptic currents were modeled using Eqs. (8),(9),(12), and (13). In Section 5, V~y, - 0 mV, rl - 1 ms, r 2 - - 3 ms for excitatory synapses; Vsyn - - 8 0 mV, r 1 - - 1 ms, re - 6 ms for inhibitory interactions. The external inputs to the neurons are described as in Eq. (20). The Wang-Buszaki network (Section 6.7). The synaptic currents were modeled using Eqs. (8)-(11) with 0s - 0, Ors - 2 mV, kf - 12ms -1, kr - 0.1 ms -1. The R E - T C network. The synaptic currents in Eqs. (104), and (105) are as follows: A M P A current IAMPA from TC to RE cells: IAMPA- 0.1 v ~ N _ I WRT(i--j)Spj with d s p j / d t - 2.0 s~(V)(1 - s p j ) - 0.1sej. GABAA current IGABA-A RR from RE to RE cells: I GRR A B A - A = 0.2(V + 75) ~-~y=, WRR(i -- j)saj. GABAA
current I G A B A - A
~-]~y=,w77~(i - j)sxj.
from
RE
to TC
cells: I G A B A - A - - 0 . 1 ( V -Jr- 8 5 )
Mechanisms of synchrony of neural activity in large networks
965
In these two inhibitory currents:
dsAj/dt = 2.0 s~(Vj-)(1 - s ~ j . ) - 0.08 SAj
(B.1)
GABAB
current IGABA-B from RE to TC cells: IGABA-B = 0.06(V + 90) ~U= 1WTR(i- j) SBj, where the gating variable sBy is determined by
dsBj/dt - O.03x4j(1 - sBy) - O.OlsBj,
(B.2)
dxej/dt - 0.02 s~(Vj)(1 - xBj) - 0.05 [1 - s~(Vj)]xBj.
(B.3)
For all the gating variables,
s~(V) =
1 {1 + exp[-(V + 40)/2]} '
(B.4)
where V is the presynaptic membrane potential. In the sparse network considered in Section 9.2, the connectivity matrix is given by Eq. (17) with average numbers of synaptic inputs M ~ , 0t = T, R, MRT = MRR -- 10 and MrR varies between 1 and 15. Note that, in Refs. [22,24], the first subscript in the thalamic network model denotes the presynaptic neuron and the second subscript denotes the postsynaptic neuron, an opposite notation to what we use here. Two one-dimensional architectures were also considered. The first one (exponential footprint shape) is defined by WTR(j) = tanh[1/(2 AT~R)]exp(--lJl/ATR ). In the second architecture (step footprint shape): wrR = (2ArR + 1) if -ArR ~<j ~
ATR = N)~rR = S.
References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.
Adrian, E.D. and Zotterman, Y. (1926) J. Physiol. (London) 61, 465. Gray, C.M., Konig, P., Engel, A.K. and Singer, W. (1989) Nature 338, 334-337. Gray, C.M. (1994) J. Comput. Neurosci. 1, 11-38. Nicolelis, M.A.L., Baccala, L.A., Lin, R.C. and Chapin, J.K. (1995) Science 268, 1353-1358. Prechtl, J.C., Cohen, L.B., Mitra, P.P. and Kleinfeld, D. (1997) Proc. Natl. Acad. Sci. USA 94, 7621-7626. Gutnick, M.J., Connors, B.W. and Prince, D.A. (1982) J. Neurophysiol. 48, 1321-1335. Berger, H. (1929) Archiv ffir Psychiatrie und Nervenkrankheiten 87, 527-570. Lampl, I, Reichova I. and Ferster D. (1999) Neuron 22, 361-374. Abeles, M. (199 l) Corticonics. Cambridge University Press, Cambridge. Li, Z. and Hopfield, J. (1989) Biol. Cybern. 61, 379-392. Li, Z. and Hertz, J. (2000) Network: Comput. Neural Syst. 11, 83-102.
966
D. Golomb et al.
12. Ben-Yishai, R., Lev Bar-Or R. and Sompolinsky, H. (1995) Proc. Natl. Acad. Sci. USA 92, 38443848. 13. Hansel, D. and Sompolinsky, H. (1996) J. Comput. Neurosci. 3, 7-34. 14. Ben-Yishai, R., Hansel, D and Sompolinsky, H. (1997) J. Comput. Neurosci. 4, 55-77. 15. Dimitrov, A. and Cowan, J.D. (1998) Neural Comput. 10, 1779-1795. 16. Troyer, T.W., Krukowski, A.E., Priebe, N.J. and Miller, K.D. (1998) J. Neurosci. 18, 5908-5927. 17. Traub, R.D. and Miles, R. (1991) Neuronal Networks of the Hippocampus. Cambridge University Press, New York. 18. Traub, R.D., Jefferys, J.G.R. and Whittington, M.A. (1999) Fast Oscillations in Cortical Circuits. MIT Press, Cambridge MA. 19. Golomb, D. and Rinzel, J. (1993) Phys. Rev. E 48, 4810-4814. 20. Golomb, D. and Rinzel, J. (1994) Physica D 72, 259-282. 21. Golomb, D., Wang, X.-J. and Rinzel, J. (1994) J. Neurophysiol. 72, 1109-1126. 22. Golomb, D., Wang, X.-J. and Rinzel, J. (1995) in: The Neurobiology of Computation, ed J. Bower. pp. 215-220, Kluwer, Boston. 23. Wang, X.-J., Golomb, D. and Rinzel, J. (1995) Proc. Natl. Acad. Sci. USA. 92, 5577-5581. 24. Golomb, D., Wang, X.-J. and Rinzel, J. (1996) J. Neurophysiol. 75, 750-769. 25. Destexhe, A., McCormick, D.A. and Sejnowski, T.J. (1993) Biophys. J. 65, 2473-2477. 26. Destexhe, A., Contreras, D., Sejnowski, T.J. and Steriade, M. (1994) J. Neurophysiol. 72, 803-818. 27. Destexhe, A., Bal, T., McCormick, D.A. and Sejnowski, T.J. (1996) J. Neurophysiol. 76, 2049-2070. 28. Goldberg, J., Hansel, D., Bergman, H. and Sompolinsky, H. (1999) Soc. Neurosci. Abst. 1926. 29. Golomb, D. (1998) J. Neurophysiol. 79, 1-12. 30. Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H. and Aertsen, A. (1995) Nature 373, 515-518. 31. Hansel, D., Mato, G. and Meunier, C. (1995) Neural Comput. 7, 307-337. 32. Neltner, L., Hansel, D., Mato, G. and Meunier, C. (2000) Neural Comput. 12, 1607-1641. 33. Golomb, D. and Hansel, D. (2000) Neural Comput. 12, 1095-1139. 34. Hansel, D. and Mato, G. (in preparation). 35. Freeman, W.J. (1975) Mass Action in the Nervous System. Academic Press, New York. 36. Chrobak, J.J. and Buzs~iki, G. (1998) J. Neurosci. 18, 388-398. 37. Ritz, R. and Sejnowski, T.J. (1997) Curr. Opin. Neurobiol. 7, 536-546. 38. Connors, B.W. and Amitai, Y. (1997) Neuron 18, 347-349. 39. Gray, C.M. (1999) Neuron 24, 31-47. 40. Singer, W. and Gray, C.M. (1995) Annu. Rev. Neurosci. 18, 555-586. 41. Singer, W. (1999) Neuron 24, 49-65. 42. Milner, P.M. (1974) Psychol. Rev. 81, 521-535. 43. Von der Malsburg, C. (1994) in: Models of Neural Networks II, eds E. Domany, J.U van Hemmen and K. Schulten. pp. 95-119, Springer, Berlin. 44. Von der Malsburg, C. (1995) Curr. Opin. Neurobiol. 5, 520-526. 45. K6nig, P. and Engel, A.K. (1995) Curr. Opin. Neurobiol. 5, 511-519. 46. Treisman, A. (1996) Curr. Opin. Neurobiol. 6, 171-178. 47. Golledge, H.D.R., Hilgetag, C.C. and Tov~e, M.J. (1996) Curr. Opin. Neurobiol. 6, 1092-1095. 48. Roskies, A.L. (ed) (1999) Neuron 24, 7-125. 49. Castelo-Branco, M., Neuenschwander, S. and Singer, W. (1998) J. Neurosci. 18, 6395-6410. 50. Roelfsema, P.R., Engel, A.K., K6nig, P.K and Singer, W. (1997) Nature 385, 157-161. 51. K6nig, P., Engel, A.K., Roelfsema, P.R. and Singer, W. (1995) Neural Comput. 7, 469-485. 52. Murthy, V.N. and Fetz, E.E. (1992) Proc. Natl. Acad. Sci. USA. 89, 5670-5674. 53. Murthy, V.N. and Fetz, E.E. (1996) J. Neurophysiol. 76, 3949-3967. 54. Murthy, V.N. and Fetz, E.E. (1996) J. Neurophysiol. 76, 3968-3982. 55. Steriade, M., Amzica, F. and Contreras, D. (1996) J. 
Neurosci. 16, 392-417. 56. Steriade, M., Timofeev, I., DiirmiJller, N., and Grenier, F. (1998) J. Neurophysiol. 79, 483-490. 57. Sanes, J.N. and Donoghue, J.P. (1993) Proc. Natl. Acad. Sci. USA. 90, 4470-4474. 58. Baker, S.N., Olivier, E. and Lemon, R.N. (1997) J. Physiol. London 501, 225-241.
Mechan&ms of synchrony of neural activity in large networks 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104. 105.
967
Farmer, S.F. (1998) J. Physiol. London 509, 3-14. Steriade, M., McCormick, D.A. and Sejnowski, T.J. (1993) Science 262, 679-685. Nicolelis, M.A., Baccala, L.A., Lin, R.C. and Chapin, J.K. (1995) Science 268, 1353-1358. Buzsfiki, G., Horvath, Z., Urioste, R., Hetke, J. and Wise, K. (1992) Science 256, 1025-1027. O'Keefe, J. and Recce, M.L. (1993) Hippocampus 3, 317-330. Skaggs, W.E., McNaughton, B.L., Wilson, M.A. and Barnes, C.A. (1996) Hippocampus 6, 149-172. Buzsfiki, G. and Chrobak, J.J. (1995) Curr. Opin. Neurobiol. 5, 504-510. Whittington, M.A., Traub, R.D. and Jefferys, J.G.R. (1995) Nature 373, 612-615. Traub, R.D., Whittington, M.A., Coiling, S.B, Buzs~tki, G. and Jefferys, J.G.R. (1996) J. Physiol. London 493, 471-484. Traub, R.D., Whittington, M.A., Stanford, I.M. and Jefferys, J.G.R. (1996) Nature 383, 621-624. Whittington, M.A., Stanford, I.M., Coiling, S.B., Jefferys, J.G.R. and Traub, R.D. (1997) J. Physiol. London 502, 591-607. Jefferys, J.G.R., Traub, R.D. and Whittington, M.A. (1996) Trends Neurosci. 19, 202-208. Fisahn, A., Pike, F.G., Buhl, E.H. and Paulsen, O. (1998) Nature 394, 186-189. Buhl, E.H., Tamas, G. and Fisahn, A. (1998) J. Physiol. London 513, 117-126. Draguhn, A., Traub, R.D., Schmitz, D. and Jefferys, J.G.R. (1998) Nature 394, 189-192. Huerta, P.T. and Lisman, J.E. (1993) Nature 364, 723-725. Williams, J.H. and Kauer, J.A. (1997) J. Neurophysiol. 78, 2631-2640. McMahon, L.L., Williams, J.H. and Kauer, J.A. (1998) J. Neurosci. 18, 5640-5651. Tuckwell, H.C. (1988) Introduction to Theoretical Neurobiology. Cambridge University Press, New York. Hodgkin, A.L. and Huxley, A.F. (1952) L. Physio. London 117, 500-544. Wang, X.-J. and Buzsfiki, G. (1996) J. Neurosci. 16, 6402-6413. Rinzel, J. and Ermentrout, G.B. (1998) in: Methods in neuronal modeling: from ions to networks, Second Edition, eds C. Koch and I. Segev, pp. 251-291. MIT Press, Cambridge, MA. Abbott, L.F. and van Vreeswijk, C. (1993) Phys. Rev. E 48, 1483-1490. Treves, A. (1993) Network 4, 259-284. Tsodyks, M., Mitkov, I. and Sompolinsky, H. (1993) Phys. Rev. Lett. 71, 1280-1283. Wang, X.-J. and Rinzel, J. (1992) Neural Comput. 4, 84-97. Destexhe, A., Mainen, Z.F. and Sejnowski, T.J. (1994) J. Comput. Neurosci. 1, 195-230. Rail, W. (1967) J. Neurophysiol. 30, 1138-1168. van Vreeswijk, C. and Sompolinky, H. (1996). Science 274, 1724-172. van Vreeswijk, C. and Sompolinsky, H. (1998). Neural Comput. 10, 1321-1372. Hertz, J.A., Fazzini, A., Solinas, A., Lauritzen, T.Z. and Bernotas, A. (1999) Soc. Neurosci. Abst. 2257. Hertz, J.A. in: Models of Neural Networks IV, ed L. van Hemmen. Springer, Berlin, in press. Pinsky, P.F. and Rinzel, J. (1994) J. Comput. Neurosci. 1, 39-60. Hansel, D., Mato, G. and Meunier, C. (1993) Europhys. Lett. 23, 367-372. Hansel, D., Mato, G. and Meunier, C. (1993) Concepts Neurosci. 4, 192-210. Rubin, J.E. and Terman, D. in: Handbook of Dynamical Systems, Vol 3: Toward Applications, eds B. Fiedler, G. Iooss and N. Kopell. Elsevier, Amsterdam (in press). Ginzburg, I. and Sompolinsky, H. (1994). Phys. Rev. E 50, 3171-3191. Strogatz, S.H. and Mirollo, R.E. (1993) Phys. Rev. E 47, 220-227. Hansel, D. and Sompolinsky, H. (1992) Phys. Rev. Lett. 68, 718-721. Golomb, D., Hansel, D., Shraiman, B. and Sompolinsky, H. (1992) Phys. Rev. A 45, 3516-3530. Hansel, D., Mato, G. and Meunier, C. (1993) Phys. Rev. E 48, 3470-3477. van Vreeswijk, C. and Hansel, D. (2001) Neural Comput. (in press). van Vreeswijk, C., Abbott, L.F. and Ermentrout, G.B. (1994) J. Comput. Neurosci. 1, 313-321. 
van Vreeswijk, C. (1996) Phys. Rev. E 54, 5522-5537. Bush, P. and Sejnowski, T. (1996) J. Comput. Neurosci. 3, 91-110. Brunel, N. and Hakim, V. (1999) Neural Comput. 11, 1621-1671. Brunel, N. (2000) J. Comput. Neurosci. 8, 183-208.
968
D. Golomb et al.
106. Butera, R.J.Jr., Rinzel, J. and Smith, J.C. (1999) J. Neurophysiol. 82, 398--415. 107. Rubin, J.E. and Terman, D. Submitted for publication. 108. White, J.A., Chow, C.C., Ritt, J., Soto-Trevifio, C. and Kopell, N. (1998) J. Comput. Neurosci. 5, 5-16. 109. Chow, C.C. (1998) Physica D 118, 343-370. 110. Gerstner, W. (2000) Neural Comput. 12, 43-89. 111. Bressloff, P.C. (1999) J. Comput. Neurosci. 6, 237-249. 112. Bressloff, P.C. and Coombes, S. (2000) Neural Comput. 12, 91-129. 113. Traub, R.D., Jefferys, J.G. and Whittington, M.A. (1997) J. Comput. Neurosci. 4, 141-150. 114. Crook, S.M., Ermentrout, G.B. and Bower, J.M. (1998) Neural Comput. 10, 837-854. 115. Sompolinsky, H., Golomb, D. and Kleinfeld, D. (1991) Phys. Rev. A 43, 6990-7011. 116. Kuramoto, Y. (1984) Chemical Oscillations, Waves and Turbulence. Springer, New York. 117. Ermentrout, G.B. and Kopell, N. (1984) SIAM J. Math. Anal. 15, 215-237. 118. Kopell, N. (1988) in: Neural Control of Rhythmic Movements in Vertebrates, ed A. Cohen. pp. 369-413. J. Wiley, New York. 119. Strogatz, S.H. and Mirollo, R.E. (1991) J. Stat. Phys. 63, 613-635. 120. Ermentrout, G.B. (1996) Neural Comput. 8, 979-1001. 121. Hoppensteadt, F.C. and Izhikevich, E.M. (1997) Weakly Connected Neural Networks. Springer, New York. 122. Golomb, D. and Hansel, D., unpublished. 123 Kim, J., van Vreeswijk, C. and Sompolinsky, H. (1998) Unpublished. 124. Ermentrout, B. (1994) Neural Comput. 6, 679-695. 125 Bressloff, P.C. and Coombes, S. (2000) Neural Comput. 12, 91-131. 126. Shriki, O., Hansel, D. and Sompolinsky, H. Submitted for publication. 127 Wang, X.-J. (1999) J. Neurosci. 19, 9587-9603. 128 Andersen, P. and Andersson, S. A. (1968) Physiological Basis of the Alpha Rhythm. AppletonCentury-Crofts, New York. 129. Buzs~ki, G. (1991) Neuroscience 41, 351-364. 130. Steriade, M. and Desch6nes, M. (1984) Brain Res. Rev. 8, 1-63. 131. Steriade, M., Jones, E.G. and Llin~s, R.R. (1990) Thalamic Oscillations and Signaling. Wiley, New York. 132. Morison, R.S. and Bassett, D.L. (1945) J. Neurophysiol. 8, 309-314. 133 Steriade, M., Desch6nes, M., Domich, L. and MuUe, C. (1985) J. Neurophysiol. 54, 1473-1497. 134. Desch6nes, M., Paradis, M., Roy, J.P. and Steriade, M. (1984) J. Neurophysiol. 51, 1196-1219. 135 Jahnsen, H. and Llinfis, R.R. (1984) J. Physiol. London 349, 205-226. 136. Jahnsen, H. and Llinfis, R.R. (1984) J. Physiol. London 349, 227-247. 137. Pape, H.-C., Budde, T., Mager, R. and Kisvfirday Z.F. (1994) J. Physiol. London 478, 403-422. 138 McCormick, D.A. and Pape, H.-C. (1990) J. Physiol. London 431,291-318. 139. Leresche, N., Lightowler, S., Soltesz, I., Jassik-Gerschenfeld, D. and Crunelli, V. (1991) J. Physiol. London 441, 155-174. 140. von Krosigk, M., Bal, T. and McCormick, D.A. (1993) Science 261, 361-364. 141 Bal., T., von Krosigk, M. and McCormick, D.A. (1995) J. Physiol. London 483, 641-663. 142. Bal., T., von Krosigk, M. and McCormick, D.A. (1995) J. Physiol. London 483, 665-685. 143 Kopell, N. and LeMasson, G. (1994) Proc. Natl. Acad. Sci. USA 91, 10586-10590. 144. Golomb, D. and Ermentrout, G.B. (1999) Proc. Natl. Acad. Sci. USA 96, 13480-13485. 145 Golomb, D. and Ermentrout, G.B. (2000) Network 11, 221-246. 146. Kim, U., Sanchez-Vives, M.V. and McCormick, D.A. (1997) Science 278, 130-134. 147. Bal, T. and McCormick, D.A. (1996) Neuron 17, 297-308. 148 Liithi, A. and McCormick, D.A. (1998) Neuron 20, 553-563. 149 Ermentrout, G.B., Pascal, M. and Gutkin, B. (2001) Neural. Comput. (in press). 150. Softky, W.R. and Koch, C. (1993) J. Neurosci. 
13, 334-350. 151. Tsodyks, M. and Sejnowski, T. (1995) Network 6, 111-124. 152. Amit, D. and Brunel, N. (1996) Cereb. Cortex 7, 237-252.
C H A P T E R 22
Emergence of Feature Selectivity from Lateral Interactions in the Visual Cortex
U. E R N S T and K. P A W E L Z I K
M. T S O D Y K S
Institute for Theoretical Physics, University of Bremen, Kufsteiner Str., D-28334 Bremen, Germany
Department of Neurobiology, Weizmann Institute, Rehovot 76100, Israel
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
9 2001 Elsevier Science B.V. All rights reserved
969
Contents
1.
Introduction
2.
Models
3.
4.
.................................................
....................................................
2.1.
Neuronal populations
2.2.
Dynamics of one column
........................................
2.3.
Coupled columns
...........................................
A simple model of visual cortex 3.1.
Orientation preference
3.2.
Direction preference maps
3.3.
R e c e p t i v e fields
Discussion
.....................................
........................................ .....................................
............................................
..................................................
Abbreviations
................................................
Acknowledgements References
......................................
.............................................
.....................................................
970
971 974 975 976 979 984 985 991 995 996 999 999 999
1. Introduction
It was a major breakthrough for the investigation of neurobiological mechanisms underlying brain function, when David Hubel and Thorsten Wiesel (Nobel price for Medicine 1981) discovered that neurons in the primary visual cortex become most strongly activated by elongated visual stimuli moving across the visual field [1,2]. Fig. 1a shows a typical recording of a simple cell of layer 4 in area 17 of the cat. A long bar of light moving across a screen leads to an increased number of action potentials only if: 9 the bar crosses a particular location, the so-called receptive field, 9 the bar has a particular orientation, and 9 a particular direction of motion. The responses are weaker or even vanish if one or more of these conditions are changed. It appears as if the neuron was selective for a particular set of features of a stimulus and for this reason one speaks of feature selectivity of the response. The neuron is said to be tuned for the feature, which is quantified by plotting the dependency of the number of spikes on the features, the so-called tuning curves (Fig. l b and c) show examples for orientation tuning and direction tuning curves, respectively). Hubel and Wiesel also discovered that the selectivity for location and for orientation varies gradually when the recording site moves smoothly from one cortical location to the next parallel to the cortical surface ("horizontally", "tangentially", see Fig. 2). In this way the responses realize mappings of retinal places and of orientations. With novel, recently developed methods of optical imaging of intrinsic signals Grinvald and Bonhoeffer [6] and Blasdel [7] uncovered the precise layout of these maps (Fig. 2b). It turned out that the selectivities for particular orientations and also for directions of stimuli are arranged smoothly across the cortex except for some points and lines where they change abruptly. These discontinuities are called pinwheels and fractures, respectively [6,8], see Fig. 2b). While the electrophysiological experiments of Hubel and Wiesel and many others had already shown that the mapping of retinal location to cortex is on average also smooth ("retinotopy") a very recent series of experiments revealed that the retinotopic mapping is correlated with the orientation map [4,5] in a way that a movement in cortex which is associated with a change of 90 ~ in orientation preference entails on average a movement of the receptive field by one receptive field size (Fig. 3). When Hubel and Wiesel discovered the response properties of the neurons in primary visual cortex, they also offered a simple explanation for the selectivity for orientation (Fig. 4). The idea was, that neurons selective for a particular orientation are connected to neurons in the lateral geniculate nucleus (LGN), that provide the input to the visual cortex, and whose (unoriented) receptive fields are arranged in an 971
972
U. Ernst et al.
I
I
<
[I
0
1
90
I
180
I
270
360
I IIIIII
-!5
III IIII1111 I[ IIIIIL I
I
<
II 0
[
I
I
90
180
270
360
Fig. 1. Orientation and direction preference of cortical cells. (a) Oriented bars moving across the receptive fields (black boxes) of neurons evoke response which are stronger when stimulating with the preferred orientation of the nerve cell (see examples of spike trains on the right). (b) The rate A of one neuron in dependence of the stimulus orientation ~ yields the tuning curve of the neuron. The response in (b) is only orientation selective, while the response in (c) displays direction selectivity.
elongated manner. Despite the simplicity of this model it has surprisingly not been experimentally settled until now whether this picture is correct for the adult animal (see, however [9]). An alternative picture assumes that intracortical mechanisms strongly influence the selectivity for orientation such that a small bias provided by the structure of input connections would suffice to generate the full orientation preference map [10,11]. But also in this picture, the structure of the orientation maps is laid down in specific patterns of input connections. It is usually assumed that these patterns of input connections emerge by activity-dependent development [12-14]. It came as a blow to the idea that the structure of the selectivity maps emerge by activity-dependent self-organization of the input connections, when it was shown, that the maps which develop for the two eyes are very similar (if not identically) also when the animal never experienced vision through both eyes simultaneously [15,16]. Very recently it could be shown that the selectivity for ori-
973
Emergence of feature selectivity from lateral interactions in the visual cortex
b
. . . . . ::~::~:i::i::: ~:;~:
iiii iiiiii
I I/~-,~\
iii~i~i
I I/~
: ,~:~, .....
...................
. . . . .
1
...... ,I m m
Fig. 2. (a) Columnar organization of the visual cortex, as proposed in the ice-cube model by Hubel and Wiesel [3]. The cortical surface appears to be divided into columns sharing similar response properties, as e.g., the orientation (colored bars) or the ocular dominance (R = right eye, L = left eye) of a stimulus. (b) "Real" orientation maps, however, show a more complicated architecture with singularities (pinwheels) and fractures (optical imaging of area 17 of the cat, data from T. Bonhoeffer).
1.6
-Oc(DO
i
.N'~ t.,
-•8 ,..E c
..LI_
r-n" E~
ill
i
I
II
mild
an
i
i
. 'iI~
0.8 i
o.~
0.4
E "~
0.2
n " ~,.~,
9 II ,In
gl
i.2
> OO
u_,~
HI II
o
i iBm i~lI
i
im
5"~m.Hi b. |
o
ali
i
~s
4s
eT.s
9o
C h a n g e in Orientation (degrees) Fig. 3. The distance of the centers of two neighboring receptive fields is a linear function of the difference of preferred orientations of the corresponding neurons. The slope of this function is approximately 1 receptive field per 90 ~ of OP change, such that neurons with orthogonal orientation preferences have nonoverlapping receptive fields [4,5]. entation also emerges identically for both eyes, even if the eyes had no visual input at all [17]. These experimental results challenge the models for the development of feature selectivity which are based on the activity-dependent a d a p t a t i o n of the patterns of afferent connections alone. It appears that some yet u n k n o w n factor in the cortex seems to be responsible in the very young animal for the identity and the r e m a r k a b l e stability of the development of orientation maps under the various rather drastical
974
U. Ernst et al.
receptive fi
Fig. 4. Classical model for orientation preference in a simple cell, adapted from Hubel and Wiesel [1]. One neuron in the cortex (right) receives synaptic afferent input from many thalamic neurons (middle), whose receptive fields are aligned to match a specific stimulus orientation (left). Within this framework, inputs from simple cells with parallel receptive fields, converging onto a single postsynaptic cell, could explain the phase-independent response of complex cells to moving gratings.
experimental conditions. It has been speculated that the factors are genetic in origin [15]. It is, however, unclear what this could mean (see e.g. [18]). In this chapter, we review models for the dynamics of activities in cortex which are based on stereotyped intracortical interactions. These models stood at the very beginning of the mathematical description of collective phenomena in the brain. Nevertheless, the dynamics of these models may have far-reaching consequences and can explain a variety of experimental findings. In particular we will show that they might provide a novel explanation for the early development of feature maps in the visual cortex.
2. Models
The organization of the brain has a high degree of complexity. Every square millimeter of the cortical sheet contains over 106 neurons, each of them receiving input from at least 104 synapses. If we want to understand how these units interact and generate the dynamics seen in the experiments, it is very useful to reduce the amount of complexity. The goal is to build a model which is minimalistic in the sense that it captures the properties we want to analyze, but lacks all the details which may obscure the basic mechanisms. Despite the huge number of neurons, it appears that if one is only interested in mean firing rates, nerve cells within an approximate 0.1 rnm x 0.1 mm-column of the cortex respond very similarly to stimuli. Therefore, one can attempt to analyze cortical dynamics in terms of population activities of certain groups (columns) of neurons. To our opinion, it is surprising how many experimental results may be explained by simulating and analyzing such a simple model of visual cortical circuitry.
Emergence of feature selectivityfrom lateral interactions in the visual cortex
975
2.1. Neuronal populations First, one has to derive differential equations for the fraction of excitatory and inhibitory neurons being active at time t at position r = (rx,ry) T in the neuronal tissue. These activities will be termed Ae(r,t) and Ai(r, t), respectively. Let us first consider the activation dynamics of a single column, neglecting the coupling to other columns in the cortex. In 1972, Wilson and Cowan derived differential equations for
A (t) and A (t)
Ae(t + z) - [ 1 - f f
Ae(,')
x Se(fta(t Ai (t + v) = 1 -
- t')[weeAe(t')-wieAi(t')+Ie(t')]dt'),
Ai (t') dt' T
x Si(ft
ot(t - tl)[weiAe(t ') -wiiAi(t l) +Ii(tl)]dtl).
The fractions of active neurons at time t + z, where z denotes an absolute refractory period, are given by the products of the fraction of neurons having been not active for the time 9 (first term on the right-hand side) with the gain functions Se and Si. These gain functions model the excitability of the populations depending on the total synaptic input I. The synaptic input I has three contributions, one from each population with corresponding coupling constant wxx (x c {e, i}), and one from external sources as e.g., from the LGN. The first index of Wxx denotes the presynaptic, and the second index the post-synaptic population w~ is connecting to the function at models the synaptic response characteristics, which in the simplest cases is an exponential, ~(t) ~ exp(-t). These equations are not very convenient for computational purposes, so their complexity was further reduced by "time coarse-graining", an exponentially decaying time average over Ae (a similar expresssion is obtained for Ai):
Ae(t) "- ~1 A- - (t)
.-
1
ot(t- {)Ae({)d{,
f t
-
at',
oo
yielding the set of differential equations: dAe
Ze dt = -Ae q-- (ke - reA---~)Se(weeA--~- wieA--il-+ Ie), dAi
"Ci d---t-= -A~ + (ki - riA---~)Si(weiAe- wiiA---i~-+-Ii).
(1)
(2)
Without loss of generality, Eqs. (1) and (2) could be rescaled such that the time coarse-graining constants ke and ki become 1, ke = ki - 1. As the equations are freely scalable, we will omit to give absolute values for the parameters and variables. Only
976
U. Ernst et al.
at points where relations between parameters play an important role, we will discuss their implications for the biological length or time scales. F r o m now on, as in the original publication of Wilson and Cowan [19], we will use Ae (Ai) instead of the terms Ae (Ai). The functions Se and Si summarize the distributions of e.g., different neuronal thresholds or different synaptic weights, or the system's response to noisy inputs. It can be expected that Se and Si increase monotonically with their argument, and saturate above a certain activation level. If the distributions are unimodal, it has been shown that Se and Si indeed have the shape of a sigmoid function, as e.g., the logistic curve 1
1
Slog(I) := 1 + exp[--Zs(I -- If)] - 1 + exp(Zslf) " Another common choice is a threshold-linear function Slin Slin(I) " - s . ( I - / f )
for I > If, and 0 otherwise.
Equivalently, Se and Si could be interpreted as the firing rate of a single neuron depending on its total synaptic input. In the original work of Wilson and Cowan [19], the authors consider the dynamics of the fraction of neurons being active at time t, and therefore the activation or gain function has to be limited to values below 1. Alternatively, considering the mean activity (firing rate) of a neuronal population, the gain function could also have a threshold-linear shape as Slin. In Fig. 5, four different gain functions are plotted: the logistic curve Slog, a threshold-linear gain function Slin, the gain function Siaf of an integrate-and-fire neuron (see chapter by Meunier and Segev in this book), and the response curve for a pyramidal neuron Spy~ in the cortex [20]. It can be seen that all these curves share some common properties. Up to some firing threshold If, the response is small, then increases with some average slope s, until some saturation value is reached. Note, however, that the response curve typical for cortical neurons is surprisingly linear, and that most cortical neurons normally do not operate close to their saturation l e v e l - so Slin seems to be a good approximation to work with. At this point, we should note that the rate dynamics as described above has its functional limitations: the dynamics of single spikes and their correlations within one population or column have been averaged out. As can be seen in Eqs. (1) and (2), the time constants for the population dynamics are re and zi, and fast oscillatory responses on a smaller timescale, typically occurring when a stimulus is switched on [21], are not obtained by using the simplified dynamics. It has been shown that the full dynamics is able to switch instantaneously between different attractors of the population dynamics [22], which may lead to interesting dynamical phenomena beyond the scope of this chapter.
2.2. Dynamics of one column By setting dAe/dt =-dAi/dt- O, one can find the fixed points and analyze their stability in the phase space of Ae and Ai. To demonstrate the qualitative differences
Emergence of feature selectivity from lateral interactions in the visual cortex
100
977
j s." /
N
::
50
f.0
_ .... ~ t
0
~S
I,SJ .Z'," 9
---.... ......
,
1 I[nA]
Slog Sgr S,,o
Siaf
2
Fig. 5. The response curve or gain function S models the excitability or rate of one neuron in dependence of its input current I. Above some threshold, S normally increases monotonically with I. Sial, integrate-and-fire neuron; Slog, logistic curve as used by Wilson and Cowan [19]; Spyr, pyramidal cell in the cortex [20] and Slin, threshold-linear neuron. Parameters are s = 7 0 H z / n A and I f = 0 . 3 n A and Slin; s = 1 4 0 H z / n A and I f = l nA for Slog; s = 1/(RCIf) = 60 Hz/nA and If = 0.3 nA for an integrate-and-fire neuron obeying the differential equation RC d V / d t = - V + R I for its membrane potential V (C = membrane capacitance, R = membrane resistance). between a threshold-linear and a sigmoidal gain function, we first consider only one population (excitatory or inhibitory) within a column, with the simplified dynamics of Eqs. (1) and (2) and r - 0 dA
"c - ~ - - A + S ( w A + I).
(3)
The fixed points can be found by solving Ao - S(wAo + I) for A0; if one has to deal with two populations, one has to solve a system of two fixed point equations. One population. There are two different regimes depending on the gain parameter s and the coupling constant w; the weak coupling regime sw < 1, and the strong coupling regime sw >~ 1. In the weak coupling regime, with both S = Slin and S = Slog, we find a stable fixed point A0 whose absolute value increases monotonically with increasing input I (Fig. 6a and b). The only difference between the two gain functions is that with Slog, A0 saturates at higher values of I. In the strong coupling regime, with S = Slin, there is either one stable fixed point at A0 - 0, or the activity increases beyond all limits. With S = Slog, depending on I either one stable fixed point near 0, one stable fixed point near maximum activity, or both of these fixed points coexist. This behavior results in hysteresis: with intermediate I, depending on the initial or previous activation level A, one of the fixed points either at the low activity or at the high activity level is reached (Fig. 6c and d).
978
U. Ernst et al.
b
< 0 0
0
,//
/
<
||
t .dr
0 I
qh,,J
0 I
Fig. 6. Fixed points of the dynamics of a single column. The figures show the fixed points A0 (open circles, stable; stars, unstable) as revealed by the intersections of the gain functions S with the identity. For (a) and (c) the threshold-linear gain function Slin, and for (b) and (d) the sigmoidal gain function Slog was used in (a) and (b) the weak coupling regime, and (c) and (d) in the strong coupling regime. The colors red, green, and blue mark increasing input levels I. In (a) and (b), the dynamics has one stable fixed point for each/, while in (c), only the fixed point at A = 0 may be stable - otherwise, the activity diverges. In (d) the dynamics can have up to two fixed points with medium input levels; here, the system undergoes hysteresis and the activity is limited by the saturating gain function.
Two populations. With two populations, Eqs. (1) and (2) yield two isoclines intersecting at the fixed points of the activation dynamics. Their stability can then be derived by linearization of Eqs. (1) and (2) around these fixed points and solving the characteristic equation. Using Sli,, the activation dynamics is very simple. There is no hysteresis in the system, and either a stable fixed point exists at Ae >/0 and Ai >/0, or the activation diverges because the interaction is too strong. With Slog, there is the possibility of multiple hysteresis phenomena. Increasing the constant input, one finds either one, two, or three stable fixed points existing simultaneously (Fig. 7a and b). The existence of hysteresis is very important, because it can implement a form of short-term memory: brief pulses of external input can excite a column, which remains activated after the input has decayed, due to the dynamics of the internal couplings. Additionally, there is a parameter range where the model can exhibit (damped) oscillations in the population activity. These solutions of the differential equations correspond to the existence of limit cycles in phase space. Limit cycles occur if there is only a single unstable fixed point of the dynamics, and if the input is sufficiently
Emergence of feature selectivity from lateral interactions in the visual cortex
b
0.4
0
A.
979
0.5
0.4
-0.5
0
I
I
1
e
Fig. 7. Hysteresis in the Wilson-Cowan model. (a) shows intersecting isoclines for three different excitatory input currents Ie (red, Ie - 0; green, Ie = 0.5; blue, Ie = 1). Fixed points are marked as in Fig. 6. (b) Depending on the excitatory input current Ie, either one (blue), two (green), or three fixed points (red) are stable, and the initial conditions determine which one is selected. Adapted from Wilson and Cowan [19], parameters were Wee = 13, Wie - - 4, Wei - - 2 2 ,
Wii "- 2, Se - - 1.5,
Si = 6, If, e ---- 2 . 5 ,
If, i = 4 . 3 ,
re = ri - - 1, "1;e - - 10, ~i = 5,
and Ii = 0. high. It can be shown that limit cycles occur naturally in coupled neuronal populations [19]. It has been speculated, that these oscillations explain the rhythmicity seen in the frequency bands of E E G recordings, or other oscillatory phenomena in brain activity. In the following, we will concentrate on nonoscillatory solutions.
2.3. Coupled columns To simulate more than a local cortical column, Wilson and Cowan extended their model and examined a chain of coupled neuronal populations [23]. The activation A is now a function of time and space, A(t) ~ A(r, t), and the synaptic input now depends not only on the activities of the populations in the same column, but also on the activities in all other columns. The products in Eqs. (1) and (2) therefore have to be replaced by the convolution of the activities with the corresponding coupling kernels Wee (r - r'), Wei (r - l't), ~/ie (r - r'), and H/ii(r - r'), where [W 9A](r, t) := j c
W(r - r')A(r')dr' /7(
Te
~Ae(r,
t)
~-----------~- -Ae(r, t) + (ke - reAe(r, t))Se([Wee * Ae] (r, t)
-[g(e ~i
* Ai](r, t ) + Ie(r, t)),
(4)
8Ai(r, t) ~----------~ = -Ai(r, t) + (ki - riAi(r,t))Si([~i * Ae](r,t)
-[Wii * Ai](r, t ) +
Ii(r, t)).
The delay of synaptic transmission from r to r ~ has hereby been neglected.
(5)
980
u. Ernst et al.
The choice of the coupling functions Wxx is crucial for the dynamics of the system. A common assumption is that excitatory couplings prevail on short distances [[r - r']], while inhibitory interactions dominate on larger distances. This leads to a coupling function having the shape of a Mexican hat (Fig. 8). It is questionable if this assumption is really fulfilled in the visual cortex. It has been shown that longrange horizontal connections spanning several hypercolumns exist [24--31], while inhibitory interactions have a limiting range of about one hypercolumn [32,33]. These long-ranging axons, however, are not distributed homogeneously but form dense clusters in columns having a similar orientation preference as the neuron from which they originate. Due to the typical structure of an orientation map in the visual cortex, it may still be possible that the interaction profile has indeed the shape of a Mexican h a t - at least in the young animal, where long-ranging excitatory connections have not been developed yet. For convenience, we will also assume that the coupling functions are chosen such that fc
d r ' W ~ ( r - r') = Wx~. TX
In many cases, it is reasonable to reduce Eqs. (4) and (5) introducing the following simplifications: first, let us assume that the neurons have a vanishing absolute refractoriness, r i - - r e - - 0 . Second, axons originating from one population should contact excitatory and inhibitory neurons in a given distance with equal relative probability, Wee cx Wei and Wie o( g~i- Our last assumption is that the remaining parameters and the gain functions in Eqs. (4) and (5) are identical for inhibitory and
r,.
x !
x v
-50
0 r-r x
50 x
Fig. 8. Excitatory couplings We(rx- r'x) (green) having a shorter length scale than inhibitory couplings ~ ( r x - r'x) (red) lead to a coupling function W ( R ~ - r'~)= We- N (black) having the shape of a mexican hat. Parameters of the coupling functions chosen (see Eqs. (7) and (8)) are We = 14, wi = 12.5, ere = 5.6, cyi = 10, and d = 1.
Emergence of feature selectivity from lateral interactions in the visual cortex
981
excitatory populations, re = r i , ke = ki, and Se = Si. Under these assumptions, Eqs. (4) and (5) are redundant and can be replaced by a single integro-differential equation
~A(r, t)
17
~---7---= -A(r, t) + S([We * A](r, t) - [ < 9A](r, t) + I(r, t)).
In the following section, we will assume a threshold-linear gain function S make the following choice for the coupling functions We and N:
_ We(r - r')
We
(2 g)d/2(Y d e x p
__
Wi
(2 rt)d/zcr~ exp
(r - r')
(,r-r',,2) 2 cy2
--
(r-r',,2) --
2 cy2
(6) =
Slin
and
(7) '
(8)
"
The results are qualitatively identical for other choices, exceptions will be discussed. s will be termed the gain, If the firing threshold, We the excitatory coupling strength, wi the inhibitory coupling strength, and Cre and cri, with ~e < c~i, the excitatory and inhibitory coupling length scales, respectively, d denotes the dimensionality of the neuronal tissue.
2.3.1. Dynamics of coupled columns Constant input." Linear, marginally stable, and diverging regimes. For simplicity, let us first consider a one-dimensional chain of length lx, with periodic boundary conditions. This chain is stimulated with a constant external input I ( r , t ) = I0 = const. > If. What activation dynamics do we expect from the model in Eq. (1.6)? It is simple to calculate that there exists one single spatially homogeneous fixed point A0 > 0, if wi > We - 1/s (for arbitrary coupling functions) A0(r, t) = A0 =
(Io - l d
1 - S(We -
wi)"
Is this fixed point stable? After linearization of the dynamics and solving the characteristic equation by applying the Fourier transformation, one obtains the following spectrum of eigenvalues )v(k):
1 I
a(k) = --+-
,17 ,17
weexp
If there exists one k for which stable. For our choice of eye < )~(kl) is always negative for V ' - c r e2/cr2 < 1. Therefore it is This is the case if
2
-wiexp
( 2 2)1 -
2
.
)v(k) > 0, then the homogeneous fixed point is un~i, 2 has one or two extrema at kl = 0 and k2 > 0. A0 > 0, and k2 does not exist for wi < we/V, crucial to know if)v(k2) is also negative ifwi > we/V.
U. Ernst et al.
982
wi > We ~" V. [(1 - V)-s] -}-1.
(9)
Summarizing these considerations, we have three stability conditions B1-B3: 9 B 1: Existence of a homogeneous fixed point for Wi > We - - 1Is. 9 B2: The homogeneous fixed point is stable for wi < we/V. 9 B3: For wi > we/V, the homogeneous fixed point is stable if condition (9) is fulfilled. Due to these conditions, the phase diagram of the activation dynamics depending on the lateral coupling strengths shows three different regions (Fig. 9a). If the inhibitory coupling is strong enough, the homogeneous fixed point is stable and every perturbation decays exponentially (linear regime) (Fig. 9b). If the excitatory weights become much stronger, the fixed point becomes unstable and the smallest perturbation of the homogeneous external input leads to an exponentially increasing activity. Due to the threshold in the gain function S, however, there is a subregion where the activity converges into an inhomogeneous stable state, the so-called marginally stable regime (Fig. 9c), which will be described later. If the excitatory interaction becomes too strong, the nonlinearity of the coupling function does not suffice to limit the diverging activity (diverging regime). We want to emphasize that this divergence could be avoided by using a saturating gain function as e.g., 5'1o8. In cortical neurons, however, neurons barely operate near their limits - so the distinction between the divergent and marginally stable states could help to find an operating regime where the activity level of the cortical neurons is regulated only by the network, and not by the internal dynamics of a single element.
Nonconstant input in the marginally stable regime. In the linear regime, spatially inhomogeneous input leads to a similar activity distribution, because afferent input dominates over lateral feedback. In the marginally stable state, however, the activation dynamics becomes much more interesting. Let us assume that we have a neuronal chain of length lx with periodic boundary conditions, and that there is a positive perturbation in the input at the position r~x = lx/2. Two possible realizations of this input would be:
' 1 = 10(1 +
cos(2
(lo)
(rx -
Due to the excitatory interactions prevailing on short distances, this perturbation becomes enhanced, while the activity in the surround becomes suppressed by the inhibition. This leads to a localized activation blob centered around 1~/2. If the afferent input is suprathreshold everywhere, other blobs appear in a specific distance which is determined by the length scales of the excitatory and inhibitory interactions. In Fig. 9c, this distance is about half of the size of the chain, such that two activation clusters appear. This picture does not change significantly in higher dimensions: in a two-dimensional cortex, the activation clusters typically
Emergence of feature selectivity from lateral interactions in the visual cortex
983
marginally stable regime
100
0 -2
-4 0
1
2
0
linear regime
1
2
50 divergent regime 0 0
I
n
,
i
20
30
W
b
20
0
,
i
40
50
e
1
2
60
c
200 I
n
m
m
m
m
m
|
m
u
m
u
m
u
<
m
m
m
m
m
n
< 0 0
50 r
X
I O0
(I)
0 0
.=
50 F
100
X
Fig. 9. (a) Phase diagram in dependence of the excitatory and inhibitory coupling strengths We and wi. The conditions B1-B3 partition the phase space in three regions: in the upper region, the homogeneous fixed point is stable (b), and in the lower region, no fixed point exists and the activity diverges exponentially. The region in between shows a different behavior. Here, the homogeneous fixed point is unstable, so each minimal local perturbation of an otherwise constant synaptic input leads to pattern formation, which is stable (c) or unstable, depending on the actual strength of the inhibitory coupling. The green line separates the linear from the marginally stable, and the red line marks a lower boundary of the marginally stable regime. The blue lines are numerical estimates of the phase boundaries. (b) and (c) show successive activity profiles A(rx, t) after the system has been stimulated with a homogeneous input with a small perturbation, at times t = 1.25, 45 (dotted), t--3.75, 50 (dashed), and t = 50, 55 (solid), respectively. Parameters for the simulation were We = 30, 45, We = 80, 32, lx = 100, CYe • 5.6, (5"i - - - 10, z = 5, s -- 100, I = 1, At = 0.25. The insets in (a) display typical eigenspectra X(k) for the three cases. a r r a n g e in a h e x a g o n a l p a t t e r n (Fig. 10). If the inhibitory interaction extends over larger distances, or even does n o t decay significantly as in [34], then the n e t w o r k implements some sort o f a winner-takes-all n e t w o r k with global inhibition. Only one blob will a p p e a r at the location with strongest feedback and afferent input, a n d all other n e u r o n s will be inhibited.
U. Ernst et al.
984
80
m '~'
e 0
' 40
~>~
0
t O
O
p 0
O 9
O O 0 0
:i4
4 0
O
O 0
A(r x, ry)
O O O
,
b:.
O
O ql
0 O 0 rx
9
0 0
9
Q
4
Fig. 10. Stationary activation pattern A(rx, ry) in a two-dimensional homogeneous model cortex, obtained with a uniform stimulus. The blobs arrange in a regular hexagonal pattern. The activity A is coded in shades of grey, see color bar. Simulation parameters are Ix = 8 8 , ly = 105, W e = 45, W i = 60, O"e = 2.8, O'i = 5 , T = 5 , S = 100, I 0 - 1, and At = 1.
There are two other interesting dynamical states in this model leading to propagating waves or blobs of velocity ~'~b [35-37]. In the first state, a movement of a periodic stimulus with velocity ~s as modeled e.g., by I = I0(1 + e cos(2 ~(0.5 + ~st +
rx/lx))
(11)
drags the blobs into the direction of movement. Depending on the time scale of the lateral dynamics and the modulation amplitude ~ of the stimulus, the activation either follows the stimulus perfectly with [~b -- fls, or misses some cycles (fib < fls). In the second case, a small asymmetry in the input leads to a self-propagating wave. Here, a necessary condition is ~ << 0, because otherwise the blob becomes pinned at the position of maximal input. Traveling waves typically occur when the inhibitory input is large [37].
3. A simple model of visual cortex In the previous section, we discussed that a simple model of coupled neuronal populations can exhibit interesting dynamical properties: from simple fixed points to pattern formation, and from network oscillations to traveling waves. Here, we want to make the connection from the model to various experimental observations in the primary visual cortex. We will show that the rate dynamics formalism can explain a variety of dynamical properties of real neurons in the visual system.
Emergence of feature selectivity from lateral interactions in the visual cortex
985
3.1. Orientation preference Neurons in V1 respond perferentially to moving stimuli having a specific orientation 9 . Recording the firing rate A of a cortical neuron depending on the orientation of the stimulus in its receptive field yields the so-called orientation tuning curve A (~). It has been observed that the width of A(~) is nearly independent of the contrast or stimulus intensity I (orientation tuning invariance). One possible mechanism for orientation preference was proposed by Hubel and Wiesel: one cortical neuron receives input from a number of thalamic neurons whose receptive fields are aligned to match one specific stimulus orientation, see Fig. 4. Although the synaptic input indeed shows a tuning matching the preferred orientation of the cortical cell, the aspect ratio of this tuning is far too weak to explain the sharpness of the orientation tuning in cortex (see, however, [38,39]). One method to sharpen this tuning is to introduce some inhibitory intracortical interaction suppressing the response of the cell to nonoptimally tuned stimuli, but still the data could not be explained because the orientation tuning width now depends on the stimulus intensity. A better idea is to use excitatory and inhibitory intracortical interactions, as has been demonstrated in the work of Ben-Yishai and Sompolinsky [10]: Let us again assume we have a one-dimensional chain of coupled cortical columns with length lx. We now identify this chain with a hypercolumn in the visual cortex by assuming a linear dependence between position and orientation preference, ~(rx)= rcrx/lx. This assumption is plausible because orientation preference varies smoothly with position within the cortex, as can be seen in typical examples of orientation preference maps (see Fig. 2b). Accordingly, each stimulus with orientation ~s within the receptive field of this hypercolumn is modeled as a thalamic input
I(rx) = I0[1 + gcos(2((I)(rx) - (I)s))] to the cortex, z << 1 is a weak bias of the input, as e.g., provided by the spatial arrangement of the receptive fields of the LGN cells as proposed by Hubel and Wiesel, encoding the orientation preference of the cortical columns. From the previous section, we know that the model, working in the marginally stable regime, will enhance this perturbation e in the afferent input, leading to a stable activation blob centered around the neuron with orientation preference ~s. Columns with orientation preferences not similar to ~s become suppressed. As the model is symmetric with respect to an exchange of (I)s and ~(rx), the activation profile A(rx) can be identified with the orientation tuning curve A(~s). From Eq. (6), it is easy to see that the shape of A(~s) does not depend on the stimulus contrast I0, provided that the afferent bias ~ is weak. From this evidence, it can be concluded that lateral interactions may play an important role in shaping the responses of cortical neurons in V1, leading to contrast-invariant tuning curves. Within this framework, some other properties of the visual system could be explained easily. It is well known that small angles between two adjacent line segments are overestimated (Fig. 1 l a). Due to the inhibitory interactions, two activation blobs can coexist only if they are separated by a distance roughly equivalent to
986
U. Ernst et al.
a
b
< 0
r
•
Fig. 11. (a) Small angles between line segments are overestimated by our visual system; we perceive a tilt in the vertical lines despite they have been laid out in parallel. (b) Possible explanation for the phenomenon shown in (a). If difference in orientation A~ is identified with difference in cortical position Arx (compare with Fig. 2a), two line segments crossing at small angles lead to a bimodal input distribution I(rx) (thin line). The distance between the two maxima in this distribution is smaller than the minimal distance between two blobs, which appear shifted to the left or to the right, relative to the input. Thus, the position of their maxima in A(rx) may suggest for higher cortical areas that the stimulus displays a larger angle than present.
the length scale of these couplings. Choosing a stimulus with two maxima at angles corresponding to a smaller cortical distance, the model predicts that either a single blob appears, or two blobs appear at their minimal interblob distance. In the latter case, this repulsive effect could be interpreted as a much wider stimulus angle by higher cortical areas. Besides this phenomenon, other dynamical properties of the columnar activation dynamics have been directly related to the perception of illusory movements of ambiguous stimuli [40]. In these experiments, the observer sees flashing spots positioned on the corners of an invisible even-sided rectangle. In the first stimulation condition the two leftmost and the two rightmost corners are flashed alternately; in the second condition, the two uppermost and the two lowermost corners. The observer perceives these stimuli as an illusory movement in the horizontal and vertical directions, respectively. In a third stimulus condition, however, the opposite corners (upper left and lower right, or upper right and lower left) are flashed alternately. This stimulus is ambiguous: it can be perceived either as a horizontal or as a vertical illusory movement of the spots. Typically, an observer switches between these two percepts periodically. One could imagine that in the visual cortex each flashing light evokes one activation blob. If the stimulus changes, the blobs, instead of vanishing and reappearing at the new positions of the spots, could be dragged to their new positions. This dragging of an activation cluster over the cortical surface could be interpreted as an illusory movement of the stimulus itself. With the ambiguous stimulus condition,
small perturbations would determine in which direction the blobs are dragged first, horizontally or vertically, and this behavior would persist until this attractor of the dynamics is left due to, e.g., external influences or noise.
3.1.1. Consequences of disorder

The pattern formation mechanism of the model has some other interesting properties which may have far-reaching consequences for cortical development. As we have seen, a small perturbation in the input becomes amplified and leads to the formation of a blob which is centered around the maximum of the input. What happens if there are also perturbations in the couplings or in the density of the neurons, as one would expect in a "real" cortex? To investigate this question, we applied a small random jitter η(r_x, r_y) to the weight matrix, such that there are positions where the coupling is stronger than at others. In Fig. 12, we simulated the model with and without jitter, applying the same input centered at different cortical positions. As can be seen in Fig. 12a, the neuronal response follows the stimulus exactly: in the first case, the system is marginally stable in the sense that a stable solution centered at position r_x is also a stable solution when centered at position r_x + Δr_x. In the second case (Fig. 12b),
Fig. 12. Superpositions of activation profiles A(r_x) of a chain of coupled cortical columns in response to different afferent inputs (Eq. (10)). Each of the afferent inputs has a maximum at one specific location r'_x. These locations have been chosen to be distributed equidistantly (in the simulations, the input has been shifted by equal distances, with periodic boundary conditions). While in (a) the network was homogeneous, in (b) random disorder has been introduced by applying a random displacement of η(r_x) = rand(1) to the columnar positions. In (a), the positions of the clusters are exclusively determined by the input maximum; in (b), the marginally stable continuum of attractors has been broken up into a finite set of attractors located at positions with maximal cortical feedback. Here, system disorder and input perturbation together determine the neuronal response. The response to the stimulus centered in the middle of the chain is marked in red. Parameters were W_e = 45, W_i = 60, σ_e = 20, σ_i = 40, τ = 5, s = 100, I_0 = 0.5, ε = 0.05, Δt = 1, σ_1 = 5.
however, this continuum of stable solutions breaks down into a discrete set of attractors: the position of the blob is now determined both by the maximum or the specificity of the input and by the position in cortex where the (excitatory) feedback is strong [34]. This clustering phenomenon is qualitatively independent of the type of disorder: random neuronal positions, distributions of neuronal parameters, variations in the synaptic weights or in the neuronal density - all of these sources of quenched disorder lead to the same observations. It should be noted that such variations are to be expected in a real biological system, so this phenomenon seems to be the generic case. As we will see in the next section, it is of interest how the inhomogeneities or impurities in the system affect the response not only to static, but also to moving stimuli. Without inhomogeneities, we already mentioned that a movement of the stimulus leads to a movement of the cortical activation pattern. What are the factors determining the velocity Ω_b of this pattern? First, the amount of disorder η determines the strength of a stimulus needed to displace the blobs. Also, the modulation amplitude ε of the stimulus, and its period Λ compared to the length scale of the blob pattern, have an impact on the movement. We simulated Eq. (6) on a two-dimensional cortical surface, applying input of the form

I(r, t) = I_0 [1 + ε cos(2π(r_x + Ω_s t)/Λ)],

thus realizing a drifting sinusoidal grating close to the stimuli used in experiments. Fig. 13 shows the resulting velocities Ω_b for fixed η and different Λ. It can be seen that, with all other parameters held constant, movement is fastest if Λ has the same order of magnitude as the length scales of the lateral couplings. Additionally, movement is most pronounced if the velocity Ω_s is small enough to match the time constant of the dynamics, and if the stimulus modulation amplitude ε is strong, which is what one would expect. The aforementioned condition on Λ, however, requires an intuitive explanation. With small grating periods, it is clear that the stimulus modulation is smoothed out by the lateral feedback. But with a wide grating, the blobs also remain stationary. This counterintuitive behavior only occurs if multiple blobs are present. While one blob is dragged in one direction, the next blob positioned in the direction of the movement may receive more afferent input at its opposite side. Thus the two blobs counteract each other while trying to decrease their distance, which is not possible due to the existence of a minimum distance set by the length scales of the lateral couplings. While the blobs seem to remain stationary if one averages the activation dynamics over time, the stimulus movement nevertheless induces small oscillatory movements around the blobs' positions. The dynamics resembles a particle in a potential well driven by a periodic force too weak to enable the particle to escape. This behaviour might underlie the directional specificity of cortical neurons, as explained in the next sections.
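The qualitative behavior described above is easy to reproduce numerically. The following sketch is not the chapter's Eq. (6) itself: it assumes a saturating threshold-linear rate dynamics of the Wilson-Cowan type on a ring of columns, a difference-of-Gaussians ("Mexican-hat") coupling with the strengths W_e, W_i and length scales σ_e, σ_i quoted above, and a multiplicative random jitter of the weight matrix as the source of quenched disorder; the gain function and all remaining parameter choices are illustrative.

```python
import numpy as np

# Minimal sketch (not the chapter's exact model): a ring of cortical columns with
# Mexican-hat coupling, a saturating threshold-linear gain, and optional quenched
# disorder as a multiplicative random jitter on the weight matrix.
L, tau, dt, steps, s = 100, 5.0, 1.0, 400, 100.0
We, Wi, sig_e, sig_i = 45.0, 60.0, 20.0, 40.0
I0, eps, jitter = 0.5, 0.05, 0.1

x = np.arange(L)
d = np.minimum(np.abs(x[:, None] - x[None, :]), L - np.abs(x[:, None] - x[None, :]))
W = (We * np.exp(-d**2 / (2 * sig_e**2)) - Wi * np.exp(-d**2 / (2 * sig_i**2))) / L

rng = np.random.default_rng(0)
W_jittered = W * (1.0 + jitter * rng.standard_normal(W.shape))   # quenched disorder

def blob_position(A):
    """Circular center of mass of the activation profile."""
    ang = 2 * np.pi * x / L
    return (np.angle(np.sum(A * np.exp(1j * ang))) % (2 * np.pi)) * L / (2 * np.pi)

def simulate(W, center):
    """Relax the rate dynamics for an afferent input weakly peaked at 'center'."""
    I = I0 * (1.0 + eps * np.cos(2 * np.pi * (x - center) / L))
    A = np.zeros(L)
    for _ in range(steps):
        A += dt / tau * (-A + np.clip(I + W @ A, 0.0, s))   # saturating gain
    return A

# Homogeneous couplings: the blob follows the input maximum (continuum of attractors).
# Jittered couplings: blob positions tend to cluster at sites of strong feedback.
for center in (20, 40, 60):
    print(center,
          round(blob_position(simulate(W, center)), 1),
          round(blob_position(simulate(W_jittered, center)), 1))
```

Comparing the printed blob positions for the homogeneous and the jittered network illustrates the transition from a continuum of attractors to a discrete set of pinned attractors.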
3.1.2. Emergence of cortical maps

The main result of the last section is that the position of the blobs is determined by the local strength of the intracortical feedback in conjunction with the local
Fig. 13. Movement Ω_b of the blobs, normalized to the movement of the stimulus, Ω_s, for several parameter combinations: (a)-(d) for Λ = 4, 8, 16, 32, respectively. Depending on the effective stimulus modulation amplitude ε and the stimulus velocity Ω_s, the blobs either remain more or less stationary (Ω_b ≈ 0) or are dragged along by the stimulus (Ω_b > 0). If Ω_b = 1, the blobs move with the same velocity as the stimulus. Parameters as in Fig. 10.
strength of the afferent input. If one presents different stimuli, such as gratings of various orientations, the pattern of blobs resulting from the lateral dynamics should be different for each orientation. This led us to the idea that both the orientation and direction selectivity of single neurons, as well as the shape of the corresponding maps, might emerge from a stereotyped pattern of intracortical connections instead of a bias in the afferent weights, and are therefore independent of visual experience. We tested this hypothesis on a two-dimensional model with random inhomogeneities. Each neuron has an isotropic input field receiving afferent stimulation from a specific circular region of the visual field (VF),
I(r, t) = (I_0 / (2π σ_aff²)) ∫_VF G(R) exp(−||R − R(r)||² / (2 σ_aff²)) dR,
where G(R) is the visual stimulus (typically a sinusoidal grating or an oriented bar) at position R in the VF, and σ_aff is the length scale of the arborization of the afferent connections. Due to the retinotopic organization of the cortex, we choose R(r) := r, and we also assume periodic boundary conditions for the visual field R to avoid perturbation of the results by geometrical constraints [18]. For these reasons, the VF necessarily had the same aspect ratio as the cortex, but it is possible to simulate different magnification factors of the retinotopic mapping by simply rescaling the stimulus. Without lateral connections within the network, the units do not exhibit any orientation or direction selectivity. We considered two types of stimuli G: full-sized sinusoidal gratings G_s, and localized spots and bars G_1. In the first case, the stimulus is a sinusoidal grating with period Λ, orientation Φ, and velocity Ω_s,

G_s(R) := 1 + ε cos(2π(q_x + Ω_s t)/Λ),   q_x := R_x cos(Φ) + R_y sin(Φ),

where ε models the contrast of the stimulus, R = (R_x, R_y)^T is a position in the VF, and t is the time. In the second case, the stimulus is a circular spot of size σ_1, centered at R_c,

G_1(R) := (1 − ε) + ε exp(−||R − R_c||² / (2 σ_1²)),
where R_c runs over all positions at which localized stimuli were presented.
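To make the afferent pathway concrete, the sketch below evaluates the input I(r) as a Gaussian-weighted integral of the stimulus over a periodic visual field, for both a drifting grating G_s and a localized spot G_1. The FFT-based circular convolution and the numerical parameter values are convenience choices, not the chapter's implementation.

```python
import numpy as np

# Sketch: afferent input I(r) as a Gaussian-weighted integral of a stimulus G(R)
# over a periodic visual field with R(r) := r.  Arborization width sig_aff and all
# other values are illustrative.
L, sig_aff, I0 = 128, 4.0, 10.0
Lam, eps, Phi, Omega_s, sig_1 = 18.0, 1.0, np.pi / 4, 0.2, 5.0

Rx, Ry = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")

def grating(t):
    """Full-field sinusoidal grating G_s with orientation Phi, period Lam, speed Omega_s."""
    q = Rx * np.cos(Phi) + Ry * np.sin(Phi)
    return 1.0 + eps * np.cos(2 * np.pi * (q + Omega_s * t) / Lam)

def spot(Rc):
    """Localized spot G_1 of size sig_1 centered at Rc (periodic distances)."""
    dx = (Rx - Rc[0] + L / 2) % L - L / 2
    dy = (Ry - Rc[1] + L / 2) % L - L / 2
    return (1.0 - eps) + eps * np.exp(-(dx**2 + dy**2) / (2 * sig_1**2))

# Normalized Gaussian arborization kernel, applied by circular convolution.
off = (np.arange(L) + L / 2) % L - L / 2
k1 = np.exp(-off**2 / (2 * sig_aff**2))
kernel = np.outer(k1, k1)
kernel /= kernel.sum()

def afferent_input(G):
    return I0 * np.real(np.fft.ifft2(np.fft.fft2(G) * np.fft.fft2(kernel)))

print(afferent_input(grating(t=0.0)).mean())        # drive from the grating
print(afferent_input(spot(Rc=(64, 64))).max())      # drive from the localized spot
```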
3.1.3. Orientation preference maps

Similar to the experimental procedure, full-sized gratings moving in N = 16 different directions covering the full circle, Φ(n) = 2πn/N, were presented to the network. For each of these stimuli, after an initial time interval of T_0 = 100, which was long enough for the activity patterns to build up, a single-condition activity map A_n(r), n = 1, ..., N, was obtained by averaging the resulting activities A_n(r, t) over a time interval of T = 800 time steps of Δt = 0.25 (integration with a fourth-order Runge-Kutta scheme),
A_n(r) = (1/T) ∫_{T_0}^{T_0+T} A_n(r, t) dt.
As predicted, a different activation pattern was obtained for each stimulus. For a specific neuron, this results in different activation levels depending on the orientation of the stimulus: the neuron spontaneously exhibits an orientation and direction preference. The preferred direction Θ(r) is obtained by vectorially summing the averaged activities over all directions of motion. Similarly, the preferred orientation Φ(r) is obtained by the same procedure after first averaging the activity over the two opposite directions of motion for each orientation of the grating (see e.g. [8]),
O(r) = (1/N) Σ_{n=1}^{N} A_n(r) exp(i Φ(n)),

D(r) = (1/N) Σ_{n=1}^{N} A_n(r) exp(2 i Φ(n)),

Θ(r) = arg{O(r)},   Φ(r) = arg{D(r)}.
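The vector-averaging read-out can be written in a few lines. The sketch below assumes a stack of single-condition maps A_n(r) (synthetic placeholders here) and computes the complex sums O(r) and D(r) and the resulting preference maps; halving the argument of D(r) maps the doubled angle back onto the π-periodic range of orientations, a common convention rather than a prescription taken from the text.

```python
import numpy as np

# Sketch: preferred direction/orientation from N single-condition maps A_n(r).
# A_n(r) are random placeholders here; in the model they are time-averaged
# responses to gratings drifting in directions Phi(n) = 2*pi*n/N.
N, Lx, Ly = 16, 64, 64
rng = np.random.default_rng(1)
A = rng.random((N, Lx, Ly))                       # synthetic single-condition maps
Phi = 2 * np.pi * np.arange(N) / N                # stimulus directions

O = np.tensordot(np.exp(1j * Phi), A, axes=1) / N     # vector sum with exp(i*Phi)
D = np.tensordot(np.exp(2j * Phi), A, axes=1) / N     # vector sum with exp(2i*Phi)

theta_pref = np.angle(O)          # preferred direction (2*pi-periodic)
phi_pref = np.angle(D) / 2.0      # preferred orientation (pi-periodic after halving)
direction_strength = np.abs(O)
orientation_strength = np.abs(D)
print(theta_pref.shape, phi_pref.min(), phi_pref.max())
```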
Why do the responses of the columns show a preference for certain stimulus orientations, although there is no structure or bias in the afferent connections? In terms of the symmetry breaking introduced by small inhomogeneities in the weight matrix, each stimulus selects a different subset of all possible attractors. The feedback of the lateral couplings both accentuates the irregularities in the system and applies a spatial bandpass filter to the cortical response [41,42]. This bandpass filtering by the Mexican-hat type of interaction is responsible for the emergence of orientation preference maps. The neurons not only acquire a preference for one stimulus orientation; orientation preference also changes smoothly across the cortical surface. The excitatory interactions are responsible for the smoothness of the maps (cooperation), while the inhibitory interactions realize a sort of competition. As is typical for the mapping of a periodic quantity onto a manifold, the orientation preference map Φ(r) shows characteristic singularities such as pinwheels (places where cells of all orientation preferences can be found close together) and fractures (elongated curves along which cells abruptly change their orientation preference), see Fig. 14. In this respect, the model is similar to the algorithms proposed in [41,42], which explain the structure of cortical maps by low-pass filtering noise. In the model described here, however, the filter is related to the dynamics of biologically plausible lateral interactions, which can also account for experimental evidence such as the contrast invariance of orientation tuning and the other neuronal properties described in the previous sections. Most importantly, the positions of the active patches were robust against random initial conditions due to the stabilizing effect of the inhomogeneities in the lateral connections [43].

3.2. Direction preference maps
Having a closer look at the single-condition maps A_n, we see that the neurons also exhibit a preference for a certain direction of stimulus movement (Fig. 17a). The corresponding direction map (Fig. 15) closely resembles experimental data (Fig. 16). Where does this preference come from? As we already know from the preceding section, moving the stimulus implies a periodic movement of the activation clusters. The inhomogeneities in the network now introduce barriers, like hills in a potential landscape, as in the example of the particle in a potential well discussed above. If the force of the
Fig. 14. (a) Single-condition maps A_n for gratings drifting in n = 8 different orientations (see colored bars). (b) Orientation preference map Φ(r) obtained by vectorially summing the single-condition maps shown in (a). Parameters were l_x = l_y = 128, σ_e = 5.6, σ_i = 10, W_e = 45, W_i = 60, σ_aff = 4, η = 1 (white noise), I_0 = 10, s = 100, τ = 5, Δt = 1, Ω_s = 0.2, and Λ ≈ 18, which is approximately the size of a blob.

stimulus movement does not suffice to drag the blob over the potential barrier into the basin of the neighboring attractor, the stimulus movement will only induce an oscillation of the blob around a position which, in the long-time average, is identical to its stationary location. In Fig. 13, there exist regions where the blob moved, Ω_b > 0, or remained stationary, Ω_b = 0. The conditions supporting a fast movement are:
• a strong modulation ε of the sinusoidal stimulus, because of the competition between the stimulus movement and the localization strength of the dominant lateral interactions,
Fig. 15. (a) Direction preference map Θ(r). The preferred direction is shown color-coded according to the color bars on the left. (b) Relation between direction and orientation map. Discontinuities in the orientation preference map, as obtained from the gradient transform grad(Φ) = √((∂Φ/∂r_x)² + (∂Φ/∂r_y)²) with periodic boundary conditions and periodic argument Φ(r), are coded in shades of yellow, superimposed on the map of the directional selectivity |D(r)| coded in grey shades. Dark colors represent low, and bright colors represent high absolute values (normalized color table). Same parameters as in Fig. 14.
Fig. 16. (a) Direction preference map and (b) relation between direction and orientation map found in experimental studies by Shmuel and Grinvald [8]. Same representation as in Fig. 15; the sizes of the rectangles are approximately 3 mm × 2.5 mm.

• a stimulus velocity Ω_s matching the time scale of the rate dynamics, and
• a grating period Λ in the same range as, or larger than, the typical interblob distance.
Because all of these conditions have to be fulfilled to move the blobs over an inhomogeneous cortex, it is very improbable that the blobs will move in a regime where the lateral interactions are strong. Instead, the small shifts of the blobs within their basins of attraction will most likely lead to a direction preference: moving the stimulus to the left, the average activation is shifted to the left, and moving the stimulus to the right, the average activation shifts to the right. This effect provides a novel explanation
Fig. 17. (a) Two activity maps A±(r) obtained by stimulation with a grating moving up (top left) or down (top right), and the mean activation A(r) = (A+ + A−)/2 for stimulation with that orientation (bottom). The single differential direction map is indicated by the yellow and green arrows, their lengths coding the selectivity A+ − A− for one of the two conditions. (b) The equivalent differential map from experiments by Shmuel and Grinvald [8]. Same parameters as in Fig. 14.
for the direction selectivity of the neurons which does not rely on special types of neurons [44] or on asymmetries in the afferent connections, leading either to a delayed summation of action potentials [45,46] or to a sharpening of these asymmetries by cortical
feedback [47] - here, the direction selectivity is caused simply by intracortical inhomogeneities. From these observations, it is possible to explain another experimental result. It has been observed that patches having similar orientation preferences split into two subpopulations having opposite direction preferences (Fig. 17). As a patch of similar orientation preferences corresponds to an activation cluster evoked by a specific stimulus orientation, it is obvious that the shift of the cluster in the two stimulus directions divides this patch into two subregions, each more strongly activated by one or the other direction of movement. Because the deformation is roughly symmetrical, there necessarily has to be a narrow region within the activated area where the activities for stimuli moving in either of the two directions are the same [48]. These narrow regions of weak directional preference typically connect pinwheels or fractures in the orientation preference map (Figs. 15b and 16b). A similar relationship between preferred orientation (PO) and preferred direction (PD) cortical maps was recently observed in an experimental study [8]. By subtracting the two activity patterns for opposite directions one obtains the differential map (Fig. 17a, length of arrows). Comparing the average amplitude of this map with the corresponding differential map for two orthogonal orientations (not shown), one finds that the orientation map is about twice as strong as the direction map, in agreement with the experimental results obtained for kittens [8]. The relative weakness of the PD map in the network is a consequence of the strong pinning of the activity patches at particular locations for a given orientation. As a result, the amplitude of the oscillatory movement of the patches, and thus the amplitude of the corresponding differential map, is small.
3.3. Receptive fields

Recently, several experiments have been carried out to determine the relationship between cortical maps and receptive fields. It has been found that, on average, receptive fields are nonoverlapping for neurons more than half a hypercolumn apart. That means that if one measures the receptive fields of neurons positioned along a straight line in the cortex, the receptive fields move by about two average receptive field diameters per hypercolumn, i.e. per change in orientation preference of 180°. In the model, receptive fields should not be confused with the afferent input fields. The input field only determines which afferent input a column receives, but the activation response of the column is determined by the concerted action of all neighboring columns. It has to be expected that the receptive field, as measured by the column's response to localized stimuli shown at specific positions in the visual field, does not match the shape of the input field. To obtain the receptive field sizes and shapes, we applied a small nonoriented stimulus G_1 at positions R ∈ VF on a regular lattice sampling the whole VF, thus leading to the activity distributions A(r, R). The receptive field at r was then analyzed as follows: with sign(x) = 1 for x > 0 and 0 otherwise, the sizes (average diameters) d(r) of the receptive fields, in units of the lattice unit length, were computed as
d(r) = 2 √( ∫_VF sign(A(r, R)) dR / π ).

Accordingly, the receptive field centers C(r) were given by

C(r) = ∫_VF R A(r, R) dR / ∫_VF A(r, R) dR.

It is known that even a very small localized stimulus provides broad subthreshold excitation to the cortical area [27]. To account for this observation, it is convenient to choose the stimulus radius large enough for the afferent input to cover a large area in cortex, but at the same time small enough to allow only one activation blob to appear. Accordingly, the position where the blob appears will be the location where the lateral feedback is strongest. Without inhomogeneities, the neuronal response is centered at the maximum of the input distribution. In this case, each receptive field has the size of a typical blob (Fig. 18a), because units located at opposite sides of a patch have nonoverlapping receptive fields. Introducing inhomogeneities, the receptive field size can change from very small values (at positions with weak lateral feedback) up to the size of one hypercolumn (at positions with very strong lateral feedback), see Fig. 18b. Nevertheless, the mean receptive field size equals the size of half a hypercolumn. Correspondingly, the smallest distance between two units with nonoverlapping receptive fields should be about one half of the size of the hypercolumn. This relation can indeed be observed in simulations (Fig. 19b). We note here that this relation relies on the intracortical interaction only, and not on a specific arrangement or length scale of feedforward connections from the thalamus. It would be very difficult to explain this relation by the dynamics of feedforward interactions alone.
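Given a simulated response array A(r, R), the receptive field measures defined above can be evaluated as in the sketch below. The disc-equivalent definition of the average diameter and the synthetic placeholder data are assumptions made for illustration.

```python
import numpy as np

# Sketch: receptive field size d(r) and center C(r) from responses A(r, R) to
# localized stimuli on a lattice of visual field positions R (synthetic data here).
n_cortex, Lx, Ly = 10, 32, 32
rng = np.random.default_rng(2)
A = np.maximum(rng.standard_normal((n_cortex, Lx, Ly)) - 1.0, 0.0)   # sparse responses

Rx, Ry = np.meshgrid(np.arange(Lx), np.arange(Ly), indexing="ij")

def rf_size(A_r):
    """Disc-equivalent average diameter of the suprathreshold region (lattice units)."""
    area = np.count_nonzero(A_r > 0)              # integral of sign(A(r, R)) over the VF
    return 2.0 * np.sqrt(area / np.pi)

def rf_center(A_r):
    """Activity-weighted center C(r) of the receptive field."""
    total = A_r.sum()
    if total == 0:
        return np.array([np.nan, np.nan])
    return np.array([(Rx * A_r).sum(), (Ry * A_r).sum()]) / total

d = np.array([rf_size(A[i]) for i in range(n_cortex)])
C = np.array([rf_center(A[i]) for i in range(n_cortex)])
print(d.round(2))
print(C.round(2))
```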
4. Discussion
In this chapter we reviewed the dynamics of population activities within the framework originally developed by Wilson and Cowan [19]. In chains and two-dimensional neuronal layers, a Mexican-hat shaped coupling induces localized activation patterns. This simple dynamics can be related to the response properties of neurons in the primary visual cortex. We showed that this approach can also explain the shape of orientation and direction maps as well as the relation of columnar structures to receptive field size and movement. In particular, this chapter demonstrates that lateral interactions can play a crucial role in the emergence of typical response properties of neurons and columns in the primary visual cortex. The tuning properties of neurons responding to oriented moving stimuli result from the interplay between excitation on a short length scale and inhibition dominating at larger distances. While the excitatory feedback amplifies the response, the inhibition sharpens the tuning and suppresses
Fig. 18. (a) Activations A(r, R) in a simulation with a homogeneous distribution of weights, and (b) in a simulation with a jittered weight matrix (data and parameters from Fig. 12), due to input distributions with maxima at positions R. While in (a) the receptive fields A(r = const., R) are of equal size and shape at all cortical positions r, in (b) the receptive fields vary tremendously in size and shape, especially at positions where neuronal densities are small or, equivalently, where the cortical feedback is weak.
responses of columns having different orientation preferences. Within this framework, several psychophysical experiments can be interpreted in terms of lateral interactions. Recent simulations demonstrate that local intracortical connections could also be responsible for the orientation and direction selectivity of single
Fig. 19. The distance ΔR between the centers of two neighboring receptive fields is an approximately linear function of the difference of the preferred orientations, ΔΦ, of the corresponding neurons. Compare with Fig. 3. Parameters as in Fig. 14, stimulation with nonoriented stimuli G_1 with ε = 1 and σ_1 = 5.
cortical neurons, as well as for the geometrical shape of the corresponding maps, at the very early stages of cortical development. This hypothesis is reinforced by the result that the overall appearance of the simulated maps, and the interrelations between them, were the same as those of the experimentally observed maps. These maps emerge by a symmetry breaking induced by small inhomogeneities in the network, which are amplified by the feedback and smoothed by the band-pass filtering property of the coupling function. Formally, the inhomogeneities reduce the continuum of marginally stable states to a finite number of attractors, from which the oriented input chooses a subset as locations for the activation clusters. Since no activity-dependent development of either afferent or lateral interactions was involved, the shape of the maps in our model is strictly a function of the pattern of connections within the network. It is possible that this mechanism of orientation and direction selectivity arises early during development and that only subsequently the afferent and horizontal connections are adapted in an activity-dependent way. This development could be driven either by visual experience or by spontaneous activity waves in the retina, which have been observed in kittens even before the eyes open [49]. In the maturing cortex the connections then become orientation specific, in register with the orientation preference of the neurons' responses [50,51,25,52,9]. If the general layout of the local intracortical connections remains the same during the initial stages of development, our model could explain the remarkable stability of visual maps observed in experiments [15-17]. Our results are also compatible with the finding that kittens raised with strabismus have only a single PO map that is continuous across the segregated monocular domains of the two eyes [53].
Abbreviations

EEG, electroencephalogram
Hz, hertz
LGN, lateral geniculate nucleus
mm, millimeter
nA, nanoampere
OP, orientation preference
RF, receptive field
t, time
V, membrane potential
V1, primary visual cortex (area 17)
VF, visual field
Acknowledgements

The work on the cortex model and cortical maps presented here was done in close collaboration with Carmit Sahar-Pikielny. We would also like to thank S. Hochstein, A. Shmuel, S. Löwel, W. Singer, and F. Wolf for valuable discussions, and Stan Gielen for his patience while waiting for the manuscript. This work was supported in part by grants from the Max-Planck-Gesellschaft (K.P.) and the Deutsche Forschungsgemeinschaft (U.E.) through SFB 185 and SFB 517.

References

1. Hubel, D. and Wiesel, T. (1962) J. Physiol. 160, 106-154.
2. Hubel, D. and Wiesel, T. (1968) J. Physiol. 195, 215-244.
3. Hubel, D. (1989) Eye, Brain, and Vision. Scientific American Library, New York.
4. Das, A. and Gilbert, C. (1997) Nature 387, 594-598.
5. Ernst, U., Pawelzik, K., Tsodyks, M. and Sejnowski, T. (1999) Neur. Comp. 11, 375-379.
6. Bonhoeffer, T. and Grinvald, A. (1991) Nature 353, 429-431.
7. Blasdel, G. and Salama, G. (1986) Nature 321, 579-585.
8. Shmuel, A. and Grinvald, A. (1996) J. Neurosci. 16, 6945-6964.
9. Ferster, D., Chung, S. and Wheat, H. (1996) Nature 380, 249-252.
10. Ben-Yishai, R., Bar-Or, R. and Sompolinsky, H. (1995) PNAS 92, 3844-3848.
11. Somers, D., Nelson, S. and Sur, M. (1995) J. Neurosci. 15, 5449-5465.
12. Obermayer, K., Ritter, H. and Schulten, K. (1990) PNAS 87, 8345-8349.
13. Wolf, F. and Geisel, T. (1998) Nature 395, 73-78.
14. Scherf, O., Pawelzik, K., Wolf, F. and Geisel, T. (1999) Phys. Rev. E 59, 6977-6993.
15. Gödecke, I. and Bonhoeffer, T. (1996) Nature 379, 251-254.
16. Gödecke, I., Kim, D.-S., Bonhoeffer, T. and Singer, W. (1997) Europ. J. Neurosci. 9, 1754-1762.
17. Crair, M., Gillespie, D. and Stryker, M. (1998) Science 279, 566-570.
18. Wolf, F., Bauer, H.-U., Pawelzik, K. and Geisel, T. (1996) Nature 382, 306.
19. Wilson, H. and Cowan, J. (1972) Biophys. J. 12, 1-24.
20. Prince, D. and Huguenard, J. (1988) in: Neurobiology of Neocortex, eds P. Rakic and W. Singer. pp. 153-176, Wiley, Chichester.
21. Bethge, M., Pawelzik, K. and Geisel, T. (1999) Neurocomputing 26-27, 1-7.
22. Lin, J., Pawelzik, K., Ernst, U. and Sejnowski, T. (1998) Network 9, 333-344.
23. Wilson, H. and Cowan, J. (1973) Biol. Cyb. (Kybernetik) 13, 55-80.
24. Mitchison, G. and Crick, F. (1982) PNAS 79, 3661-3665.
25. Malach, R., Amir, Y., Harel, M. and Grinvald, A. (1993) PNAS 90, 10469-10473.
26. Hirsch, J. and Gilbert, C. (1991) J. Neurosci. 11, 1800-1809.
27. Grinvald, A., Lieke, E., Frostig, R. and Hildesheim, R. (1994) J. Neurosci. 14, 2545-2568.
28. Schmidt, K., Kim, D.-S., Singer, W., Bonhoeffer, T. and Löwel, S. (1997) J. Neurosci. 17, 5480-5492.
29. Weliky, M., Kandler, K., Fitzpatrick, D. and Katz, L. (1995) Neuron 15, 541-552.
30. Gilbert, C. (1996) Cur. Op. Neurobiol. 6, 269-274.
31. Das, A. and Gilbert, C. (1995) Nature 375, 780-784.
32. Dalva, M. and Katz, L. (1994) Science 265, 255-258.
33. Kisvárday, Z., Tóth, E., Rausch, M. and Eysel, U. (1997) Cereb. Cortex 7, 605-618.
34. Tsodyks, M. and Sejnowski, T. (1995) Int. J. Neural Syst. 6, 81-86.
35. Amari, S. (1977) Biol. Cyb. 27, 77-87.
36. Amit, D., Brunel, N. and Tsodyks, M. (1994) J. Neurosci. 14, 6435-6445.
37. Ben-Yishai, R., Hansel, D. and Sompolinsky, H. (1997) J. Comp. Neurosci. 4, 57-77.
38. Troyer, T., Krukowski, A., Priebe, N. and Miller, K. (1998) J. Neurosci. 18, 5908-5927.
39. Ferster, D. and Miller, K. (2000) Neural mechanisms of orientation selectivity in the visual cortex. Ann. Rev. Neurosci. 23.
40. Yantis, S. and Nakama, T. (1999) Nature Neurosci. 1, 508-512.
41. Rojer, A. and Schwartz, E. (1990) Biol. Cyb. 62, 381-391.
42. Wörgötter, F. and Niebur, E. (1993) Biol. Cyb. 70, 1-13.
43. Tsodyks, M. and Sejnowski, T. (1995) Network 6, 111-124.
44. Livingstone, M. (1998) Neuron 20, 509-526.
45. Mineiro, P. and Zipser, D. (1998) Neur. Comp. 2, 353-371.
46. Single, S., Haag, J. and Borst, A. (1997) J. Neurosci. 17, 6023-6030.
47. Suarez, H., Koch, C. and Douglas, R. (1995) J. Neurosci. 15, 6700-6719.
48. Swindale, N., Matsubara, J. and Cynader, M. (1987) J. Neurosci. 7, 1414-1427.
49. Maffei, L. and Galli-Resta, L. (1990) PNAS 87, 2861-2864.
50. Gilbert, C. and Wiesel, T. (1989) J. Neurosci. 9, 2432-2442.
51. Löwel, S. and Singer, W. (1992) Science 255, 209-212.
52. Reid, R. and Alonso, J.-M. (1995) Nature 378, 281-284.
53. Löwel, S. and Singer, W. (1998) in: Perceptual Learning, eds M. Fahle and T. Poggio. MIT Press, Boston (in press).
CHAPTER 23
Information Transfer Between Sensory and Motor Networks
M. LAPPE

Computational and Cognitive Neuroscience Lab, Department of Zoology and Neurobiology, Ruhr-University, 44780 Bochum, Germany
© 2001 Elsevier Science B.V. All rights reserved
Handbook of Biological Physics Volume 4, edited by F. Moss and S. Gielen
Contents

1. Introduction 1003
2. Sensory and motor systems of the CNS 1003
   2.1. Sensory systems 1003
   2.2. Motor systems 1005
3. Important concepts for sensorimotor transformations 1005
   3.1. Coupling of action and perception 1006
   3.2. Feedforward and feedback control systems 1007
   3.3. Plasticity of input-output relations 1009
   3.4. Multiple frames of reference and sensor fusion 1010
   3.5. Distributed encoding in overdetermined, noncartesian coordinate systems 1011
   3.6. Separation of state variables position and velocity 1012
4. Spatial representations and transformations 1013
   4.1. Topographic representation in early visual areas 1013
   4.2. Construction of three-dimensional space 1015
   4.3. Multiple space representations in parietal cortex 1018
5. Goal-directed spatial action 1024
   5.1. Saccadic gaze shifts 1024
   5.2. Spatial representations during saccadic gaze shifts 1027
   5.3. Reaching and pointing 1027
6. Motion 1028
   6.1. Visual motion detection 1029
   6.2. Motion analysis 1031
   6.3. Visual tracking by smooth pursuit eye movements 1032
   6.4. Control of self-motion and posture 1032
   6.5. Gaze stabilization during self-motion 1035
7. An overall view 1036
Abbreviations 1037
Acknowledgements 1037
References 1037
1. Introduction
The function of the CNS is to generate and control behavior in order to support the living needs of the organism. At any time, such behavior must be appropriately adjusted to the status of the environment and of the organism itself, and hence must be guided by sensory information. The sensory and motor structures of the brain encode information in various different formats. Sensorimotor transformations interface the sensory and motor systems. They have two main tasks to solve. First, they must convert between the different coding formats. Second, they must fuse the different sensory and motor input signals and establish a unified representation of the environment and the organism's action within it.

2. Sensory and motor systems of the CNS
2.1. Sensory systems

The senses of most animals consist of vision, audition, the sense of balance, touch, smell, and taste. For the purpose of guiding motor behavior, vision, balance, and touch are the most important in primates and many other animals. The associated sensations are registered by the visual, the vestibular, and the somatosensory systems. In the following, we will briefly describe these sensory systems and the primary encoding of the incoming information in each of them. In-depth information can be found in [1-4].

In vision, light entering the eye is transformed into electrical currents in the photoreceptors of the retina. After some initial image processing by the networks of the retina, the visual information leaves the eye in the form of action potentials of the retinal ganglion cells. The fibers of the retinal ganglion cells form a thick parallel bundle, called the optic tract, from which branches run to several parallel retinal recipient structures. The main cortical pathway is via the thalamus to the primary visual cortex, or area V1. Other important pathways are those to the superior colliculus and to the accessory optic system and pretectum. Each retinal ganglion cell transmits information from a small localized part of the visual image, known as its receptive field. Neighboring cells transmit information from neighboring image locations. In the primary visual cortex, these topographic relationships are largely preserved (see chapter 16 by Flanagan in this book). The primary visual cortex contains a retinotopic map of the visual field. This is not true for all retinal recipient structures. While the superior colliculus is also retinotopically organized, the accessory optic system, for instance, is not. Several other order parameters are also represented in a structured manner in the primary visual cortex. These include the relative strength of input from the two eyes ('ocular dominance') and the selectivity
preferences of each neuron for parameters of visual stimuli (orientation, color, motion direction) (see chapter 22 by Ernst and Pawelzik in this book). Each of these order parameters establishes a structured map in the primary visual cortex. All of these maps are present in parallel. The nature and origin of cortical feature maps are discussed in other chapters of this book. The primary visual cortex is the starting point of information processing in a large network of more than 40 identified areas in the visual cortex [5], each containing a representation of the visual field. These areas extract and transform information from the visual image. To a first approximation, this information processing stream can be viewed as a hierarchical system for visual analysis. However, there are frequent interconnections and feedback projections between the different stages [5,6]. There is good evidence that the cortical information processing involves two major streams [4]. One is concerned with the analysis of the features of objects (shape, color, etc.) and object recognition, and maybe perception in general. The other is concerned with the spatial positions of objects, visual motion, and the generation of action or motor output. Owing to the topic of this chapter, we will be mostly concerned with the latter. It involves a series of areas in parietal cortex. These areas receive not only visual input but also signals from the vestibular, somatosensory, and auditory systems, and build a universal representation of space.

The vestibular organs of the inner ear sense gravitational and inertial forces and are important for our sense of balance and self-motion. Rotations and translations of the head are registered by separate sensors. The semicircular canals of the labyrinth organ are closed, liquid-filled tubes [1]. Their inner surface is lined with sensory hair cells. As the head rotates, the inertial mass of the liquid generates relative movement between the liquid and the inner surfaces of the canals. This movement induces electric activity in the hair cells, which is transmitted to the vestibular nuclei in the brainstem as a signal of head rotation in space. Linear acceleration forces are sensed by the otoliths [1]. These consist of an orthogonal pair of two-dimensional arrangements of tiny calcium carbonate crystals. Inertial forces acting on these crystals are again detected by hair cells. Signals from the hair cells are transmitted to the vestibular nuclei and provide the input for the detection of linear accelerations of the head and of the tilt angle of the head with respect to gravity.

The sense of touch, the position of the limbs, the status of the muscles controlling them, the sense of temperature, and the sense of pain are all registered by the somatosensory system. For the purpose of this chapter we are mainly interested in the position of the limbs and the activity of the muscles. This is known as proprioceptive information. It is generated by spindle fibers in the muscles [1]. These fibers change their rate of action potential generation in relation to the length of the muscle, the change of muscle length, and the contraction of the muscle. The axons of the muscle spindles of the skeletal muscles run along the spinal cord to the brain. Those from the eye muscles run along the trigeminal nerve from the face. Somatosensory information from the body is represented in somatotopic maps in the somatosensory cortex.
2.2. Motor systems Movement of the limbs and other body parts is controlled by muscle contraction. Muscle contraction is controlled by neural commands. The neurons that control the muscles are called motor neurons. Each motor neuron is connected to a small number of muscle fibers which contract when the motor neuron fires. The motor neurons, in turn, are controlled by a network of areas from the brainstem, the cerebellum, the basal ganglia, and the cortex [1,2]. Motor neurons that activate skeletal muscles reside in the spinal chord. Motor neurons that activate eye muscles reside in the oculomotor nuclei of the brainstem. The eye movement system or oculomotor system is the best understood of the motor systems and most of the examples in this chapter will come from the control of eye movements. It is therefore appropriate to focus here on the oculomotor system in particular. A detailed reference to the oculomotor system is [7]. The eye is moved by six muscles. One pair of muscles subserves horizontal eye movements. The remaining two pairs subserve vertical movements and rotations along the line of sight (called torsional movements). The muscles are driven by motor neurons in the oculomotor nuclei of the brainstem. Each muscle pair has its own nucleus. Eye movements can be broadly classified into two categories, those that align gaze with a specific target and those that stabilize vision during movements of the head or body. The first category contains three important types of eye movement. The first are normal gaze shifts, called saccades, which bring the line of sight to an object of interest. These are the most ubiquitous eye movements, performed for instance several times a second while reading this sentence. The second type of gaze targetting eye movements are vergence movements. They adjust the axes of the two eyes such as to look at an object at a certain distance. Often, when gaze shifts from one object in space to another, saccades and vergence eye movements go together. The third type are smooth pursuit eye movements which are initiated when one follows a small moving target with the eyes. The group of eye movements that stabilize vision during movements of the head or body consists of a set of three reflexes, the optokinetic, the vestibulo-ocular and the cervico-ocular reflex. They rotate the eyes opposite to the head movement such that the visual image of the world remains approximately stable on the retina. They use visual, vestibular, and somatosensory information, respectively, to infer the correct direction and speed of the eye movement. Each type of eye movement is driven by its individual neuronal machinery. Only a limited part of that machinery is shared among some eye movements. However, all oculomotor control eventually has to go through the oculomotor nuclei in order to address the eye muscles.
3. Important concepts for sensorimotor transformations In this section, we will visit some important conceptual issues for the understanding of sensorimotor information processing. After that, we will delve into specific topics
and systems in the remaining sections. The issues discussed here should allow us to define the problems and solution strategies of sensorimotor tasks.
3.1. Coupling of action and perception

In the normal function of an organism, motor action and sensory perception are closely intertwined. First of all, perception is the basis for controlling goal-directed action. Each action, in turn, has consequences for perception, because it changes the momentary relation between the organism and the perceptual world. It is possible to study motor and sensory systems in isolation under controlled laboratory conditions. Often this simplifies matters of experimental design and interpretation, much as it is easier to study simple isolated physical systems rather than complicated ones. But to truly understand either sensory perception or motor control it is necessary to consider the interaction between the two. Sensory-guided motor action is usually an iterative procedure in which sensory input is continuously evaluated and used for control purposes, while the motor act continuously changes the setting of the actor in the environment. Think, for example, of walking through a room towards the door. Every step that takes you closer to the door also changes the view of the room that is used to locate the door and guide the movement. This iterative process of sensory perception and motor activity is called the action-perception cycle (Fig. 1). A further aspect of the action-perception coupling is that motor activities are often performed to aid sensory perception. If you want to get a closer look at an object that is moving, smooth pursuit eye movements are initiated to keep high-resolution gaze on the object. Similarly, if you want to get information about an
Fig. 1. The action-perception cycle: perception of the state of the world and of the self gives rise (via needs, desires, affordances, etc.) to action, which in turn creates a new state that is perceived again.
object somewhere in the room, a saccadic gaze shift brings it to the center of the visual field. Hence it is not only true that perception provides the basis of action. Rather, action may also provide support for perception. In computer vision, recognition of this principle has led to the development of the paradigm of active vision [8].
3.2. Feedforward and feedback control systems

In motor control, the controller sends a signal to an effector that prompts the effector to perform a certain action. In biological systems, the effector is typically itself a complicated system: it has its own dynamical properties, possibly exhibits nonlinear behavior, and is subject to noise and delays in internal as well as external processes. There are two principal ways by which the controller can ensure that the effector performs the desired output function [9]. Either the controller knows how the effector performs, sends an appropriate command, and trusts that the effector works as expected; this is a feedforward control system. Or the controller sends a signal that approximately results in the appropriate response, checks the output, and uses the difference between the actual output and the desired output to steer the control signal towards matching the two; this is a feedback control system. In feedforward control systems (Fig. 2A), the controller must predict the output generated by the effector. Because there is no feedback information about the actual performance, it is essential that the prediction is very accurate. There are two possibilities by which the controller can generate its prediction. Either the controller knows what the effector will do, i.e., directly knows what output the control signal will generate. Or the controller knows how the effector works and predicts the output generated by the control signal, in which case the controller is said to have an internal model of the effector [10]. The quality of a feedforward control mechanism depends on the accuracy of the prediction. In theory this accuracy could be high, but in practice it is not, because biological systems behave too variably to allow an accurate prediction, and normal biological behavior often has to cope with unexpected disturbances. The advantage of feedforward control, however, is that it is fast. Feedback control systems (Fig. 2B) have fewer problems with variability of the effector and of the task. This is because they do not attempt to accurately predict the behavior of the effector. Instead they use the error in the motor output to continuously adjust the control signal. The current output is fed back to the controller and compared to the desired output. The difference is used for the new control signal. This is an efficient and simple method, as it does not require much detailed knowledge about the effector or an internal model. It also has the advantage that it can deal with unexpected disturbances of the task or of the behavior of the effector. However, its problems lie in the fact that: (a) the feedback signal is usually sensory in nature and hence must be interpreted or transformed into an appropriate motor error signal, and (b) the error signal arrives with a certain temporal delay, as it needs to be registered and processed by the sensory system first. Because of this, a
Fig. 2. Feedforward and feedback control systems. (A) Feedforward controller: the goal value drives the controller, which commands the effector to produce the output. (B) Feedback controller: the actual output value is fed back and compared with the goal value; an internal predictor receiving a copy of the control signal can supply a predicted value in place of the delayed sensory feedback.
feedback controller typically lags behind its goal and only asymptotically reaches perfect performance. Moreover, if the feedback controller is not able to follow correctly, it develops a phase lag and might become unstable. A means to cope with these problems is to add a faster way to generate an error signal. This can be done by using an internal model of the effector to predict the error (Fig. 2B). This internal model receives the same control signal as the effector and generates an expected output signal without delay. This expected output is fed back to the controller and is used like the true error signal that later arrives via the regular sensory feedback mechanisms. In biological systems this is called the reafference principle, and the signal that is sent to the model of the effector is called the efference copy signal [11,12]. A feedback control system consists of a closed loop of control and error signals. If the feedback path is cut, the system is in the so-called open-loop situation. In this case, the system behavior is entirely determined by its feedforward path. This makes it possible to study the behavior of that part in isolation. An often used technique, for instance, is to look at only the initial phase of the system's reaction to an input. Because of the temporal delay in the sensory feedback, the initial response phase is of the open-loop type. The performance of the system is specified by its gain, which is defined as the ratio of the output of the system to the input. In the closed-loop situation the gain should ideally be unity, which is often approximately reached. Yet the gain normally depends on the frequency distribution of the input signal. The open-loop gain is a measure of the behavior of the feedforward path of the system only. It is often very different from the ideal value of 1.0, which is desirable because it determines the speed with which the feedback path adjusts the control signal.
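The effect of the sensory delay, and the benefit of an internal model driven by an efference copy, can be illustrated with a one-dimensional tracking example. The sketch below is not a model of any particular biological circuit: the first-order effector, the 200 ms delay, the proportional gain, and the assumption of a perfect internal model are all illustrative choices.

```python
import numpy as np

# Sketch: proportional feedback control of a first-order effector with delayed
# sensory feedback, with and without an internal forward model (efference copy).
dt, T = 0.01, 4.0
delay_steps = 20                            # 200 ms sensory delay (illustrative)
tau_eff, gain, goal = 0.2, 2.0, 1.0

def run(use_internal_model):
    x, x_model = 0.0, 0.0                   # true state and internal estimate
    buffer = [0.0] * delay_steps            # sensory measurements still in transit
    trace = []
    for _ in range(int(T / dt)):
        sensed = buffer.pop(0)              # feedback arrives delay_steps late
        estimate = x_model if use_internal_model else sensed
        u = gain * (goal - estimate)        # control signal from the estimated error
        x += dt / tau_eff * (-x + u)        # first-order effector dynamics
        # The forward model integrates the same command (efference copy); a perfect
        # model is assumed, so no correction by the delayed 'sensed' signal is shown.
        x_model += dt / tau_eff * (-x_model + u)
        buffer.append(x)                    # the new state is sensed only later
        trace.append(x)
    return np.array(trace)

for flag in (False, True):
    tr = run(flag)
    label = "with internal model " if flag else "delayed feedback only"
    print(label, "peak %.2f  final %.2f" % (tr.max(), tr[-1]))
```

With delayed feedback alone the response typically overshoots and oscillates because of the accumulated phase lag; with the internal model the same gain produces a smooth approach to the (offset) steady state of the proportional controller.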
3.3. Plasticity of input-output relations

The appropriate motor reaction to a sensory input may vary greatly in everyday circumstances. Moreover, the exact motor program that leads to a desired goal varies depending on the constraints experienced in the situation. For instance, obstacles in the path may force deviations from a straight trajectory. Changes to the sensory input (think for instance of wearing astigmatic glasses) or to the effector (for instance an additional load to be moved) require a recalibration of the mappings from sensory to motor coordinates. For these reasons sensorimotor transformations cannot be fixed but have to allow plastic changes in order to adapt to changes in the behavioral situation or in the sensory input. Most sensorimotor behaviors show the capability to adapt to changes. In fact, many sensorimotor transformations are under constant control by recalibration procedures. This recalibration is performed through a continuous comparison of the desired motor effect with the actual motor output. Depending on this comparison, the connections and weights of the sensorimotor transformation are adjusted. Very likely, feedback pathways from cortical and subcortical control centers through the cerebellum are involved in this constant recalibration process [10,13].
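A minimal caricature of such a recalibration is an error-driven update of a single sensorimotor gain, loosely analogous to the adaptation of gaze-stabilizing reflexes when spectacles change the visual consequences of a movement. The delta-rule update and the learning rate below are assumptions for illustration, not a model of the cerebellar pathways mentioned above.

```python
import numpy as np

# Sketch: error-driven recalibration of a sensorimotor gain.  The gain required by
# the world changes abruptly mid-experiment; the controller adapts its internal
# gain by comparing desired and actual motor effect (learning rate is assumed).
rng = np.random.default_rng(3)
true_required_gain = 1.0
internal_gain = 1.0
learning_rate = 0.05

for trial in range(400):
    if trial == 200:
        true_required_gain = 1.6            # e.g. new spectacles change the mapping
    desired_effect = rng.uniform(-1, 1)     # desired displacement on this trial
    command = internal_gain * desired_effect
    actual_effect = command / true_required_gain + 0.01 * rng.standard_normal()
    error = desired_effect - actual_effect  # mismatch sensed after the movement
    internal_gain += learning_rate * error * desired_effect   # delta-rule update
    if trial in (199, 399):
        print("trial %3d  internal gain %.2f" % (trial, internal_gain))
```

After the change, the internal gain drifts towards the new required value over many trials, which is the essential signature of continuous recalibration.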
3.4. Multiple frames of reference and sensor fusion

Sensory information is encoded in different primary formats in the different sensory systems. For example, in the visual system the position of an object is encoded spatiotopically in the distribution of activity in the two-dimensional layer of retinal ganglion cells. In the auditory system, the spatial position of that same object is encoded in time and intensity differences between the two ears. On the output side, different motor actions that involve different effectors might also be coded differently. Saccadic eye movements are encoded by the spatial distribution of activity in a two-dimensional topographic map in the intermediate layers of the superior colliculus. The encoding of arm movements in the primary motor cortex, in contrast, does not appear to have a topographic organization. Transferring sensory information into appropriate motor commands hence requires conversions between different encoding schemes. Because the same information might be used for many different purposes, it must be kept in several different formats in parallel. For instance, the visual location of an object might be used to direct an eye movement to it, or to reach for it with the arm. Encoding information such that it can be used in multiple output formats is achieved in two ways. On the one hand, the brain constructs multiple separate encodings, each serving a different purpose and residing in a different brain area. On the other hand, the brain uses encoding schemes that are more universal and that provide the encoded information in such a way that a brain area that requires a specific format may extract it by specific read-out mechanisms of the universal code. This will be elaborated in the next section. A further complication is that the different primary sensory and motor encoding formats do not have a fixed relationship with one another. Neither do they have a fixed relationship to the external world. This is because of the coupling of action and perception. Every action changes the orientation of the sensors in the world and with respect to each other. For instance, eye movements change the orientation of the eye in the head such that the same location on the retina now corresponds to a different direction relative to the head. Head movements likewise introduce a dissociation between spatial locations relative to the head and to the trunk. This also applies across sensory modalities. Since the ears are fixed in the head, eye movements also introduce a dissociation between the encoding of the visual and of the auditory location of an object. It is therefore appropriate to ask in which frame of reference, i.e., with respect to which coordinate system, information is encoded. We can distinguish five frames of reference. Retino-centric encoding is with respect to the retina or the eye. Head-centric encoding is with respect to the head. Body-centric encoding is with respect to the trunk or body. Ego-centric encoding gives the location in external space with respect to the current location of one's self. Allo-centric encoding means the location of an object in the world, irrespective of the position and orientation of the body or the sensors. This is what we commonly perceive as the locations of the objects around us. Sensorimotor transformations must keep the encodings of information in these various frames of reference in continuous register.
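For the simplest, purely additive (small-angle) case, keeping these encodings in register amounts to bookkeeping with posture signals: a target location relative to the head is the retino-centric location plus the eye-in-head position, and so on. The sketch below illustrates only this additive approximation with two-dimensional gaze angles; real transformations additionally involve three-dimensional rotations of the eye and head.

```python
import numpy as np

# Sketch: converting a target location between frames of reference by adding
# posture signals (small-angle, additive approximation; angles in degrees).
target_re_eye = np.array([5.0, -2.0])     # retino-centric location (azimuth, elevation)
eye_in_head = np.array([10.0, 0.0])       # current eye position in the head
head_on_body = np.array([-20.0, 5.0])     # current head position on the trunk

target_re_head = target_re_eye + eye_in_head        # head-centric location
target_re_body = target_re_head + head_on_body      # body-centric location

print("head-centric:", target_re_head)
print("body-centric:", target_re_body)
```

Whenever the eye or the head moves, the posture terms change while the allo-centric location of the target does not, which is exactly why the different encodings must be continuously updated against each other.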
Although primary sensory information comes in different formats, it often pertains to the same object or the same quality. Spatial position can be sensed visually, acoustically, or haptically (i.e., by means of touch). Movement can be sensed visually, acoustically, haptically, or vestibularly. An object's identity is conveyed by its shape, color, feel, smell, or taste. Using all sensory information as efficiently as possible requires combining or fusing these signals in order to form a coherent percept or generate a successful motor command. Such sensor fusion, in turn, requires a universal encoding scheme or representation that goes beyond the primary sensory encodings in the different formats. This unified, supramodal encoding of space is established along the pathways from sensory to motor areas of the cerebral cortex.
3.5. Distributed encoding in overdetermined, noncartesian coordinate systems

Mathematically, spatial transformations can most simply be described by vector addition, rotation, and scaling in an orthonormal coordinate system that has the smallest possible number of degrees of freedom. In the brain, things are different. Spatial parameters are often encoded in the firing rates of large populations of neurons. Each neuron contributes only a small part to the entire encoding of a single parameter and might participate in the encoding of several other parameters as well. A well-known example is the population encoding of movement direction, for instance the direction of arm movement in primary motor cortex or the direction of visual motion in motion-sensitive visual areas. In the population code, the direction of movement $\mathbf{D}$ is represented by a vector-weighted summation of the activity of all $N$ neurons in the population:
$$\mathbf{D} = \sum_{i=1}^{N} a_i \mathbf{e}_i, \qquad (1)$$
where $a_i$ is the firing rate of neuron $i$, and $\mathbf{e}_i$ is a unit vector in the direction in which the neuron is assumed to contribute to the encoding. Usually this is the direction for which the neuron has the strongest firing rate, i.e., the preferred direction of the neuron. Since every neuron has its own preferred direction, and since the preferred directions are often equally distributed across the neuronal population, Eq. (1) is a linear decomposition of $\mathbf{D}$ in a vastly overcomplete set of basis vectors. Under the constraints imposed by the properties of a biological system, such a distributed population encoding has several advantages. Among them are robustness against noise or against failure of individual neurons, and the ability to perform smooth interpolations. More information on this is provided in chapter 20 by Gielen and chapter 19 by Treves. With regard to the topic of this chapter, there is one additional important property, namely that it is possible to represent several different encodings with the same population of neurons. Consider, for instance, a population of neurons encoding visual motion in a specific part of the retina. Each neuron has its own preferred direction $\mathbf{e}_i$ and speed $s_i$ of motion. Then, using the above population code
$$\mathbf{D} = \sum_{i=1}^{N} a_i \mathbf{e}_i,$$
one might retrieve the direction of visual motion, as the speed of visual motion is averaged out by summing over all neurons. This assumes, of course, that the combination of speed and direction tuning is equally distributed across the population. On the other hand, one might construct a similar population code for the speed $S$ of the visual motion by weighting each neuron by its preferred speed:

$$S = \sum_{i=1}^{N} a_i s_i.$$
Or one might retrieve the full velocity $\mathbf{V}$ by using both the direction and the speed in the population code:

$$\mathbf{V} = \sum_{i=1}^{N} a_i s_i \mathbf{e}_i.$$
This argument may seem trivial, but the point is that the neuronal population provides the complete information and the process by which the population is read out determines what information is used and how it is used. In the brain, this means that subsequent areas can each choose to select different parts (or different formats) of the information that is provided by the preceding area. This is important for the construction and use of a supramodal representation of space as outlined in the previous section. The argument becomes more interesting if we consider it in relation to such a supramodal space representation and to the issue of different frames of reference. Let us assume that the population of neurons described above also receives information about the orientation of the eyes in the head, either from the stretch receptors in the eye muscles or via an efference copy signal. Then in an analogous way it is possible to retrieve by different population read-out procedures the visual motion on the retina (a retino-centric variable), the position of the eye in the head, or the visual motion with respect to the head (a head-centric variable). Now this is exactly what would be required of a mechanism for sensorimotor coordinate transformation. In fact, this is one of the ways by which sensorimotor transformation is realized in the brain.
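The following Python sketch illustrates these alternative read-outs of one and the same population. The cosine direction tuning, the Gaussian speed tuning, and the normalisation of the speed estimate are illustrative assumptions; the read-outs themselves follow Eq. (1) and its variants above.

```python
import numpy as np

# Sketch of the read-out idea: the same population activity yields direction,
# speed, or full velocity depending on how it is summed.
rng = np.random.default_rng(1)
n = 500
pref_dir = rng.uniform(0, 2 * np.pi, n)                 # preferred directions
pref_speed = rng.uniform(1, 20, n)                      # preferred speeds (deg/s)
e = np.stack([np.cos(pref_dir), np.sin(pref_dir)], 1)   # unit vectors e_i

# Stimulus: motion at 45 deg, 8 deg/s; cosine direction tuning, Gaussian speed tuning.
stim_dir, stim_speed = np.pi / 4, 8.0
a = np.clip(np.cos(pref_dir - stim_dir), 0, None) * \
    np.exp(-0.5 * ((pref_speed - stim_speed) / 5.0) ** 2)

D = (a[:, None] * e).sum(0)                          # direction read-out, Eq. (1)
S = (a * pref_speed).sum()                           # speed read-out as written in the text
V = (a[:, None] * pref_speed[:, None] * e).sum(0)    # full velocity read-out

print("direction estimate (deg):", np.degrees(np.arctan2(D[1], D[0])))
print("normalised speed estimate (deg/s):", S / a.sum())
print("velocity direction (deg):", np.degrees(np.arctan2(V[1], V[0])))
```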
3.6. Separation of state variables position and velocity

Physically, position and velocity are closely related, as velocity is the derivative of position. The brain, however, often treats them as two separate and unrelated entities. For instance, it is possible to perceptually experience motion without a change in position. This is demonstrated, for example, by the motion aftereffect [14]. If one looks at a continuous motion pattern for about a minute and then suddenly looks at a stationary scene, the scene appears to move while at the same time it does not appear
to change position. Another example of the separation of position and velocity can be seen in the control of smooth pursuit eye movements. When a fixated stationary object suddenly starts to move, the brain initiates two superimposed eye movements to follow the motion of the object. First there is a smooth acceleration of the eye that brings its speed up to the speed of the target. While this acceleration occurs, a saccadic gaze shift is initiated to bring the target back into the center of gaze. These two movements occur independently and at different times after the onset of the target movement, and they are controlled by different systems. The separation of position and velocity is also seen in the functional anatomy of the brain. There is a dedicated processing stream for motion: it analyses visual motion and provides the motor commands for motion-related motor acts such as tracking eye movements or the control of locomotion and posture. The analysis of spatial position and the control of motor acts directed towards spatial positions, i.e., saccadic gaze shifts and reaching and pointing movements, is largely subserved by a different network of brain areas. In the following, sensorimotor transformations for spatial actions are considered first, followed by the processing of motion.
4. Spatial representations and transformations

4.1. Topographic representation in early visual areas

In an abstract sense, the eye functions like a camera, in which the optics (the lens) generate a faithful two-dimensional image of the visual scene on the receptor surface (the retina). Adjacent photoreceptors receive light from adjacent visual directions. The topography of the visual image is preserved in the different layers of the retina and in several of the retinal recipient structures in the brain, such as the superior colliculus and the lateral geniculate nucleus. It is also preserved in the primary visual cortex, or area V1, and in the subsequent visual cortical areas V2, V3, V4, V5 (or area MT), and V6 of the cortical processing stream. In each of these structures, a neuron can be characterized by its retinotopic receptive field, i.e., the part of the retina from which this neuron receives information. Receptive fields become bigger along the pathway, increasing in diameter from less than a minute of arc in the retina up to 10° of visual angle in the peripheral visual field representation in area MT. In the central part of the visual field receptive fields are smaller than in the periphery and the number of neurons per degree of visual angle is higher. Thus more cortical tissue is devoted to the processing of the central parts of the image. This is known as the cortical magnification factor. However, while this affects the metric of the representation, there is a strong tendency in the early visual areas to preserve the topographical relationships between image positions, such that neighboring neurons have adjacent or overlapping receptive fields. More information on cortical maps is provided in chapter 22 by Ernst and Pawelzik. In its most simple form, the receptive field of a visual neuron can be modeled as a Gaussian with its center in the receptive field center and its width adjusted to the width of the receptive field (Fig. 3A). For many models that deal with spatial
Fig. 3. Descriptions of receptive fields. (A) Simple Gaussian profile. (B) 'On'-center, 'off'-surround structure obtained as the difference of two Gaussians. (C) and (D) Gabor functions with phase 0 (C) and phase π (D).
position and the conversion between coordinate frames this is already sufficient. However, the receptive fields of most visual neurons show a richer structure, which is related to their respective role in the processing of visual information. Already in retinal ganglion cells the receptive field consists of two parts. One part leads to excitation of the cell when it is hit by light, the other part leads to inhibition when illuminated. These are known as the 'on' and 'off' regions of the receptive field. In the retinal ganglion cells, and also in the neurons of the lateral geniculate nucleus, the on and off regions are arranged as two concentric circles of different diameters. They can be described by two Gaussians of different widths and signs. The total receptive field is then described by the difference of the two Gaussians (Fig. 3B) [15,16]. In the primary visual cortex and beyond, receptive fields become more complex, in correspondence with the more complex response properties of the neurons. Neurons in V1 respond selectively to the orientation and spatial frequency of gratings presented in their receptive field. The receptive fields of such neurons can be modeled by Gabor functions, i.e., the product of a Gaussian and a cosine function (Fig. 3C,D) [17,18].
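The three receptive-field descriptions of Fig. 3 are easy to write down explicitly. The sketch below does so in one spatial dimension with arbitrary widths, frequencies, and phases; the response of such a linear receptive field to an image patch is simply the inner product of profile and patch.

```python
import numpy as np

# Sketch of the receptive-field profiles of Fig. 3 along one spatial dimension;
# widths, frequencies and phases are arbitrary illustrative values.
x = np.linspace(-3, 3, 601)

def gaussian(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2)

rf_simple = gaussian(x, 0.5)                                       # Fig. 3A: Gaussian profile
rf_center_surround = gaussian(x, 0.4) - 0.6 * gaussian(x, 1.0)     # Fig. 3B: difference of Gaussians
rf_gabor_even = gaussian(x, 0.7) * np.cos(2 * np.pi * x)           # Fig. 3C: Gabor, phase 0
rf_gabor_odd = gaussian(x, 0.7) * np.cos(2 * np.pi * x + np.pi)    # Fig. 3D: Gabor, phase pi

# Response of a linear receptive field to an image patch I(x): the inner product.
image_patch = gaussian(x - 0.3, 0.2)    # a small bright spot, slightly off centre
print("on-centre response:", np.trapz(rf_center_surround * image_patch, x))
```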
4.2. Construction of three-dimensional space
The retinal image of each eye is two-dimensional. Accurate goal-directed spatial action requires knowledge of target position in three-dimensional space. The brain, therefore, must construct a representation of the three-dimensional world. There are many cues to the third dimension already in the two-dimensional monocular image (Fig. 4). Among these cues are shading, textural density and perspective, object-size relationships, and motion parallax [3,11,19]. Motion parallax is the differential visual motion that objects at different distances from the eye undergo when one moves the head sideways, for example (Fig. 4C). All of these cues are evaluated by the visual system and are used in parallel to reconstruct three-dimensional spatial relationships. Their usefulness for depth perception varies with the depth scale that an individual cue may provide and with the depth range over which it can be analyzed [20]. A primary cue to depth in the near range (below 10 m) originates from binocular vision [21]. Each eye sees the world from a slightly different perspective. From this difference in perspective, parallactic differences between the images of the two eyes result (Fig. 5). They are called horizontal retinal disparities. In Fig. 5, the lines of sight of the two eyes are converged on the point F and form a certain vergence angle $\gamma_F$. In both eyes, point F is projected onto the center of gaze.
Fig. 4. Monocular cues to three-dimensional structure. (A) Size foreshortening, texture density, texture perspective. (B) Shading. (C) Motion parallax.
Fig. 5. Binocular horizontal disparity.
In contrast, for the more distant point P, the rays through the centers of the lenses of the two eyes form an angle $\gamma_P$. The difference between the two angles,

$$\delta_P = \gamma_P - \gamma_F,$$

is the absolute horizontal disparity of P. Its value can be used to estimate the distance of P from F. The absolute horizontal disparity is negative when P is more distant than F. The absolute horizontal disparity of point Q, $\delta_Q = \gamma_Q - \gamma_F$, in
contrast, is positive. This point is closer to the eyes than F. A special set of points consists of those for which the absolute disparity is zero. They form a two-dimensional curve that is called the horopter. The disparity of a point can be obtained from the projections of the image point in the two eyes. The projection of F is in the center of gaze of both eyes. The projections of a point on the horopter fall on eccentric positions in both eyes. These two positions are called 'corresponding positions' because they both correspond to the same object with zero disparity. In contrast, the projections of point P do not fall on corresponding positions. In the left eye, the projection is to the right of F. In the right eye, it is to the left of F. The angular difference between the two projections of P is equivalent to the absolute horizontal disparity $\delta_P$ of P. Once the absolute horizontal disparity $\delta_P$ is known, recovery of the true distance of point P from the eye further requires knowledge of the distance between the eyes and of the vergence angle $\gamma_F$. Only if all of these parameters are known can the distance of P from the eye be calculated geometrically. However, from the visual images in the two eyes only the disparity can be retrieved. The vergence angle and the interocular distance cannot be determined visually. Hence, an absolute depth judgement is not possible from the visual information alone. Moreover, for geometric reasons, the calculation of absolute depth relies strongly on the accuracy of the vergence angle measurement. Any errors in that measurement lead to large errors in depth perception. However, the difference between the disparities of two points P and Q,

$$\delta_{PQ} = \delta_P - \delta_Q = \gamma_P - \gamma_Q,$$
can be retrieved from the binocular images alone. This difference is called the relative horizontal disparity between P and Q. The relative horizontal disparity is independent of the vergence angle. It permits a direct visual estimation of the depth difference between two objects solely from image information. In the primary visual cortex and in several higher cortical areas, the majority of neurons receive visual input from both eyes. Many of these neurons are selective for disparity [22]: their response to a visual stimulus depends on the binocular disparity of that stimulus. These neurons are considered to form the basis of our stereoscopic depth perception. Models of disparity-sensitive receptive fields have proposed two different mechanisms [23,24]. Either the neuron receives input from two different (noncorresponding) retinotopic locations in the two eyes [25]; models of this type are called position-based models. Or the neuron receives input from two corresponding locations, but with a different phase of its Gabor function in the two eyes [26-29]; models of this type are called phase-based models. In phase-based models, the neuron becomes sensitive to differences in the horizontal position of a textured object, for instance, because the left eye might be optimally stimulated when the image of the texture is at phase zero, while at the same time the right eye is optimally stimulated by a texture that is shifted a bit, i.e., at a phase different from zero. It is also possible to combine both approaches and arrive at a hybrid model [23]. However, while the phase-based and the position-based models start from different
assumptions, have different physiological relevances, and involve different computational steps, it can be demonstrated that at the final stages where disparity values are made explicit, the simplest versions of the two methods are mathematically equivalent [30]. From the responses of binocular disparity-sensitive neurons it is possible to infer the relative depth between two visible points. Lehky and Sejnowski [31] have shown how a population code with physiologically plausible parameters can account for human disparity sensitivity data and depth interpolation. To estimate true egocentric distance, absolute disparities and a signal describing the vergence angle are necessary. Such a signal is provided by a modulation of the firing rate of individual binocular neurons in area V1 [32] and in the parietal cortex [33]. These neurons are not only selective for disparity, they are also influenced by the vergence angle of the eyes. This combined selectivity can be used to establish a distributed representation of egocentric distance [34].
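The disparity geometry described above can be summarised in a few lines of code. The sketch below assumes points straight ahead of the observer, an interocular distance of 6.5 cm, and small angles; it computes vergence angles, absolute disparities, and the vergence-independent relative disparity. The numbers are illustrative only.

```python
import numpy as np

# Sketch of the disparity geometry of Section 4.2. Recovering metric depth from
# absolute disparity additionally requires the interocular distance and the
# vergence angle; the relative disparity needs neither.
def vergence_angle(distance, interocular=0.065):
    """Angle (rad) subtended at a point straight ahead at the given distance (m)."""
    return 2.0 * np.arctan(interocular / (2.0 * distance))

gamma_F = vergence_angle(1.0)     # fixation point F at 1.0 m
gamma_P = vergence_angle(1.5)     # P farther away than F
gamma_Q = vergence_angle(0.7)     # Q closer than F

delta_P = gamma_P - gamma_F       # absolute disparity of P (negative: farther than F)
delta_Q = gamma_Q - gamma_F       # absolute disparity of Q (positive: closer than F)
delta_PQ = delta_P - delta_Q      # relative disparity, independent of the vergence angle

print("absolute disparities (deg):", np.degrees([delta_P, delta_Q]))
print("relative disparity P-Q (deg):", np.degrees(delta_PQ))
```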
4.3. Multiple space representations in parietal cortex

Retinotopic receptive fields, even binocular ones, define the position of an object in space in a retinal frame of reference. Spatial position is encoded relative to the current direction of gaze. Many spatial actions, however, require an encoding of target position with respect to the body or the external world. To direct an arm movement to the correct point in space, for instance, the brain needs information about the position of that object relative to the shoulder. Knowing the location of an object in retinotopic coordinates is not enough to specify its location in body-centric coordinates, because the eyes can move relative to the body. Thus, retinal position information needs to be combined with information about the position of the eyes in the head to create a representation of the object's position in head-centric space. If this is then combined with information about the position of the head on the body, a body-centric representation becomes possible. Most research up to now has focussed on the first transformation, from retino-centric to head-centric representations. The second step, from head-centric to body-centric, has only lately been explored in some detail. Current concepts of space representation in primate cortex center around three neurobiological findings. The first is spatial gain fields that modulate the response to a visual stimulus depending on eye position in the head. Second, neurons with head-centric receptive fields have been found in some brain areas. Third, some neurons shift their receptive field dynamically to a future retinotopic location before an impending eye movement. These mechanisms may act in parallel, or might depend on each other in a serial or circular fashion. Several observations indicate that they could form multiple parallel mechanisms rather than a single hierarchical process. We will look at each of them in turn.
4.3.1. Implicit distributed coding by spatial gain fields

Neurons in many areas of the brain scale their response to a visual stimulus within their receptive field depending on the current position of the eye in the head [35].
They respond to a preferred stimulus only when its image falls on a specific retinal location. However, the response strength changes when the eyes (and the stimulus) move to a different direction in space. Thus, while these neurons have a clearly retinotopic receptive field, they also carry information about the current eye position. The influence of eye position on the activity of neurons has been termed the 'spatial gain field'. It has been described for many areas along the processing stream towards and within parietal cortex (areas V3A, V6A, MT, MST, LIP, and 7A) [36-39], but also for premotor cortex [40] and the superior colliculus [41]. The widespread occurrence of eye position gain fields in the monkey brain could suggest that they subserve a basic form of space coding. The origin of the modulatory input is unclear. It could be proprioceptive feedback from the eye muscles, or a copy of the motor command to move the eye, or a combination of both. Several theoretical studies have suggested that gain fields may serve to transform the coordinates of the incoming sensory signals into a non-retino-centric representation of space. Zipser and Andersen [42] developed a backpropagation network that used an extraretinal eye-position signal to transform retinotopic visual input into a head-centric representation. The network consisted of three layers of neurons. The input layer contained neurons with retinotopic receptive fields modeled as Gaussians. A second set of inputs encoded the position of the eye in the head. These neurons increased their firing rate linearly with eye position in a preferred (horizontal or vertical) direction. The output layer was set to encode the head-centric position of a target. Training data consisted of combinations of retinal input, eye position, and the corresponding head-centric output. The network was trained to associate the correct input and output patterns with a backpropagation learning rule. The units in the intermediate 'hidden layer' developed retino-centric receptive fields, but their activity was also modulated by eye position. Their behavior was functionally similar to that of the gain field neurons in area 7A, suggesting that the role of these neurons might lie in a transformation between reference frames. Later studies have refined the general ideas of Zipser and Andersen by using more biologically plausible learning mechanisms [43] and by examining the consequences and function of head-centered coordinates in more detail [44,45]. Bremmer et al. [46] showed with real experimental data that a population of neurons is capable of a coordinate transformation of visual signals into a non-retino-centric frame of reference. Pouget and Sejnowski [47] have formalized the spatial transformations provided by gain fields in the theory of basis functions (Fig. 6A). This formalism capitalizes on the fact that any smooth function can be expressed in a series of basis functions. Classical examples are the Taylor series and the Fourier series. Likewise, a set of Gaussians with different centers and widths, or a set of sigmoids with different centers and slopes, also forms a basis set to express any smooth function in a series [48,49]. Pouget and Sejnowski described the receptive field of a single neuron by a Gaussian and the eye position gain field by a sigmoid. Both interact multiplicatively, such that the behavior of the neuron is described by the product of a Gaussian and a sigmoid (cf. [50]).
This product, in turn, also defines a basis set, provided that all possible combinations of parameters are included [47]. Hence each neuron can be interpreted as providing the amplitude of a single basis function from that set.
Fig. 6. Schematic illustrations of the three types of dynamic spatial representation described in Sections 4.3.1-4.3.3. (A) Distributed head-centric coding by spatial gain fields. (B) Head-centric receptive fields by selection. (C) Retino-centric receptive field updating. Gaussian functions depict areas of excitation in representational maps. Shaded areas represent the range of neural connections. See text for details of the mechanisms.
Equivalently, each neuron can be interpreted as encoding the input at a specific retinotopic location (its Gaussian receptive field) and a specific eye position (its sigmoid gain field). By weighted summation of all neuronal activities with appropriately chosen weights it is possible to represent any smooth mapping from input (spatial position and eye position) to a desired output. The weights, in that terminology, correspond to the amplitudes of the different basis functions in the expansion series. Specifically, it is possible to choose weights such that the output
becomes a Gaussian function of head-centric position. In this case, the neurons encode head-centric position. However, they do so in a distributed, i.e., implicit, manner. The appropriate weights can be found by learning procedures. Pouget and Sejnowski [47] used the delta rule to learn the mapping from retinotopic to head-centric encoding. The delta rule minimizes the squared error between the actual output and the desired output by gradient descent. Simpler, correlation-based learning methods are also feasible [51]. The principle of implicit distributed encoding is schematically illustrated in Fig. 6A. In the left and right drawings of Fig. 6A the same head-centric target is seen from two different eye positions. In the left drawing, the eye is in the central position and the target falls on the right retinal hemifield, where it elicits a Gaussian activity profile. This activity is transmitted to the basis function encoding in the parietal cortex, where it excites all neurons that have connections with this part of the retina. Because the eye is in the central position, no gain field modulation occurs and all responses have the same amplitude. In the right drawing of Fig. 6A, the eye is shifted to the right. The image of the target now falls on the left retinal hemifield. This excites a different subpopulation of parietal neurons, namely those that have receptive fields in the left retinal hemifield. However, the distribution of activity within the subpopulation is nonuniform. Because eye position is eccentric, the gain field modulation leads to strong responses in neurons in which the gradient of the gain field is in the eye movement direction. Examples are the lower two of the four basis function neurons in Fig. 6A. In contrast, neurons for which the gradient of the gain field is against the eye movement direction exhibit only weak responses (upper neuron in Fig. 6A). Hence the amount of excitation varies within the population of neurons because each neuron is modulated differently by eye position. The distribution of activity in the subpopulation defines how the retinotopic position of the target corresponds to head-centric coordinates. The population read-out mechanism interprets the activity distribution in the population and establishes the location of the target in head-centric space. For an equal distribution of activity in the subpopulation of excited neurons, the head-centric location is identical to the retino-centric position (Fig. 6A, left). For an asymmetric distribution of activity, the head-centric position of the target must be shifted depending on the degree of asymmetry (Fig. 6A, right). An advantage of the implicit, distributed representation by basis functions is that it can be used to encode not only one particular transformation, but any other input-output mapping as well. The particular transformation depends on the weights that are used to read out the population activity. In that sense, this type of encoding is coordinate-free (the coordinates are chosen when the weights are defined) and can represent or generate different spatial representations in parallel. Salinas and Abbott [51] have demonstrated that it is possible to interface such a representation directly to the population coding of motor output in primary motor cortex (see Section 5.3). This requires a different set of weights than the extraction of head-centric position, but it can be subserved by the same population of gain-field neurons.
Van Opstal and Hepp [41] have shown how the parameters for the control of goal-directed saccadic eye movements (Section 5.1) can be obtained from such a representation.
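The basis-function account of gain fields can be illustrated with a small numerical sketch. The network below uses products of retinal Gaussians and eye-position sigmoids as basis functions and fits the read-out weights by least squares (the delta rule described above would serve the same purpose). Population sizes, widths, and slopes are arbitrary choices made for illustration, not parameters from the models cited above.

```python
import numpy as np

# Sketch of the basis-function view of spatial gain fields (Section 4.3.1): each
# unit is the product of a retinal Gaussian and an eye-position sigmoid, and a
# weighted read-out of the population approximates head-centric position.
rng = np.random.default_rng(2)

retinal_centers = np.linspace(-40, 40, 9)     # deg, Gaussian receptive-field centres
eye_thresholds = np.linspace(-30, 30, 7)      # deg, sigmoid gain-field thresholds

def population(retinal_pos, eye_pos, sigma=15.0, slope=0.1):
    g = np.exp(-0.5 * ((retinal_pos - retinal_centers) / sigma) ** 2)   # retino-centric RFs
    s = 1.0 / (1.0 + np.exp(-slope * (eye_pos - eye_thresholds)))       # eye-position gain fields
    return np.outer(g, s).ravel()                                       # multiplicative interaction

# Training set: random retinal and eye positions; the desired output is the
# head-centric position (retinal position + eye position).
R = rng.uniform(-40, 40, 2000)
E = rng.uniform(-30, 30, 2000)
A = np.array([population(r, e) for r, e in zip(R, E)])
w, *_ = np.linalg.lstsq(A, R + E, rcond=None)    # read-out weights

# The same retinal stimulus seen with two different eye positions decodes to
# two different head-centric locations:
print(population(10.0, 0.0) @ w)     # approx. 10 deg head-centric
print(population(10.0, 20.0) @ w)    # approx. 30 deg head-centric
```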
4.3.2. Explicit coding by head-centric receptive fields

The above model assumes that the head-centric position of objects is encoded implicitly in a population of neurons. While this type of encoding has certain advantages in terms of flexibility, it may, on the other hand, sometimes be unwieldy to use, because the responses of an entire population of neurons must be combined before a true head-centric signal can be obtained. A more direct way to represent head-centric position would be to explicitly construct head-centric receptive fields. Indeed, in a small number of cortical areas neurons are found that possess head-centric receptive fields. These receptive fields remain in head coordinates even when eye position changes. Head-centric receptive fields have been observed in area V6A [37], area VIP [52], and premotor cortex [53]. However, the existence of cells coding explicitly in a head-centric frame of reference does not exclude eye position effects in the same area or even in the very same cells. About half of the cells in areas VIP and V6A and in the premotor cortex show eye position gain fields in their firing rate [37,52,54], both in darkness and in normal viewing. Interestingly, in area VIP, this eye position effect occurred in both eye-centric cells and head-centric cells. Gain field models have shown that head-centric receptive fields can be constructed from a combination of retino-centric receptive fields and spatial gain fields in a hierarchical fashion [42,45]. But they have also suggested that the explicit head-centric step is not necessary, since all information is present implicitly in the neuronal population [42,47,51]. A strictly hierarchical construction of head-centric neurons from spatial gain fields would predict that once head-centric neurons are established, gain fields are no longer needed. Yet gain fields are found in many cortical areas in the dorsal stream. Moreover, even the head-centric neurons in area VIP themselves (the putative end point of a hierarchical construction) exhibit eye position gain fields [52]. This might reflect a residual effect of the construction by gain fields. But it could also mean that head-centric receptive fields are generated by a different mechanism. For instance, dynamic selection of input from a retinotopic representation could directly yield head-centric receptive fields (Fig. 6B). In this view, a head-centric neuron makes connections to the entire representation of the retina (shaded areas in Fig. 6B) but selectively gates its connections so as to restrict its input to only a part of the visual field (continuous vs. stippled lines in Fig. 6B). The selection, then, is adjusted based on eye position. In this view, the observation that head-centric neurons in VIP are modulated by gain fields would suggest that gain fields have a functional importance that goes beyond the construction of head-centric receptive fields. In fact, gain fields can also be observed in area LIP [39,55], where a third mechanism for spatial localization, receptive field updating, has been described (see below), indicative of a parallelism between the different spatial coding mechanisms.
4.3.3. Dynamic retino-centric receptive field updating

Neurons in some brain areas have retinotopic receptive fields that anticipate the effect of impending eye movements. Slightly prior to a saccadic gaze shift, they shift their receptive field in space to the position that will be retinotopically correct after the saccade is completed. These phenomena were first observed in LIP by Duhamel
et al. [56], and later in the frontal eye field [57] and the superior colliculus [58]. While these neurons appear to encode a certain retinotopic location, their receptive field cannot simply be anchored to input from only that retinal location. Rather, they must receive information from a much larger area of the retina and dynamically evaluate only a restricted part of the input. A possible mechanism may be a spatially and temporally variable, gaze-dependent gain modulation of the receptive field structure (Fig. 6C). This is similar to the model of direct construction of head-centric receptive fields outlined in the previous section. However, unlike head-centric neurons, which continue to use the information from the new area of the retina when the eye is in the new position, the retino-centric neurons only transiently use that area and switch back to the original part of the retina that corresponds to their retinotopic receptive field immediately afterwards (Fig. 6C). In the intermediate step, the neuron can be driven by stimulation of either the old or the new retinal location. For anticipatory receptive field shifts, information about the direction and amplitude of an impending saccade is needed. Quaia et al. [59] have proposed a model of receptive field updating. This model takes into account the latencies and firing properties of the neurons in area LIP, the frontal eye field, and the superior colliculus. It assumes that information about impending eye movements is provided by the oculomotor signal from the frontal eye field that precedes the eye movement.
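A toy sketch of such predictive updating is given below. It merely shifts the sampled retinal location by the impending saccade vector during a 'pre-saccadic' phase; the discrete phases and the instantaneous shift are simplifying assumptions and do not reproduce the timing captured by the model of Quaia et al. [59].

```python
import numpy as np

# Sketch of retino-centric receptive-field updating (Section 4.3.3): just before a
# saccade, a neuron transiently samples the retinal location whose content will
# land in its receptive field once the eye has moved.
def sampled_location(rf_center, saccade_vector, phase):
    """Retinal location sampled by the neuron in a given phase of the saccade.

    phase: 'fixation', 'pre-saccadic' (impending saccade known), or 'post-saccadic'.
    """
    rf_center = np.asarray(rf_center, dtype=float)
    saccade_vector = np.asarray(saccade_vector, dtype=float)
    if phase == "pre-saccadic":
        # future field: the retinal position that will coincide with the
        # receptive field after the gaze shift
        return rf_center + saccade_vector
    return rf_center    # ordinary retinotopic receptive field otherwise

rf = np.array([5.0, 0.0])        # receptive-field centre (deg)
saccade = np.array([10.0, 0.0])  # impending saccade vector (deg)
for phase in ("fixation", "pre-saccadic", "post-saccadic"):
    print(phase, sampled_location(rf, saccade, phase))
```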
4.3.4. Why multiple space codes?

Why does the brain apparently use multiple parallel space codes? Certainly the different encoding schemes differ in a number of behaviorally relevant properties such as accuracy, flexibility, robustness, the number of neurons required, the demands on the structure of the input or output, etc. This means that each mechanism has a certain functional scope for which it is optimal, or at least for which it is superior to the other mechanisms. For instance, it would seem that an explicit head-centric receptive field must in principle obtain information from every place on the retina as the eye position changes, requiring a heavy convergence of inputs on every head-centric neuron (Fig. 6B, shaded areas). This is costly in terms of connectivity. In contrast, retinotopic neurons with spatial gain fields need only connect to those parts of the retina that are within their receptive field (Fig. 6A, shaded areas). This is more efficient than head-centric receptive fields, particularly in early visual areas where retinotopic receptive fields are small. On the other hand, even though head-centric receptive fields are more expensive, they might be preferable when it comes to accuracy or robustness, or when a specific output format is required. Also, they might provide head-centric information faster, because the population summation step is omitted. It is also conceivable that certain functions or tasks require particular encoding strategies in appropriate reference frames. Theoretically, all relevant information for spatial coding can be provided by any of the three coding mechanisms and easily transformed from one to another. Yet in many areas there seems to be more than one mechanism at work. Areas VIP and V6A have gain fields and explicit head-centric receptive fields in the same neurons [37,52]. LIP neurons have explicit
retinotopic receptive fields, which they update prior to a saccade [56], but they also show eye position gain fields [39,55]. This shows that multiple space encodings are available in parallel, and that whichever encoding best serves the functional task of an area becomes the explicit code. Explicit head-centric coordinates might be preferable for the control of reaching movements of the arm (area V6A), for instance, or for a multimodal representation of body surfaces (area VIP). Retino-centric representations, in contrast, could be preferable for the control of eye movements (area LIP).
5. Goal-directed spatial action

5.1. Saccadic gaze shifts

Saccades are rapid eye movements that align the direction of gaze with a particular target of interest. Saccades are fast (up to 600°/s), quick (lasting about 50 ms), and frequent (we perform about three of them every second, mostly without ever noticing). Saccades are probably the most ubiquitous but also the simplest form of goal-directed behavior. For this reason, they have been studied extensively and have served as a prime example of the basics of sensorimotor information transfer. Natural saccadic gaze shifts are usually a combination of an eye-in-head movement and a head-on-trunk movement. However, most work on saccades has focussed on a situation where the head is fixed and only the eyes move. Only recently have researchers begun to investigate natural eye-head gaze shifts. Most of what follows in this section pertains to the head-fixed situation. Like all eye movements, saccades are ultimately generated by contraction of the eye muscles. Contraction of the eye muscles is governed by the firing of neurons of the oculomotor nuclei in the brainstem. This final output stage of eye motor control is shared by all classes of eye movements. The control systems that act before the oculomotor nuclei are specific for each type of eye movement. While saccades may appear to be rather simple movements, their control involves a large network of subcortical and cortical areas. To understand this network and the flow of information within it, it is useful to consider three separate stages: saccade planning and preparation, saccade initiation, and saccade execution. Saccade planning and preparation refers to the process of choosing a target for a saccade, calculating its position in space, and relaying that information to the saccade initiation system. Area LIP in the parietal cortex provides information about salient visual objects that can become targets for a saccade [60]. The visual receptive field of many LIP neurons serves a dual function as a motor field for saccades. Electrical stimulation in a cluster of LIP neurons with similar receptive field positions initiates a saccade in the direction of that position [61]. Hence the visual retinotopic map in LIP can also function as a spatial motor map for saccades. A few other cortical areas are also involved in saccade planning and preparation, most importantly the frontal eye field (FEF) [62]. Similarly to LIP, electrical stimulation in FEF also generates saccades with a particular amplitude and direction.
Information about the spatial location of a saccade target is relayed from LIP and FEF to the superior colliculus (SC), which is a primary structure for saccade initiation. The SC contains a retinotopic motor map for saccade generation (Fig. 7A). Each location in this map is associated with a particular direction and amplitude of a saccade. Electrical stimulation at a specific map position leads to an eye movement with the respective direction and amplitude (e.g., points a and b in Fig. 7A). The spatial parameters of a saccade are retrieved from the implicit space code provided by area LIP [41] and represented by a population code in the distribution of activity in the collicular map [63,64]. Many of the neurons in the SC also have visual receptive fields. Their receptive field center is positioned at the same direction and amplitude as the eye movement that is generated. At the rostral pole of the map, neurons are clustered that are active during fixation rather than during saccades (Fig. 7B, top panel). In the saccade preparation phase, the activity of these neurons slowly decreases. At the same time, the activity of a subset of the neurons at the map position that represents the saccade target slowly increases (Fig. 7B, middle panel). Because of the slow build-up of their activity in this phase, these neurons are called build-up neurons [65]. When the build-up activity reaches a threshold level, a further subset of SC neurons at that map position becomes activated, the so-called burst neurons (Fig. 7B, bottom panel). These neurons fire a burst of action potentials which triggers the onset of the saccade. The build-up activity reflects many aspects of saccade preparation and target selection. When several targets are available, build-up activity occurs at all associated places in the map. The strength of the build-up activity depends on the probability that the target will be the goal of the saccade [66]. However, when two targets are presented close together, the saccade may be directed to an average of the two target positions, demonstrating that the SC map can perform vector averaging [67]. The target selection and saccade preparation process in the collicular motor map has been modeled in a neural field approach [68,69].
Fig. 7. Saccade initiation in the superior colliculus. (A) The collicular motor map. (B) Distribution of activity in the map during fixation, saccade preparation, and saccade initiation.
In this approach, the two-dimensional map of neurons in the colliculus is treated as a homogeneous excitable field. Lateral interactions are assumed to provide short-distance excitation and long-distance inhibition [70]. In this model, incoming information about the location of saccade targets initiates the build-up of activity at the associated map locations. Simultaneously, the activity in the map is subject to internal dynamics governed by the lateral interactions. These internal dynamics make it possible to model the vector averaging of nearby targets [67,68] and the influence of multiple targets on saccadic reaction time [69]. When the build-up activity reaches threshold, the collicular burst neurons initiate the saccade. The execution of the saccade is then controlled by the so-called brainstem saccade generator. It consists of several groups of neurons from a number of brainstem nuclei, along with the burst neurons and the fixation neurons of the SC. The brainstem saccade generator has to transform the spatial target information provided by the saccade initiation system into an appropriate motor program for the eye muscles. Ultimately, this involves the transformation from a spatial map of target position into a temporal signal for the time course of muscle contraction. The brainstem saccade generator functions as a feedback controller (see Section 3.2). It receives a desired gaze displacement as input, which is provided by the SC burst neurons. The output is the signal to the oculomotor neurons. Two other important pathways of the controller are the feedback signal about eye position and the inhibitory pathway that suppresses the fixation neuron activity in order to release fixation. Many models of the brainstem saccade generator fall within this general scheme (e.g. [71-77]). They differ in the exact nature of the input signal and in the way eye position feedback is generated and used.

A saccade aligns the direction of gaze with a target direction in space. The target direction specifies two positional angles of the eye (azimuth and elevation). Yet the eyeball has three degrees of freedom of movement, the third being torsion around the line of sight. Hence, the saccade target direction does not fully specify the final eye orientation. The saccadic system introduces a further constraint (called Listing's law [11]) that moves the eye such as to minimize torsion [78,79]. This constraint is sensible for two reasons. First, because rotations are non-commutative, a sequence of saccades that does not follow Listing's law would lead to the build-up of strong torsion of the eyeball over time, which would strain the muscles. Second, minimizing torsion ensures that the image of the world always remains in approximately the same orientation in the eye [11,79]. It is still under debate which part of the saccade pathway is responsible for the physiological implementation of Listing's law [41,80].

Because the duration of a normal saccade is shorter than the latency of visual input to the brain, the saccadic system cannot receive visual feedback while the saccade is ongoing. Errors in saccade targeting are conveyed to the system only after the saccade is finished. Thus, the entire saccade programming must be based on presaccadic visual information in an open-loop sense. It is therefore important for the saccadic system to closely monitor errors in saccade targeting and adjust the saccade programming based upon recent performance. For this reason, saccade gain is plastic [81].
When a subject is instructed to make saccades to a target A that is
suddenly moved to another position B during the saccade, initially all saccades miss the target. After about 100 trials, however, the saccadic system has learned to associate the presaccadic target position A with a saccade that brings the eye to the postsaccadic position B. This requires an adaptation of the gain of the saccade. Such an adaptation is not seen at the level of the superior colliculus [82], suggesting that it occurs in the saccade generator downstream from the colliculus, probably in the cerebellum [81,83,84]. The requirements for the control system and for the input signal become more complicated when combined eye-head movements are considered. In this case, both the eye and the head movement have to be controlled such that the gaze reaches the target and stays there. Typically, the eye movement is much quicker than the head movement. Therefore, even when both components are initiated at the same time, the eye movement is first to align gaze with the target. Then, as the head follows, the eye has to be counter-rotated relative to the head in order to keep gaze on the target. This process involves a complicated interaction between the two movement components. It has been proposed that both components might be driven by a common gaze command [85]. But recent experiments suggest that each component more likely receives an independent control signal [86].
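The gain adaptation in the double-step paradigm described above can be caricatured by a simple error-driven update of a single gain parameter, as in the sketch below; the learning rate, the noise level, and the proportional update rule are assumptions made purely for illustration.

```python
import numpy as np

# Sketch of saccadic gain adaptation: the target jumps from A to B during the
# saccade, and the gain is gradually adjusted from the post-saccadic error.
rng = np.random.default_rng(3)

target_A = 10.0   # presaccadic target eccentricity (deg)
target_B = 7.0    # position the target is moved to during the saccade (deg)
gain = 1.0        # saccade amplitude = gain * retinal target eccentricity
rate = 0.03       # assumed adaptation rate

for trial in range(150):
    amplitude = gain * target_A + rng.normal(0, 0.2)   # executed saccade (with motor noise)
    error = target_B - amplitude                       # post-saccadic visual error
    gain += rate * error / target_A                    # error-driven gain change
    if trial in (0, 49, 99, 149):
        print(f"trial {trial + 1:3d}: amplitude {amplitude:5.2f} deg, gain {gain:4.2f}")
```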
5.2. Spatial representations during saccadic gaze shifts

Saccadic eye movements pose problems for the stability of vision. A saccade rapidly and often drastically changes the view of the world that is projected onto the retina. Moreover, each saccade induces strong and fast image motion on the retina as the gaze sweeps across the visual scene. Typically, however, we are aware neither of the retinal image motion generated by a saccade nor of any image displacement after the saccade [87-89]. Both phenomena show that vision is transiently suppressed during saccades. The saccade-induced change of the view of the visual scene, moreover, requires a match of identical image elements before and after the saccade; otherwise we would fail to experience a stable environment. Transient changes of the apparent position of a briefly flashed object before a saccade illustrate the mechanisms of transsaccadic visual stability. Just before the beginning of a saccade, the apparent position of briefly displayed objects in the visual scene changes. There is a strong shift in the direction of the saccade, anticipating the saccade and compensating for its effects [90,91]. The magnitude of the shift varies with position in the visual field, implying a transient compression of the metric of space just before the saccade [92]. However, the compression is less robust than the shift and is not found under all conditions. It is mainly driven by visual information available in the postsaccadic image [93], while the shift is thought to reflect the efference copy signal [90,91]. The presaccadic position shifts have been linked with the mechanism of receptive field updating in parietal cortex [56].
5.3. Reaching and pointing

Goal-directed movements of the arm are much more difficult to control than eye movements. They involve several joints and hence possess many more degrees of
freedom. Here we provide only a very brief overview of the main theoretical concepts involved in arm motor control. For more detailed information the reader is referred to a number of review articles [94-98]. The central problem of arm motor control is to specify a unique set of movement parameters in the high-dimensional state space of the joints of the arm. This involves complex control problems for the kinematics and/or dynamics of the joint coordinates. Most approaches to this problem employ optimization schemes. In these schemes, a sequence of joint coordinates is established that minimizes some property of the movement. Important examples are movement jerk (i.e., the third derivative of position) [94], torque change [99], and joint stiffness [100]. The equilibrium point hypothesis [101,102] asserts that much of the control of an arm movement is carried out by the passive properties of the musculoskeletal system. A specific muscle innervation will drive the arm to a certain associated joint configuration, the equilibrium point, which forms a stable attractor of the force field generated by the muscles. In this case, the motor controller only needs to specify the equilibrium point [95,103]. It is also important to observe that normal arm movements do not use all the possibilities that the arm has. Natural arm movements towards a given point in space typically lead to a single posture of the arm. This posture is associated with the position of the target point relative to the shoulder. While many other postures would be possible, the system often uses only a single one. Thus, arm movements normally behave in a more constrained fashion, in which only a smaller number of degrees of freedom is actually exploited [94,104]. This bears some analogy to the reduction of the degrees of freedom for saccadic eye movements by Listing's law (see Section 5.1). The motor system for arm movements involves areas in the parietal cortex, cortical motor areas, the cerebellum, the basal ganglia, subcortical motor structures, and the spinal cord [1,2]. The motor command neurons in the primary motor cortex are at the highest level of this command hierarchy. These neurons exhibit tuning for the direction of the movement of the hand in space. They are thought to provide a population code for arm movement direction [97,105]. This code can be obtained directly from the distributed encoding of target position in space in the parietal cortex [51]. However, the firing rate of neurons in primary motor cortex is influenced by many other parameters of the movement, such as its starting position [106], the orientation of the arm [107], and the load to be moved [108]. Because these parameters influence neuronal firing rate but do not influence the direction of the hand movement, the population may provide more information than just the movement direction. This suggests a more complicated population code which takes proprioceptive feedback about arm position into account [109].
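As an illustration of the optimization idea, the sketch below evaluates the classical minimum-jerk profile for a straight point-to-point movement, in which position follows a fifth-order polynomial of normalized time and the velocity profile is bell-shaped; the endpoints and duration are arbitrary.

```python
import numpy as np

# Minimum-jerk trajectory for a straight point-to-point movement:
# x(t) = x0 + (xf - x0) * (10 tau^3 - 15 tau^4 + 6 tau^5), with tau = t / T.
def minimum_jerk(x0, xf, T, n=11):
    t = np.linspace(0.0, T, n)
    tau = t / T
    x = x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    v = (xf - x0) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T   # bell-shaped velocity
    return t, x, v

t, x, v = minimum_jerk(x0=0.0, xf=0.3, T=0.6)   # a 30 cm reach lasting 600 ms
for ti, xi, vi in zip(t, x, v):
    print(f"t={ti:4.2f} s  position={xi:5.3f} m  velocity={vi:5.3f} m/s")
```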
6. Motion
Sensing and interpreting motion are essential for many behavioral tasks. Tracking a moving object with the eye or the hand or controlling one's own motion in the
environment requires the determination and analysis of movement in the visual field. Perceiving motion is also helpful for other information processing tasks. Motion can be used to group objects together or to separate objects from their background. The brain contains specialized mechanisms that detect and analyze motion and that transfer information about motion to the motor networks involved in motion tasks. This section first describes how motion is detected and analyzed and then describes how motion information is used in sensorimotor behavior.

6.1. Visual motion detection
An object moving across the visual field induces a changing pattern of illumination on the retina. Motion-sensitive neurons respond to spatio-temporal luminance changes when the motion is in their individual preferred direction. Such direction-selective responses are already found in the retina, but also in most retinal recipient structures and in many cortical areas. Many techniques have been proposed to estimate motion from time-varying images (overview and comparison in [110]). The two main classes of models for neuronal motion detection and direction selectivity are correlation models and gradient models. Correlation models compare the light intensity at one location at a specific time with the light intensity at another location at a later time [111]. The first such models were proposed in the late 1950s and 1960s [112,113]. The basic principle is shown in Fig. 8A. The signal from the first image position is delayed and compared to the signal from the second image position by a coincidence detector, which is modeled as a multiplication of the two signals. This detector responds to a luminance change at the two positions with a specific temporal profile, i.e., to a particular contrast frequency. However, this arrangement alone is not sufficient to truly detect motion in a particular direction. It would also respond to continuous uniform illumination, because then the time delay does not matter anymore. This ambiguity is
Fig. 8. Motion detection by spatio-temporal correlation. (A) Correlation detector. (B) Spatio-temporal filter.
resolved by comparing the outputs of two detectors that are mirror images of each other (opponent detectors) [112,111]. Even then, however, the detector is direction selective but not selective for the speed of motion. For instance, two gratings of different spatial frequencies can be moved along the detector at different speeds and yield the same response. Speed information can be gained, however, through the analysis of a population of detectors [114]. A variant of the correlation approach is the class of motion energy models [115,116] (Fig. 8B). These models use linear spatio-temporal filters that establish time delays for some sub-parts of the receptive field. In the example of Fig. 8B the receptive field consists of an excitatory region (light ellipse) flanked by two inhibitory regions (dark ellipses) in a Gabor-like arrangement. Time delays between these regions can be expressed as an orientation of the receptive field in space-time (Fig. 8B). This receptive field structure establishes a filter that responds preferentially to a spatio-temporal luminance change that is aligned with the long axis of the excitatory region. The outputs of opponent filters are squared and summed to obtain a measure of the total motion strength, or motion energy [115,116]. Subsequent models have refined this general structure, either to make it more consistent with human psychophysical data on motion perception [117,118], or to allow the estimation of speed through population analysis [119]. More recent work has elaborated on these procedures in order to closely resemble the properties of direction-selective neurons in the primary visual cortex [120,121]. Gradient models attempt to calculate local velocity from the local spatial and temporal gradients of luminance [122-125]. They are built around the assumption that the total image luminance $E$ is stationary over time, $\mathrm{d}E/\mathrm{d}t = 0$. If this is true, then the temporal and spatial luminance gradients must sum to zero:

$$\frac{\partial E}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial E}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial E}{\partial t} = 0.$$

From this equation, the image velocity $(\partial x/\partial t, \partial y/\partial t)$ can be computed once $\nabla E$ and
$\partial E/\partial t$ are known. An inherent problem in the neural computation of visual motion is the so-called aperture problem [122,123]. The aperture problem occurs when the moving object is larger than the receptive field of the neuron. The neuron only sees the luminance changes inside its receptive field, i.e., through a limited aperture. For a moving one-dimensional edge seen through a limited aperture, only the motion component orthogonal to the edge can be determined. The motion component along the edge cannot be determined, because there is no change of luminance along this direction. Mathematically, this is expressed in the fact that the luminance gradient equation above is a single equation in two unknowns. Hence, a neuron that is subject to the aperture problem can only register one component of the two-dimensional visual motion signal. The aperture problem can be overcome by integrating local motion signals from many neurons over a larger spatial region. Several models for such motion integration have been proposed, differing mainly in the way in which the integration is performed. The integration of motion signals may be performed along the edges of a moving object [126], or over a two-dimensional area [123,127]. The
The latter case attempts to estimate a dense and smooth velocity field across the entire image. However, when motion measurements are spatially averaged, the problem arises that the edges of moving objects form a discontinuity in the velocity field. Averaging motion signals across this discontinuity compromises the estimation of the motion of the object. Moreover, detecting the discontinuity is important to separate the object from its background. In the spatial integration approach, line processes can be incorporated that break up the integration at discontinuities where the local motion signals change abruptly [124]. Alternatively, one may segment the velocity field into coherently moving parts based upon the reliability of individual local motion measurements [128].

In the primate visual system, the integration of local motion signals happens along the pathway from the primary visual cortex (area V1) to the middle temporal (MT) area and the medial superior temporal (MST) area. Neurons in area MT have been shown to overcome the aperture problem [129,130]. Area MT is an area specifically dedicated to the processing of motion. It contains a high proportion of direction-selective neurons [131] and has been linked to behavioral responses to motion stimuli in lesion [132] and microstimulation [133] studies. Area MT contains a topographic map of motion in the visual field [134]. Receptive fields in area MT are much larger than those of area V1, reflecting the motion integration that occurs in MT. They range from about 1 deg² in the central visual field up to 100 deg² in the periphery. Each neuron in the map responds to visual motion at that map position and in its preferred direction. A combination of the responses of several neurons with different preferred directions can provide a population encoding of motion across the visual field [124,128,135].
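To illustrate how spatial pooling resolves the aperture problem of the gradient equation above, the following small numerical sketch (added for illustration, not taken from the chapter) pools the single-pixel gradient constraint over an image patch by plain least squares, which is only one of several possible integration schemes; the stimulus and function names are invented.

```python
import numpy as np

def patch_velocity(E, t=1):
    """Least-squares estimate of image velocity from luminance gradients.

    E : 3-D luminance array indexed as E[time, y, x] for a small patch.
    Each pixel contributes one constraint Ex*vx + Ey*vy + Et = 0; a single
    pixel leaves the along-edge component undetermined (aperture problem),
    but pooling over a patch with varied gradient directions yields a
    well-conditioned two-unknown system.
    """
    Et, Ey, Ex = np.gradient(E.astype(float))       # finite-difference derivatives along t, y, x
    ex, ey, et = Ex[t].ravel(), Ey[t].ravel(), Et[t].ravel()
    A = np.stack([ex, ey], axis=1)                  # one constraint row per pixel
    v, *_ = np.linalg.lstsq(A, -et, rcond=None)     # solve A v = -Et in the least-squares sense
    return v                                        # (vx, vy) in pixels per frame

# Example: a 2-D luminance pattern translating by (1.0, 0.5) pixels per frame.
yy, xx = np.mgrid[0:32, 0:32]
frames = np.stack([np.sin(0.3 * (xx - 1.0 * k)) + np.cos(0.4 * (yy - 0.5 * k))
                   for k in range(3)])
print(patch_velocity(frames))   # approximately [1.0, 0.5]
```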
6.2. Motion analysis

Visual motion is used for many purposes. This includes the determination of the three-dimensional structure of objects [136], the control of tracking movements of the eyes [7], and the guidance of self-motion [137]. Each task requires a dedicated analysis of the visual motion signal. This analysis is performed in areas MT and MST, the ventral intraparietal (VIP) area and area 7A in the parietal cortex. Much useful information is contained in motion parallax [138,139]. Motion parallax is the difference in the apparent motion of two objects that move with the same physical speed but are positioned at different distances from the observer (cf. Fig. 4C). Near objects, in this case, move faster on the retina than objects located further away. This difference in the speed of motion signals from different depths is the basis for the estimation of the three-dimensional structure of moving objects [140-143]. Neurons in area MT have an antagonistic substructure in their receptive fields that suggests a role in the estimation of motion parallax. In addition to the part of the receptive field that generates responses to motion in the preferred direction of the neuron, the receptive field of many MT cells contains an area that reduces the response when it is stimulated with motion in that same direction [144,145]. These neurons respond to differences in local motion, i.e., to motion parallax.
Their response properties are useful for the estimation of three-dimensional shape [146,147] and for the estimation of self-motion and depth in the visual scene [148,149].
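The geometry behind motion parallax can be summarized in one standard small-angle relation (added here for illustration, not derived in the chapter): an object translating with physical speed $v$ at distance $Z$ from the eye produces a retinal angular speed of approximately
$$\dot{\theta} \approx \frac{v}{Z},$$
so of two objects moving at the same physical speed, the nearer one moves faster on the retina in inverse proportion to its distance.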
6.3. Visual tracking by smooth pursuit eye movements

Tracking a moving object involves smooth pursuit eye movements. Smooth pursuit eye movements are performed to continuously keep the image of the object on the fovea, i.e., in the area of high-resolution vision. The cortical pathway that generates smooth pursuit is largely separate from the one that drives saccades. Smooth pursuit is generated in a network consisting of the cortical motion areas MT and MST, the frontal eye field (which is also involved in saccades), the pontine nuclei, and certain parts of the cerebellum (overview in [150]). At the level of the pontine nuclei and the cerebellum the pursuit pathway converges to some degree with the pathways for other eye movements.

Smooth pursuit is governed by a feedback control system [151]. It uses visual motion as input and eye speed as the feedback signal. Different models assume different formats for the visual motion input. This difference is best understood by considering the initiation of pursuit to a target that suddenly starts moving. This is the open-loop situation of the feedback system, in which only the input to the controller, not the feedback signal, is available. When the target starts moving, the movement-induced retinal image motion tells the pursuit system the desired speed of the eye. After a latency period of about 100 ms the eye is accelerated towards the desired target speed. Meanwhile a saccade is initiated to bring the moving object onto the fovea. Later, in the closed-loop situation, smooth pursuit continues to match the eye velocity to the target velocity. But now the retinal image motion of the target is very small because the eye movement stabilizes the target on the fovea. Hence the retinal image motion no longer specifies the movement of the target and cannot be directly used as the driving signal for the controller. There are two ways to deal with this problem. Target velocity models construct the velocity of the target in space from the sum of the velocity of the eye movement and the remaining retinal motion of the target [151,152]. The target velocity signal is then used as the input to the controller. Retinal motion models, on the other hand, use the retinal image motion directly as input to the controller and integrate it to determine target velocity in space [153].

Neurophysiologically, the input to the pursuit system originates from motion-sensitive neurons in areas MT and MST [150,154-156]. Most of these neurons respond to retinal image motion. Hence they can provide the input required by the retinal motion models [156]. However, some of the neurons in area MST appear to encode target velocity in space rather than image motion on the retina [157-159]. Their signal can provide the required input for the target velocity models [152].
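As a rough illustration of such a visual feedback loop, the following toy simulation (not a model from the literature cited above; gain, latency and time step are arbitrary illustrative values) accelerates the eye in proportion to delayed retinal slip. It reproduces the qualitative behaviour described in the text: during the ~100 ms open-loop phase the eye is still, then eye speed climbs towards the target speed, and in the closed-loop phase the remaining retinal image motion becomes small.

```python
import numpy as np

def simulate_pursuit(target_speed=10.0, gain=5.0, latency=0.1,
                     dt=0.001, duration=3.0):
    """Toy negative-feedback model of smooth pursuit initiation.

    Eye acceleration is proportional to the retinal slip (target velocity
    minus eye velocity) seen `latency` seconds earlier.  All parameters are
    illustrative, not fits to physiological data.
    """
    n = int(duration / dt)
    delay = int(latency / dt)
    eye = np.zeros(n)            # eye velocity (deg/s)
    slip = np.zeros(n)           # retinal image velocity of the target (deg/s)
    for k in range(n - 1):
        slip[k] = target_speed - eye[k]                 # visual feedback signal
        delayed_slip = slip[k - delay] if k >= delay else 0.0
        eye[k + 1] = eye[k] + gain * delayed_slip * dt  # controller: accelerate on delayed slip
    return eye, slip

eye, slip = simulate_pursuit()
print(round(eye[-1], 2), round(slip[-2], 2))  # eye speed near 10 deg/s, residual slip near zero
```

A target-velocity variant would instead add the current eye velocity back to the retinal slip to reconstruct target velocity in space and use that reconstructed signal as the controller input, in the spirit of the target velocity models discussed above.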
6.4. Control of self-motion and posture

Movement of the observer himself induces global image motion of the entire visual field (Fig. 9). This pattern of image motion is called optic flow. It serves as a signal to control self-motion and to stabilize posture. Optic flow is used to control body stance [160,161], the speed of self-motion [162], the distance traveled [163], the time-to-collision with obstacles along the path [160,164], and the direction of heading [137].
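The time-to-collision estimate is the classic example: for an object or surface approaching head-on, its visual angle $\theta$ grows as the distance shrinks, and the ratio (the tau variable of [160]; standard small-angle form, reproduced here for illustration)
$$\tau = \frac{\theta}{\dot{\theta}}$$
approximates the remaining time to contact without requiring knowledge of either the distance or the approach speed.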
Fig. 9. Optic flow induced by self-motion. (A) An observer moving across a flat horizontal plane. (B) Vector field of image motion induced by forward movement of the observer while he looks directly into the motion direction. (C) Vector field of image motion when the observer performs an eye movement to look at an element on the ground in front of him (circle).

Much work has concentrated on the last issue, the estimation of heading. The optic flow pattern induced in the eye of a moving observer is determined by the parameters of the movement and the three-dimensional structure of the environment [138]. Mathematically, the problem of inferring the motion of the observer from the pattern of optic flow is ill posed. At any instant, the motion of the eye, like any rigid body motion, can be described by a translation and a rotation, i.e., with six degrees of freedom. The image motion of an element of the environment depends on these parameters and on its distance from the eye. The structure of the flow pattern is quite simple when only observer translation is considered. In this case, the motion field radiates away from a singular point, the focus of expansion, which is directly equivalent to heading (Fig. 9B).
However, when rotational movements of the eye are superimposed on the translation of the observer, the motion field becomes much more complex [165,166] and the singular point is no longer associated with heading (Fig. 9C). Heading estimation then requires the determination of the direction of translation in the presence of rotational flow disturbances. This can be framed as a problem with many unknown parameters, namely the six degrees of freedom of the self-motion plus the distances of all visible points from the eye. Accurate measurement of the retinal flow provides information to solve this problem by registering the direction and speed of every moving point. This allows the mathematical decomposition of the flow into translational and rotational components and the estimation of heading once more than six moving points are given [167]. Usually many more points are available, but their measurements are noisy. In this case, the redundant information from more than six points can be used to optimally determine heading [168,169].

A key source of information for separating translational and rotational components of the optic flow is motion parallax. For translational movements of the eye, the induced visual speed of each element is inversely proportional to its distance from the eye. In contrast, a rotation of the eye induces an angular speed that is independent of the distance of the image points. This difference is exploited by most neural models for heading estimation from optic flow. Three main classes of models have been proposed. Differential motion models directly capitalize on the properties of motion parallax. By computing differences between adjacent flow vectors they remove the common rotation component and construct an approximation of the translational component only [170,171,148]. Heading can then be recovered by locating the singular point. The antagonistic receptive field structure of neurons in area MT can provide a starting point for such an analysis [148]. Template-matching models take a different approach. They construct neurons that respond to specific instances of optic flow, i.e., to specific flow patterns. As the number of possible flow patterns is in principle infinite, this requires either a very large number of templates [172], or restraining assumptions about the parameters of observer motion or of the environment [173], or a mechanism to approximate an entire set of templates from a few basic templates [174]. A third approach, optimization models, constructs an optimization function whose minimization yields the set of motion parameters that optimally predicts the measured flow field. Originally this approach was based on minimizing the squared error between the measured flow field and a possible candidate flow field [168,169]. The parameters that define the best-matching candidate flow field were found iteratively. However, the time-consuming iteration can be cut short by geometric considerations, resulting in a fast and robust estimation procedure [175]. This algorithm has been implemented in a neural network [176]. The emerging properties of the elements of this network bear a strong resemblance to the properties of neurons in areas MT and MST [177,178]. While the three types of model follow quite different approaches, and make different predictions for the neuronal elements involved, ultimately they all rely on the properties of motion parallax [137]. The first step of this analysis could be provided by receptive field properties in area MT [149].
The optimization model provides a generalization from local to global parallax analysis [149].
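The flow geometry that all of these models start from can be written down compactly. The sketch below uses the standard pinhole-model motion-field equations (in the spirit of the decomposition analyses [167]); the parameter values and the simple least-squares read-out are illustrative choices, not code from any of the cited models. It generates a flow field for a combined translation and rotation and recovers the heading after the rotational component has been removed, here simply by subtracting the rotational flow predicted from a known eye velocity.

```python
import numpy as np

def flow(x, y, Z, T, Omega):
    """Instantaneous image motion (pinhole model, focal length 1) of points at
    image positions (x, y) and depths Z, for eye translation T and rotation Omega.
    The translational part scales with 1/Z (motion parallax); the rotational
    part does not depend on depth."""
    Tx, Ty, Tz = T
    wx, wy, wz = Omega
    u = (-Tx + x * Tz) / Z + (x * y * wx - (1 + x**2) * wy + y * wz)
    v = (-Ty + y * Tz) / Z + ((1 + y**2) * wx - x * y * wy - x * wz)
    return u, v

# Random scene points in front of the observer (arbitrary illustrative values).
rng = np.random.default_rng(0)
x, y = rng.uniform(-0.5, 0.5, 200), rng.uniform(-0.5, 0.5, 200)
Z = rng.uniform(2.0, 20.0, 200)
T = np.array([0.2, 0.0, 1.0])         # heading 0.2 to the right of the optical axis
Omega = np.array([0.0, 0.05, 0.0])    # slow eye rotation superimposed on the translation

u, v = flow(x, y, Z, T, Omega)                  # measured flow: singular point no longer at heading
u_rot, v_rot = flow(x, y, Z, 0 * T, Omega)      # rotational flow predicted from an eye-movement signal
u_tr, v_tr = u - u_rot, v - v_rot               # subtraction isolates the translational (radial) field

# Each translational flow vector points away from the focus of expansion (x0, y0),
# so u_tr*(y - y0) - v_tr*(x - x0) = 0; solve for (x0, y0) by least squares.
foe = np.linalg.lstsq(np.stack([v_tr, -u_tr], axis=1),
                      v_tr * x - u_tr * y, rcond=None)[0]
print(foe)   # ~ [0.2, 0.0], i.e. the heading (Tx/Tz, Ty/Tz)
```

In this idealized setting the subtraction is exact; the models discussed above differ precisely in how they estimate or cancel the rotational component when no such extra signal is available or when the flow measurements are noisy.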
The representation of self-motion in areas MT and MST is a good example of the fusion of different sources of information for a common goal. For the separation of translational and rotational (eye-movement) components, motion parallax has been mentioned above as a useful visual cue. A further source of information, however, is provided by eye movement feedback. A nonvisual eye movement feedback signal can be used to adjust the gain of templates in a template model [174] or, more directly, to estimate the eye movement-induced visual motion and to subtract it from the flow pattern in the optimization model [179]. Such an eye movement feedback signal is available in area MST [158], and it is used to obtain heading information in the presence of eye movements [180,181]. In addition, self-motion is also sensed by the vestibular system. Vestibular self-motion signals are integrated with visual self-motion signals in area MST [182,183]. Finally, knowledge of the three-dimensional depth structure of the visual scene can also provide constraints for the evaluation of the flow field [184,185]. Disparity selectivity in area MT contributes to the robustness of the flow representation by a depth-dependent spatial filtering of the flow vectors [135]. It reduces noise among flow vectors with the same motion parallax. Disparity selectivity in area MST may provide a selective weighting of flow signals from distant objects which enhances the separation of translational and rotational components of the flow field [186].

The direct coupling of optic flow to motor output has been investigated in detail for the control of posture. When a standing, stationary subject experiences a low-frequency, low-amplitude expanding and contracting flow pattern, the subject will unconsciously sway back and forth along with the pattern movement [160,161]. Hence visual motion is used for postural control, and must be integrated with vestibular and somatosensory signals [187]. This sway and its phase coupling to the stimulus have been modeled in a dynamical systems approach. The original idea was that the expansion rate of the flow field can directly drive a passive dynamic system for postural responses [188]. However, detailed comparison with experimental data suggests that the expansion rate of the flow field rather couples into a system that actively generates postural responses [189]. Again, the flow field input to this system appears to be provided by area MST [190].
6.5. Gaze stabilization during self-motion

The coupling of perception and action also manifests itself in the oculomotor behaviors during self-motion. In the above section, we have seen that eye movements give rise to rotational components in the optic flow that severely complicate the task of heading detection, and probably many other optic flow-related tasks as well. Why then do we perform eye movements during self-motion? The reason is that self-motion creates a problem for stable vision, as it sets the entire image of the world on the retina in motion. In order to accurately perceive the environment it is desirable to have a clear and stationary visual image. For this reason, several types of compensatory eye movement reflexes are active during self-motion that move the eye so as to keep the central part of the retinal image stable [7,191]. These gaze stabilization mechanisms use vestibular, proprioceptive, and visual signals.
The requirements of gaze stabilization are quite different for rotation and translation movements. For rotations of the head or body, the entire visual scene moves with a single angular velocity. The rotational vestibulo-ocular reflex (rVOR) very directly uses the signal from the vestibular organs to compensate for rotations of the head by rotating the eyes opposite to the head rotation in a feedback control loop [71,191]. The speed of the eyes in the rVOR closely matches the speed of the head movement, such that very good image stabilization is achieved. This is particularly true for fast head rotations, e.g., head oscillations in the 2-8 Hz range. For slower head rotations, ocular compensation increasingly relies on the optokinetic reflex (OKR). The optokinetic reflex tries to null retinal image motion by adjusting the eye speed to the speed of the visual motion in a visual feedback control loop. A combination of the optokinetic and vestibulo-ocular reflexes, which is the normal situation during active movement, results in almost complete image stabilization during head rotations. The VOR and the open-loop OKR both have to predict their effects on the visual image and may receive visual feedback signals only with a comparatively long time delay of about 80 ms. For this reason, their performance is constantly monitored through cerebellar feedback loops and subject to fast adaptation mechanisms [10,13,192].

Translations of the head in space also induce vestibularly driven compensatory eye movements, called the translational vestibulo-ocular reflex (tVOR). However, there are two further complications in this case. First, for geometrical reasons the required speed of the eye movement cannot be determined from the head movement alone. The visual motion of a fixated scene element during translation depends on the distance of the element from the observer. Accurate image stabilization in this case must take the geometry of the visual scene into account. If the object is close to the eye, the same head movement induces a much larger visual motion than if the object is further away. Hence, to achieve accurate image stabilization, the compensatory eye speed must depend on the viewing distance [193,194]. Scaling of eye speed with viewing distance also occurs for the ocular following reflex [194]. The second problem is that during forward translation it is physically impossible to stabilize the entire retinal image. Forward motion induces an expanding pattern of optic flow in which points in different parts of the visual field move in different directions. Hence it is only possible to stabilize the part of the visual image at which gaze is directed. In this case, the tVOR varies with viewing direction: eye movement is rightward when gaze is directed to the right and leftward when gaze is directed to the left [195]. In addition to the tVOR, optokinetic reactions to radial optic flow fields use visual information to stabilize gaze during forward translation [196,197]. These eye movements follow the direction of motion that is present at the fovea and parafovea, stabilizing the retinal image in a small parafoveal region.
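The distance scaling of the tVOR follows from the same viewing geometry as motion parallax (a standard approximation, added here for illustration): to keep a target at distance $d$ foveated during a lateral head translation with speed $v$, the eye must counter-rotate at approximately
$$\omega \approx \frac{v}{d},$$
so halving the viewing distance doubles the required compensatory eye speed.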
7. An overall view
The brain uses sensory information to control motor behavior. Initially, the incoming information provided by the different sensory systems is encoded in different sensor-specific formats.
Typically, this information is represented, either directly or after a first transformation, in topographic maps. For the guidance of spatially accurate motor action, multiple multisensory representations of space are formed in the parietal cortex. These representations are distributed across populations of neurons. Different task-dependent space coding formats can be specified by the read-out mechanism of the population activity. Different motor tasks, such as the various types of eye movement, are controlled by separate sensorimotor networks. They can mostly be described as feedback control systems. Sensory information acts as the input signal. Feedback is also provided by sensory information and, in addition, by an efference copy of the motor command. The exact types of these signals and their encoding depend on the specific motor task.

Abbreviations
CNS, Central Nervous System
deg, degree
FEF, Frontal eye fields
LIP, Lateral intraparietal
ms, millisecond
MT, Middle temporal
MST, Medial superior temporal
OKR, Optokinetic reflex
s, second
SC, Superior colliculus
V1, primary visual cortex (area 17)
VIP, Ventral intraparietal
VOR, Vestibulo-ocular reflex
2D, two-dimensional
Acknowledgements
I would like to thank several people who have commented on parts of the manuscript: Martin Giese, Klaus-Peter Hoffmann, Dieter Kutz, Lars Lünenburger, Bas Neggers, Peter Thier, Florentin Wörgötter.
References
1. Kandel, E.R., Schwartz, J.H. and Jessell, T.M. (1991) Principles of Neural Science. Elsevier, Amsterdam.
2. Gazzaniga, M.S. (2000) The New Cognitive Neurosciences. MIT Press, Cambridge, MA.
3. Palmer, S.E. (1999) Vision Science: Photons to Phenomenology. MIT Press, Cambridge, MA.
4. Milner, A.D. and Goodale, M.A. (1995) The Visual Brain in Action. Oxford University Press, Oxford.
5. Felleman, D.J. and Van Essen, D.C. (1991) Cerebral Cortex 1, 1-47.
6. Young, M.P., Scannell, J.W. and Burns, G. (1995) The Analysis of Cortical Connectivity. Springer, Berlin.
7. Carpenter, R.H.S. (1988) Movement of the Eyes, 2nd Edn. Pion, London.
8. Blake, A. and Yuille, A. (1992) Active Vision. MIT Press, Cambridge, MA.
9. Cruse, H. (1996) Neural Networks as Cybernetic Systems. Thieme, Stuttgart, New York.
10. Wolpert, D.M., Miall, R.C. and Kawato, M. (1998) Trends. Cogn. Sci. 2, 338-347.
11. von Helmholtz, H. (1896) Handbuch der Physiologischen Optik. Leopold Voss, Hamburg.
12. Von Holst, E. and Mittelstaedt, H. (1950) Naturwissenschaften 37, 464-476.
13. Ito, M. (1998) Trends. Cogn. Sci. 2, 313-321.
14. Mather, G., Verstraten, F. and Anstis, S. (1998) The Motion Aftereffect. MIT Press.
15. Marr, D. (1982) Vision. Freeman, San Francisco.
16. De Valois, R.L. and De Valois, K.K. (1988) Spatial Vision. Oxford University Press, Oxford.
17. Daugman, J.G. (1980) Vision Res. 20, 847-856.
18. Jones, J.P. and Palmer, L.A. (1987) J. Neurophysiol. 58, 1233-1258.
19. Gibson, J.J. (1950) The Perception of the Visual World. Houghton Mifflin, Boston.
20. Cutting, J.E. (1997) Behav. Res. Meth. Instr. Comput. 29, 27-36.
21. Howard, I.P. and Rogers, B.J. (1996) Binocular Vision and Stereopsis. Oxford University Press, Oxford.
22. Poggio, G. (1995) Cerebral Cortex 5, 193-204.
23. Fleet, D., Wagner, H. and Heeger, D. (1996) Vision Res. 36, 1839-1857.
24. Zhu, Y. and Qian, N. (1996) Neural Comp. 8, 1611-1641.
25. Wagner, H. and Frost, B. (1993) Nature 364, 796-798.
26. Ohzawa, I., DeAngelis, G. and Freeman, R. (1996) J. Neurophysiol. 75, 1779-1805.
27. Qian, N. (1994) Neural Comp. 6, 390-404.
28. Porr, B., Cozzi, A. and Wörgötter, F. (1998) Biol. Cybern. 78, 329-336.
29. Anzai, A., Ohzawa, I. and Freeman, R.D. (1999) J. Neurophysiol. 82, 874-890.
30. Qian, N. and Mikaelian, S. (2000) Neural Comp. 12, 279-292.
31. Lehky, S.R. and Sejnowski, T.J. (1990) J. Neurosci. 10, 2281-2299.
32. Trotter, Y., Celebrini, S., Stricanne, B., Thorpe, S. and Imbert, M. (1996) J. Neurophysiol. 76, 2872-2885.
33. Gnadt, J. and Mays, E. (1995) J. Neurophysiol. 73, 280-297.
34. Pouget, A. and Sejnowski, T.J. (1994) Cerebral Cortex 4, 314-329.
35. Andersen, R.A., Snyder, L.H., Li, C.-S. and Stricanne, B. (1993) Curr. Opin. Neurobiol. 3, 171-176.
36. Galletti, C. and Battaglini, P.P. (1989) J. Neurosci. 9, 1112-1125.
37. Galletti, C., Battaglini, P.P. and Fattori, P. (1995) Eur. J. Neurosci. 7, 2486-2501.
38. Bremmer, F., Ilg, U.J., Thiele, A., Distler, C. and Hoffmann, K.-P. (1997) J. Neurophysiol. 77, 944-961.
39. Andersen, R.A., Bracewell, R.M., Barash, S., Gnadt, J.W. and Fogassi, L. (1990) J. Neurosci. 10, 1176-1196.
40. Boussaoud, D., Jouffrais, C. and Bremmer, F. (1998) J. Neurophysiol. 80, 1132-1150.
41. Van Opstal, A. and Hepp, K. (1995) Biol. Cybern. 73, 431-445.
42. Zipser, D. and Andersen, R.A. (1988) Nature 331, 679-684.
43. Mazzoni, P., Andersen, R.A. and Jordan, M.I. (1991) Proc. Natl. Acad. Sci. USA 88, 4433-4437.
44. Goodman, S.J. and Andersen, R.A. (1990) International Joint Conference on Neural Networks, Vol. II, pp. 381-386, San Diego.
45. Pouget, A., Fisher, S.A. and Sejnowski, T.J. (1993) J. Cog. Neurosci. 5, 150-161.
46. Bremmer, F., Pouget, A. and Hoffmann, K.-P. (1998) Eur. J. Neurosci. 10, 153-160.
47. Pouget, A. and Sejnowski, T.J. (1997) J. Cog. Neurosci. 9, 222-237.
48. Moody, J. and Darken, C. (1989) Neural Comp. 1, 281-294.
49. Poggio, T. and Girosi, F. (1990) Proc. IEEE 78, 1481-1497.
50. Salinas, E. and Abbott, L. (1996) Proc. Natl. Acad. Sci. USA 93, 11956-11961.
51. Salinas, E. and Abbott, L. (1995) J. Neurosci. 15, 6461-6474.
52. Duhamel, J.-R., Bremmer, F., Ben Hamed, S. and Graf, W. (1997) Nature 389, 845-848.
53. Fogassi, L., Gallese, V., di Pellegrino, G., Fadiga, L., Gentilucci, M., Luppino, G., Matelli, M., Pedotti, A. and Rizzolatti, G. (1992) Exp. Brain Res. 89, 686-690.
54. Boussaoud, D., Barth, T.M. and Wise, S.P. (1993) Exp. Brain Res. 91, 202-211.
55. Bremmer, F., Distler, C. and Hoffmann, K.-P. (1997) J. Neurophysiol. 77, 962-977.
56. Duhamel, J.-R., Colby, C.L. and Goldberg, M.E. (1992) Science 255, 90-92.
57. Umeno, M. and Goldberg, M. (1997) J. Neurophysiol. 78, 1373-1383.
58. Walker, M., Fitzgibbon, E. and Goldberg, M. (1995) J. Neurophysiol. 73, 1988-2003.
59. Quaia, C., Optican, L.M. and Goldberg, M. (1998) Neural Networks 11, 1229-1240.
60. Colby, C. and Goldberg, M. (1999) Ann. Rev. Neurosci. 22, 319-349.
61. Thier, P. and Andersen, R.A. (1998) J. Neurophysiol. 80, 1713-1735.
62. Schall, J.D. and Thompson, K.G. (1996) Ann. Rev. Neurosci. 22, 241-259.
63. Van Gisbergen, J.A.M., Van Opstal, A.J. and Tax, A.A.M. (1987) Neuroscience 21, 541-555.
64. Lee, C., Rohrer, W.H. and Sparks, D.L. (1988) Nature 332, 357-360.
65. Munoz, D. and Wurtz, R. (1995) J. Neurophysiol. 73, 2313-2333.
66. Basso, M.A. and Wurtz, R. (1997) Nature 389, 66-69.
67. Ottes, F.P., Van Gisbergen, J.A.M. and Eggermont, J.J. (1984) Vision Res. 24, 1169-1179.
68. Kopecz, K. and Schöner, G. (1995) Biol. Cybern. 73, 49-60.
69. Kopecz, K. (1995) Vision Res. 35, 2911-2925.
70. Amari, S.I. (1977) Biol. Cybern. 27, 77-87.
71. Robinson, D.A. (1981) Ann. Rev. Neurosci. 4, 463-503.
72. Scudder, C.A. (1988) J. Neurophysiol. 59, 1455-1475.
73. Waitzman, D.M., Ma, T.P., Optican, L.M. and Wurtz, R.H. (1991) J. Neurophysiol. 66, 1716-1737.
74. Droulez, J. and Berthoz, A. (1991) Proc. Natl. Acad. Sci. USA 88, 9653-9657.
75. Dominey, P. and Arbib, M. (1992) Cerebral Cortex 2, 153-175.
76. Van Opstal, A. and Kappen, H. (1995) Network 4, 19-38.
77. Moschovakis, A. (1994) Biol. Cybern. 70, 291-302.
78. Tweed, D. and Vilis, T. (1990) Vision Res. 30, 111-127.
79. Hepp, K. (1995) Vision Res. 35, 3237-3241.
80. Raphan, T. (1998) J. Neurophysiol. 79, 2653-2667.
81. Goldberg, M.E., Musil, S., Fitzgibbon, E., Smith, M. and Olson, C. (1993) in: Role of the Cerebellum and Basal Ganglia in Voluntary Movements, eds N. Mana, I. Hamada and M.R. Delong, pp. 203-211, Elsevier, Amsterdam.
82. Frens, M. and Van Opstal, A. (1997) Brain Res. Bull. 43, 473-483.
83. Schweighofer, N., Arbib, M. and Dominey, P. (1996) Biol. Cybern. 75, 19-28.
84. Quaia, C., Lefevre, P. and Optican, L.M. (1999) J. Neurophysiol. 82, 999-1018.
85. Guitton, D., Munoz, D.P. and Galiana, H.L. (1990) J. Neurophysiol. 64, 509-531.
86. Goossens, H. and van Opstal, A.J. (1997) Exp. Brain Res. 114, 542-560.
87. Bridgeman, B., Hendry, D. and Stark, L. (1975) Vision Res. 15, 719-722.
88. Burr, D.C., Morrone, M.C. and Ross, J. (1994) Nature 371, 511-513.
89. Ilg, U.J. and Hoffmann, K.-P. (1993) Vision Res. 33, 211-220.
90. Matin, L. and Pearce, D. (1965) Science 148, 1485-1488.
91. Honda, H. (1989) Percep. Psychophys. 45, 162-174.
92. Ross, J., Morrone, M. and Burr, D. (1997) Nature 386, 598-601.
93. Lappe, M., Awater, H. and Krekelberg, B. (2000) Nature 403, 892-895.
94. Hogan, N. and Flash, T. (1987) Trends. Neurosci. 10, 170-174.
95. Mussa-Ivaldi, F.A., Giszter, S.F. and Bizzi, E. (1990) Cold Spring Harbor Symposia on Quant. Biol. 55, 827-835.
96. Soechting, J. and Flanders, M. (1992) Ann. Rev. Neurosci. 15, 167-191.
97. Georgopoulos, A. (1995) Trends Neurosci. 18, 506-510.
98. Jeannerod, M., Arbib, M., Rizzolatti, G. and Sakata, H. (1995) Trends. Neurosci. 18, 314-320.
99. Uno, Y., Kawato, M. and Suzuki, R. (1989) Biol. Cybern. 61, 89-101.
100. Hasan, Z. (1986) Biol. Cybern. 53, 373-382.
101. Feldman, A. (1966) Biophysics 11, 565-578.
102. Bizzi, E., Accornero, W.C. and Hogan, N. (1984) J. Neurosci. 4, 2738-2744.
103. Mussa-Ivaldi, F.A. (1992) Biol. Cybern. 67, 491-500.
104. Gielen, C.C.A.M., Vrijenhoek, E.J., Flash, T. and Neggers, S.F.W. (1997) J. Neurophysiol. 78, 660-673.
105. Georgopoulos, A.P., Schwartz, A.B. and Kettner, R.E. (1986) Science 233, 1416-1419.
106. Caminiti, R., Johnson, P. and Urbano, A. (1990) J. Neurosci. 10, 2039-2058.
107. Scott, S. and Kalaska, J. (1995) J. Neurophysiol. 73, 2563-2567.
108. Kalaska, J.F., Cohen, D.A., Hyde, M.L. and Prudhomme, M. (1989) J. Neurosci. 9, 2080-2102.
109. Burnod, Y., Grandguillaume, P., Otto, I., Ferraina, S., Johnson, P.B. and Caminiti, R. (1992) J. Neurosci. 12, 1435-1453.
110. Barron, J.E., Fleet, D.J. and Beauchemin, S.S. (1994) I.J. Computer Vision 12, 43-77.
111. Borst, A. and Egelhaaf, M. (1989) Trends. Neurosci. 12, 29-38.
112. Reichardt, W. (1957) Z. Naturforsch. 12b, 448-457.
113. Barlow, H.B. and Levick, R.W. (1965) J. Physiol. 173, 477-504.
114. Snippe, H.P. and Koenderink, J.J. (1994) J. Opt. Soc. Am. A 11, 1222-1236.
115. Adelson, E.H. and Bergen, J.R. (1985) J. Opt. Soc. Am. A 2, 284-298.
116. Watson, A.B. and Ahumada, A.J. (1985) J. Opt. Soc. Am. A 2, 322-341.
117. Van Santen, J.P.H. and Sperling, G. (1985) J. Opt. Soc. Am. A 2, 300-320.
118. Wilson, H., Ferrera, V.P. and Yo, C. (1992) Vis. Neurosci. 9, 17-97.
119. Heeger, D.J. (1987) J. Opt. Soc. Am. A 4, 1455-1471.
120. Heeger, D.J. (1993) J. Neurophysiol. 70, 1885-1898.
121. Simoncelli, E. and Heeger, D. (1998) Vision Res. 38, 743-762.
122. Marr, D. and Ullman, S. (1981) Proc. Royal. Soc. London B 211, 151-180.
123. Horn, B.K.P. and Schunck, B.G. (1981) Art. Intell. 17, 185-203.
124. Koch, C., Wang, H.T. and Mathur, B. (1989) J. Exp. Biol. 146, 115-139.
125. Johnston, A., McOwan, P.W. and Buxton, H. (1992) Proc. Royal. Soc. London B 250, 297-306.
126. Hildreth, E.C. (1984) The Measurement of Visual Motion. MIT, Cambridge, Mass.
127. Yuille, A.L. and Grzywacz, N.M. (1988) Nature 335, 71-74.
128. Nowlan, S. and Sejnowski, T. (1995) J. Neurosci. 15, 1195-1214.
129. Movshon, J.A., Adelson, E.H., Gizzi, M.S. and Newsome, W.T. (1985) in: Pattern Recognition Mechanisms, eds C. Chagas, R. Gattass and C. Gross. Springer, Heidelberg.
130. Rodman, H.R. and Albright, T.D. (1989) Exp. Brain Res. 75, 53-64.
131. Maunsell, J.H.R. and Van Essen, D.C. (1983) J. Neurophysiol. 49, 1127-1147.
132. Newsome, W.T. and Paré, E.B. (1988) J. Neurosci. 8, 2201-2211.
133. Salzman, C.D., Britten, K.H. and Newsome, W.T. (1990) Nature 346, 174-177.
134. Albright, T.D. and Desimone, R. (1987) Exp. Brain Res. 65, 582-592.
135. Lappe, M. (1996) Neural Comp. 8, 1449-1461.
136. Andersen, R. and Bradley, D. (1998) Trends. Cogn. Sci. 2, 222-229.
137. Lappe, M., Bremmer, F. and van den Berg, A.V. (1999) Trends. Cogn. Sci. 3, 329-336.
138. Koenderink, J.J. (1986) Vision Res. 26, 161-180.
139. Koenderink, J.J. and van Doorn, A.J. (1992) J. Opt. Soc. Am. A 9, 530-538.
140. Ullman, S. (1984) Perception 13, 255-274.
141. Grzywacz, N.M. and Hildreth, E.C. (1987) J. Opt. Soc. Am. A 4, 503-518.
142. Siegel, R.M. and Andersen, R.A. (1988) Nature 331, 259-261.
143. Hildreth, E., Ando, H., Andersen, R. and Treue, S. (1995) Vision Res. 35, 117-137.
144. Allman, J.M., Miezin, F. and McGuinness, E. (1985) Annual Reviews of Neuroscience 8, 407-430.
145. Raiguel, S., Van Hulle, M., Xiao, D., Marcar, V. and Orban, G. (1995) Eur. J. Neurosci. 7, 2064-2082.
146. Buracas, G. and Albright, T. (1996) Vision Res. 36, 869-887.
147. Bradley, D., Chang, G.C. and Andersen, R. (1998) Nature 392, 714-717.
148. Royden, C.S. (1997) J. Opt. Soc. Am. A 14, 2128-2143.
149. Lappe, M. (1999) in: Neuronal Processing of Optic Flow, ed M. Lappe, pp. 235-268, Academic Press.
150. Ilg, U.J. (1997) Prog. Brain Res. 53, 293-329.
151. Robinson, D.A., Gordon, J.L. and Gordon, S.E. (1986) Biol. Cybern. 55, 43-57.
152. Dicke, P.W. and Thier, P. (1999) Biol. Cybern. 80, 71-84.
153. Krauzlis, R. and Lisberger, S. (1994) J. Comput. Neurosci. 1, 265-283.
154. Komatsu, H. and Wurtz, R.H. (1988) J. Neurophysiol. 60, 580-603.
155. Dürsteler, M.R. and Wurtz, R.H. (1988) J. Neurophysiol. 60, 940-965.
156. Lisberger, S.G. and Movshon, J.A. (1999) J. Neurosci. 19, 2224-2246.
157. Sakata, H., Shibutani, H. and Kawano, K. (1983) J. Neurophysiol. 49, 1364-1380.
158. Erickson, R.G. and Thier, P. (1991) Exp. Brain Res. 86, 608-616.
159. Ilg, U.J. and Thier, P. (1999) Vision Res. 39, 2143-2150.
160. Lee, D.N. (1980) Philos. Trans. R. Soc. Lond. B 290, 169-179.
161. Dijkstra, T., Schöner, G., Giese, M. and Gielen, C. (1994) Biol. Cybern. 71, 489-501.
162. Prokop, T., Schubert, M. and Berger, W. (1997) Exp. Brain Res. 114, 63-70.
163. Bremmer, F. and Lappe, M. (1999) Exp. Brain Res. 127, 33-42.
164. Tresilian, J.R. (1999) Trends. Cogn. Sci. 3, 301-310.
165. Warren, W.H. and Hannon, D.J. (1990) J. Opt. Soc. Am. A 7, 160-169.
166. Lappe, M. and Rauschecker, J.P. (1995) Biol. Cybern. 72, 261-277.
167. Longuet-Higgins, H.C. and Prazdny, K. (1980) Proc. Royal. Soc. London B 208, 385-397.
168. Bruss, A.R. and Horn, B.K.P. (1983) Comp. Vis. Graph. Image Proc. 21, 3-20.
169. Koenderink, J.J. and van Doorn, A.J. (1987) Biol. Cybern. 56, 247-254.
170. Rieger, J.H. and Lawton, D.T. (1985) J. Opt. Soc. Am. A 2, 354-360.
171. Hildreth, E.C. (1992) Vision Res. 32, 1177-1192.
172. Perrone, J.A. (1992) J. Opt. Soc. Am. 9, 177-194.
173. Perrone, J.A. and Stone, L.S. (1994) Vision Res. 34, 2917-2938.
174. Beintema, J. and van den Berg, A.V. (1998) Vision Res. 38, 2155-2179.
175. Heeger, D.J. and Jepson, A. (1992) I.J. Computer Vision 7, 95-117.
176. Lappe, M. and Rauschecker, J.P. (1993) Neural Comp. 5, 374-391.
177. Lappe, M., Bremmer, F., Pekel, M., Thiele, A. and Hoffmann, K.-P. (1996) J. Neurosci. 16, 6265-6285.
178. Lappe, M. and Duffy, C. (1999) Eur. J. Neurosci. 7, 2323-2331.
179. Lappe, M. (1998) Neural Networks 11, 397-414.
180. Bradley, D., Maxwell, M., Andersen, R., Banks, M.S. and Shenoy, K.V. (1996) Science 273, 1544-1547.
181. Page, W.K. and Duffy, C.J. (1999) J. Neurophysiol. 81, 596-610.
182. Duffy, C.J. (1998) J. Neurophysiol. 80, 1816-1827.
183. Bremmer, F., Kubischik, M., Pekel, M., Lappe, M. and Hoffmann, K.-P. (1999) Ann. N. Y. Acad. Sci. 871, 272-281.
184. van den Berg, A.V. and Brenner, E. (1994) Nature 371, 700-702.
185. Grigo, A. and Lappe, M. (1998) Vision Res. 38, 281-290.
186. Lappe, M. and Grigo, A. (1999) Neural Networks 12, 1325-1329.
187. Mergner, T. and Rosemeier, T. (1998) Brain Res. Rev. 28, 118-135.
188. Schöner, G. (1991) Biol. Cybern. 64, 455-462.
189. Dijkstra, T., Snoeren, P.R. and Gielen, C. (1994) J. Opt. Soc. Am. A 11, 2184-2196.
190. Duffy, C.J. and Wurtz, R.H. (1995) in: Perception, Memory, and Emotion: Frontier in Neuroscience, eds M. Ono, R. Molotchnikoff and Nishijo. Pergamon Press, Oxford.
191. Miles, F.A. (1998) Eur. J. Neurosci. 10, 811-822.
192. Du Lac, S., Raymond, J., Sejnowski, T. and Lisberger, S. (1995) Ann. Rev. Neurosci. 18, 409-441.
193. Schwarz, U. and Miles, F.A. (1991) J. Neurophysiol. 66, 851-864.
194. Busettini, C., Miles, F., Schwarz, U. and Carl, J. (1994) Exp. Brain Res. 100, 484-494.
195. Paige, G.D. and Tomko, D.L. (1991) J. Neurophysiol. 65, 1184-1196.
196. Lappe, M., Pekel, M. and Hoffmann, K.-P. (1998) J. Neurophysiol. 79, 1461-1480.
197. Niemann, T., Lappe, M., Büscher, A. and Hoffmann, K.-P. (1999) Vision Res. 39, 1359-1371.
Epilogue to Volume 4: Neuro-Informatics and Neural Modelling

In this book, we have presented an overview of the biophysics of neurons, of networks of neurons, and of the nonlinear dynamics of neurons and neuronal interactions. This field of research has expanded tremendously in the past decades and continues to grow; the end of this expansion is not in sight. Not only do many fundamental problems persist, but the field also has an impact on many other scientific activities and on philosophy. How is it possible that an ensemble of neurons is able to capture the structure of our environment, including not only its physical properties (stimulus-response relations) but also social and interpersonal relations? How is it possible that humans are able to reflect on what happens in the environment and thereby rise above the physical world and reason about relationships in it? Evidently, these problems persist and cannot be answered on the basis of our present knowledge. The aim of this book is to review some of the existing knowledge and to provide an overview of the major theoretical concepts and experiments that provide the foundations for the next steps in this field.

In the past decades tremendous progress has been made. We have obtained insights into the adaptive processes that shape the synaptic connectivity between neurons and thereby provide a framework for learning. The evolution in time of the states of single neurons, and of neuronal networks, arising from the actions of adaptive synapses and external inputs can now be analyzed within a unified theoretical framework. The biological neuron is much better understood, and good analytical approximations are available to describe the role of neurons as computing devices. Moreover, both intracellular processes and the states of single neurons can now be understood within the framework of nonlinear dynamics. New insights into the behavior of networks of neurons have been captured by nonlinear systems theory.

Yet many challenging problems with far-reaching implications remain. Some of these seem within grasp while others are elusive. For example, it is a matter of debate whether we will ever be able to understand the self-organizing mechanisms that underlie reasoning and consciousness. Still, the field seems ready for several breakthroughs, and these will be the first logical steps to pave the way for a full understanding of information processing by the central nervous system.
For example, one of the long-standing problems is how sensory and motor information is coded in neuronal activity. Since many signals are processed in parallel in a distributed way, the question arises how different features belonging to the same object are labeled, such that the brain knows which action potentials code for the same object and how to distinguish them from action potentials encoding different objects. This problem, known as the binding problem, is of crucial importance for our interpretation of experimental data on neuronal activity. The most viable hypothesis on this topic suggests that the temporal correlation of neuronal activity may provide such a label and thereby a possible solution to this problem. Another major problem in neurobiology concerns the question of how extracellular signals, encoded in changing concentrations of extracellular neurotransmitters and hormones, encode or trigger specific intracellular processes. In many neurons, extracellular neurotransmitters (also called first messengers) initiate intracellular waves of second-messenger concentration, which propagate from the cell membrane to the cell nucleus to trigger protein synthesis, cell replication, or the production of hormones. These waves can be deterministic and periodic, or stochastic with a periodic or chaotic character. New evidence suggests that these temporal characteristics may provide a code that addresses a particular part of the DNA to initiate further action, such as the synthesis of proteins or hormones.

These topics are just illustrations of the many challenging problems in the field that seem logical next steps. We hope that this book will provide a useful background for students and researchers to make further progress in the stimulating enterprise of understanding brain function. The examples above also illustrate that the content of this book is relevant not only for biophysicists, but also for people working in neurobiology, computer science and philosophy. We are confident that exciting times are ahead and look forward to the next decade.

Stan Gielen and Frank Moss
Subject Index absolute refractoriness 494 absorbing 706 abstract feature maps 724 acetylcholine ACH 106 action potential 233, 235, 236, 242, 250, 457, 458, 463, 466, 475, 479, 480, 485, 488, 507, 774 action potential conduction 396 action potential initiation 5 action potential reflection 410 action potentials APs 85 action-perception cycle 1006 activating function 10, 16 activation 92, 160 activation dynamics 975, 982 activation function 734 activity 781 A-current 372 adaptation 511 adaptive algorithms 224 adaptive estimation 223 adaptive natural gradient 754, 755 additive model 691 additive noise 874 adiabatic elimination 166, 175, 193 adiabatic hypothesis 789 AIC 759 algebraic description 187 all-or-nothing principle 102 alternans 218, 242-244, 246, 249 Amari-Maginu theory 675 ambiguous stimuli 986 amplification of synaptic inputs 433 amplitude 102, 281,285, 287, 311 amplitude adjusted fourier transformed (AAFT) surrogates 144
amplitude coding 97 amplitude regulation 103 amplitude-modulated 104 analytic signal 51, 59, 62, 290, 314, 317 Andronov-Hopf bifurcation 41 annulus map 195 anomalous rectification 375 antiarrhythmic drugs 212 antisynchronization 585 anti-tachycardia devices 214 aperiodicity 116 aperture problem 1030 Arnold tongues 30, 34, 44, 73, 76, 286, 298, 309 arrhythmias 207, 245, 246 ART 704 artificial neural networks 110, 733 assembly 782 associative memory 101, 107, 573 ASSOM 720, 721 asymmetric dilution 677-679 asymmetric networks 536, 538 asymmetry 520 asymptotic consistency 742 asynchronous state 901,904, 917, 925, 931,948 AT line 600, 615 AT-instability 595, 606, 611 atrial fibrillation 150, 232, 238, 245 atrial flutter 209, 245 atrioventricular (AV) node 208 attention 114 attractor 108, 555, 556, 572, 622, 655, 976 attractor networks 585, 650 attractor neural networks 623 auditory cortex 688
auditory nerve 878 auditory pathway 881 autocorrelation 95 automaticity 208 autonomous macroscopic laws 626, 634 axo-axonic synapses 406 axon 158, 472, 962 axon hillock 773, 774 axonal delay 783 axonal transmission 478
brainstem saccade generator 1026 branch line 172-174 branched manifold 169, 171-174, 177-179, 181, 183, 187, 188, 193, 194, 198 breaking of RS 598 Brown 179 Brownian motion 89 bundle of His 208 burst 161, 164, 187, 190, 200, 947 bursters 382
backbone 188 background noise 105 background synaptic activity 424 back-propagation of action potentials 431 bandpass 991 basis functions 1019 basis vectors 721 batch learning 740 batch SOM 720 Bayes relation 858, 861,878 Bayesian models 836 behavior modification 201 behavioural state 844 Beloff 119 Belousov-Zhabotinskii reaction 174 Bezanilla 93 bidomain model 247 bias 734 bifurcation diagram 161, 162, 165, 190, 199, 248 bifurcation parameter 161 bifurcation sequences 190 bifurcations 247, 372 binary networks 565 binary neurons 558, 562, 569, 623, 646, 650 binding 884 binding problem 96 Birman 169 Birman-Williams theorem 193, 201 bistability 232, 249 bivariate data 282, 283, 294 Boltzmann form 568 Boltzmann Machine 520, 526, 538, 544 Boltzmann-Gibbs distribution 526, 529, 531,532, 538, 540, 545 boundary conditions 196 bradycardia 209
Z2 test 180 cable theory 416 calcium waves 325, 332 cardiac arrhythmia 231,253 cardiac cell 233 cardiac electrophysiology 207 cardiorespiratory interaction 282, 304 Cartesian 120 catastrophe theory 197 cavity method 531,533 cell membrane 472 cellular 96 center-of-gravity estimate 862 central limit theorem 777, 814 central nervous system 891 cerebellum 1009 CG method 866 channel capacity 833 channel kinetics 121 channel stochasticity 385 chaos 87, 157, 162, 165, 193, 200, 231 chaos control 215 chaos in higher dimensions 193 chaotic 538 chaotic attractor 133 chaotic oscillators 281,287 chaotic-like dynamics 108 chemical agents 703 cholinergic effects 125 C-inactivation 93 circle map 45, 195 circuit elements 12 circular electrode 19 circular electrode excitation 15 classification 741,750 classification theory 201
Subject Index close returns 175 closed dynamic equations 657 closed dynamic laws 626 closed macroscopic laws 641 closed set of equations 623 closed-loop feedback 231 closed-loop proportional feedback 243 closure 626, 638 cluster entropy 340 clustering 689 codebook 699 coefficient of variation 34 cognition 114, 118 coherence 312 coherence resonance 33 coherent oscillations 892 coherent oscillatory activity 110 coherent space-time clusters 338 coherent versus incoherent input 488 colored noise 144 columnar organization 973 compartmental neuron model 505 competition 991 competitive learning 715 complete synchronization 281 complex cells 974 complex regime 655 computational methods 105 computational neuroscience 448 computer simulations 115 condensed ansatz 660 conductance 8, 122, 159, 473, 496 conductance based model 894, 915, 919 conductance gradients 420 conductance-based neurons 894 conduction blocks 402 conduction velocity 397 conductivity 512 connecting 172, 843 connection weight 125 conscious states 101 consciousness 118 conservation equation 159 conservative forces 567, 649 constitutive equations 159 content-addressable 107 continuous neurons 560, 564, 566, 637, 648
continuous transition 595, 606, 643 contrast invariance 991 control chaos 231 control parameters 197 controlling chaos 240 convergence phase 700, 706 coopertion 991 cooperative dynamical systems 710 cooperative synchrony 901 correlated discharge 835 correlated noise 873 correlation 532, 646, 700, 837 correlation coefficient 70 correlation function 646, 794 cortical maps 988 cortical neurons 834 cortical oscillations 114 cortico-cortical synchronization 300 corticomuscular synchronization 301 cotangent space 745 cotangent vector 745 coupled oscillators 110, 562, 583 Cramer-Rao bound 863 Cramer-Rao inequality 858 critical noise level 585, 608, 629 critical point 740, 757, 761-765 critical slowing down 631 critical temperature 582 cross-correlation 860 cross-entropy 741 crossing 172 crossing matrix 172 crosstalk 689, 949 Curie-Weiss law 571 current 12 current source 13 cusp 196 cuspoids 197 cyclic relative phase 286, 288, 292, 298, 305 cylinder 196 dead neuron 719 decimatable distributions 533 decoding algorithm 835 dendritic spikes 430 dendritic spine 434, 773
1048
dendritic sub-units 422 dendritic tree 504, 773 density of states 578, 587 depolarization 161,200 depolarization wave 158 Descartes, Rene 119, 361 detailed balance 531,532, 565, 566, 647 determinism 87 deterministic evolution 626, 635 diabetic patients 3 diagonable 802 diastolic interval 237 dichotomic process 67, 69 differential embedding 176, 179, 180, 183 differential-integral embedding 176 diffusion matrix elements 638 digit recognition problem 547 dimension 164, 167, 192 dimension-reducing 689 direction 990 direction selectivity 428, 988 direction-selective 1029, 1031 discriminability 874 disinhibition 108, 113 disorder 592, 596, 600 disorder average 592, 658, 668, 676 disorder-averaged generating function 660, 661 disparity 1015, 1017, 1018, 1035 dissipation 164, 170, 189 dissipative limit 188 distributed coding 1011, 1018, 1021 distributed memory 544 divergence 165 domain of control 244 drift 189 drive period 185 Drosophila melanogaster 93 dual vector space 745, 766 Duffing equation 181 Duffing oscillator 172, 173, 184, 188, 200 Duhamel's formula 801 dynamic threshold 491 dynamical methods 167 dynamical models 691 dynamical system 159, 161, 164, 176, 200 dynamics 573
Subject Index ectopic focus 209 Edelman 118 edge 172 effective diffusion constant 50, 60, 66 effective measure 664, 667 effective neuron 659 efference copy 1009 eigenvalue 173 eigenvector 135 electrical noise, excitation 18 electrical stimulation 7 electrocardiogram 211,283, 289, 290, 302, 303, 315 electrocardiography 211 electrochemical excitation 232 electroencephalography (EEG) 86, 104 electromagnetic fields 106 electronic paramagnetic resonance (EPR) 93 electrophysiological testing 211 embedding 135, 175, 180 emergent behavior 689 empirical determinism 88 empirical Fisher information matrix 751 empirical loss 736 encoding, schemes 1010 energy function 107, 713 entrainment test 180 entropy 194, 598, 611,829, 859 epilepsy 327 EPSP 486 equalising time constants 419 equilibrium 565, 587, 649 equilibrium point hypothesis 1028 equilibrium statistical mechanics 565, 568 equivalent cylinder 404 ergodic components 596 ergodic systems 559 ergodicity 529, 530, 537, 538, 597, 605 ergodicity breaking 530, 583 error back propagation 739 estimator 858 evolution 105 evolutionary learning 723 excitable media 60, 232, 236 excitation 931 excitation/inhibition 694 excitatory neurons 975
Subject Index excitatory postsynaptic potential 489 expansion 173 external noise 89 extracellular electrodes 235 extremely diluted attractor networks 675 extremely diluted networks 576, 658 eye position 1018 eye position gain fields 1019, 1021 Eyring rate theory 123 facilitation 810 Faraday 120 FDT 652-654 feature maps 974 feature-selective 693 feedback 108 feedback control 243, 247, 252 feedback control methods 240 feedback controller 1007, 1008, 1026, 1032 feedforward 108 feedforward connections 870 feedforward controller 1007, 1008 feedforward network 519, 876 Fermi function 160 ferromagnetic phase 531,536 fibrillation 210, 216, 231,233, 238, 244-247 firing activities 640 firing patterns 365 firing probability 832 firing rate 475, 483, 493, 828, 830, 856, 895, 919, 920, 923, 926, 928, 939, 940, 948, 962 firing time 479, 482, 488 first return map 164, 180 Fisher information 834, 858, 863, 866, 870 Fisher information matrix 743, 763-765 Fisher matrix 859 fitness 723 Fitzhugh-Nagumo model 32, 377, 478 fixed point 137, 164, 171,216, 241,242, 244, 977, 981 fixed-delay stimulation 217 flip-saddle 147, 225 Floquet multiplier 241 flow 191 flow model 178 flow tubes 173, 188, 198, 199
1049
flows without equations 187 fluctuation-dissipation theorems 652 fluctuations 83, 102, 325 Fokker-Planck 714 Fokker-Planck equation 49, 561,625, 637, 711 fold 194-196 formal neurons 781 Fourier transformation 981 forced synchronization 25 fractal 167, 168 fractal dimensions 167 fractures 995 frames of reference 1010, 1012 Frankenheauser 90 Frankenhaeuser-Huxley 9 free energy 568, 583, 588, 592 Freeman 105 frequency entrainment 281,286 frequency modulation 104 front propagation 397 Fujisaka 179 fully connected Hopfield networks 576 fully synchronized state 916 function 125 functional enhancement 7 functional magnetic resonance imaging (fMRI) 104 functional stability 390 gain factor 697 gain field 1019, 1021 gain function 976 gain ofneurones 387 gain parameter 125 gamma range 96 gamma rhythm 892 gap junction 106, 237 gateau roul6 184, 200 gating 6, 92 gating theory 451 Gauss-Newton method 754 Gaussian approximations 655, 674 Gaussian distributed synapses 592 Gaussian function 126 Gaussian noise 112 Gaussian white noise 860
1050
general linear model 179 generalized phase difference 285, 293, 305, 310, 311 generating function 568, 623 generating functional analysis 556, 658, 659 generating terms 600 generators 96 geometric models 192 germs 197, 198 Glauber dynamics 525, 542 glial cells 325 global inhibition 983 global organization 712 glutamate 778 goodness of fit 179 Gotch 102 graded impulses 104 graded response attractor networks 641 graded response neurons 562, 638, 640, 644, 645, 653 gradient models 1030 Greens function 507 5-HT 106 Haberly 124 Hadamard 88 Hamiltonian 568, 592, 647 harmonic noise 146 head-centric 1012, 1018, 1019, 1021 head-centric encoding 1010 head-centric receptive field 1022, 1023 heading 1033, 1034 heart 232 Hebb 691 Hebbian 126 Hebbian learning 542, 543, 780, 782, 788, 790, 791,793, 799 Hebbian rules 843 Hebbian unlearning 785 Hebbian-type synapses 655 Henon map 139, 194 Hessian matrix 753, 762 higher-order consciousness 118 Hilbert transform 52, 59, 68, 291,292, 300, 314, 315,317 hippocampus 98, 844 Hodgkin 9
Subject Index Hodgkin-Huxley (HH) model 90, 155, 369, 450, 471-475, 476-479, 481-483, 690, 894, 895, 915, 962 Hodgson 119 Homoclinic orbits 395 Hopf bifurcation 195, 372, 894 Hopfield net 107 Hopfield model 361,572, 575, 630, 637, 644, 645, 676 Hopfield-type interactions 628 horizontal disparity 1015, 1017 human nervous tissue 343 Huxley 9, 90 Huyghens 179 hyperbolic 169, 173, 174 hypercolumn 980, 981,985, 995 hyperpolarization 486 hypothalamus 102 hysteresis 232, 978 identity theory 119 illusory movements 986 impedance adaptation 404 Impedance mismatch 402 lmplantable cardiac defibrillators 214 lmplantable cardiac devices 214 reactivation 92 indeterminism 87, 88 inertial manifold 166-168, 175, 180, 193, 195, 197-199, 201 inferior colliculus 881 inferior temporal cortex 834 infinitesimal learning 789 information geometry 742, 747 information processing 114 information theory 827 inhibition 931 inhibitory coupling 926 inhibitory interactions 980 inhibitory neurons 980 inhibitory rebound 476, 483 inhomogeneities 989 initial segment spike 391 initiation of fibrillation 232 input resistance 402 instantaneous amplitude and phase 47, 52, 68 instantaneous frequency 42
1051
Subject Index
instantaneous phase 31 l, 314, 316, 317 integers 166, 174, 187 integral-differential embedding 180, 184 integrate-and-fire model 108, 491-493, 495, 497, 500, 509, 895, 913, 919, 929 integrate-and-fire neuron 920, 921,932, 960, 977 interacting neurons 201 interaction symmetry 566 lnteractionism 117 mteraural time differences 788 lnterbeat intervals 216 interdependence between signals 283 interference 575, 608 lntermittency 165, 184 Internal noise 89 lnterneurons 100 Interpolation 191 lnterspike interval 162, 199, 475, 842 lnterspike interval histograms 33 lnterspike-interval distribution 100 lnterspike time intervals 184 Intra-cortical connections 989 lntracortical feedback 988 lntracortical interactions 974 Intractable 533 intrinsic oscillation 185 intrinsic stochasticity 393 ion channel kinetics 90 ion channels 259, 472 ion conductance 160 ion gate 159, 199 ion pump 158, 160, 199 ion transport 472 Ionic currents 158 lonotropic 778 IPSP 486 irreducibility 527, 529 irregular firing 957 ischemia 209, 216 ISI 842 isoclines 978 I - V curves 374 Jacobian 192 jelly roll 183, 184, 200 joining, array 172
joining chart 171 joint stiffness 1028 Kaplan-Yorke 168 Kaplan-Yorke conjecture 167 Klein bottle 196 Kohonen neural network 695 Kramers-Moyal expansion 625 Kramers rate 58 Kullback entropy 31 Kullback-Leibler divergence 737, 738 Kushner-Clark theorem 709 Langevin equation 48, 334, 560 Lapicque's model 367 Laplace 87 lateral geniculate nucleus 950, 971 lateral inhibitory connections 870 lateral interactions 870 lateral olfactory tract (LOT) 125 lattice 696 leakage channel 473 leaky integrator 492 learning 114, 201,520, 521,538, 545, 546 learning algorithms 790 learning rule 733, 787, 790 learning window 790, 792 least mean square 737 least squares 179 Lefranc 178 Lego 9 171 Liebovitch 95 ligand-gated ion channels 263 limit cycle 38, 147, 284, 537, 563, 572, 637, 978 limit cycle attractor 109 limited-range correlations 875 linear associative memories 692 linear impulse response function 479 linear response theorem 546 linearization 978 link 186 linking numbers 172, 176, 177, 181,183, 185, 192, 193, 199 lipid bilayer 233 Liouville equation 625 Listing's law 1026, 1028 Little model 526
1052
local circuits 900 local fields 558 local Lyapunov dimension 201 local Lyapunov exponent 193, 194, 201 local minima 583, 740, 757 local operations on dendrites 421 local torsion 177, 180, 181 localization 213 logistic map 165, 194, 195 long-term potentiation (LTP) 115, 778 Lorenz equations 172, 173 Lorenzian 95 loss function 736 low-dimensional chaos 100 Lyapunov dimension 167, 192 Lyapunov exponent 166-169, 185, 187, 192-195, 197, 285, 292 Lyapunov function 562, 572, 585, 628, 629, 636 M6bius strip 196 Mafi+ 175 macroscopic 86 macroscopic laws 643 macroscopic probability densities 632, 638 magnetic field lines 170 magnetoencephalography (MEG) 104, 283 magnification factor 717 manifold 137 map 191, 197 MAP-estimator 862 marginally stable regime 982 Markov 172 Markov chain 559 Markov chain method 706 Markov process 90, 520, 525, 526, 706, 708 Markov transition matrix 173, 187 master equation 70, 560 matrix momentum 756 maximum a posteriori estimator 858 maximum entropy quantizer 718 maximum likelihood 179, 857, 865 maximum likelihood estimator 737, 857 maximum likelihood method 737 maximum-entropy 657 Maxwell 120 Mayr 94 McCulloch 107
Subject Index McCulloch and Pitts 690 MDL 759 mean field approximation 540, 546 mean field methods 924 mean field theory 520, 533 mean firing rate 33, 476, 532 mean first passage time 73 mean switching frequency 29 mechanoreceptors 4 membrane 199 membrane capacitance 778 membrane potential 123, 159, 475, 479, 486, 487, 489, 493, 505, 507, 521,523, 524, 640, 641, 778, 779, 894, 896, 903 membrane voltage 475 memory effect 250 mental activity 119 mesoscopic 86 metastable states 712 metric 746, 845 metric methods 167 metric space 696 Mexican hat 694, 980 mlcroelectrode 234 microscopic 86 microscopic laws 557 mind-brain problem 86 mind-brain theories 119 minimal spanning tree 719 m~xture states 579, 580, 582 ML-estimator 862 mode locking 196 model-independent control 215 moment generating function 658 moments of voltage transients 419 monkey 848 monophasic action potential 252 Morpho-Electrotonic-Transform 423 morphology 162, 164 Morris LeCar model 478 Moss 94, 97 motion energy 1030 motion parallax 1015, 1031, 1034 motor control 1007, 1028 motor cortex 844, 880, 1028 motor map 1024 movement direction 880
movement field 883 movement jerk 1028 multi-compartment 504 multi-compartment model 504, 507 multilayer neural network 733, 735 multilayer perceptron 107, 733 multiplicative 126 multiplicative noise 875 multistability 936 multi-unit recording 871 Murphy 201 mutual information 828, 859 mutual overlap 596, 604 mutual synchronization 38 myelinated axon 103 myelination 399 myocardial infarction 209 myocardium 234 Na channels 91 natural gradient 734, 742, 749, 752, 761, 763-765 neighborhood 168 neighborhood function 697 neo-cognitron 107 network attractor dynamics 109 network dynamics 105 neural activity 827 neural code 96, 855 neural encoding 855 neural field 1025 neural gas 719 neural networks 193, 827 neurodynamics 104 neuromodulation 114, 390 neuron 199 neuronal adaptation 125 neuronal coding 488, 869 neuronal dynamics 890 neuronal synchrony 901 neurophysics 360 neurotransmitter 774, 775, 807 Newton method 750, 752 Newtonian 120 N-inactivation 93 NMDA 778, 790 nociception 5 noise 3, 87, 126
noise distribution 576 noise-based sensory prosthetics 3, 18 noisy oscillators 286, 287 nonautonomous 184 nonequilibrium statistical mechanical 621 non-Gaussian approximations 656 nonirreducible or nonergodic Markov process 528 nonlinear dynamics 207 nonlinear dynamics control methods 239, 253 nonlinear system 231 nonlocal singularities 195, 196 nonstationary 177, 219, 225 normal sinus rhythm 233, 238, 246 novelty 836 null hypothesis 144 oculomotor system 1005 ODE 708 older adults 3 olfactory bulb 114 olfactory cortex 105 olfactory system 844 "on the path" inhibition 428 online learning 740 ontogeny 789 open-loop 1009, 1032 optic flow 1032, 1035, 1036 optimal coding 871 optimal quantizer 716 optimal tuning 869 optimization 834 optimum gain 700 optimum linear estimator 868 optokinetic 1005 optokinetic reflex 1036 order parameter 555, 577 order parameter equations 589, 607 ordering phase 700, 706 orientation map 971, 980 orientation preference 974 orientation preserving 190 orientation tuning 991 Ornstein-Uhlenbeck process 145 oscillations 87, 892, 907, 978 oscillator with inertial nonlinearity 54 oscillatory activity 105
otoliths 1004 overlaps 572, 579, 629, 635, 636, 670 overtraining 760 pacemaker 207, 214 pacemaker neurons 105 Packard 175 parallel 520, 525, 531, 532, 538, 544, 558 parallel dynamics 530, 532, 537, 632, 647, 652 parallel dynamics Hopfield model 667 parallel dynamics SK model 680, 681 parallelism 117 paramagnetic phase 531, 535 parametric estimation 734 parietal cortex 880, 1018, 1024, 1027 partial dimension 167 partial recall 591 partition function 532, 533, 568 partitions 178 patch clamp technique 90 path integral 421 pathological tremor 282, 298 Patlak 95 pattern formation 795, 983, 987 pattern forming systems 325 pattern recall 655 PCA-type learning rule 692 Peano 699 peeling method 440 Pei 94 pendulum 179 Penrose 120 perception 114 period 173 period doubling 162, 165 period-doubling bifurcation 231 period one 165 period one orbit 177 periodic orbit transform method 140 periodic orbit 133, 164, 216, 231, 249, 170, 172, 178 periodicity 528, 529 peri-spike-event 860 permeability 103, 472 Perron-Frobenius theorem 528, 531 perturbation 179, 198 phase 281, 284, 285, 287, 290, 297, 315
phase diagram 598, 607, 610, 681, 982 phase locking 25, 201, 281, 284, 285, 295, 298, 305 phase of firing 831 phase patterns 587, 591 phase plane analysis 375, 479 phase slips 47 phase space 135, 163, 178, 187, 193, 976 phase synchronization 281, 284, 298, 306 phase transition 571, 580, 585, 591, 610 physiological SOM 703 pinwheels 995 Pitts 107 plasticity 688, 807, 1009 plateau 760, 761, 765 Plumecoq 178 Poincaré 168 Poincaré map 285, 287, 289 Poincaré section 135, 164, 173, 174, 178, 195, 197 Poincaré surface 287 point current source excitation 13 Poisson distribution 816 Poisson model 865 Poisson neuron 796 Poisson process 522, 793, 815 Poissonian distribution 67 polarity 14 Popper 119 population code 834, 856, 862, 874, 878, 880, 881, 1011, 1012, 1025, 1028 population dynamics 976 population encoding 1011, 1031 population of neurons 827 population vector model 867, 878 positron emission tomography (PET) 104 postsynaptic firing 791 postsynaptic neuron 484, 490, 779 postsynaptic potential 484, 486, 558, 623, 775, 776, 806 posture 1032, 1035 posture control 293 potassium channel 473 potential 715 power spectrum 144 precursor phenomena 196 predictability 88 preferred orientation 834 premotor cortex 880
presynaptic inhibition 405 presynaptic neuron 486, 490 presynaptic spike 501 primary afferent depolarization (definition) 406 primary afferent depolarization (role in presynaptic inhibition) 408 primary consciousness 118 proarrhythmic 212 probability current 565 probability density 624 probability distribution 520 projection 193 projective plane 196 proportional feedback control 246 proportional perturbation feedback 245 proprioceptive information 1004 pseudo-chaotic 116 pseudo-Hamiltonian 577 psycho-physical 117 psychophysical study 3 pure states 579, 589, 609, 611 Purkinje fibers 208 pyramidal cells 101 quantization density 717 quantum mechanical 118 quenched average 534, 535 Rössler equations 172, 173 radial basis functions 733 radiofrequency ablation 212 random 87, 96 random dilution 659 random quantizer 716 randomness 88 rat hippocampal astrocytes 342 rate 58 rate coding 97 rate constants 123 re-afference 1009 realtime 138 recall 573, 590 recall dynamics 573 recall time 113 receptive field 860, 867, 869, 883, 971, 1013, 1014 receptive field diameters 995 receptive field updating 1022, 1027 receptor membrane 11 receptor types 4 recruitment 856, 954 recurrent excitation 949 recurrent networks 519, 646 recurrent neural networks 555 reduced dynamical system 166 reduction in dimension 165 reductions of the Hodgkin-Huxley model 379 redundancy 834 re-entrance 682 reentrant arrhythmias 209, 246 reentry 209, 247 refractoriness 486, 926, 928 refractory period 232, 234, 237, 250, 494, 779, 928 regression 740, 750 rejection criterion 178 relative entropy 540, 545 relative rotation rates 177, 178, 181 relaxation equations 159 relaxation oscillations 375 relaxation time 529, 630 reliability 264, 274 REM sleep 786 replica analysis 592, 600 replica method 533 replica symmetry 595, 597, 605 replica theory 556, 622, 671 replica trick 593, 600 'replicon' fluctuations 599 repolarization 161, 164, 200 representative winner 721 reset 211, 493, 497, 500 respiratory signal 290, 302, 304 respiratory sinus arrhythmia (RSA) 303, 304 response functions 646 response kernels 485 restitution relation 237, 241 retarded self-interaction 659, 671 retina 1003, 1013 retino-centric 1012 retino-centric encoding 1010 retino-tectal 688 retinotopic mapping 971 retinotopic organization 989
retrieval 784 return map 164, 165, 197 reversal potential 473, 897 reverse potential 160 RF-LISSOM 704 rhythmicity 208 Riccati-type learning law 692, 703 Riemannian manifold 747 Riemannian metric 742, 743, 747 Robbins-Monro 709 robustness 689 saccade direction 883 saccades 1005, 1024, 1027 saccadic eye movements 881 saddle node bifurcations 190 saddle point 135, 570, 578, 588, 595, 604, 634, 664, 740, 760, 761, 765 saddle-node bifurcation 39, 374 saddle-point equations 669, 670, 678 saltatory conduction 399 saturation 573, 600, 646 scaling functions 167 Schmitt trigger 27 Schrödinger 85 scroll dynamics 185 scroll templates 182, 183 scrolling mechanisms 181 selective attention 118 selective distribution of information 406 self-averaging 777 self-motion 1032, 1035 self-normalization 800 self-organization 105, 689 self-organizing feature maps 107 self-organizing map 695 self-oscillatory systems 283, 289 self-similar 167, 168 self-sustained oscillations 38 semicircular canals 1004 sensitivity to initial conditions 185, 231 sensor fusion 1011 sensorimotor transformations 1003, 1010 sensory input 114 separable attractor networks 626, 627, 635 sequential 544, 558
sequential dynamics 520, 526, 529, 531, 623, 648, 650 serotonin 106 Shaker 93 Shannon information 870 Sherrington-Kirkpatrick model 534 short-term depression 806-808 short-term facilitation 806, 808, 810 shuffled surrogates 143 sigmoid 125 sigmoidal function 734 sign convention 172 signal to noise ratio 176, 839 signal transmission 488 simple cells 974 simulation results 630, 674 simulations 114 single condition activity map 990 single-site correlation and response functions 670 singularities 991 singular line 171 singular point 171 singular value decomposition 175 singularities of mappings 194 singularity 171, 193, 194, 197, 198, 756, 758, 759 sinoatrial node 207, 233 SK model 592 skeleton 188, 198 skin 3 slaved 175 sliding threshold 480, 481 slowing down 575 Smale 169 Smale horseshoe 169, 181 smooth pursuit eye movements 1005, 1032 sodium channel 473 SOM 695, 701, 705, 715, 718, 720 soma 478, 773, 774 somato-dendritic spike 391 somatosensory cortex 1004 somatosensory system 3 sound localization 881 space-time cluster analysis 337 sparse network 900, 936 spatial complexity 244 spatial gain fields 1018, 1019
spatially extended nonlinear node (SENN) model 10 spatio-temporal 104, 238 spatio-temporal chaos 232 spatio-temporal complexity 239, 246 spatio-temporal filters 1030 specific membrane conductance 440 spectral power amplification 33 sphere 196 spike 161, 164, 187, 190, 200, 469, 476, 831, 947 spike generation 471, 475, 510, 512 spike generation process 521 spike input 483 spike response method 478, 497, 500 spike response model 484, 778 spike train 290, 302 spiking neuron models 469 spiking neurons 791 spin glass 520, 531, 536, 544 spin-glass model 107 spin-glass state 600, 608 spiral wave 246, 335 splitting chart 171 splitting point 171, 172, 193 spontaneous activity 96 spurious state 574 squeezing 168, 171, 189, 198, 200 squeezing mechanisms 201 SRM 477, 479, 481-485, 488, 491, 779, 806 stability zones 222 stabilograms 294, 295 stable manifold 216 state transitions 110 stationary 133 stationary distribution 528, 531, 540 stationary probability 539 stationary probability distribution 527, 532 stationary state 563 statistical asymptotic theory 742 statistical manifold 747 statistical mechanics 555 steepest descent 594, 602, 739 stimulating electrode 14 stimulus orientation 972 stochastic 87 stochastic approximation 706 stochastic control 114
stochastic destruction 200 stochastic gating 94 stochastic gradient descent 713, 715 stochastic Hodgkin-Huxley model 385 stochastic matrices 526 stochastic neural networks 520 stochastic process 706, 855 stochastic resonance 3, 27, 86, 265, 272, 326 stochastic resonance theory 113 stochastic synchronization 26 storage capacity 573, 610, 612, 787 strange attractor 163-166, 168, 174, 180, 182, 183, 187, 188 strange nonchaotic attractors 194 stretch 189 stretch and fold 169 stretching 168, 171, 188, 189, 198, 200 stretching factor 187 stroboscopic technique 304 strong coupling regime 977, 978 strong law of large numbers 814 strong organization 708 strongly contracting 167, 168, 193 structural randomness 110 structure formation 800, 802 subcritical bifurcation 242 subexcitable regime 336 sublattice activities 627, 635 sublattices 635, 641 subthreshold 161 subthreshold oscillations 184, 199 superior colliculus 881, 1025 superior olivary complex 881 surrogate data 310 surrogates 137 symbol sequence 178 symbolic dynamics 178 symmetric 531, 532 symmetric connectivity 520 symmetric dilution 676, 680 symmetry breaking 991 synapse 101, 496, 521, 522, 523, 774, 775, 843 synaptic cleft 774 synaptic conductances 897 synaptic coupling 915 synaptic density 843 synaptic efficacy 783, 790, 792, 800, 810, 811
synaptic efficiency 690 synaptic fluctuations 101 synaptic input 505, 975, 983 synaptic input current 497 synaptic integration 412 synaptic learning 789 synaptic plasticity 773, 776, 806, 843 synaptic resources 807 synaptic shunt 423 synaptic strength 779 synaptic symmetry 562 synaptic transmission 774 synaptic weight 790, 791, 843 synchrogram 304-307 synchronization 106, 201, 247, 843, 856 synchronization between different brain areas 282 synchronization indices 296, 300, 305, 306 synchronization of chaos 26 synchronization transition 286, 307 synchronize 179 synchronized oscillations 96 synchronized state 584, 585 synchronous firing 901 synchronous neuronal activity 892 synchronous oscillatory activity 110 synchronous states 901 synchrony 890, 893, 900, 903, 919, 920, 923, 931, 935, 936, 938, 947, 956, 960 synchrony measure 903 synchrony of bursts 907 synchrony of firing 856, 884 synchrony of spikes 907 synergy 837 tachycardia 209, 246 Takens 175 tangent space 745 tangent vector 745 TAP approximation 541, 542, 546 tapering dendrites 420 target waves 62 Taylor expansion 842 temporal 246 temporal code 97, 830 temporal complexity 246 temporal evolution 805 temporal fluctuations 97, 101
temporal instabilities 232 tension 186 terminate fibrillation 231, 239 thalamus 881 thermal fluctuations 89 thermoreception 5 theta rhythm 105, 831 threshold 457, 459, 470, 475, 476, 479, 480, 481, 484, 488, 492, 505, 510, 779, 977, 982 threshold effect 475, 476 threshold-linear 976 time delay 108, 175 time delay auto-synchronization 252 time-dependent Ornstein-Uhlenbeck process 639 tonotopic map 688 topographic order 698 topological analysis program 174 topological classification 167 topological defect 699 topological entropy 173 topological invariants 177, 178, 180 topological matrix 187 topological methods 168 topological organization 170, 172, 174, 199 topological recurrence method 136 topology 169 topology preservation 705 torque change 1028 torsion 185, 189, 190 torus 177, 196 total postsynaptic potential 486 touch gate 6 training data 736 transduction 5, 11 transition 591 transition matrix 172, 559 transition rates 560 transmembrane voltage 234, 235 transversality 175 traveling waves 984 tree search 720 Tuberous sclerosis 345 tuning curve 834, 985 tuning function 863 twist 186-188, 192, 200 twisting 172 two compartments models 393
ultrametric 846 umbilic 197 unbiased estimator 857 unfolding parameters 189 uniform synapses 569, 583 uniqueness theorem 170, 175, 191 unlearning 785 unstable 164, 200 unstable dynamical states 240 unstable fixed point 137 unstable periodic orbits 133, 164, 216, 231, 232, 249 Van der Pol equations 376 Van der Pol oscillator 172, 173, 195 vector quantization 715 vector quantizer 699 ventricular fibrillation 150, 233, 245 ventricular tachycardia 245 vergence 1005, 1015, 1017 vestibular 1035 vestibular organs 1004 vestibulo-ocular reflex 1005, 1036 vibrotactile detection thresholds 7 visual awareness 119 visual cortex 844, 971, 973, 980, 984, 1003 visual field 971 visual motion 1031 visual motion detection 1029 visual system 844 voltage 161
voltage clamp 90 voltage dependent conductances on dendrites 429 voltage sensor 92 voltage-gated ion channels 259 voltage-sensitive dyes 234 von der Malsburg 688 Voronoi tessellation 699, 710 wall thickness 160 Wang-Buzsáki (WB) model 894, 895, 914, 962 weak coupling limit 909, 931, 936 weak coupling regime 977, 978 weak organization 708 weight 734 weight vector 696 Wenckebach periodicities 247 Whitney 175, 194, 196 William James 119 Williams 169 Wilson-Cowan model 979 winding number 44 winner 697, 702 winner take all 697, 701 writhe 186, 188, 192, 200 writhing torus 188 YAG laser 181, 184-187, 200 Yamada 179