INFORMATION PROCESSING AND LIVING SYSTEMS
SERIES ON ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
Series Editors: Ying XU (University of Georgia, USA) Limsoon WONG (Institute for Infocomm Research, Singapore) Associate Editors: Ruth Nussinov (NCI, USA) Rolf Apweiler (EBI, UK) Ed Wingender (BioBase, Germany)
See-Kiong Ng (Inst for Infocomm Res, Singapore) Kenta Nakai (Univ of Tokyo, Japan) Mark Ragan (Univ of Queensland, Australia)
Vol. 1: Proceedings of the 3rd Asia-Pacific Bioinformatics Conference Eds: Yi-Ping Phoebe Chen and Limsoon Wong
Vol. 2: Information Processing and Living Systems Eds: Vladimir B. Bajic and Tan Tin Wee
Series on Advances in Bioinformatics and Computational Biology - Volume 2
INFORMATION PROCESSING AND LIVING SYSTEMS
Editors
Vladimir B. Bajic, Institute for Infocomm Research, Singapore
Tan Tin Wee, National University of Singapore, Singapore
Imperial College Press
Published by Imperial College Press 57 Shelton Street Covent Garden London WC2H 9HE Distributed by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
INFORMATION PROCESSING AND LIVING SYSTEMS Copyright © 2005 by Imperial College Press All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 1-86094-563-5
Printed in Singapore by B & JO Enterprise
Preface
This is a book on information processing by and information processing for living systems. It is about the Information of Life, of living computers and the human endeavour of computing life.

"... No man is an island, entire of itself; every man is a piece of the continent, a part of the main. If a clod be washed away by the sea, Europe is the less, as well as if a promontory were, as well as if a manor of thy friend's or of thine own were: any man's death diminishes me, because I am involved in mankind, and therefore never send to know for whom the bell tolls; it tolls for thee...."
Meditation XVII, John Donne (1623)

All living systems reproduce after their kind to perpetuate their lineage. The offspring inherit characteristics from both parents, and the basis for this inheritance lies with genetic material, whether it is DNA for most unicellular or higher organisms, or RNA in some viruses. From the genetic material inherited from the parents, the progeny must decode the information within the genetic material, which then manifests as various traits that we observe amongst the great diversity of organisms in the living world. This involves information processing and information flow at the most fundamental level throughout the course of the organism's development and lifespan. Organisms do not exist in isolation, but interact with each other constantly within a complex ecosystem. The relationships between organisms, such as those between predator and prey, host and parasite, mating partners, or amongst members of a colony of social insects, are both complex and multi-dimensional. In all cases, there is constant communication and information
flow at many levels. Therefore it is important for us to appreciate that living systems need to compute and process information. For example, the hottest area in microbiology today is quorum sensing in bacteria. How does a bacterium know how many of its kind are present in its surroundings before it can launch an attack successfully against a host? How does a cell calculate a density gradient and propel its motion towards a food source at the right speed without overshooting the target? How does transcription of DNA know when there is enough messenger RNA, and how does a ribosome know when to stop producing proteins? Where are the feedback loops, and how is the regulation tied to the computation or sensing of how much is out there? How do biological systems calculate time - time to live, time to spawn, circadian rhythms and so on? How does the brain process information, count, sense time or store information in memory? How do living systems retain information and transmit it to the next generation? How do living systems share information with others? Thus our book focuses on information processing by life forms and the use of information technology to understand the wonder and fascination of living things.

Overview of the Book

Today, far too many books have been written about bioinformatics. Most of them are highly mathematical and emphasize the underlying mathematical principles and how they may be applied to biological data; or else, they take a superficial but practical approach towards processing and analyzing biological data, covering the so-called "how-to" approach. Few books come close to dealing with computing challenges and computing phenomena in nature, but none combines these with the complementary bioinformatics topics and useful bioinformatics applications. In this book we have attempted to do so, and we believe that the benefits to the reader will be multiple, going from a simple-to-grasp presentation to sophisticated and innovative applications. This book is organised into two major parts, focussing on Biocomputing and Bioinformatics. Both are facets of the information of life - the flow of information in life forms, as well as the use of information technology and computing to understand the mystery of living things. In the first part, the first two chapters present a comprehensive overview of biocomputing. This constitutes the biocomputing part, "Living Computers", which attempts to explain natural processing of biological information using physiological models and analogous models in computing.
The bioinformatics part, "Computing Life", deals with "artificial" processing of biological information as a human endeavour in order to derive new knowledge and insights into life forms and how they function. This part provides overviews of different bioinformatics topics and a glimpse of specific innovative applications for biological discovery as the link and complement to biocomputing. Why are we putting these two domains together? Artificial processing of biological information is complementary to natural processing, and by juxtaposing the two, we attempt to enhance our understanding of the natural processing by elucidating and discovering new relevant biological information in a way not commonly done in the literature today. Our thesis is that a better understanding of the natural processing of biological information, as coming from the biocomputing part, helps us improve the way of processing biological information in deriving new knowledge and insights into life forms and how they function. Consequently, readers will be exposed to complementary domains and will be better equipped to grasp ideas of biocomputing and bioinformatics in tandem when dealing with the biological problems of their interest. Overall, this book contains a systematic and comprehensive survey of biocomputing not found in the current literature, and combines it with overviews of different bioinformatics topics complemented by a number of novel bioinformatics applications that illustrate some of the principles of biocomputing. The book represents a unique source of information on the biological and physiological background against which biological "computing" processes are performed in living systems, including higher cognitive processes. It also shows how some of these computing exemplars in biology have found their way into useful computing applications, many of them useful in themselves for dealing with biological information. In particular, we focussed on representative, easy-to-read overviews complemented by a few illustrative applications in dealing with biological information in the bioinformatics realm. The recent phenomenon of genomics, in which large amounts of information stored in the genetic code of living organisms have been elucidated, together with the accompanying wave of proteomics, metabolomics, transcriptomics, systems biology and other newer 'omics, urgently calls for a quantum leap in the information processing needed for deciphering the meaning of
all this information in living systems. In dealing with this deluge, we hope that you will enjoy the eclectic combination of thoughts found in this book and take time out from the rush of today's Internetised world to ponder over the intriguing issues raised here. Enjoy!

Vladimir B. Bajic and Tin Wee Tan
September 2004, Singapore
Contents

Preface

Chapter 1. A Multi-Disciplinary Survey of Biocomputing: 1. Molecular and Cellular Levels
  1 Introduction
  2 Lock-Key Paradigm versus Switch-Based Processing
  3 Absolute versus Relative Determinism
  4 Nested Hierarchy of Biocomputing Dynamics
  5 Membrane as a Mesoscopic Substrate
    5.1 Localized and Delocalized Potentials in Biomembranes
    5.2 Role of Membrane Fluidity in the Mesoscopic Dynamics
    5.3 Electrostatic Interactions as a Molecular Switching Mechanism
    5.4 Lateral Mobility of Protons on Membrane Surfaces: the "Pacific Ocean" Effect
    5.5 Role and Specificity of Phospholipid Polar Head-Groups
    5.6 Effect of Transmembrane Diffusion Potentials and Compartmentalization
    5.7 Vesicular Transport, Exocytosis and Synaptic Transmission
  6 Shape-Based Molecular Recognition
    6.1 Role of Short-Range Non-Covalent Bond Interactions in Molecular Recognition
    6.2 Molecular Recognition between Ferredoxin and FNR
    6.3 Comparison of Plastocyanin and Cytochrome C6
    6.4 Molecular Recognition of Transducin and Arrestin
    6.5 Electronic-Conformational Interactions
  7 Intracellular and Intramolecular Dynamics
    7.1 Electrostatic Interactions between a Small Molecule and a Macromolecule
    7.2 Effect of Phosphorylation
    7.3 Concept of Intelligent Materials
    7.4 Concept of Calcium-Concentration Microdomain
    7.5 Errors, Gradualism and Evolution
    7.6 Protein Folding
  8 Stochastic Nature of Neural Events: Controlled Randomness of Macroscopic Dynamics
  9 Long-Term Potentiation and Synaptic Plasticity
  10 Role of Dendrites in Information Processing
  11 Efficiency of Biocomputing
  12 General Discussion and Conclusion
  References

Chapter 2. A Multi-Disciplinary Survey of Biocomputing: 2. Systems and Evolutionary Levels, and Technological Applications
  1 Introduction
  2 Background
    2.1 Key Conclusions to Part 1
    2.2 Element of Non-Equilibrium Thermodynamics
    2.3 Element of Cellular Automata
    2.4 Element of Nonlinear Dynamic Analysis
  3 Biocomputing at the Evolutionary Level
    3.1 Is Evolution Deterministic?
    3.2 Explanatory Power of Evolution
    3.3 Evolution as Problem Solving
    3.4 Random Search, Exhaustive Search and Heuristic Search
    3.5 Enigma of Homochirality of Biomolecules
    3.6 Damage Control and Opportunity Invention
    3.7 Analogues and Homologues
    3.8 Co-Evolution and Perpetual Novelty
    3.9 Punctuated Equilibrium and Cambrian Explosion
  4 Cognitive Aspects of Biocomputing
    4.1 Models of Creative Problem Solving
      4.1.1 Wallas' Four-Phase Model
      4.1.2 Koestler's Bisociation Model
      4.1.3 Simonton's Chance-Configuration Model
    4.2 Parallel Processing versus Sequential Processing in Pattern Recognition
    4.3 Random Search versus Heuristic Search
    4.4 Dogmatism and Self-imposed Constraint
    4.5 Retention Phase: The Need of Sequential Verification
    4.6 Picture-Based Reasoning versus Rule-Based Reasoning in Pattern Recognition
    4.7 Advantages and Disadvantages of Rule-Based Reasoning
    4.8 Contemporary Interpretation of Freud's Concept of the Unconscious and Poincare's Introspective Account
    4.9 Interpretation of Hypnagogia and Serendipity
    4.10 Gray Scale of Understanding and Interpretation of Intuition and "Aha" Experience
    4.11 Pseudo-Parallel Processing
    4.12 Need of Conceptualization and Structured Knowledge
    4.13 Koestler's Bisociation versus Medawar's Hypothetico-Deduction Scheme
    4.14 Behaviorism versus Cognitivism
    4.15 Cerebral Lateralization
    4.16 Innovation versus Imitation: Gray Scale of Creativity
    4.17 Elements of Anticipation and Notion of Planning Ahead
    4.18 Intelligence of Nonhuman Animals: Planning Ahead, Versatility and Language Capability
    4.19 Multiple Intelligences: Role of Working Memory
    4.20 Creativity in Music, Art and Literary Works
    4.21 Complex and Interacting Factors in the Creative Process: Role of Motivation, Hard Work and Intelligence
    4.22 Education and Training: Present Educational Problem
    4.23 Substituted Targets and Goals in Social Engineering
    4.24 Cognitive Development: Nature versus Nurture
    4.25 Is the Crisis in the U.S. Science Education False?
    4.26 Simulations of Gestalt Phenomena in Creativity
  5 Consciousness and Free Will
    5.1 Consciousness
    5.2 Controversy of the Free Will Problem
    5.3 Conflict between Free Will and Classical Determinism
    5.4 One-to-One versus One-to-Many Temporal Mapping
    5.5 Compatibilists versus Incompatibilists
    5.6 Randomness and Determinism in Microscopic Dynamics
    5.7 Randomness and Determinism in Mesoscopic and Macroscopic Dynamics
    5.8 Endogenous Noise
    5.9 "Controlled" Randomness in a Hierarchical Biocomputing System
    5.10 Impossibility of Proving or Disproving the Existence of Free Will
    5.11 Quantum Indeterminacy at the Biological Level
    5.12 Microscopic Reversibility and Physical Determinism
    5.13 Incompatibility of Microscopic Reversibility and Macroscopic Irreversibility
    5.14 Origin of Macroscopic Irreversibility
    5.15 Enigmas of Alternativism, Intelligibility and Origination
    5.16 Laplace's "Hidden Cause" Argument
    5.17 Physical Determinism and Cosmology
    5.18 Free Will and Simulations of Consciousness
    5.19 Critique of the New-Mysterian View
    5.20 Readiness Potential and Subjective Feeling of Volition
  6 Digression on Philosophy and Sociology of Science
    6.1 Falsifiability and Non-Uniqueness of Scientific Theories
    6.2 Rise of Postmodernism
    6.3 Gauch's Analysis
    6.4 Fallibility of Falsification
    6.5 Science of Conjecture
    6.6 Role of Subjectivity in Creative Problem Solving and Value Judgement
    6.7 Critiques of Science Fundamentalism and Postmodernism
    6.8 Level of Confidence in Scientific Knowledge
    6.9 Sociological Aspects of Science
    6.10 Logical Inconsistencies of Antirealism
    6.11 Objective Knowledge: Popper's Third World
    6.12 Method of Implicit Falsification: Is Psychoanalysis Unscientific?
    6.13 Life Itself: Epistemological Considerations
    6.14 Unity of Knowledge or Great Divide: The Case of Harris versus Edwards
  7 Technological Applications
    7.1 Expert Systems in Artificial Intelligence
    7.2 Neural Network Computing
    7.3 Animat Path to Artificial Intelligence
    7.4 Agent Technology
    7.5 Neuromolecular Brain Model: Multi-Level Neural Network
    7.6 Embryonics: Evolvable Hardware
    7.7 A Successful Example of Molecular Computing: Solving the Directed Hamiltonian Path Problem
    7.8 Prospect of Molecular Electronics in Biocomputing
  8 General Discussion and Conclusion
  References

Chapter 3. Models for Complex Eukaryotic Regulatory DNA Sequences
  1 Introduction
  2 Some Biology of Transcription Regulation
    2.1 The Basal Transcription Machinery
    2.2 Chromatin Structure in Regulatory Regions
    2.3 Specific Gene Regulation: Sequence Elements and Transcription Factors
  3 Core Promoter Recognition
    3.1 Ab Initio Prediction
    3.2 Alignment Approaches
  4 Prediction of Regulatory Regions by Cross-Species Conservation
  5 Searching for Motif Clusters
  6 Perspective
  References

Chapter 4. An Algorithm for Ab Initio DNA Motif Detection
  1 Introduction
  2 Algorithm
  3 Experiments
  References

Chapter 5. Detecting Molecular Evidence of Positive Darwinian Selection
  1 Introduction
    1.1 Molecular Evolution Research in a Time of Genomes
    1.2 Some Examples
    1.3 Chapter Overview
  2 Types of Adaptive Evolution
    2.1 Episodic Positive Selection
    2.2 Diversifying Selection — The Biological Arms Races
  3 The Neutral Theory of Molecular Evolution
    3.1 Cost of Natural Selection
    3.2 Recent Tests of the Neutral Theory
    3.3 Detecting Departures from Neutrality
  4 Selective Sweeps and Genetic Hitchhiking
    4.1 Detecting Selective Sweeps
    4.2 Correlation between Local Recombination Rates and Diversity
    4.3 Distinguishing Complex Demographic Histories or Background Selection from Positive Selection
  5 Codon-Based Methods to Detect Positive Selection
    5.1 Counting Methods
      5.1.1 Window-Based Methods
      5.1.2 Site-Specific Methods
    5.2 Probabilistic Methods
      5.2.1 Site-Specific Methods
      5.2.2 Lineage-Specific Methods
      5.2.3 Detecting Selection in Non-Coding Regions
    5.3 Comparison of Counting and Probabilistic Approaches to Comparative Methods
    5.4 Codon Volatility
    5.5 Codon-Based Methods that use Polymorphism Data
  6 Discussion and Future Prospects
  References

Chapter 6. Molecular Phylogenetic Analysis: Understanding Genome Evolution
  1 What is Phylogenetics?
  2 What is a Phylogenetic Tree?
  3 Identifying Duplicate Genes
    3.1 Generate Protein Families
    3.2 Multiple Sequence Alignments
    3.3 Reconstructing Phylogenetic Trees
  4 Assessing the Accuracy of Phylogenetic Trees
  5 High-Throughput Screening of Tree Topologies
  6 Concluding Remarks
  References

Chapter 7. Constructing Biological Networks of Protein-Protein Interactions
  1 Introduction
  2 Bioinformatic Approaches
    2.1 Homology
    2.2 Fusion Events
    2.3 Co-Localization
    2.4 Co-Evolution
    2.5 Literature Mining
  3 From Interactions to Networks
    3.1 False Negatives
    3.2 False Positives
  4 Conclusion
  References

Chapter 8. Computational Modelling of Gene Regulatory Networks
  1 Introduction
  2 A Novel Approach
  3 Modelling Application with Integrated Approach of First-Order Differential Equations, State Space Representation and Kalman Filter
    3.1 Discrete-Time Approximation of First-Order Differential Equations
    3.2 State Space Representation
    3.3 Kalman Filter
    3.4 Using GA for the Selection of Gene Subset for a GRN
    3.5 GA Design for Gene Subset Selection
    3.6 Procedure of the GA-Based Method for Gene Subset Selection
  4 Experiments and Results
    4.1 Building a Global GRN of the Whole Gene Set out of the GRNs of Smaller Number of Genes (Putting the Pieces of the Puzzle Together)
  5 Conclusions
  References

Chapter 9. Overview of Text-Mining in Life-Sciences
  1 Introduction
  2 Overview of Text-Mining
  3 Scope and Nature of Text-Mining in Life-Sciences Domain
    3.1 Characteristics of Text-Mining Systems
    3.2 Systems Aimed at Life-Sciences Applications
  4 Conclusions
  References

Chapter 10. Integrated Prognostic Profiles: Combining Clinical and Gene Expression Information through Evolving Connectionist Approach
  1 Introduction
  2 Methods
    2.1 Data Sets
    2.2 Data Integration
    2.3 Common Feature Set Selection
    2.4 Algorithm of Integrated Feature Selection
    2.5 Experimental Design
  3 Results
    3.1 Classification Accuracy Test and Profile Verification
  4 Discussion
    4.1 Discovering Genotype Phenotype Relationships through Integrated Profiles
  5 Conclusion
  References

Chapter 11. Databases on Gene Regulation
  1 Introduction
  2 Brief Overview of Common Databases Presenting General Information on Genes and Proteins
  3 Specialized Databases on Transcription Regulation
    3.1 TRANSFAC®
    3.2 TRANSCompel® — A Database on Composite Regulatory Elements
    3.3 SMARt DB — A Database on Scaffold/Matrix Attached Regions
  4 Databases on Biomolecular Interactions and Signaling Networks
    4.1 Regulatory Networks: General Properties and Peculiarities
    4.2 Variety of Databases on Protein Interactions and Signaling Networks
    4.3 TRANSPATH® — A Database on Signal Transduction Pathways
  5 Application of the Databases for Causal Interpretation of Gene Expression Data
    5.1 Analysis of Promoters
    5.2 Identification of Key Nodes in Signaling Networks
  References

Chapter 12. On the Search of Better Validation and Statistical Methods in Microarray Data Analysis
  1 Introduction
  2 Microarray Analysis Steps
  3 Preprocessing
  4 Normalization
  5 Identification of Differentially Expressed Genes
  6 Validation Strategies
  7 Experimental Validation Methods
    7.1 Self-Hybridization or Identical Replicates
    7.2 Quantitative RT-PCR
    7.3 Mutant versus Wild Type
    7.4 Gene Spike-In Experiments
    7.5 Other Validation Experiments
  8 Summary
  References

Chapter 13. Information Extraction from Dynamic Biological Web Sources
  1 Introduction
  2 Information Extraction from Dynamic Web Sources
  3 Survey of Wrapper Maintenance Systems
    3.1 Wrapper Verification Methods
      3.1.1 RAPTURE
      3.1.2 Forward-Backward Scanning Algorithm
    3.2 Wrapper Reinduction Methods
      3.2.1 ROADRUNNER
      3.2.2 DataProg
      3.2.3 Schema-Guided Wrapper Maintenance (SG-WRAM)
      3.2.4 ReInduce Algorithm
  4 Conclusion
  References

Chapter 14. Computer Aided Design of Signaling Networks
  1 Introduction
  2 Signaling Pathways: A Prickly Proposition
  3 Challenges of Signaling Modeling
  4 The Goals and Features of Cellware
  5 Concluding Remarks
  References

Chapter 15. Analysis of DNA Sequences: Hunting for Genes
  1 Introduction
  2 DNA and Genes
    2.1 DNA
    2.2 Coding Genes
    2.3 The Genetic Code and Proteins
    2.4 Structure of Coding Genes
    2.5 Complementary DNA (cDNA)
    2.6 Non-Coding Genes
  3 Genomes
    3.1 Computational Analysis of the Genome: Coding Gene Prediction
    3.2 Computational Analysis of the Genome: Non-Coding Gene Prediction
  4 Closing Remarks
  References

Chapter 16. Biological Databases and Web Services: Metrics for Quality
  1 Introduction
  2 Growing Need for Quality Control
  3 Metrics for Quality Analysis
    3.1 Content
    3.2 Availability
    3.3 Combining Different Metrics
  4 Discussion and Conclusion
  References
CHAPTER 1 A MULTI-DISCIPLINARY SURVEY OF BIOCOMPUTING: 1. MOLECULAR AND CELLULAR LEVELS*
Felix T. Hong
Department of Physiology, Wayne State University School of Medicine
Detroit, Michigan 48201, USA
E-mail:
[email protected]

The first part of this two-part survey examines the molecular and the cellular aspects of information processing in biological organisms (biocomputing). Information processing is analyzed in the framework of a multi-level nested hierarchical network pioneered by Conrad. The control laws that govern input-output relationships are examined at four separate levels: protein folding (submolecular level), biochemical reactions in the cytoplasm (microscopic level), physico-chemical processes at the membrane surfaces and interior (mesoscopic level), and neuronal interactions (macroscopic level). Molecular recognition relies on random searching to increase the probability of encounters at large distances but switches to more deterministic searching at the mesoscopic distances of close encounters. The presence of endogenous noise associated with ion channel fluctuation suggests that biocomputing abides by a weaker form of determinism. However, the noise does not become steadily amplified in the subsequent steps, because the control law re-converges to a well-defined law at higher levels of biocomputing. Thus, as information processing proceeds through the various hierarchical levels, the control laws swing alternately between nearly random and nearly deterministic but never reach either extreme. It appears that the inherent randomness included in biocomputing processes and the dynamic nature of the network structures are essential for the intelligent behavior of a living organism.

* Dedicated to the memory of the late President Detlev W. Bronk of The Rockefeller University
1. Introduction

The human brain is often compared to a digital computer. However, this comparison is misleading because the similarity is only superficial. A conventional sequential digital computer is long on speed and accuracy for number-crunching but short on cognitive skills. Attempts to emulate the human brain with a digital computer have led to advances in artificial intelligence research. Research efforts seeking an alternative to digital computing have led to the prospects of using molecular materials to construct functional devices and computing elements: an emerging field known as molecular electronics (see Sec. 7.8 of Chapter 2). This two-part survey examines biocomputing principles from the molecular to the systems level with an intent to facilitate technological applications. Biocomputing encompasses various aspects of information processing such as pattern recognition, process control, learning, problem solving, adaptation to environmental changes, optimization for long-term survival, etc. Thus, almost every biological process is related to biocomputing. Although we shall focus primarily on the neural aspect of biocomputing, some non-neural processes that are not computational in nature, such as bioenergetic conversion, will also be discussed. Chloroplasts and mitochondria, the organelles for bioenergetic conversion, are the power supply for biocomputing processes. Their operation indirectly influences the performance of biocomputing. The function and regulation of genes constitute a very important aspect of the life process (process control). The detailed molecular mechanisms that form the core of modern molecular biology are readily available in the literature and will not be covered in this survey. The immune response, which involves some kind of memory — the memory of self — will also not be covered. The "disanalogy" (difference) between information processing in a conventional digital computer and in a living organism is a topic that has attracted considerable attention of investigators who seek to "reverse-engineer" the human brain. Conrad75 previously pointed out that a digital computer is material-independent and all internal dynamic processes of the hardware components are suppressed, whereas the internal dynamics is all-important for the operation of the brain. Particularly relevant to our discussion here is a nested hierarchical macroscopic-microscopic (M-m) scheme of biological information processing enunciated by Conrad's group.73 We shall use the M-m scheme, which he later extended to include a mesoscopic link, as a convenient point of departure in this article.181,182,186 The hormonal
system links various organ systems together, thus constituting another level of network processing. However, we shall not pursue this level of dynamics here. The details can be found in a standard physiology textbook. The present article will examine the salient features of biocomputing at the molecular, membrane and intercellular levels by exploring actual examples. Since conclusions in this article are to be drawn mainly by means of induction instead of deduction, it is necessary to present numerous examples. Specific examples, most of which are commonly known, are chosen: a) to illustrate the dynamic network nature of biocomputing, with particular emphasis on modularity of the structural organization and the input-output relationship of biocomputing (control laws), b) to emphasize the electrostatic interactions and related switching functions, which were often treated ambiguously in the standard biochemistry and computer science literatures, and c) to evaluate biological information processing from the point of view of computer science and computer engineering. The choice of examples is biased by the author's background but in no way implies that other examples are less relevant. In principle, validation of a conclusion by means of induction requires an infinite number of examples, whereas refutation requires only a single well-established counter-example.311 Thus, the limited number of examples included in this article hardly suffices to establish any conclusion. It is my hope that the cited examples will remind the readers of additional examples and/or will set the readers on a path of exploration for additional supporting evidence. Preliminary discussions along the same line of reasoning have appeared in Refs. 180, 181, 182, 186 and 188. General treatises on biocomputing can be found in the literature.76,214,77 The cognitive and evolutionary aspects of biocomputing, as well as its technological applications, are presented in Part 2 of this survey (Chapter 2). Extensive cross-references are made between sections and between the two parts of this survey, and may be ignored upon first reading.

2. Lock-Key Paradigm versus Switch-Based Processing

Obviously, the mode of operation in the human nervous system is neither completely digital nor completely analog. Transmission of information in the nervous system is coded in the form of action potentials (nerve impulses), which obey the "all-or-none" principle and are mainly digital in nature (e.g., see Ref. 211 or any standard physiology textbook). However, the intensity of a neural message is coded in frequency (or, rather, the interval of trains of action potentials) and is thus analog in nature. At
a synapse which bridges the gap between two neurons, signal integration starts with a digital process: neurotransmitters are released in packets of a fixed (quantal) amount and the individual postsynaptic (electrical) responses are therefore somewhat discrete. However, the stochastic nature of neurotransmitter release and its binding to receptors on the postsynaptic membrane bestows noise upon the processes of signal integration at the neuronal membrane, thus making it less than strictly deterministic. As a result, the large number of superimposed responses allows the discrete levels to merge into a virtual continuum. On the other hand, certain degrees of predictability are preserved by virtue of the specificity of transmitter-receptor interactions and the "hard-wired" and highly specific neural network pathways, although the hard-wired patterns are subject to modifications (neural plasticity). The pattern of synaptic inputs, consisting of a large number of discrete switching events, acquires an analog nature collectively. The essence of biomolecular computing in the nervous system depends primarily on the unique neural architecture and the underlying specific and complex interactions between the network elements (Conrad's macroscopic dynamics). Conrad70 was among the first to propose the use of biological information processing as an alternative architectural paradigm for future computers. The motivation stemmed from the apparent weakness of a sequential digital computer in sophisticated decision-making processes such as pattern recognition, learning and other cognitive processes. He pointed out that the predominant mode of biological information processing is not a sequence of simple logical operations based on a finite set of switches but rather a subtle form of information processing involving molecular recognition which he referred to as "shape-based information processing"76,78 and which is essentially the lock-key paradigm enunciated by Emil Fischer in 1897 (e.g., see Ref. 29) (Fig. 1). Conrad pointed out the advantage of parallel processing that is inherent in shape-based information processing. A large variety of macromolecules that are distributed in various types of cells constitute the substrate on which biological information processing is carried out (Conrad's microscopic dynamics). The versatility of biomolecular computing apparently stems from the rich repertoire of biochemical reactions. The specificity of these biochemical reactions derives in part from the molecular conformation ("shape"), as exemplified by processes such as enzyme catalysis, antigen-antibody reactions, neurotransmitter-receptor or hormone-receptor interactions, etc. Conrad79 suggested that, in accordance with the lock-key paradigm, biological information processing is equivalent
Fig. 1. Lock-key paradigm. A. The schematic shows a ligand (a small molecule that binds to an enzyme or a receptor) and a protein with a complementary binding site. The binding site shows the complementary shape as well as matching ionic charges. When the ligand binds to the protein, the complex is stabilized by the ionic bonds and other short-range non-covalent bond interactions. In our subsequent discussion, the ligand is identified as a "template" while the protein with the binding pocket provides the "pattern." However, the role of pattern and template can be reversed. B. The schematic reveals the linear peptide structure of the proteins. The amino acids that interact with the ligand at the binding site need not be at adjacent sites along the polypeptide chain, as indicated in this schematic of the folded protein. The unfolded polypeptide chain is shown below, with the interacting amino acids (shown in black with ionic charges) which spread over several parts of the polypeptide chain. Thus, the shape (conformation) is crucial for making these sites active. (Reproduced from Ref. 381 with permission; Copyright by McGraw-Hill)
to a free-energy minimization problem. This idea is illustrated by the self-assembly model for pattern recognition (Fig. 2). The pattern of signal inputs causes specific macromolecules to be released from internal depots, to be activated, and to undergo conformational changes. These macromolecules then self-assemble to form a mosaic pattern. This mosaic pattern then determines the output signal. Conrad's self-assembly model emphasizes that a good deal of information processing is achieved by shape-matching of macromolecules: a rudimentary form of parallel distributed processing (PDP). The diversity of the elemental pattern processors is derived from the diversity of macromolecular interactions with different molecular specificity and affinity, and different reaction kinetics. In particular, carbohydrates provide the diversity
Fig. 2. Self-assembly model for pattern recognition. Input signals cause the macromolecules to undergo conformational changes and to assume mutually complementary shapes. These macromolecules then form a supramolecular complex, in an interlocking way, as a result of free energy minimization. The self-assembled complex subsequently generates an output signal. (Reproduced from Ref. 76 with permission; Copyright by Academic Press)
of recognition sites by conjugating with either lipids or proteins at the cell surface for cell recognition.339 These elemental computational steps are collectively known as signal transduction in the molecular biology literature. Conrad76 pointed out that the variety of elemental computational steps is so huge — and is continuing to be revised (via evolution) — that it defies cataloging in a finite user manual. In contrast, the user manual of a digital computer is finite and reasonably manageable. A digital computer is constructed from a small set of switches known as logic gates (which perform AND, OR, NEGATION, Exclusive OR operations) and flip-flops (which can also be configured as memory devices) ("switch-based information processing"). The complexity arises from the complex interconnections between a large number of these switches. The computational process is strictly deterministic. Only at the input and the output stages does a digital computer resort to analog processing via analog-to-digital and digital-to-analog conversions, respectively. In other words, the introduction of errors in digital computing is generally limited to the input and output stages; additional errors are neither admitted nor amplified by the intervening steps of computation unless there are "bugs" in the software program or in the hardware. Strict determin-
ism is enforced to achieve the reliability of digital computers. Regardless of the constituent materials, the basic computing elements exhibit only two states, the familiar "0" and "1" in binary numbers. All components are designed to recognize only these two states. The appearance of signals not recognized as either "0" or "1" causes the hardware to malfunction. As Conrad frequently pointed out, the internal dynamics is thus deliberately suppressed. This practice makes concurrent use of digital and analog processing impossible. In contrast, enlisting the rich repertoire of biochemical reactions and adopting a nested hierarchical organization make intermixing of digital and analog processing possible in biocomputing. The lock-key paradigm captures a unique feature of biocomputing. However, additional features are needed to make effective and efficient computing possible. For simple inorganic chemicals or small organic chemicals, the rate of reaction depends on a) encounter of the reactants by random diffusion and collisions in the solution phase, and/or b) intrinsic reaction rates that are characteristic of the participating reactants; both processes are usually temperature-dependent. Unless the products are removed or consumed, a chemical reaction seldom goes to completion and, strictly speaking, never goes to completion no matter how close to completion it becomes. This is because the accumulation of products creates a tendency for the reaction to proceed in the reverse direction (reverse reaction or back reaction). That is, most if not all reactions can proceed in both directions, and the net reaction tends to proceed in the direction that will minimize the net Gibbs free energy (under constant temperature and pressure). For small molecules, a close encounter between reactants by means of random diffusion and collisions is all that is required for the reaction to proceed. For macromolecular reactions, mere close encounters alone are not sufficient; the reactant molecules must also perform a search for the reactive sites in order to achieve a lock-key alignment (Sec. 6). If macromolecular reactions were dependent solely on random searching, then information processing based on the lock-key paradigm would be slow and inefficient even though the efficiency conferred by parallel processing partially compensates for the deficiency. It is hard to imagine that intelligence and decisive actions can arise from random searching alone. Conrad's self-assembly model demonstrates two important aspects of biocomputing that are absent in digital computing: a) transformation of states is implemented by means of chemical reactions in addition to physical changes, and b) molecular recognition confers a continuous shade of specificity, affinity and kinetics. However, the self-assembly model is also
somewhat misleading because it implies that: a) biocomputing is driven exclusively by the tendency of microscopic molecular dynamics to seek the state with the lowest free energy, b) biocomputing is cost-free, and c) biocomputing includes steps of random searching for matching molecular entities prior to consummation of chemical reactions. The self-assembly model neglects several important features in biocomputing. First, life processes do not always seek the global minimum in the free energy landscape. In other words, achieving a metastable state may be sufficient; the stability is required only during the active phase of biocomputing. Second, biocomputing is not cost-free at each and every stage, since the maintenance of life processes requires continuing input of matter and energy and output of waste and heat. The appearance of being cost-free exhibited by a passive chemical relaxation is contingent upon a pre-formed high-energy intermediate (such as ATP) or a preexisting transmembrane potential-energy gradient that has been built up in prior steps of bioenergetic processes. Third, the process of random searching required by Conrad's free-energy minimization scheme would make biocomputing too slow to cope with life. What the energy minimization scheme did not show is the uncertainty generated by branching biochemical reactions. A given metabolite (biochemical intermediate) often participates in several branching chemical reactions that belong to different metabolic pathways, and "cross-talks" are constant hazards. Apparently, some crucial features are missing in Conrad's self-assembly model. Although free-energy minimization is an important principle underlying many biological processes, including protein folding (Sec. 7.6) and passive electrodiffusion across the plasma membranes (Sec. 8), additional energy-consuming processes — top-down processes — are required to "restrain" the inherently random processes of reactions and diffusion in the solution phase of the cytoplasm and other subcellular compartments. Some of these processes are controllable via switch-like mechanisms such as activation of enzymes, i.e., digital processes. Activation of enzymes requires energy consumption in terms of ATP hydrolysis — ATP is the energy currency in the living world. Thus, reactions represented by the free-energy minimization scheme can be viewed as an analog process, whereas switch-like processes can be viewed as a digital process. Biocomputing requires an intricate interplay between digital and analog processes. These additional processes that make biochemical reactions more deterministic often involve short-range interactions between molecules and within the same molecule, i.e., non-covalent bond interactions. Basic short-
range non-covalent bond interactions include the van der Waals interaction, hydrogen bonding, and the hydrophobic and the electrostatic interactions.87 These forces also underlie the intramolecular processes collectively referred to as conformational changes of proteins (intramolecular information processing). Phenomena common to molecule-based information processing, such as cooperativity, require a coherent and concerted interplay of these short-range non-covalent bond interactions for their manifestation (Sec. 7.3). A detailed analysis of short-range non-covalent bond interactions in biological interactions and the methodology of measurements can be found in work by Leckband241 and Israelachvili.197 A decision-making step of digital processing is usually implemented in terms of a branching computation algorithm. A branching process in digital computing, in which one of several alternative decisions is available, is clear-cut and deterministic; there is no ambivalence in decision making. In biocomputing, a branching process is, in general, much less clear-cut. A branching process in microscopic dynamics is often implemented in terms of branching biochemical reactions. From the perspective of synthetic organic chemists, branching chemical reactions are often undesirable because they produce a mixture of both desired and unwanted products; the unwanted reactions are called side reactions (see also Sec. 5.6 of Chapter 2). The possibility of generating several different reaction products, in various proportions, from the same reactants creates uncertainty and prevents biocomputing from being strictly deterministic. The nearly universal presence of reverse reactions, mentioned above, further contributes to the uncertainty. However, biochemical reactions are at times made less random by intricate top-down control processes such as enzyme activation and deactivation, and gene activation, to name a few.a These switch-like control processes channel the reactions in certain preferential directions at the expense of others. The alternative pathways can be made negligibly small but cannot be completely shut off. We shall examine in detail the molecular control mechanisms mediated by phosphorylation, an energy-consuming process (Sec. 7.2). As we shall see later, phosphorylation acts like a switch that frequently leads to alterations of interacting non-covalent bond types of molecular forces, either within the protein molecules, between protein subunits, between two different molecules, or even between a protein molecule and the membrane surface. The importance of electrostatic interactions is
An interesting account of digital processes in genetic control was presented by Hayes.156
particularly emphasized268 (Secs. 5.3 and 6.2). Such interactions tend to be obscure in biochemical reactions in the solution phase. Often an electrostatically controlled biochemical reaction is more deterministic than a diffusion-controlled reaction.34 Such processes are vital in many membrane-based processes (Sec. 5). Membrane-based biochemical processes will be extensively discussed since they are usually neglected in the general discussion of biocomputing. Furthermore, many prototype molecular devices are either membrane-based or thin-film-based. It is therefore instructive to examine these membrane-based processes with unique features that are absent in solution-phase biochemical processes.

3. Absolute versus Relative Determinism

It is apparent that biocomputing does not abide by the kind of determinism commonly seen in digital computing. However, this is not a foregone conclusion, because deterministic events with unknown causes may sometimes masquerade as indeterministic events. In Sec. 5 of Chapter 2, we shall argue that biocomputing complies with a weaker form of determinism, which is neither absolutely deterministic nor pure randomness. Anticipating the forthcoming discussion, we shall consider a "gray scale" of determinism ranging from extremely deterministic to extremely random, in place of a simple dichotomy of determinism and indeterminism. Here, determinism will be considered at the level of control laws. A control law in biocomputing is defined as a specific rule or set of rules that "maps" (transforms) a set of input variables to output variables, i.e., an input-output relationship. If the input variables are represented by a state-vector (a set of numbers representing the input condition), a control law is a mathematical operator that maps this input state-vector to the output state-vector. The output state-vector can then become the input state-vector for the next computing step. Given a precisely known single-valued input signal (or a set of input signals), a strictly deterministic control law dictates a sharply defined single-valued output signal (or a set of sharply defined output signals). The best-known example of strictly deterministic control laws is Newton's equation of motion. Given a precisely known set of variables representing the position and momentum of a particle at a given moment, the precise position and momentum at a subsequent time can be calculated and are uniquely determined. Newton's equation of motion predicts a single sharply defined set of values for position and momentum. Repeated experimental measurements or observations usually yield slightly
different values that tend to cluster together with the highest frequency of occurrence at the center. The values are thus characterized by a mean value and a standard error (or deviation or variance). Typically, the frequency versus output relationship is a Gaussian curve. In handling the spread of measured output values, there are, in principle, two options: a) attributing the spread to errors in inputs or outputs (boundary conditions), and/or b) attributing the spread to the control law. In Newtonian mechanics, the control law is presumed to be deterministic. The deviations of an observed output value from that predicted by Newton's law of motion are usually attributed to a) observational errors of the output value due to the limitations of the instrumental resolution, the observer's skill, etc., b) uncertainty of the input value, and c) perturbations from external contingencies (e.g., perturbations from other planets in the calculation of the eclipse of the Sun or the Moon as a three-body problem of the Sun, the Moon and the Earth). The above reference to Newton's classical mechanics, as an example of deterministic control laws, is obviously naive and oversimplified. Earman discussed several examples of deviations of Newtonian mechanics from strict determinism in his book A Primer on Determinism.97 A more rigorous discussion of the topic will be deferred to Sec. 5 of Chapter 2. In social sciences, part of the spread of measured values from the central mean may be attributed to the control law itself. The notion of "individual variations" implies that control laws in social sciences are usually not deterministic, and the variations are partly associated with imprecise control laws governing human affairs rather than measurement errors. In biocomputing, it is legitimate to attach the spread of measured values to a control law so that the control law maps a single sharply defined input parameter into a range of output values characterized by a mean and a deviation. This practice appears to be an operational choice simply for convenience, and is justified by the fact that noise, regardless of its source, affects the outcome of biocomputing. Besides, bioorganisms were "designed" to operate in a noisy environment anyway. If, however, the variations of measured values are not caused by external contingencies or by measurement errors and/or uncertainties of inputs, the variations must then be attributed to the control law itself, and the practice is no longer an operational choice. It then becomes a serious epistemological problem. Of course, the variations can always be attributed to hidden variables, and thus indeterminacy of the control law can always be denied or avoided. The distinction between the two epistemological choices is subtle, and will be discussed in detail in Sec. 5.16 of
Chapter 2. In the present article, we shall replace the term "error" with neutral terms such as "dispersion" or "variance" to designate the spread of measured data from the central mean value. If the dispersion is zero, the control law is said to be absolutely (or strictly) deterministic, and the frequency versus output curve yields a delta-function in the mathematical sense, namely, a curve with infinite height (amplitude) and infinitesimal width but with a finite area under the curve. For reasons presented in Sec. 2, biocomputing seldom yields a sharply defined output value of the kind commonly seen in measurements performed in physics. The values may be well defined but the dispersions are far from zero. Thus, the control law is relatively deterministic if part of the dispersion can be attributed to the control law alone. This determinism is referred to as relative determinism, whereas absolute determinism will be treated as synonymous with strict determinism. Some biocomputing processes do not yield single-valued outputs. Instead, their governing control law dictates a probability density function of the output values which do not cluster around a central mean but scatter "erratically" over a considerable range (Sec. 8). Thus, relative determinism includes both regular control laws (with a well-defined mean and a non-zero dispersion) and probabilistic control laws (without a "meaningful" mean). It is important to realize that relative determinism does not necessarily mean weak causality in biocomputing since the control law can occupy any position on the gray scale of determinism. Thus, on a gray scale from 0 to 1 (with 0 equal to complete randomness and 1 equal to absolute determinism), relative determinism assumes a value less than 1 but greater than 0. Deviations from strict determinism may be essential in certain biocomputing processes. Physical causality has been analyzed by Yates with respect to brain function.397 In Yates' terminology, absolute determinism and absolute indeterminism occupy the two extremes on the gray scale of determinism. In the above definition of control laws, it was tacitly assumed that biocomputing can be formalized. Rosen323 argued that life processes cannot be formalized. A rigorous discussion will be deferred to Sec. 6.13 of Chapter 2.
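To make the distinction drawn in this section concrete, the short sketch below contrasts a strictly deterministic control law, whose repeated outputs for a fixed input have zero dispersion, with a relatively deterministic one, in which a non-zero dispersion is attributed to the control law itself. This is a hypothetical toy model added purely for illustration: the linear mapping, the dispersion value and the choice of a Gaussian are arbitrary assumptions, not anything specified in the survey.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def strict_law(x):
    # Strictly deterministic control law: a fixed input state always maps to
    # exactly one output state, so the output histogram is a delta function.
    return 2.0 * x + 1.0

def relative_law(x, sigma=0.1):
    # Relatively deterministic control law: the law itself maps a fixed input
    # to a distribution of outputs (mean 2x + 1, standard deviation sigma).
    return 2.0 * x + 1.0 + rng.normal(0.0, sigma)

x = 0.5  # a sharply defined, noise-free input state
strict_outputs = np.array([strict_law(x) for _ in range(10_000)])
relative_outputs = np.array([relative_law(x) for _ in range(10_000)])

print("dispersion of the strict law:  ", strict_outputs.std())    # 0.0
print("dispersion of the relative law:", relative_outputs.std())  # about 0.1
```

A probabilistic control law in the sense used above could be sketched the same way by drawing the output from a density with no meaningful mean rather than from a Gaussian centred on a single value; on the gray scale, the first mapping sits at 1 and the second somewhere between 0 and 1.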
4. Nested Hierarchy of Biocomputing Dynamics

Information processing in the nervous system is often described as being massively parallel distributed in nature.325,270 This is largely the consequence of a nested hierarchical organization of networks. The neurons are discrete building blocks of the nervous system. Input and output macroscopic signals link the neural network to the outside world. However, neurons are not merely elements of a vast network of switches. The actions of nerve impulses and hormones on cell membranes may induce additional intracellular signal transduction processes leading to activation of genes, initiation or modulation of the synthesis of specific kinds of proteins, activation of enzymes, and permanent or semi-permanent modifications of the synapses. These processes may lead to long-term and profound cellular changes. Having traditionally regarded glia cells as a passive supporting partner of neurons, neuroscientists have been awakened to the prospect of a more critical role of glia cells in thinking and learning.109 Thus, the intracellular molecular dynamics must also be treated as a separate kind of biomolecular computing that can, in turn, affect the signaling at the macroscopic level. Conrad referred to this vertical coupling of computational dynamics as the macroscopic-microscopic (M-m) scheme of molecular computing73 (Fig. 3). This was a somewhat novel but not unprecedented idea. Before Herbert Simon wrote his ground-breaking treatise Administrative Behavior in 1945, the hierarchy of authority and the modes of organizational departmentalization were central concepts of organization theory in economics. While most analyses of organization had emphasized "horizontal" specialization — the division of work — as the basic characteristic of organized activity, Simon was the first to consider "vertical" specialization — the division of decision-making duties between operative and supervisory personnel (p. 7 of Ref. 352). A living cell houses a large variety of molecular components, and the intracellular molecular processes comprise a myriad of branching biochemical pathways. The complex interactions can be viewed as distributed networks, and the computational dynamics involves kinetic interactions of biochemical reactions and diffusion of reactants and products. The complexity of these processes can be visualized in Fig. 4. Conrad and coworkers have carried out a systematic investigation of computer simulations of microscopic biocomputing dynamics. The reaction-diffusion model of neurons implies that the intramolecular dynamics is highly random.225,83 However, the cytoplasm is not merely a bag of water
Fig. 3. The macroscopic-microscopic scheme of vertical information coupling in biocomputing. Macroscopic information processing is performed by the network of neurons. Microscopic information processing is performed by diffusion and reactions of macromolecules. The two processing systems are linked via chemical signals (second messengers, etc.). (Reproduced from Ref. 76 with permission; Copyright by Academic Press)
containing a random mixture of a large number and variety of biomolecules. Furthermore, intracellular processes are not merely random diffusion and reactions of biomolecules. Indeed, the cytoplasm is highly compartmentalized. Various biochemical processes are segregated by an intricate network of endoplasmic reticulum as well as intracellular organelles, such as mitochondria and chloroplasts. Newly synthesized proteins carry specific segments of signal peptides or signal patches that allow these proteins to be delivered to specific compartments (protein sorting and protein targeting)194,191 (see also Chapters 12 and 13 of Ref. 3 and Chapter 35 of Ref. 364). Lipidation of proteins also plays a dynamic role in targeting.37 Binding of enzymes to
Fig. 4. The microscopic network of interconnecting biochemical pathways in the cells. About 500 common biochemical reactions are shown, with each chemical species represented by a filled circle. The large circle appearing at the lower center portion (slightly to the left) represents the citric acid cycle. A typical mammalian cell synthesizes 10,000 different proteins. (Reproduced from Ref. 3 with permission; Copyright by Garland Publishing)
Binding of enzymes to an appropriate anchoring protein directs the enzyme to an appropriate intracellular compartment where the action takes place.67 31P-NMR diffusion spectroscopic measurements suggested that, in addition to compartmentalization, other intracellular structures may also influence diffusion.90 There is a host of cytoskeletal components in the cytoplasm (for reviews, see Refs. 118, 161, 162, 233 and 301). Hameroff and coworkers146,147 have studied the intracellular cytoskeletal system as a biomolecular computing system in order to model consciousness. Ample evidence indicates that
some cytoskeletal networks are intimately coupled to the plasma membrane and are involved in the regulation of ion channel distribution and function (membrane-cytoskeleton).150,117 Motor proteins use the microtubule networks as scaffolds to mediate cell motility, to promote cell shape changes, to position membrane-bound organelles and to regulate ion channels.164,333 Experimental evidence indicates that ligand-gated ion channels are linked to the cytoskeleton and to appropriate signal transduction pathways through their cytoplasmic domains.342 Cytoskeletal systems are also involved in the process of transporting organelles from one part of the cytoplasm to another (cytoplasmic cytoskeleton). Thus, in addition to chemical reactions and diffusion, there are somewhat deterministic convective flows of materials in this highly compartmentalized environment. In this context, intracellular information processing can be viewed as network processing. However, the network is distributed rather than lumped, and the network connections are dynamic (time-dependent) rather than fixed ("hard-wired").

5. Membrane as a Mesoscopic Substrate

The processes that link macroscopic signals and microscopic intracellular molecular events are neither macroscopic nor microscopic. These events usually take place in the biomembrane or its immediate vicinity, where the spatial scale is too small to be considered macroscopic but too large to be considered microscopic. Events between the microscopic and the macroscopic scales are usually referred to as mesoscopic events.

The mesoscopic events are intimately related to the plasma membrane and the membranes that line intracellular compartments. These membranes have a basic uniform structure: a uniform matrix of phospholipids in a bilayer configuration in which membrane-bound proteins are embedded. The plasma membrane delineates an intracellular environment that is drastically different from the extracellular environment. It is well known that the maintenance of concentration gradients of several biologically important small ions is crucial for nerve signaling in the neural network; metabolic energy is temporarily stored as an electrochemical gradient for ready utilization. This electrochemical energy can be rapidly deployed for nerve signaling via the action of ion channels.165,200 In addition, many neurotransmitter-receptor and hormone-receptor interactions lead to signal transduction events that link the macroscopic dynamics to the intracellular dynamics via G protein- and second messenger-mediated
processes.131,201,293 Conrad76 singled out the mesoscopic signals carried by second messengers (cyclic AMP, cyclic GMP, Ca2+, etc.; see Ref. 364 for general references) as the communication links between the macroscopic dynamics and the microscopic dynamics. He renamed the M-m scheme the M-m-m scheme (where the middle m stands for mesoscopic). This line of thinking was based on an experiment in which microinjection of cyclic AMP influenced the firing pattern of a neuron.247 However, second messenger-mediated processes are not the only mesoscopic processes that are important in biocomputing. Furthermore, the plasma membrane is not simply a boundary between two aqueous phases, nor is it an inert supporting matrix that anchors functional proteins for intercellular signaling and transmembrane signal transduction. The biomembrane possesses a structure that exhibits rich internal network dynamics: mesoscopic dynamics. Peculiar phenomena not seen in the solution phase appear in mesoscopic events and allow unique computational paradigms to be established. It therefore seems appropriate to elevate mesoscopic signaling to a distinct level of computational dynamics, and to reinterpret Conrad's M-m or M-m-m scheme as a full-fledged three-level network computational scheme. Two important membrane properties are particularly relevant to the underlying mesoscopic processes: membrane thickness and membrane fluidity.

5.1. Localized and delocalized potentials in biomembranes
Biomembranes consist of two oriented molecular layers of phospholipids. The membrane thickness (about 60 Å) is of the same order as the dimensions of many macromolecules (for general references, see Refs. 125 and 202). The combination of the small membrane thickness and the insulating properties of the phospholipids creates a unique environment in which electric fields can modulate the mesoscopic dynamics. Since the electric field strength inside a membrane is approximately equal to the transmembrane potential divided by the membrane thickness, the small membrane thickness implies that a hefty electric field can appear inside the biomembrane when only a modest electric potential is imposed across the membrane. In addition, an externally applied electric field will appear predominantly across the membrane rather than being distributed throughout the cytoplasm, primarily because the membrane has a considerably higher resistivity than the cytoplasm. This makes the mesoscopic dynamics exquisitely sensitive to the influence of electromagnetic fields present in the environment.
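As a back-of-the-envelope check of this point (a sketch in Python, using only the representative values quoted above), a 100 mV potential across a 60 Å membrane already corresponds to a field of roughly 10^7 V/m:

```python
# Rough check of the field strength inside a membrane: E = V / d,
# using the representative values quoted in the text.
membrane_potential = 0.1   # V (a modest 100 mV transmembrane potential)
thickness = 60e-10         # m (a membrane thickness of about 60 Angstrom)

E = membrane_potential / thickness
print(f"{E:.1e} V/m")      # ~1.7e+07 V/m, an enormous field by macroscopic standards
```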
In other words, biocomputing is not immune to interference from electromagnetic fields. Specifically, many experiments reported in recent decades have indicated that some intracellular processes, such as gene transcription, are sensitive to extremely low frequency (ELF) electromagnetic fields (about 100 Hz) (e.g., Refs. 40, 41, 183 and 35).

Physiologically, there are two types of electric fields in the membrane: delocalized and localized. A delocalized potential across the membrane is caused by the accumulation of excess mobile ions at the two membrane surfaces. The diffusion potential that appears as a consequence of transmembrane ionic gradients and selective ionic permeability is a delocalized potential. Delocalized potentials can also be generated by the pumping of small mobile ions by an electrogenic ion pump. The electric field associated with a delocalized potential vanishes when the membrane ruptures or when the ionic gradients are dissipated by shunting (electric short-circuiting) of the membrane with ionophores.

The localized potential is caused by charges that are attached to the membrane surface or to a macromolecule (membrane-bound or water-soluble). A charged membrane surface exhibits a surface potential. In contrast to delocalized potentials, the electric field originating from a charged membrane surface does not vanish upon rupture or shunting of the membrane. There are two sources of surface charges in a biomembrane: phospholipids and membrane-bound proteins (including glycolipids and glycoproteins). The polar head-groups of membrane phospholipids are ionizable. Some phospholipids carry negative charges, and others carry positive charges. Still others, such as lecithin, are zwitterionic and may exhibit zero, net positive, or net negative charge depending on the state of ionization. The hydrophilic domains of membrane-bound proteins also contain ionizable groups, such as the carboxylate groups of aspartic acid and glutamic acid and the basic side-chain groups of arginine and lysine. The ionization state of the surface charged groups is sensitive to the pH and the ionic strength of the adjacent aqueous phases. The plasma membrane usually possesses net negative surface charges and exhibits a net negative surface potential in the physiological pH range.

As one moves away from the membrane surface, the electric potential varies from a negative value at the surface to zero in the bulk phase, where the electroneutrality condition prevails (Fig. 5). Recalling that the electric field equals the negative gradient of the potential profile (E = -dV(x)/dx), a steep slope of the potential profile indicates an intense electric field.
Fig. 5. Localized electric potential associated with membrane surface charges. A. The schematic shows a charged membrane surface. The aqueous phase in the double layer region is divided into thin layers, 1, 2, 3, 4, 5. In each layer, the net excess charges are shown. Layer 1 sees the full array of fixed positive charges on the membrane surface and has a higher density of excess negative charges than the other layers. Layer 2 sees both the fixed positive surface charges and the excess negative charges in layer 1. Therefore, layer 2 has a lower excess negative charge density than layer 1, because layer 1 partially screens the surface charges. Layer 5 sees no net charges, because the entire surface charge is screened by the combined layers 1 through 4, and can be regarded as the bulk phase. Complete screening (as shown) is possible only if the Debye length is considerably less than the membrane thickness. Otherwise, the amount of net excess negative charge in the double layer is still less than that of the fixed positive charges (incomplete screening). The diagram is highly schematic and is not based on an actual computation. Also, the number of ions and their distribution are not meant to be realistic. B. The schematic illustrates the incomplete screening in a diffuse double layer when the membrane thickness is comparable to the Debye length. The two surfaces of the membrane are represented by two vertical lines. Only the left surface carries positive surface charges. The right surface is uncharged (an assumed theoretical case). The charge density profile, ρ(x), shows a single peak of positive charges at the left membrane surface, and two peaks of negative charges at the diffuse double layers of the two aqueous phases (bottom trace). The electric potential profile, Ψ(x), is also shown (top trace). Note that a positive surface potential exists at the right interface even though the right surface is uncharged. (B. Reproduced from Ref. 172 with permission; Copyright by American Society for Photobiology)
The electroneutrality condition does not hold in the vicinity of the membrane surface, a region known as the diffuse electrical double layer in the electrochemistry literature.383 The thickness of the diffuse double layer is indicated by the Debye length. If all of the net excess charges were imagined to be concentrated as a thin sheet of charge at a distance of one Debye length from the membrane surface, the electric potential difference between the membrane surface and the imaginary sheet of charges would be equal to the surface potential. The Debye length, which is a function of the ionic
strength, is of molecular dimensions in a physiological solution (less than 5 Å). For a given charged membrane surface, the shorter the Debye length, the steeper the potential profile becomes and the more intense the electric field within the double layers. The (effective) electric force in this region has a range of action that is longer than that of covalent bond interactions but shorter than what is dictated by the inverse square law. It therefore plays an important role in mesoscopic events.

Since ions in the diffuse double layers are more freely mobile than charges inside the membrane, a charged interface tends to attract ions of opposite polarity (counter-ions) and to repel ions of the same polarity (co-ions). On the other hand, the concentration gradients formed across the double layers cause the counter-ions to diffuse away from the membrane surface and the co-ions to diffuse towards it. The balance between the concentration gradient-driven and electrical gradient-driven diffusion (of counter-ions and co-ions) results in a slight excess of counter-ions at any location within the diffuse double layer. Qualitatively, this is expected from the overall electroneutrality of the region that includes the membrane surface and the diffuse double layer. At a given location, the electric field is determined by the combined effect of the fixed surface charges and the excess mobile counter-ions in the intervening layer (Fig. 5A). In other words, the electric field generated by the fixed surface charges is partially canceled by the excess counter-ions held in the intervening layer. As the distance from the membrane surface increases, the thickness of the intervening layer increases and, therefore, its cancellation effect also increases. This is the reason why, as one moves away from the surface, the electric field declines faster than expected from the inverse square law. Beyond the Debye length, the electric field is hardly detectable. This process is called charge screening.

It might appear intuitively evident that the excess counter-ions in the double layers neutralize the net surface charges. In reality, however, the neutralization is incomplete in a biomembrane (Fig. 5B). The unneutralized portion of the surface charges can exert an electric field in the membrane interior as well as in the aqueous phase on the opposite side of the membrane, only to be neutralized by the diffuse double layer there.172,174,187 As shown in Fig. 5B, a substantial surface potential appears at the opposite interface even if that interface possesses no net surface charges. This transmembrane effect of the surface charges is often overlooked in the membrane biophysics literature. Of course, the surface charges are completely neutralized by the combined excess ions accumulated in the two diffuse double layers. From the above analysis, it appears that the electric field generated by a charged membrane surface is highly localized and that its effects do not extend beyond the two double layer regions.
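The magnitude of this local effect can be checked with a few lines of Python (a sketch, assuming room temperature so that the thermal voltage kT/e is about 25.6 mV); it reproduces the roughly 50-fold concentration enhancement and 1.7-unit pH shift quoted in the next paragraph.

```python
# Boltzmann factor for a monovalent counter-ion (e.g., a proton) at a negatively
# charged surface, relative to its bulk concentration.
import math

kT_over_e = 25.6e-3            # V, thermal voltage at room temperature
psi_surface = -0.100           # V, a surface potential of -100 mV

enhancement = math.exp(-psi_surface / kT_over_e)   # surface / bulk concentration ratio
delta_pH = math.log10(enhancement)                 # corresponding drop in local pH
print(round(enhancement, 1), round(delta_pH, 2))   # ~49.8-fold, ~1.7 pH units
```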
For example, a surface potential of -100 mV would give rise to an electric field inside the membrane that is comparable to what a diffusion potential exerts across the membrane. This same surface potential also gives rise to an electric field in the adjacent double layer region that is at least as intense as that in the membrane phase. This local electric field raises the local (surface) concentration of monovalent counter-ions, relative to the bulk concentration, by a factor of e (the base of the natural logarithm, 2.71828) for every 25.6 mV of surface potential at room temperature. Thus, for a surface potential of -100 mV, the proton concentration at the membrane surface becomes approximately 50 times the value in the remote bulk region, i.e., the local pH is decreased by 1.7 units compared to the bulk pH.

The local electric fields generated by a charged membrane surface have a profound effect on the mesoscopic dynamics. The complexity of such interactions can be appreciated by considering proton transport in a photosynthetic membrane. One of the essential steps is the binding of protons to the photosynthetic reaction center. Such a process can be viewed as a bimolecular reaction between the reaction center complex, which is membrane-bound, and the proton donor, the hydronium ion H3O+, in the aqueous phase:

(P)m + (H3O+)aq ⇌ (PH+)m + (H2O)aq

where the subscripts m and aq designate the membrane and the aqueous phase, respectively. According to the law of mass action, the two factors that affect the chemical equilibrium of this reaction are the proton-binding constant, K, and the aqueous proton concentration (pH). A decrease of pH and an increase of K both favor proton binding, whereas an increase of pH and a decrease of K both favor the reverse reaction, i.e., proton release. Clearly, the aqueous pH to be considered here is the surface (local) pH rather than the bulk pH. Although the bulk pH is generally buffered and remains constant in time, the local (surface) pH is not. As the reaction proceeds and more and more protons become bound to the membrane, the positive surface potential increases in magnitude (or the negative surface potential decreases in magnitude). As a result, the local pH will increase accordingly even if the (bulk) aqueous phase is buffered.

The surface potential can, in principle, also affect the chemical equilibrium by altering the proton-binding constant, because a change in the surface
potential also leads to a change in the intramembrane electric field. Superposition of localized and delocalized potentials can lead to dramatic changes of the electric field inside the membrane (see Fig. 11 of Ref. 285).

That the intramembrane electric field is capable of affecting the function of a membrane-bound protein is suggested by two experimental observations. The first example is the use of the 515 nm spectral shift of carotenoids inside the thylakoid membrane of chloroplasts as an indicator of the intramembrane electric field of the chloroplast membrane.212,213 The method is based on the electrochromic effect of carotenoids; the photo-induced charge separation generates a sufficiently intense electric field inside the membrane to cause a significant shift of the absorption spectrum of carotenoids. The second example is an observation made by Tsong and coworkers,376,375 who demonstrated that an externally applied AC electric field can supply energy for the Na+-K+-ATPase to pump Rb+ across the red cell membrane. These investigators formulated an electroconformational coupling theory to explain their observations (cf. Brownian ratchets11,13,12). Essentially, the electric field inside the membrane affects the ion-binding constants at the two membrane surfaces asymmetrically. Therefore, the energy absorbed during the first half-cycle of the sinusoidal AC field is not fully canceled by the energy released during the second half-cycle. The net energy absorbed over a full cycle is thus utilized to pump Rb+ ions. An alternative interpretation has been offered by Blank,39 who claimed that the AC field alters the surface concentrations of the ions periodically, thus leading to enhanced one-way transport of the ions (surface compartment model).38 While the controversy awaits future resolution, the problem highlights the importance of mesoscopic dynamics.

Potentials associated with fixed charges at the membrane surface and at the exposed portions of ion channels serve to enhance the rate of transport by increasing the local (surface) concentrations of small ions (cf. Blank's surface compartment model). Consider the superfamily of ligand-gated ion channels (LGIC), of which the cation-selective acetylcholine (ACh) receptor and the anion-selective glycine receptor are two well-investigated members.21,304 Based on an electrostatic calculation, Jordan209 found that a cluster of negatively charged amino acids at the mouth of the ACh receptor significantly increases the local Na+ concentration at the extracellular surface, thus resulting in an increase of the passive Na+ influx. Based on site-directed mutagenesis, Imoto et al.195 found that rings of negatively charged amino acids determine the ACh receptor channel conductance. This regulatory function has been attributed to the M2 helical segment of LGIC
proteins: clusters of negatively and positively charged amino acids on the M2 segment are believed to be responsible for the cation selectivity of the ACh receptor and the anion selectivity of the glycine receptor, respectively ("selectivity filter"; see, for example, Fig. 2 of Ref. 2). Langosch et al.239 incorporated synthetic M2 segments of the glycine receptor into an artificial bilayer lipid membrane (BLM) and found that the charged residues on the segment critically affect the ion selectivity; exchange of the terminal arginine residues for glutamate resulted in a significant shift towards cation selectivity of the respective channels, as compared to peptide M2. Galzi et al.123 demonstrated that mutations in the M2 segment converted the ion selectivity of the ACh receptor from cationic into anionic. An electrostatics calculation by Adcock et al.2 shows that other protein domains of the ion channels also play an important role in increasing the local concentration of the permeant ion at the mouth of the transmembrane pore, and in "focusing" the electrostatic field due to the pore per se. There is little doubt that electrostatic interactions are the major determinant of the ion selectivity of ligand-gated ion channels.

One may wonder how the ion concentration in such an ultrathin double layer (with only a finite amount of excess cations) can sustain the enhanced transport rate without being quickly depleted. It turns out that the time required to establish quasi-equilibrium of the ion concentrations in the diffuse double layer is governed by the ionic cloud relaxation time constant:
τr = L² / (2D)
where L is the Debye length and D is the diffusion coefficient. For a typical small-ion diffusion coefficient of 10⁻⁵ cm²/s, the relaxation time constant is about 0.5 ns in 0.1 M NaCl (Debye length 9.7 Å). Therefore, the enriched surface concentration of a small inorganic cation can never be depleted by the enhanced transport process: as long as its bulk concentration is maintained constant, any local depletion is replenished almost instantly.

The above discussion highlights the exquisite sensitivity of the mesoscopic dynamics to electric field effects from within and from outside. Additional examples will be described to indicate that mesoscopic effects may be responsible for shaping the control law, making an otherwise random process more deterministic once reactive species get into the range of the diffuse double layers in a close encounter.
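Both numbers quoted above can be reproduced from standard constants; the short Python sketch below (a check, not part of the original analysis; a 1:1 electrolyte at room temperature is assumed) computes the Debye length for 0.1 M NaCl and the corresponding relaxation time.

```python
# Debye length and ionic-cloud relaxation time for 0.1 M NaCl at room temperature.
import math

eps0, eps_r = 8.854e-12, 78.5   # vacuum permittivity (F/m), dielectric constant of water
kB, T = 1.381e-23, 298.0        # Boltzmann constant (J/K), temperature (K)
e, NA = 1.602e-19, 6.022e23     # elementary charge (C), Avogadro's number
I = 0.1 * 1000                  # ionic strength, mol/m^3 (0.1 M, 1:1 electrolyte)

L = math.sqrt(eps_r * eps0 * kB * T / (2 * NA * e**2 * I))   # Debye length (m)
D = 1e-9                        # small-ion diffusion coefficient, m^2/s (= 1e-5 cm^2/s)
tau = L**2 / (2 * D)            # ionic-cloud relaxation time constant (s)

print(f"Debye length ~ {L * 1e10:.1f} Angstrom")   # ~9.6, close to the 9.7 quoted above
print(f"relaxation time ~ {tau * 1e9:.2f} ns")     # ~0.46 ns, i.e., about 0.5 ns
```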
5.2. Role of membrane fluidity in the mesoscopic dynamics
Mitochondria and chloroplasts are two important molecular machines (organelles) that are associated with membranes and require membrane fluidity for their function (for general reviews, see Refs. 7, 18, 93, 137, 154, 167, 243, 244 and 335). Both organelles produce the energy-rich compound ATP, and both rely on long-distance electron transfers for energy conversion. Mitochondria accomplish the energy conversion from nutrients to chemical energy by oxidation (oxidative phosphorylation). Chloroplasts accomplish the energy conversion from sunlight to chemical energy by reduction (photosynthetic phosphorylation). Energy conversion in both organelles leads to the formation of an electrochemical gradient of protons across the membrane. According to the chemiosmotic theory,278 the dissipation of the proton gradient is utilized for ATP synthesis by ATP synthase, a membrane-bound enzyme residing in both the mitochondrial inner membrane and the thylakoid membrane. Thus, the proton gradient is the intermediate state that stores the converted energy temporarily and that links electron transfer to ATP production. In other words, the energy-rich intermediate is not a specific biomolecule but rather a mesoscopic state. This mode of ATP synthesis is referred to as membrane-level phosphorylation. Before these organelles evolved, phosphorylation was achieved exclusively in the solution phase (substrate-level phosphorylation).

In the mitochondrial inner membrane, the energy source is supplied in the form of NADH (nicotinamide adenine dinucleotide, reduced form) or FADH2 (flavin adenine dinucleotide, reduced form), both of which are products of the biochemical degradation of carbohydrates or fatty acids. The molecular machine in the mitochondrial inner membrane consists of four protein complexes embedded in the membrane. These protein complexes are not rigidly anchored in the inner membrane but are capable of rotational and lateral (translational) diffusion owing to membrane fluidity. Electrons are transferred from NADH to Complex I (NADH dehydrogenase), then to Complex III (cytochrome b-c1 complex) and finally to Complex IV (cytochrome oxidase) (Fig. 6A). In transferring electrons from NADH to O2, a substantial fraction of the energy is converted into a transmembrane electrochemical gradient of protons. A separate entry point allows FADH2 to feed electrons to Complex II (succinate dehydrogenase), then to Complex III and finally to Complex IV. The complexes are not connected in a fixed chain-like structural relationship but are loosely connected in the two-dimensional space of the membrane, i.e., it is a two-dimensional network.
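As a side note on magnitudes (illustrative values only, not figures taken from this chapter), the proton electrochemical gradient invoked by the chemiosmotic theory above combines an electrical term and a pH term; at room temperature each pH unit is worth roughly 59 mV:

```python
# Illustrative estimate of the proton electrochemical gradient ("proton-motive force").
# The 150 mV electrical component and the 0.5 pH-unit difference are hypothetical values
# chosen only to show the order of magnitude; they are not quoted in the text.
import math

R, T, F = 8.314, 298.0, 96485.0                 # J/(mol K), K, C/mol
mV_per_pH = 1000 * math.log(10) * R * T / F     # ~59.2 mV per pH unit at 25 C

delta_psi_mV = 150.0                            # hypothetical electrical component (mV)
delta_pH = 0.5                                  # hypothetical transmembrane pH difference
pmf_mV = delta_psi_mV + mV_per_pH * delta_pH    # both terms assumed to act in the same direction

print(round(mV_per_pH, 1), round(pmf_mV))       # ~59.2 mV per pH unit, ~180 mV total
```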
Fig. 6. Electron transfers in the mitochondrial inner membrane. A. The schematic shows the NADH dehydrogenase complex (Complex I), the cytochrome b-c1 complex (Complex III), and cytochrome oxidase (Complex IV). The mobile charge carriers ubiquinone (Q) and cytochrome c (c) are also shown. B. and C. Docking surface of cytochrome c, showing the lysine groups involved in electron transfer from the b-c1 complex (B) and to cytochrome oxidase (C). The circles are lysine groups on cytochrome c. The full circles are at the front (with the size indicating closeness to the reader), and the dotted circles are behind the molecule. Stippled circles indicate those lysine groups crucial for electron transfer. When these stippled (positively charged) lysine residues are replaced with residues that are either neutral or negatively charged, electron transfer is inhibited. These critical lysine groups are protected from acetylation when cytochrome c is docked with the matching complex. (A. Reproduced from Ref. 3 with permission; Copyright by Garland Publishing. B. and C. Reproduced from Ref. 358 with permission; Copyright by Journal of Biological Chemistry)
The coupling of these complexes is accomplished with the aid of ubiquinone (connecting Complex I to III) and cytochrome c (connecting Complex III to Complex IV), which thus serve to delineate the two-dimensional network via their dynamic connections.
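The "dynamic connection" idea can be pictured with a deliberately simple toy model (Python; the docking probabilities are invented for illustration and carry no experimental meaning): the complexes are fixed nodes, and an edge exists only while a mobile carrier happens to be docked.

```python
# Toy model of the inner-membrane network: fixed nodes (the complexes) joined by
# transient edges that appear only when a mobile carrier (Q or cytochrome c) docks.
import random

def transfer_chain(entry, p_dock=0.9):
    """Attempt to route one electron pair from an entry complex to Complex IV."""
    hops = [(entry, "III", "ubiquinone"), ("III", "IV", "cytochrome c")]
    path = [entry]
    for src, dst, carrier in hops:
        if random.random() < p_dock:        # the carrier happens to dock productively
            path.append(dst)
        else:                               # the connection is momentarily absent
            return path, False
    return path, True

random.seed(1)
successes = sum(transfer_chain(random.choice(["I", "II"]))[1] for _ in range(1000))
print(successes, "of 1000 electron pairs reached Complex IV")   # most, but not all
```

The point of the sketch is only structural: the connectivity of the network is time-dependent, re-made at every encounter, rather than hard-wired.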
A question arises as to whether cytochrome c links Complex III to Complex IV like an electron "wire" or an electron "shuttle." This question was settled by an experiment in which individual lysine residues on the surface domain of cytochrome c were replaced with neutral residues or negatively charged residues.358,56 The replacement of some lysine residues (open circles in Fig. 6B) has no effect, but the replacement of others (shaded circles in Fig. 6B) inhibits electron transfer from the cytochrome b-c1 complex to cytochrome c. Similarly, replacement of some lysine residues also inhibits electron transfer from cytochrome c to cytochrome oxidase (Fig. 6C). Apparently, common lysine residues are used during "docking" of cytochrome c either to the cytochrome b-c1 complex or to cytochrome oxidase. As is evident from a comparison of Fig. 6B and Fig. 6C, eight of the lysine residues are common to both processes. Furthermore, several of these lysine groups were shown to be protected from acetylation when cytochrome c was bound either to the cytochrome b-c1 complex or to cytochrome oxidase. Apparently, cytochrome c uses the same surface domain to dock either with the cytochrome b-c1 complex or with cytochrome oxidase. A simple explanation that is consistent with both the replacement and the protection experiments is that cytochrome c acts as an electron "shuttle" rather than as an electron "wire," i.e., the electron path is not "hard-wired."

These positively charged amino acid residues are strategically located on the surface domain so that they match the complementary negative charges on the cytochrome b-c1 complex as well as those on cytochrome oxidase (cf. the lock-key paradigm, Sec. 2). The matching charge pairs form the basis of a type of electrostatic interaction known as salt bridges or ionic bonds. Long-distance electron transfer does not necessarily take place upon a collision of these redox partners; a correct mutual orientation is required in addition to sufficient proximity. Since shape-based molecular recognition seldom allows for a perfect match, matching salt bridges ensure proper molecular alignment and sufficiently long dwell-times of molecular encounters to allow long-distance electron transfer to be consummated (see Sec. 6.1 for details).

Similar electron shuttles also appear in the photosynthetic apparatus. In cyanobacteria, green algae and green plants, there are two separate photosynthetic reaction centers known as Photosystem I and Photosystem II (Fig. 7). Their three-dimensional structures have been determined at 4 Å and 8 Å resolution, respectively.232,334,318,299,317 Although the two reaction centers are situated side by side (in parallel) in the thylakoid membrane of the chloroplast, their functional connection is actually in series.
Fig. 7. Schematic diagram showing the two photosystems of a green plant. The segments of the thylakoid membrane are shown either juxtaposed in the appressed region or widely separated from each other in the non-appressed region. Photosystem I (PS1), together with its antenna complex (LHC-1, light-harvesting complex I), is shown in the non-appressed region, whereas Photosystem II (PS2) and its antenna complex (LHC-2) are shown in the appressed regions. Plastoquinone (PQ) is the mobile membrane-bound charge carrier that shuttles electrons between PS2 and the cytochrome b6-f complex, and shuttles protons from the stromal space (at the top) to the thylakoid space (at the bottom). Ferredoxin (Fd) and plastocyanin (PC) are mobile aqueous electron carriers. The ATP synthase (with its two subunits, CF0 and CF1) is shown in the non-appressed region. Phosphorylation of LHC-2 leads to the State 1-State 2 transition. (Reproduced from Ref. 18 with permission; Copyright by Blackwell Science)
As shown in Fig. 7, electrons are ejected from excited chlorophyll P680, the reaction center of Photosystem II, towards the stromal space. The "hole" (photooxidized chlorophyll) is then refilled with an electron from water, and an oxygen molecule is concurrently released when four electrons have been transferred sequentially. The electrons reaching the stromal side are then passed on to plastoquinone, a lipid-soluble mobile carrier similar to ubiquinone in the mitochondrial inner membrane.276 Plastoquinone then carries two electrons, together with two protons picked up at the stromal side of the aqueous phase, and shuttles the electrons to the cytochrome b6-f complex. A similar activity in Photosystem I transports electrons to the stromal side and leaves behind a "hole" in oxidized chlorophyll P700, the reaction center of Photosystem I. An aqueous-borne mobile carrier, plastocyanin (a copper-containing redox molecule), then carries electrons from the b6-f complex to the oxidizing end of Photosystem I, and thus refills the "hole." The electron ejected from Photosystem I is then transported across the thylakoid membrane and is eventually picked up by ferredoxin (Fd) at the
stromal side. Fd is another aqueous mobile carrier; it feeds the electron to the enzyme ferredoxin:NADP+ reductase (FNR, ferredoxin:nicotinamide adenine dinucleotide phosphate reductase), thus converting NADP+ into NADPH:

2 Fd(red) + NADP+ + 2 H+ ⇌ 2 Fd(ox) + NADPH + H+

where NADP+ and NADPH are the oxidized and the reduced forms of nicotinamide adenine dinucleotide phosphate, respectively, and Fd(red) and Fd(ox) stand for the fully reduced and the fully oxidized forms of ferredoxin. The two transferred electrons are provided by two molecules of Fd(red) and are fed, one at a time, to FNR. FNR is a two-electron transfer protein with a partially reduced/partially oxidized semiquinone state, just like other quinones. NADPH is a reducing equivalent that is used in many biosynthetic processes, including the fixation of CO2 and the synthesis of carbohydrates, and thus can be construed as an intermediate energy source. Ferredoxin is a highly negatively charged (acidic) molecule (net charge -14; not shown in the above equation) that contains a single [2Fe-2S] (iron-sulfur) center. In contrast, FNR is a basic (positively charged) protein that contains a non-covalently bound flavin adenine dinucleotide (FAD) molecule as the cofactor for electron transfers. As expected, the opposite charges on FNR and Fd are involved in docking via electrostatic interactions112 (also see Ref. 282 for a brief summary).

The scheme in Fig. 7 implies a 1:1 coupling between the two photosystems. In the actual three-dimensional space, the two systems form a two-dimensional network in the plane of the thylakoid membranes and, again owing to membrane fluidity, can be decoupled and operate independently depending on the lighting conditions. This regulatory process, known as the State 1-State 2 transition,286,287 is implemented by means of the action of a specific membrane-bound protein known as the light-harvesting chlorophyll protein complex (LHC-2). LHC-2 serves primarily as the antenna complex that funnels absorbed photon energy to the Photosystem II reaction center by means of resonant energy transfer. Its second function is the regulation of the State 1-State 2 transition. When the cytosolic side (stromal side) of LHC-2 is phosphorylated, LHC-2 and Photosystem II move laterally, thus becoming decoupled from Photosystem I. These lateral movements are caused by electrostatic interactions.237,19 Maintenance of the fluidity of the thylakoid membrane is crucial for the State 1-State 2 transition. Interestingly, there is lateral heterogeneity of membrane fluidity. Photosystem II is
usually located in the appressed region, which is less fluid, whereas Photosystem I is usually located in the non-appressed region, which is more fluid (see Fig. 7).

The examples discussed above serve to illustrate the role of the two-dimensional mesoscopic dynamics in process control. Furthermore, the network structure is only partially hard-wired through the formation of supramolecular complexes; the electron shuttles serve as the loose connections. The hard-wired part provides a top-down constraint, whereas the loose connections make possible a bottom-up process of exploration. This is a recurrent theme that appears at several levels of the nested hierarchical organization of biocomputing (see Secs. 7.6 and 11 of this Chapter and Secs. 3.4, 4.8 and 4.17 of Chapter 2). Membrane fluidity, together with the State 1-State 2 transition, makes a dynamic network connection possible, thus allowing for dynamic allocation of computing resources. It is evident that the mesoscopic dynamics is neither highly deterministic nor highly random. This observation further justifies the adoption of a gray scale of determinism.

Superficially, bioenergetics is not biocomputing. However, the supply of energy sources such as ATP critically affects neural signaling processes. How deterministic these bioenergetic processes are certainly influences how deterministic biocomputing is with respect to the gray scale. In contrast, fluctuations of the power supply to a digital computer either have no effect whatsoever on computing (when the fluctuations stay within the tolerable range) or cause the computer to crash (when the fluctuations exceed the tolerable range).

5.3. Electrostatic interactions as a molecular switching mechanism

In addition to the formation of ionic bonds, there is yet another electrostatic interaction: the interaction between small charged molecules or ions, on the one hand, and charged membrane surfaces, on the other. Rapid redistribution of small ions in the aqueous phase guarantees a concentration jump of small ions near the membrane surface if the surface charges are abruptly generated or abolished. This feature makes the electrostatic interaction an ideal molecular switching mechanism.175,176,184 The onset of its action is on the nanosecond time scale, which is considerably faster than ionic diffusion in the aqueous phase. The relatively short range of action ensures that the switching process occurs only in the restricted region where the stimu-
lus is applied, thus minimizing "cross-talk" and preserving the network structure. That the action of a photopotential is short-ranged and highly localized has been demonstrated by Sokolov et al.359 in an artificial planar bilayer lipid membrane. These investigators sensitized one of the two membrane surfaces with aluminum phthalocyanine tetrasulfonate. Illumination of the sensitizer caused localized damage (photodynamic action) in the lipid leaflet facing the sensitizer.

The following example illustrates the principle of a photo-gated, electrostatically controlled switching process. Visual phototransduction is a major molecular switching event during which the stimulating light energy is amplified 100,000-fold. The biochemical process is known as the cyclic GMP (cGMP) cascade: the main biochemical amplification mechanism of visual transduction.362,363,248,190,152,151 Illumination converts the visual pigment rhodopsin into metarhodopsin II (MII), the photoactivated form of rhodopsin that triggers the cGMP cascade. The process starts with the binding of inactive transducin (Gt), a GDP-binding protein or, simply, a G protein, to MII, resulting in the activation of Gt, which, in turn, activates phosphodiesterase, a peripheral protein at the cytosolic surface. Activated phosphodiesterase then hydrolyzes cGMP. The role of cGMP is to keep the Na+ channels of the photoreceptor plasma membrane open in the dark. The diminished cGMP concentration following its hydrolysis allows the Na+ channels to close, thus constituting the excitation of the photoreceptor.

Accompanying the formation of MII is the appearance of a photoelectric signal known as the R2 component of the early receptor potential (ERP), which was discovered by Brown and Murakami four decades ago.47 Unlike most bioelectric signals, this signal has a submicrosecond rise-time. The conventional wisdom has attributed the ERP to intramolecular charge displacements caused by the conformational change when metarhodopsin II is formed from its precursor, metarhodopsin I (MI), by virtue of an acid-base reaction:

Metarhodopsin I + H+ ⇌ Metarhodopsin II

By analyzing the interfacial photoelectric process in terms of the Gouy-Chapman theory of membrane electrostatics, Hong172,173 demonstrated that intramolecular charge displacement is not the only possibility; an interfacial proton binding could, in principle, give rise to a fast photoelectric signal that possesses all the major characteristics of the ERP. Hong also suggested that the appearance of such a fast photosignal is accompanied by a positive-going surface potential change, which may serve as the molecular
switch that triggers visual transduction. Subsequently, this surface potential was experimentally demonstrated by Cafiso and Hubbell.52 As is evident in the above reaction scheme, the formation of metarhodopsin II is accompanied by the binding of a proton. Schleicher and Hofmann331 attributed this proton uptake to the process of transducin binding to MII. However, by directly measuring the pH of the aqueous phase in a reconstituted system of photoreceptor disk membrane vesicles during rhodopsin photolysis, Ostrovsky and coworkers343,305 demonstrated that such a proton binds to the cytoplasmic surface of rhodopsin (MII) without the presence of transducin. That is, the generation of MII alone alkalinizes the cytoplasmic interface and causes the light-induced surface potential change. By means of FTIR (Fourier transform infrared) spectroscopy, Fahmy105 suggested that the protonation site is a carboxylate at the rhodopsin cytoplasmic surface, thus independently confirming the earlier observation of Ostrovsky and coworkers. In principle, the positive-going surface potential appears in the right place at the right time to initiate visual phototransduction.

Hong176 further theorized that the termination of visual transduction may be hastened by a negative-going surface potential at the cytoplasmic side. The deactivation of visual transduction involves mainly the binding of arrestin to MII. Subsequent to the activation of Gt, MII is phosphorylated by rhodopsin kinase at its C-terminal tail, which faces the cytosol. Nine groups of serine and threonine residues at the C-terminal tail are phosphorylated at the stage of deactivation.390 Thus, a negative-going surface potential appears at the cytoplasmic side, where the cGMP cascade takes place. The idea that this negative-going surface potential may be involved in resetting the photoreceptor was shared by Liebman and coworkers,128 who did extensive experimental work to support this view.

Until recently, surface potentials were not seriously considered in the elucidation of visual transduction. This might have been due to the fact that most experimental investigations of visual transduction were carried out in the solution phase. Therefore, such a switching mechanism might have been ignored for the following reason. Some investigators presume that all membrane potentials vanish upon rupture of the membrane (personal observation). Since the switching action did take place in broken membrane preparations, a possible electrostatic mechanism was not suspected to be relevant. However, the notion that membrane potentials cannot exist in a ruptured membrane is an unwarranted generalization. As a matter of fact, localized potentials (such as surface potentials) persist in a ruptured membrane; only the delocalized potentials vanish (see Sec. 5.1 and Ref. 187). A detailed discussion, based mainly on the analysis of Liebman and coworkers,129 will be given in Sec. 6.4, after a general discussion of molecular recognition.

Here, we shall demonstrate that a surface-potential-based switching mechanism is physically realistic and mechanistically competent by describing a biomimetic system reported by Drain et al.96 More recently, Rokitskaya et al.321 used the aforementioned phthalocyanine sensitizer to inactivate gramicidin ion channels in a planar bilayer lipid membrane. Drain and coworkers studied the transport of hydrophobic ions, tetraphenyl borate and tetraphenylphosphonium, in an artificial bilayer lipid membrane system (Fig. 8). These organic ions have larger diameters and, therefore, smaller Born charging energies than small ions.240 As a consequence, large organic ions partition more favorably into the membrane phase than small inorganic ions; such organic ions are often called hydrophobic ions. An externally applied potential across the membrane causes these hydrophobic ions to drift down the electric potential gradient, and the event can be detected as a transmembrane current.

Drain and coworkers incorporated a lipid-soluble pigment, magnesium octaethylporphyrin, into the bilayer lipid membrane and enriched the two aqueous phases with equal concentrations of an electron acceptor, methyl viologen. Illumination with a brief pulse of laser light caused electrons to be transferred from the photoexcited magnesium octaethylporphyrin to the aqueous-borne methyl viologen. As a result, a positive surface potential appeared at each of the two membrane-water interfaces upon illumination (Fig. 8A). Since the electron acceptor methyl viologen was present in equal concentrations on both sides of the membrane, nearly equal numbers of electrons moved from the two membrane surfaces into the two adjacent aqueous phases but in opposite directions, thus generating two equal but opposing photoelectric currents that cancel each other. Therefore, no photovoltaic effect was detected externally. However, the light-induced positive surface potential caused a concentration jump of the negatively charged tetraphenyl borate ion. This effect was detected as a sudden increase of the membrane current carried by tetraphenyl borate ions when a transmembrane potential was applied (Fig. 8B). If, however, the tetraphenyl borate ion was replaced with the positively charged tetraphenylphosphonium ion, a sudden decrease of the membrane current was detected, as expected from a decrease of its surface concentration (Fig. 8C). The switching event demonstrated here is reminiscent of a field-effect transistor.
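The direction and rough size of this switching effect follow directly from the Boltzmann relation used earlier (Sec. 5.1). The sketch below is an illustrative estimate only, not the authors' analysis; the 10 mV surface-potential step is a hypothetical value, and the surface concentration of each probe ion is simply taken as proportional to the current it carries.

```python
# Illustrative Boltzmann estimate of how a light-induced positive surface potential
# shifts the surface concentrations (and hence the currents) of the two probe ions.
import math

kT_over_e = 25.6e-3        # V, thermal voltage at room temperature
delta_psi = +0.010         # V, hypothetical light-induced positive surface-potential step

anion_factor = math.exp(+delta_psi / kT_over_e)    # tetraphenyl borate (negative) is attracted
cation_factor = math.exp(-delta_psi / kT_over_e)   # tetraphenylphosphonium (positive) is repelled
print(f"anion current  x {anion_factor:.2f}")      # ~1.48, an increase, as observed
print(f"cation current x {cation_factor:.2f}")     # ~0.68, a decrease, as observed
```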
only delocalized potentials vanish (see Sec. 5.1 and Ref. 187). A detailed discussion mainly based on the analysis of Liebman and coworkers 129 will be given in Sec. 6.4, after a general discussion of molecular recognition. Here, we shall demonstrate that a surface-potential based switching mechanism is physically realistic and mechanistically competent by describing a biomimetic system reported by Drain et al.96 More recently, Rokitskaya et al.321 used the aforementioned phthalocyanine sensitizer to inactivate gramicidin ion channels in a planar bilayer lipid membrane. Drain and coworkers studied the transport of hydrophobic ions, tetraphenyl borate and tetraphenyl phosphonium, in an artificial bilayer lipid membrane system (Fig. 8). These organic ions have larger diameters, and, therefore, smaller Born charging energy than small ions.240 As a consequence, large organic ions partition more favorably into the membrane phase than small inorganic ions. These organic ions are often called hydrophobic ions. An externally applied potential across the membrane causes these hydrophobic ions to drift down the electric potential gradient, and the event can be detected as a transmembrane current. Drain and coworkers incorporated a lipid-soluble pigment, magnesium octaethylporphyrin, into the bilayer lipid membrane and enriched the two aqueous phases with equal concentrations of an electron acceptor, methyl viologen. Illumination with a brief pulse of laser light caused electrons to be transferred from the photoexcited magnesium octaethylporphyrin to the aqueous-borne methyl viologen. As a result, a positive surface potential appeared at each of the two membrane-water interfaces upon illumination (Fig. 8A). Since the electron acceptor methyl viologen was present in equal concentrations on both sides of the membrane, nearly equal numbers of electrons moved from the two membrane surfaces into the two adjacent aqueous phases but in opposite directions, thus generating two equal but opposing photoelectric currents which cancel each other. Therefore, no photovoltaic effect was detected externally. However, the light-induced positive surface potential caused a concentration jump of the negatively charged tetraphenyl borate ion. This effect was detected as a sudden increase of the membrane current carried by tetraphenyl borate ions when a transmembrane potential was applied (Fig. 8B). If, however, the tetraphenyl borate ion was replaced with the positively charged tetraphenylphosphonium ion, a sudden decrease of the membrane current was detected, as expected from a decrease of its surface concentration (Fig. 8C). The switching event demonstrated here is reminiscent of a field-effect transistor. The above examples demonstrate that an electrostatic interaction, if
Fig. 8. Experimental prototype showing switching of ionic currents, based on light-induced surface potentials. A. An artificial bilayer lipid membrane contained lipid-soluble Mg octaethylporphyrin (3.6 mM). The aqueous solution contained the electron acceptor methyl viologen (A) in equal concentrations on both sides (20 mM). Tetraphenyl borate ions, B-, were partitioned into the membrane near the region of the polar head-groups. Photoactivation of Mg octaethylporphyrin formed P+, thus generating two symmetrical positive surface potentials. The surface potentials increased the surface concentration of B- but decreased the surface concentration of tetraphenylphosphonium ions (not shown). B. The ionic current, carried by 1 μM tetraphenyl borate ions, increased 50% upon illumination with a laser pulse (1 μs; indicated by the arrow marked hν). C. The ionic current, carried by 5 mM tetraphenylphosphonium ions, decreased 25% upon illumination. In both B and C, the spikes preceding the illumination were capacitative transients caused by the application of a transmembrane potential of +50 mV. The steady-state dark current I0 was generated by the externally imposed potential. The additional current (the second spike) was generated by light. (Reproduced from Ref. 96 with permission; Copyright by the National Academy of Sciences)
The above examples demonstrate that an electrostatic interaction, if it appears or vanishes rapidly, may serve as an ideal switch-based control mechanism. Of course, the above-demonstrated effect is not the only way in which an electrostatic interaction can exert a switching action. For example, protonation of rhodopsin may cause a local rearrangement of non-covalent bonds, thus leading to a conformational change and exposing critical transducin-binding groups (see Sec. 6.4 for a detailed discussion).
5.4. Lateral mobility of protons on membrane surfaces: the "Pacific Ocean" effect

There is a long-standing controversy regarding the bioenergetic process of membrane-level phosphorylation. As described briefly in Sec. 5.2, membrane-level phosphorylation relies on the conversion of externally supplied energy into a transmembrane proton gradient via long-distance electron transfers. The coupling of the proton gradient to ATP production requires lateral movement of protons from the electron transfer site to the phosphorylation site. Yet the chemiosmotic theory implies that the photon energy conversion process allows protons to be dumped into the vast extracellular space. Thus, the major fraction of the transported protons is dispersed into the bulk solution phase of the extracellular space. There are two conceivable effects. First, the transmembrane proton gradient will be diminished by the buffering capacity of the solution phase. Second, the remaining "free" protons diffuse and disperse throughout the bulk phase. Although the dispersed protons are not forever lost, they must be retrieved, by reversing the buffering reaction and by diffusion from the bulk phase back to the double layer region, before they can be used for ATP synthesis. Both processes cause some free energy to be dissipated as heat, thus resulting in an unacceptable loss of converted free energy. This objection against the so-called delocalized chemiosmotic theory is referred to as the "Pacific Ocean" effect, and it has been extensively debated.394

Partial alleviation of the problem is accomplished by the convoluted extracellular space between the plasma membrane and the cell wall (known as the periplasmic space in photosynthetic bacteria) and between the outer and inner membranes of a mitochondrion (a "San Francisco Bay" effect?). The small (intra)thylakoid space in a chloroplast corresponds to the extracellular space because thylakoid membranes are inverted membranes (inside-out orientation). The extracellular space of a mitochondrion is the small space between the mitochondrial outer and inner membranes (the intermembrane space; Fig. 6A). Thus, the "Pacific Ocean" effect may not be as dramatic as suggested by the opponents of the (delocalized) chemiosmotic theory; the space into which protons are pumped is smaller than the opponents imply. Still, the coupling process seems rather inefficient and unnecessarily awkward.

The rival "localized" chemiosmotic theory392,393,395 requires that protons be preferentially channeled to the phosphorylation site, either through an internal membrane route or through a pathway provided by a proton
network on the membrane surface. Kell224 suggested that the interfacial region (Stern layer) is separated from the bulk by a diffusion barrier that keeps most of the converted protons in this adjacent layer and thus preferentially diverts protons to the phosphorylation site. Such preferential lateral proton mobility has subsequently been demonstrated by Teissie and coworkers in phospholipid monolayers371,370,369 and in pure protein films,122 and by Alexiev et al.4 in the purple membrane of Halobacterium salinarum. Teissie and coworkers detected rapid lateral movement of protons along a phospholipid monolayer-water interface by a number of measurements: fluorescence from a pH indicator dye near the membrane surface, electrical surface conductance, and surface potential. These investigators found that the conduction of protons along the surface is considerably faster than proton conduction in the bulk phase (2-3 minutes vs. 40 minutes for a comparable distance in their measurement setup). This novel conduction mechanism is proton-specific, as has been confirmed by a radioactive electrode measurement as well as by replacement with deuterated water. The conduction mechanism is truly mesoscopic because it is present only in the liquid-expanded (fluid) state of the monolayer and disappears in the liquid-condensed (gel) state. It is a consequence of cooperativity between neighboring phospholipid molecules; the conduction mechanism disappears when the phospholipid molecules are not in contact with each other.

Teissie and coworkers suggested that the enhanced lateral proton movement occurs along a hydrogen-bonded network on the membrane surface, in accordance with a mechanism previously proposed by Onsager303 for proton conduction in ice crystals. The hydrogen-bonded network is formed by the polar head-groups of phospholipids and the associated ordered water molecules at the membrane surface. Thus, a proton jumps from H3O+ to a neighboring H2O. The newly formed H3O+ then rotates and becomes a new proton donor to its neighbor farther down the chain in the network. As a result, protons move considerably faster on the membrane surface than expected from classical diffusion. The importance of the integrity of the hydrogen-bonded network was demonstrated by Fisun and Savin.111 These investigators simulated long-range proton transfer along the hydrogen-bonded chain that is formed by amino acids containing OH groups, and found that replacement of the L-amino acid residues by the corresponding D-isomers in a peptide chain suppresses proton transport through the network. In fact, a similar mechanism has been used to explain proton diffusion in the bulk phase, which is considerably faster than the diffusion of ions of comparable size in the same medium. Therefore, the "hop-and-
turn" mechanism enhances proton movement both at the interface and in the bulk phase. So what makes the lateral proton movement on the membrane surface so much faster than proton diffusion in bulk water? I think that there are at least two reasons. First, the proton movement at the membrane surface is two-dimensional, whereas the movement in bulk water is three-dimensional (the reduction-of-dimensionality principle of Adam and Delbrück1). It is true that simple diffusion in a two-dimensional space is faster than in a three-dimensional space according to probability theory. The same may also be true for the "hop-and-turn" mechanism. The second reason may be the increased surface concentration of protons, as compared to the bulk concentration, caused by a negative surface potential (see Secs. 5.1 and 5.3). The negative surface potential is primarily due to the net negative surface charges of the phospholipid polar head-groups. The increased concentration at the surface makes the "hop-and-turn" mechanism work better along the surface than in bulk water. Teissie's group has shown that there is indeed a steep pH gradient near the membrane surface (2 pH units in less than 1.5 nm). However, these authors also observed similarly facilitated lateral proton movement with monolayers formed from neutral and zwitterionic phospholipids. Therefore, the effect of a negative surface potential is probably secondary and supplementary but not essential.

In summary, a special mechanism of the mesoscopic dynamics serves the purpose of containing randomness. Protons that are pumped across the membrane need not search for the site of ATP synthase randomly and waste time (and energy) wandering senselessly into the bulk solution phase (the aqueous phase beyond the Debye length); the search is heuristic rather than random. The question raised by the "Pacific Ocean" effect thus finds a satisfactory answer.

5.5. Role and specificity of phospholipid polar head-groups

The phospholipid polar head-groups serve at least two purposes: a) to provide hydrophilicity for stabilizing the bilayer configuration of biomembranes, and b) to provide the hydrogen-bonded network for efficient lateral proton mobility. Neither of these functions requires much head-group specificity. Thus, the diversity of head-groups suggests additional functions that may require head-group specificity. We shall examine a head-group-dependent process that modulates the activity of protein kinase C.295 Protein kinase C (PKC) is a ubiquitous protein that plays a crucial role in signal transduction and in protein targeting.191
Receptor-mediated hydrolysis of phosphatidyl inositol bisphosphate (PIP2), which is initiated by extracellular signaling, generates two second messengers: a) inositol triphosphate (IP3, water-soluble, negatively charged), and b) diacylglycerol (lipid-soluble). Inositol triphosphate, in turn, mobilizes intracellular Ca2+, causing PKC, a peripheral protein initially located in the aqueous environment of the cytosol, to bind to the plasma membrane, where it becomes activated.

The activation of PKC appears to be partly electrostatic in nature (Fig. 9). The process can be conveniently divided into two stages. The first stage appears to be the electrostatic interaction between PKC and an acidic membrane (negatively charged head-groups) in the presence of Ca2+. This step is sensitive to ionic strength and surface charges, and is accompanied by a global conformational change of PKC. The first stage can also be induced by nonspecific acidic phospholipids. In contrast, the second stage is a highly specific interaction that requires both diacylglycerol and phosphatidyl serine (PS), a negatively charged phospholipid. It is also insensitive to increasing ionic strength. The specific interaction induces a local conformational change that exposes the pseudosubstrate domain (which carries multiple positive charges) of PKC, which is further stabilized by binding to any acidic lipid (an electrostatic interaction).

The interaction between PKC and PS is cooperative. The PS dependence of PKC activity exhibits steep sigmoidal kinetics (cf. hemoglobin cooperativity, Sec. 7.3). The interaction is highly specific. The negative charges and the amino group configuration of PS are important, since partial activation can be achieved by other phospholipids that have similar amino group functionality or negative charges in the polar head-group region. It is also highly stereo-specific (D-serine is ineffective) and configuration-specific (L-homoserine is ineffective). The binding of diacylglycerol is a prerequisite for PKC activation (stoichiometry 1:1); it increases the affinity of PKC for PS and is responsible for induction of the specificity and the cooperativity of PS binding. Strict stereospecificity is required of the structure of the diacylglycerol backbone and of its ester linkages to the fatty acids. However, the requirement for fatty acid acyl side-chain specificity is less stringent; only sufficient hydrophobicity of the acyl groups seems relevant. Ca2+ is required for some isozymes of PKC, but not for others. The underlying mechanism remains to be elucidated.

Thus, electrostatic interactions are invoked twice in PKC activation: first as the homing device that helps PKC target acidic membranes, and second as part of the allosteric mechanism that produces the catalytic activity.
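The steep, sigmoidal PS dependence mentioned above is the classic signature of cooperative binding. The short Python sketch below evaluates a generic Hill-type curve; the Hill coefficient and half-maximal PS content are hypothetical illustrative numbers, not a fit to the PKC data cited in the text.

```python
# Generic Hill-type activation curve; n and k50 are illustrative, hypothetical values.
def pkc_activity(ps_mole_percent, n=4.0, k50=8.0):
    """Fractional activity vs. mole-percent PS (a sketch, not a fit to the cited data)."""
    return ps_mole_percent**n / (k50**n + ps_mole_percent**n)

for ps in (2, 5, 8, 12, 20):                   # hypothetical PS contents (mole %)
    print(ps, round(pkc_activity(ps), 2))      # activity rises steeply around k50
```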
Fig. 9. Model for the interaction of protein kinase C (PKC) with phospholipid head-groups. A. In the first step, electrostatic interactions drive PKC to bind to acidic lipids (middle). This binding results in a conformational change that exposes the hinge region of PKC. B. The second stage of activation is rather specific. Diacylglycerol (DG) promotes cooperativity and specificity in the protein-lipid interaction and causes a marked increase in the affinity of PKC for phosphatidyl serine (PS), thus resulting in a PS-dependent release of the pseudosubstrate domain from the active site and activation of PKC. (Reproduced from Ref. 295 with permission; Copyright by Annual Reviews)
first stage represents a rather general mechanism shared by the association of coagulation proteins with anionic membranes,85 as well as binding of synapsins to acidic membranes.30 In Sec. 5.3, we have postulated a similar homing mechanism for visual transduction: a photo-induced positive surface potential triggers the binding of a peripheral protein, transducin. The difference here is that the surface charges are photo-induced on the membrane-bound protein rhodopsin rather than on the phospholipid (which is not light-sensitive). The plausibility of this mechanism is also suggested by the following investigation about thyrotropin-releasing hormone (TRH). Colson et al.69 demonstrated that the binding of TRH to the receptor is not merely mediated by a process of random collisions, but rather by an elaborate two-step gradient-like process (cf. Sec. 6.1). The docking process involves the formation of an initial recognition site between TRH and the surface of its receptor. TRH may then be guided into the transmembrane binding pocket by fluctuations in
the extracellular loop of the receptor. The activation of PKC serves as an excellent example showing how an electrostatic mechanism can make the lock-key paradigm work more deterministically and efficiently. By first "locking onto" the membrane surface, the search for the target molecule is transformed from a three-dimensional random search into a less random two-dimensional search (reduction-of-dimensionality principle; cf. proton lateral mobility, Sec. 5.4).
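A minimal Monte Carlo sketch (added for illustration; the lattice size, trial count and the function name mean_steps_to_target are arbitrary choices, not taken from the text) of the reduction-of-dimensionality argument invoked above for proton movement and for PKC targeting: on lattices of the same linear size, an unbiased random walk finds a single target site far sooner in two dimensions than in three.

import random

def mean_steps_to_target(dim, size=12, trials=200, rng=random.Random(1)):
    # Average number of nearest-neighbour random-walk steps needed to reach
    # a fixed target site on a periodic size**dim lattice, starting from a
    # random site.
    target = tuple(0 for _ in range(dim))
    total = 0
    for _ in range(trials):
        pos = [rng.randrange(size) for _ in range(dim)]
        steps = 0
        while tuple(pos) != target:
            axis = rng.randrange(dim)
            pos[axis] = (pos[axis] + rng.choice((-1, 1))) % size
            steps += 1
        total += steps
    return total / trials

print("2-D (surface) search:", round(mean_steps_to_target(2)), "steps on average")
print("3-D (volume) search: ", round(mean_steps_to_target(3)), "steps on average")
# Confining the search to a surface of the same linear dimension as the
# volume cuts the mean search time by roughly an order of magnitude,
# which is the essence of the Adam-Delbruck argument.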
5.6. Effect of transmembrane diffusion potentials and compartmentalization
The efficiency of membrane-level phosphorylation discussed in Secs. 5.2 and 5.4 is further enhanced by additional mesoscopic mechanisms. Consider the following reaction, which is catalyzed by ATP synthase:
ADP + Pi ⇌ ATP,
where Pi stands for inorganic phosphate. As the reaction proceeds in the forward direction, the ATP concentration will increase and the ADP concentration will decrease. Both changes of concentration will decrease the forward reaction rate and increase the reverse reaction rate. Eventually, the (net) forward reaction will come to a halt when chemical equilibrium is reached. In accordance with the law of mass action, the equilibrium can be shifted to the right by the following mechanisms. First, there is a transporter system (antiporter) that moves ADP into and ATP out of the matrix space of the mitochondria.297,378 Of course, both actions favor continuation of the net forward reaction, and more ATP will be formed and exported. Second, the transmembrane movement of ADP and ATP is further assisted by the existence of a transmembrane diffusion potential (inside-negative polarity), which favors the exit of ATP (containing three negatively charged phosphate groups and being transported as ATP4−) more than the exit of ADP (containing two phosphate groups and being transported as ADP3−) (pp. 159-164 of Ref. 297 and pp. 212-213 of Ref. 378). The equilibrium of the exchange is affected tenfold for each 60 mV of membrane potential. Again, this example demonstrates the role of membrane and electrostatic interactions in making the intracellular dynamics more deterministic. This example also illustrates the effect of compartmentalization: keeping biomolecules in appropriate compartments to make biochemical processes more deterministic than possible in a homogeneous solution phase.
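The "tenfold for each 60 mV" figure can be reproduced by simple Nernst-type bookkeeping: each ATP4−/ADP3− exchange moves one extra negative charge outward, so an inside-negative potential biases the exchange equilibrium by a factor exp(FΔψ/RT), i.e., about one order of magnitude per 59-60 mV. The short sketch below is an added illustration; apart from that factor, the potentials chosen are arbitrary.

import math

F = 96485.0   # Faraday constant, C/mol
R = 8.314     # gas constant, J/(mol K)
T = 298.0     # temperature, K (310 K gives nearly the same numbers)

def exchange_bias(delta_psi_mV, extra_charge=1):
    # Factor by which an inside-negative membrane potential favours the
    # ATP(4-)-out / ADP(3-)-in exchange; extra_charge is the net negative
    # charge moved outward per exchange (one for the ATP4-/ADP3- pair).
    return math.exp(extra_charge * F * (delta_psi_mV / 1000.0) / (R * T))

for dpsi in (30, 60, 120, 180):
    print(f"{dpsi:4d} mV  ->  exchange equilibrium shifted {exchange_bias(dpsi):8.1f}-fold")
# Roughly tenfold per ~60 mV; a mitochondrial membrane potential of the
# order of 180 mV therefore pulls the exchange, and hence net ATP export,
# very strongly in the forward direction.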
5.7. Vesicular transport, exocytosis and synaptic transmission
A general mechanism is employed by cells to deliver synthesized biomolecules to targeted subcellular compartments, e.g., from the endoplasmic reticulum to the Golgi apparatus. Essentially the same mechanism is also utilized to export biomolecules to the extracellular space by exocytosis. Synaptic transmission depends on an exceptionally rapid process of exocytosis, namely neurotransmitter release, which is cued by an action potential arriving at the axonal terminals. Not only is the process far more deterministic than purely random diffusion, but the destination is also reasonably assured by means of specific molecular recognition. Basically, the scheme is known as the SNARE hypothesis.360 Experimental evidence in support of the hypothesis has begun to surface at a rapid rate in recent years140,203,126,272,205,48 (see also Chapter 35 of Ref. 364). SNARE is a widely used acronym, standing for soluble N-ethylmaleimide-sensitive factor attachment protein receptors. N-ethylmaleimide-sensitive fusion factor (NSF) is a molecule that is required for the fusion of vesicles with another membrane; N-ethylmaleimide blocks the fusion process by alkylating sulfhydryl groups of NSF. Attachment of NSF to the vesicular membrane is a prerequisite for its fusion with another membrane. NSF itself is an ATPase; ATP binding is necessary for membrane attachment, whereas ATP hydrolysis is required for release. The latter process needs the help of additional cytosolic proteins, called SNAPs (soluble N-ethylmaleimide-sensitive factor attachment proteins), which recognize receptors on a membrane. Thus, SNAREs are SNAP receptors. SNAREs are further classified into two categories: v-SNAREs, which are present in the vesicular membrane, and t-SNAREs, which are present in the target membrane. Recognition of the target membrane by a particular vesicle requires that the v-SNAREs match the t-SNAREs in a "cognate" fashion (cognate SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices.119 For example, synaptobrevin94 is a v-SNARE that is present on the surface of synaptic vesicles, whereas syntaxin is a t-SNARE that is present on presynaptic plasma membranes; the fusion is triggered by a rising cytosolic Ca2+ concentration as a consequence of membrane depolarization. The mutual recognition of these cognate SNAREs is absolutely required for rapid neurotransmitter release. The requirement of fairly specific complementary pairing of v-SNAREs and t-SNAREs has been demonstrated in reconstituted systems.272
Just how specific are these cognate docking processes? Specificity arises from the availability of "cognitively" different v-SNAREs and t-SNAREs. There are enough different v-SNAREs and t-SNAREs to make targeting somewhat deterministic. However, it turns out that destination targeting is not absolutely accurate. A vesicular cargo of endoplasmic reticulum (ER) origin that is destined for the Golgi complex could have fused with the plasma membrane, were it not for the proximity factor. The proximity factor makes the encounter of ER-derived vesicles with the Golgi membrane more likely than with the plasma membrane.329 Thus, vesicle targeting is deterministic but not absolutely deterministic (relatively deterministic). The accuracy of vesicle targeting is safeguarded by imposing several layers of constraints sequentially. The situation is similar to the problem-solving process described by Boden, in which a large number of constraints must be met en masse but none is individually necessary: each constraint "inclines without necessitating" (p. 111 of Ref. 42; cf. Sec. 4.21 of Chapter 2). The processes described above are also utilized in exocytosis of secretory granules and in synaptic transmission. Yet, vesicular fusion in synaptic transmission is far more rapid than in exocytosis of secretory granules. Apparently, special mechanisms have evolved to ensure the high-speed fusion required for synaptic transmission; exocytosis at synapses is orders of magnitude faster than in other secretory systems. Synaptic vesicles are not randomly distributed in the cytoplasm of axonal terminals. Rather, they are organized and concentrated near specialized areas of the presynaptic membrane called "active zones."203 Regulatory mechanisms include tethering and binding of Rab GTPase proteins.388,134,273 Synaptic vesicles can be pre-positioned at designated portions of target membranes by long tethers extending from the membranes. Distinct sets of tethering and vesicle-bound Rab proteins are used for different target membranes. Rab proteins are active only when GTP is bound. There are additional peripheral proteins associated with nerve terminals. Phosphorylation of these proteins, such as synapsin I, modulates the synaptic efficacy and, therefore, the synaptic plasticity.379,140,163 There are two pools of synaptic vesicles: a reserve pool that is tethered to the cytoskeleton and not immediately available for release, and a releasable pool that is immediately available. Dephosphorylated synapsin I links synaptic vesicles in the reserve pool to the actin-based cytoskeleton, and also exerts an inhibitory constraint on fusion (with the plasma membrane) of the vesicles in the releasable pool. The phosphorylation state of synapsin I increases under all conditions that promote the Ca2+-dependent release of neurotransmitter
molecules, such as the arrival of an action potential or K+-induced depolarization. However, the mechanism of synapsin I action is not as well established as the cognate SNARE pairing for vesicle docking. Classical studies based on measurements of synaptic potentials suggested that neurotransmitters are released in quantal packets after complete fusion of synaptic vesicles with the plasma membrane218 (Sec. 8). However, Neher290 thought that this classical picture presented a wasteful scheme, because the fused vesicular membrane would have to be retrieved for recycling only a short moment after complete fusion. Recent evidence based on electrophysiological studies and atomic force microscopy suggests that the vesicles fuse with the plasma membrane only transiently, forming fusion pores and releasing only part of their content. For reviews, see Refs. 234 and 206. The above example illustrates how both shape-based and switch-based processing in the microscopic dynamics are linked to the mesoscopic switching process that, in turn, leads to the macroscopic switching process. In other words, both shape-based and switch-based processing are utilized in synaptic transmission; the process is thus more deterministic than implied by simple shape-based processing alone.
6. Shape-Based Molecular Recognition
Conrad's version of the lock-key paradigm stipulates that macromolecules recognize each other by complementary shape matching, like a lock and key, via random diffusion and collisions. Such a simplified mechanism of shape-based molecular recognition is problematic in the solution phase (intracellular dynamics): the recognition process will be slow and inefficient for the following reasons. Macromolecules must find each other by random diffusion and collisions in a three-dimensional space. In the event of a random collision, the encountering molecules can be deflected from each other just as rapidly as they approach each other, because of their linear momenta. The contact dwell-time, the duration of the period during which the encountering partners remain in close contact, may not be sufficiently long to ensure consummation of a chemical reaction. To make matters worse, a chemical reaction between two highly asymmetric macromolecules requires proper orientation, because the reactive site often occupies a small area relative to the macromolecule's entire exposed surface. Thus, a pair of macromolecules with complementary shapes may not match like a key and lock for immediate docking upon collision, since that would be like throwing a key
at the keyhole from a considerable distance; an additional adjustment for mutual re-orientation is often required prior to recognition. Re-orientation by means of rotational diffusion (rotational Brownian motion) can be a slow process because of the size of a macromolecule. Since the reaction rate of many enzymes is diffusion-limited (encounter-limited),108 searching for the correct docking orientation must be a rapid process; additional mechanisms may be involved in making shape-based recognition more deterministic and efficient. Nature solved this problem by evolving mechanisms for homing, transient complex formation and docking of reaction partners, by recruiting short-range non-covalent bond interactions. The lock-key paradigm is thus made more deterministic, and therefore more efficient, than a plain shape-fitting scheme. A strong causality is often achieved by intermixing shape-based and switch-based mechanisms, some of which are nevertheless energy-consuming. Conversely, a switching mechanism often requires the lock-key paradigm for its functioning; for example, enzyme activation is often accomplished by binding an activator via the allosteric effect.281 This topic was recently reviewed by Hong.188 The discussion will be recapitulated here.
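To put a number on "diffusion-limited (encounter-limited)" (an added order-of-magnitude sketch; the diffusion coefficient, protein radius and the 5% reactive-patch figure are typical assumed values, not data from the text), the Smoluchowski expression k = 4π(D_A + D_B)(R_A + R_B)N_A gives the ceiling on a bimolecular rate constant, and a crude orientation factor shows how costly the requirement of a correctly oriented collision would be without additional steering mechanisms:

import math

N_A = 6.022e23   # Avogadro's number, 1/mol

def smoluchowski_rate(D_A, D_B, R_A, R_B):
    # Diffusion-limited encounter rate constant in M^-1 s^-1
    # (D in m^2/s, R in m; the factor 1000 converts m^3 to litres).
    return 4.0 * math.pi * (D_A + D_B) * (R_A + R_B) * N_A * 1000.0

# Assumed, typical values for small globular proteins in water:
D = 1.0e-10   # m^2/s (about 10^-6 cm^2/s)
R = 2.0e-9    # m (about 2 nm radius)

k_encounter = smoluchowski_rate(D, D, R, R)
print(f"encounter-limited k      ~ {k_encounter:.1e} M^-1 s^-1")

# If only a small patch (say 5% of each surface) is reactive and the two
# patches must face each other on contact, a naive geometric factor of
# 0.05**2 would cut the productive rate by a factor of about 400, unless
# homing and transient-complex mechanisms of the kind discussed below
# intervene.
print(f"naively oriented-only k  ~ {k_encounter * 0.05**2:.1e} M^-1 s^-1")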
6.1. Role of short-range non-covalent bond interactions in molecular recognition
It is instructive to examine the docking process of cytochrome c and the mitochondrial inner membrane complexes (Figs. 6B and 6C). The recognition involves primarily a small number of matching ionic bonds. Furthermore, the lock-key matching is not unique: cytochrome c utilizes two nearly identical sets of ionic bonds to dock with the cytochrome b-c1 complex and with cytochrome oxidase. Thus, shape-fitting in macromolecular recognition is only approximate, and the geometric shapes of the two docking sites are not precisely complementary to each other. Apparently, perfect shape-fitting in molecular recognition is neither desirable nor necessary. As we shall see, pattern recognition in cognitive problem solving is useful only if imperfect patterns can also be recognized (fault tolerance; cf. Sec. 4.2 of Chapter 2). Likewise, molecular recognition is functionally useful only if perfect shape-fitting is not required. The requirement of perfect shape-fitting would make molecular recognition an unacceptably slow process. Also, perfect shape-fitting tends to make the complex much too stable to be able to come apart subsequently. In other words, perfect or near-perfect
shape-fitting confers an affinity (on the encountering partners) that is too high to make molecular recognition functionally useful. Molecular recognition is determined in part by shape-matching and in part by the matching of specifically aligned ionic bonds, hydrogen bonds, van der Waals contacts, and hydrophobic interactions. The docking process is achieved by minimizing the combined potential energy of all short-range non-covalent bond interactions, and is equivalent to settling at the potential-energy minimum of the free energy landscape (cf. Sec. 7.5). These non-covalent bonds are weaker than covalent bonds; however, depending on the total number of bonds, together they are sufficiently strong to resist dislodging by thermal agitation. Again, in Boden's words, each constraint "inclines without necessitating."42 The non-covalent bonds exert their collective effect by acting en masse. The van der Waals forces, which are nonspecific, require close proximity of atoms (3-4 Å) for their action. These forces contribute to the stability of complex formation but require a close shape-match to exert their effect. Hydrophobic forces are also non-specific and confer no polarity of bonding. As we shall see, a docking cavity is often lined with hydrophobic amino acid residues. Hydrogen and ionic bonds, however, possess "directionality" and require matching parts with opposite polarities, just like the loops and hooks of Velcro® fasteners. These bonds thus confer some specificity on molecular recognition. However, as indicated by the action of cytochrome c, the specificity is not absolute and unique. Since these bonds are not highly specific, a "false" (imperfect or incorrect) match can occur with fewer, "accidentally" aligned hydrogen or ionic bonds, thus settling in a local potential-energy minimum. However, a false match helps trap the encountering molecules and allows them to form a transient complex. As a result, the contact dwell-time is significantly increased, and a three-dimensional searching process is converted into a two-dimensional search for the global potential-energy minimum (reduction-of-dimensionality principle). The reduction-of-dimensionality effect has been observed in a Brownian dynamics simulation of cytochrome c and cytochrome c peroxidase complex formation.302 The specificity of the correct docking position is provided by the pattern formed collectively by the geometric distribution of these bonds at the interface of the two complexing molecules. In other words, the complementary shape of the docking site, which often looks like a cavity in the larger of the two partners, helps define the bond distribution pattern and, hence, its specificity. The shape of the docking cavity affects the distance between
the matching recognition sites. The shape of the docking cavity matters because, with the exception of electrostatic interactions, the non-covalent bond interactions are extremely short-ranged (no more than a few Å). Is the two-dimensional search for the correct docking position purely random? Not necessarily. In operations research and human problem solving, approaching a task by trial and error, examining every possible option, is simply not realistic, because the number of possibilities rapidly increases beyond all bounds as the complexity of the problem increases (a situation known as combinatorial explosion). A typical example is provided by the enormous number of possible moves, countermoves, countermoves against countermoves, etc., in a chess game when a player tries to outsmart the opponent and searches for a strategic move by planning ahead at a search depth of several levels (or, rather, plies, i.e., half-moves) (see Sec. 4.3 of Chapter 2). Even the IBM supercomputer Deep Blue, which defeated world chess champion Garry Kasparov in 1997, could not afford to explore the search space exhaustively294 (see Sec. 5.18 of Chapter 2). Therefore, selective searching based on explicitly prescribed rules of thumb, or heuristics, often allows a problem to be solved in a reasonable length of time, whereas an undirected or trial-and-error search would require an enormous amount of time and often could not be completed in a human lifetime. The approach is known as heuristic searching in cognitive science and operations research.351,353 Searching in molecular recognition and other biocomputing processes cannot be readily quantified as in chess games, but the implication of combinatorial explosion is no less alarming. As we shall see, Nature was perfectly capable of evolving additional mechanisms, increasingly deterministic, so as to avoid random searching.
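The growth law behind combinatorial explosion is easy to exhibit (an added illustration; the branching factor of roughly 35 legal moves per chess position is the usual textbook figure, and the depths are arbitrary):

def game_tree_size(branching, depth):
    # Number of leaf positions in a uniform game tree explored exhaustively.
    return branching ** depth

for plies in (2, 4, 6, 8, 10, 12):
    print(f"{plies:2d} plies -> ~{game_tree_size(35, plies):.2e} positions")
# Exhaustive search becomes hopeless after a handful of plies, which is
# why selective (heuristic) search is indispensable in chess programs
# and, by analogy, in molecular recognition.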
Variation in the number of hydrogen or ionic bonds permits fine-tuning of the overall bonding strength by evolution. Protein conformations offer virtually unlimited variations of the shape of contact surfaces, and the variation in shape allows for variation in the geometric distribution of non-covalent bonds. Thus, in principle, a gradient of bonding strengths could be evolutionarily arranged, and a cascade of local minima could be evolutionarily constructed, to guide the two-dimensional searching process towards the global minimum instead of away from it (biased two-dimensional random searching, or heuristic searching). This speculative idea is illustrated by a simple diagram showing the matching of non-covalent bonds (Fig. 10). For simplicity, the bond polarity and the shape-factor of the matching surfaces are ignored in the schematic diagram (the design of the diagram is flawed, as will be discussed in Sec. 6.13 of Chapter 2; the flaw, however, does not invalidate the argument presented in this section). When two (flat) matching surfaces slide over each other, down the potential-energy gradient, the initial match involves only one or two matching bonds, then three bonds, and finally all seven bonds. In reality, all of the non-covalent bond interactions are involved. Furthermore, the shape-factor modulates the interactions because non-covalent bond formation requires the proximity of matching sites. The validity of the "gradient" strategy is supported by the following observations. A steering effect based on a two-step strategy has been reported by Colson et al.69 A simulation of the free-energy landscape for two complexing proteins, reported by Camacho et al.,55 showed that electrostatic interactions have a steering effect toward the correct mutual orientation in some cases, whereas desolvation energy (hydrophobic interaction) serves as an attracting and aligning force in other cases. As will be shown in Sec. 7.6, protein folding also exhibits a gradient strategy. Figure 10 shows a steering effect that arises from a preexisting gradient of increasing non-covalent bond interactions. An additional steering effect can take place at a greater distance than that at which ionic or hydrogen bonds can form, because the electrostatic force does not vanish suddenly but rather tapers off gradually, albeit at a rate that is still faster than predicted by the inverse square law, owing to charge screening (Sec. 5.1). Two clusters of charges on two separate molecules can interact electrostatically at a considerably greater distance than is possible for other types of non-covalent bond interactions, because the collective magnitude of the clustered charges may be sufficiently large to make the force, though attenuated by distance and charge screening, still felt at a distance of more than just a few Å. However, the cluster of charges has to be regarded as a point source of charges under this circumstance: the distance of separation between individual charges in the same cluster is small compared to the distance between the two macromolecules, so every charge on a macromolecule is about equally distant from any charge on its docking partner. Therefore, the detailed charge distribution pattern on its surface cannot be recognized by its docking partner at such a distance. This type of electrostatic interaction shall be referred to as global electrostatics.
Fig. 10. A gradient strategy of molecular recognition. A. The schematic diagram shows that any initial imperfect match between two encountering macromolecules locks them together, while the gradient of bonding strengths guides the two encountering molecules to find the "intended match" by means of free energy minimization. Here, the shape factor is ignored, and the search for docking is restricted to one dimension for simplicity. Only one type of non-covalent bond interaction, such as ionic or hydrogen bonding, is considered, and the bond polarity is disregarded. The sequence shows unidirectional sliding, but in reality the two encountering molecules slide back and forth against each other in Brownian motion. B. The free energy landscape is therefore represented by a number of peaks, which depict bonding at one, two, three and seven sites. The initial search is conducted by means of two-dimensional random diffusion and collisions. When the two encountering parties slide along each other, some local peaks (i.e., local minima in the free energy landscape) of 2- and 3-site bonding are reached. The gradient was evolutionarily constructed so that it guides the search towards the ultimate 7-site bonding. If the two molecules continue to slide in the same direction and overshoot their goal, the number of matching sites will decrease to 3 and 2 (separated by some one-site matches). If thermal agitation dislodges the two molecules from the global minimum, the gradient will "discourage" the two encountering molecules from sliding away from each other, thus prolonging the dwell-time of the encounter. On the other hand, thermal agitation prevents the encountering molecules from being trapped at any of the local minima. (Reproduced from Ref. 188 with permission; Copyright by CRC Press)
In contrast, when the partners are within a few Å of each other, each charged residue experiences mainly the attraction of another charged residue that is well-aligned to form a complementary pair with it; attractions or repulsions due to other charged residues that are not aligned are weak by comparison (charge pairing). It is the pairing of complementary charges that helps define the specificity of the correct docking position. This type of electrostatic interaction shall be referred to as local electrostatics. Of course, other types of non-covalent bond interactions and the shape of the matching docking
sites also contribute to the local specificity. In particular, hydrogen bonding provides another means of pairing for local specificity. As we shall see, mobile electron carriers often exhibit an asymmetry of charge distribution on their exposed surfaces so as to form an electric dipole. The interaction of two encountering electric dipoles thus helps align the two approaching partners in the preferred mutual orientation prior to a collision. This type of interaction must be distinguished from interactions that form bona fide ionic bonds: the dipole-dipole interaction does not require charge pairing.
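The distinction between global and local electrostatics can be made semi-quantitative with a Debye-screened Coulomb sketch (added here; the charge numbers, distances and the 150 mM ionic strength are assumed, illustrative values). A patch of several charges acting as a point source is still felt at a nanometre or two, whereas an individual complementary charge pair contributes appreciably only at near-contact distances:

import math

e = 1.602e-19     # elementary charge, C
k_B = 1.381e-23   # Boltzmann constant, J/K
eps0 = 8.854e-12  # vacuum permittivity, F/m
N_A = 6.022e23    # Avogadro's number, 1/mol

def debye_length(ionic_strength_M, eps_r=78.5, T=298.0):
    # Debye screening length (m) for a 1:1 electrolyte.
    n = ionic_strength_M * 1000.0 * N_A   # ions of each sign per m^3
    return math.sqrt(eps_r * eps0 * k_B * T / (2.0 * n * e**2))

def screened_energy_kT(z1, z2, r_nm, ionic_strength_M, eps_r=78.5, T=298.0):
    # Debye-Hueckel interaction energy of two point charge clusters,
    # in units of k_B*T (point charges, no finite-size correction).
    r = r_nm * 1e-9
    lam = debye_length(ionic_strength_M, eps_r, T)
    U = z1 * z2 * e**2 * math.exp(-r / lam) / (4.0 * math.pi * eps_r * eps0 * r)
    return U / (k_B * T)

print(f"Debye length at 150 mM salt: {debye_length(0.15) * 1e9:.2f} nm")
# Assumed example: a cluster of 4 charges on each partner ("global")
# versus a single complementary charge pair ("local").
for r in (0.5, 1.0, 2.0, 3.0):
    cluster = abs(screened_energy_kT(4, -4, r, 0.15))
    pair = abs(screened_energy_kT(1, -1, r, 0.15))
    print(f"r = {r:3.1f} nm:  4x4 cluster {cluster:5.2f} kT,  single pair {pair:5.2f} kT")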
Thus, electrostatic interactions are invoked twice: for the purpose of homing (oriented collision) and for the purpose of defining the specificity of transient complex formation (docking). Electrostatic interaction is often referred to as a long-range interaction in the biochemistry literature even though it is mesoscopic in origin and operates at a shorter range than dictated by the inverse square law, as explained in Sec. 5.1. The homing action during the first stage of activation of PKC belongs to this type of steering effect (Sec. 5.5). In some cases, complementary shape-fitting takes place only after the substrate is bound to the enzyme (Koshland's231 concept of induced fit; e.g., the binding of glycyltyrosine to carboxypeptidase A, see Figs. 8.14 and 9.25 of Ref. 364). In this case, a transient complex of the two encountering partners must form prior to the consummation of the reaction (cf. the Type III electron transfer mechanism of Hervas et al.159). The search in the free energy landscape for a complementary match in molecular recognition is therefore similar to the search in the evolutionary fitness landscape: evolution to a new adaptive state by a given organism often alters the fitness landscape (see Sec. 3.4 of Chapter 2). In Fig. 10, interactions other than electrostatic interactions and hydrogen bonding are ignored. The choice was not based on their relative importance among the various types of non-covalent bond interactions but was rather motivated by the ease of designing such a diagram (see Sec. 6.13 of Chapter 2 for a discussion of top-down designs). Electrostatic interactions are important mainly for initial complex formation and crude alignment but are not the sole determinant of the final orientation in an electron transfer reaction. Other types of interactions, such as hydrophobic interactions, are often called upon to finalize the docking process, as was demonstrated in the interactions of the following redox partners: plastocyanin and cytochrome f (of the cytochrome b6-f complex),100 cytochrome c6 and Photosystem I,279 and ferredoxin and FNR (ferredoxin:NADP+ reductase).193,265
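The "gradient strategy" of Fig. 10 can be caricatured as a one-dimensional Metropolis walk over two hypothetical bonding profiles (everything below is an illustrative toy, not data from the text): one in which the number of matched bonds grows gradually toward the intended 7-site match, and one in which only the exact match is rewarded.

import math, random

def mean_first_passage(matches, eps_kT=1.0, trials=200, rng=random.Random(7)):
    # Mean number of Metropolis steps a 1-D walker needs to reach the
    # position with the maximum number of matched bonds; the energy of a
    # position is -eps_kT * (number of matched non-covalent bonds).
    n = len(matches)
    goal = max(range(n), key=lambda i: matches[i])
    total = 0
    for _ in range(trials):
        x = 0   # start at one end of the sliding range
        steps = 0
        while x != goal:
            trial = max(0, min(n - 1, x + rng.choice((-1, 1))))
            dE = -eps_kT * (matches[trial] - matches[x])
            if dE <= 0 or rng.random() < math.exp(-dE):
                x = trial   # downhill moves always accepted, uphill sometimes
            steps += 1
        total += steps
    return total / trials

# Hypothetical profiles of matched bonds along the sliding coordinate:
graded = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7]   # gradient strategy
all_or_none = [0] * 14 + [7]                             # only the exact match counts

print(f"graded landscape:      {mean_first_passage(graded):6.0f} steps to the 7-site match")
print(f"all-or-none landscape: {mean_first_passage(all_or_none):6.0f} steps to the 7-site match")
# The evolutionarily constructed gradient funnels the walker toward the
# global minimum; the all-or-none profile leaves it to unbiased diffusion.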
In fact, direct measurements of the ionic strength dependence of intracomplex electron transfer rate constants for several redox partners, such as yeast or horse cytochrome c-cytochrome c peroxidase, spinach ferredoxin-FNR, and bovine cytochrome c-cytochrome c oxidase, have demonstrated that electrostatic forces are not the sole determinant of optimal electron transfer rates (reviewed in Ref. 373). At low ionic strengths, the electrostatic interactions are enhanced, but the redox pairs are "frozen" into a non-optimal, electrostatically stabilized orientation. This is in part because charge pairing is not highly specific and false matches are common. Excessive electrostatic interactions distort the delicate balance among the various types of non-covalent bond interactions and increase the chance of trapping the redox partners in a local minimum, thus preventing them from reaching the global minimum in the free energy landscape. Additional experiments were performed to cross-link redox pairs at low ionic strengths and lock the pairs into electrostatically stabilized complexes. In all systems examined, which included cytochrome c-cytochrome c peroxidase, flavodoxin-FNR (flavodoxin is an electron carrier synthesized by cyanobacteria to replace ferredoxin under conditions of iron deficiency), and cytochrome c-cytochrome c oxidase, electron transfer rate constants were actually smaller for the covalent complex than for the transient complex formed upon collision at the optimum ionic strength. Apparently, a certain degree of flexibility of the docking process is required to allow for exploration of the optimal orientation. In other words, short-range interactions between the docking partners should be sufficiently strong to link them together so as to increase the contact dwell-time, but not so strong as to inhibit two-dimensional exploration for the optimal orientation. Perhaps the transient complexation that involves several mobile electron carriers serves to illustrate the nuance of molecular recognition.
6.2. Molecular recognition between ferredoxin and FNR
We shall first consider the interactions between FNR and its redox partners: ferredoxin (Fd) and NADP+. A ternary complex consisting of Fd, FNR and NADP+ has been studied.22,23 It was proposed that a transient ternary complex involving Photosystem I, Fd and FNR could be formed.380 More recently, a complex of Fd and FNR was crystallized.282 The complex is biologically relevant because the structure agrees with what has been inferred from studies in the solution phase. The two proteins can co-crystallize only
with a molar excess of FNR over Fd, resulting in a ternary complex consisting of two reductases (named FNR1 and FNR2) and one Fd molecule. Fd is bound to a hydrophobic cavity provided by FNR1. The Fd [2Fe-2S] cluster is 7.4 Å from the exposed C8-isoalloxazine methyl group of the FAD cofactor of FNR1, which is sufficiently close to enable fast, effective electron transfer. In contrast, Fd makes contact with FNR2 at the site where NADP+ is physiologically bound. The Fd iron cluster is too far from the FNR2-bound FAD (14.5 Å) to enable fast electron transfer. Therefore, the complexation between Fd and FNR2 is probably a crystallographic artifact; the role of FNR2 could be to stabilize crystal packing by contacts with neighboring proteins. Morales et al.282 pointed out that the interactions between Fd and FNR1 are far more specific than those between Fd and FNR2, as indicated by the following pairs of data: the number of hydrogen bonds and ion pairs (10 vs. 3), the number of van der Waals contacts (< 4.1 Å) (54 vs. 21), and the buried area upon complex formation, i.e., the cavity (1600 Å2 vs. 1100 Å2). The molecular electric dipoles of Fd and FNR1 are nearly collinearly oriented, with the negative pole of Fd close to the positive pole of FNR1.296 Additional arguments in support of the biological relevance of the Fd-FNR1 pair in the ternary crystal complex can be found in an important article by Gomez-Moreno and coworkers.282 Gomez-Moreno and coworkers have extensively analyzed the effect of site-directed mutagenesis of FNR and/or its redox partners. Charged amino acids were replaced with neutral amino acids or with amino acids of reversed charge polarity (sometimes with concurrent side-chain elimination). These investigators found that some charged amino acids are absolutely critical for electron transfer, whereas replacements of other nearby residues have either moderate or no effect. For example, Phe65 and Glu94 on Fd are required for efficient electron transfers between Fd and FNR (Fig. 1A of Ref. 192). In contrast, the nearby residues Glu95, Asp67, Asp68 and Asp69 play significant but less dominant roles in maintaining an appropriate protein-protein orientation (Fig. 1A of Ref. 192). These investigators suggested that charge pairing (ionic bonds) involving Glu94 is apparently important for maintaining a productive orientation for electron transfer, i.e., ionic bonds for docking. The exact role of the aromatic residue Phe65 is uncertain, but it is more likely involved in docking (hydrophobic interaction) than in the electron transfer pathway. Similarly, a cluster of hydrophobic residues (Leu76, Leu78, and Val136) on FNR was found to be critical for electron transfers
between FNR and Fd, but not between FNR and NADP+.265 Likewise, alterations of several basic residues on FNR that line the Fd-binding cavity (Lys75, Lys138, Lys290 and Lys294) also caused significant impairment of electron transfer between Fd and FNR, but the affinity of FNR for NADPH did not seem to change (NADPH binds to the opposite side of FNR) (see Fig. 1B of Ref. 132). An appreciation of the spatial relationship of the two docking partners is facilitated by viewing the stereoscopic structure of Fd and FNR in Fig. 1 of Ref. 132 or Fig. 1 of Ref. 192. A critical role was confirmed for Arg100 on FNR in the FNR-NADP+ interaction, whereas Arg264 is not critical.264 However, the mutant R264E (charge reversal; positive Arg to negative Glu) showed an altered behavior in its interaction and electron transfer with Fd and flavodoxin. Here, it should be pointed out that Arg100 is near the binding pocket of FNR for NADP+, whereas Arg264 is near the binding pocket of FNR for Fd (see Figs. 1 and 6 of Ref. 264, respectively). In summary, there are two classes of amino acid residues: one involved in the critical alignment of docking for electron transfer, and the other involved in crude orientation or transient complex formation. Those involved in critical docking alignment tend to be different for the two substrates, Fd and NADP+, which are bound to opposite sides of FNR. Finally, hydrophobic clusters usually line the binding cavity and are critical for electron transfer. The situation with plastocyanin and cytochrome c6, to be presented next, is similar; the two independent sets of data demonstrate a common design scheme for redox partners in long-distance electron transfer.
6.3. Comparison of plastocyanin and cytochrome c6
We shall now consider two mobile electron carriers that feed electrons from the cytochrome b6-f complex to the oxidizing end of Photosystem I: plastocyanin and cytochrome c6 (see Sec. 5.2 for a brief description). In green plants, copper-containing plastocyanin is the sole mobile electron carrier between the cytochrome b6-f complex and Photosystem I. Cyanobacteria and green algae, under conditions of low copper concentration, can synthesize an alternative heme-containing mobile carrier, cytochrome c6, to replace plastocyanin.62,289 De la Rosa and coworkers159 compared the structures and functions of these two electron carriers in two cyanobacteria (prokaryotes), Synechocystis and Anabaena, in a green alga (eukaryote), Monoraphidium, and in spinach chloroplast (eukaryote). These investigators found three different types of reaction mechanisms of Photosystem I
reduction: Type I (an oriented collisional reaction mechanism), which involves an oriented collision between the two redox partners; Type II (a minimal two-step mechanism), which proceeds through the formation of a transient complex prior to electron transfer; and Type III, which requires an additional rearrangement step to orient the redox centers properly within the complex. Cytochrome c6 and plastocyanin are quite different in their chemical structures. Their redox centers are heme and copper, respectively. The β-sheet structure of plastocyanin is absent in cytochrome c6, which has several α-helices instead. However, there is an analogous surface topology in these two mobile proteins. Both proteins feature a north hydrophobic pole around the redox center (Site 1), and an east face with charged groups (Site 2). The north pole is important in electron transfer, whereas the east face is important for orientation during the interaction of these proteins with Photosystem I, but is not directly involved in electron transfer. As expected from their equivalent roles, the east faces of plastocyanin and cytochrome c6 are either both acidic or both basic. In the cyanobacterium Synechocystis, both proteins contain an acidic patch at Site 2: the equivalent charges are Asp44 and Asp47 in plastocyanin and Asp70 and Asp72 in cytochrome c6, respectively. When these charges were reversed by site-directed mutagenesis (e.g., in the plastocyanin mutants D44R and D47R, in which the respective negatively charged aspartate was changed to positively charged arginine; D and R are single-letter codes for aspartic acid and arginine, respectively), the electron transfer reaction was retarded but not abolished. The ionic strength dependence became the opposite of that of the wild type, i.e., the electron transfer rate increased with increasing ionic strength. Kinetic analysis revealed that the interactions followed the simple oriented collision mechanism (Type I). Apparently, these charged amino acids are not involved directly in the docking of the redox partners but are responsible for the generation of an electric dipole. This electric dipole helps orient the mobile electron carrier correctly towards the binding site on its redox partner (global electrostatics). There was no evidence of transient complex formation in the interactions of Photosystem I with wild-type or mutant plastocyanin, such as D44R and D47R. The exception was the double mutant D44R/D47R (see later). The prokaryote Anabaena presents a different scenario. Site 2 of both plastocyanin and cytochrome c6 is positively charged. However, the function of Site 2 in plastocyanin is similar in Anabaena and Synechocystis.
Alterations of these charges by site-directed mutagenesis (e.g., D49K, K58A, K58E) alter the surface-potential profile, thus allowing plastocyanin to modulate its kinetic efficiency of electron transfer: the more positively charged, the higher the rate constant.280 Similar changes took place when the positively charged patch of cytochrome c6 of Anabaena was altered by site-directed mutagenesis at Lys62 or Lys66.279 Kinetic analysis by laser flash absorption spectroscopy indicated that these charges in plastocyanin and cytochrome c6 are involved in the formation of a transient complex (Type III mechanism). Some mutations at the east face changed the kinetics to either Type I or Type II. Alterations of charges in the east face modulate the electron transfer rate without completely abolishing electron transfer, whereas alterations of charges at the north hydrophobic region are critical. Arg88 (adjacent to the copper ligand His87) in plastocyanin and Arg64 (close to the heme group) in cytochrome c6 are required for efficient electron transfer.279,280 These charges form part of the binding pocket and are critical for proper orientation of the redox partners. In the eukaryotic green alga Monoraphidium braunii, both plastocyanin and cytochrome c6 contain an acidic east face, and both proteins exhibit a prominent electric dipole moment.114 The reaction mechanism of electron transfer belongs to Type III: an additional rearrangement is required after the formation of a transient complex. The postulated rearrangement can be a conformational change of the induced-fit type, or elimination of water from the hydrophobic pocket (desolvation, involving reorganization energy, as stipulated by Marcus262), or a two-dimensional random walk, as stipulated in the reduction-of-dimensionality principle,1 or a combination of these. As expected, cytochrome f (of the cytochrome b6-f complex) also possesses an electric dipole moment; its docking surface with plastocyanin or cytochrome c6 is negatively charged.114 In summary, studies of the structure-function relationships in the interactions of plastocyanin/cytochrome c6 and their redox partners reveal the evolution of reaction mechanisms.159 The north hydrophobic region is the cavity for lock-key style docking, whereas the east face is the site responsible for electrostatic interactions for mutual orientation and/or transient complex formation. In the prokaryote Synechocystis, electrostatic interactions are invoked as a homing mechanism. The distribution of individual charges is not critical. The electron transfer process is made more deterministic by a proper orientation prior to collision (Type I). The overall electric dipole moment helps orient the mobile carrier via long-range electrostatic
interactions. The Type II mechanism then evolved to increase the contact dwell-time of the redox partners. In the prokaryote Anabaena, the eukaryote Monoraphidium and higher plants, the reaction mechanism evolved into three steps: long-range electrostatic interactions for homing, a possible two-dimensional search or conformational rearrangement to refine the docking alignment, which requires shape fitting, and a final step of electron transfer. The mechanism of heuristic searching in molecular recognition is thus a product of evolution rather than an inherent feature of molecular interactions.
6.4. Molecular recognition of transducin and arrestin
Let us now resume the discussion of visual transduction in the context of molecular recognition, which was introduced in Sec. 5.3. Three major proteins are involved in the initiation and termination of visual transduction: rhodopsin, transducin and visual arrestin. Rhodopsin belongs to a large family of molecules known as G protein receptors that are characterized by seven transmembrane α-helices. One of the partners, transducin (Gt), belongs to one of the most important classes of molecules in signal transduction: GTP-binding proteins (G proteins; e.g., see Refs. 131 and 66). Visual arrestin is also a member of a large family of molecules that are involved in the deactivation of G proteins. Like other G proteins, inactive transducin is a heterotrimer consisting of three subunits: Gα (which has a GTPase domain that binds a molecule of GDP, and two switch regions that initiate a major conformational change once activated by its receptor), Gβ, and Gγ. Photoactivated rhodopsin, i.e., metarhodopsin II or MII, binds the inactive transducin heterotrimer, triggers the switch regions, and causes the bound GDP to be released and a fresh GTP molecule to be bound to Gα. The GDP-GTP exchange then causes the three subunits of transducin to dissociate from metarhodopsin II and from one another, thus activating transducin and forming Gα-GTP and Gβγ. The activated Gα-GTP then triggers the cGMP cascade, as described in Sec. 5.3. Subsequent hydrolysis of the bound GTP in Gα then reassembles the three subunits into the inactive transducin heterotrimer. The cGMP cascade is a nonlinear system that greatly amplifies the action of an absorbed photon on rhodopsin. Sitaramayya and Liebman357 demonstrated that the termination of the cGMP cascade requires phosphorylation of rhodopsin. The phosphorylation of photoactivated rhodopsin is catalyzed by rhodopsin kinase356 and takes place at the C-terminal region, where nine serine and threonine residues are phosphorylated.390
Phosphorylation of rhodopsin reduces249,235 or even blocks355 Gt activation. This inhibitory effect is greatly enhanced by arrestin,33 which binds specifically to phosphorylated MII.235,332 The investigation of molecular recognition between rhodopsin and its two reaction partners in the cGMP cascade was greatly facilitated by the elucidation of the crystal structures of rhodopsin,306 transducin300,386,238 and arrestin.138,168 Using site-directed cysteine mutagenesis to place cross-linkers on rhodopsin, Khorana and coworkers53,199 identified the contact sites of the photoactivated rhodopsin-transducin transient complex. Gα is cross-linked at residues 310-313 and 342-345 of the C-terminal region, which had previously been known to be the receptor-binding domains,44,148 and at residues 19-28 of the N-terminal region; both the C- and N-terminal regions are in contact with the third cytoplasmic loop of photoactivated rhodopsin, which has been postulated to be the site that binds Gt.113 The N-terminus of Gα and the C-terminus of Gγ are relatively close together. Both are sites of lipid modification, and both are likely to be sites of membrane attachment. The region around Glu134-Arg135-Tyr136 on helix III of rhodopsin is surrounded by hydrophobic residues and is known to bind Gt.306 However, this region is buried in ground-state rhodopsin and cannot make contact with Gt. Therefore, this region must become exposed in photoactivated rhodopsin via a conformational change. Definite conclusions regarding the docking of Gt with MII must await the elucidation of the crystal structure of the photoactivated rhodopsin-transducin complex. The C-terminal of MII is crucial for arrestin binding and rapid deactivation of the cGMP cascade, as demonstrated by the abnormally prolonged responses of a rhodopsin mutant in which 15 amino acid residues of the C-terminal region were truncated.60 The truncated segment is apparently not needed for Gt binding and activation.60,356,275 This C-terminal segment of rhodopsin is most likely the initial recognition site, but not the ultimate binding site, of arrestin, since excess Gt displaces arrestin from phosphorylated rhodopsin. Gt, which binds to both phosphorylated and nonphosphorylated MII, apparently competes with arrestin for the same binding site.235 Arrestin undergoes a conformational change after binding to the C-terminal of phosphorylated MII.313,144,139 This change exposes a critical buried region of arrestin that enables it to bind to a second binding site on phosphorylated MII, a site which also binds Gt competitively. This interpretation is supported by an experiment with a synthetic peptide that mimics the C-terminal tail of phosphorylated MII. This synthetic peptide
comprises the fully phosphorylated C-terminal phosphorylation region of bovine rhodopsin, residues 330-348 (known as the 7P-peptide).313 This peptide binds to arrestin, induces a conformational change that is similar to that induced by binding to phosphorylated MII, and activates it. Furthermore, this activated arrestin can then bind to photoactivated but nonphosphorylated rhodopsin, i.e., nonphosphorylated MII. Thus, an electrostatic mechanism is implicated in at least the initial binding of arrestin to the phosphorylated C-terminal of MII. However, the multiple phosphate groups apparently do more than just provide negative charges, since synthetic peptides in which all potential phosphorylation sites were substituted with glutamic acid (7E-peptide) or with cysteic acid (7Cya-peptide) did not mimic the action of the phosphorylated amino acid residues.271 Evidence of electrostatic effects came from kinetic analysis of the complexation between Gt and MII and between arrestin and MII. Investigations of the kinetics of rhodopsin-arrestin complexation were hampered by the similarity of the absorption spectra of free MII, the MII-Gt complex and the MII-arrestin complex; they all have an absorption maximum at 390 nm, so the three forms of MII are not spectroscopically distinguishable. Since there exists an acid-base equilibrium between MI and MII, binding of Gt shifts the equilibrium to the right, by virtue of the law of mass action, thus stabilizing highly phosphorylated MII as a MII-Gt complex.102,103 Similarly, binding of arrestin stabilizes phosphorylated MII as a MII-arrestin complex.332 Both complexation processes increase absorption at 390 nm. The situation is further complicated by the finding that phosphorylation increases the MI-MII equilibrium constant.127,128 Thus, the equilibrium of MI-MII is shifted to the right for two reasons: the change of the equilibrium constant and the effect of product removal. As a consequence, the relative proportions of free MII and its two complexed forms cannot be directly measured spectrophotometrically. These enhanced portions of MII formation are known as extra-metarhodopsin II. Parkes et al.307 subsequently developed a nonlinear least-squares curve-fitting procedure (Simplex) to sort out the three forms of MII amidst three concurrent equilibria. Using this approach, Gibson et al.129 found that the binding affinity of Gt for MII depends on ionic strength, with maximum affinity at 200 mM. The affinity decreases toward both higher and lower ionic strengths. A similar ionic strength dependence was found for the binding affinity of arrestin for MII at various phosphorylation levels. The decrease of affinity at high ionic strengths is consistent with the effect of charge screening, but the decrease at lower ionic strengths is not. Gibson
et al.129 attributed the decrease to the possibility of two separate binding processes: binding to MII and binding to the photoreceptor membrane. However, similar decreases have been observed for several mobile electron carriers in photosynthetic membranes, where there is no compelling evidence for postulating an additional binding site on the photosynthetic membrane. In view of the ubiquity and possible universality of this type of ionic strength dependence among mobile electron carriers in photosynthetic membranes, and in the spirit of Ockham's razor, we favor, instead, the interpretation, proposed by Tollin and Hazzard,373 that excessive electrostatic stabilization of a docking complex hampers further interactions between the complexed partners by inhibiting exploration. This is especially relevant since there appear to be two separate stages of arrestin binding to MII; an excessively stabilized complex of arrestin with the C-terminal binding site of phosphorylated MII would interfere with its binding to a second site. Tollin and Hazzard's interpretation is further justified by additional independent evidence suggesting that protein flexibility may be a universal necessity in macromolecular reactions. Garbers et al.124 studied the electron transfer from the negatively charged semiquinone QA•− to the oxidized quinone QB in Photosystem II (see Fig. 7). These investigators found that the extent of re-oxidation of QA•− starts to decrease below 275 K and is almost completely suppressed at 230 K. Detailed analyses of Mössbauer spectra measured at different temperatures in 57Fe-enriched samples indicated that the onset of fluctuations between conformational subsets of the protein matrix also occurs at around 230 K. They suggested that the head-group of plastoquinone-9 (a membrane-bound mobile electron carrier) bound to the QB site in Photosystem II requires a structural reorientation for its reduction to the semiquinone. The kinetic analysis of Gibson et al. also showed that the Gt affinity for MII decreases with increasing phosphorylation, whereas the arrestin affinity for MII increases almost linearly with increasing phosphorylation. The data were also consistent with the assumption of 1:1 binding stoichiometry for Gt-MII and for arrestin-MII. The reciprocal fashion of the phosphorylation dependence of Gt and arrestin binding affinities is in keeping with their respective roles as the activator and the deactivator of the cGMP cascade, thus reflecting Nature's way of optimization by means of concurrent modulation of the two binding processes. The incremental change for each added phosphate indicates that the switches of visual transduction are not simple on-off switches but are similar to the kind of rotary switch (incremental switch) that allows for continuous adjustment of the brightness level of a
lamp.389 Despite the similarity of the affinity changes, the (second-order) binding rates of Gt and arrestin differ at equal concentrations. The kinetics of Gt binding to MII at 0.6 μM Gt is exponential, with a relaxation time constant near 1 s, whereas arrestin binding to MII at 0.6 μM arrestin is much slower, with a relaxation time constant of more than 20 s.129 This disparity of kinetics allows a window of time for Gt to act before deactivation prevails, thus avoiding shutting down the process prematurely. Although phosphorylation does not cause global conformational changes of MII, local changes that include changes of the membrane surface potential and the membrane surface pH may affect the binding kinetics of both Gt and arrestin.128 As explained in Sec. 5.1, increased negativity of the membrane surface potential lowers the surface pH relative to the bulk pH. The ensuing lowering of pH can affect the protonation (ionization) state of exposed charged amino acid residues in two possible ways: by increasing reactant (proton) availability, by virtue of the law of mass action, and by the direct effect of pH on the binding constant. The resulting change of the protonation state of charged amino acid residues at the membrane surface further changes the membrane surface potential. The interaction is therefore highly nonlinear. Using the nonlinear least-squares fitting scheme, Gibson et al.129 determined the pH dependence of the respective binding affinities of Gt and of arrestin for MII at various levels of MII phosphorylation. They found a phosphorylation-dependent shift in the pH maximum for the binding affinity of Gt. The Gt affinity for nonphosphorylated MII peaks at pH 8, whereas the pH of maximum binding affinity shifted to more alkaline values with increasing phosphorylation and became immeasurably high at 4.1 PO4 per rhodopsin or higher. This shift was more apparent than real, because it disappeared when the binding affinity was plotted against the surface pH instead of the bulk pH. Thus, the Gt-MII interaction depends directly on the membrane surface pH. The maximum Gt affinity now peaks at a membrane surface pH of 7.5 at all levels of phosphorylation. Gibson et al.129 also determined the pH dependence of the binding affinity of arrestin for MII at various levels of MII phosphorylation. The affinity of arrestin for MII increases more or less linearly with pH up to pH 8.5, beyond which no data were available because of technical difficulty. The arrestin binding kinetics is also pH-dependent. With increasing pH or decreasing phosphorylation levels, the speed of arrestin binding decreases, so that the reaction does not reach equilibrium in the time between flashes of
illumination. The effect of phosphorylation on the arrestin binding kinetics is also mediated via its electrostatic effect on the membrane surface pH, according to the analysis of Gibson et al. Considerably more specific local electrostatic interactions may be involved in arrestin action, as suggested by a model for the rhodopsin-phosphorylation dependence of arrestin binding.168,384,385 The model suggests that the negative phosphate charges on the C-terminal tail of MII destabilize a series of salt bridges and hydrogen bonds in the polar core of arrestin (the switch region), which has a strongly positive local electrostatic potential, thus causing the conformational change of arrestin upon its activation. The recognition of MII by Gt and by arrestin apparently belongs to the Type III mechanism designated by Hervas et al.159 (transient complex formation with molecular rearrangement). Both Gt and arrestin undergo major conformational changes after initial binding. Furthermore, arrestin eventually binds to a second site that is close to or identical to the binding site of Gt. There are several sequences of contiguous cationic amino acids in arrestin, which can be identified as positively charged patches near the N-domain of arrestin,138,168 the proposed rhodopsin binding site (see also Fig. 11A of Ref. 129). The key residue Arg175 in the so-called polar core of arrestin allows arrestin to distinguish between phosphorylated and nonphosphorylated rhodopsin.144,139 The complementarity of charge pairing is apparent. The putative binding surface of Gt is mostly negatively charged, with some small patches of positive potential (Fig. 11B of Ref. 129). Hamm149 pointed out the overall charge complementarity between the cytoplasmic surface of rhodopsin (positively charged residues Lys248, Lys141 and Arg147) and the receptor-binding surface of Gα (negatively charged residues Asp311 and Glu212) (see Fig. 2 of Ref. 149). The charge distributions on the surface of Gt and on that of arrestin do not suggest the presence of an electric dipole moment strong enough to lead to oriented collisions. However, judging from the contrast between the overall negative surface of Gt and the patchy positive surface of arrestin, we suspect that the membrane surface potential may play an additional role in making the encounter of the partners more deterministic via long-range electrostatic interactions (a homing mechanism). The largely positive cytoplasmic surface of rhodopsin favors attraction of Gt but not of arrestin (see Fig. 2 of Ref. 149). The additional positive surface potential generated by the ERP (R2 component) probably adds little to the already positive potential. The electrostatic interactions caused by the ERP are probably local in nature. However, its role in triggering Gt binding cannot be completely
The membrane surface potential subsequently turns in favor of arrestin binding: the dramatic buildup of negative surface charges brought about by phosphorylation is probably sufficient to overshadow the positive surface charges originally present on MII prior to phosphorylation, thus activating a homing mechanism for arrestin. Of course, as suggested by Gibson et al.,129 the opposite surface potentials can also influence the affinity of both proteins for MII (short-range electrostatic interactions). Their interpretation is supported by data showing that the affinity of arrestin for MII increases linearly with pH between pH 7.0 and 8.5,129 and that the negative photoreceptor membrane surface potential also increases linearly within this same pH range.128 In summary, the interactions between rhodopsin and its two partners, transducin and arrestin, echo the general scheme of interactions of mobile electron carriers, discussed earlier in this section, despite the differences in structures and functions. Again, Nature deployed short-range non-covalent bond interactions to make collisions and reactions between encountering partners more deterministic than random processes. Nature also orchestrated the sequence of events with appropriate timing. When the interactions between Gt and MII need to be more deterministic, the interactions between arrestin and MII are made more random. Conversely, when the interactions between arrestin and MII need to be more deterministic, the interactions between Gt and MII are made more random, all by means of the intricate interplay of the same set of fundamental forces. During visual transduction, electrostatic interactions are invoked in a number of different mechanisms: local and global, nonspecific and specific. Nature seemed to recruit whatever mechanisms were workable, in a highly flexible way, and did not stick to any particular "ideological" scheme. On the other hand, highly successful schemes were evolutionarily multiplied and perpetuated in different systems in the same organism and among different species of organisms, as exemplified by the huge family of G protein-coupled receptors and their partners, the G proteins and arrestins.

6.5. Electronic-conformational interactions
An alternative explanation for the enhancement for the docking speed was proposed by Conrad in terms of electronic-conformational interactions: the quantum speedup principle.79'80'81 In this formulation, the effect stems from the quantum mechanical superposition principle as a result of perturbative
interactions between non-Born-Oppenheimer electrons and atomic nuclei. Conrad claimed that the interaction has a self-amplifying character, thus leading to ordered conformational motions. Conrad further claimed that the superposition of wavefunctions provides a kind of parallel processing. Furthermore, this process enhances docking speed via an electronic analog of Brownian motion. According to the quantum speedup principle, superposition effects in the electronic system provide a search mechanism for macromolecules, in addition to the use of thermal (Brownian) motion, to explore each other's surface features.81 Note that the gradient strategy mentioned above is based on short-range non-covalent bond interactions that are also quantum mechanical in origin. However, there is an important difference: whereas Conrad's quantum speedup principle is a purely physical effect, the gradient strategy is a "product" of evolution. An examination of the interactions between the mobile electron carrier plastocyanin and its redox partner cytochrome f or Photosystem I (PsaF subunit) offers crucial insights into the mechanisms that speed up the microscopic dynamics. The general view is that the interactions evolved from a randomly oriented collision (Type I) mechanism in prokaryotes to binding by electrostatic complementarity in eukaryotes.114,159 Electrostatic complementarity is believed to be achieved by a pair of adjacent negatively charged zones that are conserved in eukaryotes. However, this is not merely a speculative interpretation of the kind commonly found in evolutionary arguments. De la Cerda et al.91 studied site-directed mutagenesis of plastocyanin from the cyanobacterium Synechocystis. These investigators found that, like the wild-type plastocyanin, most mutants of plastocyanin react with the Photosystem I PsaF subunit by following a simple collisional kinetic mechanism, with one exception. The double mutant D44R/D47R of plastocyanin follows a reaction mechanism involving not only complex formation with PsaF but also a further reorientation to properly accommodate the redox center prior to electron transfer (Type III), as is the case in eukaryotes.159,160 This experiment demonstrated that speedup of reactions can realistically be acquired through structural changes via evolution; the speedup of electron transfer could thus be an evolutionarily acquired trait rather than a nonspecific quantum mechanical property of macromolecules.

7. Intracellular and Intramolecular Dynamics

The mesoscopic phase refers to the membrane phase and its vicinity, roughly defined by the Debye length. Outside of the Debye length is the bulk phase.
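For orientation, the Debye length of a 1:1 electrolyte follows from the standard screening formula; the sketch below evaluates it at a few illustrative ionic strengths (the concentrations and temperature are assumptions chosen only to indicate the scale, roughly 1 nm at physiological salt).

    import math

    EPS0 = 8.854e-12  # F/m, vacuum permittivity
    KB = 1.381e-23    # J/K, Boltzmann constant
    NA = 6.022e23     # 1/mol, Avogadro constant
    E = 1.602e-19     # C, elementary charge

    def debye_length_nm(ionic_strength_M, eps_r=78.5, T=298.0):
        """Debye screening length (nm) for a 1:1 electrolyte of the given ionic
        strength in mol/L."""
        I = ionic_strength_M * 1e3  # mol/m^3
        kappa_sq = 2 * NA * E**2 * I / (eps_r * EPS0 * KB * T)
        return 1e9 / math.sqrt(kappa_sq)

    for c in (0.001, 0.01, 0.15):
        print(f"{c*1000:6.1f} mM : Debye length = {debye_length_nm(c):.2f} nm")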
The intracellular dynamics refers to biochemical reactions taking place in the bulk phase of the cytosol or other subcellular compartments such as the matrix of the mitochondria. The intramolecular dynamics refers to the internal dynamics of a macromolecule (conformational dynamics).

7.1. Electrostatic interactions between a small molecule and a macromolecule
Some biomolecules are too small for shape-matching. However, electrostatic interactions between such a molecule and a macromolecule still play an important role via different mechanisms. Water-soluble macromolecules contain charged groups at their exposed hydrophilic domains and can be treated as polyelectrolytes. The charges on the surface domains prevent aggregation and precipitation of macromolecules in the solution phase. These surface charges give rise to a zeta potential on the surface of macromolecules, and an intense but short-ranged electric field at the external border of the macromolecules for the same reason elaborated in Sec. 5.1. If a small molecule carries charges with polarity opposite to that of a macromolecule with which it is reacting, its local concentration within the range of the encounter will be greatly increased. The electrostatic effect thus makes encounters of small charged molecules with their target macromolecules more deterministic than simple diffusion in the absence of electrostatic effects. A detailed computer simulation of Brownian dynamics of the diffusion of a small negatively charged substrate superoxide radical towards an active site of enzyme superoxide dismutase340 showed that the focusing of electrostatic lines of force towards the active site may guide the diffusion of the superoxide radical towards the active site (cf. gradient strategy in Fig. 10). In this simulation, both the detailed geometric shape of the protein and the accurate description of its electrostatic potential were considered. It was revealed that the electric field of the enzyme enhances the association rate of the anion by a factor of 30 or more. On the other hand, the enhancement of local concentrations of other macromolecules surrounding a particular (central) macromolecule is less pronounced because macromolecules have an increased ionic cloud relaxation time as a result of the decreased diffusion coefficient (see the formula in Sec. 5.1). It is well known that Ca2+ and cyclic AMP, which carry net positive and negative charges, respectively, are second messengers. These second messengers act intracellularly as allosteric effectors. By binding to their target proteins, these second messengers can alter the conformation of the
target proteins. In addition, activation and inactivation of many proteins are implemented by means of phosphorylation and dephosphorylation. For example, cyclic AMP exerts its effect primarily by activating cyclic AMPdependent protein kinase. The latter enzyme, in its active form, catalyzes the transfer of a phosphate group from ATP to serine or threonine residues in a protein, thus phosphorylating it. Since phosphorylation and dephosphorylation are effective ways of changing the charge density at the surface domain of a protein, I suspect that many regulatory processes mediated by second messengers may be based on electrostatic interactions. Of course, the regulatory mechanisms initiated by phosphorylation are far more complex than the above electrostatic analysis indicates. Exactly how much is due to electrostatic interactions and how much is due to shape-based matching is rather complicated to sort out. The topic of phosphorylation will be treated in detail in Sec. 7.2. There is yet another regulatory role a small divalent cation such as Ca2+ can play in the intracellular dynamics. Multi-valent ions are much more effective in charge screening than monovalent ions such as Na+ and K+. The effect of a trace amount (micromolar range) of Ca2+ is comparable to that of Na+ in the millimolar range (e.g., see Refs. 284 and 285). Thus, Ca2+ is very effective in modulating electrostatic interactions between macromolecules. Of course, Ca2+ has another powerful regulatory role to play via the activation of calmodulin. This latter action of Ca2+ tends to overshadow its charge screening effect. Ca2+ and Mg2+, which have an equivalent role in charge screening, often exhibit opposite physiological effects. Factors other than electrostatic interactions are concurrently at work, e.g., specific binding. Thus, control of the intracellular Ca2+ concentration can be used directly or indirectly to activate a switching process, e.g., muscle contraction and neurotransmitter release. Of course, controlling the intracellular Ca2+ concentration can be a switching process by itself. The cell maintains a steep (free) Ca2+ concentration gradient across the plasma membrane, as well as across the membranes of the intracellular sequestrating compartments, such as the sarcoplasmic reticulum (1 mM vs. 0.1 /iM). This gradient is maintained by active Ca2+ transport (requiring hydrolysis of ATP). The switching function is implemented by means of opening and closing of Ca2+ channels in these membranes. Note that the "turning on" step depends on passive diffusion of Ca2+ down its electrochemical gradient (toward the cytosolic compartment), and is therefore rapid and cost-free — it is tantamount to taking an energy "loan." In contrast, the "turning off" process
requires active transport of Ca2+ to the extracellular medium or into the intracellular compartment of the endoplasmic reticulum (sequestration); it is therefore slower than "turning on" and is not cost-free — it is "loan" payback time. Surface charges of macromolecules also affect intermolecular interactions by a non-specific mechanism: osmosis. The osmotic effect is secondary to the presence of electrical double layers. As explained above, the presence of surface charges alters the concentrations of small ions (relative to the bulk aqueous phase) in the gap (electrical double layers) between interacting macromolecules, thus creating an osmotic pressure there.197 This electrostatic double-layer force, as it is referred to in the protein chemistry literature, decays exponentially with distance, as explained in Sec. 5.1.

7.2. Effect of phosphorylation
Protein phosphorylation is the most common regulatory mechanism in both the mesoscopic and the intracellular dynamics. It regulates metabolic pathways, gene transcription and translation, ion channels and membrane transport, muscle contraction, light harvesting and photosynthesis, synaptic processes, learning and memory, to name a few. Of particular relevance to biocomputing is the role of phosphorylation in synaptic processes (Sec. 5.7) and memory (Sec. 9). The cyclic AMP responsive element binding protein (CREB) is a nuclear protein that may be a universal modulator of processes required for memory formation.347 The crucial event in the activation of CREB is the phosphorylation of Serl33 in the P-Box, or kinase-inducible domain.133 In eukaryotic cells, reversible phosphorylation is catalyzed by protein kinases (for phosphorylation) and phosphatases (for dephosphorylation). Signal transduction in prokaryotes is also mediated by phosphorylation.45 It is instructive to examine how phosphorylation and dephosphorylation actually regulate protein functions. This topic has been reviewed by Johnson and Barford207 and the discussion presented here follows theirs closely. Phosphorylation exhibits a multiplicity of effects. In some enzymes, phosphorylation changes the surface properties that affect self-association or recognition by other proteins (electrostatic and/or steric hindrance effect). In others, phosphorylation may promote conformational changes at domains that are remote from the site of phosphorylation (allosteric effect). Here, we shall illustrate phosphorylation mechanisms in molecular detail by citing rabbit muscle glycogen phosphorylase as an example be-
cause it exploits all the above-mentioned mechanisms made available by phosphorylation. Other examples, which exploit only part of the available mechanisms, will also be briefly mentioned. Glycogen phosphorylase exists in an inactive/low-affinity T state as the nonphosphorylated phosphorylase b (GPb). Its activation is accomplished by transfer of a phosphate group from Mg2+-ATP to a serine residue, Ser14 (residue No. 14, counting from the N-terminus), via the catalysis of phosphorylase kinase (another enzyme). Phosphorylation at Ser14 converts GPb into phosphorylase a (GPa), its active/high-affinity R state. Shown in Fig. 11 is the enzyme as a dimer exhibiting a two-fold symmetry. The subunit at the top is designated with primed labels, whereas the labels for the bottom subunit are unprimed. For example, Ser14'-P designates the phosphorylated Ser14 residue of the top subunit. Phosphorylation results in a conformational change mainly involving a "swing" of the N-terminal segment from a position depicted by a thick black band labeled GPb in the bottom subunit to a position depicted by a thick white band labeled GPa. The N-terminal segment from residue 10 to 18 is disordered in GPb but becomes more ordered in GPa. In contrast, the C-terminal segment (from residues 838 to 842) undergoes an order-to-disorder transition so as to accommodate the conformational swing of the N-terminal segment. These interactions, which are crucial to the allosteric action, are a combination of electrostatic interactions made by the Ser14-P, and polar and nonpolar interactions of the surrounding residues. In GPb, the N-terminal segment lies in the intra-subunit region near three α-helices (thick black band). The sequence of the N-terminal segment contains a number of positively charged residues around Ser14 (Arg10-Lys11-Gln12-Ile13-Ser14-Val15-Arg16, where the positively charged residues are boldfaced). These positively charged residues are located near a cluster of acidic residues (bearing negative charges) on the protein surface, thus stabilizing the positioning of the N-terminal segment in GPb. Upon phosphorylation, the introduction of negative charges to the Ser14 residue (Ser14-P) causes so much electrostatic repulsion as to offset the preexisting electrostatic attraction and to compel the N-terminal segment to undergo a dramatic swing; residues 10-22 are rotated by approximately 120° and the Ser14 residue shifts more than 36 Å between GPb and GPa. In GPa, these charged amino acid residues enter a new spatial relationship. For example, Arg16 now interacts electrostatically with the phosphate group on Ser14. Ser14-P also interacts with other nearby arginine groups such as Arg69 and Arg43' (from the other subunit), both of which are actually quite
Fig. 11. Phosphorylation of rabbit muscle glycogen phosphorylase. The view of the phosphorylase dimer is directed down the two-fold axis, with the phosphorylation sites (Ser14-P and Ser14'-P) and allosteric sites (AMP and AMP') towards the viewer. Access to the catalytic sites is from the far side of the molecule. The glycogen (substrate) storage sites are also indicated. The subunit at the top is designated with primed labels, whereas the labels for the bottom subunit are unprimed. Phosphorylation results in a swing of the N-terminal segment from a position indicated by a thick black band labeled GPb (bottom subunit) to a position depicted by a thick white band labeled GPa. (Reproduced from Ref. 207 with permission; Copyright by Annual Reviews)
remote on the linear sequence but happen to be nearby owing to the three-dimensional folding. On the other hand, Arg10, which is only 4 residues upstream, makes a hydrogen bond with Gly116' from the other subunit. It is evident that the conformational changes taking place at the N-terminus and at the C-terminus are highly coordinated and well orchestrated; new ionic and hydrogen bonds form in the right place at the right time, as if the molecule anticipated all the requirements of stable bonding after the enzyme activation. It is indeed like a three-dimensional jigsaw puzzle with two acceptable solutions, corresponding to the nonphosphorylated
form and the phosphorylated form, respectively. Phosphorylation allows GPb to partially disassemble and reassemble into GPa in a smooth fashion; no complete disassembly and reassembly are required. The end result of all these conformational changes is the exposure of the catalytic site. The allosteric effect described for muscle glycogen phosphorylase is, however, not universal, and some proteins exploit only some of the possible effects of phosphorylation. For example, the phosphorylation of isocitrate dehydrogenase (IDH) does not involve the global conformational changes that accompany the allosteric effect. Instead, phosphorylation of Ser113 introduces the negative charges of the dianionic phosphate group at the catalytic site and prevents binding of the anionic substrate, isocitrate (primarily an electrostatic effect). The inhibition effect can be mimicked by a single mutation of Ser113 to Asp (thus introducing a negative carboxylate charge). The inhibition of IDH is not purely electrostatic in nature; steric hindrance for substrate binding is an additional factor. In contrast, a Ser14-to-Asp mutation in glycogen phosphorylase is insufficient to generate enzyme activation because it does not promote conformational changes via an allosteric effect. The diversity of phosphorylation events is further demonstrated by the difference between yeast glycogen phosphorylase, a scantily regulated enzyme, and muscle glycogen phosphorylase, an elaborately regulated enzyme.49 The phosphorylation site at Ser14 of the N-terminal region is replaced with Asn in yeast glycogen phosphorylase. Phosphorylation thus acts on different sites with surprisingly different mechanisms. Phosphorylation of muscle glycogen phosphorylase is an excellent example of the coherent and concerted interplay between switch-based processing and shape-based processing. The glycogen storage site and the catalytic site utilize the lock-key paradigm for recognition of the correct substrates. Phosphorylation of a single residue, Ser14, is the major switching event that precipitates the conformational change leading to activation (exposure) of the catalytic site at a remote location (allosteric effect); the switching effect is transmitted through a change of shape.

7.3. Concept of intelligent materials

The allosteric effect caused by the phosphorylation of glycogen phosphorylase exhibits an elaborate process control sequence. The seamless (or, rather, coherent) and concerted movement of polypeptide segments relative to each other leads to exposure of the catalytic site previously hidden in the
inactive precursor. The new conformation is stabilized with new hydrogen and ionic bonds replacing old ones before the conformational change. Such an elaborate behavior exhibits an element of anticipation as if the entire molecule were aware of the goal of the conformation change, and is, therefore, suggestive of virtual intelligence (cf. final causes in Sec. 6.13 of Chapter 2; see also Ref. 54). Of course, a macromolecule like glycogen phosphorylase has neither consciousness nor intelligence in the conventional sense. It appears as though it had "consciousness" and were able to "anticipate" the outcome and had some crucial groups readily lined up by "having planned ahead." This is an illusion on our part but it is also a testimonial to the power of evolution. The elaborate internal dynamics of macromolecules can be summarized by the concept of intelligent materials. The concept originated from the term "smart materials" coined by the U.S. Army Research Office and was first applied to the materials science of ceramics and composites.320 Intelligent materials or smart materials are defined as composite materials in which sensors and actuators are embedded in the structural components. Subsequently, Japanese investigators extended the idea to single molecules; all the sensor, actuator and even processor capabilities are included and compacted into a single molecule. Ito et al.198 synthesized such a molecule by linking insulin to glucose oxidase via a disulfide bond. Glucose oxidase is the sensor that senses the glucose concentration. The resulting oxidation causes the cleavage of the disulfide bond, thus triggering the actuator and releasing insulin. Such an intelligent molecular complex was intended to be a prototype of drug-delivering devices. According to the Science and Technology Agency (STA) of Japan, intelligent materials are "materials with the ability to respond to environmental conditions intelligently and manifest their functions."368 The definition implies the construction of functional materials in the metaphor of a neural reflex arc. A neural reflex consists of an input neuron (sensor), one or more interneurons (processor) and an output neuron (actuator). These elements are necessary but not sufficient for a useful reflex-like function. An additional element is needed: the control law that specifies the input-output relation. The sensor-processor-actuator assembly constitutes the substrate of an intelligent system. However, the seat of intelligence lies in the control law. The control laws that regulate biochemical reactions and physiological processes often display features that allow biomolecules or biological structures to perform more tasks than are expected from a simple mechanical device. Indeed, a hallmark of the intramolecular dynamics of biomolecules
is the concerted and interlocking steps of conformational changes that lead to a purposeful action: each part fits spatially and each step fits temporally (kinetically), with an element of anticipation of the purposeful outcome. These concerted and interlocking steps are what Rosen referred to as closed entailment loops (Sec. 6.13 of Chapter 2). The overall process exhibits intentionality that is conducive to the suggestion of a master hand behind the design. Of course, the evolutionary mechanism replaces the need of a master hand. The concept of material intelligence is eminently demonstrated by the oxygen-carrying protein of the red blood cell, hemoglobin.50 Although other simpler examples may readily come to mind, hemoglobin is selected for a detailed discussion because it has been exhaustively investigated, and some mysteries about its extraordinary properties have been replaced by mechanistic explanations. The detailed molecular mechanisms further illustrate many important features of short-range non-covalent bond interactions. Hemoglobin is a heterotetramer consisting of four subunits, two α-chains and two β-chains. Each subunit contains a prosthetic group, heme, that can bind a molecule of oxygen. In addition, binding of the first oxygen molecule enhances the subsequent binding of additional oxygen molecules, and the oxygen binding curve of hemoglobin (fraction of hemoglobin oxygenated vs. oxygen partial pressure) has a characteristic sigmoid shape (Fig. 12A). A comparison with another oxygen binding protein in muscle cells, myoglobin, reveals the advantage of a sigmoidal oxygen saturation curve (Fig. 12A). Myoglobin is a monomeric protein similar to a single hemoglobin subunit chain. However, myoglobin exhibits a hyperbolic saturation curve, characteristic of a one-to-one binding of oxygen to myoglobin. The oxygen binding curve of myoglobin reaches its half-saturation level at a PO2 (oxygen partial pressure) of 1 mmHg (half-saturation pressure, P50 = 1 mmHg). In contrast, the P50 of hemoglobin is about 26 mmHg. Thus, the oxygen affinity of hemoglobin is much reduced as compared to myoglobin, and even more so as compared to free heme. Since the operating range of hemoglobin is between approximately 20 mmHg in the capillaries of active muscles (where oxygen is released) and 100 mmHg in the capillaries of the alveoli (where oxygen is bound), the reduction of oxygen affinity is necessary for hemoglobin to function effectively. If hemoglobin had an affinity like myoglobin, the binding of oxygen would be too tight for hemoglobin to unload significant amounts of oxygen in the tissues.
Fig. 12. Cooperative binding of oxygen to hemoglobin. A. The oxygen binding curve for myoglobin and normal hemoglobin. A hypothetical hyperbolic binding curve of myoglobin with the same half-saturation level as hemoglobin is computed and shown as a dotted curve for comparison. The sigmoid-shaped binding curve of normal hemoglobin allows it to bind more oxygen at the lung and to release more oxygen in the tissues than the hypothetical hyperbolic curve. B. The right-shift of the sigmoidal oxygen binding curve under increased Pco2 and/or lowered pH (Bohr effect) allows additional oxygen to be bound at the lung and more oxygen to be released in the tissue. (A. Reproduced from Ref. 50 with permission; Copyright by W. B. Saunders. B. Reproduced from Ref. 226 with permission; Copyright by American College of Chest Physicians)
On the other hand, if hemoglobin had a hyperbolic saturation curve, the oxygen-carrying function would be compromised even if the affinity had been reduced. This can be seen from a comparison of the hypothetical (calculated) hyperbolic saturation curve (dotted curve) and the normal sigmoidal curve for hemoglobin shown in Fig. 12A. A hyperbolic saturation curve with the same half-saturation level (P50 = 26 mmHg) would allow much less oxygen to be bound in the lungs (about 80% saturation instead of 97%) and significantly less oxygen to be released in the tissues (about 43% saturation instead of about 30% or less) than the sigmoidal saturation curve. As a consequence, the amount of oxygen delivered to the tissues per hemoglobin molecule would be drastically reduced. In other words, the sigmoidal saturation curve significantly enhances the performance of hemoglobin.
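This comparison can be made quantitative with the Hill equation. The sketch below is illustrative only: it assumes a Hill coefficient of about 2.8 for hemoglobin (a typical literature value, not given in the text) and n = 1 for the hyperbolic, myoglobin-like case, with both curves sharing P50 = 26 mmHg.

    def saturation(p_o2, p50, n=1.0):
        """Hill equation: fractional O2 saturation at partial pressure p_o2 (mmHg)."""
        x = (p_o2 / p50) ** n
        return x / (1.0 + x)

    P50 = 26.0                      # mmHg, half-saturation pressure of hemoglobin
    P_LUNG, P_TISSUE = 100.0, 20.0  # mmHg, approximate operating range

    for label, n in (("hyperbolic (n = 1)", 1.0), ("sigmoidal (n ~ 2.8)", 2.8)):
        s_lung = saturation(P_LUNG, P50, n)
        s_tissue = saturation(P_TISSUE, P50, n)
        print(f"{label:20s} lungs {s_lung:5.1%}  tissues {s_tissue:5.1%}  "
              f"delivered {s_lung - s_tissue:5.1%} of capacity")

Under these assumptions the hyperbolic curve loads to about 80% in the lungs and still retains about 43% in the tissues, whereas the sigmoidal curve loads to nearly 98% and drops to roughly 32%, in line with the figures quoted above.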
The performance of hemoglobin is further enhanced by a phenomenon known as the Bohr effect: the pH dependence of the oxygen saturation curve. In Fig. 12B, lowering of pH at the tissues (due to increased CO2 production and/or formation of lactic acid, a glycolytic by-product) shifts the curve to the right relative to the curve at the higher pH of the lungs. An oxygen binding function associated with the Bohr effect allows more oxygen to be bound to hemoglobin at the lungs and more oxygen to be unloaded at the tissues. There is a similar effect due to the increase of PCO2 (CO2 partial pressure). The CO2 effect is primarily due to the acidification of the blood (as a result of carbonic anhydrase-catalyzed formation of carbonic acid). There is a secondary effect due to direct binding of CO2 to hemoglobin as a carbamino derivative, also resulting in a right-shift of the saturation curve at constant pH. However, this latter effect is relatively minor compared to the pH effect. Increasing temperature also causes a right-shift of the curve. Again, this latter effect favors loading of oxygen at the lungs and unloading of oxygen at the tissues, since the tissue temperature is higher than the temperature at the lungs. The oxygen affinity of hemoglobin in red blood cells is actually lower than that of purified hemoglobin in a regular electrolyte solution. The reduction of its affinity (a shift of the saturation curve to the right) is the consequence of binding 2,3-diphosphoglycerate (DPG) with a 1:1 stoichiometry. Without DPG, the half-saturation level of hemoglobin is the same as that of myoglobin, 1 mmHg. Incidentally, the concentration of DPG in the red cells is considerably higher than in other cell types, in keeping with its unique role. The sigmoidal saturation curve of hemoglobin and the related phenomena of oxygen-binding regulation are a consequence of the intramolecular dynamics; changes in one part of the molecule are transmitted to other parts in the form of conformational changes (allosteric effect). First, let us examine the oxygen binding sites in both hemoglobin and myoglobin. The porphyrin ring of heme holds a ferrous ion (a transition metal ion) with four coordination bonds linked to the four pyrrole nitrogen atoms. Of the remaining two coordination positions of the ferrous ion, one is linked to the proximal histidine; the other, flanked by the distal histidine, is where oxygen binds. In the deoxygenated form, the ferrous ion lies slightly out of the plane of the porphyrin ring of heme. An oxygen molecule approaches the ferrous ion from the side opposite to the proximal histidine. The binding is somewhat weakened by the distal histidine (on the same side) because the latter forces the O-O-Fe bond to bend by virtue of steric hindrance. Furthermore, binding of oxygen causes the ferrous ion to move into the ring plane. This movement is transmitted by the corresponding movement of the proximal histidine to other residues at the subunit interface between α1 and β2 or α1 and β1, thus disrupting a number of ionic and hydrogen bonds between the two chains, α1 and β2 or α1 and β1. These bonds are harder to disrupt initially, when the first oxygen molecule is bound. As increasing numbers of oxygen molecules are bound, the ionic and hydrogen bonds gradually loosen up (and are replaced with new but weaker hydrogen bonds). The affinity of oxygen binding increases accordingly. The phenomenon of the sequential and gradual increase of the oxygen affinity of hemoglobin is commonly referred to as cooperativity. The role of DPG in the cooperativity is mediated by the formation of ionic bonds between the highly negative DPG (with five titratable acid groups carrying an average of 3.5 charges owing to incomplete ionization) and positively charged histidine and lysine residues from all four subunits, at a pocket located at the intersection of the four subunits. The ionic bonds are gradually weakened and DPG is eventually extruded from the cavity as a consequence of oxygenation. The (positive) cooperative behavior of hemoglobin allows the molecule to respond to the environment and make appropriate adjustments, resulting in an enhanced function. Hemoglobin's characteristics, therefore, fit the definition of intelligent materials. The elaborate control laws ensure that the performance of hemoglobin is significantly enhanced as compared to that governed by a simpler control law such as a hyperbolic saturation curve. Elaborate control laws that exhibit negative cooperativity are involved in the control of complexation of Fd, FNR, and NADP+ (see Secs. 5.2 and 6.2 for descriptions of these macromolecules). Physiologically, electrons flow from Fd to FNR and then to NADP+. Fd and NADP+ bind to two separate binding sites on the opposite surfaces of FNR.
The presence of NADP+ destabilizes the Fd-FNR complex. The dissociation constant Kd for the oxidized binary Fd-FNR complex is less than 50 nM; Kd(Fd) increases with increasing NADP+ concentrations, approaching 0.5-0.6 μM when the flavoprotein is saturated with NADP+. Likewise, the presence of Fd destabilizes the FNR-NADP+ complex. The dissociation constant Kd(NADP+) also increases, from 14 μM to approximately 310 μM, upon addition of excess Fd. What is the impact of the negative cooperativity on the overall electron transfer from reduced Fd to NADP+? According to Batie and Kamin's interpretation,22 the rate of electron transport may be limited by the dissociation of oxidized Fd from FNR. In other words, oxidized Fd is a potent competitive inhibitor with respect to reduced Fd. Batie and Kamin23 also found that, in the absence of NADP+, the reduction of FNR by Fd was much slower than the catalytic rate: 37-80 e− s−1 versus at least 445 e− s−1; dissociation of oxidized spinach ferredoxin (Fdox) from FNRsq (the one-electron oxidized semiquinone state) limited the rate of reduction of FNR.266 Therefore, destabilization of the Fd-FNR complex by NADP+ may facilitate the overall reaction. Thus, FNR is an intelligent material. The following is another way of looking at negative cooperativity. The dissociation constant is an indication of the probability that the redox partners are in a free state versus a complexed state. As electrons are transferred from Fdred to FNRox or to FNRsq and then to NADP+, ferredoxin and NADP+/NADPH must come on and off frequently (coming on as Fdred and NADP+ and coming off as Fdox and NADPH). There is a distribution of fractions in a ternary or a binary complex and in the free form. The ternary form is not desirable because it facilitates a back flow of electrons from FNRsq or FNRred back to Fdox (FNR is known to catalyze the reverse electron flow).142,143 From a heuristic point of view, there is no need for FNR to bind Fd when electrons are being transferred to NADP+. Likewise, there is no need for FNR to bind NADP+ when electrons are being transferred from Fd to FNR. The fact that the ternary complex does exist is a consequence of the chemical equilibrium between association and dissociation; no dissociation can be complete. Thus, negative cooperativity brings an additional reduction of the fraction in the ternary form, thereby enhancing the overall one-way electron flow from Fdred to NADP+. Since the binding sites for Fd and for NADP+ are located at opposite surfaces of FNR, the negative cooperativity is apparently not caused by steric hindrance. Most likely, it is caused by conformational changes of FNR resulting from binding of one or the other redox partner. A confirmation of the conformational changes must await the determination of structures of the protein in its various states of binding (free, binary or ternary complex), and perhaps also in various oxidation states (oxidized, semiquinone, or reduced).
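The effect of the negative cooperativity on the amount of ternary complex can be illustrated with a simple two-site binding polynomial. The sketch below is a rough illustration rather than a model of the actual enzyme: the free-ligand concentrations are assumed, and a single coupling factor alpha only approximates the 10- to 20-fold weakening of the two binary dissociation constants quoted above.

    def fnr_fractions(fd_uM, nadp_uM, kd_fd=0.05, kd_nadp=14.0, alpha=1.0):
        """Equilibrium fractions of FNR species for a two-site binding scheme.
        The Kd values (uM) are the binary constants quoted in the text;
        alpha > 1 penalizes the ternary complex (negative cooperativity)."""
        w_free = 1.0
        w_fd = fd_uM / kd_fd
        w_nadp = nadp_uM / kd_nadp
        w_ternary = w_fd * w_nadp / alpha
        z = w_free + w_fd + w_nadp + w_ternary
        return {"free": w_free / z, "Fd-FNR": w_fd / z,
                "FNR-NADP+": w_nadp / z, "ternary": w_ternary / z}

    # Assumed free-ligand concentrations, purely for illustration.
    for alpha in (1.0, 15.0):
        f = fnr_fractions(fd_uM=1.0, nadp_uM=100.0, alpha=alpha)
        print(f"alpha = {alpha:4.1f}: ternary fraction = {f['ternary']:.2f} "
              f"(free {f['free']:.3f}, Fd-FNR {f['Fd-FNR']:.2f}, FNR-NADP+ {f['FNR-NADP+']:.2f})")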
Such desired structural information is available for cytochrome c6, mentioned in Sec. 6.3. Banci et al. have determined the solution structure of cytochrome c6 in the oxidized form16 and in the reduced form.17 The comparison of the solution structures of cytochrome c6 shows that the overall folding of the oxidized form is the same as that of the reduced form. The largest difference between the two structures is the change in the orientation of the Cβ-Cγ bond of propionate 7, which leads to a different orientation of the carboxylate group of the heme ring. Concomitant with this movement, there is a reorientation of His30 (which, unlike His19, is not an axial ligand of the heme group) and of one of the two carboxylate oxygen atoms of propionate 7. The reorientation of His30 is, however, absent in the reduced form in the crystal structure. The change of propionate 7 orientation of the reduced cytochrome c6 observed in the solution structure is the same as that observed in the crystal structure.114 However, data showing changes of the binding constants between redox partners upon such a conformational change, as a consequence of electron transfer and a change of redox state, are lacking. It has nevertheless been suggested that such a conformational change favors electron transfer rather than the dissociation of redox partners.16 Implicit in our evaluation of the function of hemoglobin is the criterion associated with its intended function: carrying oxygen from the lungs to the tissues. The cooperative effect of hemoglobin is Nature's solution to achieve an enhanced function. We have previously pointed out that molecular intelligence must be evaluated with the intended function in mind whenever human beings use biomaterials to construct molecular devices.177,178,179-185,187 Often a biomaterial is used for applications that are only remotely related to its native function. What is considered intelligent for its native function may not be sufficiently intelligent for the intended application, because we must now adopt a completely different set of optimization criteria than Nature did. On the other hand, what is considered epiphenomenal in biological function may be recruited for technological applications (latent function). Sometimes, it is necessary to alter existing biomaterials to enhance certain molecular functionalities and suppress others for specific applications. This approach was once a formidable task due to the complexity of biological macromolecules. However, this situation is partially alleviated by the advent of biotechnology, including site-directed mutagenesis. Several successful applications of bacteriorhodopsin, invoking its latent functions, will
be described in Sec. 7.8 of Chapter 2.

7.4. Concept of calcium-concentration microdomain
As a second messenger, Ca2+ is involved in a multitude of intracellular processes. The intracellular Ca2+ concentration is controlled by Ca2+ influxes from the extracellular space or from sequestrated depots of intracellular compartments such as the endoplasmic reticulum or sarcoplasmic reticulum (Sec. 7.1). In principle, Ca2+, after entry, can freely diffuse to every part of the cytoplasm. A spatially widespread and unrestricted increase of the cytoplasmic Ca2+ concentration is tantamount to a broadcast signal for activating numerous intracellular processes, both wanted and unwanted. In view of the latter possibility, subcellular compartmentalizations as delineated by intracellular membranes may not be sufficient to prevent spurious actions by Ca2+. Normal synaptic transmission is initiated by activation of voltage-sensitive Ca2+ channels. The resulting increase of Ca2+ concentration then causes the synaptic vesicles to fuse with the presynaptic membrane (Sec. 5.7). Llinas255 has long speculated that the sharp increase of Ca2+ concentration during synaptic transmission must be spatially restricted near the active zones of a synapse, with a high concentration profile that lasts only a short duration. This suspicion was supported by the following experimental observations. The latency for calcium activation of the release process in the squid giant synapse was of the order of 200 μs.256 Measurements with ion-sensitive electrodes suggest rapid changes in the intracellular Ca2+ concentration with a steep spatial profile close to the membrane during the Ca2+ current flow.246,58 This steep profile was once thought to be produced by the retardation of Ca2+ diffusion by the cytoplasmic buffering mechanism. Llinas et al.256 modeled the process in response to the passage of a 200 nA Ca2+ current through a single active zone containing many Ca2+ channels and predicted a steep spatial gradient of Ca2+ even in the absence of a Ca2+ buffering system. As explained in Sec. 5.1, the presence of a negative surface potential also contributes to the maintenance of the transient spatial gradient. The presence of this steep gradient is also supported by the finding that Ca2+ entering the presynaptic terminal through Na+ channels did not contribute to transmitter release.258 Simon and Llinas354 showed, in a comprehensive model, that the opening of Ca2+ channels leads to discrete and spatially restricted peaks of the Ca2+ concentration, which reaches a steady state rapidly relative to the open time of the channels: the Ca2+
loss to the bulk phase due to diffusion is balanced by the continuing influx through Ca2+ channels. Subsequently, by means of time-resolved imaging of the Ca2+-dependent aequorin luminescence, Silver et al.350 demonstrated the predicted localized transient elevation of the intracellular Ca2+ concentration, with the concentration profiles reaching 200-300 μM at peak positions. These intracellular Ca2+ concentration profiles are composed of groups of short-lived spatial domains of a diameter of 0.5 μm, which they named quantum emission domains (QEDs). These domains were present in locations close to the active zones of the synapse. The time course of QEDs was found to follow closely the time course of the presynaptic Ca2+ current.367 The Ca2+ concentration microdomains reach a peak within 200 μs and have an overall duration of 800 μs.257 Silver348,349 extended the concept of calcium microdomains to the study of the control of cell division in sand dollar blastomeres. He found that the transient increase of Ca2+ concentration, prior to the nuclear envelope breakdown, takes place within 1 μm of the nuclear envelope. These calcium microdomains are 3-10 μm in diameter and last 900-3000 ms in duration, appearing about 6 minutes prior to the nuclear envelope breakdown. Further analysis of the quantum emission domains showed that the calcium signals are modulated in the frequency domain, exhibiting regular periodicity in space-time; the signal pattern appears to be consistent with, and necessary for, models of frequency or pulse code modulated signals. The restricted window of time, about 6 minutes before the nuclear envelope breakdown, during which the Ca2+ signal can act on the target membrane further ensures the specificity of the Ca2+ action. This is similar to the strategy of "gating" in microelectronics. Thus, through spatial juxtaposition of the release and the target sites and through controlled timing of Ca2+ entry, calcium microdomains exert highly specific and deterministic control of intracellular events. These dynamic entities essentially enforce an alternative way of intracellular compartmentalization. The nonrandom space-time patterns confer high specificity to these Ca2+ signals; cross-talks with "unintended" targets are minimized. The transient nature of the space-time pattern of calcium microdomains thus invalidates any attempt to apply the concepts of equilibrium thermodynamics to the analysis.
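The steady-state picture invoked above, channel influx balanced by diffusional loss, can be sketched with the textbook point-source approximation for an open channel feeding hemispherical diffusion into an unbuffered cytosol. The single-channel current and the diffusion coefficient below are assumed order-of-magnitude values, not data from the studies cited above.

    import math

    F = 96485.0   # C/mol, Faraday constant
    Z_CA = 2      # valence of Ca2+
    D_CA = 220.0  # um^2/s, assumed free Ca2+ diffusion coefficient

    def ca_increment_uM(r_nm, i_pA):
        """Steady-state rise in [Ca2+] at distance r from an open channel,
        treating the channel mouth as a point source feeding hemispherical
        diffusion into an unbuffered half-space: dC = i / (2*pi*z*F*D*r)."""
        flux_mol_s = (i_pA * 1e-12) / (Z_CA * F)  # mol/s entering through the channel
        dc = flux_mol_s / (2 * math.pi * (D_CA * 1e-12) * (r_nm * 1e-9))  # mol/m^3
        return dc * 1e3  # mol/m^3 equals mM; convert to uM

    for r in (10, 20, 50, 100):
        print(f"r = {r:3d} nm : ~{ca_increment_uM(r, 0.2):6.1f} uM above rest (0.2 pA channel)")

On these assumptions a single channel produces only tens of μM within a few tens of nanometers of its mouth; the 200-300 μM domains described above require the clustered channels of an active zone and are further shaped by buffers and the negative surface potential.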
7.5. Errors, gradualism and evolution

As illustrated by examples presented in previous sections, Nature, through enzyme catalysis, compartmentalization, the intervention of electrostatic interactions and other non-covalent bond interactions, manages to rescue biocomputing from the assault of randomness inherent in molecular diffusion and reactions. The mechanisms by which side reactions are minimized and matching reactants are compartmentalized, and the mechanisms that enhance the odds of desirable encounters, are cumulative improvements made during evolution. However, why did Nature not make it error-free? The answer is obvious with regard to genetic encoding. Were it not for occasional errors in gene replication, there would never be any evolution. However, a computer program can also have occasional errors, i.e., "bugs." Most of the time, a bug causes the entire program to crash. That is, the digital computer often degrades catastrophically upon the encounter of a single programming "bug." In contrast, a living organism usually degrades gracefully with a point mutation. A single point mutation may not threaten the life of a living organism but instead fuels possible improvements through evolution. This peculiar feature of proteins, their relative forgiveness of structural errors, is termed gradualism by Conrad.71,72 Conrad defined gradualism as the condition of evolutionary improvements achieved by means of a single point mutation (p. 255 of Ref. 76). He reasoned that if an improvement required more than one point mutation, the likelihood of an intervening lethal mutation would prevent the improvement from being achieved. Whether such a severe restriction is necessary or not may be debatable, since concurrent mutations at multiple sites are not uncommon (Sec. 7.6). However, the latter fact and Conrad's claim are not mutually exclusive. It is well known that there are 20 different amino acids that constitute the primary building blocks of proteins. These amino acids are identified by three-letter codons of messenger RNAs, which are polymers of four different kinds of subunits differing only in the attached side chain of one of the four nucleotide bases: A (adenine), C (cytosine), G (guanine) and U (uracil). Permutations of three letters out of four give rise to 64 (4³) different three-letter codes (codons). The correspondence between codons and amino acids is therefore not one-to-one; one amino acid is often coded by more than one codon. The advantages of this redundancy were revealed by computer simulation analysis of variant genetic codes. The following summary recapitulates the main points of a recent review article by Freeland
and Hurst.116 The now-standard genetic code seemed to be shared by a wide range of organisms from bacteria to humans. Investigators once thought that the code was a "frozen" accident, meaning that the code appeared randomly but the appearance of the first workable code made competitive codes difficult to "survive," by virtue of a self-reinforcing mechanism (cf. Sec. 3.4 of Chapter 2). However, molecular biologists now know that there are some variant codes which assign different meanings to certain codons. For example, while many organisms translate the RNA codon "CUG" as the amino acid leucine, many species of the fungus Candida translate CUG as serine, instead. Mitochondrial genomes also exhibit variations that are characteristic of bacterial genomes, thus betraying their bacterial origin (cf. endosymbiont hypothesis; pp. 714-715 of Ref. 3). These discoveries suggested that the genetic code was not frozen but could, in principle, evolve just like proteins, and the emergence of the standard genetic code was perhaps by no means accidental but a consequence of evolutionary improvements of the codes. A detailed comparison of the individual codons revealed several recognizable patterns. First, synonymous codons tend to differ by just a single letter, usually the last. It was previously known that mistakes in translation of a codon in a messenger RNA into the corresponding amino acid occur most frequently at the codon's third letter, because the binding affinity between a messenger RNA and a transfer RNA is weakest at this position ("wobble" phenomenon). Thus, mistranslations often yield the same amino acid meaning. So do mistranscriptions and point mutations at the third position of a codon. Second, codons for amino acids with similar affinities for water (hydrophilicity or hydrophobicity) tend to differ by their last letter. This redundancy ensures that mutations or translation errors involving the third letter result in little change of local hydrophobicity of the affected peptide chain — an important factor for protein folding, in general, and for maintaining the integrity of folding nuclei, in particular (Sec. 7.6). Third, codons sharing the same first letter often code for amino acids that are products or precursors of one another. Again, similar codons code similar amino acids. In summary, a single point mutation — the change of a single codon letter — or a translation error involving a single reading frame (letter) does not cause a catastrophic failure in protein folding or a drastic change of protein properties in case of a successful folding. Just how much did evolution improve the genetic codes in terms of grad-
Fig. 13. Extradimensional bypass. A. A deep lake is located between two peaks. B. The cross-section shows two peaks and the lake separating them. If paths from one peak to the other are confined to a plane (two dimensions), there is no choice but to go through the lake, which is considered energetically unfavorable. The addition of an extra dimension (in A) provides an additional degree of freedom. The paths along the lake shore, made possible by the additional degree of freedom, are energetically less formidable than the path through the deep lake.
ualism? Computer simulation analysis of a vast number of variant codes furnished some quantitative evaluations. If a mutation or a translation error is evaluated in terms of the degree of change of hydrophobicity of the coded amino acid, and if the computer simulation is based on random sampling (i.e., objective statistics), then only about 100 out of one million alternative codes have a lower error value than the natural code. If additional restrictions are incorporated to reflect observed patterns in the way DNA tends to mutate and the ways in which genes tend to be mistran-
scribed into RNA (i.e., akin to Bayesian analysis, which includes a priori knowledge of the statistical distribution), then the standard genetic code outperforms its variants one in a million — it delivers a grade far better than expected of A+.115 Proteins are therefore robust enough to tolerate disruptions caused by some unwanted mutations, and/or by mistranscriptions and/or mistranslations. The process of evolution can be pictured as moving from one fitness peak to another on the surface of a metaphoric adaptive landscape or fitness landscape. If two peaks are separated by an unbridgeable abyss, there would be no way to go from one fitness peak to another. Gradualism provides many degrees of freedom in the variation of protein structures and renders proteins amenable to evolutionary improvements. The adaptive landscape is therefore multi-dimensional. This feature has been referred to as extradimensional bypass by Conrad76 (Fig. 13). The idea is illustrated by a three-dimensional fitness landscape with two adaptive peaks separated by an abysmally deep lake. Going from one peak to the other along a plane that cuts through both peaks requires a direct passage across the deep lake and thus would be improbable. If the path is extended from two dimensions to three dimensions, going around the lake shore is favored. In other words, a multi-dimensional adaptive landscape offers additional redundancy, making gradualism possible. The protein folding process parallels protein evolution in the sense that extradimensional bypass is at work during the intricate process of folding through free-energy minimization.
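The "one in a million" comparison described above can be reproduced in outline with a small simulation: score a code by the average squared change in amino acid hydrophobicity over all single-nucleotide substitutions, then compare the standard code against random variant codes that shuffle the amino acid assignments among the synonymous codon blocks. The sketch below is illustrative only; it uses the Kyte-Doolittle hydropathy scale as a stand-in for the polar-requirement measure of the original studies and ignores the mutation and mistranslation weightings mentioned in the text.

    import random
    from statistics import mean

    BASES = "TCAG"
    AA = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")  # standard code, TCAG order
    CODE = {a + b + c: AA[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

    # Kyte-Doolittle hydropathy values for the 20 amino acids.
    HYDRO = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
             "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
             "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
             "Y": -1.3, "V": 4.2}

    def error_value(code):
        """Mean squared hydropathy change over all single-base substitutions
        that convert one sense codon into another sense codon."""
        costs = []
        for codon, aa in code.items():
            if aa == "*":
                continue
            for pos in range(3):
                for b in BASES:
                    if b == codon[pos]:
                        continue
                    aa2 = code[codon[:pos] + b + codon[pos + 1:]]
                    if aa2 != "*":
                        costs.append((HYDRO[aa] - HYDRO[aa2]) ** 2)
        return mean(costs)

    def shuffled_code():
        """Variant code: randomly permute amino acids among the synonym blocks."""
        aas = sorted(set(AA) - {"*"})
        mapping = dict(zip(aas, random.sample(aas, len(aas))))
        return {codon: mapping.get(aa, "*") for codon, aa in CODE.items()}

    random.seed(0)
    natural = error_value(CODE)
    better = sum(error_value(shuffled_code()) < natural for _ in range(2000))
    print(f"standard code error value: {natural:.2f}")
    print(f"random variant codes scoring better: {better} / 2000")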
7.6. Protein folding

The doctrine of modern molecular biology indicates that genetic information provides the linear sequence of the polypeptide chains; no explicit information is provided for the secondary, tertiary or quaternary structures. In other words, the primary sequence is explicitly programmed by the genes, whereas the secondary, tertiary and quaternary structures follow from the physico-chemical interactions between amino acid residues in a proper temporal and spatial sequence (implicitly programmed). Most naturally occurring proteins assume two conformations: a native (folded) conformation and a denatured (unfolded) conformation. Many denatured proteins are capable of refolding into their native conformation. Thus, the linear sequence of a polypeptide not only encodes its three-dimensional conformation but also contains information about its folding pathway. Protein folding can be regarded as a form of problem solving similar to assembling a three-dimensional jigsaw puzzle. Although protein folding remains an unsolved problem, considerable insight has been gained into the major forces, and their interplay, that make folding possible and unique.14,15,84,98,130,274,319,277
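Folding as constrained search can be made concrete with the simplest toy system used in the folding literature, the two-dimensional HP lattice model: a short chain of hydrophobic (H) and polar (P) beads is folded on a square lattice, and each conformation is scored by its number of non-bonded H-H contacts. The exhaustive enumeration below is a minimal illustration of "folding as problem solving"; the sequence is arbitrary and the model is not meant to represent any real protein.

    SEQ = "HPHPPHHPHH"  # arbitrary illustrative H/P sequence (10 beads)
    MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def conformations(n):
        """Enumerate self-avoiding walks of n beads on the square lattice,
        fixing the first step to remove rotational symmetry."""
        def grow(path):
            if len(path) == n:
                yield tuple(path)
                return
            x, y = path[-1]
            for dx, dy in MOVES:
                nxt = (x + dx, y + dy)
                if nxt not in path:
                    path.append(nxt)
                    yield from grow(path)
                    path.pop()
        yield from grow([(0, 0), (1, 0)])

    def energy(conf, seq):
        """Negative of the number of non-bonded H-H nearest-neighbor contacts."""
        pos = {p: i for i, p in enumerate(conf)}
        contacts = 0
        for (x, y), i in pos.items():
            if seq[i] != "H":
                continue
            for dx, dy in MOVES:
                j = pos.get((x + dx, y + dy))
                if j is not None and j > i + 1 and seq[j] == "H":
                    contacts += 1
        return -contacts

    confs = list(conformations(len(SEQ)))
    energies = [energy(c, SEQ) for c in confs]
    best = min(energies)
    print(f"{len(confs)} conformations enumerated")
    print(f"lowest energy {best}, reached by {energies.count(best)} conformation(s)")

Even for ten beads the chain has thousands of conformations but only a handful of lowest-energy ones; the lattice and off-lattice models discussed below scale this idea up to study folding cooperativity and nucleation statistically.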
Early theories of protein folding consisted of two rival views about the primary force involved: the hydrophobic force222 versus hydrogen bonding.309,308 It is well known that secondary structures, such as α-helices, 3₁₀ helices^e and β-sheets, contain numerous hydrogen bonds between the main-chain carbonyl and amide groups. Yet water molecules also form extensive hydrogen bonds, and thus the most stable state of a soluble protein would be the denatured state if hydrogen bonding were the only important force. Another line of thinking draws inspiration from the finding that nonpolar amino acid residues tend to gather at the interior of a globular protein. This emphasizes the hydrophobic interaction as the primary driving force for protein folding. However, unlike an oil droplet formed by hydrophobic interactions in the presence of water, the protein interior is not disordered but is densely packed into a crystalline or quasi-crystalline state. Thus, it appears that the hydrophobic effect alone is insufficient to account for the self-organized folding of proteins into unique stable structures. Specific internal packing of hydrophobic amino acids is a major determinant of the specific internal structure. The side chains of amino acid residues of a polypeptide tend to fit together with an amazing complementarity, like a three-dimensional jigsaw puzzle. As proposed and summarized in a review by Rose and Wolfenden,322 the contemporary view of protein folding centers around a hierarchical process with the hydrophobic effect as the driving force toward a globular collapse, forming the so-called molten globule as an intermediate state of folding. Even at this stage, the secondary structures such as α-helices and β-sheets are already present in rudimentary form, which implies the relative importance of hydrogen bonding as a stabilizing factor. Subsequent folding is built on top of these rudimentary folds and does not require extensive restructuring. Segments of secondary structures associate with each other, thus forming supersecondary structures by fitting an α-helix with another α-helix, a β-sheet with another β-sheet, or a β-sheet with an α-helix. These supersecondary structures can be categorized into half a dozen three-dimensional
^e A 3₁₀ helix has a tighter configuration than an α-helix: it contains an i → i + 3 hydrogen bond instead of an i → i + 4 hydrogen bond.336,216
basic modules (modularity). These modules exhibit functional versatility and ease of integration into larger units. Some of them with the JV-terminus and the C-terminus located at opposite ends can readily be linked into a longer strand as a result of tandem duplication of the encoding gene ("in line" type). A module with the two termini located at the same end can readily be inserted into a loop region of a protein ("plug-in" type). It is well known that the extensive hydrogen bonding between the acarbonyl and amide groups is involved in both /3-sheet and a-helix formation. These structures are then linked by two other secondary structure motifs: turns and loops. Apparently, these secondary structures maintain considerable stability with tolerance towards amino acid residue substitution because bonding involves primarily main-chain carbonyl and amide groups; side-chain specificity (except proline) is not an issue except near the two peptide terminals. However, amino acid specificity does come into play in the formation of the secondary structures. An examination of the a-helical structure reveals that hydrogen bonding takes place regularly with the amide hydrogen and carbonyl oxygen at a fixed pitch of four, the carbonyl group being approximately four (3.6) residues upstream from the amide residue. As a consequence, the initial four N-H groups and final four >C=O groups lack the intrahelical hydrogen-bonding partners. The hydrogen bonding requirement of these terminal groups is satisfied by capping with side chains of polar residues flanking the two helix termini: Asp, Glu, Ser, Thr, Asn, Gin and neutral His at the JV-terminus, and Lys, Arg, Ser, Thr, Asn, Gin and protonated His at the C-terminus. As for the central residues of a helix, the shape of side chains appears to play a discriminatory role. Studies by means of site-directed mutagenesis of proteins have demonstrated that the buried interior of a protein can tolerate considerable diversity of residue substitution with only minor effects on the structure, stability and function. Despite the three-dimensional jigsaw puzzle analogy, the buried protein is not that tightly packed, but maintains a fluid-like environment, at least for certain parts and at certain times. Activation of an enzyme often involves the conformational change from a taut (tense) form (T state) to a more relaxed form (R state), whereas inactivation involves the reversal of the change (see Sec. 7.2). Presumably, a more relaxed state facilitates molecular recognition by promoting exploration between the enzyme and its substrate. Thus, proteins are endowed with forgiving tolerance to unwanted mutations, and, therefore, with malleability that is compatible with gradualism. Apparently, the tolerance has been built in the
evolutionary selection of 20 different kinds of amino acids as the building blocks of proteins. How Nature arrived at this particular set of building blocks was once a completely open question.227 Intuitively, the amino acid redundancy with regard to polarity and steric property (shape), when coupled with the fluid-like environment, permits the protein to tolerate a major packing fault most of the time. Thus, a "kink" can be straightened out by a local rearrangement without precipitating a global rearrangement. This intuitive inference remains largely correct, after computer simulation studies of the evolution of genetic codes have partially lifted the "fog" surrounding the problem (Sec. 7.5). Recent advances in protein folding research resulted mainly from experimental and theoretical studies of a broad class of small proteins and simple generic lattice and off-lattice models rather than detailed all-atom models (see p. 176 of Ref. 43 for a brief description of lattice and off-lattice models). The simplicity of these models permitted efficient computations and a huge number of simulations, resulting in statistical descriptions of folding events. These models also made possible simulations of the evolution of proteins under various selection pressures. Such simulations re-affirmed the view that protein folding involves heuristic searching rather than random searching. Some key insights extracted from a review article by Mirny and Shakhnovich277 are presented below. The actual situation of protein folding is more complicated, and rival alternative theories have not been fully rejected. The critical issues are stability and fast folding kinetics. The key concept is cooperativity. Theoretical studies of various approaches converged on the view that amino acid sequences that had undergone evolutionary selection fold cooperatively, whereas random sequences do not. To exhibit a cooperative folding transition and to fold fast, the native structure must reflect a pronounced potential energy minimum separated from the rest of the structures by a large energy gap. Evolutionarily favored sequences fold by a nucleation mechanism whereby a small number of residues (the folding nucleus) need to form their native contacts in order for folding reactions to proceed fast into the native folded state. Small single-domain proteins fold like two-state systems, with only the folded and unfolded states being stable, whereas all intermediate (partly folded) states are unstable. The two-state transition is akin to the first-order phase transition of a finite system. However, the transition state is not a single protein conformation, but is rather an ensemble of conformations (the transition state ensemble, TSE). The progress of a simple chemical reaction is characterized by the motion
along a single reaction coordinate, and the transition state resides at the "great divide," from where the reaction can either go towards the product (forward reaction) or go towards the reactant (reverse reaction). Protein folding reactions have many reaction coordinates in many dimensions, and the TSE is the equivalent of the transition state of a simple reaction. Correct folding occurs when the TSE proceeds in the forward direction towards the native folded state, whereas unfolding occurs when the TSE proceeds in the reverse direction towards the unfolded state. In the language of nonlinear dynamics, the TSE is positioned at or around a separatrix (akin to a "watershed") that separates the two basins of attraction — one for the folded state and the other for the unfolded state — in the multi-dimensional conformational space (see Sec. 2.4 of Chapter 2). The number of conformations constituting the TSE is as important in determining the protein folding rate as the height of the free energy of the TSE. The multiplicity of the TSE provides the extradimensional bypass mentioned in Sec. 7.5. That extra dimensions are critical is demonstrated by studies of Chan and coworkers.59,223 These investigators found that, of all studied sequence models, only three-dimensional models with twenty types of amino acids feature folding cooperativity comparable to that of naturally occurring proteins. Two-dimensional models and hydrophobic-polar models were found to be far less cooperative than real proteins. Protein conformations at the separatrix fluctuate among various members of the TSE. Once a particular conformation passes beyond the separatrix, it no longer encounters any major energy barrier and is well on the path to fast folding. This latter conformation that is committed to fast folding is called a postcritical conformation. What distinguishes a postcritical conformation from those that eventually unfold? The nucleation theory suggests that fast-folding proteins have a small substructure common to most of the conformations constituting the TSE. This substructure is called a folding nucleus. Nucleation in protein folding is thus analogous to nucleation in crystallization. Postcritical conformations are those that are positioned beyond the separatrix and are destined to fold along the steep slope of a single deep U-shaped potential gradient, thus ensuring rapid folding and the stability of the folded protein. The gradient strategy mentioned in Sec. 6.1 is apparently in action (cf. Fig. 10). In most of the studied proteins, the folding nucleus is a dense cluster of amino acid residues stabilized by either hydrophobic or hydrogen bonding interactions (intramolecular recognition). The key residues in a folding nucleus — to be referred to as nucleation residues — are usually scattered
among several non-contiguous locations on a polypeptide chain (cf. Fig. 1B). If so, and if the nucleation theory is generally correct, then mutations of the nucleation residues of a protein are less fault-tolerant than mutations of amino acid residues that are not critical to the formation of the folding nucleus (to be referred to as non-nucleation residues). This is because the non-covalent bond interactions that hold a folding nucleus in its native conformation are rather delicately and precariously poised — the nucleation residues, after all, are non-contiguous prior to folding — and a slight misalignment of a single non-covalent bond may knock the rest of the non-covalent interactions completely out of alignment. Thus, evolutionarily speaking, a folding nucleus may, in principle, suddenly dissolve and vanish without a trace, as a consequence of mutations that have targeted the folding nucleus per se (see later for a discussion of additional factors that minimize the probability of this occurrence). Therefore, mutant organisms with mutations involving only the non-nucleation residues of a protein have a better chance to reproduce and to perpetuate the new traits in future generations than mutant organisms with mutations affecting the folding nucleus of the same protein. As a consequence, viable mutations of non-nucleation residues of a protein tend to outnumber viable mutations of the corresponding nucleation residues, and organisms preserving an old successful folding nucleus of a given protein tend to outnumber organisms equipped with a newly emerging mutant folding nucleus of the same protein. In other words, a newly emerging folding nucleus faces formidable competition with an already successful folding nucleus of the same protein, by virtue of a self-reinforcing mechanism mentioned in Sec. 3.4 of Chapter 2. It takes a significant merit for an emerging folding nucleus to attract a sufficient number of "subscribers" — i.e., mutant proteins relying on this emerging folding nucleus to fold — to become established. Another factor that favors the emergence of new folding nuclei may be the law of "diminishing returns." Overpopulation with (subscriber) mutants eventually erodes the competitive edge which an existing folding nucleus has enjoyed, thus ushering in the emergence of a new folding nucleus. In the language of search algorithms and search spaces, the subspace corresponding to an old folding nucleus is searched more frequently than the subspace corresponding to an emerging folding nucleus — a biased random search, so to speak. But the encounter with a problem that is refractory to repeated attacks calls for the exploration of new search directions. Apparently, the availability of the nucleation mechanism converted evolution from random searching into focused searching in a more "profitable" sub-
space (heuristic searching). Since a successful folding nucleus provides a fertile ground for viable mutations, nucleation residues tend to be more evolutionarily conserved than non-nucleation residues. But does this mean that a folding nucleus must remain forever conserved and "frozen" until other folding nuclei emerge by random chance? Insight gained into the evolution of the genetic code may shed some light on the problem and may alleviate this concern. On the other hand, Freeland and Hurst116 speculated that the optimized genetic code might speed up evolution. In the latter case, it is the nucleation theory that supplies the other half of the story so as to explain how the scenario suggested by Freeland and Hurst could be possible. A merger of the insight gained into protein folding and the insight gained into the evolution of the genetic code furnishes a more complete picture regarding both problems. As discussed in Sec. 7.5, it is the optimized genetic code that makes mutations of proteins fault-tolerant, whether the mutations occur in nucleation or non-nucleation residues. As a consequence, an unwanted mutation of a folding nucleus may not always irrevocably disrupt the existing non-covalent bond interactions that hold the folding nucleus together. Thus, folding nuclei are not "frozen" but can evolve, gradually rather than abruptly, just like non-nucleation residues. But the evolution of the folding nucleus of a protein is expected to be slower than the evolution of the remaining residues since nucleation residues and non-nucleation residues of a protein belong to two different levels of the folding hierarchy, as explained above. Perhaps it was no accident that key proteins, such as G proteins, could be grouped into various tree-like superfamilies. Likewise, the evolution of the genetic code and the evolution of folding nuclei belong to two different levels of the evolutionary hierarchy. Therefore, the evolution of the genetic code is expected to be even slower than the evolution of folding nuclei. By pooling together the knowledge regarding folding nuclei and the genetic code, an additional insight can be gained. It is the stratification brought about by the hierarchy of protein folding — nucleation versus non-nucleation residues — and of evolution — proteins versus the genetic code — that is responsible for the implementation of heuristic searching and for the difference in the speed of evolution among these various hierarchical levels. The metaphor of the adaptive (fitness) landscape offers a convenient visualization. But in the present context, it is even more convenient to use the metaphor of the potential energy landscape or, rather, in the language of nonlinear dynamics, the metaphor of attractor basins (Sec. 2.4 of Chapter 2). Thus, population growth and land development tend to favor valleys
(attractor basins), where water sources are well within reach, rather than high mountain peaks. It is the hierarchy of valleys, subvalleys, sub-subvalleys, etc., that encourages additional searches within the same regions, thus consolidating the geographic differences in terms of population growth, culture and the extent of land development among the various valleys, subvalleys and sub-subvalleys. In the above metaphor, if most of the peaks that separate valleys were either difficult to surmount or even insurmountable, evolution would hardly proceed, or would almost reach a standstill. Apparently, in the case of protein evolution, the redundancy of codons116 and of amino acids and the multiplicity of short-range non-covalent bond interactions provide the needed extradimensional bypass (metaphorically, a mountain pass). Gradualism ensures that two neighboring fitness valleys (or, rather, attractor basins) are not separated by a vast and extended mountain range but rather by isolated peaks surrounded by readily accessible mountain passes. Thus, the folding nucleus exerts a top-down constraint on protein folding, whereas fluctuations of the TSE, facilitated by the fluid-like protein environment, reflect the bottom-up process. The top-down constraint ensures that the search process is not random but rather heuristic. The bottom-up fluctuations explore various avenues of folding beyond the immediate valley so as to avoid being trapped at a local attractor basin. Likewise, the optimized genetic code exerts a top-down constraint on the evolution of proteins, whereas gradualism in the way amino acids are coded in mRNAs facilitates bottom-up explorations, thus speeding up evolution. Apparently, peaks separating various folding nuclei are lower than peaks separating various variant genetic codes. The difference in barrier heights between valleys, subvalleys, etc., is reflected in the difference in speeds of evolution at the various hierarchical levels. Folding faults are sometimes inevitable because folding is subject to spatial or temporal variance in the dynamics of physico-chemical interactions (see Secs. 5.6, 5.7 and 5.8 of Chapter 2 for discussions of endogenous noise). Living organisms thus evolved mechanisms to deal with folding errors. This problem is handled in at least two separate ways. First, molecular chaperones (heat shock proteins Hsp60 and Hsp70) are capable of helping an inappropriately folded protein refold.153,314,372 Second, incorrectly folded proteins may be targeted for proteolytic degradation, via ubiquitination,110 as part of a quality control process. A lingering question remains: Why are proteins so uniquely suitable for evolutionary improvements? Partial insights are available from a compar-
ative study of RNA and protein enzymes. Narlikar and Herschlag288 compared these two types of enzymes from the standpoint of transition-state theory, and found that protein enzymes excel in their ability to position functional groups within the active site and to manipulate and control the electrostatic nature of the active-site environment. On the other hand, the commonality of the two types of enzymes also suggests that the use of binding interactions away from the site of chemical transformation to facilitate reactions is a fundamental principle of biological catalysis, thus allowing exquisite specificity and enormous rate enhancements to be concurrently implemented and separately regulated. However, the ensuing topographical specialization of a macromolecule for the separate tasks of catalysis and regulation demands proper orientation in molecular recognition (Sec. 6).

8. Stochastic Nature of Neural Events: Controlled Randomness of Macroscopic Dynamics

Neural excitation in general is characterized by a switching event known as an action potential (see Ref. 211 or any standard physiology textbook). An action potential in an excitable cell such as a neuron or a muscle cell is preceded by a localized change of the transmembrane electric potential (known as depolarization). If this localized potential change does not reach threshold, no action potential can be initiated, and the electrical change remains localized (known as an electrotonic response). An electrotonic response spreads only over a limited distance from the site of stimulation (governed by a space constant of the order of millimeters), and lingers only briefly after the cessation of the stimulus (governed by a time constant of the order of milliseconds). If this localized change reaches or exceeds the threshold, a full-fledged (non-graded) action potential will then be generated that is capable of traveling over a long distance along a nerve fiber (or a muscle fiber) by means of a regenerative process much like a burning fuse. Once the threshold is exceeded, a stronger stimulus that causes any additional localized potential change makes no difference. Thus, the action potential is a digital process. An action potential in a mammalian cell, however, cannot travel beyond a cell's boundary and reach another cell. Initiation of a new action potential in a neighboring cell requires a mechanism known as synaptic transmission. Synaptic transmission is an analog process because it involves: a) the release of numerous neurotransmitter molecules from the presynaptic cell, b) the diffusion of the transmitter molecules across the synaptic cleft (a 50
nm aqueous space separating the two excitable cells), c) the binding of the transmitter to a receptor molecule residing on the postsynaptic membrane, and d) the graded (continuous) change of ionic conductance of the postsynaptic membrane caused by binding of the transmitter molecules. The latter step is the predecessor of a new action potential. It is apparent that the process of transmitter release is highly stochastic in nature (see a review by Bennett and Kearns32). It is instructive to examine how these individual mesoscopic stochastic events are integrated into highly deterministic macroscopic signals (action potentials). We shall first consider the neuromuscular junction (known as the motor end-plate) which is the synapse between a motor nerve ending and a skeletal muscle cell.
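As a rough numerical illustration of the electrotonic behavior described above, the following sketch (in Python; the amplitude, space constant and time constant are illustrative assumptions, not measured values) evaluates the exponential decay of a subthreshold potential with distance, V(x) = V0·exp(-x/λ), and with time after the stimulus ends, V(t) = V0·exp(-t/τ).

import numpy as np

# Illustrative (assumed) passive membrane parameters
V0 = 10.0   # initial local depolarization, mV
lam = 1.0   # space constant, mm (order of millimeters)
tau = 5.0   # time constant, ms (order of milliseconds)

# Steady-state electrotonic decay with distance from the stimulation site
x = np.array([0.0, 0.5, 1.0, 2.0, 3.0])      # mm
V_x = V0 * np.exp(-x / lam)

# Decay with time after the stimulus is turned off
t = np.array([0.0, 2.5, 5.0, 10.0, 15.0])    # ms
V_t = V0 * np.exp(-t / tau)

for xi, v in zip(x, V_x):
    print(f"x = {xi:4.1f} mm   V = {v:5.2f} mV")
for ti, v in zip(t, V_t):
    print(f"t = {ti:4.1f} ms   V = {v:5.2f} mV")

The point of the exercise is simply that, within a millimeter or two and within a few milliseconds, a subthreshold response has largely faded away, which is why a regenerative action potential is needed for long-distance signaling.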
Fig. 14. End plate potentials and miniature end plate potentials of frog muscles. A. End plate potentials (EPP) from a single motor end plate show variations of amplitude. These potentials were neurally evoked by action potentials arriving at the attached motor nerve ending. In each record, three superimposed responses are seen. B. Spontaneous activity at the motor end plate region leads to the random appearance of numerous miniature end plate potentials (MEPP), which are shown in the upper part (voltage and time scales: 3.6 mV and 47 msec). A full-fledged muscle action potential is also shown in the lower part (voltage and time scales: 50 mV and 2 msec). Note that scattered spontaneous miniature end plate potentials were also recorded along with regular end plate potentials in A. (A. Reproduced from Ref. 95 with permission; Copyright by The Physiological Society. B. Reproduced from Ref. 107 with permission; Copyright by The Physiological Society)
The synaptic process of neuromuscular transmission starts with the arrival of an action potential at the motor nerve ending, which makes a synaptic contact with a muscle cell. The arriving action potential causes
a momentary rise of the intracellular Ca2+ concentration, which, in turn, causes many synaptic vesicles inside the nerve ending to fuse with the nerve terminal membrane and to release their content, acetylcholine (ACh), into the synaptic cleft (Sec. 5.7). A large number of ACh molecules must then move across the synaptic cleft by random diffusion before they are able to bind to an ACh receptor on the muscle membrane on the other side of the synapse. The subsequent opening of numerous ion channels associated with activated ACh receptors causes a significant increase of the Na+ conductance at the end-plate membrane. Consequently, the membrane potential moves towards and beyond the threshold level, thus generating a new action potential at the muscle membrane. This initial localized membrane potential change at the end-plate muscle membrane, preceding the appearance of the muscle action potential, is called the end plate potential (EPP) (Fig. 14A).
Fig. 15. Statistical control law governing the random appearance of miniature end plate potentials. The number of observed intervals of duration less than t has been plotted against the interval duration t. Smooth curve: a theoretical curve, y = N(1 - exp(-t/T)), where N is the total number of observations, and T the mean interval, marked by the arrow. Circles: observed numbers. (Reproduced from Ref. 107 with permission; Copyright by The Physiological Society)
In the absence of stimulation from the motor nerve ending, spontaneous localized changes of the membrane potential can be detected at the endplate membrane.107 These spontaneous electric activities, known as miniature end plate potential (MEPP), are similar to the EPP but are much smaller in amplitude (Fig. 14B). The MEPPs appear randomly. The inter-
vals between successive MEPPs, in a very large series of observations, are distributed exponentially (Fig. 15). While these results strongly suggest a random distribution of MEPPs, Katz and coworkers were somewhat cautious with this conclusion since the "constituents" causing these MEPPs were not known at that time.
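A minimal simulation, assuming only that MEPPs occur as a homogeneous Poisson process (the interpretation suggested by Fig. 15), reproduces the cumulative interval distribution y = N(1 - exp(-t/T)) quoted in the figure caption; the mean interval and sample size below are arbitrary illustrative choices, not the values of Ref. 107.

import numpy as np

rng = np.random.default_rng(0)

mean_interval = 0.5   # assumed mean interval T between MEPPs, seconds
N = 2000              # number of observed intervals

# For a Poisson process the intervals are exponentially distributed
intervals = rng.exponential(mean_interval, N)

# Empirical count of intervals shorter than t, compared with N*(1 - exp(-t/T))
for t in (0.1, 0.25, 0.5, 1.0, 2.0):
    observed = np.sum(intervals < t)
    predicted = N * (1.0 - np.exp(-t / mean_interval))
    print(f"t = {t:4.2f} s   observed {observed:5d}   predicted {predicted:7.1f}")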
Fig. 16. Histogram of neurally evoked EPPs and spontaneous MEPPs (inset) amplitude distributions in a mammalian skeletal muscle fiber. The neuromuscular transmission was blocked by increasing the magnesium ion concentration of the bathing solution to 12.5 mM. Peaks of neurally evoked EPP amplitude distribution occur at 1, 2, 3, etc., times the mean amplitude of MEPPs. A Gaussian curve is fitted to the spontaneous potential distribution (inset) and used to calculate the theoretical distribution of neurally evoked EPP amplitude (continuous curve), which fits well with the amplitude distribution of those experimentally observed (bar graph). The bar placed at zero indicates the number of failures observed in a series of trials whereas the arrows and the dotted line above the bar indicate the number of failures expected theoretically from the Poisson distribution. (Reproduced from Ref. 46 with permission; Copyright by The Physiological Society)
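The theoretical prediction described in the caption of Fig. 16 can be sketched as follows: the number of quanta in each neurally evoked EPP is drawn from a Poisson distribution with mean quantal content m, and each quantum contributes an amplitude drawn from the Gaussian fitted to the spontaneous MEPPs. All numerical values below are assumptions chosen for illustration, not the parameters of Ref. 46.

import numpy as np

rng = np.random.default_rng(1)

m = 2.3              # assumed mean quantal content per nerve impulse
mepp_mean = 0.4      # assumed mean MEPP amplitude, mV
mepp_sd = 0.08       # assumed standard deviation of MEPP amplitude, mV
trials = 5000

# Quantal content of each evoked EPP follows a Poisson distribution
quanta = rng.poisson(m, trials)

# Each EPP is the sum of its quanta, each quantum being one Gaussian "MEPP"
epp = np.array([rng.normal(mepp_mean, mepp_sd, k).sum() for k in quanta])

# The fraction of failures should approach exp(-m), and a histogram of the
# non-zero amplitudes shows peaks near 1, 2, 3, ... times the mean MEPP size.
print("fraction of failures:", np.mean(quanta == 0),
      "  Poisson prediction:", round(np.exp(-m), 3))
counts, edges = np.histogram(epp[epp > 0], bins=40)
print("most populated amplitude bin starts near",
      round(edges[np.argmax(counts)], 2), "mV")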
Katz and coworkers95,218 subsequently demonstrated that the macroscopic EPP was actually the result of the summation of many nearly identical MEPPs. By reducing the extracellular Ca2+ concentration, they were able to reduce the amplitude of the neurally evoked EPPs. They found that the amplitude of these EPPs did not vary continuously but rather in steps, the size of which is comparable to the amplitude of a single MEPP. That is, the amplitude of EPPs is "quantized." Shown in Fig. 16 is the distribution of the amplitude of the neurally evoked EPPs as a histogram. The superimposed smooth curves are computations based on the prediction of a Poisson distribution. Also shown in the inset is the distribution of the
amplitude of a single MEPP. Katz and coworkers pointed out that the dispersion of the MEPP amplitude does not necessarily imply a dispersion of the quantal content of an individual synaptic vesicle but could instead be a manifestation of fluctuations of the Ca2+ concentration. Katz and Miledi219,220,221 subsequently analyzed the noise spectrum of the voltage fluctuations appearing at the end plate membrane but continued to interpret the data in terms of the (mesoscopic) change that has the same time course as the MEPPs, namely, a time course with a single exponential decay like the MEPPs. In other words, they treated an EPP as an integral multiple of an MEPP. In what constituted a "paradigm shift," Anderson and Stevens5,292 interpreted the end-plate noise, instead, as the manifestation of the abrupt opening and closing of many individual ion channels. These events of random opening and closing are the "constituents" alluded to by Katz. This prophetic interpretation received a resounding confirmation when rapidly fluctuating conductance changes were demonstrated by means of the patch clamp technique.291 When a macroscopic EPP exceeds threshold, it erupts into a new action potential that travels over the entire muscle cell, causing the muscle to contract. The switching event underlying the generation of a new action potential depends on a delicate balance between the K+ and Na+ currents through the membrane. Essentially, the membrane potential is jointly decided by the electrodiffusion of these two current systems, which tend to oppose each other as far as their influence on the membrane potential is concerned. Simply put, both current systems strive to maintain their own electrochemical equilibrium by balancing the ionic flow that is driven by the membrane potential against the ionic flow that is driven by the concentration gradient across the membrane, through their respective specific ion channels. The K+ current pushes the membrane potential toward a negative value, which is dictated by its transmembrane concentration gradient, whereas the Na+ current pushes the membrane potential toward a positive value, which is similarly dictated. Since these two equilibrium values cannot be realized at the same time, the membrane potential that actually prevails is a value resulting from an "arbitration" process that is determined primarily by the relative magnitudes of the permeability of the two ion channels, with due consideration being given to the relative magnitudes of the concentration gradients of the two ions (for a concise summary, see a standard physiology textbook). When a nerve or muscle cell is not excited, the permeability of K+ chan-
nels is approximately 100 times greater than that of Na+ channels. Therefore, the resting membrane potential settles at a steady negative value. When the threshold is exceeded, the permeability of Na+ channels suddenly and rapidly increases by some 500-fold, and thus the Na+ current exceeds the K+ current momentarily. During this brief moment, the membrane potential swings towards a positive value. The dominance of the Na+ current is, however, short-lived because the Na+ channels spontaneously close after a brief period of opening. The opening and closing of the Na+ channels are well defined kinetic processes known as activation and inactivation of the Na+ conductance. Activation of the Na+ conductance critically depends on the membrane potential, i.e., the Na+ channels are voltage-gated. Below the threshold, activation of the Na+ conductance is insignificant. Thus, the resting membrane potential is fairly stable if the perturbation of the membrane potential is small. When the membrane potential is slightly depolarized — i.e., shifted towards the positive polarity — but still falls far short of the threshold, the K+ current maintains its dominance over the Na+ current, and a net outward current will restore the resting potential once the perturbation ceases (negative feedback regime). If, however, the membrane potential is depolarized substantially so that it goes beyond the threshold potential, the system is transformed into a positive feedback regime with the following cyclic events, in which one event causes the next event to follow, and then starts over again: a) the Na+ channels become activated, b) the enhanced inward Na+ current exceeds the outward K+ current, c) the net inward current causes the membrane potential to depolarize further, and d) the Na+ channels become even more activated by the depolarization of the membrane potential, thus completing the vicious cycle (a positive feedback mechanism known as the Hodgkin cycle). In other words, the threshold is a branching point directly controlled by the membrane potential. The events leading to the generation of a muscle action potential and the subsequent restoration of the resting membrane potential are essentially the same as in a neuronal membrane. The following discussion applies to both the muscle and the neuronal membranes. Inactivation, which is an intrinsic property of Na+ channels, ensures that an action potential has a short duration and allows the K+ current system to regain control of the membrane potential during the interlude between two successive action potentials. A separate kinetics of delayed activation of the K+ conductance speeds up the restoration of the resting potential. However, the activation of the K+ conductance is not absolutely
needed to restore the resting potential. In the absence of activation of the K+ conductance, the restoration of the resting membrane potential would still take place but it would take a longer time. The K+ current system will be ignored in the subsequent discussion for simplicity.
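To make the threshold behavior concrete, the following sketch simulates a deliberately reduced membrane model: a voltage-gated Na+ conductance with Hodgkin-Huxley-style activation (m) and inactivation (h) gates, plus a lumped leak conductance standing in for the ignored K+ pathway. It is a caricature under assumed parameter values, not a reproduction of any measured preparation: a small depolarizing pulse decays back to rest, whereas a larger pulse recruits the regenerative Hodgkin cycle and produces a full upswing that is then terminated by inactivation.

import numpy as np

# Assumed, illustrative parameters (per unit membrane area)
C, g_na, E_na, g_leak, E_leak = 1.0, 120.0, 50.0, 2.0, -65.0

def alpha_m(V):  # Hodgkin-Huxley-style activation rate (1/ms)
    x = V + 40.0
    return 1.0 if abs(x) < 1e-6 else 0.1 * x / (1.0 - np.exp(-x / 10.0))

def beta_m(V):
    return 4.0 * np.exp(-(V + 65.0) / 18.0)

def alpha_h(V):  # inactivation rates (1/ms)
    return 0.07 * np.exp(-(V + 65.0) / 20.0)

def beta_h(V):
    return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))

def run(pulse_amplitude, pulse_ms=2.0, t_end=15.0, dt=0.005):
    """Apply a brief current pulse and return the peak membrane potential."""
    V, m, h = -65.0, 0.05, 0.6          # approximate resting values
    peak = V
    for step in range(int(t_end / dt)):
        t = step * dt
        I_app = pulse_amplitude if t < pulse_ms else 0.0
        I_na = g_na * m**3 * h * (V - E_na)      # regenerative inward current
        I_leak = g_leak * (V - E_leak)           # passive restoring current
        V += dt * (I_app - I_na - I_leak) / C
        m += dt * (alpha_m(V) * (1.0 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1.0 - h) - beta_h(V) * h)
        peak = max(peak, V)
    return peak

for amp in (3.0, 8.0):   # small vs. large 2-ms current pulses
    print(f"pulse {amp:4.1f}: peak V = {run(amp):6.1f} mV")

The smaller pulse leaves the membrane in the negative feedback regime and the potential relaxes back to rest; the larger pulse carries the potential past the point where the inward Na+ current outweighs the restoring current, and the positive feedback loop takes over.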
Fig. 17. Voltage dependence of the peak sodium ion conductance (gNa). Activation of the Na+ conductance in response to a sudden step of depolarization reaches its peak within a couple of milliseconds before the onset of inactivation. The peak Na+ conductance is plotted against the depolarization step on a semi-logarithmic scale (A) and a linear scale (B). (A. Reproduced from Ref. 169 with permission; Copyright by The Physiological Society. B. Reproduced from Ref. 324 with permission; Copyright by W. B. Saunders)
In order for the action potential to be a reliable switch, the transition between the negative and positive feedback regimes must be sharp. This
sharp transition is made possible by a highly nonlinear voltage dependence of the Na+ conductance. If the logarithm of the Na+ conductance is plotted as a function of the membrane potential, the initial portion appears linear but the level ultimately saturates around 0 mV (Fig. 17A). If, however, a linear plot is made instead, the g-V (conductance vs. membrane potential) curve is sigmoidal in shape (Fig. 17B). The g-V curve, though not a step function, is sufficiently steep to guarantee the reliability of its switching function. What is the molecular basis of this sigmoidal activation curve? Considerable insight has been gained from two types of studies: the patch-clamp study and the study of gating currents. The advent of the patch-clamp technique326 made it possible to observe the "constituent" events: the opening and closing of individual ion channels. Shown in Fig. 18 are patch-clamp records of current flow in a small patch of the rat muscle membrane in response to 10 mV of sudden depolarization.346 Record C shows nine individual responses, in which the opening and closing of ion channels appears random and unpredictable at first glance, and the individual responses appear irreproducible upon repeated measurements. None of these current traces provides direct clues to the well-defined time course of activation and inactivation of the Na+ conductance. Furthermore, the conductance of a single Na+ channel — the unitary Na+ conductance — is almost voltage-independent. The question is: How does the voltage dependence of the amplitude of the macroscopic Na+ conductance arise from an ensemble of constituent Na+ (single-channel) currents, of which the amplitude is voltage-independent, and how does the well-defined time course of the macroscopic Na+ conductance arise from the superficially random opening and closing of an ensemble of individual Na+ channels? It turned out that increasing depolarization of the membrane potential does not increase the amplitude of the conductance of individual channels, but rather increases the probability of opening and closing of these channels (see, for example, Ref. 287). At relatively negative membrane potentials close to the resting potential, the Na+ channels are closed most of the time, whereas the channels open and close more frequently at more positive potentials. Furthermore, the individual channels open abruptly and close abruptly; the time course of the single-channel currents bears no resemblance to that of the macroscopic Na+ current. That the peculiar time course of abrupt activation and more gradual inactivation of the macroscopic Na+ conductance is indeed derived from the probabilistic distribution of opening and closing events in response to the step depolarization can
Fig. 18. Patch clamp records of sodium ion currents. A small patch of the rat muscle membrane was voltage-clamped, and the Na+ current was recorded. Tetraethylammonium was used to block any K+ channels that were inadvertently included in the patch. A step depolarization of 10 mV was used to activate the sodium channels present in the patch (Record A). The traces in Record C show nine separate responses to the 10 mV depolarization. Record B is the average of 300 individual responses. (Reproduced from Ref. 346 with permission; Copyright by Macmillan Magazines)
be seen from Fig. 18B. The average of 300 patch-clamp current traces gives rise to a time course that strongly resembles that of the macroscopic Na+ current, as is observed by the conventional voltage-clamp technique.169 It appears that the superficially erratic opening and closing of an ensemble of individual Na+ channels are actually orchestrated by a specific probabilistic control law. Alterations of this control law by point mutations could lead to illness: channelopathies (see Fig. 5 of Chapter 2). Note that opening and closing of ion channels, as dictated by the probabilistic control law, is not completely random but is subtly controlled by the membrane potential. Inspection of the nine traces in Fig. 18C reveals that, in this limited number of measured samples, almost all of the opening and closing events take place in the first few milliseconds after the step depolarization but none after 5 milliseconds. In other words, the opening and closing of an individual Na+ channel, following a step depolarization, is governed by a probabilistic control law, which occupies an intermediate position on the gray scale of determinism: relative determinism. In contrast, the control law for the time course of an entire ensemble of Na+ channels is well defined and the collective behavior of a large number of Na+ channels is significantly more deterministic.
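The relation between erratic single-channel records and the smooth ensemble average can be imitated with a deliberately crude stochastic model. In the sketch below, each channel is assumed to wait an exponentially distributed latency before opening once and then inactivating after an exponentially distributed open time; the two rate constants and the patch size are arbitrary assumptions, not values fitted to Ref. 346. Individual sweeps look irregular, but the average of 300 sweeps rises and decays smoothly, echoing Fig. 18B.

import numpy as np

rng = np.random.default_rng(2)

dt, t_end = 0.05, 10.0                 # ms
time = np.arange(0.0, t_end, dt)

# Assumed single-channel scheme following a step depolarization:
# closed --(rate k_open)--> open --(rate k_inact)--> inactivated (absorbing)
k_open, k_inact = 1.0, 0.7             # 1/ms, illustrative values only

def patch_current(n_channels=3):
    """Summed unitary current (arbitrary units) of one small patch."""
    i = np.zeros_like(time)
    for _ in range(n_channels):
        t_open = rng.exponential(1.0 / k_open)      # latency to first opening
        t_close = t_open + rng.exponential(1.0 / k_inact)
        i += ((time >= t_open) & (time < t_close)).astype(float)
    return i

# A handful of individual sweeps looks erratic...
for sweep in range(3):
    trace = patch_current()
    print("sweep", sweep, "open channels at t = 1, 2, 5 ms:",
          trace[int(1 / dt)], trace[int(2 / dt)], trace[int(5 / dt)])

# ...but the average of 300 sweeps shows a smooth rise and decay,
# resembling macroscopic activation followed by inactivation.
mean_trace = np.mean([patch_current() for _ in range(300)], axis=0)
print("ensemble average peaks near t =",
      round(time[np.argmax(mean_trace)], 2), "ms")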
The above discussion about the control law governing the mesoscopic process of opening and closing of individual ion channels illustrates the essence of controlled randomness in biocomputing. Controlled randomness in individual Na+ channels collectively gives rise to a well-defined time course of voltage-dependent activation and inactivation of the Na+ conductance. Likewise, the arrival of neural impulses at the presynaptic terminal of a neuromuscular junction transforms the completely random process of spontaneous neurotransmitter release into a controlled random process of opening and closing of ACh channels, which collectively gives rise to a well defined macroscopic process underlying the EPP. What is the source of this controlled randomness? Molecular biology of ion channels is one of the most active areas of investigation in neuroscience, and our conceptual understanding of channel operation continues to evolve and change. The following discussion is intended to capture some insights that are relevant to the elucidation of controlled randomness. It must not be regarded as a definitive answer with regard to the molecular mechanisms of ion channel operation. For further detail, authoritative treatises must be consulted.267,166,145,365,165,36,310 Ion channel fluctuations were often modeled by a Markov process, in which the ion channel was assumed to have a finite number of open and closed discrete conformational states, with characteristic transition probabilities per unit time between these states. Hodgkin and Huxley's study suggests that there are at least three discrete conformational states of the Na+ channel: the closed state, the open state and the inactivated (closed) state. However, single channel and gating current analysis revealed additional conformational substates. The gating current is an externally measurable capacitative current associated with the charge displacement of the putative voltage sensor of the Na+ channel during opening and closing. This current was predicted by Hodgkin and Huxley and was subsequently discovered by Armstrong and Bezanilla.8,9 These gating charges are thought to be associated with segment S4 of the Na+ channel.366,365,36 By assuming that the charge movements on the voltage sensor are correlated with the transition between different channel states, the gating current analysis provides information about the voltage dependence of the channel states. The analysis was performed by integrating the gating current to obtain the gating charges, which were then plotted as a function of the membrane potential to obtain a Q-V curve. The Q-V curve, which is sigmoidal in shape, allows for the estimation of the charge movement during each transition. Unfortunately, the detailed
interpretation of the analysis is model-dependent, and refinement of the analysis often leads to an increasing number of conformational states. An alternative approach was proposed by Liebovitch et al.,251 who discovered a fractal nature of a non-selective ion channel in the corneal epithelium. These investigators found that the time course of ion channel fluctuations exhibits a statistical self-similarity on various scales of time resolution. Liebovitch and Sullivan252 measured currents through K+ channels in hippocampal neurons and found that the effective kinetic rate constant is governed by a power law of the time scale used to analyze the data. The effective kinetic rate constant k is given by k = A·t^(1-D), where A is the kinetic setpoint and D the fractal dimension. Liebovitch and Sullivan found that the fractal dimensions were ~2 for the closed times and ~1 for the open times and did not depend on the membrane potential. For both open and closed times, the logarithm of the kinetic setpoint was found to be proportional to the applied membrane potential. Contrary to conventional wisdom, the fractal model is actually more parsimonious than the Markov model because the latter incurs an excessive number of adjustable parameters250 (cf. analysis in Sec. 20 of Ref. 187). In addition, the fractal model is also consistent with the modern understanding of protein dynamics.57,217,101 Although it is evident that ion channel fluctuations are regulated by the membrane potential, the detailed mechanism of the conductance fluctuation is not known. Using the approach of nonlinear dynamics, Chinarov et al.61 have shown that bistability of ion channels can arise from interactions of the ion flux through ion channels with the conformational degrees of freedom of some polar groups that form the ion channels. In essence, the passage of ions through a channel is modulated by both the global electric field imposed by the transmembrane potential and the local electric field generated by the passing ions themselves. Christophorov and coworkers65,63,64 cast this type of interaction in a general formulation of charge-conformation interactions, which has also been invoked to interpret nonlinear light-dependent phenomena in bacterial reaction centers.135,136 This line of thinking is compatible with the fractal model because fractal behavior is often a manifestation of a self-organization process. Nonlinear processes in ion channels are also suggested by a series of experimental analyses by Matsumoto and coworkers. Kobayashi et al.228 demonstrated that the subaxolemmal cytoskeleton of the squid giant axon
was highly specialized and mainly composed of tubulin, actin, axolinin, and a 255-kD protein. Tsukita et al.377 further demonstrated that the axolemmal membrane is heterogeneous and is specialized into two domains (microtubule- and microfilament-associated domains). Hanyu and Matsumoto150 also studied the bifurcation property that links the resting and the spontaneous oscillatory states of the squid giant axon. These investigators suggested that this bifurcation property is a consequence of spatial long-range interactions in the axons, due possibly to the specific distribution of Na+ channels. Electron microscopic evidence points to the possibility of the regulation of the Na+ channel distribution by subaxolemmal cytoskeletons. Shcherbatko et al.341 expressed Na+ channel α-subunits in Xenopus oocytes and analyzed ion channel kinetics in such preparations. These investigators compared the "cut-open" recordings from oocytes with patch-clamp recordings that were made in a macropatch configuration, and found that the kinetics in the patch-clamp recordings shifted to a faster time course than the "cut-open" recording kinetics. Furthermore, the shift was correlated with the rupture of sites of membrane attachment to the cytoskeleton. This finding again suggested that ion channel kinetics are regulated by the membrane cytoskeleton. Interestingly, ion channel fluctuations are also present in artificial pores in synthetic polymer membranes.245 Further discussion of the origin of ion channel fluctuations is given in Sec. 5.8 of Chapter 2. In summary, it is apparent that the mesoscopic controlled randomness is an essential element required for the normal operation of Na+ channels. Partial randomness in the ion channel kinetics is not a manifestation of external interference from other concurrent but causally independent processes; it must be regarded as endogenous noise. The key factor regulating the controlled randomness in the Na+ conductance is the membrane potential. A well-defined macroscopic control law thus arises from voltage-controlled randomness at the mesoscopic level. Participation of membrane cytoskeletons in the underlying nonlinear processes is implicated, i.e., it is a cooperative phenomenon. The mesoscopic event of opening and closing of Na+ channels is highly random and is digital in nature, whereas the macroscopic event of activation and inactivation of the Na+ conductance is reasonably well defined and is analog in nature. Once an action potential is generated, the macroscopic event becomes discrete and once again assumes a digital characteristic. A large number of discrete and well-defined nerve impulses subsequently impinge upon the postsynaptic membrane via synaptic transmission, thus completing the al-
ternating cycle of digital and analog processing and the alternating cycle of probabilistic and well-defined control laws. In conclusion, information processing at the macroscopic neural network level is neither as random as individual microscopic biochemical reactions suggest nor as deterministic as an all-digital system like a digital computer. Randomness participates in information processing in a peculiar way. When the process appears digital, the amplitude is well defined (digital) but the timing is erratic. In contrast, when the process becomes analog, the timing is well defined but the amplitude then contains a non-zero dispersion. Thus, for a given neuron, partial randomness is introduced at several different levels: a) a variation in the amount of released neurotransmitters leads to fluctuations of the stimulus strength, b) the sigmoidal curve defining the branching point of firing or breaking shifts randomly on the voltage (membrane potential) axis, leading to a variation of the threshold level for the initiation of an action potential, and c) ion channel fluctuations lead to a variation of the time course of the action potential after it has been initiated. Partial randomness is indispensable for the ion channel function rather than a mere nuisance for investigators (see Sec. 5.8 of Chapter 2 for its implication for the free will problem). Since action potential generation is a crucial process in macroscopic signal transmission and processing, it is apparent that biocomputing does not strictly adhere to absolute determinism but, instead, conforms to a weaker form of determinism: relative determinism. The neural signaling mechanism features a probabilistic control law at the mesoscopic level but resumes a well-defined control law at the macroscopic level. Since events at these two levels are connected in series, a probabilistic control law, once introduced, renders the entire process of biocomputing short of absolutely deterministic, but not completely random.
9. Long-Term Potentiation and Synaptic Plasticity

Memory is one of the most important components in biocomputing. It is a prerequisite for higher mental processes such as consciousness and intelligence. With the development of modern neuroscience, memory research has become linked to cellular and molecular processes in the nervous system. The advent of techniques, such as positron emission tomography (PET), event-related brain potential (ERP), and functional magnetic resonance imaging (fMRI), permits accurate localization of brain activities that accompany conscious acts. Although full elucidation remains elusive, consid-
erable progress has been made to render a discussion at the cellular and molecular level possible. About half a century ago, Hebb157 pioneered a model of memory in which the memory is not stored in a discrete location but rather is distributed in the collective property of the synaptic connections in a neural network. In the Hebbian model, processes such as learning lead to changes of synaptic weights (strengths), and the stored information (memory) is represented by the characteristic distribution of synaptic weights among a population of neurons that form the neural network.121 The model is thus rich with a flavor of parallel distributed processing. The Hebbian model of memory is thus a type of associative memory. The profound influence of the Hebbian model is witnessed by the birth and prosperity of artificial neural network research (Sec. 7.2 of Chapter 2; see also Refs. 189, 99). Presently, memory research is one of the most active areas in neuroscience. The Hebbian model was meant to be a simplified model of neural plasticity. Real neural networks and synaptic plasticity are of course far more complicated. Among numerous examples of activity-dependent modifications of synapses, long-term potentiation (LTP) stands out as the most studied candidate of memory substrates.260,24,215,27,25,26,261,31 It provides an excellent example of vertical information coupling involving the microscopic, the mesoscopic and the macroscopic computational dynamics. Long-term potentiation is the collective feature of synaptic processes in which a brief burst of repetitive activation of presynaptic fibers leads to an enhancement of the excitatory synaptic strength that lasts from hours to weeks or longer. LTP was first reported in the hippocampal formation,259 and was subsequently found in other parts of the brain. In one type of LTP, the induction is the result of the activation of the NMDA (N-methyl-D-aspartate) receptor (one of several types of glutamate receptors), which was once billed as the "Holy Grail of neurotransmitter-receptor molecular neurobiology." Cloning of the NMDA receptor in 1991 was thus one of the major events of neurobiology.283,236 The induction is peculiar in its requirement of the co-activation of two different types of synaptic activities:260,89,68,298 a) the presence of a neurotransmitter (glutamate), and b) the presence of membrane depolarization. The depolarization is caused by the activation of a non-NMDA glutamate receptor, such as the AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazolepropionate) receptor. The NMDA-receptor channel is blocked by Mg2+, and depolarization relieves the blockade. Thus, the activation of NMDA receptors is both ligand-dependent and voltage-dependent; it is a cooperative phenomenon.
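This ligand-and-voltage requirement can be caricatured as a coincidence detector. In the sketch below, an NMDA-like conductance passes current only when glutamate is bound and depolarization has relieved the Mg2+ block; the particular voltage-dependent block expression is a commonly used phenomenological form adopted here as an assumption, and the parameter values are illustrative only.

import numpy as np

def nmda_conductance(glutamate_bound, V_mV, g_max=1.0, mg_mM=1.0):
    """Crude coincidence detector: conductance is appreciable only when
    glutamate is bound AND depolarization has relieved the Mg2+ block.
    The block expression below is a commonly used phenomenological form
    adopted as an assumption, not taken from the chapter."""
    if not glutamate_bound:
        return 0.0
    mg_block = 1.0 / (1.0 + (mg_mM / 3.57) * np.exp(-0.062 * V_mV))
    return g_max * mg_block

for glu in (False, True):
    for V in (-70.0, 0.0):        # resting vs. depolarized membrane
        g = nmda_conductance(glu, V)
        print(f"glutamate={glu!s:5}  V={V:6.1f} mV  g_NMDA={g:5.2f}")

# Only the glutamate-present, depolarized case yields a large conductance,
# which is the ligand- and voltage-dependent cooperativity described above.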
The NMDA-channels so opened are permeable to Ca2+. The Ca2+ influx is crucial to the development of LTP; microinjection of a calcium chelator prevents the induction of LTP, and the release of Ca2+ via photoactivation of caged Ca2+ leads to potentiation. However, the rise of the intracellular Ca2+ concentration is only transient; other processes must be involved in sustaining LTP. The following summary must be considered highly tentative because LTP research is characterized by a lack of consensus and because advances in LTP research are made at a rapid pace. A popular hypothesis links the transient rise of the intracellular Ca2+ concentration to Ca2+-dependent protein kinases such as protein kinase C or Ca2+-calmodulin dependent protein kinase II (CaMK II). CaMK II is known to undergo autophosphorylation (a positive feedback mechanism), making it an effective biochemical switch. Activation of the NMDA receptor also produces a retrograde signal that acts on the presynaptic terminal. Nitric oxide (NO), carbon monoxide (CO), and a couple of other biomolecules have been implicated as the retrograde diffusible signal.171,269,155 As a result, subsequent activities in the presynaptic terminal release an increased quantity of the neurotransmitter glutamate. An alternative possibility is the induction of an increased number of glutamate receptors.298 The diffusible signal may spread and induce LTP in neighboring synapses directly or indirectly, and neighboring synapses may exert a synergistic effect on each other.337,338 This effect, which confers spatial uncertainty on synaptic plasticity, seems rather general. Barbour and Hausser20 pointed out that a large number of experimental observations are consistent with "cross-talk" or "spill-over" as a result of intersynaptic diffusion of neurotransmitters. These investigators predicted, by means of a simple model, that, during concentrated synaptic activity, cross-talk between distinct synapses is likely to activate high-affinity receptors and may also desensitize certain receptors. The determinism entailed by specific pathways in a neural network is somewhat degraded (relative determinism). It is an intriguing possibility that the randomness so introduced may contribute to the subtle variability required by creative problem solving (Sec. 4 of Chapter 2). The role of a presynaptic protein kinase is implicated in the action of the unknown retrograde messenger. LTP induction also exhibits associativity: a repetitive activation of a weak input that is not strong enough to induce LTP by itself can do so when taking place in close temporal contiguity with repetitive activation of a separate but stronger input to the same neuron. A discussion with regard to the role of CREB (cyclic AMP responsive element binding protein) in long-term potentiation and memory can be found in a
review article by Silva et al.347 The dynamic state of synaptic plasticity is also reflected in a similar state of the cortical sensory maps of adult animals that represent different body parts.51 The cortical maps are continuously modified by experience; the cortex can preferentially allocate cortical areas to represent the selected (and augmented) peripheral inputs. A similar dynamic state of plasticity is also evident in the primary motor cortex.327 Compelling evidence in support of the role of LTP induction in learning is derived from studies of LTP in the amygdala.263 The amygdala is a component of the neural circuit governing emotional learning and memory88,242 (see Sec. 4.17 of Chapter 2 for the self-referential control of knowledge acquisition proposed by Korner and Matsumoto229,230). There are several lines of evidence: a) antagonists of the NMDA receptor block LTP induction in the amygdala and Pavlovian fear conditioning, b) fear conditioning induces increases in synaptic transmission in the amygdala that resemble LTP, and c) genetic manipulations that disrupt LTP induction in the amygdala also eliminate fear conditioning. A role for CaMK II in spatial learning and memory has been demonstrated.120 A similar phenomenon known as long-term depression (LTD) was first found in the cerebellum, and subsequently in other brain parts such as the hippocampus.253,28,86 The induction of LTD is similar. LTD is generally thought to be the reverse process of LTP. That the NMDA receptor is a central player in memory processes was more directly demonstrated by Tsien and coworkers,374 who engineered a mouse named Doogie by modifying subunits of NMDA receptors in the brain region. Doogie exhibited superior ability in learning, as compared to normal mice.

10. Role of Dendrites in Information Processing

In the classical picture, dendrites are regarded as the sites of synaptic inputs to a neuron. Interactions of excitatory and inhibitory postsynaptic potentials, which are similar to the EPP but appear at synapses between two neurons, constitute the integration within a neuron that determines whether a new action potential is to be initiated at the soma-axon hillock region of the neuron. Dendrites thus assume a passive role in the classical interpretation. However, such a view is no longer tenable in our contemporary understanding. Pribram has previously emphasized the importance of dendritic microprocessing (e.g., see pp. 275-293 of Ref. 312).
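The classical, passive picture sketched in the preceding paragraph can be caricatured with a leaky integrate-and-fire unit: excitatory and inhibitory postsynaptic potentials sum on a leaky membrane, and a spike is declared only when the summed potential crosses a threshold at the soma-axon hillock. All parameters and input statistics below are illustrative assumptions.

import numpy as np

# Leaky integrate-and-fire caricature of the classical view of a neuron
tau_m = 10.0                  # membrane time constant, ms (assumed)
V_rest, V_thresh = -65.0, -55.0
dt, t_end = 0.1, 50.0

rng = np.random.default_rng(3)
time = np.arange(0.0, t_end, dt)

def run(n_excitatory, n_inhibitory, w_exc=0.8, w_inh=-0.6):
    """Random arrival times for each synaptic input; returns spike times."""
    arrivals = [(rng.uniform(0, t_end), w_exc) for _ in range(n_excitatory)] + \
               [(rng.uniform(0, t_end), w_inh) for _ in range(n_inhibitory)]
    V, spikes = V_rest, []
    for t in time:
        V += dt * (V_rest - V) / tau_m                     # passive leak
        V += sum(w for (ta, w) in arrivals if abs(ta - t) < dt / 2)
        if V >= V_thresh:                                  # soma-axon hillock
            spikes.append(round(t, 1))
            V = V_rest                                     # reset after spike
    return spikes

print("weak drive  :", run(n_excitatory=30, n_inhibitory=30))
print("strong drive:", run(n_excitatory=150, n_inhibitory=30))

With weak drive the summed potentials stay subthreshold and no spikes are produced; with strong drive the threshold is crossed repeatedly. The remainder of this section describes why this purely passive picture is incomplete.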
The shapes and sizes of the dendrites of different neurons exhibit striking diversity. Dendrites grow significantly during development, especially when the animal is reared in a sensory-rich environment.141 Diverse active properties have been found in dendrites since the advent of the fluorescence imaging and patch clamp techniques. The subject has been reviewed by Johnston et al.208 The conclusions were based on experiments done in hippocampal pyramidal neurons, cerebellar Purkinje cells, and neocortical pyramidal neurons. Some salient points are summarized below. There is little doubt that dendrites propagate action potentials, in addition to electrotonically spreading postsynaptic potentials to the soma. Action potentials initiated at the soma-axon hillock region propagate antidromically back into dendrites. Action potentials can also start from synapses at dendrites and propagate orthodromically towards the soma. Dendritic membranes contain Na+ channels more or less uniformly. However, the distribution of Ca2+ channels is more varied than that of Na+ channels, and a great deal of diversity of different types of Ca2+ channels appears in different parts of dendrites and in different types of neurons. However, very little is known about K+ channels. Conventional wisdom used to regard the space constant of dendrites as too small to allow local graded postsynaptic potentials generated at the synapses to spread electrotonically to the soma. Precise data from time and space constant measurements revealed that these local potentials do reach the soma electrotonically. The interplay between local graded postsynaptic responses and action potentials that propagate in both orthodromic and antidromic directions essentially creates an intraneuronal network that involves both mesoscopic and microscopic events. Johnston et al.208 speculated about possible roles of dendritic processes in synaptic integration and neuronal plasticity. A possible important role of dendritic voltage-gated (Na+ and Ca2+) channels is boosting local postsynaptic activities. Although excitatory postsynaptic potentials originating in dendrites can reach the soma, they are attenuated several fold before reaching the soma owing to the cable property of dendrites (electrotonic attenuation due to a small space constant). Activation of voltage-gated channels helps sustain these graded responses beyond their "normal" range dictated by the space constant. On the other hand, the antidromic propagation of action potentials from the soma-axon hillock region back to the dendrites serves to reset these local responses. Timing of the interplay between these two types of electrical events critically affects how information processing is carried out within the same
neuron. The changes of the intracellular Ca2+ concentration are worth noting. A consistent finding in hippocampal pyramidal neurons during trains of action potentials is a characteristic M-shaped profile of the intracellular Ca2+ concentration. The rise in the intracellular Ca2+ concentration is small in the soma, highest in the proximal apical and basal dendrites, and small again in the distal apical dendrites. Evidence suggests that the characteristic distribution is not a consequence of a non-uniform distribution of Ca2+ channels, but rather is due to the uneven Ca2+ influx driven by Na+-dependent action potentials. This characteristic is reminiscent of the spike-like Ca2+ concentration profile under the active zone of a synapse during the voltage-gated Ca2+ influx (Sec. 7.4). Both of them are dynamic entities. Voltage-gated Ca2+ channels are present throughout the dendritic tree of hippocampal pyramidal neurons. However, there appears to be a differential distribution of Ca2+ channel types in the soma and dendrites. In contrast, cerebellar Purkinje cell dendrites lack Na+-dependent action potentials. However, neocortical dendrites are capable of sustaining action potentials, both Na+- and Ca2+-dependent. The diverse variety of Ca2+ channels (e.g., high-voltage-gated vs. low-voltage-gated) and their elaborate distribution along the dendritic tree offer the potential of modulating events such as LTPs and LTDs, since activation of these channels will alter the Ca2+ concentration in the dendrites (Sec. 9). Different levels of concentration changes may produce different types of plasticity.10,254 Thus, the dendritic processes may impact on neuronal plasticity in a profound way. In this regard, two types of Ca2+ concentration changes need to be differentiated. The subthreshold Ca2+ entry caused by activation of low-threshold Ca2+ channels tends to be highly localized. The superthreshold Ca2+ entry caused by back-propagating action potentials tends to be global.

11. Efficiency of Biocomputing

The "disanalogy" between the human brain and a digital computer is summed up in the trade-off principle proposed by Conrad.82,76 The trade-off principle states that "a computing system cannot have all of the following three properties: structural programmability, high computational efficiency, and high evolutionary adaptability." A von Neumann type sequential machine is structurally programmable mainly because all internal dynamic processes are suppressed and are, in fact, material-independent in
A digital computer could employ any kind of switching hardware: from the primitive abacus and mechanical calculators to machines constructed with vacuum tubes or state-of-the-art solid-state components. As a consequence, it is possible for a digital computer to have a finite and simple user manual. The hardware of a digital computer consists of a small number of building blocks, such as logic gates and flip-flops, and its complexity lies in their combinations and interconnections. Likewise, the assembly (programming) language consists of a small vocabulary (the program instruction set); the complexity lies in the software, which is a linear linkage of these codes with well-defined branching sequences. As for a living organism, its operation critically depends on the internal dynamics of its constituents, as depicted by the M-m-m scheme of Conrad. The internal dynamics comprises a myriad of interactions at several different levels of hierarchy. These interactions are what Rosen referred to as causal entailments.323

The Human Genome Project has now been completed by two competing teams, amid bitter disputes, at about the same time,196,382 and a "blueprint" for the "design" of the human body is available. However, according to Conrad's interpretation,74,82 it is not a complete user manual in the conventional sense, because the genome does not automatically furnish step-by-step instructions for the operation and maintenance of the "biocomputer." Rosen's analysis showed that it is impossible to do so; there are simply too many causal entailments in bioorganisms, some of which are unknown, to be included in a user manual. As Ezzell104 put it, the next steps towards further elucidation of the internal dynamics are: a) to identify all the proteins made in a given cell, tissue or organism, b) to determine how those proteins join forces to form networks, and c) to outline the high-resolution three-dimensional structures of the proteins in an effort to find sites of possible drug actions. These endeavors are collectively known as proteomics. They constitute perhaps the biggest piece of domain-specific knowledge needed for the implementation of what is known as molecular medicine: the biological management of human diseases via knowledge of molecular biology. Every week, a large body of new data regarding signal transduction is added to the literature (see Ref. 315 and accompanying articles on signal transduction). With additional enhancements, the genome can then be used as a "road map" for the development of new drugs and new treatments for disease management.

From the foregoing discussions, it is evident that one of the "complications" of including internal dynamic processes in biocomputing is the inclusion of uncertainties in the operation.
for the "construction" (development) of a living organism are included in the genome, of which each and every cell — with the exception of red blood cells — carries an identical copy. However, the "programs" dictated by the genetic codes explicitly specify only the primary structure of proteins; the secondary, tertiary and quaternary structures are only implicitly specified by the genetic codes. In the terminology of Yates,398 the formation of the primary structure of a protein is program-driven. The remaining procedures of "construction" (protein folding, protein targeting, etc.) follow the physics and chemistry of the existing components with regard to both spatial and temporal arrangements, and is what Yates called execution-driven. In plain English, the program-driven parts are those causal entailments dictated by nature, whereas the execution-driven parts are those causal entailments shaped by the environment (nurture). The triumph of the Human Genome Project has spurred the advocacy of molecular medicine. The long-standing debate of nature-versus-nurture is likely to be revived, and the camp that advocates biological determinism may regard that as a major triumph. In the past, the nature-versus-nurture problem was investigated by studies of identical (monozygotic) twins reared apart or reared together.344'210 These twin studies failed to settle the question because interpretations were not as straightforward as we were led to believe (see Farber's reanalysis,106 Sec. 4.24 of Chapter 2). Ironically, the advance in molecular biology also provides an approach to settle this debate. That the genetic codes do not dictate the phenotypes in absolute terms was dramatically demonstrated by cloning of a cat (Felis domesticus) named CC (acronym for Copy Cat or Carbon Copy), by means of nuclear transplantation of a "processed" donor cumulus cell from an adult multicolored cat:345 the kitten's coat-coloration pattern is similar to that of the nuclear-donor cat but not an exact carbon copy (compare figures in Refs. 170 and 345). Even before the advent of molecular biology, few would dispute the biological basis of human diseases, and the question has always revolved around the relative contributions of nature and nurture to the etiology of diseases. Psychiatry may be an exception. For a long time, mental illness exhibit no detectable "organic lesions" (objective physical evidence). Though a neurologist in training, Sigmund Freud, the founder of the psychoanalysis school, rose to counter the idea that "hereditary trait" was the immutable cause of mental illness and steadfastly kept his inner circle from linking his concepts with biology. As Farber106 pointed out, Freud did so probably as a protection for his nascent discipline, but the need for such protection no longer exists. Yet his followers continue to "psychologize" in ever more
Yet his followers continue to "psychologize" in ever more esoteric ways. This might have led to the isolation of psychoanalysts from mainstream cognitive scientists and the subsequent fall from grace of the psychoanalytic school (cf. Secs. 4.8 and 6.12 of Chapter 2). In the meantime, increasing evidence began to link diseases such as schizophrenia and depression to dysfunctions of brain circuits, plasticity and molecular mechanisms.6,204 This advance raises the hope of genetic interventions in the prevention and treatment of mental illnesses. On the other hand, medical research on major diseases has shown that genetic and environmental factors, including diet and life-style, both contribute to cardiovascular diseases, cancers, and other major causes of mortality.391 Schizophrenia is known to be multifactorial in origin, with both genetic and environmental contributions.328 Noting the rise of the genocentric view in medical research, Rees316 called for a balance between molecular medicine research and patient-oriented research that addresses the effects of diet, life style and the environment. Strohman361 pointed out that human disease phenotypes are controlled not only by genes but also by "lawful self-organizing networks that display system-wide dynamics," i.e., Conrad's M-m-m dynamics. Primatologist de Waal92 called for an end to the dispute over the dichotomy between nature and nurture, and suggested a combination of "the best of both worlds." This is not, in Farber's words, "a middle-of-the-road fuzzing of boundaries," but rather a consequence of an enhanced understanding of biocomputing.

Interestingly, the programs specified by the genome are not completely fixed. Faithful copying of DNA into messenger RNA is restricted to "hard-wired" organisms only. In "soft-wired" organisms, processing of RNA allows for two mutually exclusive outcomes in some of the steps: a default outcome, in the absence of certain regulatory signals, and an alternative outcome, in the presence of appropriate signals.158 The alternative outcome regulates cellular responses as well as other RNA processing events. Through reverse transcription, favorable outcomes selected by evolution may be incorporated into the genome. In brief, the genome provides the top-down instructions, but the development is, to a significant extent, a bottom-up process, subject to influences from the environment (indeed a combination of the best of both worlds).f
f Korner and Matsumoto229,230 indicated that both the cognitive process and the development of the brain are subject to a top-down constraint (Sec. 4.17 of Chapter 2).
Thus, the user manual must include all physical and chemical interactions of a vast array of molecular components, taking into consideration the unraveling of their spatial and temporal relationships as well as unforeseen interference from external contingencies of the environment. The influence of external contingencies is, however, also limited in its ability to reshape the development: external contingencies set new constraints on the unraveling of a new spatial-temporal order of existing structures and components. Thus, the structural programmability of a biocomputer is rather limited.

On the other hand, a digital computer has poor evolutionary adaptability, primarily because of its top-down design. In a von Neumann type sequential digital computer, the utilization efficiency of the hardware resource is extremely low; the majority of the hardware components are often idling, waiting for their turn for a piece of the action. Attempts to reduce idling time and increase hardware utilization have spawned the design of parallel computers using a large number of central processing units (CPUs). An additional strategy of decentralization, using microprocessors for local control, is at the heart of the design of smart printers, for example. Still, the top-down design makes the digital computer ill-suited for decentralization and parallel distributed processing. The top-down design is epitomized by the use of a master clock to orchestrate the computation: all transactions must conform to this clock cycle or to subcycles tied to it. Take, for example, the exchange of data between the CPU of a digital computer and a peripheral device such as a modem. The (asynchronous) communication between the two requires some sort of "hardware handshake" (notification/acknowledgment). Such protocols are top-down in spirit; they amount to a bureaucratic scheme of "execute when, and only when, told to do so." In biocomputing, by contrast, timing is implemented by means of the intrinsic reaction kinetics of the biochemical reactions themselves and is influenced by many intricate positive and negative feedback loops, predominantly at the local level. Computer science and engineering have made impressive progress in overcoming the limitations just mentioned (see Secs. 4.26 and 7 of Chapter 2). Still, a digital environment is inherently ill-suited for such remedies. Ostensibly, the reliability (or, rather, predictability) of biocomputing diminishes without a rigid top-down control. However, the ability of biosystems to make limited and subtle errors is an asset rather than a liability. Subtle errors may be the sources of evolutionary adaptability, creativity and, perhaps, also free will (Sec. 5 of Chapter 2).
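Before moving on, the contrast drawn above between clock-driven handshaking and locally timed reaction kinetics can be caricatured in a few lines of code. The Python sketch below is an editorial illustration, not part of the original survey; the function names, the schedule, the rate constant and the coin-flip model of a reaction event are all invented assumptions.

```python
import random

# Top-down scheme: the peripheral moves data only when the master clock
# reaches a scheduled cycle and the controller issues an explicit "go".
def clock_driven_transfers(n_cycles=20):
    transferred = 0
    for cycle in range(n_cycles):
        controller_says_go = (cycle % 4 == 0)   # master clock dictates the schedule
        if controller_says_go:                  # "execute when, and only when, told to"
            transferred += 1
    return transferred

# Local scheme: each "reaction" fires according to its own intrinsic rate,
# modulated by a local negative feedback, with no global clock at all.
def kinetics_driven_events(base_rate=0.3, n_steps=20, seed=0):
    rng = random.Random(seed)
    product = 0
    for _ in range(n_steps):
        feedback = 1.0 / (1.0 + product)        # local feedback slows further events
        if rng.random() < base_rate * feedback: # intrinsic, locally timed kinetics
            product += 1
    return product

if __name__ == "__main__":
    print("clock-driven transfers :", clock_driven_transfers())
    print("kinetics-driven events :", kinetics_driven_events())
```

The point of the caricature is only that the first scheme cannot act between clock ticks, whereas the second has no notion of a tick at all; its timing emerges from rates and feedback.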
The somewhat diminished role of top-down constraints is the source of efficiency and evolutionary adaptability in biosystems. A prerequisite for a living organism to work properly is the self-organizing ability of its components. In a human metaphor, a decentralized management system requires a motivated work force for its proper operation; conversely, an unmotivated work force often invites a bureaucratic crack-down.

12. General Discussion and Conclusion

Part 1 of this survey examines life processes in a multi-cellular organism and highlights the salient features of information processing that distinguish a human brain from a digital computer. The two-level macroscopic-microscopic (M-m) scheme of biological information processing, originally proposed by Conrad, serves as a convenient frame of reference. Two additional levels are added. Thus, four levels of processing dynamics are considered, in order of decreasing scale of the component sizes: a) the macroscopic dynamics at the intercellular level, b) the mesoscopic dynamics at the membrane level, c) the intracellular dynamics at the cytoplasmic and cytoskeletal level, and d) the intramolecular conformational dynamics of macromolecules. These four levels of dynamics do not operate independently but constitute a system of nested and interdependent networks. We consider representative examples of each level of dynamics and specifically examine how deterministic the input-output relation of the computation (the control law) is. For the sake of inquiring how deterministic the control laws are, a gray scale of determinism is defined, with absolute determinism and complete randomness at the two ends of the scale.

As frequently pointed out by Conrad, the predominant mode of biological information processing is not switch-based as in digital computing. Instead, it is based on the shape-fitting of macromolecules (the lock-key paradigm). The microscopic dynamics comprises a myriad of biochemical reactions which form the substrates of life processes. However, the dependence of biochemical reactions on molecular diffusion and random collisions seems incompatible with the orderly and meaningful information processing exhibited by the human brain. Conrad invoked a quantum mechanical speedup principle to address this concern. The absence of convincing experimental evidence motivated us to seek an alternative explanation based on known physico-chemical principles.
An examination of several well elucidated examples of molecular recognition makes it apparent that it is the short-range non-covalent bond interactions that transform initially random collisions into more deterministic processes at the range of close encounters. These short-range forces include the van der Waals force, hydrogen bonding, electrostatic and hydrophobic interactions, as well as the double-layer force. They allow for the deployment of a gradient strategy in molecular recognition that steers the encountering macromolecules towards the correct docking orientation and position rather than away from it. Thus, macromolecules search for each other in an exploratory fashion at long distances, whereas the search becomes more deterministic upon close encounter. Likewise, these short-range forces are also responsible for efficient protein folding. Protein folding was once thought to involve a random search for a stable conformation. The contemporary understanding indicates that this idea is oversimplified; a purely random search would take far too long, whereas a typical folding process is completed in seconds. The hierarchical processes of protein folding usually lead to a unique stable structure by excluding other possible stable structures as not being in the "main thoroughfare" of the folding process. Formation of the folding nucleus provides the top-down constraint on subsequent folding, so as to limit the range of searches for stable conformations. Borrowing terminology from cognitive science and operations research, molecular recognition and protein folding invoke heuristic searching instead of random searching.
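The distinction between blind long-range sampling and gradient-steered short-range docking can be illustrated with a toy one-dimensional "binding energy" landscape. The following Python sketch is an illustration added here, not a model from the survey; the energy function, the step size and the evaluation budget are arbitrary assumptions.

```python
import math
import random

def binding_energy(theta):
    """Toy interaction energy with one minimum (the correct docking orientation) at theta = 0."""
    return 1.0 - math.cos(theta)

def random_search(n_trials=30, seed=1):
    """Blind sampling of orientations: no use of local gradient information."""
    rng = random.Random(seed)
    return min(binding_energy(rng.uniform(-math.pi, math.pi)) for _ in range(n_trials))

def gradient_search(theta=2.5, step=0.2, n_steps=30):
    """Gradient strategy: short-range forces steer the molecule toward the minimum."""
    for _ in range(n_steps):
        grad = math.sin(theta)          # d(1 - cos theta)/d theta
        theta -= step * grad            # move downhill on the energy landscape
    return binding_energy(theta)

if __name__ == "__main__":
    # Same evaluation budget for both strategies.
    print("random search, best energy   :", round(random_search(), 5))
    print("gradient search, final energy:", round(gradient_search(), 5))
```

In one dimension blind sampling is not hopeless, but the gap between the two strategies widens rapidly with the number of degrees of freedom, which is the essence of the argument against purely random conformational search.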
The gradient strategy is apparently a consequence of evolutionary selection based on a recombination of material properties that are also shared by inanimate objects. That evolution is an important factor is suggested by the fact that a synthetic polypeptide formed by an arbitrary combination of naturally occurring amino acids often cannot fold properly. The short-range non-covalent bond interactions are of quantum mechanical origin, but they are not included in Conrad's quantum speedup mechanism. The latter mechanism specifically pertains to interactions between electronic configurations and the conformation of a macromolecule under non-Born-Oppenheimer conditions. In addition, evolution is not required for the latter mechanism to take effect. Specific examples dispute Conrad's claim that switch-based information processing is not relevant in biocomputing. The ubiquitous process of phosphorylation/dephosphorylation is a switching mechanism, although it does not achieve the degree of precision expected of absolute determinism. It appears that the microscopic dynamics taking place inside a cell utilizes a mixed digital and analog mode of information processing.
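To make the notion of a switch that falls short of absolute determinism concrete, the sketch below models the phosphorylated fraction of a hypothetical target protein as a balance between kinase and phosphatase activities plus a small random dispersion. The rates, the Gaussian noise term and the function name are assumptions of this editorial illustration, not data from the survey.

```python
import random

def phospho_switch(kinase_rate, phosphatase_rate, noise=0.05, seed=2):
    """Steady-state phosphorylated fraction of a hypothetical target protein.

    Digital aspect: the output is pushed toward 0 or 1 as one activity dominates.
    Analog aspect: the output is graded and carries a finite dispersion.
    """
    rng = random.Random(seed)
    fraction = kinase_rate / (kinase_rate + phosphatase_rate)  # simple mass-action balance
    return min(1.0, max(0.0, fraction + rng.gauss(0.0, noise)))

if __name__ == "__main__":
    # Kinase activity dominates: the switch reads "mostly on", but not exactly 1.
    print("signal present:", round(phospho_switch(kinase_rate=5.0, phosphatase_rate=0.5), 3))
    # Phosphatase dominates: the switch reads "mostly off", but not exactly 0.
    print("signal absent :", round(phospho_switch(kinase_rate=0.5, phosphatase_rate=5.0), 3))
```

The output is switch-like, sitting near 0 or near 1 depending on which activity dominates, yet analog in that it never reaches either extreme exactly, mirroring the mixed mode described above.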
Biocomputing performed inside a cell is linked to the macroscopic neural network via the plasma membrane. The membrane is more than just a cellular boundary and a transit station between the microscopic and the macroscopic dynamics. The surface and the interior of the membrane exhibit a rich computational dynamics: the mesoscopic dynamics. Biochemical reactions inside the membrane form a two-dimensional network. Several examples of electrostatic switching at both the membrane and the intracellular levels illustrate the importance of electrostatic interactions. Conventional wisdom claims that the concentration of a molecule can be adjusted quickly only if the lifetime of the molecule is short, i.e., that the adjustment of molecular concentrations can only be achieved by passive relaxation processes (turn-over).330 This claim is contradicted by experimental observations. Passive chemical relaxation, which does not require energy, is not the only way to change the concentration of a particular biomolecule. Often the rapid "reset" of a molecular function in biocomputing is achieved by phosphorylation. A typical example is the phosphorylation of photoactivated rhodopsin, which leads to the rapid recovery from visual excitation. In the mesoscopic dynamics, the intracellular Ca2+ concentration is usually increased by a passive Ca2+ influx following the opening of Ca2+ channels, but the subsequent decrease is often achieved by an ATP-consuming active transport process (Ca2+ sequestration). The effect of phosphorylation is, however, not purely electrostatic. Phosphorylation may exert its switching function by allosteric activation of enzymes, by steric hindrance, or by a combination of these mechanisms. Interestingly, it appears that evolution, as problem solving, did not follow a particular "ideological line of thinking." Rather, evolution was highly exploratory: it recruited whatever relevant mechanisms it could find, often in combination. Nevertheless, the modularity of biological structures and functions suggests that evolution invoked heuristic searching instead of random searching; evolution had not completely ignored "ideology" or preferences (Sec. 3.4 of Chapter 2).

Network or network-like interactions are present at all four levels of dynamics, thus achieving massively parallel distributed processing. However, the networks are not completely fixed but exhibit graded and dynamic connectivity. At the microscopic level, the quasi-network of biochemical pathways is degraded by diffusion and collisions. At the mesoscopic level, some components of membrane-bound biomachines appear to be "hard-wired," whereas other, mobile components link these hard-wired components together in a dynamic fashion.
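A minimal way to picture such graded, dynamic connectivity is a coupling strength with a fixed "hard-wired" part and a mobile part that depends on time and on an environmental signal. The sketch below is an editorial toy under those assumptions; the decay constant, the weights and the name of the modulator are invented for the example.

```python
import math

def connection_strength(t, modulator):
    """Toy time- and environment-dependent coupling between two 'hard-wired'
    membrane components, mediated by a mobile linker.

    The baseline coupling is fixed (hard-wired); the mobile component adds a
    contribution that depends on an environmental modulator (e.g., a ligand
    concentration) and decays with time as the linker diffuses away.
    """
    hard_wired = 0.5                                  # fixed structural coupling
    mobile = 0.5 * modulator * math.exp(-t / 10.0)    # graded, transient coupling
    return hard_wired + mobile

if __name__ == "__main__":
    for t in (0.0, 5.0, 20.0):
        print(f"t={t:>4}: low modulator {connection_strength(t, 0.2):.2f}, "
              f"high modulator {connection_strength(t, 1.0):.2f}")
```

The coupling is never zero (the hard-wired part persists), yet its effective weight varies with time and with the environment, which is the sense in which the control law itself can be time- or environment-dependent.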
Some control laws are time-dependent or environment-dependent. The variable connectivity of the network allows for a dynamic allocation of the computing resources. At the macroscopic level, the neural pathways are mostly hard-wired, but the relative synaptic weights can be altered by various mechanisms of neural plasticity. Cortical plasticity in higher animals shows an exceptional degree of malleability. In addition, intersynaptic diffusion of neurotransmitters blurs the spatial localization of neural plasticity and memory storage.20 Thus, the multi-level network of biocomputing is not exactly a connectionist model. The mixed architecture of partly "hard-wired" and partly loose connections seems to extend beyond the macroscopic dynamics. As will be demonstrated in Sec. 4.14 of Chapter 2, behaviorism represents the quasi-"hard-wired" aspect of behaviors (habits), whereas cognitivism represents the dynamically malleable aspect of behaviors (intelligence).

In a multi-cellular organism, highly organized components form an interlocking, reasonably stable but dynamic structure in a nested hierarchical manner. The various components at each of the four levels of organization work in a coherent and concerted manner, as if each part knew what the other parts expected and acted accordingly in harmony. The result is an enhanced functional performance suggestive of virtual intelligence. This virtual intelligence can be traced from the macroscopic level all the way down to the submolecular level. The concept of intelligent materials serves as a guiding principle for technology-minded investigators who attempt to engineer biomaterials for molecular device construction (Sec. 7.8 of Chapter 2).

The reliability and intelligence of a computing system depend critically on the control laws that govern the input-output relationship of each computational step. A digital computer employs a strictly deterministic control law (absolute determinism). Digital computing thus achieves high reliability and predictability. However, a slight error in design and/or programming often leads to a catastrophic failure (lack of gradualism and fault tolerance). In contrast, a completely random control law confers no purpose or reliability on computing. Biocomputing avoids either extreme. Most biocomputing processes follow a well-defined control law that is characterized by a mean value and a variance. However, processes such as ion channel operation in the mesoscopic dynamics do not follow a well-defined control law but are instead governed by a probabilistic control law. The control of ion channel operation constitutes one of the most critical steps of biocomputing: the generation of nerve impulses. The existence of ion channel fluctuations thus casts serious doubt on the claim of absolute biological determinism.
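What a probabilistic control law means in practice can be seen by simulating a population of two-state channels with fixed opening and closing probabilities. The code below is an editorial toy model, not taken from the survey; the channel count, transition probabilities and time step are arbitrary.

```python
import random

def simulate_channels(n_channels=100, p_open=0.02, p_close=0.1,
                      n_steps=1000, seed=3):
    """Two-state (closed/open) Markov gating of a population of channels.

    Even with fixed transition probabilities and a constant 'stimulus', the
    number of open channels fluctuates from step to step: the control law is
    probabilistic rather than deterministic.
    """
    rng = random.Random(seed)
    open_flags = [False] * n_channels
    counts = []
    for _ in range(n_steps):
        for i, is_open in enumerate(open_flags):
            p = p_close if is_open else p_open
            if rng.random() < p:                 # stochastic gating transition
                open_flags[i] = not is_open
        counts.append(sum(open_flags))
    return counts

if __name__ == "__main__":
    counts = simulate_channels()
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    print(f"mean open channels: {mean:.1f}, variance: {var:.1f}")
```

Nothing in the model changes from step to step, yet the number of open channels (and hence the current they would carry) fluctuates about its mean; the output is reproducible only in a statistical sense.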
We shall argue, in Sec. 5.8 of Chapter 2, that ion channel fluctuations are not a consequence of the intrusion of external interference, but are actually the manifestation of endogenous noise that is a key constituent of the meaningful operation of ion channels. Thus, control laws in biocomputing are not strictly deterministic, as in an all-digital design, but neither are they as random as is suggested by the diffusion and reactions of molecules in an isotropic aqueous medium. Tracing vertically from the microscopic dynamics, through the mesoscopic dynamics, towards the macroscopic dynamics, the control laws swing alternately from completely random, to probabilistically defined with limited and controlled randomness, to well defined with a finite but non-zero dispersion. Sometimes the control laws of biocomputing are time-dependent, as exemplified by the arrival of neural impulses at the neuromuscular junction, or environment-dependent, as exemplified by the State 1-State 2 transition in green plant photosynthesis. Alternating cycles of analog and digital processing are evident at all levels of biocomputing dynamics. Thus, the control laws cover a gray scale from near zero to near unity. In no case do the control laws become strictly deterministic, i.e., unity on the gray scale of determinism. This is the price a living organism pays to include the internal dynamics, thus admitting analog processing as part of biocomputing.

Purely digital processing prevents any intrusion of dispersion in the computing steps, except at the interface with the outside world (where analog-to-digital conversion is required at the input stage, and digital-to-analog conversion at the output stage). However, purely digital computing offers severely limited flexibility. Biocomputing is much more flexible and is well suited for pattern recognition. Pattern recognition is an important element in the execution of intelligence in the human brain. Without the ability to recognize similar but not precisely identical patterns, it would be difficult to make predictions about the future. Thus, pattern recognition in the service of intelligence does not demand a perfect match. This forgiving tolerance of an imperfect match is the source of the brain's fault tolerance (Sec. 4.2 of Chapter 2). Strict and rigid matching eliminates a large part of intelligence and perhaps all of high creativity. This is why digital computing is ill-suited for pattern recognition. In brief, dynamic control laws are positioned strategically on the gray scale of determinism, thus optimally achieving both reliability and intelligence.

One of the major consequences of admitting analog processes in biocomputing is the decline, but not the complete loss, of predictability. Some computational outcomes are not sensitive to variations of input (or initial) conditions, but others are highly sensitive to errors and fluctuations of conditions at critical steps of interactions.
Some well-defined control laws in biocomputing confer reliability, stability and predictability on the biosystem, whereas the variable but controlled randomness inherent in biocomputing allows for evolutionary improvements, creativity and perhaps also the so-called free will. Thus, the essence of true intelligence in living organisms may be the ability to make and to exploit subtle "errors" that are absent in strictly digital computing. The inclusion of errors in biocomputing may not always be a hazard. It may indeed be a blessing, and the payoff is well worth the price.

Acknowledgments

This work was supported in part by a contract from the Naval Surface Warfare Center (N60921-91-M-G761) and a Research Stimulation Fund of Wayne State University. The author thanks the following individuals for critical reading and helpful discussion of the manuscript: Martin Blank, Krzysztof Bryl, Leonid Christophorov, Donald DeGracia, Stephen DiCarlo, Piero Foa, Hans Kuhn, Gabriel Lasker, Gen Matsumoto, Koichiro Matsuno, James Moseley, Nikolai Rambidi, Juliusz Sworakowski, Harold Szu, Ann Tate and Klaus-Peter Zauner. The author is indebted to the late Professor Michael Conrad of Wayne State University for his profound influence. The author thanks Robert Silver for help in locating key references. The editorial help provided by Stephen DiCarlo, Filbert Hong and Timothy Zammit during the preparation of the manuscript was indispensable and is deeply appreciated. Last but not least, I wish to thank my editor Vladimir B. Bajic for his guidance, editorial help and extraordinary patience.

References

1. G. Adam and M. Delbrück, Reduction of dimensionality in biological diffusion processes, in Structural Chemistry and Molecular Biology, Eds. A. Rich and N. Davidson (W. H. Freeman, San Francisco, 1968), pp. 198-215. 2. C. Adcock, G. R. Smith and M. S. P. Sansom, Electrostatics and the ion selectivity of ligand-gated channels. Biophys. J. 75, 1211-1222 (1998). 3. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts and J. D. Watson, Molecular Biology of the Cell, 3rd edition (Garland Publishing, New York and London, 1994). 4. U. Alexiev, R. Mollaaghababa, P. Scherrer, H. G. Khorana and M. P. Heyn, Rapid long-range proton diffusion along the surface of the purple membrane and delayed proton transfer into the bulk. Proc. Natl. Acad. Sci. USA 92, 372-376 (1995).
5. C. R. Anderson and C. F. Stevens, Voltage clamp analysis of acetylcholine produced end-plate current fluctuations at frog neuromuscular junction. J. Physiol. 235, 655-691 (1973). 6. N. C. Andreasen, Linking mind and brain in the study of mental illnesses: a project for a scientific psychopathology. Science 275, 1586-1593 (1997). 7. L.-E. Andr^asson and T. Vanngard, Electron transport in photosystems I and II. Annu. Rev. Plant Physiol. Plant Mol. Biol. 39, 379-411 (1988). 8. C M . Armstrong and F. Bezanilla, Currents related to movement of the gating particles of the sodium channels. Nature 242, 459-461 (1973). 9. C. M. Armstrong and F. Bezanilla, Charge movement associated with the opening and closing of the activation gates of the Na Channels. J. Gen. Physiol. 63, 533-552 (1974). 10. A. Artola and W. Singer, Long-term depression of excitatory synaptic transmission and its relationship to long-term potentiation. Trends Neurosci. 16, 480-487 (1993). 11. R. D. Astumian, Thermodynamics and kinetics of a Brownian motor. Science 276, 917-922 (1997). 12. R. D. Astumian, Making molecules into motors. Sci. Am. 285(1), 56-64 (2001). 13. R. D. Astumian and I. Derenyi, Fluctuation driven transport and models of molecular motors and pumps. Eur. Biophys. J. 27, 474-489 (1998). 14. R. L. Baldwin and G. D. Rose, Is protein folding hierarchic? I. local structure and peptide folding. Trends Biochem. Sci. 24, 26-33 (1999). 15. R. L. Baldwin and G. D. Rose, Is protein folding hierarchic? II. folding intermediates and transition states. Trends Biochem. Sci. 24, 77-83 (1999). 16. L. Band, I. Bertini, M. A. De la Rosa, D. Koulougliotis, J. A. Navarro and O. Walter, Solution structure of oxidized cytochrome CQ from the green alga Monoraphidium braunii. Biochemistry 37, 4831-4843 (1998). 17. L. Banci, I. Bertini, G. Quacquarini, O. Walter, A. Diaz, M. Hervas and M. A. De la Rosa, The solution structure of cytochrome CQfromthe green alga Monoraphidium braunii. J. Biol. Inorg. Chem. 1, 330-340 (1996). 18. J. Barber, Photosynthetic electron transport in relation to thylakoid membrane composition and organization. Plant, Cell and Environ. 6, 311-322 (1983). 19. J. Barber, Membrane conformational changes due to phosphorylation and the control of energy transfer in photosynthesis. Photobiochem. Photobiophys. 5, 181-190 (1983). 20. B. Barbour and M. Hausser, Intersynaptic diffusion of neurotransmitter. Trends Neurosci. 20, 377-384 (1997). 21. E. A. Barnard, Receptor classes and the transmitter-gated ion channels. Trends Biochem. Sci. 17, 368-374 (1992). 22. C. J. Batie and H. Kamin, Ferredoxin:NADP+ oxidoreductase: equilibria in binary and ternary complexes with NADP + and ferredoxin. J. Biol. Chem. 259, 8832-8839 (1984). 23. C. J. Batie and H. Kamin, Electron transfer by ferredoxin:NADP+ reductase: rapid-reaction evidence for participation of a ternary complex. J. Biol.
Chem. 259, 11976-11985 (1984). 24. M. Baudry and J. L. Davis, Eds., Long-Term Potentiation: A Debate of Current Issues (MIT Press, Cambridge, MA, and London, 1991). 25. M. Baudry and J. L. Davis, Eds., Long-Term Potentiation, Vol. 2 (MIT Press, Cambridge, MA, and London, 1994). 26. M. Baudry and J. L. Davis, Eds., Long-Term Potentiation, Vol. 3 (MIT Press, Cambridge, MA, and London, 1997). 27. M. Baudry and G. Lynch, Long-term potentiation: biochemical mechanisms, in Synaptic Plasticity: Molecular, Cellular, and Functional Aspects, Eds. M. Baudry, R. F. Thompson and J. L. Davis (MIT Press, Cambridge, MA, and London, 1993), pp. 87-115. 28. M. F. Bear and W. C. Abraham, Long-term depression in hippocampus. Annu. Rev. Neurosci. 19, 437-462 (1996). 29. J.-P. Behr, Ed., The Lock-and-Key Principle: The State of the AH — 100 Years On (John Wiley and Sons, Chichester and New York, 1994). 30. F. Benfenati, P. Greengard, J. Brunner and M. Bahler, Electrostatic and hydrophobic interactions of synapsin I and synapsin I fragments with phospholipid bilayers. J. Cell Biol. 108, 1851-1862 (1989). 31. M. R. Bennett, The concept of long term potentiation of transmission at synapses. Prog. Neurobiol. 60, 109-137 (2000). 32. M. R. Bennett and J. L. Kearns, Statistics of transmitter release at nerve terminals. Prog. Neurobiol. 60, 545-606 (2000). 33. N. Bennett and A. Sitaramayya, Inactivation of photoexcited rhodopsin in retinal rods: the roles of rhodopsin kinase and 48-kDa protein (arrestin). Biochemistry 27, 1710-1715 (1988). 34. O. G. Berg and P. H. von Hippel, Diffusion-controlled macromolecular interactions. Annu. Rev. Biophys. Biophys. Chem. 14, 131-160 (1985). 35. F. Bersani, Ed., Electricity and Magnetism in Biology and Medicine (Kluwer Academic/Plenum Publishers, New York, Boston, Dordrecht, London and Moscow, 1999). 36. F. Bezanilla and E. Stefani, Voltage-dependent gating of ionic channels. Annu. Rev. Biophys. Biomol. Struct. 23, 819-846 (1994). 37. R. S. Bhatnagar and J. I. Gordon, Understanding covalent modifications of proteins by lipids: where cell biology and biophysics mingle. Trends Cell Biol. 7, 14-20 (1997). 38. M. Blank, The surface compartment model (SCM) during transients. Bioelectrochem. Bioenerg. 9, 427-437 (1982). 39. M. Blank, Na,K-ATPase function in alternating electric fields. FASEB J. 6, 2434-2438 (1992). 40. M. Blank, Ed., Electricity and Magnetism in Biology and Medicine (San Francisco Press, San Francisco, 1993). 41. M. Blank, Ed., Electromagnetic Fields: Biological Interactions and Mechanisms, Advances in Chemistry Series, No. 250 (American Chemical Society, Washington, DC, 1995). 42. M. A. Boden, The Creative Mind: Myths and Mechanisms (Basic Books, New York, 1991).
43. R. Bonneau and D. Baker, Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct. 30, 173-189 (2001). 44. H. R. Bourne, How receptors talk to trimeric G proteins. Curr. Opin. Cell Biol. 9, 134-142 (1997). 45. R. B. Bourret, K. A. Borkovich and M. I. Simon, Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu. Rev. Biochem. 60, 401-441 (1991). 46. I. A. Boyd and A. R. Martin, The end-plate potential in mammalian muscle. J. Physiol. 132, 74-91 (1956). 47. K. T. Brown and M. Murakami, A new receptor potential of the monkey retina with no detectable latency. Nature 201, 626-628 (1964). 48. A. T. Brunger, Structure of proteins involved in synaptic vesicle fusion in neurons. Annu. Rev. Biophys. Biomol. Struct. 30, 157-171 (2001). 49. J. L. Buchbinder, V. L. Rath and R. J. Fletterick, Structural relationships among regulated and unregulated phosphorylases. Annu. Rev. Biophys. Biomol. Struct. 30, 191-209 (2001). 50. H. F. Bunn, B. G. Forget and H. M. Ranney, Human Hemoglobins (Saunders, Philadelphia, London and Toronto, 1977). 51. D. V. Buonomano and M. M. Merzenich, Cortical plasticity: from synapses to maps. Annu. Rev. Neurosci. 21, 149-186 (1998). 52. D. S. Cafiso and W. L. Hubbell, Light-induced interfacial potentials in photoreceptor membranes. Biophys. J. 30, 243-263 (1980). 53. K. Cai, Y. Itoh and H. G. Khorana, Mapping of contact sites in complex formation between transducin and light-activated rhodopsin by covalent crosslinking: use of a photoactivatable reagent. Proc. Natl. Acad. Sci. USA 98, 4877-4882 (2001). 54. W. H. Calvin, The emergence of intelligence. Sci. Am. 271(4), 100-107 (1994). 55. C. J. Camacho, Z. Weng, S. Vajda and C. DeLisi, Free energy landscapes of encounter complexes in protein-protein association. Biophys. J. 76, 1166— 1178 (1999). 56. R. A. Capaldi, V. Darley-Usmar, S. Fuller and F. Millett, Structural and Functional features of the interaction of cytochrome c with complex III and cytochrome c oxidase. FEBS Lett. 138, 1-7 (1982). 57. G. Careri, P. Fasella and E. Gratton, Statistical time events in enzymes: a physical assessment. CRC Crit. Rev. Biochem. 3, 141-164 (1975). 58. J. Chad, J. W. Deitmer and R. Eckert, Spatio-temporal characteristics of Ca2+ dispersal following its injection into Aplysia neurons (Abstr. T-Pos8). Biophys. J. 45 (2, Pt.2), 181a (1984). 59. H. S. Chan, Modelling protein density of states: additive hydrophobic effects are insufficient for calorimetric two-state cooperativity. Proteins: Struct. Fund. Genet. 40, 543-571 (2000). 60. J. Chen, C. L. Makino, N. S. Peachey, D. A. Baylor and M. I. Simon, Mechanisms of rhodopsin inactivation in vivo as revealed by a COOH-terminal truncation mutant. Science 267, 374-377 (1995). 61. V. A. Chinarov, Y. B. Gaididei, V. N. Kharkyanen and S. P. Sit'ko, Ion pores
in biological membranes as self-organized bistable systems. Phys. Rev. A46, 5232-5241 (1992). 62. P. R. Chitnis, Q. Xu, V. P. Chitnis and R. Nechushtai, Function and organization of Photosystem I polypeptides. Photosynth. Res. 44, 23-40 (1995). 63. L. N. Christophorov, Dichotomous noise with feedback and charge-conformational interactions. J. Biol. Physics 22, 197-208 (1996). 64. L. N. Christophorov, Self-controlled flow processing by biomolecules. Solid State Ionics 97, 83-88 (1997). 65. L. N. Christophorov, V. N. Kharkyanen and S. P. Sitko, On the concept of the nonequilibrium conformon (self-organization of a selected degree of freedom in biomolecular systems). J. Biol. Physics 18, 191-202 (1992). 66. D. E. Clapham, The G-protein nanomachine. Nature 379, 297-299 (1996). 67. M. Colledge and J. D. Scott, AKAPs: from structure to function. Trends Cell Biol. 9, 216-221 (1999). 68. G. L. Collingridge and J. C. Watkins, Eds., The NMDA Receptor, 2nd edition (Oxford University Press, Oxford, New York and Tokyo, 1994). 69. A.-O. Colson, J. H. Perlman, A. Smolyar, M. C. Gershengorn and R. Osman, Static and dynamic roles of extracellular loops in G-protein-coupled receptors: a mechanism for sequential binding of thyrotropin-releasing hormone to its receptor. Biophys. J. 74, 1087-1100 (1998). 70. M. Conrad, Information processing in molecular systems. Currents in Modern Biology (now BioSystems) 5, 1-14 (1972). 71. M. Conrad, Evolutionary adaptability of biological macromolecules. J. Mol. Evolution 10, 87-91 (1977). 72. M. Conrad, Adaptability: The Significance of Variability from Molecule to Ecosystem (Plenum, New York and London, 1983). 73. M. Conrad, Microscopic-macroscopic interface in biological information processing. BioSystems 16, 345-363 (1984). 74. M. Conrad, On design principles for a molecular computer. Comm. ACM 28, 464-480 (1985). 75. M. Conrad, The brain-machine disanalogy. BioSystems 22, 197-213 (1989). 76. M. Conrad, Molecular computing, in Advances in Computers, Vol. 31, Ed. M. C. Yovits (Academic Press, Boston, San Diego, New York, London, Sydney, Tokyo and Toronto, 1990), pp. 235-324. 77. M. Conrad, Ed., Special issue on molecular computing. IEEE Computer 25(11), 6-67 (1992). 78. M. Conrad, Molecular computing: the lock-key paradigm. IEEE Computer 25(11), 11-20 (1992). 79. M. Conrad, Quantum molecular computing: the self-assembly model. Int. J. Quant. Chem: Quantum Biology Symp. 19, 125-143 (1992). 80. M. Conrad, Amplification of superpositional effects through electronic-conformational interactions. Chaos Solitons and Fractals 4, 423-438 (1994). 81. M. Conrad, Speedup of self-organization through quantum mechanical parallelism, in On Self-Organization, Springer Series in Synergetics, Vol. 61, Eds. R. K. Mishra, D. Maaß and E. Zwierlein (Springer-Verlag, Berlin and Heidelberg, 1994), pp. 92-108.
82. M. Conrad, The price of programmability, in The Universal Turing Machine: A Half-Century Survey, 2nd edition, Ed. R. Herken (Springer-Verlag, Wien and New York, 1995), pp. 261-281. 83. M. Conrad, R. R. Kampfher and K. G. Kirby, Simulation of a reactiondiffusion neuron which learns to recognize events (Appendix to: M. Conrad, Rapprochement of artificial intelligence and dynamics). Eur. J. Oper. Research 30, 280-290 (1987). 84. T. E. Creighton, Ed., Protein Folding (W. H. Freeman, New York, 1992). 85. G. A. Cutsforth, R. N. Whitaker, J. Hermans and B. R. Lentz, A new model to describe extrinsic protein binding to phospholipid membranes of varying composition: application to human coagulation proteins. Biochemistry 28, 7453-7461 (1989). 86. H. Daniel, C. Levenes and F. Crepel, Cellular mechanisms of cerebellar LTD. Trends Neurosci. 21, 401-407 (1998). 87. N. Davidson, Weak interactions and the structure of biological macromolecules, in The Neurosciences: A Study Program, Eds. G. C. Quarton, T. Melnechuk and F. O. Schmitt (Rockefeller University Press, New York, 1967), pp. 46-56. 88. M. Davis, The role of the amygdala in fear and anxiety. Annu. Rev. Neurosci. 15, 353-375 (1992). 89. N. W. Daw, P. S. G. Stein and K. Fox, The role of NMDA receptors in information processing. Annu. Rev. Neurosci. 16, 207-222 (1993). 90. R. A. de Graaf, A. van Kranenburg and K. Nicolay, In vivo 31P-NMR diffusion spectroscopy of ATP and phosphocreatine in rat skeletal muscle. Biophys. J. 78, 1657-1664 (2000). 91. B. De la Cerda, J. A. Navarro, M. Hervas and M. A. De la Rosa, Changes in the reaction mechanism of electron transfer from plastocyanin to Photosystem I in the cyanobacterium Synechocystis sp. PCC 6803 as induced by site-directed mutagenesis of the copper protein. Biochemistry 36, 1012510130 (1997). 92. F. B. M. de Waal, The end of nature versus nurture. Sci. Am. 281(6), 94-99 (1999). 93. J. Deisenhofer and H. Michel, The photosynthetic reaction center from the purple bacterium Rhodopseudomonas viridis. Science 245, 1463-1473 (1989). 94. D. L. Deitcher, A. Ueda, B. A. Stewart, R. W. Burgess, Y. Kidokoro and T. L. Schwarz, Distinct requirements for evoked and spontaneous release of neurotransmitter are revealed by mutations in the Drosophila gene neuronal-synaptobrevin. J. Neurosci. 18, 2028-2039 (1998). 95. J. del Castillo and B. Katz, Quantal components of the end-plate potential. J. Physiol. 124, 560-573 (1954). 96. C. M. Drain, B. Christensen and D. Mauzerall, Photogating of ionic currents across a lipid bilayer. Proc. Natl. Acad. Sci. USA 86, 6959-6962 (1989). 97. J. Earman, A Primer on Determinism, University of Western Ontario Series in Philosophy of Science, Vol. 32 (D. Reidel Publishing, Dordrecht, Boston, Lancaster and Tokyo, 1986).
98. W. A. Eaton, V. Muiioz, S. J. Hagen, G. S. Jas, L. J. Lapidus, E. R. Henry and J. Hofrichter, Fast kinetics and mechanisms in protein folding. Annu. Rev. Biophys. Biomol. Struct. 29, 327-359 (2000). 99. R. C. Eberhart and R. W. Dobbins, Early neural network development history: the age of Camelot. IEEE Eng. Med. Biol. Magaz. 9(3), 15-18 (1990). 100. M. Ejdeback, A. Bergkvist, B. G. Karlsson and M. Ubbink, Side-chain interactions in the plastocyanin-cytochrome / complex. Biochemistry 39, 50225027 (2000). 101. R. Elber and M. Karplus, Multiple conformational states of proteins: a molecular dynamics analysis of myoglobin. Science 235, 318-321 (1987). 102. D. Emeis and K. P. Hofmann, Shift in the relation between flash-induced metarhodopsin I and metarhodopsin II within the first 10% rhodopsin bleaching in bovine disc membranes. FEBS Lett. 136, 201-207 (1981). 103. D. Emeis, H. Kiihn, J. Reichert and K. P. Hofmann, Complex formation between metarhodopsin II and GTP-binding protein in bovine photoreceptor membranes leads to a shift of the photoproduct equilibrium. FEBS Lett. 143, 29-34 (1982). 104. C. Ezzell, Proteins rule. Sci. Am. 286(4), 40-47 (2002). 105. K. Fahmy, Binding of transducin and transducin-derived peptides to rhodopsin studied by attenuated total reflection-Fourier transform infrared difference spectroscopy. Biophys. J. 75, 1306-1318 (1998). 106. S. L. Farber, Identical Twins Reared Apart: A Reanalysis (Basic Books, New York, 1981). 107. P. Fatt and B. Katz, Spontaneous subthreshold activity at motor nerve endings. J. Physiol. 117, 109-128 (1952). 108. A. Fersht, Enzyme Structure and Mechanism, 2nd edition (W. H. Freeman, New York, 1985). 109. R. D. Fields, The other half of the brain. Sci. Am. 290(4), 54-61 (2004). 110. D. Finley and V. Chau, Ubiquitination. Annu. Rev. Cell Biol. 7, 25-69 (1991). 111. O. I. Fisun and A. V. Savin, Homochirality and long-range transfer in biological systems. BioSystems 27, 129-135 (1992). 112. G. P. Foust, S. G. Mayhew and V. Massey, Complex formation between ferredoxin triphosphopyridine nucleotide reductase and electron transfer proteins. J. Biol. Chem. 244, 964-970 (1969). 113. R. R. Franke, B. Konig, T. P. Sakmar, H. G. Khorana and K. P. Hofmann, Rhodopsin mutants that bind but fail to activate transducin. Science 250, 123-125 (1990). 114. C. Frazao, C. M. Soares, M. A. Carrondo, E. Pohl, Z. Dauter, K. S. Wilson, M. Hervas, J. A. Navarro, M. A. De la Rosa and G. M. Sheldrick, Ab initio determination of the crystal structure of cytochrome CQ and comparison with plastocyanin. Structure 3, 1159-1169 (1995). 115. S. J. Freeland and L. D. Hurst, The genetic code is one in a million. J. Mol. Evol. 47, 238-248 (1998). 116. S. J. Freeland and L. D. Hurst, Evolution encoded. Sci. Am. 290(4), 84-91
(2004). 117. S. C. Proehner, Regulation of ion channel distribution at synapses. Annu. Rev. Neurosci. 16, 347-368 (1993). 118. S. C. Proehner and V. Bennett, Eds., Cytoskeletal Regulation of Membrane Function, Society of General Physiologists Series, Vol. 52 (Rockefeller University Press, New York, 1997). 119. R. Pukuda, J. A. McNew, T. Weber, F. Parlati, T. Engel, W. Nickel, J. E. Rothman and T. H. Sollner, Functional architecture of an intracellular membrane t-SNARE. Nature 407, 198-202 (2000). 120. K. Fukunaga and E. Miyamoto, Current studies on a working model of CaM kinase II in hippocampal long-term potentiation and memory. Jap. J. of Pharmacol. 79, 7-15 (1999). 121. J. M. Fuster, Network memory. Trends Neurosci. 20, 451-459 (1997). 122. B. Gabriel and J. Teissie, Proton long-range migration along protein monolayers and its consequences on membrane coupling. Proc. Natl. Acad. Sci. USA 93, 14521-14525 (1996). 123. J.-L. Galzi, A. Devillers-Thiery, N. Hussy, S. Bertrand, J.-P. Changeux and D. Bertrand, Mutations in the channel domain of a neuronal nicotinic receptor convert ion selectivity from cationic to anionic. Nature 359, 500-505 (1992). 124. A. Garbers, F. Reifarth, J. Kurreck, G. Renger and F. Parak, Correlation between protein flexibility and electron transfer from Q^* to Q B in PSII membrane fragments from spinach. Biochemistry 37, 11399-11404 (1998). 125. R. B. Gennis, Biomembranes: Molecular Structure and Function (SpringVerlag, New York, Berlin, Heidelberg, London, Paris and Tokyo, 1989). 126. J. E. Gerst, SNAREs and SNARE regulators in membrane fusion and exocytosis. Cell. Mol. Life Sci. 55, 707-734 (1999). 127. S. K. Gibson, J. H. Parkes and P. A. Liebman, Phosphorylation stabilizes the active conformation of rhodopsin. Biochemistry 37, 11393-11398 (1998). 128. S. K. Gibson, J. H. Parkes and P. A. Liebman, Phosphorylation alters the pH-dependent active state equilibrium of rhodopsin by modulating the membrane surface potential. Biochemistry 38, 11103-11114 (1999). 129. S. K. Gibson, J. H. Parkes and P. A. Liebman, Phosphorylation modulates the affinity of light-activated rhodopsin for G protein and arrestin. Biochemistry 39, 5738-5749 (2000). 130. L. M. Gierasch and J. King, Eds., Protein Folding: Deciphering the Second Half of the Genetic Code (American Association for the Advancement of Science, Washington, DC, 1990). 131. A. G. Gilman, G proteins and dual control of adenylate cyclase. Cell 36, 577-579 (1984). 132. C. Gomez-Moreno, M. Martinez-Jiilvez, M. F. Fillat, J. K. Hurley and G. Tollin, Molecular recognition in protein complexes involved in electron transfer. Biochem. Soc. Trans. 24, 111-116 (1996). 133. G. A. Gonzalez and M. R. Montminy, Cyclic AMP stimulates somatostatin gene transcription by phosphorylation of CREB at serine 133. Cell 59, 675680 (1989).
134. L. Gonzalez, Jr. and R. H. Scheller, Regulation of membrane trafficking: structural insights from a Rab/effector complex. Cell 96, 755-758 (1999). 135. A. O. Goushcha, V. N. Kharkyanen and A. R. Holzwarth, Nonlinear lightinduced properties of photosynthetic reaction centers under low intensity irradiation. J. Phys. Chem. B101, 259-265 (1997). 136. A. O. Goushcha, V. N. Kharkyanen, G. W. Scott and A. R. Holzwarth, Self-regulation phenomena in bacterial reaction centers. I. general theory. Biophys. J. 79, 1237-1252 (2000). 137. Govindjee and W. J. Coleman, How plants make oxygen. Sci. Am. 262(2), 50-58 (1990). 138. J. Granzin, U. Wilden, H.-W. Choe, J. Labahn, B. Kraft and G. Biildt, X-ray crystal structure of arrestin from bovine rod outer segments. Nature 391, 918-921 (1998). 139. M. P. Gray-Keller, P. B. Detwiler, J. L. Benkovic and V. V. Gurevich, Arrestin with a single amino acid substitution quenches light-activated rhodopsin in a phosphorylation-independent fashion. Biochemistry 36, 7058-7063 (1997). 140. P. Greengard, F. Valtorta, A. J. Czernik and F. Benfenati, Synaptic vesicle phosphoproteins and regulation of synaptic function. Science 259, 780-785 (1993). 141. W. T. Greenough, Experimental modification of the developing brain. American Scientist (Sigma Xi) 63, 37-46 (1975). 142. M. G. Guerrero, J. Rivas, A. Paneque and M. Losada, Mechanism of nitrate and nitrite reduction in Chlorella cells grown in the dark. Biochem. Biophys. Res. Comm. 45, 82-89 (1971). 143. M. G. Guerrero, J. M. Vega and M. Losada, The assimilatory nitratereducing system and its regulation. Annu. Rev. Plant Physiol. 32, 169-204 (1981). 144. V. V. Gurevich and J. L. Benkovic, Visual arrestin binding to rhodopsin: diverse functional roles of positively charged residues within the phosphorylation-recognition region of arrestin. J. Biol. Chem. 270, 60106016 (1995). 145. H. R. Guy and F. Conti, Pursuing the structure and function of voltagegated channels. Trends Neurosci. 13, 201-206 (1990). 146. S. R. Hameroff, Ultimate Computing: Biomolecular Consciousness and NanoTechnology (North-Holland Publishing, Amsterdam, New York, Oxford and Tokyo, 1987). 147. S. R. Hameroff and R. Penrose, Conscious events as orchestrated space-time seletions, in Explaining Consciousness: The 'Hard Problem,' Ed. J. Shear (MIT Press, Cambridge, MA, and London, 1997), pp. 177-195. 148. H. E. Hamm, The many faces of G protein signaling. J. Biol. Chem. 273, 669-672 (1998). 149. H. E. Hamm, How activated receptors couple to G proteins. Proc. Natl. Acad. Sci. USA 98, 4819-4821 (2001). 150. Y. Hanyu and G. Matsumoto, Spatial long-range interactions in squid giant axons. Physica D49, 198-213 (1991).
151. P. A. Hargrave, Rhodopsin structure, function, and topography: the Friedenwald Lecture. Invest. Ophthal. Vis. Sci. 42, 3-9 (2001). 152. P. A. Hargrave and J. H. McDowell, Rhodopsin and phototransduction. Int. Rev. Cytology 137B, 49-97 (1992). 153. F. U. Hartl and J. Martin, Protein folding in the cell: the role of molecular chaperones Hsp70 and Hsp60. Annu. Rev. Biophys. Biomol. Struct. 21, 293322 (1992). 154. Y. Hatefi, The mitochondrial electron transport and oxidative phosphorylation system. Annu. Rev. Biochem. 54, 1015-1069 (1985). 155. R. D. Hawkins, H. Son and O. Arancio, Nitric oxide as a retrograde messenger during long-term potentiation in hippocampus. Prog. Brain Research 118, 155-172 (1998). 156. B. Hayes, Computing comes to life. American Scientist (Sigma Xi) 89, 204208 (2001). 157. D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory (John Wiley, New York, and Chapman and Hall, London, 1949). 158. A. Herbert and A. Rich, RNA processing in evolution: the logic of soft-wired genomes, in Molecular Strategies in Biological Evolution, Annal. NY Acad.
Sci. Vol. 870, Ed. L. H. Caporale (New York Academy of Sciences, New York, 1999), pp. 119-132. 159. M. Hervas, J. A. Navarro, A. Diaz, H. Bottin and M. A. De la Rosa, Laser-flash kinetic analysis of the fast electron transfer from plastocyanin and cytochrome c6 to Photosystem I: experimental evidence on the evolution of the reaction mechanism. Biochemistry 34, 11321-11326 (1995). 160. M. Hervas, J. A. Navarro, A. Diaz and M. A. De la Rosa, A comparative thermodynamic analysis by laser-flash absorption spectroscopy of Photosystem I reduction by plastocyanin and cytochrome c6 in Anabaena PCC 7119, Synechocystis PCC 6803, and spinach. Biochemistry 35, 2693-2698 (1996). 161. J. E. Hesketh and I. F. Pryme, Eds., The Cytoskeleton, Volume 1: Structure and Assembly (JAI Press, Greenwich, CT, and London, 1995). 162. J. E. Hesketh and I. F. Pryme, Eds., The Cytoskeleton, Volume 2: Role in Cell Physiology (JAI Press, Greenwich, CT, and London, 1996). 163. S. Hilfiker, V. A. Pieribone, A. J. Czernik, H.-T. Kao, G. J. Augustine and P. Greengard, Synapsins as regulators of neurotransmitter release. Phil. Trans. R. Soc. Lond. B354, 269-279 (1999). 164. D. W. Hilgemann, Cytoplasmic ATP-dependent regulation of ion transporters and channels: mechanisms and messengers. Annu. Rev. Physiol. 59, 193-220 (1997). 165. B. Hille, Ionic Channels of Excitable Membranes, 3rd edition (Sinauer Associates, Sunderland, MA, 2001). 166. B. Hille and D. M. Fambrough, Eds., Proteins of Excitable Membranes, Society of General Physiologists Series, Vol. 41 (Society of General Physiology and Wiley-Interscience, New York, Chichester, Brisbane, Toronto and Singapore, 1987). 167. P. C. Hinkle and R. E. McCarty, How cells make ATP. Sci. Am. 238(3), 104-123 (1978).
168. J. A. Hirsch, C. Schubert, V. V. Gurevich and P. B. Sigler, The 2.8 A crystal structure of visual arrestin: a model for arrestin's regulation. Cell 97, 257-269 (1999). 169. A. L. Hodgkin and A. F. Huxley, Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116, 449-472 (1952). 170. C. Holden, Carbon-copy clone is the real thing. Science 295, 1443-1444 (2002). 171. C. Holscher, Nitric oxide, the enigmatic neuronal messenger: its role in synaptic plasticity. Trends Neurosci. 20, 298-303 (1997). 172. F. T. Hong, Charge transfer across pigmented bilayer lipid membrane and its interfaces. Photochem. Photobiol. 24, 155-189 (1976). 173. F. T. Hong, Mechanisms of generation of the early receptor potential revisited. Bioelectrochem. Bioenerg. 5, 425-455 (1978). 174. F. T. Hong, Internal electric fields generated by surface charges and induced by visible light in bacteriorhodopsin membranes, in Mechanistic Approaches to Interactions of Electric and Electromagnetic Fields with Living Systems, Ed. M. Blank and E. Findl (Plenum, New York and London, 1987), pp. 161-186. 175. F. T. Hong, Electrochemical approach to the design of bioelectronic devices, in Proceedings of the 2nd International Symposium on Bioelectronic and Molecular Electronic Devices, December 12-14, 1988, Fujiyoshida, Tokyo, Ed. M, Aizawa (Research and Development Association for Future Electron Devices, Tokyo, 1988), pp. 121-124. 176. F. T. Hong, Relevance of light-induced charge displacements in molecular electronics: design principles at the supramolecular level. J. Mol. Electronics 5, 163-185 (1989). 177. F. T. Hong, Intelligent materials and intelligent microstructures in photobiology. Nanobiol. 1, 39-60 (1992). 178. F. T. Hong, A reflection on the First International Conference on Intelligent Materials. Intelligent Materials (Tokyo) 2(2), 15-18 (1992). 179. F. T. Hong, Bacteriorhodopsin as an intelligent material: a nontechnical summary, in 6th Newsletter of MEBC, Ed. A. E. Tate (International Society for Molecular Electronics and BioComputing, Dahlgren, VA, 1992), pp. 1317. 180. F. T. Hong, Do biomolecules process information differently than synthetic organic molecules? BioSystems 27, 189-194 (1992). 181. F. T. Hong, Mesoscopic processes in biocomputing: the role of randomness and determinism, in Proceedings of the 5th International Symposium on Bioelectronic and Molecular Electronic Devices and the 6th International Conference on Molecular Electronics and Biocomputing, November 28-30, 1995, Okinawa, Japan, Ed. M. Aizawa (Research and Development Association for Future Electron Devices, Tokyo, 1995), pp. 281-284. 182. F. T. Hong, Biomolecular computing, in Molecular Biology and Biotechnology: A Comprehensive Desk Reference, Ed. R. A. Meyers (VCH Publishers, New York, Weinheim and Cambridge, 1995), pp. 194-197.
183. F. T. Hong, Magnetic field effects on biomolecules, cells, and living organisms. BioSystems 36, 187-229 (1995). 184. F. T. Hong, Molecular electronic switches in photobiology, in CRC Handbook of Organic Photochemistry and Photobiology, Eds. W. M. Horspool and P.S. Song (CRC Press, Boca Raton, FL, New York, London and Tokyo, 1995), pp. 1557-1567. 185. F. T. Hong, Biomolecular electronics, in Handbook of Chemical and Biological Sensors, Eds. R. F. Taylor and J. S. Schultz (Institute of Physics Publishing, Bristol and Philadelphia, 1996), pp. 257-286. 186. F. T. Hong, Control laws in the mesoscopic processes of biocomputing, in Information Processing in Cells and Tissues, Eds. M. Holcombe and R. Paton (Plenum, New York and London, 1998), pp. 227-242. 187. F. T. Hong, Interfacial photochemistry of retinal proteins. Prog. Surface Set. 62, 1-237 (1999). 188. F. T. Hong, Molecular electronic switches in photobiology, in CRC Handbook of Organic Photochemistry and Photobiology, 2nd edition, Eds. W. Horspool and F. Lenci (CRC Press, Boca Raton, London, New York and Washington, DC, 2004), pp. 134-1-134-26. 189. J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities. Proc. Nail. Acad. Sci. USA 79, 2554-2558 (1982). 190. J. B. Hurley, Molecular properties of the cGMP cascade of vertebrate photoreceptors. Annu. Rev. Physiol. 49, 793-812 (1987). 191. J. H. Hurley and S. Misra, Signaling and subcellular targeting by membranebinding domains. Annu. Rev. Biophys. Biomol. Struct. 29, 49-79 (2000). 192. J. K. Hurley, M. Fillat, C. Gomez-Moreno and G. Tollin, Structure-function relationships in the ferredoxin/ferredoxin:NADP+ reductase system from Anabaena. Biochimie 77, 539-548 (1995). 193. J. K. Hurley, J. T. Hazzard, M. Martinez-Julvez, M. Medina, C. GomezMoreno and G. Tollin, Electrostatic forces involved in orienting Anabaena ferredoxin during binding to Anabaena ferredoxin:NADP"*" reductase: sitespecific mutagenesis, transient kinetic measurements, and electrostatic surface potentials. Protein Science 8, 1614-1622 (1999). 194. S. M. Hurtley, Ed., Protein Targeting (IRL Press, Oxford, New York and Tokyo, 1996). 195. K. Imoto, C. Busch, B. Sakmann, M. Mishina, T. Konno, J. Nakai, H. Bujo, Y. Mori, K. Fukuda and S. Numa, Rings of negatively charged amino acids determine the acetylcholine receptor channel conductance. Nature 335, 645648 (1988). 196. International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001). 197. J. N. Israelachvili, Intermolecular and Surface Forces, 2nd edition (Academic Press, London, San Diego, New York, Boston, Sydney, Tokyo and Toronto, 1992). 198. Y. Ito, D.-J. Chung and Y. Imanishi, Design and synthesis of a protein device that releases insulin in response to glucose concentration. Bioconjugate
Bicomputing Survey I
127
Chemistry 5, 84-87 (1994). 199. Y. Itoh, K. Cai and H. G. Khorana, Mapping of contact sites in complex formation between light-activated rhodopsin and transducin by covalent crosslinking: use of a chemically preactivated reagent. Proc. Natl. Acad. Sci. USA 98, 4883-4887 (2001). 200. M. B. Jackson, Activation of receptors directly coupled to channels, in Thermodynamics of Membrane Receptors and Channels, Ed. M. B. Jackson (CRC Press, Boca Raton, FL, Ann Arbor, MI, London and Tokyo, 1993), pp. 249293. 201. M. B. Jackson, Activation of receptors coupled to G-proteins and protein kinases, in Thermodynamics of Membrane Receptors and Channels, Ed. M. B. Jackson (CRC Press, Boca Raton, FL, Ann Arbor, MI, London and Tokyo, 1993), pp. 295-326. 202. M. B. Jackson, Ed., Thermodynamics of Membrane Receptors and Channels (CRC Press, Boca Raton, FL, Ann Arbor, MI, London and Tokyo, 1993). 203. R. Jahn and T. C. Siidhof, Synaptic vesicles and exocytosis. Annu. Rev. Neurosci. 17, 219-246 (1994). 204. D. C. Javitt and J. T. Coyle, Decoding schizophrenia. Sci. Am. 290(1), 48-55 (2004). 205. B. P. Jena, Ed., Special Issue: 'Membrane Fusion: Machinery and Mechanism.' Cell Biol. Int. 24, 769-848 (2000). 206. B. P. Jena and S.-J. Cho, The atomic force microscope in the study of membrane fusion and exocytosis, in Atomic Force Microscopy in Cell Biology, Methods in Cell Biology, Vol. 68, Eds. B. P. Jena and J. K. H. Horber (Academic Press, Amsterdam, Boston, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney and Tokyo, 2002), pp. 33-50. 207. L. N. Johnson and D. Barford, The effects of phosphorylation on the structure and function of proteins. Annu. Rev. Biophys. Biomol. Struct. 22, 199232 (1993). 208. D. Johnston, J. C. Magee, C. M. Colbert and B. R. Christie, Active properties of neuronal dendrites. Annu. Rev. Neurosci. 19, 165-186 (1996). 209. P. C. Jordan, How pore mouth charge distributions alter the permeability of transmembrane ionic channels. Biophys. J. 51, 297-311 (1987). 210. N. Juel-Nielsen, Individual and Environment: Monozygotic Twins Reared Apart, revised edition (International Universities Press, New York, 1980). 211. D. Junge, Nerve and Muscle Excitation, 3rd edition (Sinauer Associates, Sunderland, MA, 1992). 212. W. Junge and H. T. Witt, On the ion transport system of photosynthesis: investigations on a molecular level. Z. Naturforsch. 23b, 244-254 (1968). 213. W. Junge and H. T. Witt, Analysis of electrical phenomena in membranes and interfaces by absorption changes. Nature 222, 1062 (1969). 214. T. Kaminuma and G. Matsumoto, Eds., Biocomputers: The Next Generation from Japan, translated by N. D. Cook (Chapman and Hall, London, New York, Tokyo, Melbourne and Madras, 1991. Original Japanese version (Kinokuniya Company, Tokyo, 1988). 215. E. R. Kandel and R. D. Hawkins, The biological basis of learning and indi-
128
F. T. Hong
viduality. Sd. Am. 267(3), 78-86 (1992). 216. I. L. Karle and P. Balaram, Peptide conformations in crystals, in Protein Folding: Deciphering the Second Half of the Genetic Code, Eds. L.
217. 218. 219. 220. 221. 222. 223. 224. 225. 226. 227. 228.
229.
230.
231. 232.
M. Gierasch and J. King (American Association for the Advancement of Science, Washington, DC, 1990), pp. 75-84. M. Karplus and J. A. McCammon, The internal dynamics of globular proteins. CRC Crit. Rev. Biochem. 9, 293-349 (1981). B. Katz, Quantal mechanism of neural transmitter release. Science 173, 123-126 (1971). B. Katz and R. Miledi, Further observations on acetylcholine noise. Nature New Biol. 232, 124-126 (1971). B. Katz and R. Miledi, The statistical nature of the acetylcholine potential and its molecular components. J. Physiol. 224, 665-699 (1972). B. Katz and R. Miledi, The characteristics of 'end-plate noise' produced by different depolarizing drugs. J. Physiol. 230, 707-717 (1973). W. Kauzmann, Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1-63 (1959). H. Kaya and H. S. Chan, Polymer principles of protein calorimetric twostate cooperativity. Proteins: Struct. Fund. Genet. 40, 637-661 (2000). D. B. Kell, On the functional proton current pathway of electron transport phosphorylation: an electrodic view. Biochim. Biophys. Ada 549, 55-99 (1979). K. G. Kirby and M. Conrad, The enzymatic neuron as a reaction-diffusion network of cyclic nucleotides. Bull. Math. Biol. 46, 765-783 (1984). R. A. Klocke, Oxygen transport and 2,3-diphosphoglycerate (DPG). Chest 62, 79S-85S (1972). R. D. Knight, S. J. Freeland and L. F. Landweber, Selection, history and chemistry: the three faces of the genetic code. Trends Biochem. Sd. 24, 241-247 (1999). T. Kobayashi, S. Tsukita, S. Tsukita, Y. Yamamoto and G. Matsumoto, Subaxolemmal cytoskeleton in squid giant axon: I. biochemical analysis of microtubules, microfilaments, and their associated high-molecular-weight proteins. J. Cell Biol. 102, 1699-1709 (1986). E. Korner, U. Korner and G. Matsumoto, Top-down selforganization of semantic constraints for knowledge representation in autonomous systems: a model on the role of an emotional system in brains. Bull. Electrotechnical Lab. (Tsukuba) 60(7), 405-409 (1996). E. Korner and G. Matsumoto, Cortical architecture and self-referential control for brain-like computation: a new approach to understanding how the brain organizes computation. IEEE Eng. Med. Biol. Magaz. 21, 121-133 (2002). D. E. Koshland, Jr., The role of flexibility in enzyme action. Cold Spring Harbor Symp. Quant. Biol. 28, 473-482 (1963). N. Kraufi, W.-D. Schubert, O. Klukas, P. Fromme, H. T. Witt and W. Saenger, Photosystem I at 4 A resolution represents the first structural model of a joint photosynthetic reaction centre and core antenna system.
Bicomputing Survey I
129
Nature Struct. Biol. 3, 965-973 (1996). 233. T. Kreis and R. Vale, Eds., Guidebook to the Cytoskeletal and Motor Proteins (Oxford University Press, Oxford, New York and Tokyo, 1993). 234. M. E. Kriebel, B. Keller, G. Q. Fox and O. M. Brown, The secretory pore array hypothesis of transmitter release. Cell Biol. Int. 24, 838-848 (2000). 235. H. Kiihn, S. W. Hall and U. Wilden, Light-induced binding of 48-kDa protein to photoreceptor membranes is highly enhanced by phosphorylation of rhodopsin. FEBS Lett. 176, 473-478 (1984). 236. K. N. Kumar, N. Tilakaratne, P. S. Johnson, A. E. Allen and E. K. Michaelis, Cloning of cDNA for the glutamate-binding subunit of an NMDA receptor complex. Nature 354, 70-73 (1991). 237. D. J. Kyle and C. J. Arntzen, Thylakoid membrane protein phosphorylation selectively alters the local membrane surface charge near the primary acceptor of Photosystem II. Photochem. Photobiophys. 5, 11-25 (1983). 238. D. G. Lambright, J. Sondek, A. Bohm, N. P. Skiba, H. E. Hamm and P. B. Sigler, The 2.0 A crystal structure of a heterotrimeric G protein. Nature 379, 311-319 (1996). 239. D. Langosch, K. Hartung, E. Grell, E. Bamberg and H. Betz, Ion channel formation by synthetic transmembrane segments of the inhibitory glycine receptor — a model study. Biochim. Biophys. Ada 1063, 36-44 (1991). 240. P. Lauger and B. Neumcke, Theoretical analysis of ion conductance in lipid bilayer membranes, in Membranes, Volume 2: Lipid Bilayers and Antibiotics, Ed. G. Eisenman (Marcel Dekker, New York, 1973), pp. 1-59. 241. D. Leckband, Measuring the forces that control protein interactions. Annu. Rev. Biophys. Biomol. Struct. 29, 1-26 (2000). 242. J. E. LeDoux, Emotion circuits in the brain. Annu. Rev. Neurosci. 23, 155184 (2000). 243. C. P. Lee, Ed., Structure, Biogenesis, and Assembly of Energy Transducing Enzyme Systems, Current Topics in Bioenergitics, Vol. 15 (Academic Press, San Diego, New York, Berkeley, Boston, London, Sydney, Tokyo and Toronto, 1987). 244. C. P. Lee, Ed., Photosynthesis, Current Topics in Bioenergitics, Vol. 16 (Academic Press, San Diego, New York, Boston, London, Sydney, Tokyo and Toronto, 1991). 245. A. A. Lev, Y. E. Korchev, T. K. Rostovtseva, C. L. Bashford, D. T. Edmonds and C. A. Pasternak, Rapid switching of ion current in narrow pores: implications for biological channels. Proc. R. Soc. Lond. B252, 187-192 (1993). 246. S. Levy, D. Tillotson and A. L. F. Gorman, Intracellular Ca ~*~ gradient associated with Ca + channel activation measured in a nerve cell body (Abstr. T-AM-C4). Biophys. J. 37(2, Pt.2), 182a (1982). 247. E. A. Liberman, S. V. Minina, N. E. Shklovsky-Kordy and M. Conrad, Microinjection of cyclic nucleotides provides evidence for a diffusional mechanism of intraneuronal control. BioSystems 15, 127-132 (1982). 248. P. A. Liebman, The molecular mechanism of visual excitation and its relation to the structure and composition of the rod outer segment. Ann. Rev. Physiol. 49, 765-791 (1987).
130
F. T. Hong
249. P. A. Liebman and E. N. Pugh, Jr., Gain, speed and sensitivity of GTP binding vs PDE activation in visual excitation. Vision Res. 22, 1475-1480 (1982). 250. L. S. Liebovitch, Testing fractal and Markov models of ion channel kinetics. Biophys. J. 55, 373-377 (1989). 251. L. S. Liebovitch, J. Fischbarg, J. P. Koniarek, I. Todorova and M. Wang, Fractal model of ion-channel kinetics. Biochim. Biophys. Ada 896, 173-180 (1987). 252. L. S. Liebovitch and J. M. Sullivan, Fractal analysis of a voltage-dependent potassium channel from cultured mouse hippocampal neurons. Biophys. J. 52, 979-988 (1987). 253. D. J. Linden and J. A. Connor, Long-term synaptic depression. Annu. Rev. Neurosci. 18, 319-357 (1995). 254. J. Lisman, A mechanism for the Hebb and the anti-Hebb processes underlying learning and memory. Proc. Natl. Acad. Sci. USA 86, 9574-9578 (1989). 255. R. R. Llinas, Calcium and transmitter release in squid synapse, in Approaches to the Cell Biology of Neurons, Society for Neuroscience Symposia, Vol. II, Eds. W. M. Cowan and J. A. Ferendelli (Society for Neuroscience, Bethesda, MD, 1977), pp. 139-160. 256. R, Llinas, I. Z. Steinberg and K. Walton, Relationship between presynaptic calcium current and postsynaptic potential in squid giant synapse. Biophys. J. 33, 323-351 (1981). 257. R. Llinas, M. Sugimori and R. B. Silver, The concept of calcium concentration microdomains in synaptic transmission. Neuropharmacol. 34, 14431451 (1995). 258. R. Llinas, M. Sugimori and S. M. Simon, Transmission by presynaptic spikelike depolarizations in the squid giant synapse. Proc. Natl. Acad. Sci. USA 79, 2415-2419 (1982). 259. T. L0mo, Frequency potentiation of excitatory synaptic activity in the dentate area of the hippocampal formation. Ada Physiol. Scand. 68(Suppl. 277), 128 (1966). 260. D. V. Madison, R. C. Malenka and R. A. Nicoll, Mechanisms underlying long-term potentiation of synaptic transmission. Annu. Rev. Neurosci. 14, 379-397 (1991). 261. R. C. Malenka and R. A. Nicoll, Long-term potentiation — a decade of progress. Science 285, 1870-1874 (1999). 262. R. A. Marcus and N. Sutin, Electron transfers in chemistry and biology. Biochim. Biophys. Ada 811, 265-322 (1985). 263. S. Maren, Long-term potentiation in the amygdala: a mechanism for emotional learning and memory. TYends Neurosci. 22, 561-567 (1999). 264. M. Martinez-Jiilvez, J. Hermoso, J. K. Hurley, T. Mayoral, J. SanzAparicio, G. Tollin, C. Gomez-Moreno and M. Medina, Role of ArglOO and Arg264 from Anabaena PCC 7119 ferredoxin-NADP+ reductase for optimal NADP+ binding and electron transfer. Biochemistry 37, 17680-17691 (1998).
Bicomputing Survey I
131
265. M. Marti'nez-Jiilvez, I. Nogues, M. Faro, J. K. Hurley, T. B. Brodie, T. Mayoral, J. Sanz-Aparicio, J. A. Hermoso, M. T. Stankovich, M. Medina, G. Tollin and C. Gomez-Moreno, Role of a cluster of hydrophobic residues near the FAD cofactor in Anabaena PCC 7119 ferredoxin-NADP+ reductase for optimal complex formation and electron transfer to ferredoxin. J. Biol. Chem. 276, 27498-27510 (2001). 266. R. Masaki, S. Yoshikawa and H. Matsubara, Steady-state kinetics of oxidation of reduced ferredoxin with ferredoxin-NADP+ reductase. Biochim. Biophys. Ada 700, 101-109 (1982). 267. G. Matsumoto and M. Kotani, Eds., Nerve Membrane: Biochemistry and Function of Channel Proteins (University of Tokyo Press, Tokyo, 1981). 268. J. B. Matthew, Electrostatic effects in proteins. Annu. Rev. Biophys. Biophys. Chem. 14, 387-417 (1985). 269. B. Mayer and B. Hemmens, Biosynthesis and action of nitric oxide in mammalian cells. Trends Biochem. Set. 22, 477-481 (1997). 270. J. L. McClelland and D. E. Rumelhart, Eds., Parallel Distributed Processing — Explorations in the Micro structure of Cognition, Volume 2: Psychological and Biological Models (MIT Press, Cambridge, MA, and London, 1986). 271. J. H. McDowell, P. R. Robinson, R. L. Miller, M. T. Brannock, A. Arendt, W. C. Smith and P. A. Hargrave, Activation of arrestin: requirement of phosphorylation as the negative charge on residues in synthetic peptides from the carboxyl-terminal region of rhodopsin. Invest. Ophthal. Vis. Sci. 42, 1439-1443 (2001). 272. J. A. McNew, F. Parlati, R. Fukuda, R. J. Johnston, K. Paz, F. Paumet, T. H. Sollner and J. E. Rothman, Compartmental specificity of cellular membrane fusion encoded in SNARE proteins. Nature 407, 153-159 (2000). 273. I. Mellman and G. Warren, The road taken: past and future foundations of membrane traffic. Cell 100, 99-112 (2000). 274. K. M. Merz and S. M. Le Grand, Eds., The Protein Folding Problem and Tertiary Structure Prediction (Birkhauser, Boston, Basel and Berlin, 1994). 275. J. L. Miller and E. A. Dratz, Phosphorylation at sites near rhodopsin's carboxyl-terminus regulates light initiated cGMP hydrolysis. Vision Res. 24, 1509-1521 (1984). 276. P. A. Millner and J. Barber, Plastoquinone as a mobile redox carrier in the photosynthetic membrane. FEBS Lett. 169, 1-6 (1984). 277. L. Mirny and E. Shakhnovich, Protein folding theory: from lattice to allatom models. Annu. Rev. Biophys. Biomol. Struct. 30, 361-396 (2001). 278. P. Mitchell, Chemiosmotic coupling in oxidative and photosynthetic phosphorylation. Biol. Rev. 41, 445-502 (1966). 279. F. P. Molina-Heredia, A. Diaz-Quintana, M. Hervas, J. A. Navarro and M. A. De la Rosa, Site-directed mutagenesis of cytochrome CQ from Anabaena species PCC 7119: identification of surface residues of the hemeprotein involved in photosystem I reduction. J. Biol. Chem. 274, 33565-33570 (1999). 280. F. P. Molina-Heredia, M. Hervas, J. A. Navarro and M. A. De la Rosa, A single arginyl residue in plastocyanin and in cytochrome CQ from the cyanobacterium Anabaena sp. PCC 7119 is required for efficient reduction
132
F. T. Hong
of Photosystem I. J. Biol. Chem. 276, 601-605 (2001). 281. J. Monod, J.-P. Changeux and F. Jacob, Allosteric proteins and cellular control systems. J. Mol. Biol. 6, 306-329 (1963). 282. R. Morales, M.-H. Charon, G. Kachalova, L. Serre, M. Medina, C. GomezMoreno and M. Frey, A redox-dependent interaction between two electrontransfer partners involved in photosynthesis. EMBO Reports 1, 271-276 (2000). 283. K. Moriyoshi, M. Masu, T. Ishii, R. Shigemoto, N. Mizuno and S. Nakanishi, Molecular cloning and characterization of the rat NMDA receptor. Nature 354, 31-37 (1991). 284. R. U. Muller and A. Finkelstein, Voltage-dependent conductance induced in thin lipid membranes by monazomycin. J. Gen. Physiol. 60, 263-284 (1972). 285. R. U. Muller and A. Finkelstein, The effect of surface charge on the voltagedependent conductance induced in thin lipid membranes by monazomycin. J. Gen. Physiol. 60, 285-306 (1972). 286. J. Myers, Enhancement studies in photosynthesis. Annu. Rev. Plant Physiol. 22, 289-312 (1971). 287. T. Narahashi, Chemical modulation of sodium channels, in Ion Channel Pharmacology, Eds. B. Soria and V. Ceiia (Oxford University Press, Oxford, New York and Tokyo, 1998), pp. 23-73. 288. G. J. Narlikar and D. Herschlag, Mechanistic aspects of enzymatic catalysis: lessons from comparison of RNA and protein enzymes. Annu. Rev. Biochem. 66, 19-59 (1997). 289. J. A. Navarro, M. Hervas and M. A. De la Rosa, Co-evolution of cytochrome CQ and plastocyanin, mobile proteins transferring electrons from cytochrome b6f to photosystem I. J. Biol. Inorg. Chem. 2, 11-22 (1997). 290. E. Neher, Secretion without full fusion. Nature 363, 497-498 (1993). 291. E. Neher and B. Sakmann, Single-channel currents recorded from membrane of denervated frog muscle fibres. Nature 260, 799-802 (1976). 292. E. Neher and C. F. Stevens, Conductance fluctuations and ionic pores in membranes. Annu. Rev. Biophys. Bioeng. 6, 345-381 (1977). 293. S. R. Neves, P. T. Ram and R. Iyengar, G protein pathways. Science 296, 1636-1639 (2002). 294. M. Newborn, Kasparov versus Deep Blue: Computer Chess Comes of Age (Springer-Verlag,. New York, Berlin and Heidelberg, 1997). 295. A. C. Newton, Interaction of proteins with lipid headgroups: lessons from protein kinase C. Annu. Rev. Biophys. Biomol. Struct. 22, 1-25 (1993). 296. A. Nicholls, K. A. Sharp and B. Honig, Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Fund. Genet. 11, 281-296 (1991). 297. D. G. Nicholls, Bioenergetics: An Introduction to the Chemiosmotic Theory (Academic Press, London and New York, 1982). 298. R. A. Nicoll and R. C. Malenka, Expression mechanisms underlying NMDA receptor-dependent long-term potentiation, in Molecular and Functional Diversity of Ion Channels and Receptors, Annal. NY Acad. Sci. Vol. 868, Eds.
Bicomputing Survey I
299. 300. 301. 302. 303. 304. 305. 306.
307.
308. 309. 310. 311. 312. 313.
314.
133
B. Rudy and P. Seeburg (New York Academy of Sciences, New York, 1999), pp. 515-525. J. Nield, C. Punk and J. Barber, Supermolecular structure of photosystem II and location of the PsbS protein. Phil. Trans. R. Soc. Lond. B355, 13371344 (2000). J. P. Noel, H. E. Hamm and P. B. Sigler, The 2.2 A crystal structure of transducin-acomplexed with GTP7S. Nature 366, 654-663 (1993). E. Nogales, Structural insights into microtubule function. Annu. Rev. Biochem. 69, 277-302 (2000). S. H. Northrup, J. 0 . Boles and J. C. L. Reynolds, Brownian dynamics of cytochrome c and cytochrome c peroxidase association. Science 241, 67-70 (1988). L. Onsager, The motion of ions: principles and concepts. Science 166, 13591364 (1969). M. O. Ortells and G. G. Lunt, Evolutionary history of the ligand-gated ion-channel superfamily of receptors. Trends Neurosci. 18, 121-127 (1995). M. A. Ostrovsky, Animal rhodopsin as a photoelectric generator, in Molecular Electronics: Biosensors and Biocomputers, Ed. F. T. Hong (Plenum, New York and London, 1989), pp. 187-201. K. Palczewski, T. Kumasaka, T. Hori, C. A. Behnke, H. Motoshima, B. A. Fox, I. L. Trong, D. C. Teller, T. Okada, R. E. Stenkamp, M. Yamamoto and M. Miyano, Crystal structure of rhodopsin: a G protein-coupled receptor. Science 289, 739-745 (2000). J. H. Parkes, S. K. Gibson and P. A. Liebman, Temperature and pH dependence of the metarhodopsin I-metarhodopsin II equilibrium and the binding of metarhodopsin II to G protein in rod disk membranes. Biochemistry 38, 6862-6878 (1999). L. Pauling and R. B. Corey, The pleated sheet, a new layer configuration of polypeptide chains. Proc. Natl. Acad. Sci. USA 37, 251-256 (1951). L. Pauling, R. B. Corey and H. R. Branson, The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. USA 37, 205-211 (1951). C. Peracchia, Ed., Handbook of Membrane Channels: Molecular and Cellular Physiology (Academic Press, San Diego, New York, Boston, London, Sydney, Tokyo and Toronto, 1994). K. R. Popper, The Logic of Scientific Discovery, revised edition (Hutchinson, London, 1968). Reprinted (Routledge, London and New York, 1992). Original German version, Logik der Forschung (Vienna, 1934). K. H. Pribram, Brain and Perception: Holonomy and Structure in Figural Processing (Lawrence Erlbaum Associates, Hillsdale, NJ, Hove and London, 1991). J. Puig, A. Arendt, F. L. Tomson, G. Abdulaeva, R. Miller, P. A. Hargrave and J. H. McDowell, Synthetic phosphopeptide from rhodopsin sequence induces retinal arrestin binding to photoactivated unphosphorylated rhodopsin. FEBS Lett. 362, 185-188 (1995). J. Rassow, O. von Ahsen, U. Bomer and N. Pfanner, Molecular chaperones:
134
315. 316. 317. 318. 319. 320. 321. 322. 323. 324. 325. 326. 327. 328. 329. 330. 331. 332. 333.
F. T. Hong
towards a characterization of the heat-shock protein 70 family. Trends Cell Biol. 7, 129-133 (1997). L. B. Ray and N. R. Gough, Orienteering strategies for a signaling maze. Science 296, 1632-1633 (2002). J. Rees, Complex disease and the new clinical sciences. Science 296, 698701 (2002). K.-H. Rhee, Photosystem II: the solid structural era. Annu. Rev. Biophys. Biomol. Struct. 30, 307-328 (2001). K.-H. Rhee, E. P. Morris, J. Barber and W. Kiihlbrandt, Three-dimensional structure of the plant photosystem II reaction centre at 8 A resolution. Nature 396, 283-286 (1998). F. M. Richards, The protein folding problem. Sci. Am. 264(1), 54-63 (1991). C. A. Rogers, Ed., Smart Materials, Structures and Mathematical Issues: U.S. Army Research Office Workshop, September 15-16, 1988, Blacksburg, VA (U.S. Army Research Office, Research Triangle Park, NC, 1988). T. I. Rokitskaya, M. Block, Yu. A. Antonenko, E. A. Kotova and P. Pohl, Photosensitizer binding to lipid bilayers as a precondition for the photoinactivation of membrane channels. Biophys. J. 78, 2572-2580 (2000). G. D. Rose and R. Wolfenden, Hydrogen bonding, hydrophobicity, packing, and protein folding. Annu. Rev. Biophys. Biomol. Struct. 22, 381-415 (1993). R. Rosen, Life Itself: A Comprehensive Inquiry Into the Nature, Origin, and Fabrication of Life (Columbia University Press, New York, 1991). T. C. Ruch and H. D. Patton, Eds., Physiology and Biophysics, 19th edition (Saunders, Philadelphia and London, 1965). D. E. Rumelhart and J. L. McClelland, Eds., Parallel Distributed Processing — Explorations in the Microstructure of Cognition, Volume 1: Foundations (MIT Press, Cambridge, MA, and London, 1986). B. Sakmann and E. Neher, Patch clamp techniques for studying ionic channels in excitable membranes. Ann. Rev. Physiol. 46, 455-472 (1984). J. N. Sanes and J. P. Donoghue, Plasticity and primary motor cortex. Annu. Rev. Neurosd. 23, 393-415 (2000). A. Sawa and S. H. Snyder, Schizophrenia: diverse approaches to a complex disease. Science 296, 692-695 (2002). S. J. Scales, J. B. Bock and R. H. Scheller, The specifics of membrane fusion. Nature 407, 144-146 (2000). R. T. Schimke, On the role of synthesis and degradation in regulation of enzyme levels in mammalian tissues. Cur. Top. Cell. Regul. 1, 77-124 (1969). A. Schleicher and K. P. Hofmann, Proton uptake by light induced interaction between rhodopsin and G-protein. Z. Naturforsch. C: Biosci. 40, 400-405 (1985). A. Schleicher, H. Kiihn and K. P. Hofmann, Kinetics, binding constant, and activation energy of the 48-kDa protein-rhodopsin complex by extrametarhodopsin II. Biochemistry 28, 1770-1775 (1989). A. Schmidt and M. N. Hall, Signaling to the actin cytoskeleton. Annu. Rev.
Bicomputing Survey I
135
Cell Dev. Biol. 14, 305-338 (1998). 334. W.-D. Schubert, O. Klukas, N. Kraufi, W. Saenger, P. Fromme and H. T. Witt, Photosystem I of Synechococcus elongatus at 4 A resolution: comprehensive structure analysis. J. Mol. Biol. 272, 741-769 (1997). 335. B. E. Schultz and S. I. Chan, Structures and proton-pumping strategies of mitochondrial respiratory enzymes. Annu. Rev. Biophys. Biomol. Struct. 30, 23-65 (2001). 336. G. E. Schulz and R. H. Schirmer, Principles of Protein Structure (SpringerVerlag, New York, Heidelberg and Berlin, 1979). 337. E. M. Schuman and D. V. Madison, Locally distributed synaptic potentiation in the hippocampus. Science 263, 532-536 (1994). 338. E. M. Schuman and D. V. Madison, Nitric oxide and synaptic function. Annu. Rev. Neurosci. 17, 153-183 (1994). 339. N. Sharon and H. Lis, Carbohydrates in cell recognition. Sci. Am. 268(1), 82-89 (1993). 340. K. Sharp, R. Fine and B. Honig, Computer simulations of the diffusion of a substrate to an active site of an enzyme. Science 236, 1460-1463 (1987). 341. A. Shcherbatko, F. Ono, G. Mandel and P. Brehm, Voltage-dependent sodium channel function is regulated through membrane mechanics. Biophys. J. 77, 1945-1959 (1999). 342. M. Sheng and D. T. S. Pak, Ligand-gated ion channel interactions with cytoskeletal and signaling proteins. Annu. Rev. Physiol. 62, 755-778 (2000). 343. T. F. Shevchenko, G. R. Kalamkarov and M. A. Ostrovsky, The lack of H + transfer across the photoreceptor membrane during rhodopsin photolysis (in Russian). Sensory Systems (USSR Acad. Sci.) 1, 117-126 (1987). 344. J. Shields, Monozygotic Twins Brought Up Apart and Brought Up Together: An Investigation into the Genetic and Environmental Causes of Variation in Personality (Oxford University Press, London, New York and Toronto, 1962). 345. T. Shin, D. Kraemer, J. Pryor, L. Liu, J. Ruglia, L. Howe, S. Buck, K. Murphy, L. Lyons and M. Westhusin, A cat cloned by nuclear transplantation. Nature 415, 859 (2002). 346. F. J. Sigworth and E. Neher, Single Na+ channel currents observed in cultured rat muscle cells. Nature 287, 447-449 (1980). 347. A. J. Silva, J. H. Kogan, P. W. Frankland and S. Kida, CREB and memory. Annu. Rev. Neurosci. 21, 127-148 (1998). 348. R. B. Silver, Calcium, BOBs, QEDs, microdomains and a cellular decision: control of mitotic cell division in sand dollar blastomeres. Cell Calcium 20, 161-179 (1996). 349. R. B. Silver, Imaging structured space-time patterns of Ca2+ signals: essential information for decisions in cell division. FASEB J. 13, S209-S215 (1999). 350. R. B. Silver, M. Sugimori, E. J. Lang and R. Llinas, Time-resolved imaging of Ca2+-dependent aequorin luminescence of microdomains and QEDs in synaptic preterminals. Biol. Bull. 187, 293-299 (1994). 351. H. A. Simon, Scientific discovery as problem solving, in Economics, Bounded
136
352. 353.
354. 355. 356. 357. 358. 359. 360. 361. 362. 363. 364. 365. 366. 367. 368.
F. T. Hong
Rationality and the Cognitive Revolution, Eds. M. Egidi and R. Marris (Edward Elgar Publishing, Hants, UK, and Brookfield, VT, 1992), pp. 102-119. H. A. Simon, Administrative Behavior: A Study of Decision-Making Processes in Administrative Organizations, 4th edition (Free Press, New York, London, Toronto, Sydney and Singapore, 1997). H. A. Simon and A. Newell, Heuristic problem solving: the next advance in operations research. Operations Research 6, 1-10 (1958). Reprinted in Models of Bounded Rationality, Volume 1: Economic Analysis and Public Policy, Ed. H. A. Simon (MIT Press, Cambridge, MA, and London, 1982), pp. 380-389. S. M. Simon and R. R. Llinas, Compartmentalization of the submembrane calcium activity during calcium influx and its significance in transmitter release. Biophys. J. 48, 485-498 (1985). A. Sitaramayya, Rhodopsin kinase prepared from bovine rod disk membranes quenches light activation of cGMP phosphodiesterase in a reconstituted system. Biochemistry 25, 5460-5468 (1986). A. Sitaramayya and P. A. Liebman, Mechanism of ATP quench of phosphodiesterase activation in rod disc membranes. J. Biol. Chem. 258, 1205-1209 (1983). A. Sitaramayya and P. A. Liebman, Phosphorylation of rhodopsin and quenching of cyclic GMP phosphodiesterase activation by ATP at weak bleaches. J. Biol. Chem. 258, 12106-12109 (1983). H. T. Smith, A. J. Ahmed and F. Millett, Electrostatic interaction of cytochrome c with cytochrome c\ and cytochrome oxidase. J. Biol. Chem. 256, 4984-4990 (1981). V. S. Sokolov, M. Block, I. N. Stozhikova and P. Pohl, Membrane photopotential generation by interfacial differences in the turnover of a photodynamic reaction. Biophys. J. 79, 2121-2131 (2000). T. Sollner, S. W. Whiteheart, M. Brunner, H. Erdjument-Bromage, S. Geromanos, P. Tempst and J. E. Rothman, SNAP receptors implicated in vesicle targeting and fusion. Nature 362, 318-324 (1993). R. Strohman, Maneuvering in the complex path from genotype to phenotype. Science 296, 701-703 (2002). L. Stryer, Cyclic GMP cascade of vision. Annu. Rev. Neurosci. 9, 87-119 (1986). L. Stryer, The molecules of visual excitation. Sci. Am. 257(1), 42-50 (1987). L. Stryer, Biochemistry, 4th edition (W. H. Freeman, New York, 1995). W. Stiihmer, Structure-function studies of voltage-gated ion channels. Annu. Rev. Biophys. Biophys. Chem. 20, 65-78 (1991). W. Stiihmer, F. Conti, H. Suzuki, X. Wang, M. Noda, N. Yahagi, H. Kubo and S. Numa, Structural parts involved in activation and inactivation of the sodium channel. Nature 339, 597-603 (1989). M. Sugimori, E. J. Lang, R. B. Silver and R. Llinas, High-resolution measurement of the time course of calcium-concentration microdomains at squid presynaptic terminals. Biol. Bull. 187, 300-303 (1994). T. Takagi, Ed., The Concept of Intelligent Materials and the Guidelines
Bicomputing Survey I
369.
370. 371. 372. 373. 374. 375. 376.
377.
378. 379. 380.
381. 382.
137
on R and D Promotion (Science and Technology Agency, Government of Japan, Tokyo, 1989). J. Teissie and B. Gabriel, Lateral proton conduction along lipid monolayers and its relevance to energy transduction, in Proceedings of the 12th School on Biophysics of Membrane Transport, May 4-13, 1994, Koscielisko-Zakopane, Poland, Part II, Eds. S. Przestalski, J. Kuczera and H. Kleszczyriska (Agricultural University of Wroclaw, Wroclaw, Poland, 1994), pp. 143-157. J. Teissie, B. Gabriel and M. Prats, Lateral communication by fast proton conduction: a model membrane study. Trends Biochem. Sci. 18, 243-246 (1993). J. Teissie, M. Prats, P. Soucaille and J. F. Tocanne, Evidence for conduction of protons along the interface between water and a polar lipid monolayer. Proc. Natl. Acad. Sci. USA 82, 3217-3221 (1985). D. Thirumalai and G. H. Lorimer, Chaperonin-mediated protein folding. Annu. Rev. Biophys. Biomol. Struct. 30, 245-269 (2001). G. Tollin and J. T. Hazzard, Intra- and intermolecular electron transfer processes in redox proteins. Arch. Biochem. Biophys. 287, 1-7 (1991). J. Z. Tsien, Building a brainier mouse. Sci. Am. 282(4), 42-48 (2000). T. Y. Tsong, Electrical modulation of membrane proteins: enforced conformational oscillations and biological energy and signal transductions. Annu. Rev. Biophys. Biophys. Chem. 19, 83-106 (1990). T. Y. Tsong, D.-S. Liu, F. Chauvin, A. Gaigalas and R. D. Astumian, Electroconformational coupling (ECC): an electric field induced enzyme oscillation for cellular energy and signal transductions. Bioelectrochem. Bioenerg. 21, 319-331 (1989). S. Tsukita, S. Tsukita, T. Kobayashi and G. Matsumoto, Subaxolemmal cytoskeleton in squid giant axon. II. Morphological identification of microtubule- and microfilament-associated domains of axolemma. J. Cell Biol. 102, 1710-1725 (1986). A. Tzagoloff, Mitochondria (Plenum, New York and London, 1982). F. Valtorta, F. Benfenati and P. Greengard, Structure and function of the synapsins. J. Biol. Chem. 267, 7195-7198 (1992). J. J. van Thor, T. H. Geerlings, H. C. P. Matthijs and K. J. Hellingwerf, Kinetic evidence for the PsaE-dependent transient ternary complex photosystem I/ferredoxin/ferredoxin:NADP+ reductase in a cyanobacterium. Biochemistry 38, 12735-12746 (1999). A. Vander, J. Sherman and D. Luciano, Human Physiology: The Mechanisms of Body Function, 8th edition (McGraw Hills, Boston, New York and London, 2001). J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. G. Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan,
138
F. T. Hong
L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R.-R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Y. Wang, A. Wang, X. Wang, J. Wang, M.-H. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. C. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. Mclntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y.-H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. Campbell, K. V. Sjotander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafha, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y.-H. Chiang, M. Coyne, C. Dahlke, A. D. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh and X. Zhu, The sequence of the human genome. Science 291, 1304-1351 (2001). 383. E. J. W. Verwey and J. Th. G. Overbeek, Theory of the Stability of Lyophobic Colloids (Elsevier, New York, Amsterdam, London and Brussels, 1948). 384. S. A. Vishnivetskiy, C. L. Patz, C. Schubert, J. A. Hirsch, P. B. Sigler and V. V. Gurevich, How does arrestin respond to the phosphorylated state of rhodopsin? J. Biol. Chem. 274, 11451-11454 (1999). 385. S. A. Vishnivetskiy, C. Schubert, G. C. Climaco, Y. V. Gurevich, M.-G. Velez and V. V. Gurevich, An additional phosphate-binding element in arrestin molecule: implications for the mechanism of arrestin activation. J.
Bicomputing Survey I
139
Biol. Chem. 275, 41049-41057 (2000). 386. M. A. Wall, D. E. Coleman, E. Lee, J. A. Iniguez-Lluhi, B. A. Posner, A. G. Gilman and S. R. Sprang, The structure of the G protein heterotrimer G ia i/3i7 2 . Cell 83, 1047-1058 (1995). 387. R. T. Wang and J. Myers, On the state 1-state 2 phenomenon in photosynthesis. Biochim. Biophys. Acta 347, 134-140 (1974). 388. M. G. Waters and S. R. Pfeffer, Membrane tethering in intracellular transport. Curr. Opin. Cell Biol. 11, 453-459 (1999). 389. U. Wilden, Duration and amplitude of the light-induced cGMP hydrolysis in vertebrate photoreceptors are regulated by multiple phosphorylation of rhodopsin and by arrestin binding. Biochemistry 34, 1446-1454 (1995). 390. U. Wilden and H. Kiihn, Light-dependent phosphorylation of rhodopsin: number of phosphorylation sites. Biochemistry 21, 3104-3022 (1982). 391. W. C. Willett, Balancing life-style and genomics research for disease prevention. Science 296, 695-698 (2002). 392. R. J. P. Williams, Possible functions of chains of catalysts. J. Theoret. Biol. 1, 1-17 (1961). 393. R. J. P. Williams, Possible functions of chains of catalysts II. J. Theoret. Biol. 3, 209-229 (1962). 394. R. J. P. Williams, The history and the hypotheses concerning ATPformation by energized protons. FEBS Lett. 85, 9-19 (1978). 395. R. J. P. Williams, The multifarious couplings of energy transduction. Biochim. Biophys. Acta 505, 1-44 (1978). 396. S. Wright, The roles of mutation, inbreeding, crossbreeding and selection in evolution, in Proceedings of the 6th International Congress of Genetics 1, 356-366 (1932). 397. F. E. Yates, Physical causality and brain theories. Am. J. Physiol. 238 {Regulatory Integrative Comp. Physiol. 7), R277-R290 (1980). 398. F. E. Yates, Quantumstuff and biostuff: a view of patterns of convergence in contemporary science, in Self-Organizing Systems: The Emergence of Order, Ed. F. E. Yates (Plenum, New York, 1987), pp. 617-644.
CHAPTER 2

A MULTI-DISCIPLINARY SURVEY OF BIOCOMPUTING: 2. SYSTEMS AND EVOLUTIONARY LEVELS, AND TECHNOLOGICAL APPLICATIONS*

Felix T. Hong

Department of Physiology, Wayne State University School of Medicine, Detroit, Michigan 48201, USA
E-mail: [email protected]

The second part of this survey examines biocomputing in intact multicellular organisms. It also covers the philosophy and sociology of science, and technological applications of biocomputing. The parallel between creative problem solving and evolution has long been recognized. If Simonton's chance-configuration theory of creative problem solving is recast as a process of pattern recognition and analyzed in terms of parallel and sequential processing, the geniuses' thought process can be understood in terms of a penchant for visual thinking — a parallel process. Intuition, being primarily a parallel multi-tasking process, is difficult to describe sequentially in words, whereas the "aha" phenomenon is a consequence of random access in parallel processing. It is argued that physical laws are nearly deterministic but not absolutely so. Electrophysiological evidence indicates that Nature (evolution) specifically recruited endogenous noise as a crucial component of biocomputing. It is shown that, contrary to common belief, microscopic reversibility is contradictory to macroscopic irreversibility, whereas microscopic irreversibility (quasi-reversibility) and chaos theory can account for macroscopic irreversibility. The common belief that the uncertainty principle of quantum mechanics does not enter life processes is disputed. It is also shown that Laplace's absolute physical determinism can neither be proved nor disproved, and is therefore an epistemological choice.

* Dedicated to the memory of the late President Detlev W. Bronk of The Rockefeller University.
1. Introduction

Understanding biological information processing (biocomputing) at the systems level is important for at least two reasons. In basic research, the mind-brain problem is one of the deepest problems that humans aspire to solve in their quest for knowledge. In applied research, emulating the human brain has been one of the most important goals in machine intelligence. A conventional sequential digital computer usually conveys the image that it does not have a "mind." Yet, in spite of this perception, astonishing progress has been made in the past half century in machine intelligence solely by means of digital computing. The advance in machine intelligence has also enhanced our understanding of how the brain performs at the systems level. On the other hand, the perceived limitation of digital computing renews the interest in the pursuit of biocomputing, with the goal of overcoming the impasse.

One of the most important factors contributing to the advance of microelectronics and digital computing is the ever-increasing degree of miniaturization. As illustrated by Moore's law,164 the number of device components packaged into a single microelectronic integrated circuit (IC) has grown exponentially with the passage of time. However, scientists and engineers have predicted that the trend towards continuing miniaturization may come to a screeching halt early in the 21st century. The primary concern is interference stemming from quantum size and thermal effects, which will make digital computing unreliable.367,121 Standing in sharp contrast is a living organism, which utilizes computing elements of molecular (nanometer) dimensions and computing processes that critically depend on the quantum size effect, otherwise known as chemical reactions (nanobiology). It seems that the hope of breaking the barrier of miniaturization lies in: a) the utilization of organic materials, b) the exploitation of their chemistry as well as physics, and c) the design of radically different computer architectures (discussed in Refs. 320 and 209). This unproved contention has inspired the birth of a new science and technology: molecular electronics.106,107,108,25,26,49

Scientists and engineers are lured to biocomputing by the impressive systems performance of living organisms, especially cognition and pattern recognition. Yet, it is painfully evident that detailed knowledge of the molecular and cellular architecture of a living organism alone is insufficient for understanding its systems performance. A popular approach towards achieving a deeper understanding is to simulate the complex interactions by means of a conventional digital computer.
The increasing power of digital computers has resurrected a branch of mathematics customarily referred to as nonlinear dynamic analysis (e.g., Refs. 727 and 664). Simulation studies of complex systems frequently require solving nonlinear differential equations. Such an approach was once considered intractable until the advent of digital computers and until the computing power became sufficiently great and the price of a computer became sufficiently affordable to individual investigators. Using nonlinear dynamic analysis as a tool, scientists have discovered phenomena that were not evident from knowledge of the detailed structure and function of the individual components of a computing system alone. Thus, one of the keys to understanding how the brain works is the elucidation of the law of "complexity." Rosen569,570 applied category theory to biology and concluded that contemporary physics is too restricted to fully elucidate life phenomena. He questioned the validity of the machine metaphor for a living organism and the value of simulation studies in the understanding of life phenomena per se. However, he acknowledged the value of simulation studies in technological applications.

Computer-simulation studies have also been extended to the study of evolution and the origin of life. The possibility of simulating evolutionary mechanisms and of testing theories of the origin of life overcomes the impossibility of conducting investigator-controlled experiments on an evolutionary time scale. The act of running a computer simulation is often referred to as running an "experiment." A simulation "experiment" can also be regarded as an extension of the traditional calculations which are performed to find out the prediction of a mathematical model. Although theoretical models of the systems performance of biocomputing are usually too complex to afford quantitative predictions by means of traditional calculations, a computer simulation makes evaluating the predictions of such models possible. Computer simulations are therefore valuable and even indispensable for discovering the laws of "complexity" and can contribute to our understanding of emergent phenomena and other higher cognitive functions of a living organism.
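As a concrete, if minimal, illustration of the kind of simulation "experiment" referred to above, the following sketch numerically integrates the Lorenz system, a textbook nonlinear model that resists analytical treatment but yields readily to a digital computer. The equations, parameter values, step size and initial conditions are the customary illustrative choices and are not taken from this survey.

    # A minimal simulation "experiment": fixed-step integration of the
    # Lorenz system, a standard example of nonlinear dynamics.

    def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        """Right-hand side of the Lorenz equations (dx/dt, dy/dt, dz/dt)."""
        x, y, z = state
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def simulate(state=(1.0, 1.0, 1.0), dt=0.01, steps=10000):
        """Crude Euler integration; returns the full trajectory."""
        trajectory = [state]
        for _ in range(steps):
            dx, dy, dz = lorenz(state)
            x, y, z = state
            state = (x + dt * dx, y + dt * dy, z + dt * dz)
            trajectory.append(state)
        return trajectory

    if __name__ == "__main__":
        # Two nearly identical initial conditions diverge rapidly, the
        # sensitivity to initial conditions characteristic of chaos.
        a = simulate(state=(1.0, 1.0, 1.0))
        b = simulate(state=(1.0, 1.0, 1.000001))
        print("final points:", a[-1], b[-1])

A production study would use an adaptive integrator, but even this crude scheme reproduces the qualitative behavior with which nonlinear dynamic analysis is concerned.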
Another approach towards understanding the systems performance is to investigate the behavior of a living organism (behavioral experiments). Behavioral experiments were initially performed at a purely phenomenological level as psychological research. The dominance of behaviorism in the thinking of psychology lasted until the 1960s, when cognitivism gradually emerged as an alternative point of view. As will be discussed in Sec. 4.14, behaviorism and cognitivism roughly correspond to the manifestation of a "hard-wired" and a dynamically connected neural network, respectively. With the advent of modern neuroscience, increasing understanding of the structure and function of the brain led to the merger of behaviorism and cognitivism. Modern cognitive science has made significant progress in part because of concurrent advances in neuroscience research at the cellular and the molecular levels.

A major goal in artificial intelligence (AI) or machine intelligence research is to capture the essence of the cognitive capability exhibited by living organisms and to simulate the essential features in man-made machines. The success of early AI research is exemplified by a number of knowledge-based computer programs known as expert systems. In light of these accomplishments, some AI investigators (strong AI supporters) were led to believe that it may eventually be possible to simulate consciousness. Others claimed that consciousness cannot possibly be understood in terms of contemporary physics and chemistry. Consciousness has been a controversial topic. In the past, only discussions based on abstract theories and behavioral experiments were feasible. Owing to major advances made in molecular and cellular neuroscience research, it is now possible to discuss the topic in concrete terms of anatomy and physiology. Owing to major advances made in computer science and artificial intelligence research, it is now possible to examine the topic from a computational point of view. There has been a resurgence of interest in the mind-body problem, and a large number of publications have appeared in the past few decades. The following selections provide a glimpse of the diverse views: Refs. 29, 30, 112, 125, 129, 150, 151, 170, 187, 188, 189, 206, 307, 449, 470, 512, 513, 531, 601, 606, 609, 643 and 671.

Consciousness is a mental attribute that reflects an individual's inner (introspective) feeling, which was traditionally excluded from scientific inquiry by the long-held doctrine of René Descartes' dualism. Another related mental attribute, which is even more elusive and controversial, is free will. Free will is a topic of great importance in ethical theory and the major philosophical underpinning of the concepts of law, rewards, punishments, and incentives. The issue of free will is linked to biocomputing for at least two reasons. First, one of the goals of artificial intelligence research is to construct a machine that has intuition and creativity. Such a machine, if it is capable of "independent thinking" instead of "merely acting out" an elaborate preprogrammed scheme, is tantamount to having its own agenda in defiance of human control; in the lay literature, it conjures up the frightening image of a "Frankenstein's monster" that possesses free will.
Second, a discussion of control laws in biocomputing is inevitably linked to a discussion of biological and physical determinism, which, in turn, is linked to the well-known conflict between free will and determinism — an age-old controversy dating back at least to St. Augustine's time. Free will was denied by some of those who advocated various forms of determinism, whereas others proposed arguments to support the claim that there is no real conflict between free will and determinism. Central to this discussion is whether physical laws are absolutely deterministic or not.

In Part 1 of this survey (Chapter 1), we approached the problem of biocomputing by examining the control laws that "map" (transform) inputs to outputs at various levels of computational dynamics. We defined a gray scale of determinism that ranges from absolutely deterministic to completely random. We chose to lump together deviations of outputs from a central mean value, regardless of origin, as dispersion of the control laws. Absolute determinism means that all control laws give rise to a sharp prediction of outputs with no dispersion (no error, zero variance). Thus, if a control law maps a single-valued input parameter to an output parameter which is no longer single-valued but covers a range of values around a central mean, it is not absolutely deterministic but rather relatively deterministic. However, it must be made clear that our practice of lumping dispersion with the control laws is a matter of operational choice, with the understanding that what appears to be noise may actually be the output of a control law that is not known and/or not being explicitly considered.

It was demonstrated in Sec. 8 of Chapter 1 that some mesoscopic processes require direct participation of noise or randomness. Thus, we are led to the question: Could some noise actually be part of the manifestation of a control law — i.e., endogenous noise — rather than interference from an independent external contingency, i.e., the output of another control law not being explicitly considered? Laplace apparently thought so; he asserted that any belief that an event can be completely random and follow no deterministic natural laws is simply due to our ignorance of the cause (see Chapter 2 of Ref. 406 or pp. 226-228 of Ref. 145). We shall demonstrate that Laplace's assertion can neither be proved nor disproved by means of conventional scientific methodology. Thus, the above question is an epistemological problem instead of a scientific problem. In my opinion, it is also a problem that biocomputing should not ignore or evade.
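As a toy illustration of the gray scale of determinism described above, the short sketch below contrasts an absolutely deterministic control law with a relatively deterministic one in which an endogenous noise term spreads the output around a central mean. The particular mapping and the noise amplitude are arbitrary illustrative choices, not quantities taken from Chapter 1.

    import random
    import statistics

    def deterministic_law(x):
        """An absolutely deterministic control law: one input, one sharp output."""
        return 2.0 * x + 1.0

    def relatively_deterministic_law(x, noise_amplitude=0.2):
        """The same mapping with endogenous noise lumped into the control law,
        so that a single-valued input yields a spread of outputs around a mean."""
        return 2.0 * x + 1.0 + random.gauss(0.0, noise_amplitude)

    if __name__ == "__main__":
        x = 3.0
        sharp = [deterministic_law(x) for _ in range(1000)]
        spread = [relatively_deterministic_law(x) for _ in range(1000)]
        print("absolutely deterministic:  mean=%.3f  variance=%.3f"
              % (statistics.mean(sharp), statistics.pvariance(sharp)))
        print("relatively deterministic:  mean=%.3f  variance=%.3f"
              % (statistics.mean(spread), statistics.pvariance(spread)))

Whether the added term is "really" noise or the output of an unconsidered control law cannot be decided from the input-output record alone, which is the operational point made above.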
Since absolute physical determinism implies absolute biological determinism unless vitalism is invoked, it is not sensible to discuss control laws in biocomputing without a concurrent discussion of physical determinism. Epistemological arguments will be presented to question the validity of absolute physical determinism and a related concept, microscopic reversibility. Although an epistemological discussion is customarily excluded from a scientific exposition, it is sincerely hoped that the reader will tolerate the author's transgression and suspend judgment until the end of Sec. 5 has been reached.

In this article, both the evolutionary and the cognitive aspects of biocomputing will be discussed in the context of control laws. The discussion will revolve around the notions of randomness and determinism. Evolution is problem solving performed by a given species of living organisms on an evolutionary time scale (Sec. 3).a An analogy between evolutionary learning and creative problem solving will be exploited. Simonton's chance-configuration theory633 will be evaluated in light of the current understanding of cognitive science and artificial intelligence (Sec. 4). Rather than viewing creativity merely as a mystic spectacle or pyrotechnics practiced exclusively by geniuses, an attempt is made to demystify (demythologize) the process of creative problem solving and to make the strategy of creative problem solving available to everyone. In particular, it will be demonstrated that the thinking process practiced by geniuses is not fundamentally different from that practiced by ordinary folks, but that certain mind habits feature prominently in geniuses. Contrary to the commonly held dichotomy between innovation and imitation, a gray scale of creativity shall be adopted. The implications for education will be discussed. Philosophy and sociology of science will also be evaluated from a biocomputing perspective (Sec. 6). The conflict between science and postmodernism will be analyzed. A final section is devoted to technological applications (Sec. 7), in which several approaches are evaluated from the point of view of biocomputing. Fascinating accounts of what future intelligent machines may look like can be found in Kurzweil's two books.401,402

This chapter is intended for stand-alone reading. A background is presented in Sec. 2 for the convenience of readers who do not wish to be mired in the biological details presented in Chapter 1. Extensive cross-references may be ignored upon first reading. The seemingly excessive cross-references were made for the specific purpose of multicategorization: redundant connections, at multiple hierarchical levels, of superficially unrelated knowledge modules (Sec. 4.10).

a This anthropomorphic statement should not be construed as an endorsement of Lamarckism.
2. Background

2.1. Key conclusions of Part 1

The discussion presented in Part 1 of this survey (Chapter 1) is centered around the macroscopic-microscopic dynamics of biocomputing, originally proposed by Conrad.134 Four nested hierarchical levels of computational networks are considered: a) intramolecular dynamics, b) microscopic networks of biochemical pathways in the intracellular space, c) mesoscopic networks in the membrane phase and its immediate vicinity, and d) macroscopic neural networks. The control laws that map inputs to outputs at any given stage of computation in each level are considered. Both analog and digital processing are implemented in biocomputing. Therefore, the control laws are not absolutely deterministic, but they are not completely random either. The control laws swing alternately between highly random and highly deterministic as biocomputing proceeds vertically from the molecular level all the way up to the macroscopic level. The processes of intracellular microscopic dynamics appear somewhat random, but the randomness has been "tamed" by highly organized cellular compartmentalization and short-range intermolecular non-covalent bond interactions. As a consequence, a partial (dynamic) network structure is maintained. In mesoscopic dynamics, the opening and closing of ion channels appears somewhat random; the control law is not well-defined but only probabilistically defined. At the macroscopic level, biocomputing regains a certain degree of determinacy.

The hierarchical four-level networks described above are partly "hard-wired" and partly loosely connected. This feature allows for dynamic allocation of computing resources. Another important feature is modularity, which is evident at all network levels. Furthermore, modular molecular functionalities can be re-configured to serve entirely different purposes. A predominant mode of biocomputing, referred to by Conrad as "shape-based processing," depends on molecular recognition via shape-fitting. In the solution phase, random diffusion and collisions precede the step of molecular recognition. However, the process is assisted by short-range intermolecular forces such as electrostatic interactions, hydrogen bonding, double-layer forces (osmosis), and hydrophobic and van der Waals interactions. Thus, the mutual search between two macromolecules prior to recognition via shape-fitting is not a random search but rather a heuristic search
(a familiar concept in operations research; see Sec. 4.3 for a detailed discussion). Chemical reactions usually do not proceed to completion. A forward chemical reaction is partially counter-balanced by a reverse reaction. In addition, branching reactions lead to different biochemical pathways in various proportions. Switch-based processes such as phosphorylation help make these somewhat random processes more deterministic.

Macromolecules exhibit the capability of self-organization. Self-organization also appears in protein folding and in the regulation of ion channels (Secs. 7.6 and 8 of Chapter 1, respectively). It is not surprising that the time course of both ion channel fluctuations and protein dynamics exhibits a fractal nature.51 The events look remarkably similar to the phenomenon of self-organized criticality observed by Bak and coworkers38,37 in the formation of a pile of sand. In their observation, the avalanche events were irregular, non-periodic and unpredictable, and were governed not by a well-defined control law but by a probabilistic one. The occurrence of earthquakes is also governed by a probabilistic control law.354 Thus, probabilistic control laws are not unique to living organisms.

2.2. Element of non-equilibrium thermodynamics

The doctrines of classical (equilibrium) thermodynamics and statistical mechanics indicate that, in an isolated system, the time-evolution of the system drives the free energy to a minimum (strictly speaking, the Gibbs free energy reaches a minimum at constant temperature and pressure, and the Helmholtz free energy reaches a minimum at constant temperature and volume) and the entropy to a maximum. In other words, natural physical laws call for the ultimate dissipation of free energy and the attainment of maximum disorder. However, the above statement applies only to an isolated system, which exchanges neither matter nor energy with the surrounding environment. Life forms are not isolated systems because there are constant exchanges, between a life form and its environment, of both materials (as nutrients and waste) and energy (absorption of sunlight by green plants and intake of food by animals, and dissipation of energy as heat to the environment). Therefore, the tendency of an isolated system in equilibrium to establish minimum energy and maximum disorder does not apply to life forms: systems far from equilibrium do not conform to any minimum principle that is valid for functions of free energy or entropy production (see p. 64 of Ref. 542).
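For reference, the equilibrium criteria invoked in the parenthetical remark above can be written compactly as follows, where U is the internal energy, H the enthalpy, S the entropy and T the absolute temperature; this is the standard textbook notation, not notation used elsewhere in this survey:

    G \equiv H - TS, \qquad (dG)_{T,P} \le 0
    A \equiv U - TS, \qquad (dA)_{T,V} \le 0
    (dS)_{U,V} \ge 0 \quad \text{(isolated system)}

so that G (or A) is minimized, and the entropy of an isolated system maximized, at equilibrium. None of these extremum principles carries over to systems held far from equilibrium, which is precisely the regime occupied by living organisms.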
modynamics and statistical mechanics. However, how and why life forms have appeared and how and why life forms maintain their stability require additional explanations. Partial answers come from non-equilibrium thermodynamics.540'360-274'275 Prigogine and coworkers492'543 pointed out that, in situations that are far away from thermodynamic equilibrium, selforganization of matter may appear as a result of delicate coupling between energy dissipation and matter flows, known as dissipative structures. Prominent examples from the inanimate world include clouds and galaxies. A well-known chemical example, the Belousov-Zhabotinsky reaction, has been studied from the point of view of biocomputing and exploited for image processing such as contour enhancement. 358,400,549,438 Anderson and Stein,15 however, do not believe that speculation about dissipative structures can be relevant to questions of the origin and persistence of life. These investigators suspected that stable dissipative structures do not exist. On the other hand, life forms are not stable but rather metastable; it is a question of time scales being considered. In other words, stability exhibits a gray scale. 2.3. Element of cellular
automata
Computer simulations with cellular automata are frequently used in modeling various aspects of life and information processing.690 A simple example of a cellular automaton is an array of cells each of which has a value of 0 or 1. The value of each cell is updated in each computational cycle according to a set of deterministic rules. One of the most famous cellular automata related to the problem of life is a game named "the Game of Life" devised by mathematician John H. Conway (Chapter 25 of Ref. 61; also cited in Ref. 226). Starting with any initial configuration, the pattern can evolve into either a static pattern, periodic (life-like) patterns or ultimate extinction. Wolfram728,729 studied cellular automata in general and classified the transition rules into four separate classes. Class I rules lead the system quickly to a dead pattern. Class II rules allow for more lively patterns to evolve, resulting in a combination of static and oscillating patterns but show an overall stability. Class III rules usually lead to rich and rapidly changing patterns. Finally, Class IV rules produce patterns that exhibit emergent phenomena such as propagation, division and recombination. Langton405,93 launched a detailed study of cellular automata as models to simulate life processes.
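To make the notion of a deterministic, synchronous update rule concrete, the following minimal sketch (written in Python for the present discussion; it illustrates the general idea only and is not code from any of the studies cited above) evolves a one-dimensional binary cellular automaton under an arbitrary Wolfram rule number. Rule 110 is used here merely as an example of a rule that produces rich, propagating structures; other rule numbers may be substituted to observe simpler or more chaotic behavior.

# Minimal one-dimensional binary cellular automaton (illustrative sketch only).
# Each cell holds 0 or 1; all cells are updated synchronously in each cycle
# according to a deterministic lookup table derived from a Wolfram rule number.

def step(cells, rule):
    """Apply one synchronous update with periodic boundary conditions."""
    n = len(cells)
    new = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right   # value 0..7
        new.append((rule >> neighborhood) & 1)               # corresponding bit of the rule table
    return new

def run(rule=110, width=64, cycles=32):
    cells = [0] * width
    cells[width // 2] = 1            # a single "live" cell as the initial configuration
    for _ in range(cycles):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells, rule)

if __name__ == "__main__":
    run()   # e.g., rule=30 yields a rapidly changing, chaotic-looking pattern; rule=254 quickly fills the lattice

Printing each generation as a row of characters is enough to see how drastically the long-term pattern depends on the rule table, even though every individual update is strictly deterministic.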
He eventually found a tuning parameter for the cellular automata, which he called lambda, that exhibits a distinct correlation with Wolfram's four classes of transition rules. Langton recognized that Class IV rules generate patterns which are located between Class II (stability) and Class III (chaos) and are similar to the second-order phase transition encountered in thermodynamics. This identification allowed Langton to speculate that life emerges at the phase boundary between stability and chaos, which was subsequently dubbed the "edge of chaos." Langton coined the term artificial life and asserted that understanding of "life as it is" will be greatly enhanced by our inquiry into "life as it could be." Simulation studies suggest that emergent behaviors can follow from simple and strictly deterministic rules but are not explicitly programmed with a set of predetermined rules. For example, interactions similar to the flocking behavior of birds can be generated with a limited number of simple rules, as demonstrated vividly by a simulation known as "boids" (e.g., see Ref. 566). Life is viewed as an emergent phenomenon when interactions of the components, in accordance with conventional physical and chemical laws, exceed a threshold of complexity. However, Rosen569,570 doubted whether such a threshold of complexity actually exists. He questioned the value of simulations, in general, and artificial life research, in particular, in the quest of understanding life (see Sec. 6.13). Conrad and coworkers have also used cellular automata to simulate microscopic dynamics in neuromolecular brain models (see Sec. 7.5). Conrad141,142 questioned the approach of nonlinear dynamics and did not think that it is possible to derive nonlinear life-like properties from the application of basically linear physics. Instead, he surmised that biological life can be viewed as an extension of the underlying physical dynamics, and that life may not be an emergent phenomenon but, instead, a product of the evolution of the universe. He proposed the fluctuon model, and claimed to be able to derive nonlinear life-like properties from the underlying physics of conditions that simultaneously satisfy quantum mechanics and general relativity.138,139,140 He argued that there are inherent self-organizing features of physics, and life appeared as a consequence of evolution of the universe. Presently, experimental verification is lacking. However, simple inanimate objects do not exhibit these life-like features stemming from quantum mechanics and general relativity. Presumably, these properties can become manifest only when the system achieves a certain degree of complexity. It is thus difficult to perceive the merits of Conrad's approach, since both approaches require life to emerge from complex nonlinear interactions of matter (self-organization). Furthermore, the conventional nonlinear dy-
namic approach has made no claim that the evolution of life is not part of the evolution of the universe. On the other hand, an alternative answer to Conrad's inquiry may be found in the nonlinearity rooted in the dispersion of linear physical laws (see Sees. 5.14 and 5.17). 2.4. Element of nonlinear dynamic analysis Many of the simulation studies described in Sec. 2.3 were inseparable from a branch of mathematics known as nonlinear dynamic analysis. The past three hundred years of science history is notable for the triumphant accomplishments of linear mathematics and its application to physics, chemistry and even biology. Nonlinear differential equations were once considered intractable and were avoided whenever possible. That this avoidance is possible is a consequence of choosing sufficiently "reduced" and, therefore, manageable problems; this is essentially the approach of reductionists. Thus, it is possible to obtain an exact solution of the Schrodinger equation — the control law in quantum mechanics — for the hydrogen atom but not for atoms more complex than that. It is possible to obtain an exact solution in celestial mechanics for two-body problems but not for many-body problems. For more complex problems, scientists and engineers, amidst their desperate effort to preserve linearity, routinely linearize their problems and/or add coefficients of correction (including the use of perturbation methods). Unfortunately, linearization tends to eliminate interesting and often relevant features, and insertion of coefficients of correction tends to conceal the salient features of the process and to obscure the underlying physical picture. The conventional approach offers little hope for understanding complex processes such as life, in which a detailed understanding of the parts offers severely limited insights into the whole. The advent of high speed digital computers radically transformed the options available to the analysis of complex problems. Speed and unfailing accuracy of a digital computer make it possible to launch a "brute-force" attack on nonlinear equations and nonlinearity of natural phenomena. Computer simulations have thus quickly gained the status of becoming the third branch of scientific endeavors, in addition to theoretical and experimental investigations (e.g., Refs. 727 and 729). The advances have received ample coverage in the popular press. Titles that contain such keywords as "fractals,"433 "chaos"240 and "complexity"696 have been in much demand by the lay public. Nonlinear dynamic analysis had its foundation laid much earlier than its
recent resurgence (e.g., see Refs. 664 and 1). Mathematician Henri Poincare made substantial contributions to the subject. One of the fundamental tools is a topological structure known as the phase space, which is essentially the plot of the trajectory of a moving object with respect to its position and its velocity or momentum (x vs. dx/dt).h The prototype problem is the equation of motion of a particle subject to various kinds of forces. It is well known that the types of solutions to an equation of motion, such as oscillatory solutions or monotonic solutions, depend on the boundary (initial) conditions. The phase diagram allows for convenient visualization of "clustering" of trajectories in the phase space. Thus, a monotonic solution may be represented by a trajectory which eventually converges to a point (point attractor), whereas an oscillatory solution may be represented by an elliptic (or circular) orbit (periodic attractor). The term "attractor" implies that the trajectories in the phase space are attracted to the basin (valley) in the phase space "landscape." When there is more than one attractor in the phase space landscape, the "watershed" that separates one attractor basin from another is called a separatrix. A resurgence of interest in nonlinear dynamic analysis stemmed in part from the advance in computer technology and in part from the accidental discovery of a new kind of attractor, known as a strange attractor or chaotic attractor, by meteorologist Edward Lorenz four decades ago.424,425 Lorenz studied a simplified set of equations for weather simulations. He found that a slight change of the input parameters (digit truncation) led to similar weather patterns initially. However, the patterns subsequently diverged and all similarity was eventually lost. Increasing the accuracy of input data delayed the onset but did not prevent the subsequent divergence. What Lorenz encountered was a set of superficially simple equations that were highly sensitive to the initial conditions. A dramatization of the effect is embroidered in the well-known Butterfly Effect: the flap of a butterfly's wings in Brazil could set off a tornado in Texas.425 Phase space analysis leads to the identification of the Lorenz attractor, which is an example of a new class of attractors: the aforementioned chaotic attractor.
h The momentum (p), the velocity (v) and the mass (m) of a moving object are related by the following equation: p = mv = m dx/dt.
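The sensitivity to initial conditions described above can be reproduced with a few lines of code. The sketch below (an illustration added for this discussion, not Lorenz's original program) integrates the standard Lorenz equations with the conventional textbook parameters for two starting points that differ by one part in a million, and prints how their separation grows; the step size and reporting interval are arbitrary choices.

# Illustrative integration of the Lorenz equations, showing sensitive
# dependence on initial conditions (the "Butterfly Effect").
# Parameters sigma=10, rho=28, beta=8/3 are the classical choices.

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)   # simple fixed-step (Euler) update

def distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.000001)          # "digit truncation": a one-in-a-million change
for step_count in range(1, 40001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step_count % 8000 == 0:    # report every 8 time units
        print(f"t = {step_count * 0.001:5.1f}   separation = {distance(a, b):.6f}")

The two trajectories track each other closely at first and then diverge until their separation is comparable to the size of the attractor itself, which is qualitatively the behavior Lorenz observed after truncating his input digits.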
A chaotic process exhibits an erratic pattern which may look quasi-periodic but is not really recurrent and periodic. The phase space analysis reveals a richly patterned attractor, and shows a contraction of the trajectory in the phase space but the attractor never settles into either a point or periodic attractor. In other words, the trajectory exhibits a patterned irregularity. Lorenz's computation invokes strictly deterministic rules of classical mechanics but the outcome is totally unpredictable; the quickest way to find the pattern is to run the simulation. Deterministic chaos is sometimes indistinguishable from true randomness (Sec. 5.14).

3. Biocomputing at the Evolutionary Level

It is often said that ontogeny recapitulates phylogeny. Protein folding appears to retrace the steps of protein evolution. Protein folding can be regarded as problem solving because only the linear primary amino acid sequence of a protein is explicitly programmed (Sec. 7.6 of Chapter 1). The protein must find its own way to fold properly, and the folding process is subject to physico-chemical constraints, which are inherent in the properties of the constituent amino acids. Likewise, evolution is also problem solving by living organisms, under the constraints of the environment, for long-term survival and other needs, and for continuing improvement. Aside from its intrinsic intellectual value, evolution as problem solving provides a useful metaphor for creative problem solving by humans (Sec. 4), as well as an inspiration for practical algorithms in artificial intelligence208 (Sec. 7).

3.1. Is evolution deterministic?

Evolution is crucially related to the issue of determinism because random error is an important element in evolution. Without mutations, there would be no evolution. However, such a statistical view about the evolutionary process has been questioned by science philosophers. Rosenberg571,572 suggested that an omniscient account of evolution would not require the notion of random drift. He claimed that the evolutionary process is fundamentally deterministic. The deterministic view was echoed by Horan.333 On the other hand, Millstein469 argued that there is insufficient reason to favor either determinism or indeterminism. She adopted an agnostic stand. Rosenberg's deterministic view was apparently rooted in his dissatisfaction with the concept of random drift. He pointed out that the statistical view of evolution mirrors the statistical character of thermodynamics. He thought the analogy was not an illuminating one because of the ambiguity in the relation between statistical thermodynamics and Newtonian mechanics: although the second law makes a statistical claim, it is supposed to reflect the fully deterministic behavior among the constituents of the thermody-
namic systems, in accordance with Newtonian mechanics (p. 58 of Ref. 572). Rosenberg pointed out Darwin's admission that the notion of random drift was necessitated by "our ignorance of the cause of each particular variation" (Chapter 5 of Ref. 159). Darwin's remark resembles Laplace's claim of physical determinism. If one subscribes to absolute physical determinism, then one is obliged to hold a deterministic view of evolution. However, Laplace's claim can neither be proved nor disproved, as will be demonstrated in Sec. 5.16. Rosenberg's misgiving about the statistical nature of thermodynamics was not totally unfounded. As will be demonstrated in Sec. 5.13, the second law is not compatible with Newtonian mechanics, despite the common belief to the contrary. Thus, Rosenberg's objection to the statistical theory of evolution needs to be reconsidered. Rosenberg accepted quantum mechanics as non-deterministic in principle, but he thought that quantum mechanics cannot explain the probabilistic character of evolutionary theory. He stated that, "by the time nature gets to the level [of evolution], it has long since asymptotically approached determinism" (p. 61 of Ref. 572). Rosenberg assumed that the convergence of microscopic indeterminism towards macroscopic determinism is monotonic. However, as will be demonstrated later, the hierarchical organization of biocomputing allows the control laws to swing between the two extremes of randomness and determinism, in either direction. Rosenberg also ignored the implication of chaos and bifurcation theory. A seemingly minute fluctuation or dispersion, if placed at a critical juncture of a chain of events, can, in principle, lead to drastically different outcomes. Rosenberg's inference apparently stemmed from Schrodinger's judgment expressed in his influential book What is Life?596 (see also Sec. 5.4). Rosenberg alluded to the deterministic effect of evolutionary forces that shape the direction of evolution via natural selection (p. 71 of Ref. 572). The analogy between evolutionary selection pressures and forces in Newtonian mechanics is somewhat misleading. A force in the Newtonian sense moves objects in the same direction as the force is directing, but never in the exactly opposite direction. After all, classical mechanics is reasonably deterministic. In contrast, evolutionary selection pressures operate in a statistical sense. The situation is similar to what transpires in diffusion: diffusional "forces" act on gases or solutes in solution in a statistical sense. It is well known in thermodynamics that the negative gradient of the chemical potential [energy] gives rise to a phenomenological force, / (experienced by each molecule), that creates a tendency for solutes to move from regions of high concentrations to regions of low concentrations, i.e., in the direction
of a decreasing concentration gradient: f = -kT ∇ln C, where k is the Boltzmann constant, T is the absolute temperature, C is the concentration of the solute, and ∇ is the gradient operator in vector calculus. Even though the force is expressed in the unit of newtons per molecule (or a smaller unit, such as piconewtons per molecule), this phenomenological force does not act on (isolated) individual solute molecules, and is, therefore, not a bona fide force. The force acts only in the presence of the ensemble of a large number of molecules. In the ensemble, a given solute particle (ion included) or gas molecule is pushed around in all directions by particles or molecules surrounding it; on a statistical average, a molecule experiences more pressure to move down along the concentration gradient than up against the gradient. That is why some solutes move against the gradient although, statistically, many more solutes move down along the gradient than move up against it, thus resulting in a net flux of solutes from dense regions to dilute regions.
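To convey a rough sense of scale (the numbers below are hypothetical, chosen only for illustration and not taken from the text; for a one-dimensional gradient, ∇ln C reduces to d(ln C)/dx), the following sketch evaluates the magnitude of this phenomenological force for a solute whose concentration falls ten-fold over a distance of one micrometer at room temperature.

# Order-of-magnitude illustration of the phenomenological "diffusional force"
# f = -kT d(ln C)/dx for a hypothetical ten-fold concentration drop over 1 micrometer.
import math

k = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0               # absolute temperature, K (roughly room temperature)
ratio = 10.0            # assumed ratio C(high)/C(low)
dx = 1.0e-6             # distance over which the drop occurs, m

grad_lnC = math.log(ratio) / dx          # magnitude of d(ln C)/dx, in 1/m
f = k * T * grad_lnC                     # magnitude of the force per molecule, in N
print(f"f is about {f:.2e} N per molecule, i.e. roughly {f * 1e15:.1f} femtonewtons")

The result, on the order of ten femtonewtons per molecule, underscores how weak this statistical "force" is compared with bona fide intermolecular forces.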
Metaphorically, social pressures in social Darwinism also operate in a statistical sense. This is why nonconformists exist in spite of peer pressure, and antisocial persons persist in spite of severe penalty codes. We thus favor an intermediate point of view: evolution is neither absolutely deterministic nor completely random. A thorough discussion of determinism will be deferred to Sec. 5. In the remaining part of this section, only those topics of evolution that are to be integrated with other issues of biocomputing will be covered. Examples will be drawn from membrane bioelectrochemistry and membrane bioenergetics mainly because of the author's background. Readers are likely to find relevant examples in other areas of biology to serve the same purposes.

3.2. Explanatory power of evolution

Darwin's theory of evolution provides a mechanistic explanation of how structurally and functionally more advanced species can arise from simpler and more primitive species, and how some physical traits in more advanced species are related and linked to those of more primitive species. Its explanatory power is further reaffirmed when the "lineage" of variations becomes explainable in terms of amino acid sequence of proteins and nucleotide sequence of genetic codes.255,254,430 However, the evolutionary theory, unlike other scientific theories, has not been proved by conventional scientific methodology with the expected rigor for an obvious reason: it is impossible to conduct control experiments and to repeat the relevant experiments, under controlled conditions, for the sake of statistical analysis. The theory was established by virtue of preponderance of evidence rather than by virtue of the more commonly expected proof beyond reasonable doubt. In other words, Darwin constructed the theory primarily by the method of induction instead of deduction, using a plethora of examples to support his arguments.c There simply is not any better and more comprehensive mechanistic explanatory theory around that is radically different from Darwin's theory. Induction requires the use of a large number of examples. Thus, the discussion to be presented in the subsequent sections appears to be flawed because usually no more than two examples are cited to support each point of view. It is not that the author believes that proof can be established by a scanty number of examples. Rather, the author believes that the readers will be able to find many additional examples. It is therefore essential for the readers to explore and supply additional examples so as to either corroborate or refute the arguments presented. The author would greet refutations or corroborations with almost equal enthusiasm.

3.3. Evolution as problem solving

In the simplest formulation, Darwin's theory of evolution is billed as variation and natural selection. It is of course implicitly assumed that an organism is capable of perpetuating a desirable trait by reproduction and thus allows the cycle of evolutionary mechanisms to continue. Thus, the evolutionary process can be characterized by a triad of variation, selection and reproduction (perpetuation). The triad can be regarded as a pseudo-algorithm rather than an algorithm because the evolutionary process is not sequential but rather both sequential and parallel. Variation refers to genetic mutations as a result of spontaneous or environmentally induced copying errors of genetic codes. Selection refers to natural selection of Darwinism. Perpetuation allows for altered genetic traits to be transmitted from generation to generation. Perpetuation obviously depends on gradualism in protein folding for its operation (see Secs. 7.5 and 7.6 of Chapter 1). In the subsequent discussion about creative problem solving, a similar pseudo-algorithm will be discussed (Sec. 4). In both pseudo-algorithms, the
c Medawar vigorously disputed that Darwin had ever used induction in the formulation of his theory of evolution (see p. 137 of Ref. 451). This issue will be resolved in Sec. 4.13.
first step can be regarded as a search for possible solutions for the problem. A pertinent question to ask is: Did evolution proceed by trial and error?

3.4. Random search, exhaustive search and heuristic search

The primary source of variations is mutation of the genetic codes. Here, randomness enters the process of evolution. Did evolution proceed by means of purely random permutations/combinations of 20 amino acids? Using the terminology of operations research, the total number of possibilities of mutations constitutes the search space. Exhaustive searching means trying out each and every possibility. Random searching (or blind searching) means trying out various possibilities randomly and blindly. Over a finite period, random searching is not equivalent to exhaustive searching; only through eternity can random searching become exhaustive. Systematic searching means conducting a search according to a predetermined, and often sequential, procedure, which is usually intended to cover the entire search space. An exhaustive search is often a systematic search but a systematic search may not be exhaustive. What is customarily referred to as "trial and error" is either random or systematic searching. Heuristic searching means trying out a subset of the search space specially selected by certain pre-selected explicit criteria, known as heuristics631,490,629 (see also Sec. 6.1 of Chapter 1). A heuristic search need not be based on explicit rules but can be based on some intuitive feeling, commonly known as hunch or "gut feeling" (Sec. 4.10). It is generally recognized that evolution could not have been resorting to random searching all along, much less exhaustive searching, because there simply was not enough time for evolution to proceed this far, given the Earth's age as the ultimate time limit. In accordance with the lock-key paradigm, molecular recognition starts as a random search via diffusion and collisions (Sec. 6 of Chapter 1). Once the pair of encountering molecules get into each other's close range, the reduction-of-dimensionality effect and other effects due to non-covalent bond interactions kick in to constrain the exploration to a smaller search space. It is no longer random diffusion in an isotropic three-dimensional space because subsequent searches are predominantly two-dimensional. Randomness remains but it is biased randomness (heuristic searching). The bias is like what transpires in the random walk of a drunken sailor on a slope, as compared to that on level ground. Evolution could be similar; a toy numerical sketch contrasting blind and heuristic searching is given later in this subsection. In the metaphor of a fitness landscape,732
evolution first "wandered" from one fitness peak to another peak at random. However, once it stumbled on a particular valley or basin which was richly "supportive" for self-organization, it locked into that particular valley by virtue of a positive feedback mechanism. Other equally or perhaps more favorable valleys thus became off limits. In other words, the early steps of evolution were probably purely chance phenomena (random searching). Once Nature stumbled on a self-reinforcing scheme, as suggested by Kauffman's autocatalytic model, 362>363>364 evolution locked onto a specific pathway and settled in a region of the search space that offered a large number of viable mutations (heuristic searching). Other possible routes of future evolution were then closed as not being in the "thoroughfare" of the search paths. The search no longer covers the entire available search space with a uniform probability, much less exhaustively. Thus, it appears that evolutionary problem solving adopts a combined top-down and bottom-up approach. Mutations provide the bottom-up exploration of possible evolutionary modifications, whereas the above-mentioned positive feedback scheme exerts a top-down constraint and limits the range of the evolutionary searches. Computer simulation studies suggested that the standard genetic code did not appear by random chance but was a consequence of evolution. The genetic code evolved not only to minimize the impact of mutations or mistranslations on protein folding and protein properties, but also to make possible heuristic searching of viable mutations214 (see Sec. 7.5 of Chapter
1).
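As noted earlier in this subsection, the contrast between blind searching and heuristic searching can be made concrete with a toy numerical experiment (a deliberately simplified sketch written for this discussion; the bit-string "genome", the fitness function and all parameter values are arbitrary and carry no biological meaning). The blind searcher draws candidate bit strings at random from the entire search space, whereas the heuristic searcher examines only single-bit mutants of the best string found so far, thereby confining its exploration to the neighborhood of an already viable design.

# Toy comparison of blind (random) search and a simple heuristic (local) search
# on an arbitrary fitness function over 40-bit strings. Illustrative only.
import random

N = 40
TARGET = [random.randint(0, 1) for _ in range(N)]     # the "problem" to be solved

def fitness(candidate):
    """Number of positions agreeing with the target (higher is better)."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def blind_search(trials=2000):
    best = 0
    for _ in range(trials):
        candidate = [random.randint(0, 1) for _ in range(N)]
        best = max(best, fitness(candidate))
    return best

def heuristic_search(trials=2000):
    current = [random.randint(0, 1) for _ in range(N)]
    for _ in range(trials):
        mutant = current[:]                      # try a single-bit "mutation"
        mutant[random.randrange(N)] ^= 1
        if fitness(mutant) >= fitness(current):  # retain improvements (selection)
            current = mutant
    return fitness(current)

random.seed(1)
print("blind search, best of 2000 trials:    ", blind_search())
print("heuristic search, after 2000 mutations:", heuristic_search())

With the same budget of trials, the local, biased search typically reaches the optimum while the blind search rarely comes close, which is the time-budget argument made above on the scale of the Earth's age.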
Computer simulation studies of protein folding suggested that proteins fold by a nucleation mechanism. The so-called folding nucleus region tends to be more conserved than other regions of a protein (Sec. 7.6 of Chapter 1). These folding nuclei are the richly "supportive" valleys in the metaphor of fitness landscape that offer a large number of viable mutations. 3.5. Enigma of homochirality of biomolecules In evolution, asymmetry (or chirality) is a recurrent theme from the molecular level (the existence of enantiomers and the predominance of one isomer), to organ systems (a symmetric body plan with the exception of the heart and the liver), and to physiological functions (cerebral lateralization, Sec. 4.15). At the molecular level, L-amino acids and D-sugars are the overwhelmingly predominant natural forms. There is no obvious a priori reason why one form of isomers should be superior to the other. Nature could have selected the set of D-amino acid monomers for the construction of proteins.
It is not clear that the choice of the L set instead of the D set (the problem of homochirality of biomolecules) was of chance origin or due to a definite physical cause. The emergence of the modern genetic codes was equally intriguing, and remains a topic of debates and speculations.375 Hegstrom and Kondepudi291 concisely reviewed existing examples of chiral asymmetry from the level of subatomic particles, through the level of animals and plants, and to the level of the universe. As is commonly known, there are four types of fundamental forces: the gravitational force, the electromagnetic force (responsible for ordinary chemical reactions), the strong nuclear force (which holds atomic nuclei together), and the weak nuclear force (responsible for nuclear beta decay). Of the four, only the weak nuclear force is chiral (nonconservation of parity). In unifying the weak force and the electromagnetic force, Weinberg, Salem and Glashow705 proposed a new "electroweak" force that appears when spinning electrons attain a speed near that of light. Thus, all atoms are inherently chiral.80 However, chirality in life is another matter. The asymmetry of molecules caused by the weak nuclear force or the electroweak force was estimated to be exceedingly small (1 in 109 or 1 in 1017, respectively), unless an amplification mechanism exists. Such an amplification mechanism is theoretically possible. Being dominated by the electromagnetic force, a chemical reaction that produces both enantiomers tend to produce both isomers in equal amounts. In a closed system attaining thermodynamic equilibrium, it is not possible to produce an excess of one isomer at the expense of the other (cf. Le Chatelier principle, negative feedback concept). However, in an open system (i.e., with flows of matter and energy to and from the environment) that is far from equilibrium, it is possible to attain an instability so that a positive feedback mechanism becomes operative, i.e., a bifurcation pointd (see Refs. 378 and 291). If one enantiomer becomes slightly in excess over the other, the majority isomer can grow at the expense of the minority isomer by a process known as spontaneous symmetry breaking. In the case of amino acids, once the L-isomers became increasingly utilized in successful mutations, the probability of a successful use of the D-isomers in mutation decreased accordingly. A minute change due to an external perturbation (e.g., the effect of the weak nuclear force) or due to pure chance fluctuations could tip the delicate balance (see Sec. 5 for an analysis of the subtle difference between noise with a cause and noise without a cause). However, d
The term "bifurcation" does not follow the strict definition of bifurcation theory, but
is used in a loose sense to indicate the existence of alternative time courses.
other investigators dismissed the asymmetry of weak interactions as the cause of homochirality of biomolecules.365 More recently, Clark131 speculated that selective destruction by circularly polarized starlight may be an alternative explanation. The topic of homochirality of biomolecules has been reviewed by Keszthelyi,366 who favored the chance origin rather than causal origin. Chirality regarding the left-right asymmetry in the development of animal body plans is also a mystery.730 The advent of molecular biology provided an unprecedented and powerful approach to this problem and allowed a number of biochemical components as well as microscopic structures (such as cilia) to be identified as the initiators of asymmetric development in an otherwise symmetric body plan in vertebrates.56 What, in turn, causes an asymmetric deposition of these molecules (left vs. right) and what causes cilia to rotate counterclockwise rather than clockwise remain elusive. 3.6. Damage control and opportunistic invention When the choice is between two symmetrical sets such as two steric isomer sets of amino acids, one set is probably as good as the other. When the choice is among two or more asymmetric options, it may be possible to pick one as the best choice. However, the notion of a "best choice" is inherently problematic. A good choice under a particular set of selection criteria imposed by the present environmental conditions may prove to be an inferior choice in hindsight, when an environmental change leads to a corresponding change of selection criteria. In general, a "short-sighted" evolutionary choice by a given species could be penalized by future extinction. By evolving additional features to cope with the unforeseen shortcomings caused by an environmental change, an organism might be able to "live with" a choice, which was initially good but later turned sour. Conversely, an organism might evolve and settle for an ambivalent deal, if it could evolve mechanisms to cope with the accompanying side effects at the same time. Let us consider the case of photosynthesis in green plants as an example. Superficially, evolution of the water-splitting capability in photosynthesis offered the worst combination of environmental conditions: the simultaneous presence of phospholipids, oxygen and light. Photosynthesis generates an end product, molecular oxygen, that is highly detrimental to the integrity of the thylakoid membrane because unsaturated phospholipids are prone to oxidation in the presence of light and oxygen. In addition, highly
oxidizing free radicals and toxic oxygen species also damage and degrade the reaction center Di-peptide.46 How did Nature end up with this scheme? Long before photosynthetic organisms evolved, anaerobic bacteria could only partially oxidize nutrients to form, for example, lactic acid, and could extract only a fraction of the free energy from the nutrients. Further oxidation required molecular oxygen which was unavailable in large quantities until oxygen-evolving photosynthetic bacteria evolved. Initially, there existed only a single photosystem in the reaction center of a photosynthetic organism that neither evolved oxygen nor fixed carbon dioxide. The modern oxygen-evolving photosynthetic organisms (cyanobacteria and green plants) utilize two photosystems, as described in Sec. 5.2 of Chapter 1. It was generally thought that the oxygen-evolving Photosystem II was a later add-on to a preexisting Photosystem I. However, when the structures of the reaction centers became elucidated, it was puzzling to find that the (single) reaction center of purple phototrophic bacteria was actually similar to the oxygen-evolving Photosystem II of green plants rather than to Photosystem I, as customarily assumed (see discussion in Refs. 44, 45 and 463). Subsequent high-resolution crystallographic data of Photosystem II reaction center reported by Barber's group562'493 show that the arrangement of the transmembrane a-helices is remarkably similar to that of the helices in the reaction centers of purple bacteria169 and of plant Photosystem I.390.597 Thus, there is a link between Photosystem II and the purple bacterial reaction center on the one hand, and between Photosystem II and Photosystem I on the other hand. Barber and coworkers562 pointed out that the five-helix scaffold of the two reaction centers might have been largely unchanged since photosynthesis first evolved 3.5 billion years ago. The similarity between CP-47 of Photosystem II and the corresponding six helices of the PsaA-PsaB proteins of Photosystem I suggests that the latter arose by genetic fusion of a CP43/CP47-like protein with a five-helix reactioncenter prototype. Thus, the high-resolution data further consolidated the phylogenetic origin of Photosystem I from Photosystem II. Greenbaum and coworkers261 have shed light on this phylogenetic relationship. These investigators demonstrated O2 evolution and CO2 fixation in a genetic mutant of Chlamydomonas which lacks Photosystem I. They suggested that the combined scheme of Photosystems I and II was not an absolute requirement for O2 evolution and CO2 fixation to take place; Photosystem II alone might be sufficient. However, Photosystem II alone is susceptible to photodamage by molecular oxygen. They presented quite a different scenario from that offered by conventional wisdom. Photosystem
II evolved first to produce O2 and fix CO2. Initially, the oxygen concentration in the atmosphere was too dilute to be harmful to Photosystem II. It took about a billion years, after the appearance of photosynthetic organisms, for the atmospheric oxygen to begin to rise significantly and reach the harmful level that could destabilize the structure of Photosystem II (e.g., Chapter 14 of Ref. 7). If the interpretation of Greenbaum and coworkers is correct, the addition of Photosystem I to the existing Photosystem II was then an adaptation to the changing environment which Photosystem II helped create. If photodamage of Photosystem II could be viewed as a "self-inflicted wound," then the addition of Photosystem I was the subsequent "damage control." This is but an example readily coming to my mind. Other examples abound; the evolution of scavengers of free radicals was another. Resorting to damage control instead of evolving a brand new system is consistent with a strategy of heuristic searching. The best solution was not being sought after. There are examples showing that certain features with survival values arose in evolution long after the initial "decision" had been made, i.e., long after the passage of a critical bifurcation point. Of course, Nature could not have exercised an extremely long-range plan. The subsequent emergence of a useful feature, which was built on ancient capability, merely reflected an "opportunistic invention" by Nature. A case in point is the role of polar head-groups of phospholipids in proton lateral mobility on the membrane surface and in the activation of protein kinase C (PKC) (Sees. 5.4 and 5.5 of Chapter 1, respectively). Evolution could not have had the foresight when phospholipids first evolved to form the plasma membrane. Presumably, proton pumps appeared because anaerobic metabolism produced a sufficient amount of organic acids to lower the ocean's pH. The latter condition created the need for marine organisms to evolve a proton pump to lower the intracellular proton concentration, since excessively low pH may cause most, if not all, proteins to denature. The proton gradient so generated was subsequently recruited for bioenergetic functions and hence the mechanism of lateral proton mobility became a useful add-on. Thus, availability creates new demands, and a latent capability such as proton lateral mobility simply rose to the occasion. Likewise, the evolution of phosphatidyl serine for PKC activation appears to be a consequence of fine-tuning phospholipid properties. Presumably, evolution lacked foresight (see Sec. 6.13 for a discussion of final causes). The apparent virtual intelligence of intelligent biomaterials is not true intelligence but is fully explainable in terms of evolutionary mech-
anisms (Sec. 7.3 of Chapter 1). Yet, by introspection, our own intelligence enables us to have foresight and to be able to plan ahead. How did this capability of foresight arise (see Sec. 4)?

3.7. Analogues and homologues

The existence of analogous and homologous structures among various organisms is evidence suggesting evolution's practice of heuristic searching. Though excellent examples abound, we shall again use one in bioenergetics. The difference between the reaction scheme in the purple phototrophic bacterium Rhodopseudomonas viridis and that in the archaebacterium (archaea) Halobacterium salinarum (formerly Halobacterium halobium) illustrates this point. Rhodopseudomonas viridis evolved a complex system based on the photochemistry of bacteriochlorophyll (a metalloporphyrin), whereas Halobacterium salinarum evolved a much simpler system based on the photochemistry of vitamin A aldehyde — a single protein bacteriorhodopsin — for almost the same purpose: generating a transmembrane proton gradient. Furthermore, bacteriorhodopsin closely resembles the visual pigment rhodopsin in chemical structures (both are vitamin A-containing proteins) and in molecular functionalities (both exhibit similar photoelectric signals325,328,329). Why did Halobacteria recruit a visual pigment to perform photosynthetic function? We shall entertain two different scenarios of explanation. There are four different retinal proteins in the plasma membrane of Halobacterium salinarum: a proton pump (bacteriorhodopsin), a Cl- pump (halorhodopsin), and two photosensors (sensory rhodopsin and phoborhodopsin). Mukohata and coworkers476,339 analyzed the amino acid sequences of two dozen retinal proteins from various strains of extreme halophilic bacteria. They found that these proteins form a distinct family, designated as the archaeal rhodopsin family, which was not related to other known proteins, including the visual pigment rhodopsin. Their analysis indicated that these proteins arose from at least two gene duplication processes during evolution from ancestral rhodopsin. They speculated that the original rhodopsin ancestor was closest to bacteriorhodopsin. Thus, for photosynthesis, the green plants' two photosystems are analogous to purple phototrophic bacteria's reaction center, whereas bacteriorhodopsin of Halobacterium salinarum is homologous instead. Presumably, by having evolved bacteriorhodopsin, Halobacteria might be locked onto the path of making vitamin A-containing (retinal) proteins. The subsequent evolution of
the Cl~ pump halorhodopsin and two sensor rhodopsins constituted an act of heuristic searching in evolution: different functionalities were served by modifications (mutations) at certain critical amino acid residues. It is of interest to note that bacteriorhodopsin and halorhodopsin were intrinsically equipped to transport either H+ or Cl~ ,497 and are functionally inter-convertible, under appropriate conditions.43'589 A second scenario assumes that the two sensor proteins evolved first for phototaxis. The visual pigment rhodopsin binds protons from the cytoplasmic side of the membrane, but does not transport protons. We have previously speculated that the early receptor potential associated with this rapid proton binding may be responsible for triggering the biochemical reactions of visual transduction317'318'319 (see also Sec. 5.3 of Chapter 1). If bacteriorhodopsin evolved from existing sensor proteins, all it had to do was add a proton-transport channel and a mechanism to release protons to the opposite side of the membrane, thus evading the need to evolve a new photosynthetic apparatus from scratch. This latter scenario would explain why the photoelectric signals from bacteriorhodopsin and rhodopsin have a high degree of similarity.325 However, this latter scenario is not supported by the analysis of Mukohata and coworkers. Nevertheless, either scenario could be used to illustrate evolution's practice of heuristic searching, i.e., retooling of an existing molecular functionality for different physiological functions. (How convenient to have such an ambivalent view!). 3.8. Co-evolution and perpetual
novelty
The evolution of different designs with unequal capabilities for the same purpose, discussed in Sees. 3.6 and 3.7, also indicated that Nature apparently did not aim at seeking the best solution. Nature merely sought improvements. An individual species merely tried to adapt and improve its capability for continuing survival. A diverse variety of organisms evolved to adapt to one another and occupied different ecological niches by coevolution. Thus, the fitness landscape is not fixed but is rather dynamically interactive. Evolution to a new adaptive state by a given organism often upsets the previous steady state for other organisms and alters the fitness landscape, and, thus puts a new selection pressure on other organisms. Evolution must take all living organisms and their interactions into account, and the outcome of a computer simulation of the process can only be found implicitly, often by means of an iterative procedure. The situation is analogous to the arms race in which superpowers continue to "evolve" new
weapons and defense strategies; it will never reach true equilibrium, but phases (interludes) of stability — punctuated equilibrium (see later) — are possible through mutually assured destruction (MAD). An excellent example of co-evolution is provided by that of oxidative and photosynthetic phosphorylation, which are essentially two mutually supportive processes in ecology. The former utilizes oxygen and produces carbon dioxide, whereas the latter produces oxygen and utilizes carbon dioxide. Photosynthetic bacteria appeared shortly after the appearance of the first living cell (about 3.5 billion years ago). Photosynthetic bacteria, by providing an inexhaustible source of reducing power (from sunlight), overcame a major obstacle in the evolution of cells. However, it took an additional time of more than 1.5 billion years before aerobic respiration became widespread. What caused such a long delay? This might be because oxygen so produced was initially utilized to convert Fe2+ into Fe3+ in the ocean (see Chapter 14 of Ref. 7). Only after Fe2+ in the ocean became exhausted did oxygen in the atmosphere begin to rise (somewhere between 0.5 to 1.5 billion years ago). Aerobic metabolism and photosynthesis had been two such wonderful mutually supportive survival schemes that some eukaryotic cells took advantage of them and captured (or, rather, kidnapped) the bacteria with such capabilities, which thus became mitochondria and chloroplasts, respectively, according to the endosymbiont hypothesis. Such events are hard to prove, but convincing clues are available: both mitochondria and chloroplasts contain genomes that are characteristic of bacteria and different from those of eukaryotic cells (e.g., see Chapter 14 of Ref. 7). Thus, evolution follows changes of environment: certain mechanisms would not have evolved until the required conditions had been met.397'398 However, attempts to take advantage of new opportunities provided by a changing environment were also met with new challenges, which called for new defense mechanisms to evolve. True equilibrium never existed; the concept of the optimal or the best adapted life forms is meaningless because life is engaged in "perpetual novelty" for the sake of continuing survival (p. 147 of Ref. 696). 3.9. Punctuated equilibrium and Cambrian explosion Apparently, evolution did not proceed at a constant rate in terms of biodiversity. Instead, evolution proceeded in a "stop-and-go" — intermittently bifurcating — fashion (concept of punctuated equilibrium). This is exemplified by the Cambrian explosion when new species appeared with a high
frequency. It is an intriguing question to ask: Did the various "stop-andgo" phases coincide with the alternating strategies of random searching and heuristic searching? It is difficult to ascertain. However, the acquisition and elaboration of Hox complexes (a regulatory gene that is responsible for laying down the structural body plans) may be necessary (though not necessarily sufficient) for this Cambrian explosion.196'368 Thus, it was probably a consequence of heuristic searching (cf. Sees. 7.5 and 7.6 of Chapter
1).
4. Cognitive Aspects of Biocomputing One of the most important attributes that distinguish the human brain from a sequential digital computer is creative cognition. The thought process of geniuses or individuals with superior mental abilities has captured the fascination of philosophers and scientists perhaps since the inception of these professions. Since the invention of digital computers in the mid-20th century, implementation of cognitive abilities in computers or robots has been an important goal in the quest for machine intelligence. The cognitive gap between the human brain and the most advanced computer programs has narrowed considerably after several decades of AI research. Efforts made in these computer simulations of the creative process have also greatly enhanced our understanding of the creative process. Moreover, advances made in neuroscience research have transformed creativity research from the phenomenological level to the cellular and molecular levels. Needless to say, insights into the creative process will not only help engineers design better computers but also help educators develop new educational strategies. We shall examine the process of creative problem solving from the perspectives of modern cognitive science and artificial intelligence: the biocomputing approach. In this section, anecdotes, introspections and personal observations are to be cited and used freely. Although it appears to be a blatant departure from conventional scientific practices, it is done with a good reason, which will become apparent at the end of this section (see also Sec. 6.14). A limited version of this section has been published elsewhere.327 4.1. Models of creative problem solving Creative problem solving is inherently complex and any attempt to describe the "algorithm" associated with the cognitive process runs the risk of oversimplifying the subject if not being misleading. In general, the act of creative problem solving cannot be readily broken down into a sequence
of algorithmic steps consisting of simple logical operations (cf. Sec. 6.13). However, it can be decomposed into a time sequence of three or four phases: a pseudo-algorithm, so to speak. For almost a century, psychologists and cognitive scientists have been building models and have made attempts to uncover the procedure underlying creative problem solving and the secret wrapped under the cloak of a genius' mind. One of the approaches was to rely on introspective reports of accomplished scientists and certified geniuses232'272'687'155'610 or their biographies.632'224 Poincare described in detail his introspective account of the thought process during mathematical invention 521 (see also compilations in Refs. 687 and 232). Inspired by Poincare, Hadamard also analyzed the mind-set of mathematicians in their work.272 However, some investigators downplayed introspective data and focused on behavioral experimentation. A large number of monographs have appeared in the past century. They are too numerous to be listed here. A number of handbooks provide a convenient access to the vast literature.242'577'579'650 4.1.1. Wallas'four-phase model Wallas697 (see also p. 91 of Ref. 687) was among the first to suggest a paradigm for the creative process. Apparently modeling after Poincare's account, Wallas proposed that the creative process consists of four phases: preparation, incubation, illumination and verification. When a human problem solver is confronted with a problem, the first step is task analysis or preparation. Task analysis usually involves identification of the crucial steps and feasible angles of attack. Task analysis may also include means-ends analysis490 and transformations of an original problem into another which looks less formidable, or into another problem for which a "cookbookrecipe" solution already exists. This is done with reference to one's previously acquired knowledge and with the hope of matching the present problem with a previously known answer. The preparation phase is then followed by a period of incubation. The problem is often set aside for a while before it is taken up again. The period in which the idea begins to "crystallize" is the illumination phase. The final phase of verification consolidates the solution of the problem. 4.1.2. Koestler's bisociation model The model proposed by writer Koestler376 is worth mentioning. Koestler made an attempt to analyze the common processes leading to scientific dis-
coveries, art creation and humor, from a unified point of view. Specifically, Koestler defines two terms: matrix and code. Essentially, a code consists of the specifications which define the target of problem solving. A matrix is a frame of reference or a subset of search space, which is selected by the individual problem solver on the basis of the given code, the result of means-ends analysis, personal preference, and intuition. Apparently, Koestler chose to deal with high creativity — the kind of creativity often associated with geniuses. In Koestler's formulation, a problem of high novelty cannot be solved by staying in the original or commonly chosen matrix. The solution requires a process called bisociation: binary association of two matrices. Graphically, the second matrix is confined to a plane of different dimension, and is therefore accessible only to those who can make the leap of thought into an unconventional "dimension." The moment of illumination is marked by the sudden match of the solution and the problem, when the target on the second matrix is reached. Koestler's theory was met with mixed responses (see a summary by Partridge and Rowe.503) Boden75 discussed it approvingly. However, she put Koestler in the same camp as Poincare and treated him as a useful "witnesses" to the creative processes, but regarded his explanation as vague and suggestive. Medawar451 was scathingly disapproving of Koestler's model. In place of bisociation, Medawar proposed the Hypothetico-Deductive Theory: the process of making a scientific discovery consists of proposing a hypothesis and then performing deductive reasoning. Medawar also dismissed emphatically the role of induction in scientific discoveries, and seriously questioned whether Darwin had actually used inductive reasoning to formulate his theory of evolution, as Darwin had claimed. Medawar dismissed Francis Bacon's induction as merely the "inverse application of deductive reasoning" (p. 134 of Ref. 451).
4.1.3. Simonton's chance-configuration model The dispute between Koestler and Medawar can be readily resolved by considering Simonton's chance-configuration theory.633 At a metaphoric level, Simonton's model of creative problem solving is analogous to evolutionary learning: the evolutionary triad of variation, selection, and perpetuation544 (cf. Sec. 3.3). Like Koestler, Simonton also specifically addressed problem solving by highly creative people. Simonton's model claimed its parentage in Campbell's blind-variation and selective-retention model,104'733'298 which,
in turn, followed the analogy of evolution and learning.40,544 The cognitive process in Simonton's model is divided into three phases: "blind" variation, selection, and retention, which correspond to searching for a potential solution, recognizing it, and implementing the solution. The analogy to the evolutionary triad is apparent. Superficially, the notion of "blind" variation implies "random" searching. However, according to Wuketits,733 it means, instead, "not guided by anticipation," although Campbell himself had difficulty making it unambiguous. In my opinion, the notion implies that the creative process is neither completely deterministic — i.e., predictable or premeditated — nor completely random; it is relatively deterministic. In other words, searching for a solution can neither be conducted by following a predetermined route nor by means of trial and error alone (with luck). This point will be made clear in Secs. 4.26 and 5. In reference to the phase of "blind" variation, Simonton specifically considered the process of solving a difficult problem, perhaps a problem with no previously known answers. This phase obviously includes task analysis, as mentioned earlier. Most people probably would not launch a truly blind or truly exhaustive search, but would rather engage in the search for an appropriate solution from the known and learned repertoire. Only when one becomes convinced that the existing approaches are exhausted does one begin the search for an unknown and unprecedented solution. Only a fraction of people are actually willing to invest the required time and effort to follow through a difficult problem. In the phase of selection, the problem solver chooses, upon first screening of the search space, a short list of candidate solutions that are deemed more likely to be an appropriate solution. The ability to recognize the appropriate or probable solutions during the search process is just as important as the ability to define an appropriate search space. Failure to find a solution can be due to either the exclusion of the solution from the search space or the inability to recognize the solution when included. In the phase of retention, the solution that has been selected or recognized must be preserved and retained by some thought process. Often, more than one probable solution is found to match the problem, but not all of them are appropriate or right. Verification is thus required to complete the creative process. In the subsequent discussions, no distinction will be made between the terms "retention phase" and "verification phase".
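The triad can also be written down as a compact generational loop. The sketch below is a toy illustration composed for this section (the target phrase, population sizes and mutation rate are arbitrary, and no claim is made that it models human cognition or Simonton's theory in any quantitative sense); its only purpose is to label the three phases explicitly: variation proposes modified copies of retained candidates, selection screens them against the "problem", and retention carries the best of them into the next cycle.

# The variation-selection-retention triad written as a toy generational loop.
# Purely illustrative; the "problem" is to match an arbitrary target phrase.
import random
import string

TARGET = "edge of chaos"
ALPHABET = string.ascii_lowercase + " "

def score(candidate):
    return sum(a == b for a, b in zip(candidate, TARGET))

def vary(parent, rate=0.1):
    """Variation: copy the parent with occasional random character changes."""
    return "".join(c if random.random() > rate else random.choice(ALPHABET) for c in parent)

retained = ["".join(random.choice(ALPHABET) for _ in TARGET)]   # initial guess
for generation in range(200):
    variants = [vary(parent) for parent in retained for _ in range(20)]   # variation
    variants.sort(key=score, reverse=True)                                # selection
    retained = variants[:5]                                               # retention
    if score(retained[0]) == len(TARGET):
        print(f"solved at generation {generation}: {retained[0]!r}")
        break
else:
    print("best candidate after 200 generations:", repr(retained[0]))

Note that the loop is "blind" only in the sense that no individual variation is guided by anticipation of the final answer; the selection and retention steps nevertheless make the overall process far from random.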
4.2. Parallel processing versus sequential processing in pattern recognition One of the merits of Simonton's chance-configuration theory is the feasibility of examining the role of parallel processing and sequential (serial) processing in creative problem solving. This approach also allows for considerable demystification of Poincare's introspective account. Most investigators did not approach the problem from this point of view. A notable exception is Bastick, who associated intuition with the parallel processing capability of the right cerebral hemisphere (Sees. 5.2 and 5.4 of Ref. 52). Csikszentmihalyi suggested that something similar to parallel processing may be taking place during the period of incubation (p. 102 of Ref. 155). However, the idea was not further elaborated. A common thread that ties all these ideas together is the notion of pattern recognition. In the 1988 Peano Lecture, Simon suggested that there is no need to postulate special mechanisms to account for intuition, insight or inspiration, since these phenomena can be produced by the mechanism of recognition (p. 117 of Ref. 629; cf. Sec. 4.26). That pattern recognition is a central element of scientific discoveries can be seen from how Babylonian astronomers made predictions. Unlike Greek astronomers who developed geometrical models, Babylonians used numerical models: predictions were made purely by tables of numbers with recurrent patterns, which predicted numbers from numbers (see, for example, p. 135 of Ref. 212). If a given problem is treated as a pattern, then the available candidate solutions in the search space can be regarded as templates, and finding the probable solutions and/or the correct solution within the search space thus becomes a process of pattern recognition (cf. template matching; p. 69 of Ref. 389). Searching for suitable candidate solutions begins with the search phase and ends with the match phase, thus resulting in the acquisition of a small number of solution-templates that reasonably match the problem pattern. However, these candidate solutions must be subjected to further scrutiny in the verification phase, before either some of them are retained as the final solution (s) or none of them actually works and the search continues. In the subsequent discussion, the first two phases of Simonton's model will sometimes be combined into a single phase of search-and-match in the present rendition. The search-and-match and the verification-retention phases formally correspond, respectively, to: a) Poincare's intuitive and logical approaches (pp. 210-212 of Ref. 521; Sec. 2.1 of Ref. 52), b) the inspirational and elaborative phases, as proposed by Kris391 (see also Ref. 494),
c) the solution-generating and solution-verifying processes, stipulated by Newell et al.,488 d) the visual-ability and verbal-ability modes, as mentioned by Bastick (pp. 192-193 of Ref. 52), and e) parallel-intuitive and sequential-deliberative thinking, as mentioned by Boden (p. 205 of Ref. 75). Another useful identification is to associate intuitive and logical approaches with dynamic (statistical, distributed, downward) and semiotic (rule-based, local, upward) controls, respectively.505 The dynamic and semiotic controls were thought to be involved in evolution and embryonic development as well. What is a pattern? Coward148 proposed a simple definition: a pattern is something that repeats, in time or in space. Furthermore, a pattern may be hierarchical: a pattern is composed of a set of component patterns and repeats if a high enough proportion of those components are present. In this additional elaboration, it was implied that a pattern need not repeat exactly, in time or in space, and that repetitions with infrequent or rare defects are tolerated. Whether a pattern actually repeats depends critically on the criteria of what constitutes a repetition. Heraclitus (ca. 540-480 B.C.) said, "You could not step twice into [precisely] the same river; for other waters are ever flowing on to you" (p. 62 of Ref. 50). In this most severe restriction, nothing ever repeats itself. Yet, Santayana said, "Those who cannot remember the past are condemned to repeat it" (p. 588 of Ref. 50). Historical lessons are valuable because of the recognizable patterns of past events. History sometimes repeats itself accurately, but never precisely. Thus, for a pattern to be valuable and recognizable, the criteria for a match with a template are usually somewhat forgiving or, rather, fault-tolerant, but not always so (see below). The patterns considered so far are tacitly assumed to be analog in nature. Strictly speaking, two types of patterns can be considered: digital and analog patterns. A pattern of numerical digits and/or alphabetic letters, such as a telephone number, a street address, a computer password, etc., is a digital pattern; matching of digital templates with a digital pattern requires a strict agreement, digit by digit and letter by letter. On the other hand, a pattern that matches its templates on the basis of shapes — such as what happens in molecular recognition — is an analog pattern, because there is a continuous distribution of different varieties of tolerable defects that do not impair recognition. In other words, there are infinite ways for these templates to deviate from the ideal shape imposed by a given pattern without losing their recognizability. Molecular recognition, however, is not based on shapes alone. Molecular recognition is usually mediated by a finite number of "points of contact"
Fig. 1. Affinity metaphor of pattern recognition. The ligand binds to three different proteins with different affinities. Protein 1 and Protein 2 "shape"-fit the ligand equally well and both do better than Protein 3. Protein 1, however, has a higher affinity than Protein 2 because of the additional ionic bond. Thus, the three proteins differ in the strength of binding and/or the number of "contact points." Regarding the ligand as the pattern, these three proteins are candidate templates with different degrees of fit or match. In order to recruit all three candidate templates, one must use loose criteria to judge the match. Strict criteria narrow down the number of matches. Note that the role of proteins and that of ligands may be reversed. In the latter scenario, a protein (pattern) may fit several different ligands (templates) with varying affinities, as exemplified by multiple agonists and antagonists of the receptor of a given neurotransmitter. (Reproduced from Ref. 683 with permission; Copyright by McGraw-Hill)
that are distributed spatially according to the "shape" of the matching cavity, which is usually present on the surface of the bigger of the two encountering partners; these contact points are the recognition sites that bridge the two encountering molecules by means of non-covalent bond interactions (Fig. 1; see also Sec. 6.1 of Chapter 1). Furthermore, the binding process is not a simple dichotomy of "match" or "no-match," because these non-covalent forces decay with distances gradually rather than abruptly. Thus, there is a gray scale of binding strengths, i.e., a gray scale of goodness of match, so to speak. The fact that molecular recognition is by and large an analog process allows for recognition errors to be exploited in the action of toxins, enzyme inhibitors, and ion channel blockers, to name a few. A critical question arises: Is base-pair matching in DNA and RNA an analog process or a digital process? The matching of adenine (A) to thymine (T) as well as cytosine (C) to guanine (G), at the individual (molecular) level, is shape-based molecular recognition with the aid of several pairs of matching hydrogen bonds, and is therefore an analog process (Sec. 6 of Chapter 1). The ambiguity of recognition stems from the possible presence
of molecules with a similar structure, e.g., synthetic analogs of nucleotides, thus rendering the process analog. In contrast, the base-pair matching process at the ensemble level, during the process of transcription or translation, must be regarded as digital pattern recognition, because ambiguity common to analog matching is drastically reduced to almost none by the limit of only four distinct varieties of naturally occurring nucleotide bases. In other words, the distribution of pattern/template varieties is discrete rather than continuous. The four distinct nucleotide bases are isomorphic to four distinct "digits" (or, rather, letters): A, C, G, T (for DNA) or A, C, G, U (for RNA). Similarly, human's visual recognition of individual alphabetic letters and Arabic numerals is an analog process but matching of street numbers and names is a digital process under normal circumstances. In other words, the numeral "0" that may appear in a street number, but usually not in a street name, is unlikely to be confused with the letter "O" in a street name. Evidently, here the analog or digital process refers to different hierarchical levels with drastically different contextual backgrounds, and one need not be confused. It is useful to generalize the above distinction of digital and analog patterns in the following way: digital pattern recognition refers to processes of which the criteria for matching are absolutely deterministic, whereas analog pattern recognition refers to processes of which the criteria are relatively deterministic. Here, we follow the definition of determinism, stipulated in Sec. 3 of Chapter 1. For example, using rigid rules or numerical scores as the criteria for recognition or discrimination is regarded as a practice of digital pattern recognition. In this regard, Heraclitus' remark invoked criteria for digital pattern recognition, whereas Santayana's remark referred to analog pattern recognition. In the subsequent discussion, pattern recognition means analog pattern recognition unless otherwise explicitly stated. Thus, pattern recognition encountered in the search-and-match phase usually refers to visual patterns or patterns perceived by other sensory modalities. Just like molecular recognition, matching of a problem pattern with its candidate solution-templates need not be a precise or perfect fit; a loose fit is often adequate, and even desirable because it bestows fault tolerance upon the recognition process (see later). Since analog pattern recognition is more forgiving than digital pattern recognition, the match is usually not unique; a pattern may be able to fit more than one template. Thus, a gray scale of fitness or goodness of match can be established. This is the reason why analog pattern recognition performed by a digital computer is at best awkward. Analog pattern recognition by means of digital comput-
ing in terms of explicit rules and a sequential algorithm often requires an inflated software overhead, because what constitutes a reasonably good fit is not easy to articulate in words or specify in terms of explicit rules. That analog pattern recognition in digital computing is possible after all owes much to the ever-increasing computing power — speed, memory capacity, etc. — of digital computers. The judgment of goodness of match in analog pattern recognition involves an overall evaluation of all relevant criteria and is therefore somewhat subjective. In other words, the judgment must be made as a whole, as advocated by Gestalt psychologists (Gestalt = form, shape, in German).709 As Boden75 indicated in regard to judgment of harmony in music, a stable interpretation is achieved by satisfying multiple constraints simultaneously and involves the recognition of overall harmony generated by all notes heard at the same time. Clearly, evaluation or appreciation of harmony requires parallel processing, as normally occurs in the evaluation or appreciation of a piece of fine (visual) art. A sound overall judgment can seldom be made in terms of a sequential algorithm of simple logical operations based on a set of rigid criteria, especially in terms of numerical scores with predetermined, fixed thresholds. In other words, it is highly problematic to judge analog patterns with rigid numerical criteria as if they were digital patterns. A thorough discussion of the questionable practice of exclusively rule-based reasoning in value judgments is deferred to Sec. 6.6. The somewhat subjective judgment with regard to goodness of match strongly influences the outcomes of problem solving. Loose matching of the pattern and templates allows for more templates (solutions) to be selected and considered as candidates of solutions. Tight matching of the pattern and templates often leaves useful templates (probable solutions) unrecognized. Imagination plays a crucial role in this step. An imaginative person has a tendency to recognize subtle matches that are not obvious to average people. A subtle fit of available templates to a given pattern often requires "stretching" and "distorting" a template until it snaps into the pattern, much like how molecular recognition takes place in accordance with the induced fit model (Sec. 6.1 of Chapter 1). Distorting or stretching a pattern (or template) often involves the process of abstraction. Irrelevant details in a picture are left out in the process so as to highlight the relational content of the remaining skeletons, e.g., block diagrams. On the other hand, concrete diagrammatic representations can be attached to abstract rule-based thought, as was often done in physics, e.g., phase diagrams (for nonlinear dynamic analysis), Feynman diagrams, etc. (see Sec. 4.10 for a detailed dis-
cussion). In this way, picture-based reasoning can be applied to abstract, rule-based mathematical thought. The search-and-match phase serves to screen all potential solutions in the search space and reduces them to a short list of candidates. These candidate solutions are plausible but not infallible. The ultimate verdict must rest on the verification phase. The process of verification must be expressed in terms of a sequential presentation of logic statements and must be expressible in terms of verbalized or written words or symbols. Furthermore, the verification must be carried out in a rigorous fashion. No gray scales are allowed for the degree of validity; it is either valid or not valid. The verification process in scientific problem solving can be strictly objective, unless a technical mistake is made. Needless to say, digital computing is quite suitable for the verification process. 4.3. Random search versus heuristic search The discussion in Sec. 4.2 implies that the search-and-match phase of Simonton's model involves only parallel processing. However, it is not necessarily so: there are options of using either parallel or sequential processing in the search phase. If the search space of a problem is finite and reasonably small in size, each and every possible solution can be evaluated. This approach constitutes exhaustive searching, and can be performed systematically in a sequential manner. However, the search space is sometimes infinite or ill-defined, as in the case of solving a novel problem with a high degree of difficulty. Even if the search space is finite, it may be so huge that it is not practical to perform a systematic search in real time. In these cases, the problem solver can launch a blind (random) search. Truly random searches are probably neither effective nor necessary most of the time. For practical reasons, only a small subset of the search space is chosen. The choice of a subset of the search space, which is judged to be more probable to contain the solution than the rest of the search space, is the essence of heuristic searching.629'631 The choice is often governed by experience and/or an elusive ability called intuition. For example, medical students throughout the English-speaking world are taught something called Sutton's Law (p. 119 of Ref. 657). According to William Dock, M.D., when Willie Sutton — ostensibly the most notorious U.S. bank robber — was questioned why he robbed banks, Sutton answered, "Because that's where the money is." (Sutton subsequently denied having said that.) Accordingly, the act of targeting and robbing banks
is tantamount to heuristic searching for money. In making medical diagnosis, physicians used to be guided by their knowledge, experience and intuition. They often went for the most obvious conclusions (diagnosis) after a few critical tests (including observations and physical examinations), as suggested by Sutton's law, rather than conducted every possible routine laboratory test and ran through every possible branch of the search tree (systematic search). By doing so, they were able to quickly initiate appropriate treatments. This "old-fashioned" practice saved time, money and often even made a difference between life and death when the patient's critical condition called for promptness of treatments. But some physicians who advocate evidence-based medicine have begun to de-emphasize intuition and experience in medical practice and trumpet the merits of computer-based systematic searching.197 Health-care insurance companies further demand that the Treatment Pathways — essentially an algorithmized search tree — be followed strictly, otherwise the physician's fee and expenses would not be reimbursed for the lack of justification in terms of objective evidence (Sec. 6.13). Let us consider chess playing as an example. Chase and Simon113'114 studied the performance of master players (not grand masters), Class A experienced players, and Class B beginner players. When they were presented with a chess board pattern that was taken from the midst of an actual chess game, the master players fared better than the Class A players, who, in turn, fared better than the Class B players. In contrast, when they were presented with a random and meaningless chess board pattern, the master players lost their advantage and, in fact, tended to do worse than the Class A and the Class B players, statistically speaking. Why? A chess game is complicated by the necessity to guess the opponent's move; it is a complication that does not exist in the prediction of the trajectory of a flying arrow. The possibility of moves in a systematic search increases rapidly as the number of steps of the opponent's moves that must be taken into account increases. This situation is commonly known as combinatorial explosion. According to Newborn, about a thousand positions arise after exploring, or searching, one move by each side; a search of two moves by each side leads to about a million positions. The search tree of possible moves and positions thus grows at an exponential rate (Chapter 2 of Ref. 487). The search space is simply too vast to be systematically searched in real time, even by a supercomputer (Sec. 5.18). Apparently, the master players did not indulge in systematic searching for all possible moves. What the master players did was identify the particular pattern with a previously
studied and learned pattern and then implement the known and recommended strategic moves. In contrast, non-expert players must rely more on intuition than the master players did. Why then did the non-expert players score better than the master players in the case of meaningless patterns? The master players, with an abundance of training and indoctrination, were probably more dogmatic than the non-experts (Sec. 4.4). In other words, the master players might have relied excessively on their learned repertoire of patterns and strategies, thus inadvertently restricting the search space to a region that contained mostly unfruitful search paths.
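Before leaving the chess example, the scale of the problem, and the payoff of restricting the search space, can be made concrete with a toy count of positions. The sketch below is purely illustrative: the branching factor of 30 is a round number consistent with the figures Newborn cites, and the fixed "beam" of three moves per position is a crude stand-in for heuristic pruning, not a claim about how master players actually select their moves.

    def positions_examined(branching, depth, beam=None):
        """Count positions visited in a game tree down to the given depth (plies).

        branching -- legal moves per position (roughly 30 in a middlegame)
        beam      -- if given, expand only the `beam` most promising moves
                     at each position (heuristic pruning); if None, expand
                     every move (systematic, exhaustive search).
        """
        width = branching if beam is None else min(beam, branching)
        total, frontier = 0, 1
        for _ in range(depth):
            frontier *= width
            total += frontier
        return total

    # Systematic search: roughly a thousand positions after one move by each
    # side, and nearly a million after two (cf. Newborn's figures).
    print(positions_examined(30, 2))           # 930
    print(positions_examined(30, 4))           # 837930
    # A heuristic search that keeps only three candidate moves per position
    # examines a tiny fraction of the tree.
    print(positions_examined(30, 4, beam=3))   # 120

The exponential growth of the exhaustive count, contrasted with the modest growth of the pruned count, is the quantitative content of the term "combinatorial explosion"; the hard part, of course, is supplying the judgment that decides which branches deserve to be kept.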
4.4. Dogmatism and self-imposed constraint
Guilford was credited with distinguishing between divergent and convergent thinking.267'268 Divergent thinking does not guarantee creativity but is an important character trait of creative individuals.576'578 It is also well known that dogmatism is a hindrance to creativity. Simonton equated dogmatism with ideal inflexibility, and Margolis referred to it as mind barriers.439 Simonton632'633 cited a study showing that creative thinking correlates with educational levels: statistically, creativity peaks at the baccalaureate degree but falls off at both ends (either no education or completion of advanced degrees), thus exhibiting an inverted U-shaped relation. In contrast, dogmatism shows the opposite trend, with a minimum at the medium level of education. Considering divergent thinking and dogmatism as two opposite character traits, we can interpret dogmatism as the possession of a self-imposed constraint and divergent thinking as the lack of such a constraint. Self-imposed constraints often account for missed opportunities in scientific discoveries. Metcalfe459'460'461 considered the effect of intuition in "insight" and "routine (non-insight)" problem solving. Self-imposed constraints affect "insight" problem solving more than "routine" problem solving. The following problem, mentioned by Holyoak,313 illustrates this point. This problem can be solved by three different approaches: a) insight, b) mixed insight and routine, and c) plainly routine. The problem is posed as a request to plant four trees in such a way that the distances between any two trees are equal. To most people, the answer for planting three (instead of four) trees is obvious: the apices of an equilateral triangle. A substantial number of people probably get stuck at this step. However, some manage to conclude that the fourth tree should be planted at a hilltop: the apex of a pyramid. When shown the correct answer,
people who failed to get the right answer usually protested that they did not know that they were allowed to go to the third dimension. The failure in this case was apparently caused by the unnecessary self-imposed constraint: the original problem does not restrict the solution to a two-dimensional space. Actually, a simple but helpful rule can be readily deduced from elementary geometry: within a two-dimensional space, although it is always possible to find a fourth point that lies at the required distance from both the first and second points, this fourth point can never lie at that same distance from the third point. In other words, such a fourth point, if it exists, must belong to another dimension. Strictly speaking, this approach requires insight but also utilizes an easily deduced rule in elementary plane geometry — it is impossible to plant the fourth tree in the same plane — to eliminate unproductive searches in the two-dimensional space (heuristic searching). However, it is the self-imposed constraint that makes the difference. Another approach utilizes intuition to solve the problem in a single step. For example, an organic chemist who is familiar with bonding angles of a carbon atom will quickly identify the site of tree-planting to be the four corners of a tetrahedron by means of pattern recognition. It is a feat which we may wish to attribute to the individual's knowledge base and the ability to make the pattern match. There is a third and routine way to solve the above problem. Someone who masters the subject of solid (three-dimensional) geometry may approach this problem by a systematic "cookbook-recipe" approach: the locus of all points at a fixed distance from a particular fixed point is the surface of a sphere centered at that particular point with a radius equal to that fixed distance. Starting with three trees already planted at the three apices of an equilateral triangle, the fourth point will be the intersection(s) of three such loci. Each locus is the surface of a sphere with the center located at one of the three apices, and with the radius equal to one side of the equilateral triangle. The three spheres intersect at two separate points and, therefore, two solutions can be found: one above the plane at the peak of the pyramid, as before, and another below the plane at the "peak" of an inverted pyramid! The above example demonstrates that sound training can compensate for the lack of divergent thinking. The above example also shows that the ability to recognize unfruitful search paths is a prerequisite for performing heuristic searching: the fourth tree cannot be planted in the original plane and a search in the third dimension is necessary. On the other hand, the habitual practice of prematurely excluding presumed and misconstrued un-
fruitful search paths constitutes dogmatism rather than heuristic searching. The difference between the two situations can be quite subtle. The need to limit the search space for the sake of heuristic searching and the desire to avoid excessive restrictions of the search space thus impose conflicting requirements on problem-solving strategies. How one manages to strike a balance is probably more art than science. That is, it is probably difficult to delineate unambiguous rules, which once discovered, can be learned for the purpose of reaching such an optimum. Of course, a restriction can be initially imposed but lifted when the problem persistently refuses to yield to the initially selected approach. However, knowing when to persist and when to yield remains more art than science. By generalizing the above example, it can be inferred that the willingness to explore and to run the risk of making mistakes and the ability to make subtle, sensible deviations from the norm of thinking, sanctioned by peers, is an important factor for expanding the search space and, therefore, for creativity. This character trait is often referred to as imaginativeness; it is divergent thinking. A dogmatic person seldom deviates from the norm. Dogmatism also affects performances in the match phase of creative problem solving. By definition, a dogmatic person tends to interpret a rule literally in letter — as a dogma — rather than figuratively in spirit. Dogmatic persons often exhibit an extraordinary reluctance to stretch or distort a candidate template so as to match it to a pattern. For example, a physician insisted upon applying the concept of positive feedback only in a physiological setting, but disapproved the application of the same concept in sociology and its re-interpretation as the concept of vicious cycle (personal observation). He thus failed to see the same pattern in sociology as in physiology. Unbeknownst to this physician, the concept was actually borrowed from the field of engineering. As a matter of fact, this mechanism has been successfully applied to economics.19'20 In conclusion, dogmatism hurts creativity on two counts: excessive diminution of the search space and excessively strict criteria in judging a match. 4.5. Retention phase: the need of sequential
verification
In the retention phase, a candidate solution that has been selected must be preserved and reproduced by some mechanism, otherwise it cannot represent a permanent contribution and addition to the knowledge base. The configuration of a matched pair of solution and problem must be stabilized and evaluated for validity of the match. Only if a candidate solution
passes the test of rigorous logical verification can it be preserved, otherwise it should be rejected and discarded. Thus, this phase roughly corresponds to Wallas' verification phase. Verification is necessary after a successful acquisition of candidate solutions in order to detect errors inadvertently committed during the hasty search-and-match phase. There are at least two sources of errors. In order to search efficiently for candidate solutions within the constraint of allotted time, a problem solver must examine as many candidate solutions as possible. Those who can perform searching at a high speed have a distinct advantage since a higher search speed is translated into a larger search space. Speed is a requirement for another reason. Often a potential solution arises as a vague intuition or "gut feeling." Such a vague feeling or hunch usually goes as easily as it comes. Part of the reason behind this phenomenon lies in the utilization of working memory in processing and evaluating various solution options. Working memory is used here as a "scratch pad" but it is volatile249 (see Sec. 4.19 for a detailed discussion on working memory). The processing speed is necessarily high, lest a fleeting idea fade away before it gets a chance to be evaluated for goodness of match. Rapid evaluation thus renders the search-and-match phase error-prone. The second source of errors is rooted in analog pattern recognition. The match between templates and a pattern is often imprecise, and is, therefore, also error-prone. For these two reasons, subsequent logical verification is necessary to consolidate a preliminary solution or eliminate a false lead. One may fail to solve a problem either by passing up a correct solution or by coming up with a wrong solution. It is often indicated that highly creative people make more discoveries than average people but they also make more mistakes. Why? Highly creative people are more willing to explore and to search at a higher speed than average people, thus covering a larger search space. Highly creative people are also more willing to stretch and distort a match beyond recognition and to accommodate subtle but loose matches than average people, thus accommodating more candidate solutions. Both tendencies expose highly creative people to the hazard of making more mistakes than average people. Thus, highly creative people are more willing to take risks than average people (Sec. 4.21). The saving grace must then be the subsequent rigorous verification. Here, the subsequent verification of a scientific theory means experimental tests. In this regard, highly creative people are more sensitive to logical inconsistencies and tend to exercise a stricter and more careful verification process than average people. Nevertheless, there is no guarantee
that highly creative people are always correct because some inconsistencies are quite subtle and difficult to detect (cf. Sec. 4.7). 4.6. Picture-based reasoning versus rule-based reasoning in pattern recognition Prom the above discussion, the search-and-match phase involves primarily analog pattern recognition, and is dominated by judgment of perceived patterns as a whole. An analog pattern can be either visual, auditory or of other sensory modalities. Therefore, we shall refer to this mode of reasoning as pattern-based reasoning, or simply picture-based reasoning since the
perception involves visual patterns most of the time, especially in scientific activities.464'465-252'253-504 We prefer the term "picture-based reasoning" to the term "pattern-based reasoning" for the following reasons. Unlike the latter which may pertain either to digital or analog patterns, the former does not have any such inherent semantic ambiguity; it always pertains to analog patterns. Furthermore, picture-based reasoning is synonymous with visual thinking, another commonly used term in the creativity literature. Compared to visual thinking, however, picture-based or pattern-based reasoning is more readily linked to the AI concept of parallel processing. The advantage of viewing picture-based reasoning as a parallel process will become apparent later (Sec. 4.8). However, picture-based reasoning, pattern-based reasoning and visual thinking will be used interchangeably whenever it is appropriate to do so. Picture-based reasoning is not the only "legitimate" mode of reasoning during the search-and-match phase. In fact, analog pattern recognition can also be performed in an abstract sense without evoking a concrete sensory pattern representation, i.e., in terms of a verbal or symbolic representation of the pattern in question. It is one of the search paradigms used in AI knowledge representation (e.g., see Chapter 3 of Ref. 303). A human problem solver or a computer simply searches for an appropriate rule (logic), enunciated in words or symbols, from the known repertoire or database, and matches it to a given problem. This mode of reasoning will be referred to as rule-based reasoning. Thus, rule-based reasoning is synonymous with verbal thinking. Again, the advantage of the usage of rule-based reasoning instead of verbal thinking will be made clear later. Rule-based reasoning is perfectly appropriate in search of digital patterns, such as practiced by Babylonians in the construction of numerical models. It is also appropriate for information processing that requires strict
adherence to a predetermined and previously approved procedure, such as applications in accounting or banking. However, the practice of exclusively rule-based reasoning in science as well as in music and fine art is plagued with serious drawbacks (see Sees. 4.7 and 4.20, respectively). Needless to say, rule-based reasoning involves predominantly sequential processing, and is readily implemented by means of digital computing (see Sec. 7.1 for technological applications). Visual thinking has long been recognized as a highly valuable process in creative problem solving376'17'713 (see also p. 191 of Ref. 52). Increased verbal thinking is known to be associated with a decrease in creativity (p. 301 of Ref. 52). Accomplished physicists, such as Albert Einstein (pp. 142-143 of Ref. 272; pp. 32-33 of Ref. 232), Richard Feynman (p. 131 of Ref. 241), Roger Penrose (p. 424 of Ref. 512), and Stephen Hawking (p. 35 of Ref. 286), have given introspective testimonials to visual thinking as their predominant mode of thinking in making scientific discoveries. Visual thinking featured especially prominently in Nikola Tesla's thought processes (which he called mental operations). As documented in his autobiography My inventions,662 his mental operations were so intense that what he saw tormented him to a pathological extent. By analyzing Michael Faraday's detailed diaries, Gooding252'253 found visualization to be a key process in Faraday's discovery in electricity and magnetism and his invention of the first electromagnetic motor, as well as a vehicle to communicate his novel findings to his contemporaries (see also Ref. 340). His visual conceptualization by means of imaginary lines of force remains a powerful way to comprehend the phenomena of electricity and magnetism, as well as other conservative forces, i.e., those forces that obey an inverse square law such as gravitation. Miller464 investigated the role of visual imagery in the origin of 20th century physical concepts, by analyzing the work of Ludwig Boltzmann, Henri Poincare, Albert Einstein, Niels Bohr, and Werner Heisenberg (see below for a definition of visual imagery). He further extended the analysis to the insights of geniuses in both science and art and their attempts to seek a visual representation of worlds both visible and invisible.465 In particular, for modern physics of the invisible world, mathematical abstraction guides the formulation of visual (diagrammatic) representations. Many more examples of visual thinking have been documented by Koestler376 and by West.713 What tormented Tesla was neither actual visual perception nor hallucination but something known as visual imagery (mental imagery).613'614 Visual perception is what one actually sees, whereas visual imagery is what
one visualizes by activating part of stored visual memory or by means of sheer imagination of nonexisting objects (known as mind's eyes).386'713 The act of pattern-based reasoning in scientific activities involves largely visual imagery, e.g., the visualization of concrete scientific phenomena or abstract scientific concepts and the manipulation of complex three-dimensional information. Poincare had a useful faculty of visualizing what he heard, thus alleviating the inconvenience caused by his poor eyesight (p. 532 of Ref. 55). This faculty greatly facilitates the practice of picture-based reasoning. In the above discussion, visual pattern recognition is emphasized. However, the idea can be extended to include auditory patterns such as musical melodies and harmony. Tonal perception pertains to what one actually hears, whereas tonal "imagery" pertains to the tonal representation that one elicits from stored tonal patterns in long-term memory. Generalizations can also be made to include olfactory and gustatory patterns, as in perfume detection/discrimination and wine tasting, respectively. Bastick noted that intuitive types with the serial verbal-ability mode — i.e., rule-based reasoning — have pronounced body-feeling models of their information to compensate for their lack of parallel image representation (p. 193 of Ref. 52). Thus, picture-based reasoning can be generalized to include analog patterns perceived or imagined in all these sensory modalities. Recognition of these patterns requires parallel processing: an analog pattern must be perceived as a whole. In summary, there are two options of reasoning during the search-and-match phase: rule-based or picture-based reasoning. An average person presumably switches between these two modes of reasoning as the specific demands for either mode of reasoning arise. For example, while using picture-based reasoning in the search-and-match phase most of the time, a biologist can switch to rule-based reasoning when the need to apply the knowledge learned in a "foreign" field, in which personal expertise is deficient, arises. We shall now consider the advantages and disadvantages of either mode of reasoning.
4.7. Advantages and disadvantages of rule-based reasoning
The main advantage of using rule-based reasoning in the search-and-match phase is speed and objectivity. Rules, in general, and mathematical theorems, in particular, are highly compressed and condensed information. Thus, one can quickly apply the rule or theorem to match the problem without spending time on re-discovering the rule or re-deriving the theo-
rem, and on recreating the "picture" from which the rule was originally derived. Since strict rules must be adhered to, the outcome of rule-based reasoning is independent of the individuals that perform it, i.e., it is highly objective. I suspect that striving for objectivity might have been instrumental in the development of logic and laws in the Western culture. In addition to speed, the conciseness of rules or concepts in representing knowledge offers yet another advantage. As will be discussed in Sec. 4.8, loading a rule instead of the detailed knowledge that it represents allows more information to be loaded concurrently in working memory. However, this advantage is dubious, in view of the comprehensiveness of pictures in representing knowledge. Perhaps an optimal balance by means of a judicious combination of loading both pictures and rules in working memory may resolve this dilemma (cf. Sec. 4.19). Being synonymous with verbal reasoning, rule-based reasoning stands ready to be communicated to others. In contrast, it is difficult to articulate the rationale of picture-based reasoning in unambiguous terms. Therefore, it is difficult to convince others by means of picture-based reasoning alone. The lack of a "verbal backup" for an argument renders picture-based reasoning highly subjective, thus stigmatizing its practice. As discussed in Sec. 4.5, picture-based reasoning is error-prone because of analog pattern recognition. Rule-based reasoning is more accurate and reliable than picture-based reasoning if the rule is correct and if the rule is followed properly. For example, a street address, if it is correct, is more reliable in locating a destination than a mere hunch or impression about landmarks of the neighborhood surrounding the destination. However, one can get forever lost if the street address is incorrect, whereas an imprecise hunch may still lead to the correct destination. This difference between analog and digital information processing is reflected in a remark aptly made by someone: to err is human, but to completely foul things up requires a digital computer. However, as we shall see, exclusively rule-based reasoning is no less error-prone than exclusively picture-based reasoning. Thus, it takes a combination of rule-based and picture-based reasoning to reduce errors. The identification of a proper rule to be used in problem solving relies heavily on the matching of keywords or key phrases, which are often name tags of rules or terse descriptions of their key features. A practitioner of exclusively rule-based reasoning thus runs the risk of becoming a "prisoner of words": one can recognize a rule only when the name or the (written or verbal) description matches the features being sought. The criteria of
matching may be excessively strict. Francis Bacon apparently recognized this disadvantage. He wrote, in Novum Organum (Book 1, Sec. 43, Idols of the Market Place, p. 49 of Ref. 14): For it is by discourse that men associate, and words are imposed according to the apprehension of the vulgar. And therefore the ill and unfit choice of words wonderfully obstructs the understanding. Nor do the definitions or explanations wherewith in some things learned men are wont to guard and defend themselves, by any means set the matter right. But words plainly force and overrule the understanding, and throw all into confusion, and lead men away into numberless empty controversies and idle fancies, (emphasis added) In contrast, loose matches are made considerably more readily in pictures than in words. Picture-based reasoning allows for pattern recognition on the basis of analogies — analogical reasoning311'306'478'228 — because an analogy is often based on perceived visual patterns or visual images. Thus, practitioners of exclusively rule-based reasoning are less likely to recognize an appropriate solution than practitioners of combined rule- and picture-based reasoning, when they stumble on one. On the other hand, they sometimes make false matches due to misleading keywords. Although such a false match can, in principle, be detected during the verification phase, the task is not always feasible for practitioners of exclusively rulebased reasoning to perform, for reasons to be presented next. Since the verification phase requires the use of strict rules and rigorous logical arguments, practitioners of exclusively rule-based reasoning are expected to excel in verification. In fact, Poincare specifically mentioned several mathematicians who excelled as logicians rather than intuitionalists (Chapter 1, Part I in The Value of Science.521 However, in reality, practitioners of exclusively rule-based reasoning are often poor logicians for the following reason (personal observation; see also Ref. 324). Being compressed information, a rule often contains scanty descriptions or reminders of its domain (range) of validity or applicability. By retaining the pictures that are associated with the generation of a rule (or the derivation of a mathematical theorem), practitioners of combined rule- and picture-based reasoning are more likely to be reminded of the conditions under which the particular rule was created or derived. In contrast, practitioners of exclusively rule-based reasoning are more likely to misuse an irrelevant rule, or abuse a relevant rule beyond its domain of validity, during the verification phase. This shortcoming was hinted at by Poincare in answering a question
he raised: "[H]ow is error possible in mathematics?" (p. 383 of Ref. 521). Here, it is important to point out that the above discussion does not imply that mathematical mistakes are made only by practitioners of exclusively rule-based reasoning. Mathematical mistakes are sometimes so subtle that even highly creative mathematicians occasionally applied an established theorem beyond the domain of its validity (e.g., see Chapter 7 of Ref. 634). Exclusively rule-based reasoning has often been exploited in devising jokes, as exemplified by one originally quoted in Freud's essay on comic and cited by Koestler (p. 33 of Ref. 376): Chamfort tells a story of a Marquis at the court of Louis XIV who, on entering his wife's boudoir and finding her in the arms of a Bishop, walked calmly to the window and went through the motions of blessing the people in the street. 'What are you doing?' cried the anguished wife. 'Monseigneur is performing my functions,' replied the Marquis, 'so I am performing his.' Thus, in general, a globally absurd story can be constructed by a mere juxtaposition of a number of locally logical story lines, thus "debunking the expectation" of sensible logic and eliciting a laughter. The element of incongruity in a joke often stems from misuses or abuses of rules. Curiously, a gorilla named Koko knew how to exploit this kind of incongruity so as to create a joke, conveyed to human caretakers through a sign language she had learned (pp. 142-145 of Ref. 507; see also Sec. 4.18). Exclusively rule-based reasoning has increasingly been used in public political debates (personal observation). It is not clear whether some politicians recognized an increasing prevalence of the practice of exclusively rule-based reasoning, caused by a failing educational system, and cleverly attempted to exploit the situation, or these politicians themselves were victims of a failing educational system. Practitioners of exclusively rule-based reasoning have an additional handicap. They may impose excessive restriction on the search space when a heuristic search is called for. The use of strict criteria imposed by words may inadvertently "prune off" useful search paths in the search tree. Bastick stated that the visual-ability mode (picture-based reasoning) is superior to the verbal-ability mode (rule-based reasoning) in intuitive processing because it is oriented more to the whole problem than to particular parts (p. 192 of Ref. 52). The visual mode of internal representation of global knowledge allows for simultaneous processing of the whole physiognomy, free from
the physical restrictions that reality places on the objects they represent. How physical restrictions affect picture-based and rule-based reasoning differently is illustrated by the following brain-teaser cited in Koestler's book (pp. 183-184 of Ref. 376): One morning, exactly at sunrise, a Buddhist monk began to climb a tall mountain. The narrow path, no more than a foot or two wide, spiralled around the mountain to a glittering temple at the summit. The monk ascended the path at varying rates of speed, stopping many times along the way to rest and to eat the dried fruit he carried with him. He reached the temple shortly before sunset. After several days of fasting and meditation he began his journey back along the same path, starting at sunrise and again walking at variable speeds with many pauses along the way. His average speed descending was, of course, greater than his average climbing speed. Prove that there is a spot along the path that the monk will occupy on both trips at precisely the same time of day. Lacking minimal specifications required by a mathematical approach, the above-mentioned problem obviously calls for picture-based reasoning rather than a rule-based, mathematical calculation. Koestler mentioned, in his book, that a young woman without any scientific training had proposed the following approach, thus arriving at a simple and obvious proof: superimposing two mental images, with one monk ascending and his "transparent" duplicate (or, rather, clone) descending, starting at the same time on the same day but at different locations — the foot and the top of the mountain, respectively. However, this approach bothered a practitioner of exclusively rule-based reasoning and he remained unconvinced of the young woman's argument, because it was physically impossible to have the same monk performing two separate acts, starting at two distinct physical locations, on the same day and at the same time. Finally, modern "high-tech" gadgetry came to my rescue. I suggested the following: recording the two separate events, ascending and descending, on two separate video tapes, and then replaying the two tapes simultaneously on two separate monitors which he could view concurrently, with due care to synchronize the starts of the ascend and the descend. He was eventually convinced, presumably by the virtual reality so engendered. Certain types of problems can only be solved by means of picture-based reasoning. An excellent example was depicted in a World War II movie
called The Dambusters: a fictionalized chronicle of the military preparation leading up to the raid on the Ruhr dams on the night of May 16-17, 1943 (e.g., see p. 245 of Ref. 259). The bombing mission was carried out by an elite bomber force, 617 Squadron of the (British) Royal Air Force, with Lancaster bombers specially adapted to carry bombs designed by Barnes Wallis. Wallis determined that a well-placed 6,000-lb bomb could breach the massive Mohne and Eder dams. He devised a "bouncing bomb" that would skip across water like a pebble, hit the side of a dam, and detonate after sinking. To do so, the bomb had to be dropped from an altitude of precisely 60 ft (18.3 m) and at a certain fixed distance from the dam. The problem was that no available altimeter was accurate enough for that purpose at that time. The fictionalized movie showed how the problem was solved while members of the squadron were relaxing at a London theater shortly prior to the planned mission. While watching a scene with the actors being spotlighted with two crossing beams, one of the squadron members was mentally switching his viewing field, back and forth, from the actual scene to a mental image of a Lancaster bomber equipped with a pair of searchlights at the two wing tips. He thus discovered a novel solution serendipitously: equipping the Lancasters with a pair of searchlights angled to intersect and meet the surface of the water at the same spot when the aircraft was at the right altitude. It is not certain whether the solution was actually found that way, but the peculiar way of dropping the bombs was real. One thing could be certain though: it was almost impossible to dream up this solution by means of exclusively rule-based reasoning, much less finding it by consulting military rule books. Even when solutions are readily obtainable in either way, rule-based reasoning may lead to an erroneous solution, whereas the same error can be avoided by means of picture-based reasoning. Furthermore, if the same rule is used to verify the solution, the error may not be detected during the verification phase. The following problem illustrates this point (provided by Arif Selguk Ogrenci and Ping-Wen Hou from an Internet source): If you catch up with and move past the number 2 racer in a Marathon race, what is your new rank? By virtue of rule-based reasoning, you must be number 1, since you now rank higher than the number 2 racer. However, in a mental image that depicts two front runners ahead of you initially, you see clearly that you, as the number 3 racer, overtake the number 2, but not the number 1, racer; your new rank is number 2. In this example, the dynamic nature of a Marathon race may be neglected in rule-based reasoning but is automatically detected in picture-based reasoning. Again, rule-based
reasoning may lead to a locally logical but globally absurd conclusion (or, rather, statically logical but dynamically absurd conclusion). Interestingly, a different rule in the above example may be formulated by means of picture-based reasoning: you become the new number 2 racer since you replace the old number 2 racer in the Marathon race. If, instead, this new rule is invoked in rule-based reasoning, the previous error can be avoided. Thus, conclusions reached by means of rule-based reasoning may be path-dependent. Furthermore, if the new rule is used to verify a solution, the error previously incurred can be rendered apparent. The pathdependence of rule-based reasoning has long been recognized in accounting and is reflected in a time-honored practice: an accounting error is unlikely to be detected by following exactly the same procedure — a collection of rules — all over again, and, therefore, a different procedure must be called for in auditing. Now, let us get back to the problem of the Marathon race, but rephrase it a bit differently: If you overtake the last person, what is your new rank? If you were to rehash the previously successful rule, you would reach a conclusion — If you replace the last person, you become the last — which would be counter-intuitive and absurd. You cannot become the last if someone whom you have overtaken is now behind you. Apparently, you encountered the "edge" effect of the rule (see Sees. 5.19 and 6.13 for more extensive discussion of the "edge" effect of logical reasoning). So you might want to go back to the first rule that had once failed, thus reaching a different but less counter-intuitive conclusion: you are second to last since now you are ahead of the last. Wrong again. Picture-based reasoning shows clearly that if you are to overtake a person, you must do so, in general, from a position behind this person, but, in this particular case, it was you that was the last rather than the person whom you tried to overtake. You obviously could not overtake yourself, and, therefore, the problem itself is absurd. The internal inconsistency of the problem is not immediately obvious in verbal reasoning but is patently obvious in picture-based reasoning. The pros and cons of rule-based versus picture-based reasoning were vividly described in Anatoli Boukreev and G. Weston DeWalt's book The Climb, which chronicled the May 10, 1996, tragic accident in the "Death Zone" of Mt. Everest (pp. 134-135 of Ref. 82): The differences between [Rob] Hall's and [Scott] Fischer's philosophies of guiding [high-altitude mountaineering tours] were emblematic of an ongoing debate between practitioners in the adventure
travel industry. The camps of belief can be roughly divided between the "situationalists" and the "legalists." The situationalists argue that in leading a risky adventure no system of rules can adequately cover every situation that might arise, and they argue that rules on some occasions should be subordinated to unique demands that present themselves. The legalist, believing that rules can substantially reduce the possibility of bad decisions being made, ask that personal freedom take a backseat. Critics of the legalist philosophy argue that an omniscient, rulebased position that minimizes independent action is being promulgated largely out of fear of bad publicity or lawsuits that might result from a lack of demonstrable "responsibility." These critics find it confoundingly odd that an industry that promotes the values of personal freedom and initiative would expound a philosophy that minimizes the pursuit of these very values. Here, we recognize the value of rules in an environment characterized by low oxygen tension and in a mental state that is rendered sluggish by hypoxia. Ironically, a bad decision was apparently made by both team leaders, Hall and Fischer, to "bend" the cardinal rule of the mandatory turn-around time of descent: both let the turn-around time slip by several hours, thus failing to escape the wrath of an unexpected blizzard. Rumor had it that Hall's expedition team Adventure Consultants suffered from greater casualties than Fischer's team Mountain Madness, because of Hall's advocacy of a rule-based system for his paid clients. However, the complexity of many other factors involved in the circumstance prohibits such a simplistic interpretation. In summary, practitioners of exclusively rule-based reasoning suffer in all three phases of the creative process stipulated in Simonton's model: a) the search space is too restricted, b) the criteria for matching candidate solutions to a given problem is too strict, and c) the use of logic in the verification phase lacks rigor. Thus, lost opportunities may be caused by an excessively small search space, by the inability to recognize a subtle match, or by failure to detect an erroneous solution. Exclusive use of rule-based reasoning in the search-and-match phase may be tightly associated with dogmatism or the lack of divergent thinking; this is a speculative assertion that can be experimentally tested. On the other hand, the lack of objectivity in picture-based reasoning is an obvious drawback. The handicap can be alleviated or eliminated by careful verification. It is the verification phase
that bestows objectivity upon the overall creative process. 4.8. Contemporary interpretation of Freud's concept of the unconscious and Poincare's introspective account Hard work is often recommended in problem solving. However, a dogged and relentless attack may not be the best way to solve a difficult problem. Often, problem solvers take advantage of the so-called incubation period by temporarily setting aside the problem before resuming the attack. Apparently, a concentrated effort in problem solving may lead to an excessively restricted search space. The incubation period provides the opportunity for launching a new direction and expanding the search space, thus preventing one from becoming too deeply entrenched in an approach that leads to "cul-de-sacs" (a cognitive equivalent of being trapped in a local minimum in a free energy landscape or being stranded on a local fitness peak). Through introspection, Poincare vividly described his personal experience with the incubation period (pp. 389-390 of Ref. 521). He referred to the period of incubation prior to illumination as "long, unconscious prior work." He also referred to the unconscious mind as the "subliminal self." He recognized that the subliminal self was capable of [pattern] recognition or, in his own word, "discernment." Poincare was obviously under the influence of the Freudian School,215 thus accepting the dichotomy between "the conscious" and "the unconscious." He suspected that "[the subliminal self] knows better how to divine than the conscious self, since it succeeds where that has failed." He raised the question: "[I]s not the subliminal self superior to the conscious self?" It was this kind of description of an introspective nature that gave investigators the impression of vagueness and mysticism, as often indicated in the creativity literature (see, for example, Refs. 75 and 706). He was even ridiculed in the popular science literature; E. T. Bell was critical of him: "After Poincare's brilliant lapse into psychology skeptics may well despair of ever disbelieving anything" (p. 552 of Ref. 55). The role played by the unconscious in creative thinking was first explored by Sigmund Freud as an extension of his psychoanalytic theory (see p. 38 and Chapter 4 of Ref. 539). Sexual and aggressive needs are sublimed (transformed) and expressed indirectly through a mechanism which Freud referred to as primary-process thinking, in contrast to the more rational secondary-process thinking. Primary-process thinking takes place in dreaming or reverie of a normal person as well as in thinking of a psychotic
patient during waking hours. It tends to associate with concrete images rather than with abstract thoughts. Logic is not a primary ingredient of primary-process thinking. Therefore, primary-process thinking is regarded as irrational or non-rational. In contrast, secondary-process thinking tends to be logical and in touch with reality. Neoanalytic theorists, such as Ernst Kris,391 subsequently toned down the emphasis on sex and aggression, and put regression, from secondary to primary processes, in the service of the ego (see also pp. 183-184 of Ref. 52). Inspiration is a regress to the primary-process state of consciousness, which allows an individual to free-associate ideas, sometimes non-logically. However, Weisberg706 dismissed, in his book Creativity: Beyond the Myth of Genius, both primary-process thinking and the role of the unconscious in creativity. Instead, Weisberg argued that creative acts are the consequence of ordinary thought processes, augmented by particular abilities, training and acquisition of domain-specific skills and expertise, and high levels of motivation and commitment (Chapter 8 of Ref. 706). Hayes289 essentially shared the same view (see also Sec. 4.21). Hayes also denied the importance of incubation in the creative process. He found little empirical evidence to support Wallas' claim of the existence of an incubation period because he found many instances in which creative acts proceeded from beginning to end without any pause that would allow for incubation (p. 142 of Ref. 289). Hayes apparently overlooked the possibility of overnight incubation. In addition, the fact that incubation is not evident in each and every important discovery is no good reason to dismiss incubation as a powerful way of searching for good solutions. It is astonishing that Hayes dismissed incubation so casually; examples abound in daily life, demonstrating that even non-geniuses occasionally benefit from incubation. The view of Weisberg and Hayes was superficially supported by the astonishing progress made in machine intelligence. Newell et al. concluded that "creative activity appears simply to be a special class of problem-solving activity characterized by novelty, unconventionality, persistence, and difficulty in problem formulation" (p. 66 of Ref. 488). A designer of computer-based creative problem solvers naturally believes that computer simulations in terms of logical reasoning (secondary-process cognition) can capture the essence of human creativity. However, that does not mean that the simulations closely follow the creative process of human beings since a digital computer may compensate for its shortcomings with its enormous memory capacity and high speed and use a radically different approach to accomplish a similar result (see Sec. 4.26 for a detailed discussion).
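The last point, that a machine may reach a similar result by a radically different route, can be made concrete by returning to the tree-planting puzzle of Sec. 4.4. The sketch below embodies no geometric insight whatsoever: it simply enumerates points on a coarse coordinate grid and keeps the one whose distances to the three planted trees deviate least from the required spacing. The choice of coordinates, the grid step, and the decision to fix the first three trees at the corners of a unit equilateral triangle are arbitrary illustrative choices, not taken from any published simulation.

    from itertools import product
    from math import dist, sqrt

    # Three trees already planted at the corners of a unit equilateral triangle.
    A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, sqrt(3) / 2, 0.0)

    def misfit(p):
        # Largest deviation from 1 of the distances from p to the planted trees.
        return max(abs(dist(p, q) - 1.0) for q in (A, B, C))

    step = 0.05
    grid = [i * step for i in range(-20, 41)]      # coordinates from -1.0 to 2.0
    best = min(product(grid, grid, grid), key=misfit)
    print(best, misfit(best))
    # Reports a grid point near (0.5, 0.29, +/-0.82): one of the two apices of
    # a regular tetrahedron, found by brute force rather than by insight.

A finer grid (or a standard numerical optimizer) sharpens the answer, but the moral is unchanged: sheer speed substitutes for the recognition step that a human solver performs, which is precisely why the success of such a program says little about how people solve the same problem.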
Bailin36 was also opposed to the "elitist" view of creativity and had set out to demystify the creative process. She refuted the previously perceived discontinuity between high creativity and ordinary thought processes, and concluded that high creativity is just "excellent thinking." However, Bailin did not elaborate on exactly what constitutes excellent thinking.

Some investigators did not share the view of Weisberg and Hayes. For example, Martindale442 thought that Kris' theory was misunderstood and misinterpreted. But that did not necessarily make Martindale an advocate of the non-elitist view. As we shall see, the thought process involved in creativity is indeed "ordinary" in the sense that the process is shared by geniuses and ordinary folks. However, such an ordinary process requires primary-process thinking — the very ingredient that Weisberg elected to discard — which pertains to several loosely connected terms: insight, intuition, and inspiration. Comments on the assertion of Newell et al. will be deferred to Sec. 4.26. Regarding Weisberg's and Hayes' emphasis on motivation and domain-specific knowledge, comments will be given in Secs. 4.21 and 4.22, respectively.

The fall from grace of the Freudian theory has led to rejection and isolation of neoanalytic theorists from mainstream cognitive science (see a personal and highly inflammatory recounting by Crews;149 a contrasting view has been presented by Pribram and Gill;539 see also other articles collected in Ref. 68). On the other hand, some mainstream cognitive scientists have begun to revive Freud's concept of the unconscious.157

One of the most notable features of primary-process thinking, as exhibited by some psychotic patients, is erratic associations of remotely related ideas. This symptom is known as flight of ideas in the psychiatry literature. However, the ability to associate remotely related ideas is a valuable trait for creativity. A creative process often involves combinations of ideas, as both Einstein and Poincare testified (see quotations later in this section and in Sec. 4.19, respectively). Mednick suggested that "[t]he greater the number of associations that an individual has to the requisite elements of a problem, the greater the probability of his reaching a creative solution" (p. 224 of Ref. 452). Mednick further theorized that the difference between individuals of high creativity and those of low creativity parallels the difference in their associative hierarchies. An individual of low creativity tends to have strong stereotyped responses to a given stimulus (associative hierarchy with a steep slope), whereas an individual of high creativity tends to have a broad range of responses (associative hierarchy with a flat slope). Extending Mednick's theory, Mendelsohn further theorized that "the greater the
internal attention capacity, the more likely is the combinatorial leap which is generally described as the hallmark of creativity" (p. 366 of Ref. 453). Martindale441'442'443 pointed out that the theories of Kris,391 Mednick,452 and Mendelsohn453 are really identical theories expressed in different vocabularies: defocused attention is the common element. To appreciate this interpretation, a re-examination of Freud's theory of the unconscious in light of the new insight provided by cognitive science research is in order. First, let us examine the role of selective attention: the brain's executive control that allocates limited processing resources to certain selected information at the expense of others.172'672'359 In this regard, Baars' Global Workspace (GW) Theory29 is highly relevant. Recall that parallel processing is involved both in the search for possible templates and in the judgment of goodness of match. There is little doubt that the human sensory systems are capable of parallel processing in view of the anatomically and functionally parallel nature of sensory inputs to the brain. For example, sensory signals from different parts of the visual field are transmitted, in parallel, via topographically separate pathways. In addition, separate visual pathways exist for transmission of information from the retina with regard to luminance and colors.156 Furthermore, separate pathways projecting to the cortex handle the assessment of spatial relationships and motion of objects, on the one hand, and identification of colors, patterns, or objects, on the other hand.454 However, all these parallel sensory inputs compete for processing and may not receive equal and uniform processing (see below). A particular modality of sensory inputs may receive preferential attention over other modalities. When one is concentrating on reading, for example, one is paying diminished attention to extraneous auditory stimuli from the surroundings. Even within a given sensory modality, various fractions of the receptive field may not receive equal attention (cf. attention window; p. 70 of Ref. 389). The central part of the visual field (central vision) usually receives the highest degree of visual attention. However, it is possible to shift attention deliberately to the peripheral part of the visual field (instead of turning the eyes, head or body), as demonstrated by Wurtz et al.734 There exists an attention shifting subsystem that allows attention to be shifted towards different regions (see p. 74 of Ref. 389). The frontal eye fields, posterior parietal lobe, pulvinar, and superior colliculus play critical roles in this process.534'533 Attention shifting involves two processes: a) actual turning of the body, head, eyes, and/or attention window to focus on a specific spatial region, and b) "priming" the representation of expected features (p. 235 of Ref. 389). In other words, shifting attention also makes
readily available certain types of expected templates for pattern recognition — a top-down process — for the purpose of heuristic searching. This may be the reason why it is difficult to recognize one's own native language while traveling in a foreign country, especially if one happens to know that foreign language. In this regard, wrong templates are made ready due to anticipation of exposures to the foreign language. The working visual field is limited in size. Without moving the head or forcing the eyes into the corners, the visual field is a flat oval "window" of about 45° in height and 120° in width (see, for example, p. 73 of Ref. 30). About the same size of working field exists for the mind's eye; reconstructed mental images are also limited in size.386'387>388>389 In other words, there is a limit to an individual's mental capacity to perform parallel processing. This limit is referred to as central limited capacity: essentially a central "bottleneck" of conscious parallel processing. Two simultaneous tasks will interfere with each other if both require conscious mental effort, even if the two tasks are of very different nature. Attention is thus the executive control of access to consciousness by reference to long-term or recent goals, according the GW formulation. Although the sensory system is constantly bombarded with all kinds of external stimuli, only a small subset receives conscious attention (selective attention). Selective attention refers to the top-down bias implemented by the individual's volition. However, attention can be redirected by bottom-up, sensory-driven mechanisms, such as distraction by a stimulus of high contrast, e.g., a loud noise or a blinding flash. Baars defined a gray scale of continuum in consciousness, with clearly conscious and clearly unconscious events located at the two extremes. Thus, fuzzy and difncult-to-determine events occupy the middle region on the scale. Attended events receive various degrees of conscious control. Focal attention is at the top on the gray scale, whereas peripheral or background perceptual events (peripheral attention) is placed low and close to the "edge" of attention or, rather, the "fringe-consciousness" or "antechamber" of consciousness (coined by William James and Francis Galton, respectively) (e.g., see pp. 24-25 of Ref. 272). In other words, there is a gray scale of attention. It should be pointed out that Baars' notion of unconsciousness is different from Freud's notion of the unconscious: unretrieved materials in long-term memory is treated as unconscious whether they are repressed or neutral thoughts. Freud's concept of the unconscious gives no provision of a gray scale of its transition from the conscious (for example, see Ref. 371). However, Hadamard referred to "successive layers in the
unconscious," thus implying the existence of such a gray scale (p. 26 of Ref. 272). During a thought process, all solution candidates in the search space do not receive equal attention. During a prolonged period of concentrated efforts in problem solving, the search space may become segmented into roughly two regions: a region at the center of attention (a region where repeated attempts are made) and a region at the periphery of attention (a region of relative neglect). This formulation is supported by arousal research.672 The range of cue utilization is reduced as arousal increases, whereas remote and incidental cues on the periphery of attention become available as arousal decreases. Of course, the demarcation between the two regions is not well defined, and there may be a diffuse intermediate region of gray areas. Furthermore, the demarcation is not fixed but dynamic, since shifting of attention to an unexpected stimulus of high contrast is possible by means of the above-mentioned bottom-up mechanisms (interrupting function of attention).626 In contrast, Freud's dichotomy between the unconscious and the conscious does not allow for dynamic shifting between the two classes of thoughts. Understandably, Freud's theory is a special case that deals with difficult situations that arise in neurosis; it takes an extraordinary effort (psychoanalysis) to raise repressed thoughts to the level of consciousness. Is this segmentation of the search space into two regions, based on the degree of attention, fundamentally different from Freud's separation of the unconscious from the conscious? Or, is this simply a different way of describing the same phenomenon? Freud's notion conjured up an image of a layered vertical network architecture, and the transition between the two layers seems abrupt. In contrast, the formulation of focal versus peripheral attention implies a horizontal network architecture, with a gradual transition between the two regions. As will be explained in greater detail in Sec. 7.5, the two formulations have no fundamental difference from the point of view of network architectures. In light of this modern re-interpretation of Freud's concept of the unconscious, Poincare's introspective account could be readily understood and substantially demystified. It is not too farfetched to assume that it is human nature to initially choose the well-trodden paths over the unusual paths, for finding the appropriate solution to a problem (heuristic searching). It is also reasonable to assume that the solution to a novel but difficult problem tends to lie outside of the commonly chosen region of the search space preferred by average problem solvers, otherwise it would not have been considered dif-
ficult (cf. Koestler's notion of bisociation, Sec. 4.1, and Shekerjian's notion of "connected irrelevance," p. 41 of Ref. 610). Given these two assumptions, Poincare's unconscious work during the incubation period was nothing but the gradual defocusing of his attention or spreading of his attention to the peripheral part of the search space. Such a state of defocused attention occurs during a period of relatively low cortical arousal.494

Experimental evidence supports the hypothesis that broad and diffuse attention deployment is associated with higher levels of creativity.672 In the arousal manipulation, there exists a difference between subjects high in originality and subjects low in originality. Subjects high in originality experienced a benefit from a moderate, as compared to low or high, arousal induction, whereas subjects low in originality showed marginal improvement as arousal increased to high levels. This difference was partly associated with the complexity of tasks: the (ideation fluency) test was simpler for less original subjects. The more complex a task, the lower the arousal must be to attend to a sufficient range of cues. Thus, under high arousal and a constricted range of cue utilization, performance decreased for highly original subjects, but performance of subjects low in originality showed a marginal increase.

As pointed out by Toplyn,672 the above results are also consistent with the results from affect research.581 Experimental results suggest that comedy or music can increase divergent thinking and creative problem solving, but stress can decrease divergent thinking and originality. This suggests that moderate intensity in affect, as with arousal, may enhance originality and divergent thinking. Arousal increases focus and discrimination in attention. In contrast, affect defocuses attention, and, as a consequence, attention can be deployed to access a wide range of cues, while, at the same time, retaining sufficient focus to discriminate among the quality of the available cues and discern those which mediate remote and original ideas. This may explain the difficult contradiction described by Csikszentmihalyi: "not to miss the message whispered by the unconscious and at the same time force it into a suitable form. The first requires openness, the second critical judgment" (pp. 263-264 of Ref. 155). Low arousal favors openness, but high arousal is needed for critical judgment. Thus, a moderate arousal and the capability of flexibly shifting arousal levels favor divergent thinking and creativity. A unified interpretation can thus be attained if we assume that the capability of parallel processing (or pseudo-parallel processing, see Sec. 4.11) is the common denominator in these related conditions; all these conditions enhance the effectiveness of the search-and-match phase of cre-
ative problem solving. Mozart's music was claimed to be particularly effective in enhancing cognitive performance (Mozart effect).555,556,103 After thirty-six college students were exposed to 10 minutes of Mozart's Sonata for Two Pianos in D major (K. 448), their performance on IQ (intelligence quotient) tests for spatial-temporal tasks was enhanced. Needless to say, the topic was controversial. It elicited such critical letters to the editor from readers of Nature109,648 that its editor grouped those letters under a derogatory heading: "Prelude or requiem for the 'Mozart effect'?" Although the claimed effect was subsequently plagued with inconsistent replications,554 Don Campbell103 actually made it an educational practice with some success.

It is not difficult to comprehend why the exposure to Mozart's sonata enhances temporal reasoning since music is a temporal art. But why spatial reasoning? Music, especially classical music, has a mild arousal effect. It is also not difficult to understand why music has a general effect of improving cognitive performance. But then why Mozart? Here, the claimed enhancement of spatial task performance gave us a hint: the effect may be attributed to the mind's enhanced capability of parallel processing. I suspect that this might be the specific effect of exposure to Mozart's music. Mozart had a penchant for presenting several attractive melodies concurrently in his symphonic compositions, especially in his more renowned late symphonies. His music demands the divided attention of listeners, thus fostering a "multi-track" mind or, rather, a multi-tasking capability. Although polyphonic compositions (e.g., fugues) of Johann Sebastian Bach or Georg Philipp Telemann also make the same, or perhaps more, demand (especially on ensemble performers), Mozart's melodies seem more distinct, discernible and attention-catching than those appearing in most polyphonic compositions of Baroque composers, at least for amateurs. However, this is just my speculation.

Nevertheless, the speculation suggests what can be done in future attempts to replicate the original observation of Rauscher and coworkers. It is well known that behavioral experiments are notoriously model-driven. A model (or theory) suggests appropriate experimental designs but also leads to inadvertent sample heterogeneity that is subsequently attributed to factors neglected by the model (Sec. 4.21). Rauscher and coworkers556 were aware of the possibility of sample heterogeneity, and suggested the use of master chess players in future experiments. They recognized the difference between two types of exercises: creative and analytic exercises. The for-
mer can be identified with the search-and-match phase and the latter with the verification phase of creative problem solving. They also realized that "all music excites both types of evolutions in different proportions, and all reasoning involves both the analytic and creative evolutions." However, as suggested by the experiment reported by Chase and Simon113'114 (Sec. 4.3), chess players to be used in future experiments should be further subdivided into novices and experienced players. As will be shown in Sec. 4.22, perhaps few subjects can match the striking difference of reasoning styles between biomedical and humanities students. Using them as separate groups of experimental subjects may clarify the issue of sample heterogeneity by demonstrating which groups are more susceptible to the Mozart Effect and which groups are less or not. Perhaps the choice of types of Mozart's compositions also have some bearing on the level of significance of the elusive effect. Because of the content of relative propensity of multiple concurrent melodies, the Overture of Mozart's opera Die Zauberflote (The Magic Flute) and the first movement of his Symphony No. 38 (Prague Symphony; K. 504), for example, may be more effective in enhancing spatial reasoning than his Sonata for Two Piano in D Major used in previous experiments. In light of the above interpretation, primary-processing thinking is neither illogical nor non-logical, but rather a non-sequential, non-verbalizable kind of reasoning (Sec. 4.10). What sets geniuses apart from psychotic patients may just be the integrity of mental faculty that enables the former to carry out logical verification of their primary-process thought. However, conscious mental work is not only important in the verification phase but also crucial in the search-and-match phase. Poincare insisted that the initial fruitless hard work was necessary to bring out the divine subliminal self. This period of hard work accomplished at least one task: firmly encoding all intricate specifications of the problem in his long-term memory and linking appropriate items from the knowledge base with the problem. In this way, he could readily elevate these specifications and knowledge items to an intermediate level of consciousness (items at the edge of attention or unattended inputs) for ready retrieval while he was subsequently on a geological excursion without a pencil and a sketch book ready in his hands (pp. 387-388 of Ref. 521). The above interpretation also applies equally well to the mind's journey of mathematician Andrew Wiles who eventually closed the last logic gap in his attempt to prove Fermat's Last Theorem. It was a difficult proof that had eluded mathematicians over the past 350 years, including such geniuses
as Euler, Cauchy and Lame. According to Singh's account,634 the last logic gap remained persistently refractory to repeated attacks by Wiles until he was ready to accept defeat, after more than eight years of continuing and concentrated efforts. Then suddenly, it became obvious to Wiles that the gap could be readily bridged by combining two separate approaches that Wiles had used previously, one at a time, at two separate periods of his prolonged and protracted endeavor — an approach so patently simple in hindsight and so exquisitely beautiful that Wiles was moved to tears. The fact that Wiles was trapped at search paths so close to the correct one implied that the correct approach had remained at his peripheral attention for a long time, presumably under the spell of his single-minded determination, which led to compartmentalization of thoughts. Only at a later moment of relative acquiescence could he find the obvious solution. Wiles did not credit his ultimate success to unconscious work. I believe that defocusing of attention may be a better explanation. For bench scientists whose discovery involves primarily experimental observations, Jean-Marie Lehn, 1987 Nobel Laureate in chemistry, offered the following advice: "Often an experiment wants to tell you more than what you first expected, and often what it tells you is more interesting than what you first expected" ,346 The cognitive interpretation is: defocus your attention so as to expand the search space; do not just "prime" your mind only to what you have expected. Furthermore, the experimental result offers a picture that contains more information than what you want to know. So, use picture-based reasoning to capture the additional information. It is futile to use rule-based reasoning. This is not only because no words can fully capture what is contained in the picture, but also because it is impossible to "prime" your mind with keywords that you have not known or have not expected. It is of interest to note that, although some investigators continue to associate intuition with unconscious acts, others prefer to use the adjective "preconscious" to refer to materials that are unconscious for the moment, but are available, and ready to become conscious (Sec. 4.1 of Ref. 52). How does an unattended input maintain the potential of retrievability? The clue to an answer came from the following phenomenon. Broadbent92 previously hypothesized a filter theory of attention to explain selective attention. The hypothesis states that the role of attention is to select some important aspects of the stimulus for processing and to exclude others, thus conserving processing capacity for the most important things. Selective attention experiments suggest that unattended sensory inputs involve as much com-
putational processing as attended inputs, thus vitiating the claim that attention saves processing capacity. This is known as the filter paradox. Baars explained the paradox by suggesting that all inputs are highly analyzed but only the conscious input is broadcast systemwide and widely distributed to a multitude of what Baars referred to as specialized unconscious processors. A specialized unconscious processor can be regarded as a relatively autonomous system that is limited to serving one particular function, such as detection of a straight line, one's own name breaking through from an unattended stream of speech, or fundamental concepts in a given scientific topic that have attained the status of "second nature." Thus, unattended inputs are probably at the edge of working memory or standing by ready for retrieval in long-term memory, since they may be brought to central attention as suggested by the filter paradox. It is possible to reconstruct a reasonably accurate retrospective report of unattended events by subsequent recalling information stored in long-term memory. Details which were previously overlooked emerge in the new context. That this is possible may be due to two factors. First, picture-based reasoning is involved. Second, previously neglected items in recalled mental images now become centers of attention because a new circumstance has "primed" the picture in a radically different way. This is how the police usually interrogates an eyewitness. However, the practice is a double-edged sword that sometimes leads to erroneous recalls, as commonly known. There is an experimental basis of this interpretation. Watanabe et al.701 demonstrated that unattended input may be instrumental in perceptual learning. Experimental subjects, who were called to perform a central task of identifying alphabetical letters, were also presented with a background motion signal so weak that its direction was not perceptible (as confirmed by a separate test). While the central task engaged the subject's central attention, the background signal was apparently captured by the edge of attention. Despite being below the threshold of detection and being irrelevant to the central task, the repetitive exposure improved performance specifically for detecting the direction of the exposed motion when subsequently tested with a suprathreshold (perceptible) motion signal alone. Now, let us consider for a moment how domain-specific knowledge is retrieved for problem solving. Merely storing the relevant domain-specific information in long-term memory is probably insufficient, as evident from a common example in the class room setting: students, who failed to solve a problem, often insisted that they had known the stuff (but failed to make
the connection), when they were subsequently shown the correct answer. According to Singh (p. 207 of Ref. 634), Wiles consulted all related literature in many different branches of mathematics and practiced the latest mathematical techniques repeatedly "until they became second nature to him." Only then could these techniques become readily retrievable, during his intense conscious work as well as during the interludes of relative defocusing of attention. Thus, the initial fruitless hard work, which Poincare found necessary, was apparently also needed by Wiles to enhance the retrievability of all required mathematical techniques.521 Does this re-interpretation of Poincare's notion of unconscious work create any new difficulty for Freud's original theory of the unconscious? It is too early to judge and that is for the readers to assess. In the meantime, repressing sexually-oriented thoughts into the unconscious can be replaced by pushing the unpleasant (repressed) thoughts to the periphery and away from the center of attention (avoidance). This interpretation is again in line with Baars' Global Workspace Theory, in which repression or suppression was interpreted as directing attention away from emotional conflict (Sec. 8.4 of Ref. 29). The psychoanalytic technique of inducing free associations for the purpose of uncovering unconscious thoughts is actually a disarming tactic of diffusing the highly focused attention brought about by the tendency of avoiding repressed thoughts whether they are sexually-oriented or not. In addition, neutral thoughts are neglected by chance or excluded by personal prejudice, as exercised by a dogmatic mind. Csikszentmihalyi155 also emphasized the importance of unconscious thoughts in creativity. He distinguished between the psychoanalytic concept of the unconscious, which is related to inner tensions, and the unconscious thoughts in creativity, which assumes no predetermined direction, i.e., neutral thoughts with no emotional content. The presence or absence of a sharp demarcation between the center and the periphery of attention is a matter of resolution, as is also involved in the judgment of sharpness of a corner. Graphically, a transition curve that looks like a step-function at a coarse view may appear to be a sigmoidshaped curve at a sufficient level of magnification (cf. the threshold curve shown in Fig. 17B of Chapter 1). There is however a subtle difference. In the interpretation of Poincare's introspection, only working memory was considered in the re-interpretation, whereas the original Freudian theory obviously involves long-term memory. Nevertheless, working memory research is in a state of flux. The current consensus no longer regards short-term memory or working memory as a
separate anatomic or functional entity but rather as the activated part of long-term memory.472 Thus, the difference may be more imagined than real. On the other hand, an alternative re-interpretation of the psychoanalytical concept of the unconscious based on neuroscience is possible371 (see also Sec. 5.20). However, no demystification of Poincare's introspective account was offered. The above interpretation of primary-process thinking can be modified to explain the history of the emergence of ideas. In his book The Science of Conjecture, Franklin raised the question as to why the mathematical theory of probability was not developed until Pascal and Fermat did it in the 17th century, even though the concept of probability (logical probability) had been used since antiquity in decision making, in the presence of uncertain or inaccurate evidence (Chapter 12 of Ref. 212). Franklin even suggested that nonhuman animals make probabilistic inferences in risk evaluations. Franklin concluded that the vast majority of probabilistic inferences are "unconscious." It took centuries of time and geniuses, such as Pascal and Fermat, to "verbalize" probabilistic inferences in terms of mathematical rules. What they did was essentially a process of formalization, in Rosen's terminology (Sec. 6.13). Franklin summarized what he learned from the cognitive science literature (p. 323 of Ref. 212): Insight into what has been learned implicitly is not suddenly revealed in full but appears as the mind iteratively redescribes the information it already has. One can distinguish, for example, a first stage, in which one can recognize a zebra; a second, in which one can recognize the analogy between a zebra and a zebra crossing [crosswalk] without being able to explain the analogy in words; and a third, in which the respects in which the two are analogous can be stated in words. Franklin's analysis becomes transparent if we substitute picture-based reasoning for unconscious inferences, and rule-based reasoning for verbalization (or formalization) of probabilistic inferences. That non-verbal picturebased thoughts often precede verbalized concept formation was vividly described by Einstein in his letter to Hadamard, in response to the latter's questionnaire (pp. 142-143 of Ref. 272): (A) The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought. The psychical entities which seem to serve as elements in thought are certain
signs and more or less clear images which can be "voluntarily" reproduced and combined. There is, of course, a certain connection between those elements and relevant logical concepts. It is also clear that the desire to arrive finally at logically connected concepts is the emotional basis of this rather vague play with the above mentioned elements. But taken from a psychological viewpoint, this combinatory play seems to be the essential feature in productive thought — before there is any connection with logical construction in words or other kinds of signs which can be communicated to others. (B) The above mentioned elements are, in my case, of visual and some of muscular type. Conventional words or other signs have to be sought for laboriously only in a secondary stage, when the mentioned associative play is sufficiently established and can be reproduced at will. The second paragraph, quoted above, indicates that Einstein not only searched for available "pictures" or "sign" modules, but he also combined them together logically in terms of sequential rules. The situation was similar to what transpires during the process of pattern recognition, as described by Kosslyn: picture modules are logically connected so as to maintain the shape-invariance, while accommodating the flexibility of relationship between various body parts (Sec. 4.15). Einstein also pointed out that the motivation of verbalization was to convert picture-based reasoning into logically connected concepts in words, so that it can be communicated to others. Here, I shall add: verbalization allows for the conversion of the art of unconscious picture-based probabilistic reasoning into a craft of reasoning based on rules of probability theory, which can then be learned and practiced by the less gifted. Standardization by means of publicly accepted rule-based procedures also prevents excessive claims, made by individuals on a no-holds-barred basis, especially in a legal court. Franklin's speculation regarding nonhuman animals' risk evaluations also makes sense, if we assume that nonhuman animals use picture-based reasoning for risk evaluations. It is not an unreasonable assumption since rule-based reasoning is intimately related to the language capability; picture-based reasoning seems to be the only option left for nonhuman animals, with the possible exception of great apes (Sec. 4.18). Presumably, perception of risks by a nonhuman animal results in a cumulative emotional index. When this index exceeds a certain threshold, a decision to escape is then made. This
is not unlike the situation in which humans' complex perception of risks based on an overall assessment results in a decline of the comfort level and a rise of fear, and eventually leads to a concern or an action, even though the exact causes that have triggered the discomfort or fear and the exact basis of the concern or action remain unreportable (see Sec. 4.10 regarding verbalization of intuitive feeling).

4.9. Interpretation of hypnagogia and serendipity
We shall now examine several well-known cognitive phenomena in light of the re-interpretation of Freud's concept of the unconscious. Anecdotes abound in science history about clues for an important discovery that often arose during a dream and, in particular, a lucid dream.403,260 The most famous one is Friedrich August Kekule's dream445,563 about a snake biting its tail that inspired him to discover the molecular structure of benzene.e

e Beider34 indicated that it was Josef Loschmidt who discovered the benzene structure first. The "snake biting its tail" story has to be viewed in a different light. However, there exist other less publicized examples of hypnogogia. Bastick cited an example of Gauss (p. 342 of Ref. 52). Modest or minor discoveries through hypnogogia may be more common than one is led to believe (personal observation).

Kekule's dream was not really a dream but a state of consciousness between wakefulness and sleep, called hypnagogia or hypnogogia.445 Kekule's dream can be interpreted as a result of disinhibition or release from a highly focused effort according to the formulation of focal versus peripheral attention presented in Sec. 4.8. But it can also be interpreted as the consequence of an escape from the tyranny of the left cerebral hemisphere which dictates rule-based reasoning (Sec. 4.15). The latter interpretation is supported by cerebral lateralization research. Dream research and split-brain research (using patients with severed corpus callosum) suggested that the right hemisphere is primarily involved in dreaming, but the dream content is accessible to the left hemisphere for subsequent verbal reporting (see, for example, pp. 294-295 of Ref. 646). Self-imposed constraints apparently become less effective when one is not fully conscious than when one is fully alert. Bastick suspected that the reduction in ego control facilitates the occurrence of hypnogogic reverie, which he defined as the seemingly chaotic associations of images and ideas that occur during very relaxed, near sleeplike states (p. 341 of Ref. 52). Perhaps the state of hypnagogia is not very different from the mental state of a psychiatric patient during a session of
psychoanalysis by a licensed psychiatrist: the psychiatrist's task is to facilitate the patient's free associations. The patient thus becomes more likely to reveal inner repressed thoughts. In this regard, it is of interest to consider what a genius and a psychotic patient may have in common. The mental status of a genius has often been compared to insanity.536'574'348'182 The subject is controversial. However, the present analysis offers the following speculation. A search for matching answers in creative problem solving often depends on our mind's ability to make free associations, which can be construed as another manifestation of divergent thinking. As alluded to in Sec. 4.8, patients suffering from schizophrenia or manic-depressive psychosis (manic or hypomanic phase) are characterized by an extreme form of free associations: flight of ideas. In mania, the brain is moving at an accelerated pace, and, under certain conditions, the patient may demonstrate lightning quick free associations.182 The patients of psychosis tend to invoke primary-process thinking, which is a valuable trait for creative problem solving. For example, if productivity is evaluated in numbers, then the two peaks of productivity of German composer Robert Schumann coincided with his two hypomanic episodes in 1840 and 1849 (e.g., see Ref. 348, 349). A history of mental illness is not a prerequisite of high creativity, but the high prevalence of mental illness in certain types of creative people points to a connection between mental illness and creativity; the correlation is strong but the mechanism is not clear.182 A psychotic patient sometimes uses an aberrant form of primitive logic, called paleologic, in reasoning.598 Although paleologic is not typically adopted by scientific geniuses, paleologic thinking sometimes appears in a dream or hypnogogic state of a normal person (personal observation). A brilliant idea acquired during a dream or hypnagogia sometimes disintegrates under rigorous scrutiny — during the verification phase — in a fully awake state (see also Poincare's account of ideas appearing in a semi-hypnogogic state.521) Scientists sometimes make important discoveries by accidents or mistakes — a situation known as serendipity.565'175 Was it because these scientists had sheer luck to aid them? Perhaps not. Louis Pasteur claimed that "in the fields of observation chance favors only the prepared mind" (p. 502 of Ref. 50). As Boden pointed out, parallel processing of the mind is a key factor for serendipity; it is not mere random chance alone but rather "chance with judgment" (p. 220 of Ref. 75). Boden also presented an extensive discussion about the unpredictability of serendipity. Her interpretation can be made clear, if the word "judgment" is replaced by "recognition":
serendipity is pattern recognition at an unguarded moment. Here, at work is the ability to make a subtle match between a pattern and templates under an unplanned, unexpected circumstance. Roberts565 compiled numerous examples of serendipity. He pointed out that accidents became discoveries because of the sagacity — word of Horace Walpole who coined the term "serendipity" — of the person who encountered the accident.458 What constitutes sagacity? Can it be developed or fostered? He thought that it is an inborn ability or talent but he also thought that it can be encouraged and developed. He listed a number of factors that seem to be common to many of his compiled examples: curiosity (curious about the accident), perception (keen observation, perceiving the expected as well as the unexpected), flexibility (turning an annoying failure into a discovery) and command of domain-specific knowledge (having the knowledge to interpret the accidental result). All these factors help but the explanation lacked specificity since all of them are also needed for creativity in general. Roberts further classified serendipity into two subclasses: true serendipity, for unexpected discoveries not being sought for, and pseudoserendipity, for unexpected solutions being sought for. The common element is unusual sensitivity in recognition, which Roberts attributed to sagacity. The accidental discovery of pulsars by Jocelyn Bell and Anthony Hewish was true serendipity, since pulsars were heretofore unknown and they could not have sought for them (p. 121 of Ref. 565). Curiosity about an unexpected noise (bursts of radiation) was certainly a key factor. However, broad and diffuse curiosity about each and every occurrences of laboratory noise — and there are plenty of them — is a mixed blessing or curse. In the extreme case, task involvement becomes an impossibility because of the tendency for one to spread oneself too thin. What turns out to be an unexpected opportunity or a time-wasting distraction is a conclusion to be made only in hindsight; one may never know it in one's life time. What Bell, then a graduate student, observed was a recurrent event at each midnight, and she was able to recognize this occurrence as a heretofore unidentified pattern. The bursts came earlier each night, just as stars do (due to the Earth's orbiting motion), thus pointing to a celestial source. It was curiosity, keen observations and "chance with judgment" that made a difference. Roberts also listed the accidental discovery of penicillin by Alexander Fleming under the category of true serendipity (pp. 159-161 of Ref. 565). The accident was a contamination that ruined colonies of Staphylococcus bacteria on a culture plate. It was an apparent experimental failure, from
most others' viewpoint, but an opportunity for Fleming. It is often difficult to distinguish the two types of serendipity because both require framing of a problem, whether it is before or after the accident. In my opinion, Fleming's discovery of penicillin was pseudoserendipity, instead. First, judging from Fleming's career and his experience during World War I, it was possible that Fleming might have been looking for some better substances to treat battle-wound infections than phenol and might have put this objective on the back burner without actively pursuing it. Second, by Fleming's own account, the reason why he did not discard the contaminated cultures was his interest — an interest not shared by his contemporaries — in naturally occurring antibacterial substances. Furthermore, he indicated that his previous discovery of lysozyme, in a similar situation, kept him keenly alert for similar accidents (lysis of bacterial colonies on a culture plate), thus preparing his mind for the discovery. Presumably, he vividly remembered the lysed bacterial colonies caused by a drop of his tear which he subsequently found to have contained lysozyme. On the other hand, some scholars believed that many of these anecdotes of serendipity might have been pre-planned events — experimental designs — rather than accidents. Root-Bernstein567 thought that it is not sufficient simply to be in the right place at the right time: a scientist must be expecting something for serendipity to occur. That might be true for the case of Fleming's discovery of penicillin — he was expecting another ruined culture plate after the lysozyme accident or incident — but what about his first encounter with a ruined culture plate when he discovered lysozyme? He certainly could not expect something, in concrete verbal descriptions, that he had not known ahead of time. All he could have was a vague, unspeakable general feeling of expectation — some magic biological substances that cause massive death of bacteria (see later). So, Fleming could not have expected a ruined culture plate in his discovery of lysozyme.f Furthermore, f
The authenticity of the lysozyme story was disputed (see, for example, Ref. 567). In place of the lysozyme or the penicillin example, Thomas Edison's discovery of phonographs might be a reasonable substitute for the sake of this ongoing discussion. However, a continuing debate is expected. Those who never had any such experience might continue to treat these anecdotes as exaggerated or fabricated stories whereas those who had the experience of modest or minor serendipitous discoveries in the past would embrace the stories without much suspicion or, at least, would believe some of these stories to be true (personal observation). The point is: it is not necessary to prove all of these claims. If only one or some of them turned out to be true, serendipity would be a significant phenomenon in creativity (see Sec. 6.14 for the reason why it is not necessary to prove it in all of the known samples rather than just some selected samples).
he could expect that unexpected events might happen anytime, anywhere. The only sure way for him to catch them was to be constantly alert to any possible, unexpected match between his problem in mind and any potential solution-template, anytime, anywhere. In other words, one must make it a pastime to continually attempt to match any event (that happens to happen) with any problem (that happens to come to mind), whether the particular event and/or problem is important or trivial. Any less prepared mind would be caught off guard because the candidate solution-template could not be designated as such until the match happened. Any attempt to separate what is deemed important and what is not is a prescription to lose an unexpected opportunity. Presumably, Fleming's reasoning at the moment of serendipity was largely picture-based, i.e., he had to cast and judge the event of a potential match, between his problem and the unexpected solution, in pictures. He could not have found the match had he interpreted the event of a potential match in sterile words since the verbal description of an unexpected solution could not have been formulated ahead of time but mental images depicting death of bacteria could have been stored in his mind ahead of time, presumably in a number of different versions (e.g., absence of motility of bacteria viewed under a microscope, lysis of bacterial colonies seen with naked eyes, etc.). In addition, those preconceived pictures could not have been clear and exact, and a stretch of imagination must be exercised in order to score a match between a problem and a solution template. In other words, vague and fleeting event-patterns in terms of mental imagery must be kept at the edge of his attention so that the unexpected appearance of a candidate "template" will automatically elicit an attempt to recognize it. When he saw a clear circular area in a culture plate (due to lysis of bacterial cells and massive destruction of bacterial colonies), he recognized his opportunity first in pictures, which were subsequently converted to words. Furthermore, Fleming's mind must concurrently keep track of several objectives, thus practicing a kind of parallel or pseudo-parallel processing within a time span longer than a single problem-solving session. While he was working on a particular objective, other related or unrelated objectives were not completely forgotten — i.e., assuming a standby status — and were ready to be recentered upon encountering appropriate cues. The above description also applies to the case of Thomas Edison's discovery of phonographs (cf. Sec. 4.21). If our present interpretation of what Pasteur meant by the prepared mind is correct, the word "dedication" acquires a new meaning. A stretch
of attention beyond the formal session of problem solving seems to be a prerequisite for making serendipitous discoveries. Likewise, hypnagogia is essentially the consequence of extending one's attention (to a particular problem) into the twilight zone of consciousness. In other words, dedication is not merely hard work; it is total task involvement throughout day and night, during waking hours as well as in dreams (or, rather, hypnagogia). When Isaac Newton was questioned how he discovered the law of gravitation, he indicated that he had done it "[b]y thinking about it all the time" (p. 211 of Ref. 211). In Belgrade's Nikola Tesla Museum, three of Tesla's portraits exhibit a characteristic pose, presumably his favorite: pointing his fingers or hand at his temple. I suspect that Tesla attempted to tell us that he thought all the time. Andrew Wiles recalled how he was enchanted with the problem of proving Fermat's Last Theorem, "I was so obsessed by this problem that for eight years I was thinking about it all of the time — when I woke up in the morning to when I went to sleep at night" (cited in p. 60 of Ref. 481). Here, the keyword in Wiles' remark is "obsession." So far, we have emphasized the role of the ability to recognize an opportunity in serendipitous discoveries. Can serendipity be sought after actively? The answer is affirmative. In our discussion of dogmatism, the willingness to deviate from the norm and explore and to run the risk of making mistakes was identified as an important factor for creativity (Sec. 4.4). The same willingness is also conductive to generating unexpected results or even something terribly wrong, both in experimentation and in thinking, thus creating new opportunities to be recognized by a prepared mind. It is no wonder that a dogmatic person seldom experiences serendipity: reluctance to deviate from the norm diminishes the opportunity, and refusal to recognize an unexpected opportunity, if any, virtually abolishes the possibility of serendipitous discoveries. Hayes's289 interpretation of serendipity differed from ours. He thought what Pasteur means by "the prepared mind" was someone who is sufficiently knowledgeable to recognize the chance of a discovery (see also pp. 59-60 of Ref. 481). However, examples abound that discoveries of the serendipity type often eluded many others that were just as knowledgeable as the "lucky" ones. Roberts thought that it was Fleming's knowledge and training in bacteriology that allowed him to interpret the meaning of a clear area on a culture plate but that is just common knowledge: any competent microbiologist or technician knows its meaning (p. 245 of Ref. 565). There is little doubt that domain-specific knowledge played a minor role in the
discovery of penicillin. Of course, Fleming's knowledge came in handy in the subsequent isolation of the mold and its identification as Penicillium, during the verification phase. Domain-specific knowledge is unquestionably important and can, under certain circumstances, render an unrecognized template or pattern instantly recognizable. However, knowledge alone is insufficient for making a serendipitous discovery (see also Sec. 4.22).

A discussion of serendipity would be significantly impoverished without inclusion of what appeared in a recent book entitled The Travels and Adventures of Serendipity.458 For a mysterious reason, the authors, Merton and Barber, had suspended and shelved their work for about half a century but suddenly decided to publish it in its original form without a major revision. The book thus presents a "time capsule" of what was in the mind of sociologists some fifty years ago but was inadvertently rendered inaccessible to cognitive scientists. It is important to keep in mind that the sociologists' task is to give us the sociological perspective rather than to present a cognitive mechanism. Their analysis relied heavily on introspective accounts of past scholars. Presumably, sociologists were unaware of psychologists' taboo. Ironically, these accounts captured in their book still shed considerable light on the cognitive mechanism. For example, the role of recognition was mentioned several times. It leaves me wondering whether the course of creativity research might have been favorably altered had they published their work half a century sooner.

Merton and Barber's analysis covered a wide range of human endeavors. Here, we shall focus on serendipity in science (pp. 158-198 of Ref. 458). They devoted considerable space to the discussion of whether chance ("accidents") is more important than a particular discoverer's sagacity. Historically, the opinions were divided among scholars. Quite a number of scientists thought that Hans Christian Ørsted's discovery of electromagnetism was a consequence of a pure accident and pure luck, in the sense that anyone in the right place at the right time would have made the same discovery. On the other hand, imaginativeness, knowledge and persistent alertness of a discoverer had been stressed by others, as epitomized by Pasteur's remark about "the prepared mind." Merton and Barber pointed out that beneath the disputes there were often some motivational factors ("ulterior motives") at work. Those who thought Ørsted [also spelled as Oersted in the physics literature] was just lucky might have done so because of envy or jealousy, or because of their reluctance to encourage future generations to aspire for easy success without hard work. By the same token, scientists who defended their own serendipitous discovery took the opposite position
for fear of having their work belittled or for fear of being deprived of proper credit. The detailed stories make interesting reading as far as sociology is concerned, but opinions tainted with ulterior motives hardly help clarify scientific issues (this might be the reason why psychologists traditionally distrusted introspective reports). Still, some scholars adhered to their peculiar view because they sincerely believed it. The latter is evident from numerous examples cited in the book. But these opinions were almost all based on selected examples scattered over a wide range of diversity. There were examples of accidents that were too obvious to miss or ignore, but there were also subtle accidents that only a prepared mind could recognize.

Newton's legendary apple belonged to the latter category. Countless people witnessed a falling apple in the past, but only Newton took that as an important hint. Why others before Newton had missed it can be understood in terms of the following reconstructed but somewhat speculative scenario. One of the cornerstones of Newton's theory is the principle of inertia, on which the concept of force is based. Most people watched the orbiting motion of planets — by then Copernicus' idea had been accepted — without a second thought, but someone who had a preoccupation with the notion of inertia might note that orbiting planets did not fly off along the tangent as raindrops on a spinning umbrella did. In order to reconcile the two disparate observations with the concept of inertia, Newton had to postulate a force that constantly pulls the planets towards the Sun (the so-called centripetal force). Presumably, Newton recognized the crucial hint of the falling apple. The apple was not orbiting the Earth, but the force pulling it towards the ground was centripetal in the same sense as the then-speculated gravitational force on the orbiting planets. Perhaps two examples would not be sufficient for Newton to make a good induction in support of the universality of the gravitational force, but the incident — not even an accident — was a good starting point for Newton to hunt for more examples. People who dismissed the "apple" legend perhaps did not quite appreciate the subtle implication that could well cross Newton's mind.

Taking together numerous examples of serendipity cited in Merton and Barber's book, one reaches an obvious conclusion: the diversity of opinions might well be correlated with sample heterogeneity (cf. Sec. 4.21). In order to do justice to these diverse examples, it is imperative to establish a gray scale of serendipity. If we are willing to accept this gray scale and not to attach serendipity solely to monumental discoveries, then there is nothing "elitist" about the notion of serendipity. For example, everyday
ingenuity involved in successful trouble-shooting often requires serendipity, and serendipity can grace ordinary folks as well as geniuses. My conversations with people of less-than-genius talent suggested that serendipity might be far more common than we were led to believe so far. In fact, any competent experimental scientist must rely on unexpected experimental outcomes to gradually revise or radically revamp the working hypothesis, which is essentially a minor theory. In doing so, an experimental scientist constantly bumps into minor serendipity unless absolutely no progress is made. Breakthroughs are often associated with significant, though not major, serendipity.

That there were intermediate cases of serendipity can be demonstrated by a personal example, in the "self-exemplifying" style of Merton, who described his encounter with the word "serendipity" as a case of serendipity (pp. 233-242 of Ref. 458). In pursuing Loschmidt's reversal paradox discussed in Sec. 5.13, I demonstrated, by means of reductio ad absurdum, that the final state of two spontaneously unmixed gas containers is just as likely to exist as the initial state of two artificially separated, premixing gas containers. I asked an additional question, which appeared to be one too many initially but turned out to be strategic in hindsight: Is the final momentum-reversed state rare because the initial state is also rare? To my initial dismay, the answer was readily found to be affirmative, thus demolishing my attempt to show that a spontaneously occurring momentum-reversal would be a fairly probable event if microscopic reversibility were strictly true. In relative acquiescence two weeks later, I viewed my defeat in a different light, and began to explore the other side of the same proverbial coin. All of a sudden, I recognized the implication of my failure: the failed conclusion, if viewed in pictures instead of words, also revealed the elusive breaking of time-symmetry that Prigogine had spoken of. In other words, the unexpected failure itself was actually a better solution than my initially chosen candidate that had failed to live up to my expectation. This unexpected corollary to an otherwise disappointing conclusion offers a far simpler, and far more direct, demonstration that microscopic reversibility is fundamentally irreconcilable with macroscopic irreversibility than all the remaining arguments compiled in Sec. 5 taken together. This personal example offers a specific merit:g the initial "miss" served as the "experimental control"
to the subsequent "hit." Furthermore, there was no sample heterogeneity. Since the same person could miss the opportunity initially but could still reclaim it at a later time, the example was an intermediate case of serendipity. The hint from the initially disappointing result was neither as elusive and as easy to miss as the protagonists had claimed, nor was it as patently obvious as the antagonists had maintained. The inherent "time" heterogeneity — the "hit" and the "miss" were separated by two weeks of "incubation" — served a useful purpose. It helped identify possible crucial factor(s) responsible for the two opposite outcomes. In this particular case, it was neither knowledge nor hard work that made the difference. Rather, it was "priming" — or, in Bastick's words, "recentring" — of the mind that allowed a previously peripheral message to be brought to central attention. Merton and Barber recognized a "serendipity pattern" in research by which they meant "the fairly common experience of observing an unanticipated, anomalous and strategic datum which becomes the occasion for developing a new theory or for extending an existing theory" (p. 196 of Ref. 458). Here, they identified key features of serendipity. I have only one point to add: there is an inherent element of subjectivity. A falling apple is a perfectly normal occurrence to most people but Newton was stricken with its anomaly at a crucial moment (and perhaps at that particular moment only). Likewise, Sternberg and Davidson's remark that "what we need most in the study of insight are some new insights!" can be greeted as a humorous one, but it can become strategic when this sentence is analyzed in terms of linguistics and in light of Rosen's generalization of linguistic processes (Sec. 8). In these examples, the common element of subjectivity helped identify one of the most crucial factors in serendipity (in addition to luck): priming of the mind, which was probably what Pasteur meant by a "prepared mind." Was luck an important contributing factor? Generally speaking, it is not reliable to judge the element of luck by introspection, as Merton and Barber's book amply demonstrated. However, anyone who got a second chance had to be luckier than those who got only one chance to do it right (see Sec. 5.15 for a more objective analysis of the role of luck).
g This personal example is useful only if the entailed conclusion is indeed correct or, at least, on the right track, but that is not for me alone to decide. Otherwise, its value can still be preserved if serendipity is now extended to include accidents that offered a false hope but locked one onto a fateful path that eventually threw one into utter confusion, namely, negative serendipity, for lack of a better term. I have Karl Marx's brilliant failure in mind, but am too ignorant about the topic to cite it formally.
4.10. Gray scale of understanding and interpretation of intuition and "aha" experience
Now, we shall consider what the terms "understanding" and "intuition" mean. We all understand what the term "understanding" means, but it is a term that is more elusive than commonly thought. As will be demonstrated in Sec. 4.22, its meaning has undergone considerable transformations in the arena of education. Understanding a concept may merely mean that the concept and the procedure of manipulating it have been memorized. I, for one, no longer trust a student's introspective report that he or she understands a given concept. Instead, I usually manage to find my own answer by probing with tactical questions. Let us examine what constitutes understanding in light of Simonton's model. In a letter to Hadamard,272 cited in Sec. 4.8, Einstein expressed his interest in the distinction between "mere associating or combining of reproducible elements" and "understanding (organisches Begreifen)," which Max Wertheimer709 had tried to investigate. If we regard "mere associating or combining of reproducible elements" as what transpires in the search phase of problem solving, then "understanding" marks the sudden illumination of the match phase. Since the search-and-match phase can be conducted either by means of rule-based or picture-based reasoning, or their combination, understanding is not an all-or-nothing revelation: a gray scale of understanding exists. For example, before Maxwell developed the equations that bear his name, a fair degree of understanding of electricity and magnetism had been achieved in the 19th century, solely on the basis of Coulomb's law and its magnetic equivalent, as well as Ampere's and Lenz' laws. The advent of Maxwell equations unified electricity and magnetism. Furthermore, the speed of light was then linked to basic constants in electricity and magnetism. Maxwell's theory thus significantly enhanced mankind's understanding of electricity and magnetism. Thus, as far as mankind is concerned, being able to verbalize (or formalize) an intuitive picture-based comprehension constitutes an important step of enhanced understanding — a major scientific discovery (or mathematical invention) so to speak — as demonstrated by the formalization of probabilistic reasoning, in particular (Sec. 4.8), and emergence of science, in general. On the other hand, as far as learning is concerned, understanding merely at the level of rule-based reasoning is deemed inadequate. Understanding at both the picture and rule levels is more satisfactory. Why the difference? This is because effective learning is essentially a process of
"reverse" discovery. In making a major scientific discovery, the "gut" feeling — comprehension at the picture level — comes first, and articulated rules (or mathematical formulas) come second. In learning something already known, the sequence can be reversed. Rules or formulas, as well as the detailed step-by-step instructions of how to use them, can be readily learned first by just rote memorization (rule-based learning). This step enables "know-what" and "know-how." Understanding and comprehension of the learned (or memorized) rules can be further enhanced if the picture leading to the rules is also learned (picture-based learning). This second step enables "know-why." In this way, the students can reconstitute the rules into a coherent parallel picture, and manipulate the newly acquired knowledge with picture-based reasoning in the future. Ideally, the students should experience the process of re-discoveries rather than just "reverse" discoveries. With adequate hints given by a textbook and/or an instructor, students can retrace the steps of a discovery by going from the picture to the rules, without a heroic effort but with great fun. What the students are supposed to have learned is not just "know-what" and "know-how" but also "know-why." Regrettably, such a lofty ideal has been gradually marginalized and forgotten (Sec. 4.22). Intuition is a term which scientists found difficult to define but easy to use; we all know it when we encounter it.52,522,494 To date, investigators still have not been able to reach a consensus. Policastro522 defined intuition as a tacit (implicit) form of knowledge that broadly constrains the creative search by setting its preliminary scope, so as to avoid combinatorial explosion, i.e., intuition enables heuristic searching. Damasio thought that intuition is mediated by a covert mechanism, outside consciousness, which does not require or involve logical reasoning or analytical ability, i.e., intuition is linked to the unconscious and primary-process thinking (pp. 187-189 of Ref. 157). As Shermer615 put it, intuition is the key to knowing without knowing how you know, i.e., intuition is unreportable. In contrast, as Policastro also pointed out, some other cognitive scientists tend to associate intuition with heuristics, which are explicit rules and are reportable (Sec. 3.4 and Sec. 4.26). When used in the context of creative problem solving, intuition is associated with inspiration that is required to solve an "insight" problem instead of a routine one.459,460,461 Insight,651 a related term, is also difficult to define and describe. Sternberg and Davidson652 suggested that "what we need most in the study of insight are some new insights!" This humorous remark has an obvious self-referential overtone. As will be discussed again in Sec.
8, beneath the superficial humor, there is a serious message hidden in the remark. Here, it suffices to point out that the additional insight, sought after by Sternberg and Davidson, was actually hidden in Poincare's writing, waiting to be recognized for almost a century. Poincare complained about the notion of intuition: "[H]ow many different ideas are hidden under this same word [intuition]?" (p. 215 of Ref. 521). However, a useful interpretation of intuition was actually implied by Poincare, who said: "Logic, which alone can give certainty, is the instrument of demonstration; intuition is the instrument of invention" (p. 219 of Ref. 521). He also said: "It is by logic that we prove, but by intuition that we discover" (p. 274 of Ref. 17, p. 2 of Ref. 52 and p. 233 of Ref. 464). We can thus roughly identify logic with the verification phase and intuition with the search-and-match phase of Simonton's chance-configuration model. If so, why do creative people have more or better intuition than average people? Does not almost every competent problem solver solve a problem by first searching for a plausible solution, matching it to the problem and then later verifying the solution? We shall demonstrate that the difference lies in the personal preference for using either picture-based or rule-based reasoning in the search-and-match phase. In principle, it is possible to make discoveries by means of rule-based reasoning (creative work of the first kind). For example, many theorems of Euclidean geometry can be derived from existing theorems by means of exclusively rule-based reasoning. That is, it is possible to generate new rules by means of a recombination of existing rules, without resorting to picture-based reasoning. In practice, such opportunities for discoveries are usually soon exhausted after a field of research becomes mature; only opportunities for minor discoveries remain for practitioners of exclusively rule-based reasoning. In view of the above-cited remarks, Poincare clearly did not associate intuition with exclusively rule-based reasoning but rather with picture-based reasoning (creative work of the second kind). Finally, since most people agree that it is difficult to articulate, in unambiguous terms, the nature of intuition, it is natural to associate intuition with a parallel process: picture-based reasoning. After all, it is well known that it is awkward to simulate a parallel process with a sequential process. Schooler and Engstler-Schooler592 observed that verbalization of the appearance of previously seen visual stimuli impaired subsequent visual recognition performance. Furthermore, Schooler and coworkers594,593 found that verbalization can interfere with insight problem solving but not with non-insight problem solving. These observations suggested that insight as well as intuition
is not associated with rule-based reasoning but rather with its alternative: picture-based reasoning. On the one hand, picture-based and rule-based reasoning can be regarded as two competing and mutually distracting processes. On the other hand, it is beneficial to use them in alternation because they cover each other's shortcomings. Bastick extensively discussed nonlinear parallel processing of global multicategorized information in intuition (Sec. 5.4 of Ref. 52). He also pointed out that intuition is experience-dependent, whereas logic is independent of experience. Miller thought that intuition plays a central role in scientific creativity and that science intuition is an extension of common sense intuition (pp. 441-442 of Ref. 465). He further asserted that the power of unconscious parallel processing of information emerges as a central part of creative thought. These general descriptions can be transformed into explicit and specific terms, if we interpret the search-and-match phase of problem solving in terms of analog pattern recognition, via picture-based reasoning, as described in Sec. 4.2. Thus, nonlinear parallel processing is invoked both during the search for suitable candidate solutions (templates) and during the process of recognizing a suitable match between a candidate solution and the problem. The global knowledge, alluded to by Bastick, refers to knowledge that belongs to a "foreign" category, and is not immediately related to the problem under consideration. The global knowledge is, however, potentially relevant to problem solving (e.g., by analogy). Intuition is experience-dependent, presumably because knowledge of a "foreign" nature grows with experience and also with increasing contacts with foreign fields of endeavor, resulting in an expansion of the relevant search space through cross-fertilization ("domestication of foreign knowledge"). However, experience is a double-edged sword. Experience also breeds dogmatism, a trait highly detrimental to creativity. A mere expansion of the search space leads primarily to a combinatorial explosion. Nevertheless, access to such expanded knowledge may be facilitated by what Bastick referred to as multicategorization. That is, superficially unrelated knowledge modules can be redundantly connected to one another by virtue of linkages to multiple categories as well as to categories at multiple hierarchical levels. Such redundant connections effectively provide extradimensional bypasses, alluded to by Conrad (Sec. 7.5 of Chapter 1). Access to such intricately connected knowledge modules is a highly nonlinear parallel process: the first connection that results in a preliminary match suddenly opens up new potential, redundant connections (like
dangling hooks, waiting to be recognized and connected), and hence possibilities of new matches. In other words, it is a cooperative phenomenon that is conducive to accelerating free associations, much like the cooperativity of hemoglobin action: the binding of O2 to its first monomer makes subsequent bindings to the second, the third, and the fourth monomer easier and easier, by gradually loosening up the non-covalent bonds between the four monomers (Sec. 7.3 of Chapter 1). Metaphorically, experience aided by multicategorization is also like the folding nucleus, which facilitates protein folding (Sec. 7.6 of Chapter 1). The creative process itself creates additional multicategorized redundant connections, thus precipitating a state, akin to the postcritical conformation in protein folding, that is destined for the ultimate emergence of a correct solution (folding). In this formulation, how emotional states influence perception and how ego threat inhibits intuition can be readily understood. Moderate arousal fosters a condition of enhanced explorations: metaphorically, fluctuations of the transient state ensemble (TSE) aid protein folding. Ego threat inhibits explorations: metaphorically, reduced temperatures suppress TSE fluctuations and hinder protein folding. It is also clear why picture-based learning is superior to rule-based learning: picture-based learning facilitates the establishment of multicategorization, and picture-based reasoning facilitates the recognition of potential new connections. This formulation also provides an explanation of the so-called "aha" phenomenon (Bastick called it the eureka experience52). Simon627 described this phenomenon as an experience: "the sudden solution is here preceded by a shorter or longer period during which the subject was unable to solve the problem, or even to seem to make progress towards its solution." It was Archimedes' experience as he jumped out of the bath water, ran naked and yelled "Eureka!" It was the "breakthrough" mind state that gave Poincare the subjective feeling of certainty regarding the correctness of his inspiration-derived solution, long before he had a chance to verify it (p. 388 of Ref. 521). Both Gauss and Tesla used the metaphor of lightning to describe the suddenness of their mental breakthrough (quoted in Sec. 4.26). Koruga383 suggested that the metaphor of lightning also illustrates the process of illumination. In view of these examples, it is apparent that suddenness or discreteness is the most prominent feature of "aha" experiences (sudden illumination). Bastick analyzed a number of examples and concluded that the eureka ("aha") experience is associated with the preconscious aspects of the intuitive process and cannot be considered in isolation (pp. 147-148 of
Ref. 52). He indicated that intuitive processes that follow long incubation periods usually involve drastic recentring (recentering), a rearrangement of emotional states subservient to associations of ideas, so to speak. The incubation periods serve the purpose of "shelving" or "forgetting" the original approach. Recentring often leads to a sudden detour from the original search path. All these implicit descriptions point to random access rather than sequential access, and to processes of pattern recognition that are often vague and fleeting rather than cut-and-dried. Essentially, it is an introspective feeling that accompanies the "snapping" action when a candidate "template" suddenly snaps into the "pattern" or when a critical piece of a jigsaw puzzle falls into place so that a long-standing impasse suddenly dissolves, akin to the attainment of the postcritical state in protein folding. Conceivably, such a "snapping" action can take place in at least two scenarios. First, in cases of picture-based reasoning, a template-pattern match is often unstable, because of the necessity to stretch either the pattern and/or candidate templates until they "snap" and a match is consummated and stabilized. This is what Einstein meant when he said that "the mentioned associative play is sufficiently established and can be reproduced at will" (Sec. 4.8). This kind of snapping action usually does not occur in rule-based reasoning. However, there may be a possible exception: rule-based pattern recognition under multiple constraints (selection criteria), i.e., the rule pattern is composed of multiple and readily separable parts. In this second scenario, a template-pattern match needs to be stabilized first before it can be evaluated for goodness of match. A sound judgment cannot be made unless all relevant constraints are held concurrently in working memory. To do so, the constraints must be loaded rapidly into working memory, otherwise some of them begin to fade while others are still being loaded. Sequential consideration of the constraints — the alternative approach — often leads to confusion or bad judgment: while some parts begin to line up in a match, other already-aligned but strained parts begin to disintegrate and pop out of alignment. In both above scenarios, once the elusive match forms precariously, the mind should be able to "lock onto" the idea — snap it into a stable configuration — and not get confused again. Matching a correct answer with a novel and subtle problem may be as difficult as landing a modern fighter jet in the middle of the night on the heaving deck of a cruising aircraft carrier in a stormy sea. Sometimes the feat cannot be accomplished with a single attempt; multiple passes are often necessary. Here, a successful catch of the
restraining cable on the carrier's deck by the jet's tailhook constitutes the metaphorical snapping action entailed in an "aha" experience. Note that the matching process in rule-based reasoning with multiple constraints is actually parallel instead of sequential in nature. It is quite conceivable that the problem solver may actually use abstract diagrams — what Miller called sensual imagery (p. 233 of Ref. 464) — in thinking, thus making parallel processing feasible. If so, ostensibly rule-based reasoning may be more appropriately regarded as picture-based, instead. This speculation is supported by Poincare's introspective account (p. 385 of Ref. 521): he stressed the importance of the feeling, the intuition of the [spatial] order of syllogisms, not just a simple juxtaposition of them (see also Sec. 4.19). Thus, in addition to suddenness, a common feature of these "aha" phenomena is parallel processing. Although one certainly can speak of insight and sudden illumination — following systematic searching — in a routine case of straightforward rule-based reasoning, the accompanying subjective experience does not even come close to what is entailed in an "aha" experience — certainly not in intensity. An "aha" experience is primarily a private introspective feeling. A consensus is therefore difficult to reach, and undoubtedly the debates will persist. This is always the problem when subjectivity must be summoned to elucidate a private experience: the verification process requires reader participation. Can objective criteria be established to detect an "aha" experience in others? Simon thought that it is possible. A discussion will resume in Sec. 4.26. Miller made an interesting point in his historical analysis of creativity: the word for intuition in German is Anschauung, which can be translated equally well as "visualization" (p. 45 of Ref. 465; see also Chapter 4 of Ref. 464). In the philosophy of Immanuel Kant, intuitions or visualizations can be abstractions of phenomena that we have actually witnessed, which Kant carefully separated from sensations (the raw information received through our sensory organs).356 Kant referred to concrete visual imagery that we can actually see as "visualizability" (German: Anschaulichkeit). In Kantian philosophy, the visual images of visualizability (Anschaulichkeit) are inferior to the images of visualization (Anschauung). Loosely speaking, the distinction between the two kinds of imagery is equivalent to that between undistorted visual imagery, faithful to our perception, and processed ("stretched") visual imagery, evoked to achieve a match during the search-and-match phase. We thus came full circle and reached where Immanuel Kant was.
For classical physics and most other scientific endeavors, visualization and visualizability can be considered synonymous. However, as Miller464 pointed out, this mold was broken by the development of quantum mechanics: in the mechanics of the atom the intuitive pictures of a Copernican miniature solar system had to be abandoned. Loss of visualization turned out to be an essential prerequisite for Heisenberg's formulation of quantum mechanics in 1925. However, there was an urgent need to get back some sort of visualization. Thus, Heisenberg embarked on a reversal of the Kantian usage of Anschauung and Anschaulichkeit: in Miller's words, the Anschauungen assumed the role of quantities associated with classical physics and thus with close links to the world of perceptions, whereas Anschaulichkeiten were promoted to ever higher realms of abstraction (Chapter 6 of Ref. 464). Essentially, Heisenberg redefined Anschaulichkeiten: now it was mathematical formalism that guided the process of pictorialization rather than the other way around, as before, and visualizability is no longer synonymous with visualization. Still, picture-based reasoning reigns in visualizability, but the cognitive element is no longer pictures or visual images of concrete objects. Rather, it is pictorialization of abstract thoughts, and the cognitively enduring feature of picture-based reasoning is parallel processing, which underlies both the old-fashioned visualization and the newly defined visualizability. Visualizability of abstract mathematical thoughts makes parallel processing of the underlying thoughts possible. A simple example is the phase diagrams commonly used in nonlinear dynamic analysis (Sec. 2.4). The mathematical entities can be viewed all at a glance, and picture-based reasoning can be invoked. Another example of visual representation of abstract thoughts is flowcharting in computer programming. Flowcharting allows a programmer to visualize rule-based processes all at a glance, thus facilitating the planning of programming strategies. As we shall see, visualizability of abstract mathematical thoughts also enhances the apparent capacity of working memory (Sec. 4.19). Having extolled the merits of intuition, we must also examine the other side of the coin. Since we have demonstrated that intuition is closely associated with picture-based reasoning, it is obvious that intuition also shares the same strengths and shortcomings: both are sources of creativity but both are error-prone. Shermer615 pitted intellect against intuition: one is cool and rational but the other is impulsive and irrational. He referred to the perennial psychological battle between intellect and intuition that was played out by the ultrarational Mr. Spock and the hyperemotional Dr.
McCoy in almost every episode of the science fiction series Star Trek, whereas Captain James T. Kirk played the near-perfect synthesis of both. Shermer cast this ideal balance as the Captain Kirk Principle: intellect is driven by intuition, and intuition is directed by intellect. Here, intuition and intellect can be identified with primary-process (picture-based) and secondary-process (rule-based) thinking, respectively. The Captain Kirk Principle can thus be paraphrased: picture-based reasoning generates new speculations but the speculations cannot be construed as valid conclusions until they are verified by means of rule-based reasoning. It is apparent that this principle is actually the reincarnation of Freud's theory of primary-process and secondary-process thinking. Thus, Shermer's interpretation of intuition is consistent with our present one. Myers, in his book Intuition,481 presented numerous examples to illustrate both the powers and perils of intuition. However, his interpretation of intuition appeared to be too loose and too broad. In Chapter 6 of his book, Myers presented nine examples to illustrate some common errors regarding physical reality, and he attributed the cause of these errors to intuition. In my opinion, some of these errors have nothing to do with picture-based reasoning, or with intuition according to our stricter interpretation. Consider his example 4, which asks for the average speed of a trip made at 60 mph going but 30 mph returning. Either rule-based reasoning — if done correctly — or picture-based reasoning would yield the same answer of 40 mph, whereas an erroneous answer of 45 mph can be obtained by blindly following a rule: the average of A and B is (A + B)/2. This latter error should be attributed to sloppy rule-based reasoning rather than faulty intuition (see Sec. 4.7 for misuses and abuses of rule-based reasoning). Practitioners of exclusively rule-based reasoning seldom pay attention to weighted averages or nonlinearity in analysis, whereas picture-based reasoning gives a better chance to circumvent these pitfalls (personal observation).
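For concreteness, the arithmetic of Myers' example can be worked out as follows (the one-way distance d is introduced here purely for illustration; the example itself specifies only the two speeds):

\[
\bar{v} \;=\; \frac{\text{total distance}}{\text{total time}}
\;=\; \frac{2d}{\dfrac{d}{60} + \dfrac{d}{30}}
\;=\; \frac{2}{\dfrac{1}{60} + \dfrac{1}{30}}
\;=\; 40 \ \text{mph},
\]

i.e., the harmonic mean of 60 and 30. The tempting arithmetic mean, (60 + 30)/2 = 45 mph, ignores the fact that more time is spent traveling at the slower speed.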
In my opinion, errors in several physics examples in Myers' book should be attributed to poor understanding of basic physics principles as a consequence of rule-based learning (Sec. 4.22). Gardner mentioned that honor students of college-level physics at several well-regarded universities did not recognize situations where inertia is at work though they apparently knew Newton's law of inertia (p. 3 of Ref. 223). Myers' discussion about perceived versus actual risks deserves a comment (p. 199 of Ref. 481). He mentioned that, mile for mile, in the latter half of the 1990s Americans were thirty-seven times more likely to die in a motor vehicle crash than on a commercial flight. He called our fears of plane crashes "skewed" [by intuition]. Superficially, he did have a point to make because the fear might have been initiated by a lack of experience (fear of unfamiliar events): many people survived minor car accidents, but few had the experience of surviving a plane crash. The fear might also have stemmed from having to relinquish control to the pilot of a commercial plane. Regardless of its source, the fear of plane crashes is undoubtedly intuition-inspired, and is mostly rooted in irrational fear. However, is intuition really as misleading in this case as suggested by Myers and his cohorts? I am afraid not. While most people could not verbally justify the fear of plane crashes when they were confronted with the seemingly objective statistical data, a little soul-searching — i.e., scanning the field of the Gestalt picture of such a fear so as to find a way to verbalize it — quickly reveals a fundamental pitfall of statistical methodology, sample heterogeneity (Sec. 4.21). I heard similar statistical data mentioned by several different individuals a number of times before, but none ever revealed how the data had been collected and analyzed. It was thus not "intuitively" unreasonable to assume that all data had been pooled together indiscriminately. Thus, the flaw or fraud of sample heterogeneity might be suspected. Had the statistician who was involved in cooking those data classified motor vehicle accidents according to locations (expressways vs. local streets, or vs. rural highways) and according to the safety records of both drivers (both careful vs. both reckless, or vs. one careful and the other reckless), the statistics might or might not have shown the same trend. Likewise, had the same statistician classified plane crash accidents according to the airlines involved (differences in maintenance records) and/or according to the countries involved (respecting vs. disregarding human rights), the statistical data might or might not have painted the same picture. Thus, the intuition-inspired fear might simply reflect a deep-seated distrust of statistics manipulators (a gut feeling not usually verbalized in the dissidents' objection), whereas those who took the statistical data at face value might be mere victims of superstition of statistical methodology. Of course, although the foregone conclusion of the statistical analysis could well be valid even after additional painstaking corrections, as suggested above or by others, sloppy statistical analyses in their present form are hardly convincing to skeptics. In summary, intuition and the accompanying "aha" phenomenon are intimately associated with the process of picture-based reasoning. Furthermore, intuition can be interpreted as the manifestation of: a) the ability to correctly prune unproductive search paths from the search tree, while correctly preserving productive search paths that are not obvious to average
people, so as to arrive at an optimal search space (heuristic searching), and b) the uncanny ability to recognize a subtle match where average people fail. In brief, having good intuition means being skillful at both searching and matching, by means of picture-based reasoning. Ironically, intuition is difficult to define or describe presumably because the involved mental process is parallel in nature. If primary-process thinking is also identified with the parallel process underlying intuition, it becomes easy to understand the reason why primary-process thinking has been regarded as irrational or non-rational; it is difficult to defend the impression of primary-process thinking with verbal arguments. Verbalization of our impression or perception of a parallel process requires a parallel-to-serial conversion. To serialize a parallel process is to discretize the continuum of an analog process, just like using a graphic scanner to convert a gray-scale picture (not line drawing) into a finite string of dots. Obviously, there are infinite ways of serializing a parallel process, at different levels of resolution. Some of them are more satisfactory than others but no finite serial process can fully capture an entire parallel process. What one can do is find a particular serialization process that is not excessively long but reasonably captures the essence of the process. Likewise, what we attempt here is to find a verbal description that captures the essence of intuition, at a sufficient level of comprehensiveness and relevance (not just detail), so as to be able to demystify the creative process and to enhance our understanding of it. Note that it is not the process of serialization (verbalization) per se that is time-consuming. Rather, it is the step of laborious searching for an adequate way of verbalization that is rate-limiting, as attested by Einstein's introspection, cited in Sec. 4.8. A case in point was a first-hand account of a survivor of the September 11, 2001, terrorist attacks of the Twin Towers of New York City's World Trade Center. This survivor described his ordeal of an hour-long journey on the way down via a staircase of one of the attacked buildings (C. Sheih, personal communication): "There was no smoke at all in the stairwell, but there was a strange peculiar smell, which I later remembered it smelling like how it does when one boards an aircraft. I later found out that this was jetfuel." His immediate awareness of the peculiar smell apparently stemmed from pattern-based reasoning (olfactory pattern recognition). The verbal awareness of the presence of jet fuel was not immediate, since his mind was not primed towards a heretofore unprecedented attack by means of an airliner turned a manned missile. The peculiar smell pattern was remembered nevertheless, despite a temporary lack of verbal meaning. Verbalization came
in two stages. First, it became associated with a location where previous experience with the same smell pattern took place. Then, more specifically, the smell pattern became associated with a particular substance, once his mind was prompted by detailed news of the terrorist attacks. Perhaps the awareness of unreportable pattern-based reasoning is what cognitive scientists meant by "preconscious" (Sec. 4.8). I further suspect that the so-called "sixth sense" is not really a novel special sensory perception as yet to be discovered by future physiologists. Rather, it is nothing but the Gestalt picture painted in terms of all five known special senses — as well as some somatic senses, such as touch or pain — in different proportions of contribution. It is pattern recognition based on the perceived holistic similarity to what one has previously experienced.
4.11. Pseudo-parallel processing
In Sec. 4.2, parallel processing and sequential processing are treated as two dichotomous aspects of information processing. It is further suggested that it is advantageous to practice parallel processing during the search-andmatch phase of problem solving. Yet, in Sec. 4.8, the search space is subdivided into a region of central (focal) attention and peripheral attention. Switching between the peripheral and the central regions requires shifting of attention. Shifting of attention implies that true parallel processing does not cover the entire search space all at once. Apparently, when the search space is sufficiently large, it cannot be loaded into working memory in its entirety. Likewise, the visual field in visual perception or visual imagery is limited to a mere 45° by 120° window. It is necessary to segment the search field: loading only a fraction of it into the region of central attention, while leaving the remainder in the peripheral part or even in long-term memory but retaining its readiness for activation (retrieval). This is analogous to using the hard disk as virtual memory, the content of which can be swapped with the content in the main random access memory (RAM). This mode of parallel processing will be referred to as pseudo-parallel processing, for lack of a better term. It is partial serialization of an otherwise parallel process. In principle, a vastly greater search field can be readily accessed and the information so obtained can be readily integrated by constant shifting, back and forth, between central and peripheral attention or by scanning various fractions of a search field that is too big to be loaded into the region of central attention in its entirety. Thus pseudo-parallel processing is similar to a common technique in microelectronics: multiplexing. If pseudo-parallel
processing can be implemented by means of multiplexing, is there really true parallel processing? This is a subtle problem because rapid multiplexing gives rise to the illusion of true parallel processing, which is the essence of time-sharing or multi-tasking in computer technology. The following consideration may shed some light on the problem. Let us consider the "dot" pattern of a digitized picture. It is difficult for a tiny insect that walks on a coarsely digitized picture and that scans each and every pixel, at a close range, to have an adequate appreciation of the entire picture. By the same token, it is difficult, if not impossible, for a human being to reconstruct the image by merely looking at a long stream of "zeros" and "ones" of serialized digital data, as in the black and white dots (or stripes) on a black-and-white television. Rather the dot pattern must be viewed as a whole in order to "get the picture." Apparently, this is a case of true parallel processing. This is precisely what Gestalt psychologists have been preaching about all along. However, when the actual viewing field is too large to fit into an individual's visual field, such as the case of viewing an IMAX (registered mark) movie, it is necessary for the viewer to turn the head and/or eyes to scan the entire viewing field. One must then mentally integrate the various strips of viewing sub-fields in order to conceive and appreciate the spectacularly wide view provided by an IMAX movie. That the mental integration is possible is exemplified by the act of seeing the entire viewing field across a picket fence by walking quickly along it, even though a high percentage of the viewing field is actually blocked by the pickets at a given moment (usually a picket is wider than the gap between two adjacent pickets). This example demonstrates the subtle difference between true parallel processing and pseudo-parallel processing. Pseudo-parallel processing can be achieved by rapid sequential scanning of various fractions of either a viewing field or a search space. In fact, the image on a single-gun television screen can be created by delivering an electron beam to scan the screen horizontally, while sequentially shifting the beam from the top to the bottom of the screen, one row at a time, and then suddenly flipping it back to the top of the screen, so as to start the entire process all over again: this process is known as a raster scan. In the latter case, however, the situation is reversed: the perception involves true parallel processing, but the presentation of the image is made in a pseudo-parallel fashion. A similar situation is also present in the auditory counterpart. Appreciation of harmony is most likely made by means of parallel processing, at least for amateurs. For relatively untrained ears, different chords can be
recognized and distinguished without being able to name the individual component notes that make up the chord, because the harmony is heard as a whole. Yet, appreciation of two concurrently running and equally attractive melodies is likely to be conducted by means of pseudo-parallel processing, with rapid shifting of attention between the two melodies. There are two mutually conflicting requirements: a) each individual melody must be appreciated as a whole in an uninterrupted sequence, and b) the two melodies must be appreciated concurrently for maximal enjoyment. Obviously, concurrent appreciation of two different melodies cannot be achieved by listening to the melodies, one at a time, in two consecutive listening sessions. Rapid and alternating sampling of the two concurrent melodies allows for the effective capture of the two melodies while preserving their individual integrity. This process is akin to the chopped mode of displaying two oscillographic traces in an old-fashioned single-gun oscilloscope. The electron beam switches rapidly between two traces, thus giving rise to the illusion of a simultaneous display of two (dotted) traces. Interestingly, separate monitoring of two concurrent melodies in two consecutive listening sessions is analogous to the other mode of display, known as the alternate mode: continuous sweeps of two separate traces, one at a time, in an alternating fashion. From the foregoing discussion, it is apparent that what rule-based reasoning accomplishes during the search phase is actually pseudo-parallel processing. The search process is serialized by scanning rules in the knowledge base, one at a time. This is the strategy used in the construction of expert systems. However, the coverage of the search space by means of pseudo-parallel searching is often limited to rules that are explicitly known and rules that can be deduced from a combination of two or more existing rules. It is difficult, if not impossible, to deduce, by means of rule-based reasoning alone, new rules that cannot be derived by means of a recombination of existing rules. In contrast, such new rules can be generated from the existing knowledge base when picture-based reasoning is invoked. In addition, practitioners of picture-based reasoning can exercise random access during the search-and-match phase, thus enabling the problem solvers to go directly to the "heart" of the problem. The above-mentioned difficulties are not insurmountable. Attempts to overcome these difficulties often require simulations of picture-based reasoning in terms of pseudo-parallel processing. A detailed description and discussion is deferred to Sec. 4.26.
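To make the multiplexing analogy concrete, the following minimal sketch (in Python, with a hypothetical two-fraction rule base invented purely for illustration) interleaves sequential scans of two fractions of a search space in a round-robin fashion; every item is still visited one at a time, yet the alternation gives the appearance of simultaneous coverage, much like the chopped mode of an oscilloscope display.

```python
from itertools import zip_longest

def pseudo_parallel_scan(*subspaces):
    """Time-slice several sub-spaces: one item from each per cycle."""
    sentinel = object()
    for time_slice in zip_longest(*subspaces, fillvalue=sentinel):
        for item in time_slice:
            if item is not sentinel:
                yield item  # each item is still examined serially

# Hypothetical fractions of a rule base (for illustration only).
rules_a = ["rule A1", "rule A2", "rule A3"]
rules_b = ["rule B1", "rule B2"]

print(list(pseudo_parallel_scan(rules_a, rules_b)))
# -> ['rule A1', 'rule B1', 'rule A2', 'rule B2', 'rule A3']
```

Replacing the round-robin interleaving with a single uninterrupted pass over each sub-space would correspond to the alternate mode described above.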
Pseudo-parallel processing is invoked in creative problem solving in yet another way. Recall that prolonged focusing during a problem-solving session may inadvertently exclude relevant fractions of the search space. Superficially, this shortcoming can be alleviated by shifting attention to other fractions of the search space when an initial approach reaches an impasse. As Bastick pointed out, focusing need not necessarily reduce the total information which may be accumulated by successively focusing on various aspects of the stimulus field (p. 251 of Ref. 52). That is, attention can be focused on separate sub-spaces, one at a time, without loss of any [itemized] information. However, as Bastick further indicated, such information would tend to be more compartmentalized, and, as a consequence, harder to integrate by means of free associations because of diminished redundancy. For example, in his prolonged effort to prove Fermat's Last Theorem, Wiles actually came across two key approaches on two separate occasions. However, only when his attention to the problem became sufficiently defocused could he effectively integrate the two approaches so as to accomplish the proof.634 In this regard, the notion of pseudo-parallel processing suggests an approach to creative problem solving. Although rapid switching of attention to various sub-spaces may be disruptive, slower scanning among various potentially relevant fractions of the search space or slow oscillations between focusing and defocusing may be beneficial, thus satisfying the conflicting requirements posited by Csikszentmihalyi: critical judgment and openness (see Sec. 4.8). Combining this insight with the role of affect in creativity, the following convenient way of inducing periodic cycles of focusing and defocusing is suggested: one may listen to music with a moderate arousal content when one attempts to solve a problem of a moderate to high degree of difficulty. Here, it is not just a practice of divided attention, as stipulated in regard to the Mozart effect. Rather, the main purpose is to induce periodic zooming-in and zooming-out of attention, which may thwart the deleterious effect of compartmentalization caused by prolonged focusing of attention. In this regard, music other than Mozart's compositions may also qualify, with the possible exception of Haydn's Surprise Symphony (No. 94 in G major) because of the unusually high arousal content of the latter's andante (second) movement.
4.12. Need of conceptualization and structured knowledge
Since the search speed for probable solutions is important, how knowledge ("database") is acquired is also important. Since the dawn of civilization, human beings have been engaged in the quest of conceptualized knowledge. Conceptualization is a way of data organization via data compression, and
an efficient way of data storage and retrieval.535 Conceptualized knowledge appears as a collection of interlocking modules, which are easier to store, faster to retrieve, faster to process, and significantly more feasible to score matches between solutions and the problem than unstructured knowledge. Conceptualization often requires a certain degree of abstraction. Concrete situations in which a particular concept is applicable are often linked by similarity or analogy either at a concrete level or at an abstract level. Analogies at a concrete level are often suggested by similarity of visual images. Analogies at an abstract level are made evident by similarity of mathematical equations governing physically different phenomena, such as heat conduction and diffusion, or by similarity of conceptual schematics — i.e., visual representations of abstract thought — representing two diverse phenomena (cf. formalization; see Sec. 6.13). Modularized knowledge in the form of rules or concepts need not occupy the center of attention during a problem-solving session, because it is not critical to rehearse the detailed knowledge underlying the rule or concept (cf. abstract but accessible concepts, active but unrehearsed knowledge in immediate memory, shown in Fig. 1.1 of Ref. 29). Modularized knowledge thus relieves or alleviates the "bottleneck," imposed by the central limited capacity, by eliminating the unnecessary competition (concept of chunks, Sec. 4.19). Aside from the ease of retrieval as a factor, modularized knowledge is retained longer perhaps because, for each time a specific fact is retrieved, related facts in the same module are also refreshed momentarily and thus become reinforced, much like refreshing a dynamic RAM. Modularized knowledge is easily recalled by free associations, thus enabling its practitioners to vastly expand their search space of candidate solutions. Conceptualization often generates a number of rules and theorems that retain their validity beyond the particular events or items from which these rules are originally generated. The appeal of these rules or theorems lies in their generality. Once established beyond reasonable doubt, these conclusions can be used in the verification phase of future thought processes, including generation of new rules and new theorems. In addition, these rules or theorems, once discovered and established, can be learned by others without having to re-discover them. However, effective learning of a concept must be accompanied by a reconstructed picture. As the motivation of students shifts towards getting good grades and as curiosity diminishes accordingly, concepts are no longer treated as knowledge that needs to be understood but rather as some useful rules only to be memorized.
This change of student cultures is the reason why a previously workable conventional lecture became problematic. If conceptualization and utilization of rules facilitate problem solving, why does exclusively rule-based reasoning turn out to be as harmful as indicated above? First, practitioners of exclusively rule-based reasoning tend to store multiple copies of the same rule in their long-term memory since the relationship between different versions of the same rule is not apparent in words ("prisoner of words" phenomenon). Second, they tend to store superficially different but fundamentally related information in separate modules instead of a single integrated module. The resulting failure to integrate related rules governing related phenomena into a coherent and modularized relational knowledge structure defeats the purpose of conceptualization: retrieval of related rules is hampered by the resulting fragmentation of knowledge (see also Sec. 4.10 for a discussion on multicategorization and redundancy of global knowledge).
4.13. Koestler's bisociation versus Medawar's hypothetico-deduction scheme
Now, let us consider Koestler's bisociation model and Medawar's objection, mentioned in Sec. 4.1. Obviously, Koestler's second matrix refers to the search space outside of the conventional range. The bisociation step — the fusion of two matrices — refers to the critical moment of illumination: the matching of an extraordinary solution-template to the problem pattern. Koestler's model has the additional advantage of showing the "snapping" action that coincides with the particular moment of recognition, following collisions of the first matrix with various candidate solution matrices ("aha" experience, Sec. 4.10). It is the equivalent of the explosive moment of "bursting into laughter" in humor (see Part One of Book One of Ref. 376). Now, consider Medawar's objection: "[J]ust how does an explanation which later proves false (as most do: and none, he admits, is proved true) give rise to just the same feelings of joy and exaltation as one which later stands up to challenge? What went wrong: didn't the matrices fuse, or were they the wrong kind of matrix, or what?" (p. 88 of Ref. 451). The objection raised by Medawar serves as a reminder of the controversy in the psychology literature regarding the accuracy of intuition. Some investigators thought that intuition was necessarily correct and thus custom-tailored the definition of intuition to suit their preconceived conclusion (see Sec. 8.1 of Ref. 52). On the other hand, Poincare pointed out
that "[I]ntuition can not give us rigor, nor even certainty." But he also said, "Pure logic could never lead us to anything but tautologies; it could create nothing new; not from it alone can any science issue" (pp. 213-215 of Ref. 521). Poincare emphasized both discovery (invention) and verification. As explained in Sec. 4.2, the match between a template and a given pattern is usually not unique, nor is bisociation. The apparent sin of Koestler's omission was his failure to mention explicitly that more than one "solution" (second) matrix can fuse with the "problem" (first) matrix; the possibility of a wrong match is not uncommon and is an inherent consequence of the analog process of pattern recognition, thus necessitating a gray scale of goodness of match. Incidentally, Koestler did allude to "false intuitions" and "inspired blunders" (p. 212 of Ref. 376). Koestler's sin was compounded by his failure to emphasize the importance of verification. However, he was apparently aware of it, since he subsequently quoted Poincare's line about the necessity of verification. As a replacement for bisociation, Medawar proposed a Hypothetico-Deductive Theory. At face value, the theory appeared innocuous and seemed perfectly in line with empirical science. However, upon closer scrutiny, Medawar was found to dismiss the existence of induction, as if everything could be derived by deduction alone. He even treated induction as "the inverse application of deductive reasoning," in an apparent effort to curb the excess of the practice of the induction method in England, for which he blamed John Stuart Mill's advocacy (see p. 135 of Ref. 451). It is of interest to note that, prior to Mill's advocacy, Isaac Newton had laid an unprecedented emphasis on deduction, through his famous disclaimer "Hypotheses non fingo (I feign no hypotheses)" (p. 181 of Ref. 431). His deduction was carried out in a spectacular way: he invented calculus (independently of Leibniz) for the specific purpose of calculating his prediction of the law of universal gravitation. Medawar agreed that Newton did "propound hypotheses in the modern sense of that word" (p. 131 of Ref. 451). Here, it must be pointed out that Newton also invoked induction. One of Newton's key ideas was his ingenious interpretation of the Moon's orbiting around the Earth as a consequence of the Moon's falling down onto the Earth, instead of flying off along the tangent of its orbit, much like the proverbial apple falling onto his head. These were but two of the many examples that Newton invoked to reach the hypothesis of universal gravitation. A hypothesis intended for universal validity, based on a finite number of observations, implied that it was reached by means of induction. Thus, Newton's induction, which he tried to de-emphasize, was just as spectacular as his deduction.
It is true that the induction involved in most modern experimental investigations can be rather trivial, and the thought process is dominated by deduction. In particular, the formulation of a working hypothesis could be laden with lengthy lines of deduction but fairly straightforward induction, thus justifying the notion of "inverse application of deductive reasoning." Apparently, Medawar's argument works well with cases in which the "search" part of the search-and-match phase is relatively easy but the "match" part requires elaborate deduction (see below for the notion of second-level deduction). On the other hand, there are cases in which it is difficult to come up with a good hypothesis (difficulty in searching) which, once proposed, is relatively easy to recognize and verify, as is often encountered in high creativity (e.g., Columbus's egg type of discovery). It is therefore not justified to ignore the induction method all together when an extensive search for an appropriate hypothesis or theory must be conducted. Medawar's misgiving can be reconciled by recognizing the nested hierarchical nature of both induction and deduction in the process of scientific investigation, as was first suggested by Donald T. Campbell (according to Wuketits733). If the formulation of a novel hypothesis or theory can be considered the first-level induction, then the embedded deductive reasoning required to bring about a match between a template (candidate theory) and the pattern (a given problem) must be regarded as the second-level deduction. Likewise, if an experimental verification is regarded as the firstlevel deduction, then the search for the best approach or technique to carry out the experiment must be treated as the second-level induction. In fact, the nested hierarchical nature was vaguely hinted at by Koestler's rebuttal to Medawar's second objection. The second objection raised by Medawar was: "The source of most joy in science lies not so much in devising an explanation as in getting the results of an experiment which upholds it" (p. 88 of Ref. 451). To this, Koestler replied: "The 'joy' in 'devising an explanation' and the satisfaction derived from its empirical confirmation enter at different stages and must not be confused" (p. 93 of Ref. 451). Thus, Koestler appeared to have understood the nested hierarchical nature of scientific investigations. Though not difficult to do, the rebuttal to Medawar's remaining objections will not be pursued here. With the benefit of hindsight, Medawar's objection is understandable. Apparently, Medawar emphasized the type of experimental research that is often required to build a database. Ironically, Medawar reached his conclusion by conducting inductive reasoning on an incomplete set of examples. In contrast, Koestler also did something similar: he considered only the type of
research that requires tremendous insight to discover but, once pointed out, requires only modest deductive reasoning to comprehend or verify, possibly because of his profession as a creative writer. He thus failed to emphasize verification. No wonder the two views clashed so violently; they paid attention to the opposite ends on the gray scale of a continuous spectrum of types of research problems: from deduction-intensive to induction-intensive. In this regard, Boltzmann's theory of statistical physics occupied the middle region on the gray scale since Boltzmann's induction and deduction were both extraordinarily elaborate. That is not to say Boltzmann's induction and deduction were easy. In retrospect, apparently few computer scientists and few cognitive scientists took Medawar's view of creativity seriously. This can be seen from the attention paid to the role of induction in problem solving311 and in artificial intelligence (see discussion about inductive programs, p. 185 of Ref. 75).
4.14. Behaviorism versus cognitivism
We shall consider the following question in regard to scientific creativity: How much of it is due to innate ability and how much can be enhanced by education and training? In order to gain some insight into this question in light of our present knowledge of biocomputing, it is necessary to address the controversy between behaviorism and cognitivism. The issue of nature-versus-nurture will be addressed in Sec. 4.24. Behaviorism dominated the thinking in the psychology of learning from the early 20th century until around 1960.702,637,699 According to behaviorism, learning is achieved by associating the correct response with a recurring stimulus by reward and/or the incorrect response with a recurring stimulus by punishment. The crucial factor is repetition or, rather, recurrence. The correct response becomes more likely to recur if the association is reinforced with continuing rewards. The learning does not require the intervention of consciousness. The process resembles Darwinism in the sense that the administering of rewards and punishments constitutes a selection pressure that serves to consolidate the learning process as habit forming. The behaviorist view of learning was not universally accepted because there are exceptions in which a single exposure to a new stimulus can lead to a new behavior. A new subject, if sufficiently impressive, can be learned in a single exposure, and can be retained almost forever because of automated periodical reviews elicited by admiration and intense pleasure of refreshing
the memory (personal observation). In addition, it is possible to come up with a solution to a novel problem without being taught. In fact, that is what a novel scientific discovery is all about. Newton's discovery of universal gravitation was not the consequence of repeated drills imposed upon him by his mentors or his contemporaries. There is compelling evidence indicating that great apes can discover novel ways of solving a specific problem without being taught first (novel problem solving).377'99 For example, an orang-utan discovered, without being taught, a way to retrieve an object which was laid outside of his or her cage and could not be reached with any of several available short sticks and hollow tubes: it was done by joining three short sticks together, by means of two hollow tubes (see, for example, Fig. 4.2 of Ref. 99). In other words, this ape knew how to "put two and two together." This mode of learning is called cognitivism (e.g., Refs. 222 and 28). The procedure of creative problem solving described earlier (Secs. 4.1 through 4.13) pertains to cognitivism. The apparent conflict between behaviorism and cognitivism has gradually subsided. Petri and Mishkin515 have proposed a two-system model, in which separate anatomical structures and pathways exist for storing (cognitive) memory and for developing (noncognitive) habits. Thus, behaviorism and cognitivism appear to complement each other, and represent two sides of the same "coin." The pathway of the cognitive memory system includes the neural circuits connecting the rhinal cortex underlying the hippocampus and amygdala to the medial thalamus and mammillary bodies and to the orbital prefrontal cortex. These structures are also connected by feedback loops. The circuits perform processing of sensory information beyond the primary reception of crude sensory information and are capable of undergoing adaptive filtering (for similarity),466 sustained activation (for working memory),217'250 and association (pairing and matching of different stimuli).473 The pathway for the noncognitive habit system is less well understood. Presently, the extrapyramidal system including the basal ganglia and cerebellum is implicated in the habit-forming system. It appears that the controversy between behaviorists and cognitivists has been put to rest, and research can now be concentrated on elucidating the neural circuitry and the interaction between the two systems.
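In computational terms, the habit-forming side of this two-system picture can be caricatured by the behaviorist rule stated at the beginning of this subsection: a rewarded response to a recurring stimulus becomes more probable, a punished one less so, and recurrence does the rest. The following Python fragment is offered only as such a caricature; every name in it is hypothetical, and it is not meant to model the neural circuitry just described.

    # A caricature of behaviorist habit formation: association strengths are
    # nudged up by reward and down by punishment; repetition does the rest.
    # All names are hypothetical; this is an illustration, not a model.
    import random

    def choose(strengths):
        """Pick a response with probability proportional to its strength."""
        total = sum(strengths.values())
        r = random.uniform(0, total)
        for response, s in strengths.items():
            r -= s
            if r <= 0:
                return response
        return response  # fallback for rounding at the boundary

    def reinforce(strengths, response, reward, rate=0.2):
        """Strengthen (reward > 0) or weaken (reward < 0) an association."""
        strengths[response] = max(0.01, strengths[response] + rate * reward)

    # Over many recurrences the rewarded response comes to dominate.
    strengths = {"press_lever": 1.0, "groom": 1.0, "wander": 1.0}
    for _ in range(200):
        response = choose(strengths)
        reinforce(strengths, response, reward=1.0 if response == "press_lever" else -0.1)

Nothing in this loop ever consults an internal model of the problem; that omission is precisely what separates habit formation from the cognitive problem solving discussed in Secs. 4.1 through 4.13.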
4.15. Cerebral lateralization
Cerebral lateralization is an important feature for cognitive problem solving in humans. In its earlier versions, the concept of cerebral lateralization stipulated that the left hemisphere specializes in linguistic function and the right hemisphere specializes in visuo-spatial and nonverbal cognition.230'646'83'295 This concept has undergone drastic revisions over the past few decades. Generalizations to include other cognitive functions lead to the following interpretation. The left hemisphere specializes in analytic cognition: its function is therefore algorithmic in nature (sequential processing). In contrast, the right hemisphere specializes in the perception of holistic and synthetic relations: its function stresses Gestalt synthesis and pattern recognition (parallel processing). In other words, the left hemisphere specializes in rule-based reasoning, whereas the right hemisphere specializes in picture-based or, rather, pattern-based reasoning. A new interpretation of hemispheric specialization advanced by Goldberg and coworkers,247'244'246'245 called the novelty-routinization approach, encompasses the above categorizations. According to these investigators, the right hemisphere is critical for the exploratory processing of novel cognitive situations. The left hemisphere is critical for information processing based on preexisting representations and routinized cognitive strategies. The traditional verbal/nonverbal dichotomy of lateralization thus becomes a special case. The novelty-routinization hypothesis postulates the dynamic nature of hemispherical lateralization. Thus, at early stages of learning a specific task, performance relies on the right hemisphere. The left hemisphere subsequently takes over. Evidence suggests that the right hemisphere plays a role at early stages of language acquisition but the role diminishes with age. The common observation of increasing difficulty in acquiring a new language with advancing age is consistent with the novelty-routinization hypothesis. An interesting example is music perception, which has long been considered to be a right hemispheric function. Behavioral experiments indicate that there is a strategic difference between trained musicians and novices. Novices tend to perceive a melody as a whole and show a left-ear preference (using the right hemisphere). Trained musicians tend to perceive a melody as a combination of previously known "subunits" (patterns or modules); it is a practice of routinization and left hemisphere function.64 Thus, experience can alter hemispheric asymmetry. Mixing of two drasti-
cally different populations in a behavioral experiment is likely to obscure an existing asymmetry due to sample heterogeneity (cf. Sec. 4.21). Similar processes of routinization have also been observed in the generation of visual mental imagery.387 Kosslyn found that the act of generating a visual mental image involves at least two distinct classes of processes: one that activates stored shapes and one that uses stored spatial relations to arrange shapes into an image. The left hemisphere is better at arranging shapes when categorical information (specified by language-like information) is appropriate, whereas the right hemisphere is better when information regarding coordinates (specified by pattern-like information) is necessary. Thus, modularization (stored visual shapes or tonal patterns, or perceptual units in general) is presumably necessitated by the need to relieve the central limited capacity of consciousness (see Sec. 4.8). However, as Kosslyn pointed out, modularization may be a necessity to enhance the feasibility of recognition when the entire object to be recognized, such as a horse, can assume many different shapes but its parts, such as a horse's characteristic long face, preserve a higher degree of shape-invariance. The specialization of the left and the right hemispheres in handling shapes under two contrasting situations is again consistent with the specialization of the two hemispheres in rule-based and pattern-based processes, respectively. The concurrent use of both hemispheres for different aspects of the same task partly explains the apparent lack of hemispheric asymmetry during a creative process (see below). In addition to cerebral lateralization, there is also specialization of the pre-frontal cortex and the posterior association cortex. The pre-frontal cortex is important in meeting the challenge of a novel task, and in developing novel strategies that do not preexist in the cognitive repertoire. On the other hand, the posterior association cortices are thought to be the repositories of cognitive routines and preexisting cognitive representations. Together with the scheme of cerebral lateralization, it appears that the right pre-frontal cortex is the locus of novel cognitive function and the left posterior cortex is the cognitive routinization center. Goldberg and Costa244 have also concluded that the left hemisphere is singularly important for the rule-based internal organization of cognitive routines. The novelty-routinization hypothesis is particularly relevant to the foregoing discussion about creative problem solving. The four phases of Wallas' model of creative problem solving thus reflect the alternating activation of the left and right hemispheres for creative problem solving: the preparation and the verification phases rely more on left hemisphere
function, whereas the incubation and the illumination phases depend more on the right hemisphere. This identification is consistent with Dwyer's identification of the right hemisphere for parallel processing and the left hemisphere for serial (sequential) processing (1975 Ph.D. Thesis cited in Sec. 5.2 of Ref. 52). Earlier, based on experiments designed to detect differences of hemispheric processing on "nameable" and "unnameable" objects, Cohen132 also suggested that the verbal left hemisphere operates as a serial processor and the nonverbal right hemisphere as a parallel processor. The idea was not taken seriously by other cognitive scientists, presumably because of the subsequent failure of other investigators' attempt to replicate Cohen's observation (pp. 60-61 of Ref. 83). Samples' identifications are also useful: the left hemisphere is logical, rational and digital, whereas the right hemisphere is intuitive, metaphorical and analogical.586 The novelty-routinization hypothesis stipulates that concept forming (conceptualization) starts with right hemispheric perception, and the result of Gestalt perception is then passed on to the left hemisphere for consolidation as concepts or rules. Perhaps this is the reason why eminent physicists tend to emphasize picture-based reasoning since a conceptual breakthrough often demands initial keen perception of faint clues and subsequent generation of new concepts (Sec. 4.6). In contrast, such demands are apparently exempted from science students, who are primarily involved deductive verification of the acquired knowledge (essentially a process of "reverse" discovery). In other words, having inspiration to re-discover a concept may be beneficial to a student but is not required in learning, much less in attaining high grades. The demand to master domain-specific knowledge outweighs the necessity to comprehend newly acquired (memorized) knowledge, especially in biomedical disciplines. This explains why, under the assault of information explosion, biomedical students lose the ability to think long before students of other disciplines (anecdotal observations; see also Sec. 4.23). The hypothesis also suggests why creativity declines with advancing age especially in areas where conceptual breakthrough is required. In this regard, an additional psychosocial factor may also be involved; an authority might have invested too much time and effort in an increasingly irrelevant "pet theory" to give it up (see, for example, pp. 42-43 of Ref. 286). It has long been suggested that creativity is located in the right hemisphere.190'296 Samples586 thought that the intuitive holistic functions reside in the right hemisphere (see also Ref. 500 and p. 188 of Ref. 52). Case studies have also associated special talents with pathological conditions of hemispheric asymmetry, such as dyslexia256'257 and idiot
savants.673'336'468'258 Again, these studies suggested an important role of the right hemisphere in creativity. The actual situation is more complex than that. Although there appears to be some privileged role of the right hemisphere in creativity, behavioral experiments have yielded mostly ambiguous results.86'361 Noppe494 thought that there is little evidence to conclude that the unconscious — or primary-process thinking — is attributed to right hemisphere functioning, whereas the conscious — or secondary-process thinking — is relegated to the left hemisphere. The prevailing view of cognitive scientists is: cooperation between the right and left hemispheres is characteristic of the brains of creative thinkers (see Chapter 11 of Ref. 646). However, Martindale443 held a different view: "creative people rely more on the right hemisphere than on the left only during the creative process and not in general." On the basis of split-brain research, Bogen and Bogen77 also held the dissident view. The split-brain experiments demonstrated that the two hemispheres can function independently and simultaneously in parallel. Bogen and Bogen surmised: if learning can proceed simultaneously, independently and differently in each hemisphere, so may problem solving. They pointed out that this prospect makes behavior less predictable, less stimulus-bound and, therefore, more flexible, but also contributes to a concomitant decrease in stability. Apparently, the gain in flexibility more than made up for the loss in stability. Here, the readers are reminded that this balance between stability and flexibility seems to prevail at various hierarchical levels of biocomputing, from the molecular level all the way up to social systems (cf. Chapter 1). Bogen and Bogen believed that there is a certain degree of hemispheric independence, such that the interhemispheric exchange of information via the corpus callosum is much of the time incomplete. They thought that this hemispheric independence plays an important role in creativity (p. 574 of Ref. 77): "A partial (and transiently reversible) hemispheric independence during which lateralized cognition can occur and is responsible for the dissociation of preparation from incubation. A momentary suspension of this partial independence could account for the illumination that precedes subsequent deliberate verification." Here, Bogen and Bogen made references to Wallas' four phases of creative problem solving. Thus, the role of the two hemispheres in auditory and visual cognition may be generalized to scientific problem solving. The task specialization inherent in the process of cerebral lateralization has an important bearing on machine intelligence. Grossberg265 analyzed neural network computation in terms of its stability and plasticity.
He suggested that the stability and plasticity subsystems must be distinct and separate in order to enhance the computational efficiency of a neural net (cf. Conrad's Trade-off principle;136 see also Sec. 9). Taking Grossberg's analysis seriously and generalizing Kosslyn's analysis of visual pattern recognition, we can now better appreciate why the language skill acquired at a young age by way of the right hemisphere is transferred to the left hemisphere for routinization; the grammatical skill and a rich vocabulary must be consolidated in the left hemisphere for stability by attaining second-nature status. Furthermore, the following speculations can be made. Although maintenance of the language skill appears to be a left hemispheric function, composing a written article and formulating a speech are actually tasks of the right hemisphere because the latter activities involve recombinations of elements mastered by the left hemisphere and, therefore, require flexibility and creativity. The same probably can be said about the training of composers in basic musical skills such as music notation and music theory (harmony and counterpoint). The right hemisphere can then concentrate on the flexible task of creating novel musical compositions. For the same reason, perhaps it would be a terrible mistake not to master the multiplication table and to learn long-hand calculations at a young age so as to elevate these skills to second-nature status and so as to allow the right hemisphere to carry out creative mathematical or scientific acts. The advent of hand-held calculators has succeeded in fostering a generation of students, many of whom are not merely mathematics-illiterate but simply have no intuitive feeling about numbers (personal observation). If the above speculations turn out to be valid, then it is no wonder that experts failed to find a correlation between high creativity and right hemispheric preference. Creativity involves rearrangements of knowledge modules: tasks performed primarily by the right hemisphere. However, access to these knowledge modules involves the left hemisphere, and fetching of these modules necessitates exchanges of information between the two hemispheres via the corpus callosum. Merely examining the frequency of activities in the two hemispheres is unlikely to elucidate the role of the two hemispheres in creativity. Furthermore, as indicated in Sec. 4.13, complex problem-solving processes may consist of several search-and-match and verification sub-phases nested in a hierarchical fashion, as characterized by the existence of second-level induction within the first-level deduction and vice versa. The processes cannot be neatly segregated into a time sequence of searching, matching and verifying in an orderly fashion. The activities in the two hemispheres may alternate at such a fast rate that any hemispheric
asymmetry becomes obscured in a static behavioral experiment, by virtue of the inherent time-averaging effect. Future high-speed real-time imaging of the brain may shed light on this issue. In addition, sample heterogeneity, due to inadvertent mixing of individuals with a preference for rule-based reasoning and individuals without such a preference, may further contribute to the ambiguity of a behavioral experiment (see Sec. 4.21 for further discussion). The participation of the right hemisphere in ostensibly left hemispheric functions such as language acquisition also explains why learning a foreign language by exclusively rule-based learning and memorization often fails miserably. The shades of semantic meaning of a foreign word can only be learned with a sentence, a paragraph, or even an entire article as a whole by way of the right hemisphere. In other words, a foreign language must be learned by means of both intellect and intuition. Perhaps learning a foreign word must also include capturing the essence of what transpires in the ambience and the background at the moment when a word is uttered, not to mention the intonation and the accompanying body language.
4.16. Innovation versus imitation: gray scale of creativity
Implicitly, in the above discussion of cerebral lateralization, novelty is associated with cognitivism, whereas routine is associated with habituation. In everyday usage, however, one tends to associate novelty with creativity while regarding a routine act as non-creative. One also tends to judge novelty in terms of the degree of difficulty of the task, thus associating novelty with the acquisition and utilization of exotic domain-specific knowledge. In creativity research, the terms "novelty" and "routine" are to be replaced with the less ambiguous terms "innovation" and "imitation," respectively. The discovery of a novel (heretofore unknown) way of solving a problem is an innovation. However, a given act is innovative only upon its first occurrence. Subsequent imitation of the same act quickly dwindles to the level of routine. Ethologists have reported both innovation and imitation in nonhuman primates. An inventive female monkey apparently discovered a new technique of washing sand off food (cited in Refs. 101 and 99). Fellow monkeys immediately copied the technique. On the other hand, novel problem solving by great apes, based on insight instead of learned "cookbook recipes," has been extensively described by Kohler377 (see also Ref. 99). The distinction between innovation and imitation is not a clear-cut dichotomy. Most innovations contain elements of imitation. Even Isaac Newton's
monumental discovery of universal gravitation was built on top of the accomplishments of his predecessors, Copernicus, Kepler, Galileo, etc. This was why Newton said to Robert Hooke, "If I have seen further (than you and Descartes) it is by standing upon the shoulders of Giants" (p. 281 of Ref. 50 and pp. 1-9 of Ref. 457). Most imitations also contain minor enhancements. Even a "copycat" act may not be strict imitation of an existing innovation. The subtlety was reflected in a court ruling — Festo vs. SMC (November 29, 2000) — on a lawsuit regarding patent infringement:655 Should a patent be issued on the basis of a slight modification of a previously existing patent? Obviously, there exists a gray scale of innovation or creativity. Solving a problem with a greater degree of difficulty requires a greater departure from conventional wisdom or, in Koestler's terminology, a greater leap into a matrix located in an extraordinary dimension. In the history of science, Kuhn defined two types of innovations: "normal science" and "paradigm shift".399 Innovations in normal science were built on well-established disciplines, whereas a major "paradigm shift" usually involved a conceptual breakthrough made possible by an excursion into a forbidden zone of the search space that had been excluded by the authority or establishment, or by conventional wisdom. For example, Newtonian mechanics was a major paradigm shift, whereas fluid mechanics must be regarded as normal science because the latter involved refinements of Newtonian mechanics as applied to continuous media. Relativity theory and quantum mechanics were well-known major paradigm shifts because the very foundation of Newtonian mechanics had to be uprooted. Statistical mechanics represented an interesting case of innovation. Ludwig Boltzmann insisted that statistical mechanics was a continuation of Newtonian mechanics, even though contradictions between the two disciplines had been pointed out. In hindsight, it can be shown that statistical mechanics and Newtonian mechanics have irreconcilable conflicts (see analysis in Sec. 5.13). Thus, statistical mechanics was a far greater paradigm shift than Boltzmann himself had envisioned. In contrast, evidence-based medicine may not be a paradigm shift as its advocates have claimed.197 It can be shown that it was a product of blind faith in current AI achievements and of misunderstanding of how the human brain processes information (Sec. 6.13). It is obvious that the making of a paradigm shift requires far greater creativity than an innovation in normal science. Part of the confusion in creativity research stemmed from a neglect of the gray scale of creativity. Some investigators did not distinguish creativity in normal science from
creativity of paradigm-shift dimensions. Consequently, the underlying factors of high creativity were obscured by sample heterogeneity (Sec. 4.21). The difference between innovation and imitation (or between novelty and routine), which occupy the two opposite ends of the gray scale, must be evaluated in terms of the innovation/imitation ratio and weighed on the scale of the degree of difficulty. Thus, the gray scale of creativity extends from pure imitation to nearly pure innovation. In extreme cases, the most subtle form of imitation may become indistinguishable from high creativity: an interesting example is the innovation described in the movie The Dambusters, cited in Sec. 4.7. Social values also enter the deliberation. Thus, the creativity of a criminal is seldom glorified. That innovations in normal science and innovations of a "paradigm-shift" magnitude belong to two different hierarchical levels can be readily visualized by invoking the analogy of protein folding. According to a contemporary theory of protein folding, a small subset of the amino acid residues of a protein forms a folding nucleus, which critically guides the rest of the residues to fold properly (Sec. 7.6 of Chapter 1). Whereas normal science is built on top of a well-established discipline, numerous viable mutations are based on a successful folding nucleus. A successful folding nucleus tends to be evolutionarily more conserved than the remaining amino acid residues of the same protein; a folding nucleus tends to evolve more slowly than the remaining parts of the same protein. This analogy suggests that conducting research in normal science is tantamount to heuristic searching for fruitful results, whereas paradigm shifts are mostly the consequence of explorations for alternative and/or better approaches. This analogy also suggests why it is more difficult and takes longer to generate a paradigm shift than to produce creative works within the framework of a well-established discipline. At least in the early stage, a new paradigm does not have the competitive edge to attract followers because the latter are simply outnumbered by the followers of normal science; here, the competition is for both funding and attention. On the other hand, overpopulation of practitioners of normal science soon leads to the side effect of the law of "diminishing returns," thus ushering in the emergence of a new discipline. A newly emerging but successful discipline of paradigm-shift nature soon leads to a feeding frenzy once its merit is recognized, thus starting the cycle of normal science/paradigm shift all over again. The above discussion does not imply that imitating is not mentally taxing at all. For more sophisticated acts that require a great deal of know-how and skills (domain-specific knowledge), the capability to learn "canned
recipes" itself requires some mental agility and novelty in thinking even though it is not as demanding as the requirement to solve a fresh problem de novo. On the other hand, the acquisition of domain-specific knowledge expands the search space and thus enhances the likelihood of solving a novel problem. For example, mastering Riemannian geometry was a prerequisite for Einstein to formulate general relativity. Likewise, solving some difficult arithmetic word problems may require a great deal of insight but the knowledge of algebra reduces the task to a simple routine of rule-based reasoning; the required insight had been incorporated into the creation of algebraic rules. However, the possession of domain-specific knowledge alone does not guarantee creativity (see Sec. 4.22 for a detailed discussion). This statement should not be construed as an attempt to trivialize domain-specific knowledge in general. Few would dispute the importance of the contribution of the Human Genome Project to human's knowledge and health. Nevertheless, it should be noted that the acquisition of such important domain-specific knowledge was made possible by innovative research in molecular biology that laid the foundation of crucial principles as well as necessary techniques. However, no claim of creativity should be laid on merely practicing the application of this kind of exotic domain-specific knowledge unless significant innovations have been added on.
4.17. Elements of anticipation and notion of planning ahead
In the discussion of intelligent materials in Sec. 7.3 of Chapter 1, it was pointed out that the function and capability of intelligent materials are suggestive of an element of anticipation (cf. final causes, Sec. 6.13). The notion of anticipation implies the act of consciously planning ahead. Of course, intelligent molecules do not actually plan ahead and do not have consciousness, at least not what is known as reflective consciousness. The virtual intelligence is the manifestation of molecular versatility that was selected by evolution, and that happens to exceed what we expect from a conventional material. In creative problem solving, the element of anticipation manifests in the act of planning ahead. The means-end analysis enunciated by Newell and Simon490 is part of the act of planning ahead. Calvin101 speculated that the evolutionary origin of intelligence was the development and specialization of a core facility that was common to language, music, and dance: the planning of a meaningful sequence of muscle contractions. Predicting the
trajectory of weapons by prehistoric hunters also required this core facility. Planning ahead is not a simple stimulus-response type of behavior that is attributable to behaviorism; it is a cognitive act. Furthermore, planning ahead requires calculations of future outcomes. It also requires thinking about what other individuals (of the same species and with comparable intelligence) would think and do. A professional chess player often calculates the anticipated moves of the opponent that are nested many layers deep, e.g., a counter-move to a potential counter-move to a previous move (see IBM Deep Blue vs. Kasparov in Sec. 5.18). Studies of planning ahead by inclusion of other intelligent beings' responses culminated in the development of game theory.691'446'27'207 The following fictionalized story from a historical epic, called Three Kingdoms, about three rival kingdoms located in what is now China (ca. A.D. 200), illustrates the basic principle of game theory (Chapter 50 of Ref. 427). General Tsao Tsao (also spelled Cao Cao), the commander of Han forces and the founder of the Kingdom of Wei, was evading an enemy pursuit. He had two available retreat routes to choose from: a big flat thoroughfare, and a short-cut that included a treacherous mountain pass, called Huarong Trail. The advance team on the lookout reported back that they detected smoke from several bonfires along the Trail, but nothing unusual on the big road. General Tsao picked the trail, but his staff questioned the wisdom of his decision, citing the well-known rule that a bonfire is a telltale sign of the enemy's presence. Following an alternative military dictum, "fake implies truth, and vice versa," General Tsao reasoned differently: "Kongming [the military strategist of the enemy Kingdom of Shu] is a very clever man with a full bag of tricks. He must have sent someone to build the bonfires deliberately, so as to trick me and lure me into the big road. I would rather not fall into this trap." Negotiating the treacherous pass resulted in a great many casualties; the surviving soldiers and horses were thoroughly exhausted. Having just made it through the pass, Tsao was somewhat relieved and began to ridicule Kongming for not having the foresight to stage an ambush right there, so as to bottle up the entire retreating force. No sooner had he regained his composure from the euphoria than he heard the thunderclap of a cannon being fired. Five hundred elite swordsmen, headed by General Guan Yunchang of the Kingdom of Shu, flanked the trail. They had been waiting there for the arrival of General Tsao's battered and exhausted troops. Apparently, the effectiveness of Kongming's deceptive tactic critically depended on his accurate assessment of General Tsao's intelligence: an overestimate would have been just as bad a miscalculation as an underestimate.
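The nested look-ahead just described — a counter-move to a potential counter-move, carried several layers deep, with the other side assumed to be planning as well — is the computation that game-tree search makes explicit. The sketch below is a minimal two-player minimax search in Python, offered purely as an illustration of such nested anticipation rather than as a model of how Kongming or a chess master actually reasons; the state interface (legal_moves, apply, is_terminal, payoff) is hypothetical.

    # Minimal sketch of nested look-ahead (minimax), assuming a game whose
    # states expose legal_moves(), apply(move), is_terminal() and a payoff
    # from the planner's point of view. Hypothetical interface, for illustration.

    def minimax(state, depth, planner_to_move):
        """Best achievable payoff for the planner, looking 'depth' plies
        ahead and assuming the opponent also plans ahead."""
        if depth == 0 or state.is_terminal():
            return state.payoff()          # static evaluation of the position
        outcomes = [
            minimax(state.apply(m), depth - 1, not planner_to_move)
            for m in state.legal_moves()
        ]
        # The planner picks the best outcome; the opponent, the worst (for us).
        return max(outcomes) if planner_to_move else min(outcomes)

    def best_move(state, depth=3):
        """Choose the move whose worst-case continuation is most favourable."""
        return max(
            state.legal_moves(),
            key=lambda m: minimax(state.apply(m), depth - 1, planner_to_move=False),
        )

The depth parameter is the number of layers of anticipated counter-moves held in mind at once; misjudging how deeply the other side searches — the gamble on which Kongming's ruse turned — is what makes an overestimate as costly as an underestimate.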
How does the human brain actively model the external world for the purpose of planning ahead? Korner and Matsumoto381'382 analyzed anatomical and physiological evidence of brain function and proposed a scheme of how self-referential control of knowledge acquisition in the human brain works. They pointed out a major difference between the human brain and a conventional computer: each perceptual categorization and each sensory-motor coordination links "hardware" components of the brain in a new way, whereas a conventional computer adapts within fixed and pre-designed hardware constraints. A significant insight can be gained by looking into the order of development of the hierarchical control phylogenetically as well as ontogenetically. The archicortex (limbic system), which is the oldest phylogenetically, provides the genetically "hard-wired" intrinsic value system (a priori knowledge as expressed in the emotional system). The value system provides the constraints to be exerted on the subsequent development of the paleocortex (temporal lobe) and neocortex (cf. top-down constraint in evolution, Sec. 3.4). Functionally, the archicortex provides a rapid but coarse evaluation of input sensory information on the basis of the existing knowledge base (element of anticipation). Rapid deployment of this crude "initial hypothesis" as a matching template allows for the evaluation of prediction errors — i.e., degree of mismatch — by comparison of the internal world model and the sensory reality. The prediction is based on the similarity between the presently encountered situation and a previous one. The ensuing prediction errors reflect the subtle, or not so subtle, differences between the new sensory input and the previous one that was used to generate the "template." The prediction errors are taken into consideration when new strategies are subsequently plotted: modification of the initial prediction to better match the new situation. If these prediction errors cannot be eliminated by enlisting an alternative prediction from the existing repertoire, the brain must find a novel prediction by further exploration. This novel prediction is not made by re-learning the entire new situation but instead by learning or devising the "difference" only and by adding "corrections" to the initial prediction. Thus, new knowledge is integrated into the relational framework of existing knowledge. This scheme is consistent with the observation that our learning often involves assimilation of newly acquired knowledge into the cognitive structures that already exist in the mind.450 This is also the reason why teaching in terms of analogy is effective in promoting understanding of new phenomena. The procedure of error correction can be repeated, if necessary, in an iterative loop. Thus, by "iterative tuning" (successive approximations), the brain manages to refine the output so as to minimize or even eliminate the prediction error.
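The "iterative tuning" just described — deploy a coarse template, measure the mismatch, learn only the difference — can be restated as a successive-approximation loop. The fragment below is a deliberately crude sketch of that idea, with the internal model and the sensory input reduced to numeric feature vectors; all names are hypothetical, and it is not the algorithm of Korner and Matsumoto.

    # Schematic sketch of prediction-error-driven refinement, assuming the
    # internal model and the sensory input can be reduced to simple numeric
    # feature vectors. Hypothetical names; an illustration only.

    def refine_prediction(initial_template, sensory_input, gain=0.5,
                          tolerance=1e-3, max_cycles=50):
        """Successively correct a coarse initial hypothesis until the
        prediction error is negligible (or a cycle limit is reached)."""
        prediction = list(initial_template)
        for _ in range(max_cycles):
            # Prediction error: the mismatch between model and reality.
            errors = [s - p for p, s in zip(prediction, sensory_input)]
            if max(abs(e) for e in errors) < tolerance:
                break                      # the model now matches the input
            # Learn only the "difference": add a correction, keep the rest.
            prediction = [p + gain * e for p, e in zip(prediction, errors)]
        return prediction

    # Example: a coarse template converges toward the actual input.
    refined = refine_prediction([0.0, 0.0, 0.0], [2.0, -1.0, 4.0])

The point of the caricature is that the existing model is never discarded wholesale: each cycle stores only a correction to what is already there, which is the sense in which new knowledge is integrated into the relational framework of existing knowledge.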
Achieving a rapid but coarse fit, by means of an "initial hypothesis," constitutes an act of heuristic searching. Using visual pattern processing as an example, Korner and Matsumoto illustrated how value-based initial behavioral patterns are fed to the inferotemporal cortex to set the coarse semantic metric. This coarse metric, while interacting with the detailed input from the primary visual cortex via refined and filtered interactions, allows for the decomposition of the input pattern in terms of the coarse semantic metric set forward. Korner and Matsumoto further proposed that the neocortical columnar structure is the site of the elementary processing nodes for evaluating the prediction error. Rao and Ballard552 explored the idea that coding of prediction errors may contribute importantly to the functions of the primary and higher-order visual cortex. They also considered the role of the hypercomplex neurons of the visual cortex in the coding of prediction errors. The general topic of neuronal coding of prediction errors has been reviewed by Schultz and Dickinson.600 The above scheme indicates that cognition is a combined top-down and bottom-up process. Ample evidence supports the notion of top-down control exerted by the brain on the bottom-up process of forward sensory transmission. The executive control of working memory and the notion of selective attention are top-down processes (see Sec. 4.8). Pribram537 pointed out that, although he and coworkers had spent almost half a century amassing data in support of a combined top-down and bottom-up process in perception, many texts in neurophysiology, psychology, and perception were still obsessed with the view of a pure bottom-up, reflex-arc, stimulus-response process (behaviorism). Kuhl and coworkers392'393 have studied voice recognition by subjects of different ethnicities, in particular, the variations of vowel sounds. Using computer-generated sound spectra that varied continuously, these investigators found that discrimination of sounds was based on available vowel templates acquired in infancy and adolescence. This observation explains the increasing difficulty of acquiring an accent-free foreign language with advancing age. This is of course the manifestation of the above-mentioned top-down control that is equipped with a limited repertoire of sound templates. Metaphorically, dogmatism is also a manifestation of top-down control
that provides a paucity of available "idea"-templates for comprehension (self-imposed restriction on exploration; see Sec. 4.4). Practitioners of dichotomous "black-or-white" thinking are particularly prone to misunderstanding and difficulty in comprehension. This is because they have only two diametrically opposite templates for each type of problem. When a gray scale is called for, they habitually force-fit what they have perceived to either template, thus resulting in misunderstanding. In response to an intermediate perceived pattern that fits neither of their only pair of templates, they simply have to give up fitting, thus resulting in an inability to comprehend. The scheme of Korner and Matsumoto is reminiscent of the two-step strategy of molecular recognition involved in the activation of PKC: the initial step is a coarse "homing" to the target, and the subsequent step refines the recognition (Sec. 5.5 of Chapter 1). Both mechanisms evade the tedium of random searching and the accompanying combinatorial explosion (by virtue of a top-down control) but allow for sufficient exploration to avoid excessive restrictions on the search space (by virtue of a bottom-up exploratory process). It is also reminiscent of the affinity model of molecular recognition shown in Fig. 1 (see also examples of molecular recognition given in Sec. 6 of Chapter 1). The brain continues to come up with templates of higher and higher "affinity," as new information continues to be analyzed and updated. It is also of interest to note that top-down constraints are instrumental in speeding up protein folding processes (Sec. 7.6 of Chapter 1).
4.18. Intelligence of nonhuman animals: planning ahead, versatility and language capability
If brain function follows the steps of phylogeny, then it is natural to ask: Do nonhuman animals plan ahead? Do they have anticipation? For humans, planning ahead is a conscious act about which we all have subjective feelings via introspection. We do not doubt that other individuals also have the capability of planning ahead, on the basis of their verbal reports; it would be far too egocentric to deny others' capabilities of planning ahead. However, in dealing with nonhuman animals, the question is somewhat problematic because we are forced to adopt a completely different set of criteria and to resort to the observer's own subjective judgment of the animal's objective behaviors. Descartes' view that nonhuman animals lack the ability to think was
gradually replaced by the view that animals do think but think differently than we do. Ethologists reported that some nonhuman animals, including rats, exhibit surprise when they encounter an unexpected scenario, appear to be able to plan ahead, and tend to think in "if, then" terms and simple abstract rules (p. 131 and p. 134 of Ref. 264). Some animals also have rudimentary numerical abilities.284'285 Experimental evidence also indicates that some nonhuman animals rely heavily on retrospective and prospective processing in working memory for planning appropriate responses to stimuli from a complex and ever-changing environment.331 Franklin even suggested that nonhuman animals use probabilistic inferences to assess risks (Sec. 4.8). With no obvious language-like capability, these animals may well use picture-based reasoning in risk assessments. Ethological research on great apes has produced convincing evidence demonstrating that great apes can make inferences with regard to how fellow apes will respond to what they intend to do.377'99'162 A chimpanzee named Sarah could perform analogical reasoning,495 as did a gorilla named Koko (p. 143 of Ref. 507). Although many of us are willing to grant higher nonhuman animals some degree of cognitivism, we are less inclined to do so for lower animals. The distinction between behaviorism and cognitivism is not always clear-cut, primarily because it requires an investigator's subjective judgment of a collection of attributes based on our expectation. Among all nonhuman animal behavioral patterns, elaborate deception schemes are often cited as objective evidence that some nonhuman animals have consciousness and do plan ahead (e.g., see Ref. 99 and Chapter 11 of Ref. 264). However, what constitutes a deception is of course a matter of subjective judgment; misunderstanding regarding deception between even two human beings is not uncommon. Consider the behavior of a species of spider called Portia fimbriata. A Portia spider stalks and hunts its prey apparently by deception (aggressive mimicry and cryptic stalking). It does not look like a typical spider but rather like a piece of detritus. It knows how to pluck at threads of the web of other species of spiders so as to fool the web's owner into believing that the vibration is caused by the desperate struggle of an unfortunate prey (mosquito).343'344'345'717 A Portia spider can also mimic the vibration generated by the mating ritual of a male Euryattus spider so as to lure an unsuspecting female out of hiding to within a convenient striking distance. The low-lying status of arthropods on the evolutionary scale makes it tempting to dismiss this elaborate behavior as genetically preprogrammed and acquired through evolution to fit a convenient ecological niche for survival (animal instinct).
However, it turns out that Portia fimbriata can actually try out different modes of plucking the web, with its versatile eight legs and two palps, and then lock onto the mode that actually works to its advantage. A Portia spider has sharp eyesight, which allows it to discriminate different types of prey and to decide whether to deploy cryptic stalking or just ordinary stalking.280 A Portia spider also knows to pick only on prey of its own size. Although it routinely sends the luring signal to a male of Nephila maculata, it remains "radio silent" in the presence of a female that is huge compared to its own size. Furthermore, a Portia spider seems to know how to plan ahead. For example, after an initial attack on an intended victim fails, a Portia spider retreats and finds an alternative route, such as climbing up a vine that extends above the target web, dropping a silk line of its own and then swinging towards the target. A Portia spider has a brain the size of a pinhead, and has eyes with at most 100 photoreceptors.345 The mind-boggling question is: How does Portia manage to achieve so much with so little? Still, behavioral versatility is more intimately related to how the neurons are wired together than to size or numbers alone. Obviously, more work is needed to demystify the complex behavior of Portia. Such elaborate behaviors of lower animals suggest that there is a conscious cognitive element and an act of planning ahead. However, in view of the low-lying status of arthropods, the evidence may not be sufficiently convincing. As one proceeds downward on the evolutionary scale, it becomes harder and harder to differentiate a scheme that is implemented with foresight and consciousness from one that is a manifestation of genetically wired animal instinct. In reality, animal instinct and conscious cognitive behaviors are intricately interwoven, as is evident in numerous studies of avian orientation and navigation (for reviews, see Refs. 591, 723, 53, 321, 724 and 69). Migratory birds of the northern hemisphere routinely fly to the southern hemisphere to avoid the harsh winter. Long-distance migrations of birds pose two problems: the compass problem (the means to determine direction) and the map problem (the detailed information regarding the destination). Ethological studies have yielded convincing evidence that migratory birds use multiple cues for determining the direction (orientation): the position of the Sun, stars, the Earth's magnetic field, local landmarks, etc. The magnetic sense is innate,721 although the sensory receptors have not been clearly identified.
Using hand-raised young pied flycatchers that had never seen the sky, Beck and Wiltschko54 demonstrated that the magnetic sense matures independently of any experience with the sky. The magnetic compass thus appears to be the primary reference for the pied flycatchers' orientation, against which the other cues are calibrated. On the other hand, the establishment of the sun compass requires a conscious learning process:725 the bird must correlate the apparent position of the Sun with its internal clock.590'591 Wiltschko and coworkers720 have shown that when a group of pigeons was raised under conditions that allowed them to observe the Sun only in the afternoon, they were unable to use the sun compass in the morning but instead used the magnetic compass. Apparently, young birds need to observe much larger portions of the arc of the Sun's apparent trajectory to be able to use the sun compass during the entire day. But why? In principle, for the purpose of calibrating the sun compass against the birds' internal clock, a single position on the Sun's trajectory should be sufficient because of the one-to-one correspondence between the Sun's position and the time of the day. A couple of reasons may be suggested. First, reliance on a single point of observation is error-prone. The birds might have been invoking a time-honored practice used by humans since the time of the Greek astronomers: a more accurate measurement of a doubtful quantity can be gained by averaging several inaccurate measurements (p. 131 of Ref. 212). Defining a segment (arc) of the Sun's trajectory is tantamount to fitting the scattered data points with a smooth curve. The trajectory defines a vertical plane, whereas the inclination of the magnetic field, together with the direction of gravitational pull, defines another plane; the magnetic compass is an "inclination" compass (see Ref. 722 for an explanation). The birds can then calibrate the angle between the two planes against their internal clock so as to obtain the time course of the variation of this angle. Second, there is a seasonal variation of the Sun's trajectory: the trajectory tilts towards and then away from the zenith on an annual basis, being closest to the zenith at the time of the summer solstice and farthest away from it at the time of the winter solstice (the turning points are the vernal and autumnal equinoxes). This seasonal variation disrupts the above-mentioned one-to-one correspondence, which may be preserved by a correction based on the knowledge of a segment of the trajectory. Arguably, the above speculations may be flawed in being anthropomorphic; the birds may have an entirely different "thought." However, Wiltschko and Wiltschko's experiment clearly demonstrated that the birds' use of the sun compass is not based solely on animal instinct; a conscious cognitive element is involved.
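The error-averaging argument invoked above — that several inaccurate sightings beat a single one — is easy to make quantitative. The toy calculation below simulates noisy readings of the Sun's azimuth at a fixed time of day and compares the typical error of a single sighting with that of the mean of twenty-five; the figures are invented for illustration and carry no ornithological weight.

    # Toy illustration of the averaging argument: the error of the mean of n
    # noisy sightings shrinks roughly as 1/sqrt(n). Hypothetical numbers.
    import random
    import statistics

    random.seed(1)
    TRUE_AZIMUTH = 135.0      # degrees; the "correct" value in this toy world
    NOISE_SD = 5.0            # scatter of an individual sighting

    def sighting():
        return random.gauss(TRUE_AZIMUTH, NOISE_SD)

    single_errors = [abs(sighting() - TRUE_AZIMUTH) for _ in range(1000)]
    averaged_errors = [
        abs(statistics.mean(sighting() for _ in range(25)) - TRUE_AZIMUTH)
        for _ in range(1000)
    ]

    print(round(statistics.mean(single_errors), 2))    # typically around 4 degrees
    print(round(statistics.mean(averaged_errors), 2))  # typically around 0.8 degrees

The standard error of the mean shrinks roughly as the square root of the number of sightings, which is all that the speculation above requires.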
In addition to orientation, migratory birds need a map or a flight plan to reach their winter destination. Wiltschko and coworkers have also investigated the flight plans of both garden warblers and pied flycatchers. It turned out that their flight plans are largely innate, and both species use the magnetic compass to implement their flight plans. On their normal migration, the garden warblers avoid the Alps and the Mediterranean Sea. Klein et al.373 found that, during August and September, they fly from central Europe (Frankfurt am Main) southwest (SW) to the Iberian Peninsula. They change their migratory direction around the Strait of Gibraltar to south-southeast (SSE) in order to reach their African wintering grounds south of the Sahara. Gwinner and Wiltschko271 kept hand-raised garden warblers in Frankfurt, and determined their seasonal variations of orientation preference by observing their orientation restlessness (fluttering) in a special circular orientation cage (Emlen funnel). Under this captive condition, the garden warblers first oriented SW for about eight weeks at the beginning of the migratory season and then changed to SSE for an additional six weeks. The durations and directions of the two periods match the actual flight route of the garden warblers, assuming that they migrated with a uniform speed — a reasonable assumption, given the long journey, even though most migratory birds migrate predominantly at night. This observation suggests that the changes of migratory orientation are controlled, at least in part, by spontaneous internal changes (possibly hormonal in origin) in the preferred direction relative to external orienting cues, such as the Earth's magnetic field. The flight plan is, however, species-specific. When pied flycatchers were kept under the constant magnetic conditions of central Europe, they oriented SW during the first part of the flight plan, but became disoriented at about the time that they were supposed to have reached the Strait of Gibraltar. Wiltschko and coworkers suspected that variation of the magnetic field intensity may be crucial. Experiments were subsequently repeated with a simulated magnetic field imposed on the orientation cage: the expected location-dependent variations of the magnetic field intensity and inclination along the birds' migration route were coarsely simulated by the corresponding time-dependent variation at the cage site, again assuming a uniform speed of migration, so that the locally imposed magnetic field was adjusted to lower intensity and inclination during the second leg of their imaginary journey to reflect the appropriate conditions in northern Africa. This was similar to the operation of a computer-based flight simulator
except that variation of the expected visual scenery was replaced with variation of the expected magnetic field conditions. Under the newly imposed magnetic conditions, the pied flycatchers were found to continue to be properly oriented, showing the expected southeasterly tendencies of the second leg of their migration.723 If an investigator is repeatedly exposed to elaborate behaviors such as those described above, a gradual shift of attitude could lead to a more and more liberal interpretation of consciousness and intelligence. On the other hand, the elucidation of molecular and developmental biology tends to cast a spell on investigators in the opposite direction. The courtship behavior of the fruit fly Drosophila melanogaster illustrates this point.548'262 Fruit flies exhibit fairly elaborate courtship rituals which are controlled by several genes that regulate the development of cells in certain parts of the brain. For example, "triggering" of the courting behavior requires the presence of male-type cells, both in the rear part of the brain (which integrates sensory signals) and in parts of the thoracic ganglion. On the other hand, odor-based sex discrimination depends on the presence of female-type cells in the antennal lobe of the mushroom body of the brain. When the latter part was altered, an affected male would court a male just as vigorously as it would court a female while maintaining essentially normal courting rituals. Thus, a detailed mechanistic understanding of elaborate behaviors tends to persuade some investigators to accept the view of behaviorism to an extent that may appear to be "irrational" to other investigators who subscribe to the opposite (cognitive) view (see p. 123 of Ref. 263 for Griffin's comment about "semantic behaviorism"). However, liberal interpretation by ethologists can be viewed by their detractors as being careless in sliding down the "slippery slope" on the gray scale of behaviorism versus cognitivism, and crossing the fine line into accepting any elaborate scheme as evidence of consciousness. Perhaps they would also accept the behavior of the next generation's robots as evidence of consciousness (cf. Sec. 5.18). In a desperate effort to draw a fine line between cognitivism and behaviorism, some investigators use versatility as a criterion. Although it is true that nonhuman animals tend to be specialists in mastering their elaborate behavioral schemes and lack the ability to generalize them, I suspect that this view may be a skewed anthropomorphic one that is fostered by our accidental occupation of the pinnacle position on the evolutionary scale. I wonder whether future sophistication of machine intelligence may force us to revise our criteria (cf. Sec. 4.26). Ironically, information explosion
in biomedical fields and fierce competition in school have joined forces to foster a new educational climate for students to practice a highly focused learning plan (Sec. 4.23). They learn only knowledge to be covered in testing but are ignorant of knowledge outside of the curriculum. Their practice of exclusively rule-based learning further limits their ability to generalize (transfer) knowledge or skills learned in one area to another related area. They specialize only in performing tasks that have been explicitly taught, much like a worker on a factory assembly-line, while exhibiting an astonishing lack of versatility. It makes us wonder whether versatility is a good criterion for cognitivism after all. Thus, both proponents and opponents of (nonhuman) animal intelligence run the risk of crossing the elusive fine line of demarcation between cognitivism and behaviorism, and into the region of extremes and absurdity. This situation is similar to an attempt to find the demarcation between the colors red and orange. Our inability to find a clear-cut demarcation between red and orange on the continuous optical spectrum does not imply that red and orange are two indistinguishable colors. So at the two extremes on the gray scale of intelligence, the difference between behaviorism and cognitivism is clear-cut, but the transition from one to the other in either direction is gradual and murky. The same can be said about innovation and imitation. The debate will undoubtedly go on, but I do not take the position that such endless debates are futile and senseless, because it allows us to detect the gray-scale nature of a particular attribute that once was considered to have a "yes-or-no" dichotomy (e.g., creativity). Consider the linguistic ability that was once regarded as the unique attribute of human intelligence. This once deep-seated conventional wisdom was shattered by the report of a chimpanzee named Washoe that had been taught to master American Sign Language (ASL).219 Following the replication in several nonhuman primates, including a chimpanzee named Nim Chimpsky and a gorilla named Koko, a heated controversy erupted with regard to whether what Washoe, Nim and Koko had learned to master can be considered truly a simple form of language (Chapter 11 of Ref. 518). The sign "language" used by Nim661'659 and by Koko507'506 tended to be ungrammatical and repetitious. For example, Nim signed: "Give, orange, me, give, eat, orange, give, me, eat, orange, give, me, you," and Koko signed: "Please, milk, please, me, like, drink, apple, bottle" (summarized in pp. 236-251 of Ref. 264). Needless to say, the heated debates led to no consensus, but continuing debates have contributed significantly to advances in linguistics research, as a spin-off.
Perhaps a gray scale of language capabilities should be considered for nonhuman primates. After all, cerebral lateralization did not appear suddenly in humans, as an evolutionary discontinuity. The brain and the skull of nonhuman primates often exhibit morphological asymmetry, though it is less frequently seen than in humans.414 In fact, cerebral asymmetry is reflected in the vocal control of songbirds.18 Thus, invoking the grammatical structure as the sole criterion in judging the language capability of nonhuman primates seems out of place and out of proportion. Even for humans, there are significant variations in the grammatical structure of different languages. Here is my point. To Western linguistic experts, the expressions signed by Nim and Koko fell short of the standards of a bona fide language. However, being a non-native English speaker, I could be fooled if I were a Turing judge, who stayed behind a curtain and communicated to the apes via a tele-typewriter operated by a human interpreter (see Sec. 5.18 for an explanation of the Turing Test). Because of my own background, I could have concluded that the expressions were "broken" English spoken by non-native speakers and that the speakers' native language did not have an elaborate grammatical structure, e.g., certain Asian languages.h From my perspective, the repetitions simply reflected the possibility that the speakers/signers did not know enough adverbs to signal the urgency of their request and/or were perhaps raised in a culture in which repeatedly pestering the caretakers, such as parents or government officials, was considered an effective way of extracting a demand. The following dialogue between Koko and Barbara (her caretaker), who showed Koko a picture of a bird feeding its young, demonstrates that Koko had a sense of self-identity (p. 285 of Ref. 506):
K: THAT ME. (Pointing to the adult bird.)
B: Is that really you?
K: KOKO GOOD BIRD.
B: I thought you were a gorilla.
K: KOKO BIRD.
B: Can you fly?
K: GOOD. ("Good" can mean "yes.")
I do not imply that languages with a less deterministic grammatical structure are inferior. The lack of grammatical precision can at least be partly compensated for with the aid of adverbs and other modifiers, including the body language.
B: Show me.
K: FAKE BIRD, CLOWN. (Koko laughs.)
B: You're teasing me. (Koko laughs.) What are you really? (Koko laughs again and after a minute signs:)
K: GORILLA KOKO.
If I had no prior knowledge of great ape research, I would never suspect that the above conversation actually involved an ape, and Koko's self-deprecating humor could have convinced me that the speaker/signer had been westernized. Realistically, grammar should not be the sole criterion for judging Koko's (or Nim's) language capability. As quoted by Patterson and Linden (pp. 193-194 of Ref. 507), Mary Midgley summed it up eloquently: "To claim that an elephant has a trunk is not to say that no other animal has a nose." Koko's ability to use homonyms, rhymes, analogy and context-sensitive elements, such as the difference between the verb "like" and the adjective "like," should also be taken into account. Koko's ability to diverge from norms and expectations in a recognizably incongruous manner, in order to generate a joke, suggests that her signing was not a simple stereotyped, behaviorists' "stimulus-response" kind of expression (Chapter 16 of Ref. 507). As also pointed out by Patterson and Linden, detractors' objections were often based on what the apes had not yet done. The situation is reminiscent of AI detractors' case against machine intelligence (Sec. 4.26). As Griffin pointed out, the expressions of Nim and Koko leave no doubt about what they wanted, and the ape language experiments have clearly demonstrated evolutionary continuity between human and nonhuman capabilities of communication and thinking.
4.19. Multiple intelligences: role of working memory
Gardner221,225 proposed a theory of multiple intelligences. It became perhaps the most influential theory about intelligence in the late 20th century. Gardner made clear that there are many different kinds of "intelligences" that excel at different kinds of tasks. Thus, it no longer makes sense to speak about a single kind of intelligence: the intelligence. Here, we shall evaluate the theory of multiple intelligences in terms of Simonton's chance-configuration theory. Simonton's model allows us to identify each and every crucial step of reasoning in the process of problem solving. The demands at the various steps differ for different kinds of tasks. Different kinds of intelligences
draw on the various steps with variable strength. For example, let us consider the importance of memory. Most people place a premium on long-term memory, as reflected in the proliferation of advertisements of diets or herbs for memory enhancement. Even some advocates who preached and practiced the right-brain movement used visual thinking — reading a "bit map" instead of perceiving a meaningful picture — to enhance long-term memory and prized the newly gained ability of remembering meaningless phrases (or spelling words) in both forward and reverse sequences, thus missing the opportunity of identifying a key element in creativity. Tsien675 genetically engineered a smart mouse named Doogie; he aimed at enhancing the mouse's long-term memory and used it as a major indicator of intelligence. Unbeknownst to the investigator, it was probably not his own long-term memory of molecular biology — the relevant domain-specific knowledge — that made him smart enough to engineer Doogie. Rather, it was his ability to pull or pool together several commonly known facts into his working memory, thus making sense of a design principle that guided him to target a partial modification of the mouse's NMDA receptor (Sec. 9 of Chapter 1). To do justice to Tsien, I must point out that he understood that it takes more than just a good memory to solve novel problems. Most standard physiology textbooks do not emphasize the importance of working memory in problem solving. The textbook by Vander et al.683 is a notable exception. Vander et al. indicated that "the longer the span of attention in working memory, the better the chess player, the greater the ability to reason, and the better a student is at understanding complicated sentences and drawing inferences from texts." In the cognitive science literature, it is generally recognized that working memory plays an important role in complex problem solving. In fact, it was owing to this consideration that the old term "short-term memory" was replaced by the term "working memory." These two terms may be considered synonymous for our present purpose. Here, it should be stressed that the greater the amount of information held concurrently in working memory, the better the individual's problem-solving skill. How can the capacity of working memory be measured? The earliest estimate of the capacity of working memory was reported by Miller467 in 1956 (for a review, see Ref. 147). He indicated that there is a "magic number seven, plus or minus two" that designates the number of chunks of information that can be held concurrently in working memory (chunk-based model). According to Simon, a chunk is a stimulus or any complex stimulus pattern that has been learned as a unit and has become highly
familiar, so that when encountered again it can be recognized instantly as a whole (p. 60 of Ref. 628). The chunk-based model was supported by the studies of Chase and Simon.113,114,622 A rival theory, known as the time-based model or rehearsed-loop hypothesis, was proposed by Baddeley.31,32 Baddeley and coworkers33 previously studied the effect of word length on the capacity of working memory. They found that:
a) memory span is inversely related to word length across a wide range of materials,
b) when the number of syllables and the number of phonemes are held constant, words of short temporal duration are better recalled than words of long duration,
c) span could be predicted on the basis of the number of words which the subject can read in approximately 2 seconds,
d) when articulation is suppressed by requiring the subject to articulate an irrelevant sound, the word-length effect disappears with visual presentation, but remains when the presentation is auditory.
However, a chunk may contain more than one word. For example, Simon tried the following list of words: Lincoln, milky, criminal, differential, address, way, lawyer, calculus, Gettysburg. He found that he was not successful in recalling them all.622 But the recall became easy after these words were rearranged, as follows: Lincoln's Gettysburg address, Milky Way, criminal lawyer, differential calculus. Therefore, the chunking effect is apparent. The chunking effect also explains why picture-based reasoning is more advantageous than rule-based reasoning in regard to memory retention: the former allows superficially (verbally) unrelated but pictorially similar or analogous items to be chunked together. However, the time-based rehearsed-loop model is not without merit. Simon found that the number of chunks held in working memory is not constant, and an increase in reading duration does reduce the number of retained chunks.741 How information is organized and encoded is clearly a critical factor in increasing the size of chunks and, thus, in increasing the amount of information that can be brought to the conscious level concurrently. The experiment of Baddeley and coworkers33 showed the difference between information encoded acoustically and information encoded visually (using working memory as a visuo-spatial scratch pad). Simon subsequently found that the short-term memory span for chunks that can be discriminated and encoded acoustically is in the neighborhood of six or seven, whereas the span for homophones that can be discriminated only visually or semantically but not acoustically is only two or three.738 Thus, there is a price to pay for visual encoding of words.
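As an illustrative aside — my own sketch in Python, not part of the studies just cited, with a hypothetical chunk dictionary standing in for a reader's long-term familiarity — the recoding that turns Simon's nine isolated words into four chunks can be mimicked as follows:

    # A minimal sketch of the chunking effect: nine isolated words strain
    # Miller's "seven plus or minus two," but four familiar chunks do not.
    words = ["Lincoln", "milky", "criminal", "differential", "address",
             "way", "lawyer", "calculus", "Gettysburg"]

    # Hypothetical stand-in for long-term familiarity: each known phrase
    # binds several isolated words into a single retrievable unit.
    familiar = {
        "Lincoln's Gettysburg address": {"lincoln", "gettysburg", "address"},
        "Milky Way": {"milky", "way"},
        "criminal lawyer": {"criminal", "lawyer"},
        "differential calculus": {"differential", "calculus"},
    }

    remaining = {w.lower() for w in words}
    chunks = []
    for phrase, members in familiar.items():
        if members <= remaining:          # all member words are present
            chunks.append(phrase)
            remaining -= members
    chunks.extend(sorted(remaining))      # uncovered words stay as single-word chunks

    print(len(words), "isolated words ->", len(chunks), "chunks:", chunks)
    # prints: 9 isolated words -> 4 chunks: [...]

The point of the toy example is only that the count of units to be held drops when long-term knowledge supplies larger units; it says nothing, of course, about how the brain actually performs the recoding.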
For scientific thoughts, visual encoding is still powerful because of the comprehensiveness of the information contained in a picture or diagram.407 The increase in information content more than makes up for the decreased number of retained chunks. In the demonstration of chunking, Simon indicated that the word rearrangement took advantage of something with which he was extremely familiar from his upbringing. Presumably, he meant the following. For professional or amateur astronomers, the term "Milky Way" means a single entity in the night sky. For someone who has never heard of the term, linking the two words has no chunking effect. By the same token, practicing and rehearsing domain-specific knowledge (such as organic chemistry in biochemical research) and problem-solving tools (such as mathematics in physics research) probably enhance the apparent capacity of working memory. Presumably, these practices enhance familiarity and coalesce knowledge modules into chunks of larger "size." In other words, these practices elevate domain-specific knowledge and skills to the status of "second nature" and keep them at the "edge" of working memory, thus making them easier to retrieve. However, for those who memorize domain-specific knowledge by rote, the chunking effect would be minimal, since mentioning the phrase differential calculus, for example, does not refresh the memory of an entire battery of mathematical techniques, but, instead, evokes a blank in the mind. It is also readily understood why conceptualization is so useful: a large number of descriptions in terms of both words and diagrams can be replaced with a single phrase. Mentioning the phrase representing a concept triggers the recollection of a whole range of detailed information. Conceptualization accomplishes what Bastick referred to as multicategorization. The true power of multicategorization cannot be unleashed if one merely memorizes the phrase that represents a concept; the chunk simply has empty content. Poincaré mentioned the special aptitude of a chess player who "can visualize a great number of combinations and hold them in his memory" (p. 384 of Ref. 521). Here, Poincaré clearly referred to working memory. Of course, a good chess player must also have a superior long-term memory in order to be able to remember fellow players' playing strategies. Poincaré confessed that his memory was not bad, but was insufficient to make him a good chess player. Apparently, a genius-grade chess player must command a significantly better memory than a genius-grade mathematician. Poincaré's introspective account thus supported the theory of multiple intelligences. Poincaré explained why a good memory is less crucial for him to have than for a master chess player (p. 385 of Ref. 521): "Evidently because it is guided by the general march of the [mathematical] reasoning.
A mathematical demonstration is not a simple juxtaposition of syllogisms, it is syllogisms placed in a certain order, and the order in which these elements are placed is much more important than the elements themselves." Apparently, it was not how much mathematics he knew, but how cleverly he could pull his knowledge all together to make logical, sensible conclusion after conclusion, thus solving a difficult problem — a stunning act of fitting numerous pieces of the puzzle together. Interestingly, Poincaré linked intuition, parallel processing, and perhaps also the chunking effect all together with regard to mathematical problem solving. He went on to say: "If I have the feeling, the intuition, so to speak, of this order, so as to perceive at a glance the reasoning as a whole, I need no longer fear lest I forget one of the elements, for each of them will take its allotted place in the array, and that without any effort of memory on my part." Apparently, picture-based reasoning as well as a judiciously integrated relational knowledge structure (multicategorization) increases the size of chunks, thus diminishing the importance of a superior long-term memory and partially compensating for a mediocre one. In addition, clever shorthand notations also increase the chunk size of knowledge modules, thus helping scientists and mathematicians avoid prematurely overloading their working memory. An example is the summation convention in tensor calculus: every dummy letter index appearing twice in one term is regarded as being summed from 1 to 3. Thus, the expression $g_{ik}A_k$ replaces the more cumbersome expression $g_{i1}A_1 + g_{i2}A_2 + g_{i3}A_3$, i.e., $\sum_{k=1}^{3} g_{ik}A_k$.
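The economy grows with the number of repeated indices. As a further illustration of my own (not from the original text, and using the same all-lower-index convention stated above), contracting two dummy indices at once gives
\[
  c \;=\; g_{ik}\,A_i\,B_k \;\equiv\; \sum_{i=1}^{3}\sum_{k=1}^{3} g_{ik}\,A_i\,B_k ,
\]
so a single compact term stands in for nine explicitly written products — one more way in which notation packs more content into each chunk held in working memory.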
In summary, the judicious combination of rule- and picture-based reasoning may also increase the apparent capacity of working memory: both mental images and concepts provide compressed information, and both provide linkages to superficially unrelated knowledge, thus enhancing the retention of unrelated knowledge modules, as a single linked chunk, in working memory. I believe that the practice of combined rule- and picture-based reasoning is more important than increasing attention span in determining the ability to reason. A prolonged attention span may be detrimental, whereas the occasional defocusing of attention may help expand the search space, as explained in Sec. 4.8. In conclusion, Simonton's chance-configuration theory suggests that both chess playing and mathematical invention share the same basic elements of mental faculty but the relative importance of each element varies. In the next section, we shall consider whether a similar conclusion can be
extended to creativity in humanities.
4.20. Creativity in music, art and literary works
It is a common belief that aptitude in sciences and aptitude in humanities are diametrically different and may even be mutually exclusive. Gardner's popular theory of multiple intelligences tends to reaffirm and reinforce this view. Yet, Calvin101 suggested that the evolutionary origin of intelligence was the development and specialization of a core facility that is common to various kinds of higher cognitive activities. Having had a similar conviction, Koestler376 elaborated a unified theory of bisociation to explain creativity in science, art and humor (Sec. 4.1). Here, we shall demonstrate that Simonton's chance-configuration theory, a closely related theory, is also applicable both to sciences and humanities, thus supporting Koestler's unified view. The search-and-match phase stipulated in Simonton's theory is certainly important in composing music and poems. Let us start by considering the likelihood of the proverbial monkey accidentally composing a Shakespearean sonnet by hitting typewriter keys randomly. Probability theory indicates that this is extremely unlikely to happen — almost certainly not in the lifetime of the monkey. Yet, probability theory also indicates that it is not an absolute impossibility for the same monkey to accidentally write a digital computer program that will run properly and not crash. For example, the following two-line assembly-language program was written in the MACRO-11 assembly language, implemented in the PDP-11 series of minicomputers of Digital Equipment Corporation, Maynard, MA. It involves as few as seventeen keystrokes, including spaces and carriage returns:

    S:  BR  .-1
        END S

where the two "S" letters can be replaced with any pair of identical letters. This short program carries out a procedure that jumps (BR = branch) to the previous program step and loops around indefinitely. It may be trivial, but it is a correct and meaningful program that could realistically be created by random trials by an exceptionally lucky monkey, using an old tele-typewriter that has no lower-case characters. Digital computer programming is relatively immune to a combinatorial explosion, owing to its rather small program-instruction set and its total lack of semantic variations.
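To put a rough number on how lucky that monkey would have to be, here is a back-of-the-envelope estimate of my own (assuming, purely for illustration, a teletype with about 50 distinct keys and independent, uniformly random keystrokes):
\[
  P(\text{exact 17-keystroke program}) \;=\; \left(\tfrac{1}{50}\right)^{17} \;\approx\; 1.3\times10^{-29},
\]
whereas a Shakespearean sonnet of roughly 600 characters typed on the same keyboard would come out letter-perfect with probability on the order of $50^{-600} \approx 10^{-1019}$. Both numbers are minuscule, but the gap of nearly a thousand orders of magnitude is precisely the contrast drawn above between a tiny syntactic search space and a combinatorially exploding one.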
On the other hand, poetry is not a word game that involves mere juxtapositions of words so as to satisfy the minimal syntactic and stylistic requirements. What distinguishes a good poem from an interesting limerick is the nuances of its semantic content. In brief, it is unlikely that a meaningful, let alone an outstanding, poem could be composed by randomly searching for words, whereas exhaustive searching inevitably meets the wrath of a combinatorial explosion. An uncanny ability to perform heuristic searching is still a necessary condition of creativity in humanities (though not a sufficient condition). Any doubt will be swept aside by considering the number of combinations of musical notes in the audible range, to be "parsed" in terms of distinct melodies, a sizable repertoire of rhythms, harmony (or dissonance), and expressions (e.g., crescendo). The number of combinatorial possibilities is further multiplied by the many possible ways of orchestration in terms of a combination of different musical instruments, including human voices, each with a distinct timbre (tone quality). In practice, exhaustive or random searching is not a normal mode of creating literary works and music. In the analysis of scientific creativity, we have associated intuition with pattern-based reasoning and an extraordinary ability to perform heuristic searching (Sec. 4.10). Inspiration for creativity in humanities — a manifestation of primary-process thinking — can also be similarly interpreted. A fine composer or a fine poet need not search aimlessly for the right notes or the right words and the right ways of combining them into a piece of fine work. The real challenge for them is to bypass sterile combinations of notes or words and to go for the appropriate combinations of elements more or less directly. Of course, they are also gifted with an exceptional ability to distinguish good elements and good ways of combining them from mediocre ones and to recognize them when these elements and combinations come up in the search. However, the ability to differentiate good works from mediocre ones is certainly not the monopoly of poets or composers (or other creators); otherwise, the creators would not be able to find audiences capable of comprehending and appreciating their works. In scientific problem solving, two types of heuristic searches are possible: picture-based and rule-based reasoning. Are these two modes of thinking also relevant to creativity in humanities? Creation in humanities can be classified in accordance with how the artistic elements are constructed and displayed. Visual art involves a spatial display of its elements, whereas music and poetry involve primarily a temporal display. Dancing and movie making can be regarded as hybrids involving both spatial and temporal elements. We claim that both rule-based reasoning and picture-based (or, rather, pattern-based) reasoning are relevant to all types of creative activ-
ities in humanities. The relevance of rule-based reasoning is obvious because certain established rules are to be followed. Composing a piece of music requires adherence to certain acceptable forms and styles as well as rules in harmony and counterpoint. Composing a poem requires conformity to rhymes and certain forms and styles. Picture-based reasoning is obviously required in visual art. Picture-based reasoning is also valuable in composing polyphonic music, because of the presence of both parallel and sequential elements. However, for monophonic music and poetry that display no parallel elements, its value is not so obvious. The advantage of pattern-based reasoning can be made apparent by considering Mozart's claim regarding his composing approach in a letter to a certain Baron von P who had given him a present of wine and had inquired about his way of composing (quoted in Part the Fourth: 1789, pp. 266-270 of Ref. 312; also excerpted in pp. 55-56 of Ref. 687 and pp. 34-35 of Ref. 232). Mozart claimed to be able to hold an entire music score in his working memory so that "[he could] survey it, like a fine picture or a beautiful statue, at a glance." Thus, Mozart's working memory is at least as impressive as his long-term memory, which, according to legend, had the phenomenal capacity of storing other composers' work upon a single exposure. Here, it was not about Mozart's attention span. It was about his extraordinary parallel processing capability: Mozart had a multi-track mind. This is evident in the same letter: "Nor do I hear in my imagination the parts successively [i.e., sequentially], but I hear them, as it were, all at once (gleich alles zusammen)." Presumably, there is a similar advantage to hold an entire poem (or a piece of monophonic music) in one's head and examine it as a whole for the sake of viewing the overall structural balance before committing it to paper. Poincare also held a similar view about mathematical invention. He thought that the order in which the elements of syllogisms are placed is more important than the elements themselves, and arrangements of such elements are more readily done by viewing them as a whole in his mind (Sec. 4.19). However, the authenticity of the above-cited letter has been called into question by modern Mozart scholars. In a new introduction written for the Folio Society edition of Holmes' book The Life of Mozart, Maestro Christopher Hogwood indicated that modern "forensic" techniques for examining paper-types and watermarks had destroyed the myth of Mozart's composing approach, citing several examples of false starts and revisions (p. xii of Ref. 312). In my opinion, such examples alone do not fully invalidate
the point made in Mozart's letter. Just because Mozart had false starts or revisions occasionally is no proof that he could not compose music in the manner, so indicated in the allegedly forged letter, at all. Besides, who else other than Mozart could have understood so well how the mind works without the benefit of insight made available by modern cognitive science? As often cited in the music literature, Mozart composed the overture for his opera Don Giovanni in the odd hours well past midnight prior to its premier performance, and the ink on the score sheets had barely dried when the curtain rose. I do not see how this could be done unless Mozart had invoked his legendary prowess of parallel processing. Miller found that the creative style described in the above-mentioned letter to Baron von P was consistent with the style described in Mozart's letter to his father, dated December 30, 1780, regarding the progress of composing his opera Idomeneo: "... everything is composed, just not copied out yet ..." (p. 269 of Ref. 464). Hildesheimer's book Mozart, which cited the above letter, also quoted, on the same page, another letter dated April 20, 1782: "But I composed the fugue first and wrote it down while I was thinking out the prelude" (p. 238 of Ref. 299). Mozart's creative style was also consistent with his overt behaviors, not just composing. As Hildesheimer pointed out, the act of "copying out" was automatic for Mozart; he often did it at the same time while he held conversations or listened to someone talking. This behavioral pattern again pointed to Mozart's penchant for parallel processing: doing several different things at the same time. The latter can be regarded more as a habit than a talent. It is often found in people who habitually perform parallel processing (personal observation). The fact that Mozart did sometimes make corrections was not sufficient reason to deny the authenticity of his letter to Baron von P. In recognizing Mozart's behavioral pattern, an overall consistency must be allowed to override minor, local irregularities or deviations, if necessary, or else one misses the big picture (cf. Duhem thesis, Sec. 6.5). Alternatively, Hildesheimer viewed it in a different light: "The manuscript of Don Giovanni contains deletions and corrections, but never betrays haste, only the swift hand of a man who thinks more quickly in notes than his pen can write" (p. 238 of Ref. 299). Besides, last-minute altering of a music score was not an uncommon practice among composers of operas, for the sake of custom-tailoring the degree of difficulty to match a singer's ability or even for the sake of punishing an unruly and defiant singer by depriving the singer of an opportunity to show off his or her pyrotechnics. Parallel processing is not an all-or-none capability; it covers a gray scale
from almost none to the exaggerated perfection which Mozart scholars attempted to dismiss. I am almost certain that many readers have had the experience of planning out the sequence of exposition in their head before starting to write a scientific paper, so as to see the "big picture" and avoid unnecessary future rearrangements (revisions); it is not an extraordinary feat unless the first draft is expected to be perfect. What set Mozart apart from many other composers was his command of an exceptionally high "degree" of parallel processing that enabled him to hold more complex musical elements in his working memory concurrently than an average composer does; this ability partly explains his extraordinary composing speed and his relatively infrequent revisions. It is also quite obvious that Beethoven used the same approach to compose his symphonies. Even in his well-publicized acts of frequent and often extensive revisions, Beethoven must have rehearsed his symphonies in terms of tonal imagery before he wrote them down, at least for the sake of self-evaluation and at least after he turned totally deaf. Beethoven's frequent revisions, as compared to Mozart's, might simply reflect his style rather than his ability to perform parallel processing. In point of fact, Mozart and Beethoven were not the only composers who could have invoked parallel processing in composing an orchestral work. A common practice in composing orchestral works is to compose a preliminary draft as a piano score, which is an abbreviated form that can be played on a piano but contains most of the essential features. A well-known exception was the French composer Hector Berlioz. Berlioz did not play piano, only flute and guitar. He had no choice but to bypass the step of piano scoring and to compose an orchestral work directly; he did not "compose music and then orchestrate it," according to his biographer (p. 73 of Ref. 580). It would be utterly impossible to do so without the ability to rehearse the entire piece in his head. On the other hand, he conveniently evaded the constraint set by piano scoring. Perhaps this was why his mastery of orchestration was universally acclaimed during his lifetime, and he was regarded as the father of modern orchestration (p. 321 of Ref. 85). In the chance-configuration theory, the third phase of verification, as applied to music and poetry, is not as straightforward as in science. In scientific problem solving, a theory is subordinate to the relevant observed experimental facts, and verification consists primarily of rule-based reasoning to ensure that a theory does not violate facts. What does verification mean in an act of music composing or literary writing? In my opinion, there are at least two different aspects of verification: a) conformity to the relevant rules of the "game," and b) prior assessment of the possible
acceptance by critics and audiences. Rule-based reasoning is relevant in verifying conformity to rules. This step thus partially duplicates the match phase, and the two phases may be merged and become indistinguishable. On the other hand, an assessment of the attractiveness of a piece of work often requires perception of the piece as a whole; verification must also involve picture-based reasoning and parallel processing. Verification in music, art and literary works is much less certain and much less objective than in science. Rules are not inviolable, and the contemporary critics and audience are not always "right." Rules in poetry and singing and the taste of judges were parodied in Richard Wagner's opera Die Meistersinger von Nürnberg (The Mastersingers of Nuremberg). Walther von Stolzing deviated drastically from the existing rules in a singing contest, and the overwhelming majority of the judges could not understand the originality of his song and singing. Hans Sachs cast the lone vote of support, saying (p. 224 of Ref. 695):
Wollt Ihr nach Regeln messen,
was nicht nach Eurer Regeln Lauf,
der eignen Spur vergessen,
sucht davon erst die Regeln auf!
Liberally translated, Hans Sachs meant: "Just forget about using your rules to judge his performance since he did not even adhere to the rules. Go find the new rules first!" Rules for music composition are not natural laws but conventions crafted by fine musicians of earlier generations, based partly on the "physics" of sound and partly on the social consensus of humans' aesthetic standards,63 which have been field-tested with the audience of earlier generations and have become part of the culture. These rules constitute the domain-specific knowledge that must be acquired in a musician's formal education and become part of a composer's assets, commonly known as "techniques." As compared to scientific discovery, a greater emphasis has been laid on talent in music creation. In the above discussion, we specifically interpret talent as the ability that is associated with primary-process thinking, including inspiration, intuition and the capability of parallel processing. However, this does not imply that domain-specific knowledge, which is largely rule-based and often requires extensive practicing, is not important. Quite the contrary, domain-specific knowledge in Western music has been advanced to such a highly sophisticated level that advanced training is an imperative prerequi-
site for becoming a competent musician (composer or performer). However, to become an outstanding musician, domain-specific knowledge alone is insufficient. The fact that a local (though not universal) consensus regarding subjective aesthetic judgments is possible at all implies that humans' intrinsic aesthetic value judgment is, at least partly, genetically endowed, presumably in the limbic system. The psychology of aesthetics is enigmatic. The biological foundation of music perception has begun to be explored.173'739 Here, I would like to explore and speculate in light of our understanding of biocomputing. Humans' music activities seem to reflect the interplay of two opposing trends: striving for recognizable patterns and — in Waldrop's words (p. 147 of Ref. 696) — for perpetual novelty at the same time. Thus, Western music started with well-defined scales, rhythms, harmony and counterpoint, all of which represent well-recognized and well-recognizable patterns. Subsequently, dissonance, twelve-tone music, atonal music, multiple concurrent rhythms and the likes stepped in for the sake of perpetual novelty and threatened to obscure the painstakingly laid-down patterns. Perhaps that "ontogeny recapitulates phylogeny" is also applicable to music. Even in a given piece of music, well-recognizable melodies are often repeated a certain number of times (just to make sure the audience recognizes them). However, often the repetitions are not exact but the melody is shifted by a fixed interval, or inverted, or even laced with ornamentations, in a recognizable way. Even the shift or inversion may not be exact, presumably because of the strife for perpetual novelty and/or the need to avoid awkward or mediocre passages that may otherwise have appeared if rules are adhered to strictly. These deviations from norms are often so ingenious that it first comes as a surprise to the audience, but it quickly becomes an elegant surprise of which the embedded ingenuity, once pointed out, becomes obvious and adored by the audience! It is not surprising that the format of "theme and variations" has been offering inexhaustible opportunities for composers of all ages to show off their inspiration and skill. In this way, the two extremes are avoided: extreme boredom and utter confusion (and total loss). Although I have no way of furnishing proof, I suspect that a judicious choice from the middle part on the gray scale of boredom versus confusion, and a dynamic shift between the two extremes, is the main reason why a masterpiece is forever attractive to an enchanted and captivated audience. The aesthetic standards laid down by our genetic predispositions are,
however, not absolute and tend to exhibit cultural differences. Even cultural preferences are subject to modifications by environments (e.g., familiarity and habituation through repeated exposures and learning). It is of interest to note that, on a per capita basis, Japan appears to be a bigger market for Western classical music and operas than the United States (personal observation). On the other hand, the stability of a given culture tends to enforce conformity to the established rules and accepted aesthetic standards. However, strict conformity is anathema to creativity. Music history tells us that creative musicians often explore new ways of making music at the risk of alienating their contemporary critics and audience. The assessment of audience expectations is not usually done by trial and error. Even the most daring composers would do some preliminary verifications (self-assessment) before meeting the audience. A dilemma sometimes arises: acceptance by the contemporary critics and audience and acceptance by audiences of future generations are two entirely different matters. German-French composer Jacques Offenbach was one of the few to succeed on both counts. Offenbach was keen to audience responses. When he wrote a new operetta, he regarded the version performed on the first night as a basis for negotiation with the public; his assessment of audience reaction was a factor in his method of composition (p. 203 of Ref. 199). He usually revised and finalized his works only after this assessment. However, he knew that his comic operettas that pleased the Parisian audience and brought him fame and fortune could not possibly survive the posterity because of their lack of seriousness and artistic profundity. He set out to compose his only work for posterity: Les Contes d'Hoffmann (The Tales of Hoffmann), but died before completion; it was not certain whether he had ever completed the vocal (voices and piano) score, according to Faris.199 Today, the latter is Offenbach's only work that still graces the operatic stages worldwide on a routine basis. Ironically, he never had the opportunity to incorporate audience reactions into a subsequent revision. In general, a visionary composer knows when and how to bend or break the rules for the sake of creativity, and knows how to educate the audience so as to break new grounds, e.g., the use of dissonance instead of harmony and the creation of atonal music. Again, it is two opposing "forces" at work. Conformity demands adherence to existing rules, but the strife for perpetual novelty aspires to innovation. This was often reflected in the evolution of a composer's style — a trend which is often referred to as maturity. However, there existed cases of an abrupt transition that surprised the audience and the critics alike. An example of revolutionary compositions
that were first rejected by the audience upon its premier performance but later became a standard-bearer easily comes to mind: Igor Stravinsky's Le Sacre du printemps (The Rite of Spring) (Chapter 4 of Ref. 81). Others were not so lucky. Austrian composer Gustav Mahler won his fame as a fine composer long after his death. It may be difficult for a modern listener to understand why Mahler's contemporary audience found his symphonies so grotesque and objectionable. The factors were multiple and complex. Among many monographs written about Mahler's music, we single out here an insightful analysis, by Lea,409 mentioning several factors that are relevant from the point of view of creativity research. Mahler was a loner, i.e., a nonconformist. Being simultaneously an insider and an outsider of the then musical establishment, he was a "marginal man," as Lea put it. In a number of ways, he often strayed away from the music tradition. Beethoven and Wagner were Mahler's heroes. Yet, in his nine completed symphonies, he did not stick to the forms established by these masters. For example, he seldom returned to the original key at the end of the piece, as was usually done by his predecessors. When he saw it fit, he was not even shy in altering Beethoven's well-established symphonies. As his critics put it, Mahler's music exhibits what the Germans called Stilbruch — a sudden shift of style or a combination of styles that appeared incongruous. Mahler's genius was reflected in his highly exploratory character trait. During his successful career as the music director and conductor of Vienna Court Opera, he was constantly experimenting with various ways of performing a number of operas in his chosen repertoire. For this practice, a young critic named Max Graf had the following dramatic description: "Of course one believed that all this experimenting and exploring would lead to a definite goal; but now we see to our astonishment that for this artist [Mahler] constant fiddling with performances is an end in itself" (pp. 142-143 of Ref. 74). Like other composers, he also utilized folk music and regional military marches. However, he often "denationalized" [de-regionalized] these elements, and gave them such an ironic, irreverent and unorthodox treatment that often left the audience groping for meaning. He also utilized music as a medium to carry his emotional turbulence — Mahler was one of Freud's most famous psychoanalysis patients (pp. 7-13 of Ref. 74). Lea eloquently summed it up: "It is psychoanalytic music in that it x-rays both folk music and the symphonic tradition; this is what makes conservative and nationalist critics most uncomfortable" (p. 59 of Ref. 409).
It is quite obvious that Mahler was motivated to compose music primarily by his own need — perhaps his own emotional need — instead of fame and fortune. His claim to fame and his financial stability were achieved through his career as a conductor in prestigious opera houses, such as the Imperial and Royal Vienna Court Opera and the Metropolitan Opera of New York. He composed mostly during summer (a self-proclaimed summer composer), but his compositions brought him more jeers than acclaim during his lifetime. Music and art history provides numerous other examples to uphold the claim of psychologist Deci and coworkers:165,168 extrinsic rewards undermine intrinsic motivation — here, intrinsic motivation is commonly known as self-expression in music and art (see Sec. 4.21 for a detailed discussion). In the case of Jacques Offenbach, it was neither fame nor fortune, but rather a higher aspiration for a legacy in posterity, that brought out the best of his talent. From the above discussion, it is evident that picture-based reasoning plays a critical role in music, art and literary works. Although there is still room for important contributions to be made by scientists trained to practice exclusively rule-based reasoning, a musician or an artist so trained stands virtually no chance of success. It is no wonder that artist Edwards190 was among the first to recognize the important role played by the right cerebral hemisphere in art creation, even though she was ridiculed by some scientists (see also Sec. 6.14). Since aesthetic standards are formulated by consensus in a given culture and enforced by peer pressure, they are subject to change over time. A possible change in the taste of the audience, inadvertently brought about by our failing education system, poses a serious problem for the future of music and art. Consider classical music as an example. Although it is not necessary for a music audience to be creative, an audience must be somewhat in tune with a piece of music in order to appreciate it. This is akin to the relationship between a transmitter and a receiver in communication technology. Although perfect tuning is not required, a receiver must stay within the range of the tuning curve of the transmitter in order to achieve a "resonance." Resonance is precisely what gives a music listener the utmost pleasure. Imagine that a listener, on the way home after attending a concert, tried to recreate a music passage heard during the concert — either by humming the tunes or by regenerating the associated tonal images — but could not quite get it right. Subsequent re-exposure to the same piece of music would reveal the ingenuity of the composer and evoke a sincere admiration and an ecstatic pleasure from the listener. After all, a composer is
better equipped to conduct heuristic searching than an untrained or trained but untalented listener. Classical music contains more than just elaborate melodies but also the richness empowered by polyphony and harmony as well as the emotional content. It is the richness of its content that keeps bringing back the same audience who never gets tired of the same piece of music. Maintaining the interest and enthusiasm of a steady stream of aficionados constitutes the lifeline of music professionals and performers. However, even just to get close to the "resonance" curve requires a certain music proficiency. The proliferation of the fast-food culture and the craving for instant gratification imply that the future of music and art (particularly, classical music) is in serious jeopardy. In other words, the general decline in the ideal and quality of education that diminishes the educational objective to mere acquisition of job-related skills also threatens the survival of music and performing arts (with perhaps the exception of movies and popular music). With a general decline of curiosity and a concomitant rise of utilitarian pragmatism, fewer and fewer people are willing to invest the time and effort to acquire the minimal proficiency for the sake of music and art appreciation. The educational establishment is not helping at all (Sec. 4.22). Even science education is subservient to technology, because of the latter's obvious tie to economy. The rest of human endeavors which look superficially frivolous takes a backseat. The establishment strives to beef up science research and education at the expense of the English department in colleges and universities. It is no wonder that a significant fraction of our youngsters can hardly read and write (and cannot think), let alone acquire the proficiency to appreciate classical music and fine art. The prevailing peer pressure that discourages access to anything that can be labeled "uncool," such as mathematics and classical music, helps neither science nor music and art education. The above discussion was focused on creativity of designers of a piece of art work, but it did not cover the full dimension of creativity. It may be true that a painter alone is responsible for the entire process of creation, but the full expression of performing art also requires participation of performers. Good composers, playwrights and choreographers — designers of performing art — are often labeled as creative. However, good conductors (or singers, pianists, etc.), actors (or directors, coaches, etc.), and dancers — executors or actuators of performing art — are more likely to be regarded as talented, instead. Although a creative artist must be talented, a talented artist is not always regarded as creative. Creativity is more often attached
to a piece of art work than to the performance of the (same) work. This is perhaps because a talented performer is expected, by the lay public, to be technically impeccable and flawless so as to be able to faithfully execute the "program" dictated by a composer, a playwright or a choreographer. Thus, a dancer's pyrotechnics and a tenor's "high C" always draw big applause even from the uninitiated, whereas only a sophisticated audience recognizes the nuance of a genuinely superb performance. In an exaggerated and somewhat derogatory analogy, the relationship between a performer and a composer is thought to be equivalent to that between a digital computer and a computer programmer. This view, even in its milder and less insulting version, is however inaccurate and unjustified, because a piece of music or performing art contains both syntactic and semantic elements, whereas a digital computer program has pure syntax but no semantics. In this regard, sharp-eyed readers may have noticed my unorthodox and perhaps inappropriate use of the verb "parse" in reference to the task of "assembling" musical notes into a piece of music. It was a deliberate misstep so as to bring this important aspect to the readers' attention. According to Rosen's terminology, information that can be recorded in terms of symbols or words is to be regarded as a syntactic element. In contrast, information that cannot be fully expressed as syntax or algorithm is to be regarded as a semantic element. In other words, a syntactic system is a rule-based system, whereas a semantic system is primarily a pattern-based system. These generalized definitions of syntax and semantics are consistent with their usage in languages, and were established by Rosen in reference to science and mathematics (Sec. 6.13). As pointed out by Rosen, in spite of some mathematicians' hope and efforts, there are always some semantic residues in a mathematical system that cannot be fully replaced by increasingly elaborate syntactic enrichments, i.e., it is not possible to fully formalize a mathematical system, as "painfully" demonstrated by the celebrated Incompleteness Theorem of Gödel. Translated into plain English and applied to the setting of music composing, this means that no composer can fully dictate the semantic elements in terms of written instructions with ever-increasing details. Thus, a performer's technique — the equivalent of a natural scientist's domain-specific knowledge and experimental techniques — can only ensure accurate execution of the syntactic elements of a piece of performing art. As for the execution of the semantic elements, which a composer can only vaguely indicate as musical expressions, a performer must summon his or her artistic prowess; technique alone is insufficient. Although the syntactic
constraint dictated by the composer is somewhat rigid, there is a considerable latitude for performers to execute the semantic part. It is this freedom of expressing the semantic elements contained in a piece of music that allows a performer's creativity to become manifest, and a creative performer simply seizes upon the opportunity and rises to the occasion. Apparently, the execution of the syntactic elements is primarily a left hemispheric function, whereas the execution of the semantic elements is mainly a right hemispheric function. Thus, just like scientific activities, artistic activities also require cooperation of both cerebral hemispheres. Since creativity is harder to come by than technique, it is readily understood why artists were among the first to point out the association of the right hemisphere with artistic creativity190 (Sees. 4.15 and 6.14). Finally, a brief comment about musical notations is in order. As pointed out in Sec. 4.19, clever notations could contribute to the advance of mathematics. Likewise, I speculate that Western musical notation might have shaped the development of Western classical music. The standard musical notation is both a digital and an analog way of representing musical notes in pitches and in rhythms. The relative position of a musical note on a five-line staff dictates the precise absolute pitch of the note, whereas the visual pattern formed by a group of notes also gives an analog representation, which can be grasped at a glance. Thus, one can visualize an ascending scale as a "climbing" pattern formed by a group of sequential notes without paying attention to the precise position of individual notes. Similarly, a chord (such as a triad) is readily recognized from the visual pattern formed by a group of concurrent notes. Similar conclusions apply to rhythms: syncopation is readily discernible through the characteristic "off-beat" distribution of notes on paper. Thus, a good sight-reader of music sheets reads a cluster of notes all at once rather than a single note at a time (cf. chunk-based model of working memory, Sec. 4.19). A composer or a conductor can view or visualize the parallel analog pattern of a symphonic score without prematurely overloading working memory. In brief, Western musical notation is conducive to parallel processing. Whether this was a contributing factor in enriching Western classical music or not is hard to prove. In conclusion, with appropriate modifications, the chance-configuration theory is also applicable to creativity in humanities. Thus, multiple kinds of intelligences reflect various "phenotypical" expressions of the same core mental facilities in different combinations, much like the splendor of 16.7 million visible colors, generated by means of appropriate combinations of three primary colors — red, blue and green — on a gray scale of 256 lev-
els. While the above analysis reaffirms the validity of Gardner's theory of multiple intelligences, it also shows that the chance-configuration theory is more fundamental than the theory of multiple intelligences.
4.21. Complex and interacting factors in the creative process: role of motivation, hard work and intelligence
It is evident that a phenomenon as complex as creativity involves multiple factors. The overall expression of creativity thus depends on the particular, weighted combination of these factors; combinations in different proportions give rise to different types of talents (the concept of multiple intelligences). However, the combinations are not simple linear combinations, because these factors are not independent variables but interact with one another in a nonlinear fashion. Some of these factors are connected in series, but some others are connected in parallel. The absence or impairment of some of these factors may render a crucial step rate-limiting and create a "bottleneck" effect. Once a bottleneck is created, factors affecting other "downstream" steps (connected in series) become inoperative, whereas steps that are connected in parallel now assume a new critical role of providing an alternative relief while new factors come into play accordingly. (A convenient way of visualizing the complexity is provided by the analogy of a network of parallel and series resistors. However, the analogy is far from perfect; for instance, the representation of distributed elements in a neural network by lumped circuit elements in an electric network is particularly problematic.) Thus, a given factor may be necessary but not sufficient. Worse still, a factor may be contributory but not necessary. Boden pointed out that, during the creative process, a large number of constraints must be met en masse — i.e., in parallel — but none is individually necessary: each constraint "inclines without necessitating" (p. 111 of Ref. 75). The situation is similar to what transpires in molecular recognition, in general, and in vesicular transport, in particular (Secs. 6.1 and 5.7 of Chapter 1, respectively). Reductionists, who are accustomed to solving problems in relatively isolated biological systems and to thinking in non-interacting linear terms, are likely to be baffled by the complexity (cf. nature vs. nurture, Sec. 4.24). The common practice of dishing out one factor at a time in testing a hypothesis seems more suitable for the benchwork of a reductionist than the field work of an integrationist. By varying a single factor while keeping others constant, the factor under investigation may be inadvertently
rendered inoperative. For example, a lack of appreciation of the complex interactions has led to the obviously absurd conclusion that small group teaching makes no difference than conventional teaching235 (see also Sec. 4.25). There were plenty of factors that might cause the approach to fail. For example, if the instructor lacks the skill to guide the discussion or reveals the "standard" answer of a problem prematurely, the students may simply revert to traditional rote learning. The gray-scale nature of complex phenomena frequently rears its ugly head in the investigations of higher brain functions. This leads to another problem in psychometric investigations. It is fashionable to divide a gray scale into neat pigeonholes for purposes of description, but there is always the danger of "the hardening of the categories," as someone has aptly said. This crude, simplistic categorization sometimes leads to sample heterogeneity. It is sample heterogeneity that was in part responsible for the confusion regarding the role of motivation in creative problem solving. Eisenberger et al.191 investigated how the explicitness of promised reward affects creativity in a group of preadolescent schoolchildren. These investigators concluded that promised reward increases creativity, if there is currently, or was previously, an explicit positive relationship between creativity and reward. I have no specific reason to doubt the validity of such an investigation under the stated conditions. Such a conclusion is, however, extremely misleading if one tries to extrapolate it to other age groups, because preadolescent school children are considerably more innocent than other age groups. Motivation in adults is a complex issue: it usually involves a combination of multiple factors. Although this complexity is widely known among the general public, some administrators in institutions of higher education seem unaware of it and continue to draft policies under the assumption that people are motivated by a single factor. Thus, premedical students know how to impress the admissions committee by claiming that they choose the medical profession solely because they wish to answer a calling to help patients. Likewise, many universities use monetary rewards (euphemism: merit raises) to promote productivity and excellence in research and/or teaching; these administrators must have assumed that most academic professionals have a major ulterior motive — monetary gain — in their pursuit of excellence. As Deci and Ryan pointed out, the practice of using extrinsic rewards to ensure desirable behavioral outcomes gratifies caretakers' needs that may or may not be related to the designated missions (see p. 129 of Ref. 168).
Thus, at least in certain cultures, parents use extrinsic rewards to suppress children's noisy exploratory activities and bothersome, unending curious questions simply because the parents need a quiet moment. School healthcare officers prescribed Ritalin (methyl phenidate) primarily to silence disruptive students, despite their claim that the purpose was to enhance the students' capacity to learn. The school officials' often astonishing lack of concern about the dubious effectiveness and harmful medical side effects of the Ritalin treatment betrayed their true and ulterior motive (p. 117 of Ref. 486). Likewise, university administrators use extrinsic rewards to encourage the investigators to seek government funding simply because they need to enhance the institutional revenue. Investigators are often discouraged to seek funding from private agencies that do not pay indirect cost return (institutional overhead), or are even penalized for doing so — part of the direct cost (funds for research) is to be deducted in order to recover the lost institutional revenue. Regarding the dubious uses of extrinsic rewards, Deci and coworkers166 and others have shown that extrinsic rewards can undermine intrinsic motivation. On the other hand, Eisenberger and Cameron's group102'192'193 claimed otherwise. A controversy erupted between the two camps in 1999 (see, for example, Ref. 415, 194 and 167). The practice of using extrinsic rewards to enhance creativity and educational excellence is not compatible with our understanding of the creative process. Reward- or approvaloriented motivation has been associated with diminished cognitive flexibility, reduced depth in processing of new information, impaired integration of new information with prior knowledge, and diminished creativity in general.611 This observation was corroborated by experiments performed by Viesti:689 monetary rewards interfere with insightful problem solving (see also p. 185 of Ref. 52). Similar conclusions were reached in experiments regarding artistic creativity and writing creativity.11'12 Like other human endeavors, pursuits of excellence are motivated by multiple factors: pleasure, curiosity, desire of independent mastery, self-fulfillment/self-actualization (including proof of self-worthiness), peer recognition, fame, vanity, fortune and power, which form a continuous spectrum, with intrinsic motivation at one end, and vanity and extrinsic rewards at the other end.168'165'144 Roald Hoffmann, a 1981 Nobel Laureate in chemistry, cited the following as his own:718 "to be secure, to have control, to be praised, a creative urge, a sublimation of procreation, curiosity, wanting to understand, wanting to improve the world." An average normal person is probably motivated by a combination of all of these factors in different pro-
portions; there is an individual difference of where the peak of the spectrum is located (personality factor). There is little doubt that an element of pleasure is associated with the creative process itself. In general, pleasure tends to be associated with an "aha" experience (Sec. 4.10). As Koestler376 pointed out, the sudden and intense pleasure ("Eureka!") that erupts during a creative process is akin to bursting into laughter in humor; both are tension-releasing (see also p. 170 of Ref. 52). The pleasure derived from the creative process provides the addictive (self-reinforcing) effect. However, a creative act often involves a lengthy and protracted effort, and, for most people, it is unlikely to be sustained by pleasure alone. On the other hand, high creativity is unlikely to be sustained by vanity and extrinsic rewards alone, because a creator often has to endure ridicule and dejection by peers and other hardships (including poverty and persecution by the establishment) during a lengthy creative process. The cases of Copernicus and Galileo are well known. Aside from persecutions inflicted by the church or state establishment, unrelenting rejection by peers within the scientific community is yet another hardship to endure. For example, Ludwig Boltzmann's work on statistical mechanics was misunderstood and severely attacked, during his lifetime, by many of his contemporaries. Ill and depressed, he took his own life in 1906 (see p. 161 of Ref. 530).

Motivation stemming from either end of the spectrum often exerts opposite effects on creativity.611 Conformity is anathema to creativity.611,154,653 Conformists tend to respond positively to extrinsic rewards whereas creative performance tends to require strong task involvement, which may lead to paths that run afoul of the establishment. Crutchfield154 described conformist motivation as ego-involved. Conformists care immensely about how they are perceived by peers and the establishment. Thus, their primary goal is to protect or enhance their self-image and/or to avoid being rejected or ostracized by peers and the establishment. They often stop exploration when they encounter an "ego-threatening" novel problem; a strong desire to succeed instantly (so as to protect or enhance their self-image) inadvertently restricts the search space (personal observation). The mental habit eloquently summed up by the phrase "fear of failure" may be one of the reasons why highly educated people often exhibit an apparent lack of common sense.j They may not actually lack common sense but rather they elect not to exercise it for fear of letting their common sense run afoul of expert knowledge, thus losing their hard-earned image as an intellectual or failing a critical examination. This may just be my speculation, but it is supported by an anecdotal observation regarding the use of either a street mode, in accordance with common sense, or a school mode of thinking, which often defies common sense (see Sec. 4.24 for the details). Bastick pointed out that, although mild emotional involvement has an integrating effect on perception, extreme emotional involvement — e.g., due to an ego threat — can produce the disorganization of emotional blocking, giving inaccurate perceptions. This trend is described by the Yerkes-Dodson law, which shows peak performance in the medium range of emotional involvement and declining performances towards both extremes737 (see also Sec. 4.8 regarding the influence of affect on divergent thinking). Another way in which conformist pressures may inhibit creativity is by reducing the willingness to follow through on a new idea or a new course of action, for fear of disapproval by peers or being ostracized by the establishment.611 Thus, vanity can be an inhibitory factor for solving a difficult problem that threatens the problem solver's ego: it inhibits exploration and discourages risk-taking by invoking a fear of failure. In contrast, the lure of a challenging problem and a strong desire to achieve self-fulfillment enhance creativity and help overcome the roadblocks to the goal. Presumably, the same considerations also apply to fine art and music. Mozart said, in the letter partially cited in Sec. 4.20, "For I really do not study or aim at any originality." I believe that Mozart was telling the truth since modesty was not one of his strengths.

j In contemporary usage, common sense is sometimes equivalent to common knowledge or knowledge of the ordinary sort, as opposed to highly specialized domain-specific knowledge. Here, we adhere to the definition given in Webster's Third International New Dictionary: good [sound] judgment or prudence ... as free from emotional bias or intellectual subtlety, or as not dependent on special or technical [domain-specific] knowledge. Therefore, a lack of common sense is stupidity, by the lay public's standard, rather than ignorance.

So far we have restricted our discussion, regarding the influence of personality and motivation, to the search-and-match phase of creative problem solving. Personality also affects the outcome of problem solving at the verification phase. It is well known that conformists, by definition or by default, are usually insensitive to errors committed by the authority for fear of antagonizing the authority. It is also well known that individuals with a strong ego are usually insensitive to their own errors because admitting errors is ego-threatening. In both cases, the insensitivity to inconsistencies may not be the consequence of a conscious and voluntary decision;
self-deception (denial) is a convenient psychological defense mechanism. In contrast, high sensitivity to inconsistencies is the hallmark of highly creative people. Often, creative people are equally critical of themselves and of others, presumably because of their own keen awareness of the possibility of self-deception and perhaps also because of their unwillingness to fool others even if the error is unlikely to be exposed — a manifestation of strong task involvement. This character trait is evident in the philosophical writings of Rene Descartes.146 The peril of self-deception is always lurking in "the unconscious," thus waiting to rear its ugly head when suspended judgment should have been called for instead. When an individual's tolerance of ambiguity is exceeded, it is always expedient to summon self-deception to the rescue so as to persuade oneself to accept those conclusions that ought to have been rejected in hindsight. I suspect that no one, including publicly certified geniuses, can confidently claim to be immune to self-deception.

How then does one reconcile the above conclusions with the cherished capitalist tenet that (financial) competition breeds innovations? Presumably, extrinsic rewards may enhance the types of innovations that are nonthreatening to the social norm by encouraging exploration; technological innovations that lead to the generation of wealth are blessed by society. In contrast, extrinsic rewards tend to suppress high creativity and inhibit unconventional ideas that lead to novel solutions of fundamental problems. Being instruments of a top-down approach, extrinsic rewards seldom encourage nonconformity.

In view of the multiplicity and complexity of motivation, it is inappropriate to mix different kinds of motivations under the same broad category; doing so often introduces sample heterogeneity. Take motivation by vanity as an example. An individual's aspiration for a legacy in posterity (posthumous vanity) is certainly very different from an individual's desire to please peers (conformist vanity) (cf. Jacques Offenbach, Sec. 4.20). The former strengthens task involvement, but the latter demands instant gratification. Mixing the two subgroups thus leads to mutual cancellation of the opposing effects on creativity. It is not an uncommon practice for investigators of creativity research to mix those who are better characterized as social luminaries or leaders, in the same sample, together with genuinely scientific or artistic creators (e.g., Ref. 155). Admittedly, it takes some mental agility, unusual foresight, or even a certain degree of "creativity" to achieve social prominence and leadership, but it takes a different kind of "creativity" than that which pertains to art and science. For example, although a strong ego is anathema to creativ-
ity in science and art, it is a major driving force towards social prominence, a valuable asset of leadership aspirants, a hallmark and almost a prerequisite of leaders. Furthermore, Simonton demonstrated that creativity peaks at the baccalaureate level (an inverted U-shaped relation), whereas leadership bears an inverse relationship with the level of formal education (pp. 64-73 of Ref. 632; pp. 120-123 of Ref. 633). Thus, behavioral experiments, with a mixed sample containing opposing types, can yield data with striking disparity: some experiments show no effect if the opposing types are evenly distributed, and others show either positive or negative effects if one type outnumbers the opposite type. In case studies of creativity, hard work has consistently been found to be on the list of characteristics of creative individuals. On the other hand, attempts to correlate intelligence and creativity have yielded conflicting results.289'273'654 These and other reasons led Hayes to emphasize the role of motivation, hard work and domain-specific knowledge in creativity. The primary thrust of his position is that "differences in creativity have their origin in differences in motivation" rather than in innate cognitive abilities (pp. 143-144 of Ref. 289). Hayes claimed that motivation to be creative and to be independent is the primary cause of high creativity; the rest simply follows. Here is his reasoning. Motivation to be independent leads an individual to reject goals that are "trite" or "boring," and set goals that are worthy of being creative. Motivation to be creative leads to willingness to work hard, which, in turn, allows the individual to define harder and better problems, set higher standards, become more self-critical and acquire a larger body of information. He believed that this extra information might be used directly to make an essential inference or might provide an analogy that would suggest a solution path to solve a creative problem. Furthermore, Hayes attributed flexibility of a creative individual's mind, as well as willingness to sacrifice minor objectives (for the sake of major objectives), to motivation to be creative. Hayes' view is valid only for the kind of creative problem solving that can be handled by means of rule-based reasoning (creative work of the first kind) but not the kind of problem that requires picture-based reasoning (creative work of the second kind). Although motivation to be creative may allow an individual to avoid wasting time and effort in trivial projects, setting a creative goal alone does not significantly enhance the probability of accomplishing creative work of the second kind that includes most of accomplishments worthy of a "paradigm-shift" label. Furthermore, Hayes' view regarding motivation in creativity appears to contradict the work of
Deci and coworkers. Besides, a creative goal pertaining to creativity of the second kind is often not well defined and such a creative act sometimes cannot be preprogrammed (cf. Sec. 4.26). Motivation to be creative can hardly set one in the "right" direction and on the "right" path to creativity mainly because the path is not known ahead of time, or, sometimes, not even known after the fact. I suspect that Hayes might have confused motivation to be creative with task involvement driven by intrinsic factors, such as curiosity and other nonconformist character traits. Hayes' assertion regarding flexibility of a creative mind seems to be at odds with our understanding of personality and character traits. Although it is possible to change one's mental habits over a long period, it is unlikely that a flexible mind can be summoned at will by mere motivation to be creative. I suspect that the opposite may be true, in view of the Yerkes-Dodson law. In summary, I suspect that Weisberg's706 and Hayes' excessive emphasis on the role of motivation in creative problem solving might also be due to a failure to distinguish various types of motivation from one another. Likewise, the schoolchildren in the investigation of Eisenberger et al.191 might not be sufficiently mature to acquire a fear of failure and/or to lose their curiosity; extrinsic rewards simply kept them busy exploring. As we shall see, it is an entirely different matter for biomedical students under pressure to score good grades (Sec. 4.22).

Next, we shall discuss in detail the issue of hard work, while postponing a discussion of the role of domain-specific knowledge to Sec. 4.22. I suspect that the emphasis on hard work prevailing in American culture might have been influenced by Thomas Edison's remark: "Genius is one percent inspiration, ninety-nine percent perspiration" (pp. 305-306 of Ref. 211). It might be valid to apply his remark to his invention of the incandescent lamp: he did launch a blind search — so-called Edisonian experimentation — for suitable materials, and the search was mostly hard work. Regarding his serendipitous invention of the phonograph, it was the other way around: ninety-nine percent inspiration and one percent perspiration. In other words, he had extended his attention so that, while he was working on one project, he recognized his opportunity for another project which had been put on the back burner (Sec. 4.9). Thus, the role of hard work in creativity varies from case to case. Hayes did make a distinction between two kinds of hard work: a) hard work directed at satisfying the demands of a boss, some set of standards, or the interests of the public, and b) hard work motivated by a desire to be in charge of one's own actions. In other words, there are two very different
kinds of hard work: one stemming from intrinsic motivation and the other stemming from the need to meet external demands or in response to extrinsic rewards or punishments (externally driven). If creative individuals' hard work stems from intrinsic motivation, then they essentially work for themselves rather than for the "bosses." Here, the bosses can be broadly construed as institutional heads, benefactors, funding agencies (in the case of academic researchers), and/or audiences (in the case of musicians and artists). In this regard, the distinction between hard work and indulgence or, rather, between hard work and obsession (in the word of Policastro and Gardner523) is blurred. Andrew Wiles also characterized his eight-year long hard work devoted to the task of proving Fermat's Last Theorem as "obsession" (see cited remark in Sec. 4.9 and a detailed description of his mind's journey in Sec. 4.8). My personal, though limited, observations suggest that hard work driven by extrinsic motivation tends to be more focused and knowledge so acquired tends to be directly related to the success of the intended project. There is simply no time to waste on the luxury of acquiring knowledge outside of one's immediate expertise. In contrast, hard work stemming from intrinsic motivation may lead to acquisition of broad-based, though not necessarily useful, knowledge because the act is driven by curiosity rather than a strong desire to succeed. Technically speaking, the latter activity cannot be regarded as hard work, because pursuing "personal indulgence" is an undesirable distraction, from the boss's point of view. Inadvertently, the latter activity expands the search space and establishes what Bastick referred to as multicategorizations, thus enhancing creativity. Just like the issue of motivation, failure to recognize this subtle distinction led to sample heterogeneity in investigations of the role which hard work plays in creativity. As for the ambiguous correlation between creativity and intelligence, my lack of psychometric expertise prevents an in-depth analysis. At a superficial level, the outcome was not surprising, in view of the multiplicity of factors. For example, two "intelligent" individuals — whatever that means — may be equally adept at the search-and-match phase, but only one of them has the discipline to follow through on the problem, has sufficient domain-specific knowledge and is sufficiently meticulous and careful to complete the verification phase without committing a single error. As a consequence, one leads to a creative act but the other does not. If intelligence is related to intuition, then the ambiguity is perfectly understandable as a consequence of sample heterogeneity. On the other hand, intelligence
tests may not be consistently reliable. According to Bell, Poincare once submitted to the Binet tests after he had already been acknowledged as the foremost mathematician and leading popularizer of science of his time (p. 532 of Ref. 55). He made a dismal showing worthy of an imbecile. Something was quite wrong here. Labeling an unequivocal genius as an imbecile simply defied common sense. In my opinion, Poincare's dismal performance disgraced the Binet tests rather than Poincare himself. Here, I am not trying to condemn the validity of IQ tests in a wholesale manner, based on a single case of counter-example: IQ tests do address some, if not all, of the mental skills that are considered indispensable for scientific creativity. However, the attempt to represent something as elusive as intelligence in terms of a numerical score is inherently problematic: it is tantamount to treating an analog pattern as if it were a digital pattern (cf. Sees. 4.6, 6.6 and 6.13). On the other hand, the lack of even a modest correlation between intelligence and scientific creativity also defies common sense and demands an explanation. Perhaps using a different set of criteria to assess intelligence may lead to an entirely different conclusion. In summary, the lack of appreciation of the gray-scale nature of factors involved in the creative process has contributed to sample heterogeneity in the creativity literature, thus rendering the reported data difficult to interpret, or even misleading. Sample heterogeneity is also a serious problem in meta-analyses, customarily done in combining a large group of similar data. However, the judgment regarding "similarity" is highly subjective and inappropriate grouping is not uncommon. As Lepper et al.415 properly warned, a meta-analysis of the diverse literature of motivation must be performed with great caution. Thus, it appears that flawed theories of human behaviors often led to experimental designs that produce data in support of the locally logical but globally absurd hypothesis (see comments of Lepper et al.415 about theorydriven hypotheses). The outcome is: the hypothesis becomes a self-fulfilling prophecy, except that the prophecy may not be about gospels but rather about fallacies sometimes. On the other hand, the success of bird navigation and orientation research, cited in Sec. 4.18, owed much to the investigators' imaginative hypotheses that avoided sample heterogeneity. Paradoxically, having a decent theory in advance is a prerequisite of designing a meaningful behavioral experiment or meta-analysis. No wonder conclusions drawn from this type of investigation is notoriously model- or theory-dependent.
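The mutual-cancellation argument above is easy to illustrate numerically. The following sketch is my own illustration rather than a reproduction of any cited study, and all effect sizes, noise levels and sample sizes are arbitrary assumptions; it merely shows how pooling two subgroups that respond to the same treatment in opposite directions can yield a near-zero aggregate effect, or an effect whose sign simply tracks whichever subgroup happens to dominate the sample.

import random

random.seed(0)

def mean(values):
    return sum(values) / len(values)

def pooled_effect(n_type_a, n_type_b):
    # Type A responds positively to the treatment, type B negatively.
    # The +/-1.0 effect sizes and the 0.5 noise level are illustrative assumptions.
    type_a = [+1.0 + random.gauss(0.0, 0.5) for _ in range(n_type_a)]
    type_b = [-1.0 + random.gauss(0.0, 0.5) for _ in range(n_type_b)]
    return mean(type_a + type_b)

print("balanced sample (50/50):  ", round(pooled_effect(50, 50), 2))   # near zero: "no effect"
print("type A dominates (80/20): ", round(pooled_effect(80, 20), 2))   # apparent positive effect
print("type B dominates (20/80): ", round(pooled_effect(20, 80), 2))   # apparent negative effect

The same heterogeneous sample can thus appear to support opposite conclusions depending only on its composition, which is precisely why inappropriate grouping in a meta-analysis is so treacherous.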
4.22. Education and training: present educational problem

Let us go back to the question of how and where education and training can help develop creativity. Modern educational theories address five "orientations" to learning:297,67,456 behaviorist, cognitivist, humanist, social learning, and constructivist.k The breadth and complexity of the topic prohibit a comprehensive review. Although most of these educational theories emphasize the delivery end of teaching, it is instructive to examine what transpires at the receiving end: how students respond to teaching and examinations. While the thought process is primarily cognitive in nature, the act of thinking is most likely a habit. Training to think by repetition may lead to good habits by reducing cognitive strain which may initially be generated by the act of thinking. It is natural to ask: Are there any systematic efforts in training students to master visual thinking? I know with confidence that it can be done because of my own experience in helping students. A quick glimpse into the educational literature revealed that the mainstream thought is to accommodate students' diverse learning styles by custom-tailoring teaching methods. Here is a quotation of what the National Reading Styles Institute (NRSI) advocates, cited in an article by Stahl, who disapproved of it (pp. 27-28 of Ref. 647):

We all have personal styles that influence the way we work, play, and make decisions. Some people are very analytical, and they think in a logical, sequential way. Some students are visual or auditory learners; they learn best by seeing or hearing. These students are likely to conform well to traditional methods of study. Some people (we call them "global learners") need an idea of the whole picture before they can understand it, while "analytic learners" proceed more easily from the parts to the whole. Global learners also tend to learn best when they can touch what they are learning or move around while they learn. We call these styles of learning "tactile" and "kinesthetic." In a strictly traditional classroom, these students are often a problem for the teacher. She has trouble keeping them still or quiet. They seem unable to learn to read. (http://www.nrsi.com/about.html)

k Constructivism in the context of education and constructivism in the context of the sociology of science must be evaluated separately. I endorse the former but disapprove of the extremist version of the latter (see Sec. 6.7).
This quotation demonstrates some of the pitfalls of know-how without know-why. First, it reveals an unnecessary proliferation in terminology and categorization. Learning styles basically boil down to two major categories: picture-based and rule-based learning, as previously explained. The so-called global learners are obviously picture-based; whether they are visual, or tactile and kinesthetic, is a secondary-level distinction. Tactile and kinesthetic learners are exaggerated types of visual learners before the skill of manipulating mental imagery is acquired. In contrast, auditory, analytical, logical, or sequential learning are variants of rule-based learning. NRSI researchers did not make the fine distinction between understanding at the rule level and at the picture level (Did NRSI researchers take students' introspective reports of having achieved understanding at face value, without validating them with tests or observations?). NRSI researchers also failed to realize that global learners need to master logical reasoning in order to become effective thinkers/learners. From the foregoing discussion, it is obvious that reinforcing or perpetuating rule-based learning is tantamount to permanently handicapping its practitioners without a timely intervention. However, without a valid field study, my view can easily be dismissed as subjective and lacking scientific evidence, even though it is a natural conclusion from our present understanding of creative problem-solving and learning. Nevertheless, as Stahl pointed out, according to a number of reviews, all of the learning styles research failed to find that matching children to reading methods by preferred modalities did any good.647 He also pointed out that the NRSI learning styles approach diverts a significant part of a teacher's time to its detailed implementation, at the expense of other worthy activities. Stahl further noted that many of the investigations, on which the approach was based, were student theses, which never made their way to regular journals. So, why then was the approach so popular? I can only speculate here: because it is politically correct to do so. The learning styles approach accommodates, perpetuates and reinforces a student's original learning style without active interventions. It is like making separate rooms available to smokers and non-smokers without making any attempt to help smokers quit smoking. It is therefore not as harmful as actively enforcing rule-based learning. In contrast, the prevalent practice of using standardized testing to evaluate student performances does more harm because it discourages students from thinking (cf. pp. 22-29 of Ref. 653). But why? Rote memorization is highly effective in taking the type of examination that requires simple recall of facts or regurgitation of arguments. Most, if not all, competent teachers know how to avoid writing questions
that require only simple recall, unless they really want their students to commit some selected crucial facts to memory. Among the types of problems that require "understanding," two subtypes can be identified. A type of problem can be solved by a recapitulation of a well-established and previously learned procedure of reasoning based on manipulation of learned rules ("cookbook recipes"). This type of problem is adequately solved by rule-based reasoning. Rule-based reasoning/learning is highly effective for the type of examination that requires only the mastery of known rules but appears to be a poor strategy for situations that demand new rules to be generated de novo from already-acquired knowledge, by means of recombinations of existing knowledge modules (cf. Einstein's letter, Sec. 4.8). The alternative picture-based reasoning must be summoned to solve novel problems or problems that require generation of new rules or new recombinations of old rules (see, for example, Ref. 324). In principle, it is possible to write a standardized test question to evaluate creative thinking. However, such a question quickly loses its "luster" (freshness) after repeated exposures. Students who have prior knowledge about a particular test question soon master the detailed procedure of "cookbook-recipe" reasoning, and learn to solve the problem by faithfully repeating the prescribed procedures. Thus, a test for novelty quickly degenerates into one for routine.239 Some teachers claimed that the educational purpose is still served, if students have learned how to manipulate established rules to get the right answers. Some educators believed that this type of standardized question is still better than the type of simple-recall question, because at least it discourages rote memorization. However, the practice does not foster truly independent and critical thinking. Perhaps the most detrimental effect of standardized testing is the restriction imposed on the search space: the students need not search beyond the scope prescribed by the multiple statements in a test question. It is more or less like learning to ride a bicycle with training wheels permanently attached. The assurance of a single correct answer in a test question and the necessity to choose one and only one answer diminishes the students' tolerance of ambiguity, with which intuition as well as creativity was found to be correlated (see, for example, pp. 315-316 of Ref. 52 and pp. 223-225 of Ref. 653). Thus, standardized testing is a powerful way of enforcing convergent thinking because the "violator" (divergent thinker) will be penalized for lost time. Various commercial examination preparatory courses train students to develop the skill of quickly eliminating "unwanted" statements. Most of the time the student's thought process ends when only a single
statement survives the elimination process. Standardized testing thus fosters a poor habit of negligence in verification (personal observation). Needless to say, a prolonged exposure to standardized tests diminishes the ability for creative problem solving.680 The acquired habit of answering the easiest standardized question first is often transported to the work place: whenever a choice is allowed, an employee tends to pick the least challenging project (G. Matsumoto, personal communication). In view of the difficulty for teachers to generate novel standardized test questions that demand novel problem solving, students tend to settle on rule-based learning, because picture-based learning offers few advantages. Traditional teaching approaches also reinforce this trend. As pointed out by McDermott,448 instructors often teach from the top to down, from the general to the particular, and students are not actively engaged in the process of abstraction and generalization. In brief, very little inductive thinking is involved; the reasoning is almost entirely deductive. Perhaps a student's learning style is also habit forming. How new knowledge is encoded and how adaptive filtering is applied during encoding may also be acquired by training. A good teacher can also help students finetune their cognitive abilities in detecting subtle inconsistencies and thus enhancing their ability to detect the occasional fallacies of dogmas, championed by the establishment (see Ref. 439; cf. Ref. 399). However, it takes more than a cultivated sensitivity to foster creativity. It is well recognized that an oppressive atmosphere that encourages conformity and discourages challenges to authority is not conducive to divergent thinking.154'653'152'611 The unleashing of divergent thinking is partly habit forming and partly cognitive cultivating. Last but not least, since education and training is a Darwinian process enforced with rewards and punishments, the grading and evaluating systems crucially affect the style and the outcome of learning. After all, most students merely try to adapt to the imposed demands and to survive in the existing educational system (see Sec. 4.24 for concrete examples of how students altered their learning style in order to survive). Thus, the responsibility of enforcing the correct learning habit rests upon the teachers. However, social Darwinism also works on the teachers' side. A teacher's effectiveness is usually evaluated by students' written and often anonymous evaluation and by students' performances on standardized tests at the national level. As long as a majority of students' primary goal is getting good grades rather than actually learning the knowledge offered by the course work, writing thought-provoking standardized test questions is a practice that tends to make a teacher unpopular among students, thus
inviting poor student evaluation.674 Enormous social pressure, especially from the administration and parents, to shore up student performances also tends to encourage grade inflation (by either lowering the threshold of passing grades and/or writing easier test questions) and to encourage outright frauds,357 thus masking the underlying serious problem. Fortunately, there are no lack of conscientious teachers. However, even they must face a dilemma. A colleague of mine told me that she always wrote new questions, because she knew that students memorize their huge collections of old standardized test questions. However, in pursuit of novelty, she soon ran out of good questions, and was forced to write questions of dubious quality. As a consequence, the examinations look more and more like "trivial pursuits," which, in turn, demand more and more intensified rote memorization, thus exacerbating the vicious cycle. The situation between the testees and the testers is akin to the arms race; it will continue until one side can no longer afford to continue. Are teachers merely the victims of circumstances and the reluctant perpetrators of the current educational problems? I think not. As a veteran teacher in a medical school myself, I continue to be amazed by our own collective lack of attention to students' cognitive development. The accelerating pace of information explosion has led many teachers to give an increasingly exaggerated emphasis on the transfer of domain-specific knowledge at the expense of cognitive development. In my opinion, only those fractions of domain-specific knowledge necessary for building a skeleton of concepts and framework of fundamental understanding need to be emphasized; detailed knowledge that is useful, but hard to remember, should be omitted or deferred to an advanced course in the future. It would be a waste of time to require medical students to memorize detailed knowledge however useful it may be. Prior to this technology-generated information explosion, West712 warned that most knowledge transferred to medical students in school will either become obsolete or forgotten, by the time they practice medicine. It is even more so today. Few people can remember, for an extended period, factual information acquired by purely rote memorization, if they use it only once at the time of examination. However, experience indicates that clinicians can remember this type of knowledge, such as normal blood electrolyte concentrations, without much effort if they use it routinely on a daily basis. Some pundits insisted that students must commit important facts to memory, because "you never know when it will become useful." They did have a point to make, but their position was inconsistent. What
they deemed important is only a limited fraction of important domainspecific knowledge, in view of the ever-accelerating information explosion. What they had chosen to neglect might be just as useful. For example, the traditional neglect of physics in medical education often put physicians specialized in nuclear medicine in a difficult position, by preventing them from gaining sufficient knowledge of their own expertise. Criteria for selecting topics to be included in the medical curriculum are therefore somewhat arbitrary. The pundits overlooked the big picture: unless one is idling, time wasted in memorizing soon-to-become-obsolete or soon-tobe-forgotten knowledge must come out of time that could have otherwise been diverted to building a conceptual framework. Without the guidance of a sound conceptual framework, fragmented knowledge can sometimes become a dangerous weapon, which may inadvertently be used to kill patients through misuses or abuses of memorized rules (Sec. 4.7). As Alexander Pope once said, "A little learning is a dangerous thing" (p. 298 of Ref. 50). A combination of rule-based learning and compartmentalization of knowledge culminated in the following true story: a patient with pulmonary edema and kidney failure was prescribed with a diuretic to relieve the edema, but the ailing kidney would not cooperate (N. Rossi, personal communication). Regarding the excessive emphasis on domain-specific knowledge, John West, a well-known physiologist and textbook author, pointed out a chilling prospect: "In fact, in our medical school there is continual pressure from some courses to increase the amount of material and expand their courses because this can bolster their case for more resources" (p. 389 of Ref. 711). On this disturbing expose, I have no comment other than repeating what Deci and coworkers once claimed: extrinsic rewards can undermine intrinsic motivation. If we, the learners, must choose and pick what we ought to learn and retain, a nagging question remains: In view of the common understanding that some, if not all, discoveries cannot be planned, how do we know, ahead of time, that a particular piece of knowledge that we presently judge as unlikely to be used later will actually not be needed for solving an important problem in the future? The answer is: we do not know for sure, but we do our best to judge. (Textbook authors do just that.) It is therefore important to retain the ability to learn new knowledge without the formal help of a teacher after we leave school for good. In other words, students should make sure that they have learned how to learn while they are still in school. This ability will enable us to learn, in the future, what we have never learned, or to relearn what we have but did not learn well so as to make
up the deficiencies incurred by our own poor foresight or our own lack of previous efforts. However, it may not always be possible to make up a deficiency. For example, one suddenly recognizes, in mid-life, the need to learn advanced physics but finds it extremely difficult to do so without an adequate mathematics background and finds it even more difficult to relearn mathematics without going back to the most fundamental aspects of mathematical knowledge and starting from scratch. In this regard, the planning of a sound education must give priority to those topics of fundamental knowledge that would be difficult to acquire at an older age (e.g., foreign languages) or without a systematic and extended effort (e.g., mathematics and other basic and general topics).

Ironically, the peril of overemphasizing domain-specific knowledge is not a new revelation. It is refreshing to revisit a prophetic position taken by Robert Hutchins as early as 1931. The following lines were originally quoted in Rosen's book Anticipatory Systems (p. 3 of Ref. 568). The matter is of sufficient importance to warrant a full reproduction of the original quotations here.

Science is not the collection of facts or the accumulation of data. A discipline does not become scientific merely because its professors have acquired a great deal of information. Facts do not arrange themselves. Facts do not solve problems. I do not wish to be misunderstood. We must get the facts. We must get them all .... But at the same time we must raise the question whether facts alone will settle our difficulties for us. And we must raise the question whether ... the accumulation and distribution of facts is likely to lead us through the mazes of a world whose complications have been produced by the facts we have discovered.

Elsewhere, Hutchins said,

The gadgeteers and data collectors, masquerading as scientists, have threatened to become the supreme chieftains of the scholarly world. As the Renaissance could accuse the Middle Ages of being rich in principles and poor in facts, we are now entitled to enquire whether we are not rich in facts and poor in principles. Rational thought is the only basis of education and research. Whether we know it or not, it has been responsible for our scientific success; its absence has been responsible for our bewilderment ...
Facts are the core of an anti-intellectual curriculum. The scholars in a university which is trying to grapple with fundamentals will, I suggest, devote themselves first of all to a rational analysis of the principles of each subject matter. They will seek to establish general propositions under which the facts they gather may be subsumed. I repeat, they would not cease to gather facts, but they would know what facts to look for, what they wanted them for, and what to do with them after they got them. It is apparent that Hutchins, a university administrator by profession and a lawyer by training, understood science more profoundly than most biomedical educators that I have ever known. Why did medical educators pay so little attention to the cognitive aspect of medical education and medical practice? I suspect that this may have something to do with a gross misunderstanding of the cognitive process underlying intellectual activities, in general, and medical practice, in particular. In the collective perception of medical educators, the problem-solving ability and the potential of achievements as a physician or as a researcher are often equated to the acquisition and mastery of a huge amount of domain-specific knowledge alone. Little attention has been paid to training and teaching the students how to learn after they get their medical degree. Rather, the so-called continuing medical education is largely cosmetic and is often laden with all kinds of gimmicks for the physicians to gain credits without much work. As will be demonstrated next, it is not how much one knows, but how one makes use of what one knows. Shekerjian's interview of Stephen Jay Gould, a MacArthur Fellow and a well-respected essayist, may clarify this point (pp. 2-6 of Ref. 610). In spite of many readers' impression to the contrary, Gould confessed that he was not particularly well read but he could use everything he had ever read. Although most others access only a small fraction of what they have read, Gould claimed that he was using a hundred percent of what he had, thus giving an impression of knowing fifty times more than he actually did. Gould tried earnestly to tell his interviewer that what he really was good at was "making connections"; he was always trying to see a pattern in his zoological field work (cf. Bastick's notion of multicategorizations, Sec. 4.10). It was a low-key way of saying that his real strength was cognitive ability. The interviewer did not elaborate on why Gould could retain a wealth of knowledge and maintain its accessibility at his finger tips. I personally do not think that it was because he had a good
memory. A significant fraction of modern medical students usually have a good memory to retain factual information until examination time, but they often cannot relate memorized and relevant information to a particular novel problem. Based on our foregoing discussion, it is likely that Gould built the connections while acquiring domain-specific knowledge and integrating it into a relational knowledge structure for ease of subsequent retrieval. That may be why he was good at "making connections." Furthermore, the connections were reinforced each time he used the knowledge. In other words, his essay-writing activity actually strengthened his memory of the previously acquired domain-specific knowledge. I even suspect that he was capable of using more knowledge than he had acquired because making connections allowed him to formulate novel thoughts. He must be highly exploratory not only during his field trips but also during his reading sessions. There is little doubt that Gould was very good at parallel processing of knowledge; he claimed that "[he had] no trouble reading eight hundred articles and bringing them together into a single thread." Regrettably, such important messages are likely to elude some cognitive scientists, who continue to treat geniuses as an exotic commodity and watch their spectacular performances in awe, or treat geniuses as none other than highly motivated hard workers who turn out to be extremely knowledgeable. It may not be fair to put the entire blame on those teachers who advocate excessive emphasis on domain-specific knowledge, since some experts in creative research also held the same view. Weisberg regarded domain-specific knowledge as singularly important in creativity. He claimed: "The reason that one person produced some innovation, while another person did not, may be due to nothing more than the fact that the former knew something that the latter did not. Furthermore, this knowledge may not have been of an extraordinary sort" (pp. 248-249 of Ref. 707). Although I tend to agree with the second half of his claim, the first half deserves a serious rebuttal. He also challenged Simonton's report on the inverted U-shaped relation between education and creativity (Sec. 4.21). He complained that Simonton did not investigate the direct relation between knowledge and creativity. Weisberg obviously confused Simonton's statistical statement with its applicability to individuals; an individual included in a sample may or may not be representative of the sample. For example, Einstein did eventually obtain his doctorate degree though he had procrastinated his submission of habilitationsschrift.501 However, this counter-example is insufficient to refute Simonton's statistical conclusion.
No one in their right mind would deny the importance of domain-specific knowledge in creative problem solving. However, Weisberg missed some important points regarding how domain-specific knowledge is mobilized to solve a novel problem. Just because one has an advanced degree does not mean that one has retained most of the previously acquired knowledge or is able to match the still-remembered and relevant knowledge with a novel problem. As Gould's introspection testified, knowledge retained by a creative individual must be of quite a different nature than that retained by a relatively non-creative individual. Sample heterogeneity should not be overlooked in dealing with the issue of knowledge. Consider the following big picture and exercise one's common sense. In most areas of scientific endeavors, investigators with rich domain-specific knowledge far outnumber those who are genuinely creative. This discrepancy cannot be explained by motivation alone, which Weisberg also stressed, since cut-throat competition is rather common in modern scientific investigations. Weisberg also overlooked the fact that domain-specific knowledge can be acquired "on the job" when the need arises, if one has previously learned how to learn.

That domain-specific knowledge, together with a strong desire to succeed, is insufficient for creative problem solving was demonstrated by the following personal encounter. A couple of years ago, a medical student consulted me regarding a test problem that had also befuddled many of our highly motivated high-achievers over the past years. This problem addresses two pieces of domain-specific knowledge: a principle of diffusion known as Fick's law, and a related second principle regarding osmosis. Solving this problem demands the use of Fick's law first, which triggers the process of osmosis. As a consequence, the subsequent change of solute concentration demands the use of Fick's law a second time. The student's predicament was forgetting to apply Fick's law the second time: he had not held both principles in working memory until the problem was solved. There was no doubt that the student knew both principles well, otherwise he would not have used them correctly once each. I then suggested that he hold both principles in his mind concurrently or, alternatively, draw a diagram, with both principles clearly marked, on a piece of paper as a visual aid. In response to my suggestion, the student sighed and confessed that his learning had indeed been auditory rather than visual. Here, the student tried to tell me that he learned sequentially and, therefore, when he invoked the second principle in his thought, the first principle was pushed out of his working memory. In this case, the relevant knowledge was of the ordinary sort, but it took visual thinking to mobilize it. Of course, once known, the rote, sequential procedure can be memorized as a "cookbook-recipe": Fick's law, osmosis, and Fick's law, again. That was how most other students did it, and that is also the reason why it pays for students to collect and memorize old test questions, if high grades are their only concern.
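The two-step chain of reasoning can be laid out explicitly in a minimal numerical sketch. The sketch below is my own illustration, not the actual test problem; the compartment sizes, permeability constants and time step are arbitrary assumptions. Its only point is that the concentration changes produced by the osmotic water shift oblige one to apply Fick's law again on the next step, which is exactly the step the student dropped from working memory.

# Two well-mixed compartments separated by a membrane permeable to both
# solute and water. On every time step the same concentration difference
# drives (1) solute diffusion in the spirit of Fick's law and (2) an osmotic
# water flow toward the more concentrated side. The water shift changes the
# concentrations, so the diffusion term must be re-evaluated on the next
# pass -- the "Fick's law, osmosis, Fick's law again" chain.
# All numerical values are arbitrary and for illustration only.

D_SOLUTE = 0.05    # solute permeability (arbitrary units)
P_WATER = 0.50     # osmotic water conductance (arbitrary units)
DT, STEPS = 0.1, 500

n1, v1 = 10.0, 1.0   # compartment 1: solute amount and volume (concentrated side)
n2, v2 = 2.0, 1.0    # compartment 2: solute amount and volume (dilute side)

for _ in range(STEPS):
    c1, c2 = n1 / v1, n2 / v2            # current concentrations
    solute_flux = D_SOLUTE * (c1 - c2)   # Fick's law: solute moves down its gradient
    water_flux = P_WATER * (c1 - c2)     # osmosis: water moves toward the concentrated side
    n1 -= solute_flux * DT
    n2 += solute_flux * DT
    v1 += water_flux * DT
    v2 -= water_flux * DT

print(f"final concentrations: {n1 / v1:.3f} vs {n2 / v2:.3f}")

Holding this loop in mind as a single picture, rather than as two isolated rules, is precisely the kind of visual thinking that was suggested to the student.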
Following the glamorous trend of the computer revolution, many teachers are actively engaged in the search for novel teaching methods by means of computer-aided education. Emphasis is often laid on how students can efficiently search and retrieve information, but little attention has been paid to how these methods can help students filter and organize the information before committing it to long-term memory.98 Without a doubt, information explosion will be further exacerbated by the relative ease of generating and disseminating information, and the resulting overload will further discourage thinking (see also Sec. 4.23). As compared to conventional textbooks, quality control of the Internet content is even more difficult for obvious reasons. Low quality control will certainly be reflected in an exaggerated expansion of knowledge: now fallacies and valid knowledge may become equally represented, or even worse. The advantages of computer-aided learning are generally recognized: the ease of making multi-media presentations (with animation, in particular), the ease of updating the content, and the widespread availability that transcends the limitation of distance, to name a few. However, some other advantages are rather dubious. For example, the widely acclaimed "links" actually present more problems than solutions. First, the links are predetermined and limited by the imagination of the designers; there is no provision for creating novel links without first becoming a network "hacker." Second, the limited screen size of a computer monitor often requires segmentation (serialization) of information with a predetermined sequence of links. Thus, the current technology of computer-aided education lacks the feasibility of parallel processing and random access made possible by browsing a book. Although these limitations are not impossible to overcome in the future, the present difficulty actually hampers the practice of picture-based reasoning in spite of the plethora of graphic information that floods the computer screen. By the same token, a computer-aided literature search still cannot completely replace a search by browsing the shelves of a library. A search based on keyword matching alone is likely to miss a significant fraction of the search space made available by a conventional search that combines both user-selected words and pictures. Keywords, no matter how comprehensive
and elaborate, can never fully capture the essence of an article. For example, a heretofore unknown application of a new scientific finding will never appear in the keyword list of the article reporting that finding; only a prepared mind can recognize it and come up with the missing keyword as an after-thought. The tendency to go directly to the "targets" of a successful search also inhibits exploration of "collateral" information and prevents the accidental finding of unexpected information, again because of the lack of feasibility of parallel processing and random access. In both cases, the imagination of the designers imposes a "top-down" restriction, thus making the technology a double-edged sword: the links speed up the search but also eliminate the alternatives that the designers neglect. In brief, the designers threaten to "robotize" the users, and the users unwittingly relinquish their autonomy. A rush to jump on the bandwagon is thus ill-advised. At the present stage, it is better to use the technology as a supplement to, rather than a replacement of, the conventional approach.

Compared to earlier generations, biomedical students of the current generation know more but understand less, in most areas other than molecular biology. This was in part caused by the decline in mathematics aptitude and a deficiency in physics knowledge that made certain fundamental knowledge off limits to them.711 However, I suspect that the wide prevalence of the practice of exclusively rule-based reasoning may be an even more crucial factor: the practitioners often forsake their intuition and opt for the safe haven of following learned rules. As a consequence, even mathematics was learned as rote procedures and meaningless formulas.235 For example, a graduate of electrical engineering knew the formula for computing the impedance of a capacitative element or an inductive element, but had no idea what impedance means, nor why the formula works. However, this engineer had the potential to understand an intuitive explanation, which his undergraduate teacher never bothered to provide (personal observation). My suspicion was also supported by an anecdotal observation that natural science students are less inclined to think than humanities and social science students. Furthermore, the "epidemic" tended to spread from the population of biomedical students to the rest of the student populations. The difference in learning styles is even present between sub-populations of biomedical students: premedical versus nursing students (see Sec. 4.24). Gardner pointed out, in his book The Unschooled Mind,223 that there is a big gap between understanding specified by the curriculum standard — i.e., performing satisfactorily in certain tests — and genuine understanding, which the school authority does not dare/care to ask. Gardner cited a case
which he considered most stunning: students who receive honor grades in college-level physics courses are frequently unable to solve basic problems and questions encountered in a form slightly different from that on which they have been formally instructed and tested (p. 3 of Ref. 223). In Gardner's words, the school authority expects the students to answer questions on a multiple-choice test in a certain way, or carry out a problem set in a specified manner, thus enforcing convergent thinking. As a consequence, understanding of a topic or concept has acquired shades of meaning (in the order of increasing levels of understanding): the student a) knows the terminology, b) can regurgitate the details on demand, c) knows how to manipulate the learned rules to obtain correct answers in standardized testing, or d) knows how a piece of new knowledge was derived, has verified its validity, and knows how to combine it with previously acquired knowledge so as to arrive at a novel conclusion (cf. gray scale of understanding, Sec. 4.10). However, with few exceptions, they have never bothered to check the validity of newly acquired knowledge against previously acquired knowledge and have not the foggiest idea why the knowledge is valid, or even worse: a medical student candidly confessed to me that she could not care less whether what she had learned in school was true or not, because "there is too much to learn" (personal observation). These students are like the agent in Searle's Chinese Room argument:602,604,605 the agent knows how to convert an input into the appropriate output by faithfully following the rules of conversion but has no idea about the reason why these rules work. More recently, the situation has further deteriorated: some students now answer a multiple-choice test question by matching keywords or by recognizing phrasing styles, without regard to the associated thought content; they acquire the proficiency to achieve their goal by collecting and memorizing old test questions. Although it appeared to be a novel approach, it has been a widespread practice — by foreign students aspiring to study in the U.S. — as Princeton's Educational Testing Service had to learn the hard way.239 Some of these students sometimes did not recognize a recycled test question if it had been rephrased with a different set of keywords (personal observation). This type of "disability" is often masked by the use of standardized tests in the National (Medical) Board Examination, which is the ultimate yardstick for measuring the effectiveness of medical education.

Paralleling the alarming decline in student performances is an equally alarming increase in the incidence of the so-called learning disability: it tripled between 1976 and 1982 (Chapter 6 of Ref. 486). Since the mid-19th
century educational reformers have debated the relative influence of heredity and environment on the success of students in school. In the 1960s, explanations of academic failure were centered around environmental aspects of behavioral and educational problems, such as socioeconomic factors and family deprivation. In the 1980s, such social explanations were virtually replaced by biological ones (cf. nature vs. nurture, Sec. 4.24). As pointed out by Nelkin and Tancredi,486 biological approaches, reinforced by the availability of diagnostic instruments, meet institutional needs at a time when greater accountability is demanded of the schools. As a consequence, countless pills of Ritalin have been shoved down the reluctant throats of schoolchildren, and the school health-care officers have become the biggest legalized "drug pushers" in the United States. No wonder these children have trouble "kicking the habit" after they reach adulthood. In light of the foregoing analysis, all these biological approaches are seriously flawed for at least one reason: the approaches were formulated under the highly questionable assumption that standardized testing is a valid measure of learning and "academic success" (see Sec. 4.23). In my opinion, those high-achievers who "excel" at exclusively rule-based reasoning are truly learning-disabled, whereas some, if not all, of those who are labeled learning-disabled by school health-care officers may turn out to be merely temperamentally unprepared to be "robotized," and, therefore, must be considered normal by default. I suspect that many, if not all, cases of the diagnosed "attention deficit disorder"700 exhibited little more than normal reactions to boring lectures that were heavily laden with facts but few insights.

4.23. Substituted targets and goals in social engineering

Sample heterogeneity discussed in Sec. 4.21 may also appear in the time domain. Combining or comparing samples from different eras with vastly different conditions can be misleading. A case in point is the Scholastic Aptitude Test (SAT), which has increasingly drawn criticism from some educators in recent years.341,205 Yet, a former leader of a major public university369 claimed that "[the] SAT has proved for 75 years that it is a valuable part of the admissions processes, and its role should be preserved." The problem is that times have changed. What was true a decade or a century ago may no longer be true now. The factors leading to the present status of the U.S. educational system are of course complex and perhaps difficult, if not impossible, to thoroughly
itemize. Here, I speculate that the main factors are information explosion and fierce competition, which seem to have driven some students to abandon the time-honored study habit of learning through understanding, thus invalidating the equally time-honored SAT. How can information explosion and fierce competition possibly invalidate standardized testing that seemed to work well in the past?

Monitoring the effectiveness of education requires measuring the progress of students' learning. Since learning is difficult to measure directly in a system that must cope with mass production on an assembly line, standardized tests such as the SAT were designed to ensure a uniform and objective educational standard. By increasing college access, the SAT system has benefited, in the past, countless students who could not afford to attend an "elite" high school.369 However, the system has turned sour over time. The reason may be speculated as follows. Although students' effective learning is the real target of the educational policy, enhancing students' test scores becomes the substituted target, i.e., a surrogate of the real target. Since a student's academic performance is linked to the opportunity for a job or further career advancement, the substituted target of the educational policy becomes the substituted goal of students' learning effort, from the point of view of students, parents, and even local school administrators. The externally imposed goal then determines how most students learn in school (Sec. 4.21). In other words, the set goal strongly influences the selection and implementation of a "control law" that governs the students' learning behavior.

The problem is that the control law (learning strategy) that leads to good grades in standardized tests is not uniquely defined. Some strategies lead to effective learning and problem-solving skills but some others do not. Whereas picture-based learning ties standardized-test scores to actual learning, rule-based learning does not. Although rule-based learning seems to grant a slight advantage in standardized tests, both learning styles can meet the test standard as long as the competition is not too intense and the students are not overwhelmed by the flood of information. That seemed to be the case in the past. Subsequently, information explosion and fierce competition began to force inventive students to seek means of maximizing their test scores at the expense of actual learning, and rule-based learning simply rose to the occasion for good reason (Sec. 4.24). Again, extrinsic rewards that demand fierce competition undermine the intrinsic motivation to learn. The original tight coupling between the real and the substituted targets was thus
shattered. The SAT has thus become an instrument for enhancing college access for those who are good at memorization and rote learning, at the expense of those who are willing to learn to think.

There is no lack of public awareness, in the United States, of the need for educational reform since the appearance of the document A Nation at Risk, which was known for a punch line in its opening paragraph: "a rising tide of mediocrity."220 However, the policy makers lacked either a sufficient understanding of the problem or the political courage and willpower to confront the problem squarely. Regrettably, most, if not all, public policies implemented since then have been "top-down" approaches, engineered to alter student behaviors without bothering to find out how the students would react to the "top-down" requirements imposed by the educational establishment. It is puzzling why policy makers have never bothered to apply game theory to investigate and predict student behaviors under the imposed educational system. Actually, it is not infrequent to hear students complain directly about the current educational system: they are inundated and overwhelmed with facts, facts, and more facts. Some students complain that they do not have time to think because they are too busy learning (or, rather, memorizing) new information. Presumably, they adopt rule-based learning for the same reason. Collectively, we educators simply ignored their plight.

Two things that can be done and may show immediate effects are: toning down the emphasis on facts and changing the format of testing. It may not be necessary to abolish the SAT entirely. Adding a small component of essay questions to the existing test may be sufficient to alter student behavior. After all, grade-chasers — who are commonly known for their penchant for using hair-splitting arguments to squeeze a fraction of a grade point from a reluctant teacher — are not going to abandon the essay parts. They may thus be forced to make an effort to understand and assimilate the learned materials instead of swallowing facts without chewing and regurgitating them in nearly the same form in which they were consumed. However, resistance from special interest groups (including educators themselves) to newly instituted reforms may be formidable and is well anticipated. For example, opponents of essay tests will most likely base their argument on the subjectivity of the tests. Although subjectivity can be minimized by using multiple examiners, it cannot be reduced to nil. On the other hand, learning a healthy dose of subjectivity from the teachers is not a bad idea; that is how one learns to make value judgments. Of course, there are other problems, such as imposed dogmatism, availability of qualified teachers,
etc. However, in view of the great harm inflicted by standardized testing, essay testing is the lesser of the two evils.

Ultimately, a change of the society's value system is imperative. A society that views higher education as a ticket to the good life is not going to support a reform that demands thinking. Likewise, educators who are proud of how much information they know rather than how much insight they have are naturally going to expect students to do the same and force them to consume knowledge of the trivial-pursuit type. The challenge is to disentangle all these problems before they become intractable. But that is what human intelligence is for. Must we wait till an unprecedented crisis erupts?

Arguably, manipulating the testing systems is tantamount to social engineering. Social engineering is essentially a top-down approach, whose effect is usually limited. However, those who are opposed to social engineering should be reminded that preserving the status quo of our current educational system perpetuates a faltering brand of social engineering. Concurrently, we must seek additional bottom-up measures. For example, explaining the rationale behind the policy to parents and students may convince them and enlist their cooperation.

4.24. Cognitive development: nature versus nurture

The above discussion regarding social engineering implies that an individual's cognitive performance can be altered by training or education. If so, to what extent does nature or nurture contribute to an individual's cognitive development? Because of the topic's complexity, there has been very little discussion of the influence of nature vs. nurture on creativity (see Ref. 688 for a review). However, there is little doubt that intelligence has a significant genetic determinant. The question is: How significant?

Psychometric tests of intelligence quotient (IQ) were designed to provide a stable measurement of innate cognitive ability over an individual's lifetime. This practice implies that an individual's intelligence does not change over his or her lifetime. However, since an IQ test differs from a conventional multiple-choice test primarily in the absence (in the former) and presence (in the latter) of domain-specific knowledge, we are hard-pressed to believe that nurture makes no contribution to an individual's performance on an IQ test. If factors such as affect and motivation can influence an individual's performance on regular multiple-choice tests of domain-specific knowledge, so can they influence the same individual's performance on an IQ test (Sec. 4.8).
Skepticism towards the validity of IQ tests is nothing new. Most detractors focused on the influence of macroscopic environmental parameters, such as socioeconomic status, intact family, and the like. Many investigations regarding IQ were based on twin studies (e.g., see Refs. 617, 352 and 198). Superficially, twin studies strengthened the "genocentric" view, and placed a premium on heredity as a primary determinant of intelligence. By reanalyzing past data of twin studies, Farber198 found that the hypothesis that IQ is determined primarily by heredity appears untenable. Rampant practices of pooling together data from different age groups with varied data quality might have led to misleading conclusions (sample heterogeneity).

Farber was probably the first to suggest the importance of microscopic environmental factors, intrapsychic events, motivation, and others that cannot be readily measured and expressed in numbers. Traditionally, these factors were dismissed by investigators. However, Farber pointed out the inconsistency: while they dismissed microscopic environmental factors as neither visible nor measurable, they embraced the idea of intelligence, which is not a trait that can be seen and measured directly and quantitatively. She suggested that the problem should be analyzed from the perspective of cognitive development. Factors such as intrapsychic events caused by mutual contacts in twin studies exert their effect primarily during a "window" of vulnerability (pp. 208-210 of Ref. 198). Factors that cannot be measured directly thus become hereditary in disguise, as a consequence of sample heterogeneity (Sec. 4.21). Farber suggested that minute interactions early in life may be more important than the socioeconomic factor per se. Furthermore, the socioeconomic factor may find its expression through psychic effects such as "the nuances of distaste, anger, and fear, and internalizing these into one's own intrapsychic realities that leads to a failure to realize one's own cognitive potential." In view of possible nonlinear effects demonstrated by Lorenz's weather forecasting studies, Farber's warning should not be casually dismissed (cf. Butterfly Effect, Sec. 2.4). As we shall see, the vulnerability of an individual's cognitive development is not just limited to early childhood. It can be extended to school years, when adverse environmental influences — due to a failing educational system — exert a compounding effect on the acquisition of domain-specific knowledge as well as on further cognitive development.

I agree with Farber's suggestion that it is more productive to consider the issue of cognitive development than that of intelligence per se. It is particularly noteworthy that Farber emphasized patterns over numbers in twin studies. Farber apparently recognized the important role of picture-based reasoning in twin
studies, as opposed to the more traditional rule-based reasoning.

Here, we shall consider a related question from a developmental perspective: Can external influences suppress an individual's creativity? Our failing educational system suggests an affirmative answer. Most, if not all, healthy children are born curious, acting just like little scientists, or even little geniuses: they learn while they play and explore, as documented in the monumental work of Jean Piaget.516,517 As time passes through childhood and adolescence, they mature, but most of them also lose their characteristic innocence, and much of their inborn creativity. This suggests that nurture may play a significant role in the maintenance of inborn creativity. Perhaps a conformist is more readily formed in a repressive society than in an open society. A child who is constantly scolded by parents for asking inquisitive questions is likely to learn the lesson and stop asking unwanted questions after growing up. These subtle changes critically affect subsequent cognitive performance. After all, creativity, or the lack of it, is the manifestation of "mind" habit.439 A habit can be changed by training or by willpower; so can a mind habit.

Twin studies have demonstrated that personality is more affected by environment than any other area of human functioning (pp. 269-271 of Ref. 198). The above-mentioned influences of family or cultural pressures are visibly identifiable. Farber further pointed out the power of minute interpersonal and intrapsychic events in shaping personality patterns at a level so fundamental (and at such a critical period) as to be almost indistinguishable from effects of heredity. Farber called special attention to complex factors that interact in a highly nonlinear fashion. Now that our analysis has suggested that the same microscopic environmental factors may be at work in shaping the mind habit and cognitive development, we wonder how past educational reforms that consistently paid attention only to macroscopic environments — e.g., funding, small class teaching, grade performances, etc. — but neglected microscopic environments — e.g., how students reacted to work loads and test formats — could have succeeded.

The current educational system thus becomes the prime suspect for ruining students' inborn creativity. A cure is tantamount to "de-programming" them and restoring their child-like innocence and exploratory instinct. It is of interest to note that some students could not remember what they had learned in high school, as if they had never taken those courses, whereas some other students could still remember what they had learned in high school but not what they had learned more recently in college.
The fact that some students forgot more recently acquired knowledge but retained knowledge acquired a long time ago suggests that forgetting is not simply related to the passage of time. The study habit employed at a given period apparently had a profound effect on the retention of knowledge acquired during that period. Presumably, some students changed their study habit before or during high school years, but others changed at a later time in college. Thus, progress made through educational reforms at the K-12 (kindergarten, elementary and high school) level might be undone by higher education. Could this be the reason why creativity peaks at the early baccalaureate level, as found by Simonton? If so, I suspect that there might have been a downward shift of that peak since Simonton reported his finding. The next question is: What caused the change? The following personal observations may shed some light on the question.

A student who had been a Ph.D. candidate in high-energy physics in Poland decided to terminate his thesis work prematurely and, after arriving in the United States, switched to radiation medicine for better job opportunities and security. While studying anatomy, he switched his study habit upon the advice of fellow biology students ("Unlike physics, biology can only be memorized!"). His failure on the first (midterm) examination brought him to me. I subsequently de-programmed him just in time to rescue him from being kicked out of the radiation medicine program. Another student failed because she devoted an inordinate amount of time to transcribing tape-recorded lectures, simply because many others did so. Improvement was noticed within a week after I persuaded her to abandon the practice.

These incidents suggest that external influences affect an individual's cognitive performance by forcing the individual to change his or her study habit (Sec. 4.23). In most cases, it was the workload and/or peer pressure that prompted the change. When a few students scored a high grade by just memorizing old test questions, many others who were overloaded with knowledge of the "trivial-pursuit" type would be tempted to abandon their good old study habit. Once one chooses the path of rote learning, other factors kick in. For example, learning after the change is no longer fun but a burden or even mental torture. Psychological research on affect indicates that the effectiveness of learning is compromised under these circumstances, not to mention the loss of curiosity (Sec. 4.8). Even if one changes one's mind at a later time and decides to switch back to the old habit, it is no longer possible to do so because, without the retention of some previously acquired domain-specific
knowledge, it is impossible to understand new knowledge that is supposed to be built on top of its prerequisites. One thus becomes forever trapped in the vicious cycle of rote learning. Thus, a student superficially diagnosed as learning-disabled may be quite intelligent outside of school learning.

In fact, the cognitive performance of students sometimes exhibited a gross disparity even during the same period of time, as indicated by the following example. A student failed to answer a question regarding positive and negative feedback mechanisms framed in the context of physiology, but had no trouble answering a similar question framed in terms of social or workaday experience. When I asked this student why he could answer the latter but not the former question, he told me that it was "because there is no book to lean upon [in the latter case]." But I suspect that another reason was that the problem did not threaten his ego (cf. Sec. 4.21). Apparently, some students are conditioned to respond to a study- or examination-related question by means of the "school mode" of thinking (i.e., regurgitation of canned answers), whereas they tend to respond to a non-threatening inquiry by means of the "street mode" of thinking, i.e., reliance on intuition. Likewise, many of our medical students did not have an intuitive feeling for the simple formula: density = weight/volume, or concentration = mass/volume. They either memorized it or looked it up in their lecture notes. However, while they shop in a grocery store, they probably have no trouble with the formula: unit price = total price/total volume, or unit price = total price/total weight. The switching between the two modes of thinking, like a Pavlovian dog's conditioned responses, offers a simple explanation for the disparity. Ironically, these examples also demonstrate that behaviorism is not totally irrelevant in education (Sec. 4.14).

In contrast, our physical therapy students seemed to have no such trouble while in a class or laboratory session. Why does such a cultural difference exist? Here is my speculation. Unlike the faculty of a nursing school or a physical therapy department, few of the basic medical science faculty members have clinical experience. Having been trained as reductionists (and molecular biology is reductionism at its best), they tend to lay an exaggerated emphasis upon the acquisition of domain-specific knowledge at the expense of creative thinking. The advent of modern "high-tech" instrumentation is both a blessing and a curse to a reductionist. The blessing is obvious. The curse is a diminished demand on the investigator's intuition but an increased demand on the investigator's command of domain-specific knowledge; it becomes more important to find out what new techniques or new instruments competitors are using than to discover a novel way of solving
an intractable problem. (Of course, innovative investigators with immense curiosity are immune to the temptation, and they are the ones who keep developing novel approaches and keep others busy playing catch-up.) As a consequence, school-mode thinking simply rose to the occasion in this (microscopic) environment. Thanks to the deprogramming effort of the clinical faculty, our medical schools were able to turn out decent physicians in the past. The fact that nursing and physical therapy students are less vulnerable or immune to the ailment of school-mode thinking may have to do with their concurrent exposure to the teaching of the clinical faculty. However, I wonder how long the success of this deprogramming remedy for medical students will last. In view of the fact that medical schools tend to attract and admit higher academic achievers than other biomedical professional schools do, I suspect that students' ability to think and to utilize acquired knowledge may now be, statistically speaking, inversely correlated with their grades. This is tantamount to "grade inversion" and is a more serious problem than "grade inflation." I suspect that the true culprit is the excessive emphasis laid on domain-specific knowledge. The destructive power of this emphasis is fully unleashed by the information explosion in medical knowledge and further aggravated by fierce competition in the job market.

Rumor has it that some manufacturing companies have begun to exclude "A" graduates or graduates of specific elite universities from consideration for employment, thus risking the charge of discrimination against individuals and defamation of institutions (anecdotal observations). But the said companies countered that these high-achievers tended to be poor troubleshooters, and claimed self-interest as a right of privately owned companies. Apparently, these employers have already recognized the trend of grade inversion. Speaking of discrimination, it is curious that few, if any, students or parents have realized that the prevalent testing practice of many U.S. institutions of higher education constitutes a type of reverse discrimination — discrimination against the cognitively capable. These manufacturing companies merely tried to set the record straight. However, it is the public that is ultimately victimized by the discrimination, for the senseless loss of talent.

It is apparent that all past U.S. educational reforms focused on factors that affect students collectively: socioeconomic factors, class size, the competence of teachers, the methods of teaching, the availability of equipment (computers), etc. Few have examined the problem from individual students' perspective. Even today, policy makers are still
"beating around the bush." Regrettably, any educational reform that overlooks the psychosocial aspect of learning (the microscopic environment) is doomed to fail. A rush to raise testing scores is tantamount to pouring fuel upon a raging fire: it exacerbates grade inflation, or even turns it into grade inversion.

A nature-versus-nurture debate has always been emotionally charged because of its serious political consequences (see Pinker's book The Blank Slate519). An emphasis on environmentalism often resulted in increased expenditures on social programs, whereas a shift of emphasis towards biological determinism resulted in cutbacks in spending. Our biocomputing perspective supports the view that both nature and nurture contribute significantly to human cognitive development. Likewise, primatologist de Waal163 called for an end to the war of nature versus nurture. Here, our analysis does not suggest increased spending on education, but rather a move to scale back the man-made assault inadvertently waged by the current educational system on humans' natural cognitive development. Current debates on educational reform miss the point badly. For example, many educators thought that the disparity in computer access would create a "great divide" between the haves and the have-nots. Rather, it is the distorting effect of standardized testing and the indiscriminate use of digital computers for teaching that may fail most students, the rich and the poor alike. Of course, there will always be some survivors of the calamity. However, whether these minority survivors alone can compensate for the failure of the rest is questionable. This point will be analyzed in Sec. 4.25.
4.25. Is the crisis in U.S. science education false?

Now, we must examine a dissident opinion. In a Scientific American article, Gibbs and Fox235 claimed that "there is a cynical ritual" of public outbursts over an educational crisis in the U.S. since the 1940s. The authors doubted the usefulness of such outcries, and blamed them for the decline of public confidence in schools. Gibbs and Fox's article was written in the wake of an outcry following the disclosure of the results of the Third International Mathematics and Science Study (TIMSS) in 1999. First, they pointed out that past crises have led to increased national spending and legislation but have made little difference: efforts to reduce class size failed to enhance student scores. Second, they claimed that there was no evidence of a sudden decline in the science and mathematics knowledge of those leaving high school. On the contrary, as they further pointed out, national
test scores have been improving for more than a decade. They were led to the conclusion that teenagers nowadays know more about science and mathematics than their parents or grandparents. In addition, the number of college degrees awarded in science and engineering soared. The above two reasons were backed up by data from national surveys and/or other publicly accessible data. Some people say that "numbers do not lie." True, but we tend to forget that numbers can also mislead casual observers.

The authors went on to dismiss the usefulness of TIMSS as an indicator of science literacy because the test was based on "a battery of mostly multiple-choice questions emphasizing basic facts and procedures." To that I agree. I also agree with their assertion that "the tests don't get at long-term problem-solving skills and concepts about nature of science" (opinion of J. Myron Atkin of Stanford cited in their article). Thus, the authors added a third reason to question the helpfulness of howls of crisis: in an education crisis, the question of what schools ought to teach about science and mathematics gets overlooked in a rush to raise test scores. I could not agree more with the authors about this particular claim. The article "Six steps toward science and math literacy," posted by Scientific American staff, is also agreeable.

Yet, in a sidebar with the title "Dumb but not dumber," one of the authors (D.F.) cited a science literacy survey to support their claim that U.S. literacy in basic science has been steadily improving since the 1990s. We found this latter argument problematic. Let us take a look at the sample questions listed in the sidebar: a) What is a molecule? (11% answered correctly), b) What is DNA? (22% correct), c) Do lasers work by focusing sound waves? (39% correct), and d) How long does it take the earth to circle the sun? (48% correct). The author pointed out that U.S. adults did better on these questions than their counterparts in many countries that outperformed the U.S. in TIMSS. However, the ability to answer these questions does not constitute science literacy; these questions are merely "entry-level" trivial-pursuit questions that are similar to what one often encounters in many TV game shows in the United States! These questions are also the kind of basic facts which the authors had earlier dismissed as poor indicators of science literacy. The blatant and pervasive internal inconsistency in the authors' article thus makes a detailed rebuttal seem superfluous. However, I felt compelled to analyze another sidebar, "Wanted: strong thinkers," for a reason that will become apparent after my rebuttal.

The authors disputed a commonly held position: that the science and mathematics skills of high school graduates are critical to the health of the U.S.
economy. First, they cited a microeconomic analysis of a nation's competitiveness in the global economy that listed "adequacy of schooling" as a relatively unimportant factor as compared to "port infrastructure quality," and listed "quality of scientists and engineers" at an even lower priority. Second, they pointed out that many innovators responsible for today's national success left university during the time of the science education "crisis" of the early 1980s. Third, they pointed out that technological innovation often led to an expansion of jobs in the service sector, and suggested that the analytical and verbal reasoning abilities of personnel in the latter are more critical.

Lacking expertise in microeconomics, I will give the cited analysis the full benefit of the doubt. Given the complexity of factors governing economics, it is possible that factors other than education and the quality of scientists and engineers may be more critical for the short- or near-term health of a nation's economy, depending on how the term "quality" is defined and how quality is evaluated. The authors' argument was flawed on two counts. First, they apparently evaluated the quality of scientists and engineers in dichotomous terms: either competent or incompetent. In reality, there are many gradations of excellence or incompetence. If most scientists and engineers are sufficiently competent and are comparable or superior to those of other countries, then other factors will be more critical in the maintenance of a nation's economic health. Although a slight lag in the quality of scientists and engineers may be compensated for by gains made in other factors, a serious decline is another matter. That brings us to the second flaw in the authors' argument: a lack of appreciation of nonlinear interactions of complex factors (Sec. 4.21). Although only a relatively small percentage of scientists and engineers are the prime movers of technological innovation, maintenance of a small but critical mass of these "elite" scientists and engineers is necessary. Just because donating 500 cc of blood once in a while has no adverse effect on a healthy individual's long-term health does not mean that losing 3,000 cc of blood at once, or losing 500 cc per day over a period of several days, has no harmful effect. The authors' rigid interpretation of the microeconomic analysis — a blatant example of exclusively rule-based reasoning — thus defies common sense.

The authors also attempted to dismiss the crisis of the 1980s as false by linking today's innovators with the products of yesteryear's educational system. Again, just because an educational system produces a large crop of inferior products does not mean that each and every item is inferior. It is well known that some students can manage to learn in spite of their teachers and in spite of the educational system.
The authors further used numbers in a misleading way. Just because the economic gain stemming from technological innovations is more directly linked to an expansion of employment in the service industry than to that in the technology sector per se is no good reason why we should or can ignore the quality of scientists and engineers. The latter are supposed to be the innovators of future novel products that are presently lying in the "pipeline." Their quality now thus critically determines whether a subsequent economic boom will ever take place in the future. It is a pity that the authors let numbers blur their perception of the big picture of how a future prosperity for many can be tied to the innovations of only a few now.

However, I agree with the authors' criticism regarding mathematics and science education: "traditional math and science teaching, with its emphasis on memorizing facts and procedures, does students a disservice." I further agree that the general, rather than the specific, ability of "high-level" reasoning is more important, if I can take the liberty to treat exclusively rule-based reasoning and "low-level" reasoning as synonymous. It is no secret that science students trained under the current system, which is heavily laden with a reductionist vision, often engage in dichotomous thinking, and lack a rudimentary appreciation of nonlinear phenomena ("linear" thinking), let alone chaos and complexity theories. Voltaire lamented that "common sense is not so common" (p. 306 of Ref. 50). That is especially true among practitioners of exclusively rule-based reasoning. The authors correctly pointed out that what students learn in high school science is utterly irrelevant for high-level reasoning and for real life: merely memorizing jargon and formulas. That alone would be sufficient reason for a genuine reform in science education that changes the way science is tested. Instead, the authors opted for seeking alternative ways of teaching high-level reasoning. Superficially, their choice is in line with the anecdotal observation, mentioned in Sec. 4.22, that humanities students seem to do better than science students in "high-level" reasoning. Yet, overall, the approach does not make sense and is essentially defeatist, because it elects to leave science and engineering education behind and let it rot without intervention.

In the last paragraph of their article, however, the authors presented a persuasive argument that the ability to think and the ability to learn new knowledge continually after schooling are more important than the acquisition of a wealth of domain-specific knowledge during school years. Citing William Aspray, the authors pointed out that, given the incredible pace of technological innovation, it is impossible to find people "who have three years of
experience with a technology that's only 18 months old." This view echoed that of West,712 cited in Sec. 4.22. Thus, the biggest problem with the article is not that its arguments are all false. Rather, the authors blended truths with half-truths and fallacies. Furthermore, the main conclusion was based on an irrational thinking process: a type of data (performances on tests based primarily on regurgitation) is dismissed whenever it contradicts a preconceived idea but is conveniently resurrected whenever it supports another preconceived idea. Thus, the article contains all the elements of a joke or a comedy that also characterize rule-based reasoning: locally logical but globally absurd (Sec. 4.7). However, there is nothing funny about it. To concerned citizens, it would be a national tragedy if such an article were taken seriously by the public, keeping in mind the prestigious image projected by Scientific American. This article conjures up a chilling prospect that ought to make those who share my view shudder with horror: when mediocre thinkers become the chieftains of opinion mills, we are forever trapped in the vicious cycle of a "rising tide of mediocrity."

The authors' complaint about cyclic outcries of educational crisis deserves an additional comment. Doomsday predictions seem to have a curious twist. I recall seeing an article in The New York Times back in the early 1970s, predicting a future shortage of engineers. (As history indicates, a shortage of engineers did happen in the 1980s.) At that time, I thought that such a prediction could neither be affirmed nor dismissed ahead of time. I reasoned then: if everyone takes a doomsday prediction seriously and acts accordingly to remedy the situation, either through individual efforts or through public measures, the predicted dire event may be averted and the prediction may turn out to be false. On the other hand, if no one takes it seriously and everyone fails to take any necessary step to prevent its happening, then the prediction may actually come true. Thus, a doomsday prophecy is self-denying in nature. It differs from what is known as a self-fulfilling prophecy because a negative feedback mechanism is involved in the former, whereas a positive feedback mechanism is involved in the latter. Of course, I was too naive at that time, practicing dichotomous thinking which I now denounce: I thought that a prediction is either totally embraced or totally rejected by everyone. I also had too much faith in what the U.S. government could do. The reality is often somewhere in between. An intermediate impact that results in an inadequate remedy may alleviate or postpone a cataclysmic collapse, thus superficially falsifying the prophecy.
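The contrast between the two feedback mechanisms can be made concrete with a toy simulation. The following sketch is purely illustrative: the linear response rule and all parameter values are hypothetical assumptions of mine, not anything drawn from the cited literature. It only shows how the sign and strength of the public response to an alarm determine whether a prophecy defeats itself, fulfills itself, or is merely postponed.

# Illustrative toy model only: "risk" is the severity of the predicted
# problem (0..1) and "response_gain" is how strongly society reacts to the
# alarm at each step.  A positive gain corrects the problem (negative
# feedback); a negative gain aggravates it (positive feedback).  All
# numbers are hypothetical.

def simulate(risk, response_gain, steps=10):
    history = [round(risk, 3)]
    for _ in range(steps):
        alarm = risk                      # the alarm grows with the perceived risk
        risk = min(1.0, max(0.0, risk - response_gain * alarm))
        history.append(round(risk, 3))
    return history

# Self-denying prophecy: the alarm triggers remediation, and the risk decays.
print("negative feedback:", simulate(risk=0.8, response_gain=0.3))
# Self-fulfilling prophecy: the alarm triggers reactions that worsen the risk.
print("positive feedback:", simulate(risk=0.2, response_gain=-0.3))
# Inadequate remedy: a weak response only slows the decay, postponing rather
# than promptly averting the predicted collapse.
print("weak response:    ", simulate(risk=0.8, response_gain=0.05))

In the first run the predicted dire event is averted by the very act of predicting it; in the second, the prediction helps bring the event about; in the third, an inadequate remedy merely delays the reckoning.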
This time, my greatest fear is that previous outcries of crisis — apparently too many "false" alarms — may have lulled our collective perception into complacency. When the "cataclysm" actually materializes and my present prediction becomes tragically confirmed, I do not think I can take consolation in saying "I told you so."

4.26. Simulations of Gestalt phenomena in creativity

It is of interest to examine whether creativity can be simulated by digital computers. A related question is: Does scientific discovery have a logic? In a monumental treatise, The Logic of Scientific Discovery, Popper essentially renounced the message carried by the title of his book (p. 31 of Ref. 526):

The initial stage, the act of conceiving or inventing a theory, seems to me neither to call for logical analysis nor to be susceptible of it. The question how it happens that a new idea occurs to a man — whether it is a musical theme, a dramatic conflict, or a scientific theory — may be of great interest to empirical psychology; but it is irrelevant to the logical analysis of scientific knowledge.

Simon contested Popper's claim by citing several successful examples of problem-solving programs.621 Subsequently, Simon and coworkers succeeded in constructing additional problem-solving programs of ever-increasing capability. The pioneering contributions to computer-based creative problem solving by Simon and coworkers were compiled in several books.623,624,628 Here, we shall focus on the critique of Michael Wertheimer710 and Simon's critique of Wertheimer's critique.627 We shall then return to the dispute between Popper and Simon. Additional discussions will be deferred to Sec. 7.

Michael Wertheimer's analysis was based on the Gestaltist view of creativity outlined in Max Wertheimer's book Productive Thinking.709 Max Wertheimer distinguished two types of thinking: ("blind" or senseless) reproductive thinking and (truly insightful) productive thinking. Reproductive thinking manipulates mental structures but does not generate new mental structures, whereas productive thinking does both. While Michael Wertheimer acknowledged the accomplishments of AI computer programs such as the General Problem Solver (GPS),488,489 he thought that such programs perform only reproductive thinking. Specifically, he thought that crucial Gestalt elements, such as understanding (grasping both what is crucial in any given problem and why it is crucial), insight, and the associated
"aha" experience, are lacking in these programs. Furthermore, the construction of problem-representations was done by the programmer rather than by the computer program itself. Wertheimer dismissed a computer's learning as learning by rote ("mechanical" learning) rather than learning by understanding. Simon disagreed and claimed that all these had been accomplished by digital computers. Simon found that the definition of intuition was either missing or vague in the Gestalt literature. He could only seek helps from dictionaries: Webster's unabridged dictionary defined intuition as "the act of coming to direct knowledge or certainty without reasoning or inferring." Simon thought that intuition can be interpreted as essentially "recognition." He set as criteria for testing the presence of intuition the following attributes: the suddenness of apprehension or cognition, understanding. However, the detailed process is unreportable. As an illustration, Simon cited a program named EPAM (Elementary Perceiver and Memorizer),200'201 which is a model of the processes that occur during the performance of verbal learning and related tasks. When a stimulus is presented to EPAM, the program applies a sequence of tests to it, using the outcomes of the tests to sort it down a discrimination net until it is distinguished from alternative stimuli. A threshold is set in the discrimination net for recognition. EPAM can learn by experience and improve its discrimination net. Patterns need not be identical in order to be recognized as the same by EPAM; EPAM tests only some portion of the features of a pattern. EPAM can deal with similarity as well as identity of patterns. EPAM can indicate its recognition but no information is stored in short-term (working) memory about the specific tests in its recognition net that consummate the recognition. Therefore, the recognition process is not reportable. Simon thought that "the process named 'intuition' by Gestalt psychologists is none other than our familiar friend 'recognition'." Although this statement is a meaningful one to which we can agree, it merely replaces an illusive term with another. As we shall see, face recognition is a holistic process, which Simon's programs simulated but did not duplicate; it is close but not quite as close as Simon claimed. Simon also made an attempt to explain how computer programs, such as GPS and EPAM, could exhibit the "aha" phenomenon (cf. Sec. 4.10). In search of suitable objective evidence, Simon 627 used the performance of Kohler's chimpanzees to define an objective criterion for an "aha" occurrence. Kohler did extensive studies of chimpanzees' problem solving by contrasting insightful problem solving with problem solving by trial and error.377 The telltale evidence was that the apes, who were previously en-
engaged in fruitless attempts to reach their goal, suddenly shifted to a new sequence of apparently purposeful, connected behaviors well designed to attain the goal. Simon cited a well-known example to illustrate the "aha" phenomenon: a chimpanzee who, after an initial impasse, suddenly moved a box under a bunch of bananas hanging from the ceiling, retrieved a stick, climbed on the box, and knocked down the bananas. The chimpanzee was previously familiar with the use of a box (or several of them in a stack) or a stick, but never both, for retrieving bananas. The behavior change indicated that the chimpanzee was suddenly able to "put two and two together."

Simon applied the same criterion to his computer programs. Programs such as GPS and EPAM do planning in an abstract planning space. Once the program finds a solution in the abstract planning space, it returns to the original problem to implement the detailed execution (verification). The behavior of the computer changes from relatively aimless searching to a confident and apparently purposeful execution. The "aha" phenomenon marks the moment of recognition, which coincides with the demarcation between the end of planning and the onset of execution. This interpretation of the "aha" phenomenon differs from what Bastick proposed, described in Sec. 4.10; Simon was aware of it. Did the chimpanzee's change of behavior indicate primarily a change of its searching strategy or simply the moment of successful recognition? The two timings may or may not be exactly the same. Furthermore, a recognition is possible without a change of searching strategies, e.g., at the end of a lengthy systematic search. The problem was that Kohler's chimpanzees could neither talk nor use any sign language to communicate. So we will never know exactly what transpired. Kohler's interpretation was what Griffin referred to as informed inferences — a human's subjective interpretation of an objective behavioral pattern (Sec. 6.6).

Michael Wertheimer complained that the Gestaltist notion of intuition and insight had been misunderstood and badly distorted by cognitive scientists. Let us examine how EPAM works. Although EPAM practices digital computing, it simulates analog pattern recognition, and uses a finite number of criteria to make judgments (see Sec. 4.2 for a detailed discussion of analog vs. digital pattern recognition). EPAM searches through a trillion (10¹²) possibilities on an average run, and is able to quickly reach a step of recognition in about two-tenths of a second through the use of heuristics. The last test prior to recognition is simply the last straw that broke the proverbial camel's back, and is not the sole or main criterion that makes recognition possible. Naturally, the computer program does not keep track of all the
intermediate steps of testing — there are too many of them — and, therefore, EPAM cannot report exactly how it reaches its conclusion. The computer simply does not remember. However, this is not what intuition is all about, in view of our discussion presented in Sec. 4.10. Intuition is inherently difficult to articulate not because the details are forgotten but rather because the details are not even known to begin with: one simply has no clue, from the very outset, about the rationale behind an inspired critical decision. Mathematician Carl Friedrich Gauss, in referring to a long-standing problem which he had just solved, said, "The riddle solved itself as lightning strikes, and I myself could not tell or show the connection between what I knew before, what I last used to experiment with, and what produced the final success" (pp. 308-309 of Ref. 353). Tesla also used the metaphor of lightning to describe his "aha" experience (p. 44 of Ref. 662). He claimed that he drew his inspiration from reciting Goethe's poem, but he then went on to describe the visual image that he had seen in his "mental operations" (thinking). Both introspective descriptions highlighted the role played by visual thinking and parallel processing in the "aha" phenomenon; random access accounts for the suddenness as well as the spectacular lack of conscious awareness, as explained in Sec. 4.10.

Simon was perhaps the first to recognize that problem solving is an act of recognizing the solution (p. 117 of Ref. 629). However, Simon made no distinction between rule-based and picture-based reasoning, or between primary-process and secondary-process reasoning. Simon certainly appreciated the difference between sequential processing and parallel processing. But he insisted that a parallel process can be simulated by a sequential process and deliberately attempted to blur the distinction between the two processes. He thus missed the opportunity to link intuition to parallel processing and the absence of intuition to sequential processing. Rather than claiming a "home run," he would have been better off taking the partial credit: just acknowledging the distinction and conceding that pseudo-parallel processing merely meets part of the demand of true parallel processing. His simulation programs did not quite exhibit intuition but still did exceptionally well in solving novel problems.

A human problem-solver who uses rule-based reasoning to solve a problem may not remember each and every failed attempt, but usually can remember some specific details regarding a particular successful rule and how he or she stumbled on it, e.g., by using a laboratory notebook to keep track of major steps of reasoning. Had a computer tried hard enough — in view of its vast memory and hard disk space — it could have done better
than a human being. For example, the computer could record the particular heuristic that led to success and set up a software counter (pointer) to identify the particular rule or step of reasoning that led to the successful recognition. In this way, unfruitful heuristics and unsuccessful rules could be discarded and the previously used disk space reclaimed — the memory or hard disk record could be overwritten — so that the computer memory and disk storage space would not be prematurely overloaded. Here, we see that Simon's programs, in a desperate effort to simulate intuition, did not even try hard enough to recall at least some, if not all, details. We thus have to agree with Michael Wertheimer that Simon's interpretation of intuition constituted a misunderstanding and distortion of the Gestaltist notion of insight. But this misstep did not detract from Simon's ground-breaking contributions to computer-based creative problem solving.

Superficially, EPAM is a rule-based program. What sets it apart from those rule-based programs known as expert systems is the comprehensiveness of the heuristics and the relative "freedom" granted by the programmer. By increasing the number of criteria for matching the features and by allowing similarity instead of just identity, EPAM introduces a gray scale of matching, and converts a digital pattern recognition process into a quasi-analog pattern recognition process. Essentially, EPAM simulated parallel processing with pseudo-parallel processing (Sec. 4.11). What transpires in EPAM's design principle is analogous to a situation in quantum-mechanical molecular orbital theory: the discrete levels of orbitals are so numerous and so densely spaced that they coalesce and become a virtual continuum. In other words, by sheer numbers, a discrete process can simulate a continuous process. This is also the essence of simulations by means of pseudo-parallel processing: discrete features are so numerous and so densely packed that the recognition process approaches, but does not quite become, true analog pattern recognition with true parallel processing. Still, Simon's programs are impressive. Their success is rooted in the attempt to emulate the human thought process.

Whether one agrees with Michael Wertheimer or with Simon is tantamount to the question of whether the bottle is half empty or half full. It is difficult to expect an agreement between the two opposing views, because neither Wertheimer nor Simon allowed for a gray scale of recognition. In real life, however, recognition is not a "yes-or-no" dichotomous process.
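The notion of a gray scale of matching can be illustrated with a few lines of code. The following toy recognizer is a flattened stand-in for the discrimination net described above, with hypothetical feature names and an arbitrary threshold; it is not a reconstruction of the actual EPAM program. A stimulus is compared against stored feature lists, and a match is declared once the fraction of agreeing features crosses the threshold, so recognition is graded rather than all-or-none.

# Illustrative sketch only -- a toy, flattened analog of the discrimination
# net described above, not the actual EPAM program.  Feature names, stored
# patterns, and the threshold are hypothetical.

def recognize(stimulus, prototypes, threshold=0.6):
    # Score each stored prototype by the fraction of its features that the
    # stimulus matches; similarity, not identity, decides recognition.
    best_name, best_score = None, 0.0
    for name, proto in prototypes.items():
        tests = [stimulus.get(feature) == value for feature, value in proto.items()]
        score = sum(tests) / len(tests)          # graded degree of match
        if score > best_score:
            best_name, best_score = name, score
    # Only the winning name survives; the individual test outcomes are
    # discarded, so the recognizer cannot report *how* it recognized.
    return best_name if best_score >= threshold else None

prototypes = {
    "letter_A": {"has_curve": False, "enclosed_area": True,  "strokes": 3},
    "letter_B": {"has_curve": True,  "enclosed_area": True,  "strokes": 3},
}
stimulus = {"has_curve": False, "enclosed_area": True, "strokes": 2}
print(recognize(stimulus, prototypes))   # -> "letter_A" (2 of 3 features agree)

Lowering or raising the threshold shifts where the line between recognition and non-recognition is drawn, which is precisely the arbitrariness discussed next.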
False recognition of an acquaintance or friend is not uncommon, because the discrete, prominent features singled out for the purpose of recognition can form a virtual continuum, and where to draw the line may be somewhat arbitrary. When a recognition process approaches the threshold of a discrimination net, in real life or in simulations, the "aha" phenomenon is more likely to happen if there is a "break" in the slope (of how the threshold is being approached) than if the slope is gentle and without a "break." Thus, the "aha" phenomenon is more likely to be detected if there is a change of approach, such as following an incubation period, than if searching via all available heuristics, invoked sequentially, is adopted. It is also more likely to happen when random access is feasible than when sequential access is required. Thus, based on an overall evaluation, I am not convinced that EPAM exhibited an "aha" phenomenon. That is not to say future computers cannot do so. However, in my opinion, the "aha" phenomenon is not a central issue of the debate, because it is an introspective feeling that an outside observer must infer subjectively from the manifestation of a computer program's objective performance. Of course, a behavioral expression of the computer's "feeling" can be programmed into an EPAM-like program, but the additional capability does not add significantly to the credibility of the program, since it is not technically difficult to fake a non-existent feeling. In computer simulations, what masquerades as a realistic rendition is not real but rather virtual — a euphemism for being elaborately and deceptively faked, or a passable counterfeit, so to speak.

As for learning with understanding, subjective reports are not always reliable, as is evident from the foregoing discussion in Sec. 4.22 regarding modern biomedical students' criteria. Simon followed a test suggested by Michael Wertheimer: "one test of whether learning [with understanding] has really happened is to check whether what has been learned will generalize to a related task — if all that has transpired is sheer memorizing or mechanical associating, the learner will be unable to recognize the similarity between a task that has already been mastered and a new one which, while it may be superficially quite different, requires the same insight to solve it that also worked in the earlier task. The transfer of learning is a central issue for the Gestalt theorist" (p. 23 of Ref. 710). Simon pointed out that there is no great difficulty in constructing computer programs which can do just that. In fact, some programs can even learn to solve problems by examining worked-out examples and to construct a set of new instructions (rules) adequate for solving a wide range of algebra equations.

It is sometimes said that a problem is understood when it can be formulated or represented appropriately. The program UNDERSTAND290 accepts simple problems stated in plain English and constructs
representations of the problems that are suitable as inputs to a general problem-solving program like GPS. Several computer programs exist that have simple capabilities to use analogies to form new representations. Again, it is helpful to distinguish between two types of understanding: one based on rule-based reasoning and the other based on picture-based reasoning. In human performance, knowledge acquired by picture-based learning can be "transferred" to more remotely related situations than knowledge acquired by rule-based learning (see previous sections).

Can a digital computer make scientific discoveries? Since a digital computer is good at logical, rule-based reasoning, this question is equivalent to: Does scientific discovery have a logic? Simon's answer was affirmative. Newell et al.488 constructed and analyzed several such programs, including one named "Logic Theorist" that managed to "discover" a shorter and more elegant proof of a theorem in Chapter 2 of Whitehead and Russell's Principia Mathematica than the one originally published by Whitehead and Russell. It is of interest to examine how these programs made discoveries. For simplicity, let us examine an earlier program named BACON that could examine actual data and re-discover several known physical laws (Part II of Ref. 404; pp. 102-115 of Ref. 628). A sample of heuristics used in earlier versions of BACON, as summarized by Boden, gives us a glimpse into the strategy (p. 195 of Ref. 75):

• IF the values of a term are constant, THEN infer that the term always has that value.
• IF the values of two numerical terms give a straight line when plotted on a graph, THEN infer that they are always related in a linear way (with the same slope and intercept as on the graph).
• IF the values of two numerical terms increase together, THEN consider their ratio.
• IF the values of one term increase as those of another decrease, THEN consider their product.

If the ratio or product of two variables, x and y, is not constant, additional heuristics instruct the computer to compute more complex ratios or products, such as x^m/y^n or x^m·y^n (where m and n are integers), and check whether any of them is constant. Furthermore, BACON does not have to try every pair of integers. Rather, it considers whether a ratio, if not constant, increases or decreases monotonically, thus cutting the number of pairs of integers to be tested in half (i.e., heuristic searching).
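To see how little machinery such heuristics require, here is a minimal sketch in their spirit: it searches small integer exponents (m, n) for a combination x^m·y^n that stays nearly constant across the observations. It is an illustration under simplified assumptions; the data values are hypothetical, the exhaustive loop omits BACON's monotonicity shortcut, and it is not the actual BACON program.

# Minimal illustration of a BACON-style search for an invariant (not the
# actual program): try small integer exponents (m, n) and report the first
# pair for which x**m * y**n is approximately constant.  Data are hypothetical.

def find_invariant(data, max_exp=3, tol=1e-3):
    for m in range(1, max_exp + 1):
        for n in range(-max_exp, max_exp + 1):
            if n == 0:
                continue
            values = [x**m * y**n for x, y in data]
            mean = sum(values) / len(values)
            if all(abs(v - mean) <= tol * abs(mean) for v in values):
                return m, n                # x**m * y**n is (nearly) constant
    return None

# Boyle's law: pressure times volume is constant for a fixed amount of gas.
boyle = [(1.0, 24.0), (2.0, 12.0), (3.0, 8.0), (4.0, 6.0)]   # (P, V) pairs
print(find_invariant(boyle))    # -> (1, 1), i.e. P * V = constant

# Kepler's Third Law: the square of the period over the cube of the orbital
# radius is constant.
kepler = [(1.0, 1.0), (8.0, 4.0), (27.0, 9.0)]               # (T, r) pairs
print(find_invariant(kepler))   # -> (2, -3), i.e. T**2 / r**3 = constant

The monotonicity heuristic mentioned above would simply prune half of the (m, n) pairs before they are ever tried.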
In this way, BACON can discover a modest subset of numerical models which are simple algebraic combinations of two variables. For example, BACON re-discovered Boyle's gas law and Kepler's Third Law of planetary motion. There is a major difference between BACON and an ordinary rule-based expert system. In programming BACON, the programmer only provided some basic heuristics, but did not micromanage BACON's step-by-step chores of problem solving. These heuristics allow the program to go for the most obvious and simplest numerical models (cf. Sutton's law, Sec. 4.3). There exist several later versions of BACON, in which improvements were made to allow it to use existing heuristics to act upon each other. Thus, a heuristic for creating discriminant rules might act upon a generalization heuristic to create a more powerful domain-specific generalization heuristic. Essentially, the program can learn to learn. By adding a radically different strategy or approach to the repertoire of heuristics, BACON's power of problem solving could be vastly enhanced. For example, by adding a symmetry heuristic, BACON re-discovered Snell's law of refraction. However, not all laws are quantitative (Sec. 6.13). Programs such as GLAUBER, STAHL and DALTON can discover qualitative laws (Part III of Ref. 404).

Are any of the above-mentioned problem-solving programs creative? It depends. If a dichotomy with only two classes of creativity — being creative or being non-creative — is posited, then most, perhaps all, existing programs are not. However, compared to some of our students trained to think by exclusively rule-based reasoning, even the most primitive version of BACON is far more creative (cf. gray scale of creativity, Sec. 4.16). If we succeed in educating our students in the same way as expert programmers programmed these problem-solving computer programs, we can declare that our educational system is a great success.

So far, the examples have demonstrated computer programs that could rediscover what had already been discovered in science. This kind of creativity is what Boden75 referred to as P-creative, or psychologically creative. Although the computer programs had no access to the existing scientific literature, the programmer did. In programming BACON, investigators used insights gained in past discoveries of known physical laws to construct the basic heuristics, thus inadvertently tipping off the computer regarding the secret. A program that could discover something that had never been discovered by any human being, living or dead, is said to be H-creative, or historically creative. In the latter case, no hindsight of the law-to-be-discovered can be incorporated into the heuristics. Such programs indeed exist. Boden cited an algorithm known as ID3 which discovered a chess-playing strategy for
Bicomputing Survey II
319
winning an endgame that was not known to any human experts (p. 189 of Ref. 75). Interested readers are referred to Boden's reviews.75'76 Thus, Boden warned: "It's a mistake to think that sequential computer programs cannot possibly teach us anything about psychology [of creativity]" (p. 93 of Ref. 75). It is perhaps also a mistake to think that creativity is a unique feature of conscious human beings. In view of the astonishing performance of these advanced problem-solving programs, humans' superiority in intellectual performance can no longer be assured. Regarding the dispute between Popper and Simon, the above-mentioned examples appear to favor Simon's view since rule-based reasoning is equivalent to logical reasoning. However, it is still premature to declare that all scientific discoveries have a logic. In making scientific discoveries, humans do not always think logically. The designation "pseudo-parallel processing" implies that digital computers still do not think like humans. However, in order to lodge our dissent against Simon's claim effectively, we must find a way to verbalize our disagreements. To do so, we need to examine why Simon's problem-solving programs succeeded. Obviously, the key to the success was not telling the computer what to think but rather how to think; by providing a set of heuristics the programmer told the computer how to think. These programs performed so well because the embedded heuristics had been constructed by pooling together the tactics of thinking used by many past creative scientists. Regardless of their performance, these computers are not as creative as the scientists from whom the computer (or, rather, the programmer) drew their inspiration. Boyle was creative to discover the law that bears his name. Nowadays every competent scientist knows how to examine the relationship of two experimental variables by first checking whether they bear any relation of direct or inverse proportionality, as well as any logarithmic or exponential relations. In fact, all well-trained scientists learn these neat "tricks" devised by past masters. In other words, the computer masqueraded as a creative thinker by being a very good copycat. It is not surprising that these computers might outperform human beings because of their speed, memory capacity, stamina and patience, and because they had uniformly mastered the pooled tactics of many past masters. However, the underlying heuristics of discovery are not a priori programmable but rather a posteri programmable: programmable with the aid of hindsight. That is, someone — either past creative scientists or the programmer — must have discovered the heuristics ahead of time. Of course, the computer can discover new heuristics by recombinations of old ones. However, creative
human beings can devise new heuristics that are not obvious recombinations of old ones. Since the programmer has no way of knowing what new and radically "revolutionary" heuristics are to appear in the future, these heuristics are not a priori programmable. This point shall be made clear by means of three examples. The first example is the discovery of recursive rules underlying an infinite sequence of alphabetic symbols cited in Simon's 1973 article.621 Simon demonstrated that the recursive pattern can be discovered efficiently (heuristically) by programming the computer to examine the relations of "same" and "next" between symbols that are not too far separated (p. 476 of Ref. 621). Simon's point is well taken. Now consider a few more examples of which the rules of construction are easy to conceive but considerably harder to discover. The rules are the presence or absence of a certain feature of the symbols. For notational convenience, the sequences are made finite by first listing those with the designated feature and then those without the feature. They certainly can be presented as infinite sequences, by repeating each finite sequence an infinite number of times, without compromising the arguments to follow:
• (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z) vs. (B, C, D, G, J, O, P, Q, R, S, U)
• (A, B, D, O, P, Q, R) vs. (C, E, F, G, H, I, J, K, L, M, N, S, T, U, V, W, X, Y, Z)
• (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, R, S, T, U, W, Y, Z) vs. (Q, V, X)
The selection rule in the first sequence is whether the letters are constructed exclusively with straight line segments or with both line segments and curves. In the second sequence, the letters are grouped according to whether there is at least one (topologically simply connected) enclosed area or not. In the third sequence, the grouping is based on a single criterion: presence or absence in the Polish alphabet (there is no "Q," "V," or "X" in the Polish alphabet, with the exception of words of foreign origin). Undoubtedly, stranger and more obscure selection rules can easily be conceived to construct additional examples with increasing degrees of difficulty. The above three rules cannot be readily discovered by examining the relations of "same" and "next" between symbols alone, certainly not by considering their relative "alphabetized" positions on the alphabet list. However
obscure a selection rule may be, the rule, once known, can be included in the heuristics and the computer can be instructed to examine additional features of the sequence, such as shapes, topology, etc., thus expanding its search capability. However, there are virtually unlimited kinds of selection rules that can be conceived. Before a class of obscure rules is suspected and preconceived by the programmer, a specific search tactic designed to discover this particular class cannot be included in the repertoire, i.e., it is not a priori programmable.

Second, let us consider Simon's programs that re-discovered several natural laws. The success of BACON, which re-discovered Boyle's law and Kepler's third law, depends on the fact that these natural laws describe a simple mathematical relation between two variables: direct or inverse proportionality of the variables or powers of the variables. This approach of fitting data with simple mathematical functions is known as curve fitting. Let us consider the limitations of curve fitting in generating numerical (mathematical) models. There are virtually an infinite number of functions to be used in curve fitting. The tactics listed in the heuristics of BACON — e.g., testing fractional powers — were apparently set up with Boyle's law and Kepler's third law (which states that the period P of a planet is related to the semimajor axis a of its orbit by P = a^(3/2)) in mind (a posteriori programmable). But these heuristics are not sufficiently comprehensive to re-discover Snell's law of refraction, which involves trigonometric functions. Perhaps all known functions and their recombinations — there are infinitely many of them — should be included in the repertoire of heuristics. But there is another problem: data noise often allows curve fitting to generate more than one mathematical model, and not all of them are physically meaningful. In fact, any curve can be fitted with a polynomial function to any degree of accuracy (see Sec. 20 of Ref. 325 for a concise discussion). Attempts to improve the accuracy of fit often degrade the quality of a model because the model begins to fit noise more than data, as pointed out by Gauch.227
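The overfitting problem just mentioned is easy to demonstrate numerically. The following sketch is a made-up illustration, not taken from Ref. 227 or Ref. 325; it assumes NumPy and synthetic Boyle's-law data (P = 1/V plus a little measurement noise). It shows that raising the order of a polynomial fit drives the error on the fitted points toward zero, while the agreement with the underlying law typically stops improving and then deteriorates: the polynomial begins to fit noise more than data.

import numpy as np

rng = np.random.default_rng(0)
V = np.linspace(1.0, 5.0, 9)
P = 1.0 / V + rng.normal(scale=0.02, size=V.size)    # noisy "measurements"

V_test = np.linspace(1.1, 4.9, 50)                   # points between the measurements
P_law = 1.0 / V_test                                 # the underlying law

for degree in (1, 2, 8):
    coeffs = np.polyfit(V, P, degree)                # least-squares polynomial fit
    fit_err = np.abs(np.polyval(coeffs, V) - P).max()
    law_err = np.abs(np.polyval(coeffs, V_test) - P_law).max()
    print(f"degree {degree}: worst error on data {fit_err:.4f}, "
          f"worst error against 1/V {law_err:.4f}")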
Alternatively, heuristic searching may be implemented at the physical level instead of the mathematical (numerical) level. Here, an example of which the author had first-hand experience shall be used to illustrate the point. Chalazonitis and coworkers111,110 discovered that, by applying a homogeneous magnetic field of about 10 kG, isolated rod photoreceptors of frogs in an aqueous suspension rotated and lined up in the direction of the applied field. By visual inspection, the time course of rotation of these
elongated rod-shaped structures appears to look like a half cycle of a sine function of time, but actually it fits an obscure function instead (Fig. 1 of Ref. 315):

ln tan θ = ln tan θ₀ − (H²/k) Σ_i (χ_ai − χ_ri) · t
where t is time, θ₀ and θ are the angular orientations of the photoreceptor rod at time 0 and t, respectively, H is the applied (constant and homogeneous) magnetic field, and the remaining symbols are physical constants that do not directly concern us here (the summation Σ is to be performed over the index i). It suffices to say that plotting the (natural) logarithm of the tangent of the angular orientation θ, as a function of time t, on a graph yields a straight line, according to the above equation. So does the plot of the sine of the angular orientation θ as a function of time t. The differentiation between the two mathematical models — in terms of either ln tan θ or sin θ — was not based on goodness of curve fitting. Heuristic searching of the correct physical mechanism was performed at the physical level. Based on the description of Chalazonitis and coworkers' observation, several conceivable physical mechanisms could be quickly ruled out, much like how a physician uses differential diagnosis to rule out unlikely causes of an illness.330 Once the most likely physical mechanism was identified, the above equation, which was needed for verifying the hypothesized physical mechanism, could then be deduced by almost purely rule-based reasoning — all the rules for converting the physical model into the corresponding mathematical model could be found in classical electromagnetic theory. Without the benefit of hindsight, it would be virtually impossible to provide a heuristic that calls for the computer to plot the logarithm of the tangent of the angular orientation, as a function of time, on a graph (a posteriori programmable). But it was relatively easy to arrive at the correct physical model based on combined rule-based and picture-based reasoning at the physical level. Likewise, the a posteriori insight or heuristic needed to re-discover Snell's law of refraction is symmetry. Without the hint of symmetry, the computer would have few or no clues to conduct a fruitful heuristic search. Although it is impossible to prove that the computer cannot discover the symmetry heuristic by trial and error, it is reasonably safe to suggest that the computer probably cannot find it by trial and error within a reasonable time. Thus, by prompting the computer to examine a few common classes of relationship with appropriate hints, an impressive number of natural laws
can be re-discovered. However, different kinds of problems need different kinds of hints and there are virtually an infinite number of them. These preprogrammed problem-solving programs may not be able to solve problems that demand radically different insights (hints). To make my point clear, let me cite a brief historical account of the discovery of the method of fabricating artificial black lipid membranes (BLM) by Mueller, Rudin, Tien and Wescott475 — a technique that has revolutionized biophysical investigations of biological membranes. Here is the remark made by Tien:665

The work began with Rudin and his associates in 1959-61. They first studied lipid monolayers and multilayers (the Langmuir-Blodgett type), and then they played with soap bubbles and films. I use the word "played" because it is difficult to find a suitable word to describe their initial experiments when they were literally blowing soap bubbles with the equipment purchased from the local toy shop! They realized, however, that a soap film in air in its final stage of thinning has a structure which may be pictured as two monolayers sandwiching an aqueous solution. This picture of the so-called 'black' soap films had been suggested many years ago by Gibbs and more recently by Overbeek, Mysels, Corkill and others. Once they recognized this structure together with its molecular orientation, Rudin and co-workers simply proceeded to make a film of two monolayers sandwiching an organic phase in aqueous solution. (Emphasis added)

From the above account, there is little doubt that the investigators got their inspiration by means of picture-based reasoning. They discovered a feasible technique by means of childlike exploration, as advocated by Piaget.516 Here, a few words of clarification are needed to render the above description non-technical for readers outside of biology and chemistry. A biological membrane is similar to a soap film; both are made of two layers of elongated molecules (called amphiphilic molecules) with two asymmetric ends — a water-loving (hydrophilic) end and a water-avoiding (hydrophobic) end. It should be pointed out that a biological membrane is not an exact analogy or visual image of a soap film. A soap film consists of two layers of amphiphilic molecules with the water-loving parts touching each other, whereas a biological membrane consists of two layers of amphiphilic molecules with the water-avoiding parts touching each other. In other words, the two layers of soap molecules stack heads to heads, while sandwiching a layer of water in between. In contrast, the two layers of phospholipid molecules in a BLM
stack tails to tails, while sandwiching a layer of organic solvents (chloroform and methanol) in between. Therefore, a biological membrane is similar to an inverted soap film. Thus, it took an appropriate inversion of the perceived picture for them to recognize the similarity. All domain-specific knowledge that was needed for the discovery had long been known to most investigators in the field of biological membranes. The true novelty entailed in their discovery was therefore the notion of inversion, and their luck stemmed from childlike innocence. If a program were to give a hint to a digital computer in this case, the hint should be anti-symmetry rather than symmetry. But unless the programmer had this preconceived idea, the computer would not get this hint. Again, perhaps the programmer ought to provide an additional hint that urges the computer to consider inversion when it considers symmetry, or even provide a long list of various imaginable hints to entice the computer to consider inversion. But then the practice constitutes a posteriori programming. In contrast, Rudin and coworkers apparently were able to cut through the thicket of all kinds of random thoughts and arrive at the idea of inversion by random access (heuristic searching). They were also the true originators of their idea — not the idea of their "benefactors" or the funding agency. In any case, the creativity of the programmers still decides how creative a digital computer can be. The subtle difference enunciated here is intimately related to the issue of origination of free will (cf. Secs. 5.15 and 5.18). To advocates of absolute determinism, all creative acts were predetermined and had been preprogrammed by Nature, but mortal humans might not know the logic or strategy ahead of time. To believers of the existence of free will, humans are capable of creative acts that invoke no preexisting logic and cannot be preprogrammed. Therefore, the question regarding whether discoveries have a logic is linked to whether humans have true free will (see Sec. 5). Simon's programs demonstrated that not all discoveries defy logic. What made Popper conclude that all discoveries have no logic can only be speculated upon here. Perhaps Popper, like Koestler and many other highly creative individuals, considered only high creativity to the exclusion of the possibility that certain types of discoveries can be made by means of a more methodical approach suggested by past creators' insights, or simply by means of rule-based reasoning plus hard work. That is not to say Simon was not in the same league as Popper and Koestler. Simon came very close to elucidating the problem by suggesting that creative problem solving is a matter of recognition. Had he not been single-mindedly determined
to prove that intuition can be simulated by sequential processes, he would have recognized the subtle difference between parallel processing and pseudo-parallel processing. There is no reason to assume that existing programs have exhausted all tactics of human creativity. There is also no good reason to claim that these programs cannot be further improved. Therefore, the above verdict regarding the dispute of Popper and Simon remains tentative. Strong AI detractors' objections were similar to what transpired in the controversy over primate language capability: the arguments were often based on what the apes [or computers] had not yet done (Sec. 4.18). A skillful programmer simply takes hints from strong AI detractors' objections, and then devises clever algorithms to implement what the computers have not yet done. As investigators continue to discover new strategies for the construction of heuristics and programmers continue to delegate more tasks to the discretion of the computer, there may still be room for major improvements in the future. One should not overlook the precedent that gains made in computers' processor speed and memory capacity have often made it possible to absorb the large software overhead needed for performing lengthy, sophisticated sequential (or, rather, pseudo-parallel) processing, thus making previously impossible tasks feasible. If the history of science and technology teaches us anything, it is the advice "never say never" — except perhaps just this once.
5. Consciousness and Free Will

If one thinks that some "semantic behaviorists," mentioned in Sec. 4.18, have gone too far, one should examine the issue of physical and biological determinism. The latter issue is linked to the problem of free will. In the extreme view held by some investigators, free will is an illusion, and human behaviors are preprogrammed. That is, we humans are genetically predetermined to exhibit elaborate behaviors and supreme intelligence, to have a subjective feeling of being conscious and an illusion of being able to plan ahead, and even to doubt whether we really were preprogrammed. Free will and consciousness are intimately related to machine intelligence, and are therefore important topics in biocomputing. A limited version of this section has been published elsewhere.326
5.1. Consciousness

Consciousness (specifically, reflective consciousness) is a cognitive phenomenon that has a distinct subjective component. We all know that it exists on the basis of introspection, but its exact nature is so elusive that there is no single definition that can be agreed upon by investigators from different disciplines or even from the same discipline. From the point of view of biocomputing, it is desirable to know whether consciousness is a phenomenon ultimately explainable in terms of physics and chemistry. From the point of view of machine intelligence, it is desirable to know whether it can be simulated in a man-made machine. In view of the lack of a commonly agreed definition, we shall attempt to describe certain important attributes of consciousness. In other words, we shall address the necessary conditions of consciousness and be content with leaving the elucidation of the sufficient conditions for the future. Rosen has presented an epistemological analysis of the question "What is Life?" and pointed out the absence of a list of criteria that characterize life.569,570 The same can be said about consciousness. His analysis threatens to invalidate many of the following discussions, especially about simulation and the machine metaphor. On the other hand, the difficulty encountered in the following discussions actually corroborates Rosen's misgiving (discussed in detail in Sec. 6.13). When one deals with the consciousness of an individual other than oneself, one must subjectively interpret the objective manifestations of that individual. Perhaps this is because the attributes being considered must be evaluated as a whole to reach an educated guess. A single attribute taken alone is problematic and insufficient for that purpose; only the sum-total counts and a procedure of Gestalt synthesis is required. Thus, in principle, one can be fooled by such objective manifestations or, at least, one may be concerned with being fooled. Let us examine the common attributes that constitute the objective manifestations of others' consciousness. An obvious attribute of being conscious is the ability to respond immediately to an external stimulus in a purposeful manner which is usually unexpected from inanimate objects. The ability of responding to a stimulus immediately and purposefully is a necessary attribute of consciousness but is certainly not sufficient. Such a behavior is also shared by a heat-seeking missile. Furthermore, it is common knowledge that many simple neural reflexes, which are definitely purposeful, proceed without direct intervention and detection of consciousness. As the concept of intelligent materials stipulates, the conformational change of proteins appears to be purposeful and
immediate, and is unexpected in most inanimate objects (Sec. 7.3 of Chapter 1). Yet, one would not describe the conformational change as a conscious effort. The purposefulness of a conscious activity is sometimes described as being intentional. However, intentionality has a subjective connotation of an introspective nature. On the other hand, there are many immediate and purposeful bodily activities that are hardly under our conscious control, although we are consciously aware of them. A case in point is the act of salivation at the sight of fine food in front of a gathering of formal guests — an embarrassing act that is hard to suppress consciously. Another case is a neurotic (non-psychotic) condition called obsessive-compulsive disorder: the patient is fully aware of his or her own bizarre behavior but can rarely suppress the undesirable behavior and cure the illness by exercising a conscious effort (see, for example, Chapter 2 of Ref. 601). In a living organism, many purposeful behaviors are preprogrammed genetically (instincts), or, at least, require some built-in biological infrastructures (substrates) for their implementation. An attribute that used to elude or preclude preprogramming is the ability of creative problem solving. Although the early AI expert systems could not produce a genuinely novel solution to a problem, increasingly sophisticated computer programs are challenging this prima facie unique attribute of consciousness and human intelligence (Sec. 4.26). Manifestation of consciousness requires additional and more elaborate behavioral patterns such as emotional expression (expressive of a subjective feeling) and alertness (suggestive of attention). Note that these externally observable and objective behavioral attributes are interpreted in accordance with the observers' own subjective experiences, and, in principle, can be "faked" by a man-made machine. The expression of emotion and alertness is greatly enhanced by the language capability. One of the deep-seated reasons why consciousness appears to be so special to humans can probably be summed up in René Descartes' well-known remark "Cogito, ergo sum (Latin: I think, therefore I am)" (Part IV, Discourse on the Method in Ref. 146). It is this self-consciousness about thinking that makes consciousness such a recurrent topic that simply will not go away. However, it is the ability to verbally report one's inner feelings that makes the consciousness of others so much more believable. It may also be part of the reason why humans are reluctant to grant consciousness to nonhuman animals, which lack the verbal versatility and eloquence to proclaim "Cogito, ergo sum" (e.g., see Ref. 660; p. 29 of Ref. 531; p. 2 of Ref. 264). Modern ethologists are inclined to think that some higher animals possess consciousness, based
on some elaborate behavioral attributes. Behavioral experiments are continually providing new and stronger evidence that some nonhuman animals sometimes think consciously (Sec. 4.18). The possibility of generating unpredictable behaviors is yet another attribute of consciousness. However, unpredictability is not a unique feature of consciousness. One may recall that emergent phenomena such as the flocking behavior of birds are not predictable but can actually be simulated (Sec. 2.3). One may also recall that some deterministic rules can lead to totally unpredictable outcomes (deterministic chaos) (Secs. 2.4 and 5.14). Unpredictability is, however, often associated with the notion of free will. Unlike other attributes of consciousness, free will is an elusive concept: scientists and philosophers have not been able to reach a consensus regarding whether free will actually exists or is just our own subjective illusion. Thus, the free will problem is likely to be the last frontier of consciousness research. In a man-made machine, a suggestion of "free will" would be its most impressive and most feared attribute. For this reason, a discussion of the free will problem in light of biocomputing control laws is in order.

5.2. Controversy of the free will problem

Free will is related to consciousness, attention, motivation, creativity and autonomy, which are features that we want in a man-made machine and also features that a man-made machine can use to defy human control. For example, the phrase "conscious decision" implies that a decision is made after cautious deliberation under free will. When a teacher asks students to pay attention, it is assumed that the students' consciousness is capable of focusing their sensory perception on classroom activities. The notion of motivation implies that people are capable of pressuring themselves under their own free will instead of being pressured by others (others' free will?). Creativity implies that an individual can solve a problem by thinking under free will and against dogmatism, and that the discovery or performance is not a mere result of executing a preprogrammed learning and problem-solving algorithm. Autonomy implies that an individual has his or her own agenda and initiatives, and may defy control by other individuals. The topic of free will used to belong to the realm of philosophy and ethics. Owing to major advances made in neuroscience research, biologists began to tackle the problem. Walter's book Neurophilosophy of Free Will698 integrated our contemporary understanding of the neurophysiology of the brain with the free will problem. He presented a comprehensive review and analyzed
the free will problem in terms of three major components: a) whether we are able to choose other than we actually do (freedom or alternativism), b) whether our choices are made for understandable reasons after a rational deliberation (intelligibility), and c) whether we are really the originators of our choices (origination or agency). (In Walter's usage, intelligibility does not mean the quality of being intelligible or understandable in the linguistic sense; rather, it means acting for understandable reasons, p. 43 of Ref. 698.) He considered the three components on a gray scale of three levels: mild, moderate and strong interpretations. The existence of free will is not as self-evident as some free will advocates are apt to proclaim: "I have the freedom to do whatever I want in defiance of your wishes." In reality, our freedom of will is somewhat restricted. Human behaviors are subject to many constraints, both external and internal, both physical and social, and both surmountable and insurmountable. Some behavioral options are not available mainly because humans are inherently incapable of executing them, such as flying in midair without the aid of external equipment; the physical barrier which humans must overcome to do so is insurmountable. Other options are prohibited by social customs and/or are off limits to us because they are against our own cultural values (such as cannibalism). The "barrier" which humans must overcome in the latter case is not absolutely insurmountable but is prohibitively high for most people in a civilized society. The following example demonstrates the complex processes involved in decision making, in which free will is only one of the many factors. When questioned about his decision, in 1949, to make the recommendation to the U.S. government about pursuing work on the hydrogen bomb, physicist Edward Teller replied:658 "[M]y decision was not a momentary exercise of free will, but a combination of many reasons and many choices, some of which were an expression of my 'free will'." Other factors, such as the news about the persecution of Lev Landau in the Soviet Union, also entered into his deliberation. As Walter pointed out, free will with the strongest form of alternativism does not exist. Free will does not mean complete freedom to achieve any preplanned act. Free will is the element which, when added to other factors in the deliberation and planning before an act is executed, may make a difference in overcoming a surmountable barrier. Our position is close to what Walter referred to as restrictivism (p. 70 of Ref. 698). Humans' introspective feelings are inherently unreliable (e.g., see p. 323
of Ref. 52). Thus, one's feeling of free will may well be the consequence of the sum-total of all external sensory inputs together with all internal processes in our body, conscious or unconscious, including one's emotion (but excluding free will per se). As Schrödinger pointed out, the contemplated alternative actions are ultimately inexecutable.595 Almost any example cited in support of the existence of free will can be re-interpreted to deny its existence. It is well known that an individual who habitually acts defiantly can be manipulated to act in the opposite way: the individual might believe that he or she was acting on free will, but he or she simply played into the opponent's hands, and the outcome was precisely what the opponent had expected. It is not the intent of this article to settle the free will problem because, as we shall see, it is impossible to prove or disprove the existence of free will by means of conventional scientific approaches (cf. Secs. 5.10 and 6.13). Instead, we set out to resolve the conflict between free will and determinism. We shall emphasize the component of alternativism and shall demonstrate the compatibility between alternativism and intelligibility (Sec. 5.15). The component of origination will be treated as an unsolved problem. Readers interested in a thorough treatment of the three components of free will should consult Walter's treatise.698

5.3. Conflict between free will and classical determinism
The classical determinism that was customarily credited to Laplace follows the doctrine of Newtonian (classical) mechanics406 (e.g., p. 226 of Ref. 145 and p. 75 of Ref. 543). According to classical mechanics, the future position and momentum of each and every particle can be accurately and uniquely determined (calculated) if its present position and momentum (or velocity) are accurately known. Here, the key word is uniqueness, meaning that there is a one-to-one correspondence between the present conditions and the future ones (isomorphism in temporal mapping, from one time point to another). Likewise, given the present conditions (positions and momenta), the past conditions of a particle can also be uniquely determined. The latter is implied by the symmetry inherent in the time-reversal invariance of Newtonian mechanics: time reversal can be implemented by means of momentum reversal (as exemplified by running a movie backward). The notion of time-reversal invariance is intimately related to the concept of microscopic reversibility. By envisioning the universe as a collection of a huge number of particles, all future and past events can, in principle,
be determined from the detailed knowledge of all present events by calculating the position and momentum of each and every particle in the future and in the past, respectively. Thus, if one could exercise free will to alter the present conditions of a certain event, the events in the past that had a causal bearing on the present event could also be altered. This latter conclusion is demanded by the one-to-one correspondence between the present and the past conditions of particles that are involved in the time-evolution of a particular event. The absurd inference that one can exercise one's own free will to alter past events, including those that took place before one's birth, constitutes the well-known conflict between free will and classical determinism. Classical determinism has a devastating implication beyond science (see also the introduction by F. S. C. Northrop, in Ref. 293). If criminals merely acted out a script written long before their birth, why should the criminals be held responsible for their crime? If future events are predestined, what is the point of education, training, making an effort, or repentance? Last but not least, if free will does not exist and every event is predetermined, what is the point of vehemently denying its existence and zealously attempting to convince others that free will does not exist? Whether others would be convinced or not would have been preprogrammed and predetermined, regardless of the zealous attempt. Eminent scientists and philosophers of the past century were divided with regard to whether free will actually exists. Schrödinger dismissed free will as an illusion.595 Science philosopher Karl Popper argued in favor of indeterminism.524,525,529 Neurophysiologist John Eccles thought that "[free will] must be assumed if we are to act as scientific investigators" (p. 272 of Ref. 184). William James felt uneasy with the element of chance in what he referred to as soft determinism, but embraced it anyway because the alternative — hard determinism — seemed morally unacceptable.347 In contrast, Wegner, in his book The Illusion of Conscious Will, treated free will as an illusion, but nevertheless took responsibility and morality seriously. He would appear more self-consistent (but less politically correct) had he also regarded responsibility and morality as illusions. In the remaining part of this section, we shall focus our discussion on the notion of determinism and its conflict with free will. Obviously, this intention betrays the author's belief in the existence of free will, albeit for ostensibly nonscientific reasons. We shall consider whether the conflict actually exists, but not whether free will actually exists. The author sincerely requests the readers not to let their judgment be clouded by the author's
admission of a nonscientific belief.
5.4. One-to-one versus one-to-many temporal mapping

Tentatively, let us not make a distinction between the classical and the quantum mechanical versions of the law of motion, and consider an unspecified physical law (referred to as the physical control law of mechanics) that prescribes the future position and momentum of a particle from the knowledge of its present position and momentum. Absolute determinism can be invalidated if either a) one can demonstrate that the initial values of position and momentum are not uniquely and sharply defined, and/or b) one can demonstrate that the control law is not strictly deterministic and thus prescribes subsequent values of position and momentum with dispersion. Superficially, the above-described problem looks somewhat out of date and, with the advent of quantum mechanics, should have been settled almost a century ago. Heisenberg's uncertainty principle dictates that the position and the momentum of a particle cannot both be accurately measured or "determined" at the same time, and this implies that absolute determinism can be invalidated by the first approach. However, the problem cannot be considered fully settled for the following reasons. First, there remained eminent scientists, like Einstein, who could not quite accept probability (and randomness) as part of the fundamental physical laws and sincerely hoped that truly deterministic mechanical laws would be discovered in the future. Second, mainstream philosophers hold the view that quantum indeterminacy at the microscopic level does not appear to "carry over" to the macroscopic level (e.g., p. 226 of Ref. 145). The persistence of this view might have been the consequence of the influence of Schrödinger's book What is Life? (p. 87 of Ref. 596):

To the physicist I wish to emphasize that in my opinion, and contrary to the opinion upheld in some quarters, quantum indeterminacy plays no biologically relevant role in them, except perhaps by enhancing their purely accidental character in such events as meiosis, natural and X-ray-induced mutation and so on — and this is in any case obvious and well recognized.

It is important to realize that this "obvious and well recognized" inference was made at a time when chaos had not yet been known. This inference is no longer self-evident and must now be called into question and re-evaluated.
In a discussion of determinism and free will, the notion of indeterminacy is sometimes confused with the notion of unpredictability or computational irreducibility (cf. pp. 750-753 of Ref. 729). As mentioned in Sec. 2.4, deterministic chaos is unpredictable and is superficially indistinguishable from noise. One might recall that the discussion on nonlinear dynamic analysis indicates that a small difference in the present positions and velocities can lead to a large difference of future outcomes in certain situations but not others. Two separate issues are relevant: the certainty of the initial conditions and the determinacy of the control law. Lorenz's investigation of long-term weather forecasting shows that the outcome is unpredictable, not because of the indeterminacy of the control laws (which are those of classical mechanics), but rather because of the inability to ascertain the initial values to a reasonable degree of accuracy and because of the high sensitivity of the control law to small differences in initial conditions. As a consequence, a small difference in the initial conditions can be greatly amplified, thus leading to drastically different future outcomes. In other words, a small difference in the initial conditions could subsequently land the trajectory in a very different (non-contiguous) part of the phase space. On the other hand, given a hypothetical case in which the initial conditions can be specified to an arbitrary degree of accuracy, the outcome may still not be predictable because the control law is probabilistic and the dispersion (variance) of the output is too large to be useful in making accurate predictions of an individual outcome. This is the situation that we intend to discuss here. The readers are also referred to an insightful analysis by Matsuno.444
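The two issues can be contrasted in a few lines of Python. This is only a toy illustration, not taken from the cited literature; the logistic map with r = 4 stands in for a chaotic control law, and an arbitrarily chosen noise level stands in for a probabilistic one. A strictly deterministic rule can amplify a one-part-in-a-billion difference in the initial condition into completely different outcomes, whereas a probabilistic rule sends even a perfectly sharp initial condition to a spread of outcomes.

import random

def logistic(x, steps=50, r=4.0):
    """Iterate the deterministic logistic map x -> r*x*(1 - x)."""
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

# (1) Same deterministic rule, initial conditions differing by one part in 10**9:
print(logistic(0.300000000), logistic(0.300000001))   # the trajectories have decorrelated

# (2) A perfectly sharp initial condition under a noisy (probabilistic) rule:
def noisy_logistic(x, steps=50, r=4.0, sigma=1e-9):
    for _ in range(steps):
        x = r * x * (1.0 - x) + random.gauss(0.0, sigma)
        x = min(max(x, 0.0), 1.0)                      # keep the state in [0, 1]
    return x

print([round(noisy_logistic(0.3), 3) for _ in range(5)])   # one input, many outputs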
In the terminology used by Matsuno, the probabilistic control law mentioned above constitutes a one-to-many temporal mapping, since a sharp initial condition leads to a time-evolution of a later condition with nonzero dispersion. Dynamics [the control law] acts so as to decrease the number of unconstrained degrees of internal freedom in motion, but does not completely eliminate them; a small number of degrees of internal freedom is admitted and tolerated by the physical law (notion of dynamic tolerance; p. 65 of Ref. 444). In contrast, Newtonian mechanics is said to be a one-to-one temporal mapping with complete fixedness (certitude) of boundary conditions, since time-evolution leads to new boundary conditions with no dispersion. In our discussion of absolute determinism, it is not a question about how accurately one can determine the initial conditions. The question is whether the initial (or boundary) values are uniquely and sharply defined and whether the control law specifies a one-to-one correspondence between
those initial values (or boundary conditions) and future (or past) values. If so, then there is no indeterminacy of future and past events even if we cannot make predictions with absolute certainty because of our inability to specify the initial conditions to the required degree of accuracy (cf. determinism vs. computability, p. 170 of Ref. 512). If one-to-one correspondence does not hold strictly true, then an alteration of the present condition does not necessarily imply that the past event should also be altered, and the conflict of free will and determinism cannot be established by the time-reversal argument. No matter how small the dispersion of the control law is, its presence undermines the validity of one-to-one correspondence, and the correspondence becomes one-to-many instead. Furthermore, even a control law with zero dispersion sometimes gives rise to multiple outputs. As pointed out by Prigogine, classical mechanics does not always give unique determination of the future.543 In the problem of a swinging pendulum moving along the arc of a vertically oriented circle, the highest possible position of the pendulum is a singularity, which leads to two distinct alternative outcomes (one-to-many correspondence): a) an oscillating motion, or b) a circular orbiting motion. In the phase space, the singularity point is the separatrix separating the two attractors (Sec. 2.4). However, this singularity point cannot be regarded as a true bifurcation point if strict determinism is adhered to, because a trajectory that reaches this point will linger forever there in the absence of noise (exogenous or endogenous): it is an impasse or deadlock leading to nowhere. It is the presence of noise that tips the balance one way or another and converts the singularity into a true bifurcation.
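The pendulum singularity can be made quantitative with a small sketch, assuming an ideal frictionless pendulum with g/L = 1 and illustrative numbers of my own choosing. Launched from the bottom with exactly the critical angular velocity, the bob has just enough energy to creep toward the upright position and linger there; an arbitrarily small random kick (noise) decides between the two fates and thereby turns the impasse into a genuine bifurcation.

import random

G_OVER_L = 1.0
SEPARATRIX_ENERGY = 2.0 * G_OVER_L        # energy needed just to reach the top

def fate(omega0):
    """Outcome for a launch from the lowest point with angular velocity omega0."""
    energy = 0.5 * omega0 ** 2            # kinetic energy at the lowest point
    if energy > SEPARATRIX_ENERGY:
        return "circular orbiting motion"
    if energy < SEPARATRIX_ENERGY:
        return "oscillating motion"
    return "lingers near the top: an impasse, not yet a bifurcation"

omega_critical = (2.0 * SEPARATRIX_ENERGY) ** 0.5     # equals 2.0 for g/L = 1

print(fate(omega_critical))               # strict determinism: the deadlock
for _ in range(5):                        # endogenous/exogenous noise tips the balance
    print(fate(omega_critical + random.gauss(0.0, 1e-12)))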
The other argument Prigogine proposed was the lack of complete freedom to assign arbitrary initial conditions (independent values to various particles' position and velocity) in light of our knowledge of the microscopic physics of atoms and electrons. For quite some time, Prigogine's view constituted the lone voice that questioned the validity of microscopic reversibility.543,541 In his book The End of Certainty,542 Prigogine finally unleashed his conviction of physical indeterminism (see Secs. 5.13 and 5.14). The singularity which Prigogine pointed out is but one of the many examples in which classical mechanics does not yield unique solutions and classical determinism breaks down under these special conditions. Earman183 reviewed the topic systematically and presented several examples in which uniqueness of the initial value problem for some of the most fundamental equations of motion of classical physics does not hold, both for cases
of discrete particles and for continuous media or fields. He was short of concluding that such non-uniqueness entails the fallacy of determinism. He issued the warning, but left the problem open. He also pointed out that the laws of classical physics place no limitations on the velocity at which causal signals can propagate and offer the possibility of arbitrarily fast causal signals; this is an intimate consequence of the structure of Newtonian space-time. The possibility of arbitrarily fast causal signals is also demanded by microscopic reversibility (see Sec. 5.12 for a detailed discussion). However, he made no direct attempt to challenge the compatibility between microscopic reversibility and macroscopic irreversibility. He also reviewed determinism in the context of special and general relativity, as well as quantum mechanics. Interested readers should consult his book A Primer on Determinism. Walter698 also reviewed the topic of determinism in physics. He examined two theories of indeterminism based on quantum mechanics, the oldest one (Pascual Jordan) and the most recent one (Roger Penrose and Stuart Hameroff), and rejected both as untenable. According to Karl Popper,526 the validity of a physical theory is only provisional, no matter how well established it can be: the possibility that a radically new and more satisfactory theory of mechanics may become available in the future cannot be ruled out (cf. Sec. 6.1). If and when it happens, a discussion of determinism based solely on a particular theory of mechanics may be invalidated and the whole argument may have to be sent back to the drawing board and radically revamped. We therefore choose to tackle the problem at the epistemological level as well as the ontological level without specifying a particular deterministic control law. We will present arguments that address general and fundamental physical issues, such as microscopic reversibility and its implications for the notions of macroscopic irreversibility and time reversal (Secs. 5.13 and 5.14). The discussion will be conducted as if quantum indeterminacy actually played no role in biology, as claimed by the second group above. Nevertheless, evidence in support of the relevance of quantum indeterminacy in biology will also be presented (Sec. 5.11).

5.5. Compatibilists versus incompatibilists

In free will research, there continue to be two debating camps regarding free will in general and alternativism in particular. The compatibilist camp maintains that no conflict exists between free will and determinism (e.g., see reviews by Goldman248 and by Walter698). Consider a case of a person who
faces a choice of two alternative options A and B. The irrevocable (fixed) outcome of choosing A instead of B conforms to determinism, but the compatibilists maintain that the alternative B could have been chosen and is actionally possible. The individual thus retains the freedom to choose even though the choice of alternative B did not actually materialize. In other words, it is still within one's power to perform an act even though one did not actually perform it. The compatibilist's view was apparently unacceptable to Schrödinger.595 As he pointed out, we tend to feel by introspection that there are several options available for our choice but, eventually, only one of the many options actually comes to realization. This was the reason why he thought that free will is an illusion. The second camp, known as the incompatibilists, proposed a puzzle similar to what has been described in Sec. 5.3 but without reference to any kind of mechanics or physics (e.g., Ref. 681). Consider the past event P that took place before a particular individual's birth (see Fig. 2) and consider the natural law L that acts on P to determine the unique outcome R. If the individual had the free will to render R false, then this individual should have been able either to render P false and/or to alter the natural law L. This is because not-R implies that not-(P and L) is true, which is equivalent to saying that either not-P and/or not-L is true (Fig. 2A). The reasoning leads to an absurd inference: an individual could alter the natural law L and/or an event P that took place before his or her birth. This argument led incompatibilists to come to the conclusion that free will and determinism are not compatible. Compatibilists managed to escape from the above dilemma with the following argument (e.g., see Ref. 248). A person may be able to bring about a given state of affairs without being able to bring about everything entailed by it. In other words, for a certain event P, rendering R false does not necessarily render P false or alter the natural law L (Fig. 2B). This argument essentially ruins the assumed determinism because, as Fig. 2B indicates, the argument implicitly requires that the prior event P together with the natural law L lead to multiple but mutually exclusive outcomes, R and not-R (one-to-many correspondence). The determinism implicitly held by compatibilists is a weaker form of determinism; it is more appropriately termed relative determinism rather than absolute determinism (Sec. 3 of Chapter 1). Thus, compatibilists have inadvertently altered the meaning of determinism implied in the original concept. Still, there is an alternative but trivial interpretation of Fig. 2B, that is, P and R are not causally related. However, apparently the latter was not what compatibilists had in mind.
Fig. 2. Conflict of free will and classical determinism. A. The past event P, together with natural law L caused the result R to happen. Suppose that R were rendered false by means of free will. This could happen only if a) the natural law L could be altered, or b) the past event P could be rendered false, or c) the natural law L could be altered and, at the same time, the past event P could be rendered false. These conclusions are all absurd. This is the incompatibilist view. B. The compatibilists claim that even though free will could have rendered R false, not every past event P that had led to R could be rendered false, and therefore the alleged conflict does not exist. However, the diagram shows that the event P could cause two mutually exclusive outcomes: R and not-R. Therefore, either the determinism is not absolute, or there is no cause-effect relationship between P and R to begin with. The compatibilists inadvertently invoked relative determinism. (Reproduced from Ref. 323 with permission; Copyright by Plenum Press)
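The propositional step in the incompatibilist argument can be checked mechanically. The small truth-table sweep below is my own illustration, using P, L and R in the sense of Fig. 2; it confirms that if determinism is read as "(P and L) implies R," then rendering R false indeed forces not-P and/or not-L.

from itertools import product

for P, L, R in product([True, False], repeat=3):
    determinism = (not (P and L)) or R      # the conditional "(P and L) implies R"
    if determinism and not R:               # suppose R has been rendered false
        assert (not P) or (not L)           # ...then P is false and/or L is altered
print("In every admissible case, not-R entails not-P and/or not-L.")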
Mathematician David Ruelle,575 in his book Chance and Chaos, dismissed the conflict between free will and determinism as a "false problem." His main objection was that a departure from determinism would entail making a decision by flipping a coin. Ruelle raised the following question: "[C]ould we say that we engage our responsibility by making a choice at random?" Here, Ruelle implied that making a choice at random is morally incompatible with a responsible choice. He also thought that our freedom of choice is an illusion, meaning that being responsible severely restricts one's freedom of choice. Ruelle's point is not trivial, and we shall return to this
point later. Interested readers are advised to consult van Inwagen's681 and Walter's698 treatises. An interesting essay regarding the notion of responsibility and its relation to the free will problem and determinism can be found in Chapter 10 of Pinker's book The Blank Slate.519 In the subsequent discussion, three separate attempts will be made to resolve the conflict between free will and determinism. The first attempt will invoke the probabilistic control laws in the standard repertoire of biocomputing processes (Sec. 8 of Chapter 1). I shall argue that strict one-to-one correspondence between the input and the output of biocomputing does not hold because of the intervention of these probabilistic control laws. The second and the third approaches will also be attempted by examining the validity of the claim that quantum indeterminacy plays no role in biology and the validity of the concept of microscopic reversibility, respectively. If quantum indeterminacy is relevant in biological processes, absolute biological determinism can be debunked by invoking the uncertainty principle. If microscopic reversibility is not strictly valid, then the time reversal of Newtonian mechanics cannot be justified.

5.6. Randomness and determinism in microscopic dynamics

As the examples cited in Chapter 1 demonstrated, it is evident that a living organism recruits a large number of random processes for information processing. The control laws governing the input-output relationship in biocomputing cover almost the entire gray scale of determinism, ranging from highly random to highly deterministic but never absolutely deterministic. The control laws are also time-dependent and/or environment-dependent in many cases. Both exogenous noise (from the environment) and endogenous noise are involved in biocomputing (Sec. 5.8). In dealing with the nature of "noise," we must address the following questions. Does noise enter biocomputing because it is an inevitable outcome linked to the participation of biochemical reactions in the process? Or, does noise enter biocomputing because it is actively recruited (by evolution) to participate in biocomputing? Does noise appear in biocomputing because we, as investigators, have not understood the deeper kinetic determinants that generate the noise and, therefore, the noise would have been predictable in its kinetic detail had we known these determinants, or because we do know or may know but it is ultimately impractical to keep track of all those external contingencies imparted by unrelated (or not directly related) but concurrent processes?
Fig. 3. Bifurcation point in biochemical reaction pathways. X is a metabolite which participates in four different biochemical pathways illustrated schematically as passages across the barriers formed by four vertical walls of the boxes. The barrier height is the activation energy. A. All four pathways are uncatalyzed. With comparable barrier heights, the four pathways are almost equally probable. B. Only pathway 1 is catalyzed. The fate of metabolite X becomes more deterministic (to be converted to Y) but residual errors persist as uncatalyzed side reactions. The reverse reactions are not shown. See text for further explanation. (Reproduced from Ref. 7 with permission; Copyright by Garland Publishing)
First, let us consider microscopic dynamic processing (biochemical processes in the cytoplasm). Numerous biochemical pathways form intricate networks in various cellular compartments. Many biochemical intermediates (metabolites) are shared by different pathways. Each of these intermediates presents a bifurcation point. As illustrated schematically in Fig. 3, one such intermediate is depicted as a ball X in the box. Each of the four vertical walls is a barrier for each of the four possible biochemical reactions which transform X into various metabolites. If these four reactions are uncatalyzed, the metabolite X (hidden inside the box in Fig. 3A) can follow each of the four possible paths with a comparable probability but with difficulty (low probability): the higher the barrier, the slower the rate of reaction (Fig. 3A). If, however, pathway 1 becomes catalyzed via activation of the corresponding enzyme, the corresponding barrier height will be lowered considerably and the corresponding reaction speed will increase considerably, as compared to those of pathways 2, 3, or 4 (Fig. 3B). Under catalysis, the destiny of metabolite X becomes more deterministic
but not completely error-free, because small fractions of X still go through the three uncatalyzed pathways, known as side reactions in chemistry. An additional source of uncertainty arises from the presence of reverse reactions. In general, the conversion from X to Y is never complete even if all side reactions (pathways 2, 3, and 4) are negligible and neglected. The reverse reaction that converts Y back into X (not shown in Fig. 3) is often catalyzed by the same enzyme. The speed of the reaction is determined by the barrier height (called the activation energy), whereas the net driving "force" (or, rather, potential energy) of the reaction in the forward direction is determined jointly by the difference between the (Gibbs) standard free energy level (G0) of the product Y and that of the reactant X, on the one hand, and by the product/reactant ratio (ratio of the amount of Y to X) at a given time, on the other hand. The net driving force vanishes and the net conversion stalls when equilibrium is reached, because the forward reaction is eventually balanced by the reverse reaction. Therefore, the product/reactant ratio at equilibrium is quantitatively related to the difference of the two standard free energy levels (ΔG0) (assuming constant temperature and pressure). There are two ways of enhancing the forward conversion. First, having a negative ΔG0 favors the forward reaction. Second, by removing the product as soon as it forms and/or by replenishing the reactant as soon as it becomes depleted, one maintains a negative value of the free energy difference, ΔG (not ΔG0), thus keeping the forward reaction going. In summary, the control laws in microscopic dynamics of biocomputing are not strictly deterministic. The control laws range from highly random, if uncatalyzed, to highly deterministic with residual "error," if catalyzed. Enzyme catalysis curtails randomness and pushes the control laws closer towards the deterministic end on the gray scale of determinism. It is of interest to note that the translation of structural genes and protein synthesis, together with assorted mechanisms for the disposal of protein folding faults, is one of the most deterministic events in microscopic dynamics. However, even that process carries an exception (a bifurcation point): some "soft-wired" organisms allow for two alternative pathways of RNA processing (Sec. 11 of Chapter 1).
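The gray scale of determinism described above can be put in rough numerical terms. The sketch below is only an illustration: the barrier heights and the standard free energy difference are invented, and simple Arrhenius kinetics with equal pre-exponential factors are assumed. It shows how lowering one barrier turns a nearly even four-way choice into a nearly deterministic one without ever eliminating the side reactions, and how the equilibrium product/reactant ratio follows from ΔG0.

import math

RT = 2.5    # kJ/mol, roughly RT at room temperature

def branch_probabilities(barriers):
    """Probability of leaving X through each pathway, taken as proportional to
    the Arrhenius rate exp(-Ea/RT) of that pathway (equal pre-exponential factors)."""
    rates = [math.exp(-ea / RT) for ea in barriers]
    total = sum(rates)
    return [r / total for r in rates]

uncatalyzed = [80.0, 81.0, 82.0, 80.5]    # four comparable barriers (illustrative, kJ/mol)
catalyzed   = [55.0, 81.0, 82.0, 80.5]    # pathway 1 lowered by an enzyme

print(branch_probabilities(uncatalyzed))  # four comparable probabilities
print(branch_probabilities(catalyzed))    # pathway 1 dominates; tiny side reactions persist

# The extent of conversion at equilibrium is a separate matter, fixed by dG0:
dG0 = -10.0                               # kJ/mol, illustrative
print("equilibrium [Y]/[X] =", math.exp(-dG0 / RT))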
5.7. Randomness and determinism in mesoscopic and macroscopic dynamics

Regarding the intrusion of randomness, the mesoscopic and the macroscopic dynamics do not fare better than the microscopic dynamics (Secs. 5.2 and 5.4 of Chapter 1). Mesoscopic networks are only loosely maintained in the fluid environment of the membrane. The bioenergetic coupling of electron transport to phosphorylation is mediated by a mesoscopic state of transmembrane proton gradients, where randomness enters and a bifurcation point exists for protons to diffuse either to the phosphorylation site or to the bulk region (Pacific Ocean effect). However, the randomness is curtailed by a specific mechanism of enhanced lateral proton mobility. In other words, the process of reaching the phosphorylation site by protons is not a random search in a three-dimensional space but a heuristic search that is somewhat confined to the membrane surface (reduction-of-dimensionality effect). As described in Sec. 8 of Chapter 1, the mesoscopic dynamics of opening and closing of ion channels is not governed by a well-defined control law but rather by a probabilistic control law (cf. self-organized criticality, Sec. 2). The randomness at the mesoscopic level is under the control of the membrane potential. In contrast, at the macroscopic level of neural network interactions, the control law is well defined, i.e., the dispersion is much diminished though not completely absent. What is remarkable here is the transformation of weak causality at the mesoscopic level into strong causality at the macroscopic level. In addition to a direct action on ion channels, macroscopic signals also affect microscopic events via G protein-coupled processes and second messengers.238 Cyclic AMP, which is a key second messenger and a diffusible intracellular component, participates in numerous microscopic biochemical pathways, especially those of signal transduction. Cyclic AMP is frequently involved in the bifurcation of biochemical reactions. Furthermore, its role is greatly enhanced by the switching action that it exerts on biochemical reaction pathways and on ion channel functions (amplification mechanism): cyclic AMP initiates phosphorylation of many enzymes84 and even ion channels,417,301 indirectly via the activation of cyclic AMP-dependent protein kinase (Sec. 7.2 of Chapter 1). In other words, cyclic AMP is the key second messenger that exerts its effect by converting highly random control laws into highly deterministic control laws. Similar comments apply to Ca2+ (Sec. 7.4 of Chapter 1).
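An elementary statistical observation makes the re-convergence from weak to strong causality plausible (binomial statistics, under the simplifying assumption of independent, identical channels; this is an illustrative aside, not a claim about real channel populations): if N channels are each open with probability p at a given instant, the number of open channels has mean Np and a standard deviation that grows only as the square root of N, so the relative fluctuation of the aggregate conductance shrinks as

\[
\frac{\sigma}{\mu} \;=\; \frac{\sqrt{N\,p\,(1-p)}}{N\,p} \;=\; \sqrt{\frac{1-p}{N\,p}} \;\longrightarrow\; 0
\qquad \text{as } N \to \infty .
\]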
The control laws at various levels and for various processes can be further affected by bifurcation points encountered in non-neural processes. For example, the availability of molecular oxygen strongly affects the availability of ATP, which is the "fuel" for the majority of molecular machines. Yet, among many different factors, the availability of oxygen depends on its transport from the lungs to tissues via the cardiovascular system, which comprises a large number of bifurcations (pardon the pun). Distribution of molecular oxygen to various body regions via the branching vascular trees is under neural control (autonomic nervous system). The vasculature can be regarded as an extension of the macroscopic (neural) network dynamics. Similarly, the hormonal system regulates and coordinates the functions of various organ systems, in a network-like fashion. The action of hormones is even less deterministic than neural control, since hormones are distributed by blood circulation. Of course, life processes are not always in the vicinity of a major bifurcation point. There are many stable processes that are governed by a negative feedback scheme (point or periodic attractors) and are insensitive to changes of initial or boundary conditions (the concept of homeostasis in physiology). The lack of sensitivity is made possible partly by the creation of a somewhat isolated internal environment for the majority of cells (Claude Bernard's milieu intérieur). Thus, bifurcation points are present from time to time and from place to place, but not continuously and not ubiquitously. There are many case reports about genetically identical (monozygotic) twins who were reared separately from birth (see Sec. 4.24). The similarity, not only of characteristic facial features but also of mannerisms and personality, was often unnervingly striking. Here, randomness seems to play a minor role. In contrast, at the moment of a sudden cardiac fibrillation, the difference between life and death is critically dependent on the patient's subtle body conditions (nutritional status, bodily defense mechanisms, general conditions, etc.), as well as on randomness incurred by external agents, e.g., lack of an available ambulance, delayed arrival of an ambulance caught in a traffic jam, malfunction of a defibrillator, incompetence of the rescue team, etc. Here, coupling of a sensitive bifurcation point to external and uncontrolled factors leads to an irrevocable outcome that is beyond the control of free will. Here, randomness exerts a great deal of influence because a major bifurcation point is at stake: life or death. An example indicating that life processes are not always metastable and precarious is the restoration to life of frozen bacteria after thawing.
Apparently, the momentum of individual moving molecules is not crucial. What is important is perhaps the structural integrity and the preservation of chemical properties in each subcellular compartment, e.g., concentration gradients. It is well known that cell death by freezing is caused primarily by the formation of ice crystals in the cytoplasm and the accompanying destruction of the cellular structure and disruption of the internal environment.

5.8. Endogenous noise

Up to this point of the discussion, it appears that inclusion of noise is an inevitable side effect of recruiting chemical reactions for biocomputing. It also appears that noise is "tolerated" but regulated, by means of variable control laws, in biocomputing. However, in certain biocomputing processes, noise appears to be an essential component and was actively recruited by evolution to enhance biocomputing. The most intriguing example is the use of background noise to enhance the reception of weak signals (a phenomenon known as stochastic resonance), as documented in the crayfish tail fan mechanoreceptor.715,474 The signals generated by low-grade vibrations of water when a crayfish is being approached by a predator are rather weak. However, the presence of background noise (either generated internally or imposed externally) actually enhances the reception because the detection is based on a nonlinear sigmoidal threshold control law (cf. Fig. 17B of Chapter 1). In the language of nonlinear dynamics, the detection threshold presents a bifurcation point, i.e., the juncture of a positive feedback (self-reinforcing) and a negative feedback (self-curtailing) regime (for the molecular events of nerve excitation, see any textbook of physiology or a brief sketch in Sec. 8 of Chapter 1). The presence of background noise at times pulls a borderline stimulus below the threshold, but at other times pushes it over the threshold. If we were to subscribe to mathematical idealization, the threshold would be a level of unstable equilibrium: a point where the system neither turns on nor turns off. In real life, this never happens because the precarious equilibrium is always ruined by fluctuations of the membrane potential and/or fluctuations of the threshold level itself, and by additional noise incurred by external agents, such as factors causing fluctuations of the body temperature (which, in turn, change the rates of biophysical and biochemical processes). Experimental evidence indicates that stochastic resonance can also operate
at the mesoscopic level; it does not require the intervention of a macroscopic mechanism such as neural signal processing. Bezrukov and Vodyanoy65 studied a voltage-dependent ion channel formed by an antibiotic, alamethicin, incorporated into an artificial black lipid membrane. They demonstrated that noise enhances signal transduction mediated by these ion channels. Thus, noise is not just an unavoidable nuisance but at times can be "beneficial" to the performance of ion channels. By far one of the strongest cases against absolute biological determinism is the presence of a probabilistic control law that governs the opening and closing of Na+ channels (and other ion channels) in nerve excitation. Generation of a nerve impulse (action potential) is the major switching event in neural signal processing. It is essentially a digital process since an action potential is either generated or not; the amplitude of an action potential conveys no gray-scale information. The gray-scale capability of neural signal processing is implemented in terms of variation of the intervals within a train of action potentials, i.e., the frequency of nerve impulses.
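A minimal numerical sketch of the stochastic-resonance effect discussed above: a subthreshold sinusoidal "signal" plus Gaussian noise is fed to a hard threshold detector and the upward threshold crossings are counted. All parameter values are arbitrary assumptions chosen for illustration; this is a caricature of the phenomenon, not a model of the crayfish mechanoreceptor.

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 10000)            # "time", arbitrary units
signal = 0.8 * np.sin(2 * np.pi * 1.0 * t)   # subthreshold periodic stimulus
threshold = 1.0                              # detector fires only above this level

def upward_crossings(x, thr):
    """Count upward crossings of the threshold (a crude spike count)."""
    above = x > thr
    return int(np.sum(~above[:-1] & above[1:]))

for noise_sd in (0.0, 0.3, 3.0):
    noisy = signal + rng.normal(0.0, noise_sd, size=t.size)
    print(f"noise sd = {noise_sd:3.1f} -> {upward_crossings(noisy, threshold)} crossings")

# With no noise the signal never reaches threshold (zero crossings).  With a
# moderate amount of noise the crossings cluster near the signal peaks, so the
# weak signal becomes detectable.  With very large noise the crossings occur
# indiscriminately and the signal is drowned out again.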
Fig. 4. Schematic showing the time course of the macroscopic Na+ current in the presence and absence of ion channel fluctuations. Trace A is the depolarization step of the membrane potential that triggers the activation of Na+ channels. Trace B is a typical real macroscopic Na+ current measured by the voltage clamp technique with a conventional intracellular glass-pipette microelectrode. Trace C is a hypothetical single-channel current measured by means of patch-clamping. The hypothetical channel opens and closes exactly once upon stimulation, after a precise delay, τ. Trace D is the hypothetical macroscopic Na+ current when the fluctuations were eliminated and some 300 channels turned on and off in unison. Note the difference in the vertical scale of Trace D. (Reproduced from Ref. 323 with permission; Copyright by Plenum Press)
According to the well-established mechanism proposed by Hodgkin and Huxley,302 the switching mechanism depends critically on the voltage-
induced transient increase of the Na+ conductance and the voltage-induced but delayed increase of the K+ conductance of the nerve or muscle membrane. These highly deterministic events lead to the generation of an action potential. Well-defined control laws govern the voltage and time dependence of these transient conductance changes. In response to a sudden increase (depolarization) of the membrane potential, the Na+ conductance increases rapidly, within a millisecond, to a peak value (a process called activation) and then declines to its basal value in a matter of a few milliseconds (called inactivation) (Figs. 4A and 4B). The time course of activation and inactivation of the macroscopic Na+ conductance is well defined and highly reproducible (small dispersion). So is its voltage dependence. In other words, the voltage-induced transient variation of the Na+ conductance is highly deterministic at the macroscopic level. The Na+ conductance is a macroscopic quantity, and its constituents are the miniature unitary conductances of numerous Na+ channels residing in a nerve or muscle membrane. Do these tiny ion channels open and close, in unison, with the same time course as that of the macroscopic conductance? If so, the macroscopic Na+ conductance would simply be an integral multiple of the unitary conductance. Patch clamp measurements of the unitary conductance of individual Na+ channels indicate that the opening and closing of an individual Na+ channel does not follow the time course of the macroscopic Na+ conductance at all, and, furthermore, individual channels do not open and close in synchrony.619 In fact, the opening and closing is a digital process and the unitary conductance is quantized: a channel is either closed or open, and the transition from zero to a fixed value is quite abrupt and appears intermittently at irregular intervals (see Fig. 18 of Chapter 1). The voltage dependence and well-defined time course of the macroscopic Na+ conductance are therefore a manifestation of the collective behavior of a large number of Na+ channels that switch on and off stochastically. The control law governing the voltage dependence and time course of the macroscopic Na+ conductance does not manifest itself explicitly at the level of an individual channel. Rather, the control law at the channel level is probabilistic in nature, analogous to the law governing the frequency and severity of earthquakes354 (Sec. 2.1) or the law governing beta decay in particle physics: individually unpredictable but collectively predictable. For a detailed discussion of the fractal nature of ion channel fluctuations, see Sec. 8 of Chapter 1. Now a skeptical inquirer will probably raise the famous objection pioneered by Laplace (Chapter 2 of Ref. 406; see also pp. 226-228 of Ref. 145).
Laplace asserted that it is self-evident that "a thing cannot occur without a cause which produces it," which he stated as the principle of sufficient reason. He pointed out that, in ignorance of the true cause, people often link a natural event to final causes, if it occurs with regularity, or to chance (randomness), if it occurs irregularly. He claimed that the principle of sufficient reason extends even to actions which people regard as indifferent, such as free will. He thought that it is absurd to assume free will without a determinative motive: an indifferent choice, "an effect without a cause," or, as Leibniz said, "the blind chance of the Epicureans." He thus dismissed free will as an illusion of the mind. He also claimed that there is no doubt that "the curve [trajectory] described by a simple molecule of air or vapor is regulated in a manner just as certain [deterministic] as the planetary orbits; the only difference between them is that which comes from our ignorance." Laplace indicated that these imaginary causes — i.e., indifference or randomness — had gradually receded with the widening bounds of knowledge. He was quite confident that eventually they would completely vanish with the advent of sound philosophy, and thus the true but hidden causes would be fully uncovered. Henceforth, we shall refer to the above claim as Laplace's "hidden cause" argument.
There is some truth to Laplace's claim. For example, conversations carried on in other phone lines within the same cable are practically noise to a particular line of concern. Moreover, pseudo-random numbers can be generated by a deterministic rule (cf. Ref. 288). Chaotic behaviors can arise from unstable systems governed by deterministic control laws (deterministic chaos). In general, there are two types of measurement errors caused by uncontrolled hidden parameters. An error is generically referred to as noise if the hidden parameters are randomly distributed, or as systematic error (or bias) if the uncontrolled parameters are not randomly distributed. Subsequent elucidation of these hidden parameters (causes) sometimes constitutes an advance in our knowledge. Thus, it is often difficult to identify the noise source prior to its elucidation.
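Both points, namely that apparent randomness can be produced by a perfectly deterministic rule and that such behavior can arise in unstable systems (deterministic chaos), can be illustrated with the logistic map, a standard textbook example sketched below; the numerical values are arbitrary choices for illustration.

# Logistic map: x_{n+1} = r * x_n * (1 - x_n).  At r = 4 the orbit is chaotic:
# every value is uniquely determined by its predecessor, yet two nearly
# identical starting points diverge rapidly (sensitive dependence on initial
# conditions) and the sequence passes many statistical tests of randomness.
def logistic_orbit(x0, r=4.0, n=10):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2000000)
b = logistic_orbit(0.2000001)   # almost identical initial condition
for i, (xa, xb) in enumerate(zip(a, b)):
    print(f"step {i:2d}: {xa:.6f}  {xb:.6f}  |diff| = {abs(xa - xb):.2e}")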
Let us see if the same claim is valid for the case of Na+ channels in a nerve or muscle membrane. Presently, it is not possible to alter the channel kinetics in a systematic and predictable way, and, in particular, it is not possible to eliminate the channel fluctuations. However, it is conceivable that, in the future, a sufficient understanding of the detailed molecular dynamics, along with detailed knowledge of the channel structure, may enable investigators to discover the factors that lead to the fluctuations in the opening and closing events of ion channels, and to mathematically derive the control law. It may then be possible to control these factors so that all Na+ channels respond to a sudden depolarization with the same well-defined time course, namely, opening abruptly in unison after a fixed delay τ following the stimulus (where τ could be zero), staying open for a fixed but short duration, and then closing abruptly (Fig. 4C). Under such conditions, the macroscopic Na+ conductance would have exactly the same time course as that of an individual channel but a much greater amplitude: a sharp and prominent lone spike with a short duration (Fig. 4D). It is apparent that the macroscopic Na+ conductance would then not rise and fall with the typical time course of activation and inactivation found experimentally. Thus, the elimination of factors responsible for the apparently irregular and unpredictable sequence of opening and closing of ion channels would also render these channels unsuitable for the generation of "normal" action potentials. That the "abnormal" action potential would not work can be made clear by the following consideration. Ventricular muscle cells of the heart have an unusually long plateau phase of depolarization of the membrane potential (about 200 ms), i.e., the potential becomes positive for more than 200 ms, instead of just a few ms as in a typical nerve cell. Such a prolonged depolarization phase serves a vital purpose; it makes possible and ensures the chain of events that eventually leads to contraction of ventricular muscle fibers (excitation-contraction coupling). Here, it is important to realize that the contraction is not directly triggered by a depolarization of the membrane potential, but rather by an elevated intracellular Ca2+ concentration. An initial and modest depolarization of the membrane potential rapidly activates Na+ channels (called fast channels) in the muscle membrane. The ensuing massive Na+ influx across the membrane causes an additional rapid depolarization of the muscle membrane potential (called the "upstroke" or phase 0). The rapid depolarization then activates the Ca2+ channels in the muscle membrane. The resulting increase of the Ca2+ conductance leads to an enhanced Ca2+ influx into the muscle cell (cytoplasm or sarcoplasm) where the contractile elements are located. However, it takes time for the enhanced Ca2+ influx to raise the intracellular Ca2+ concentration to the threshold level for muscle contraction: the prolonged activation of Ca2+ channels during the plateau phase (phase 2) fulfills this requirement. Although the upstroke, caused by activation of fast Na+ channels, is required to activate Ca2+ channels, the maintenance of the enhanced Ca2+ conductance during the plateau phase does not require sustained activation of fast Na+ channels. These fast Na+ channels, like their counterparts
in a nerve membrane, become spontaneously inactivated in a couple of milliseconds, and the membrane potential repolarizes slightly during phase 1: there is no Na+ activation during phase 2. In order to maintain the prolonged activation of Ca2+ channels, two conditions must be met. First, Ca2+ channels must not become inactivated rapidly like Na+ channels. Second, the membrane potential must remain depolarized (positive) for a long time. This is because Ca2+ channels are voltage-dependent and require an elevated potential to remain open. A premature repolarization and/or inactivation could diminish the Ca2+ conductance and abort the excitation-contraction coupling. The first condition is met by the intrinsic property of L-type Ca2+ channels, called slow channels (where "L" stands for large conductance and long-lasting duration).676,677 In contrast, T-type Ca2+ channels, which inactivate rapidly, cannot fill the shoes (where "T" stands for tiny conductance and transient duration). The second condition requires an intricate interplay of Ca2+ and K+ channels: "cooperation" of K+ channels is essential. Since the Ca2+ influx and the K+ efflux exert opposite effects on the membrane potential, an almost exact balance of the two fluxes is required to maintain the plateau phase of depolarization. This delicate balance is made possible by a concerted interplay of delayed activation of several different types of K+ channels48 — each with its own appropriate timing and time course — and the inherent slow inactivation of L-type Ca2+ channels. Interestingly, the K+ conductance actually decreases about fivefold during phase 2 as compared to its magnitude during phase 1; the slight and rapid repolarization during phase 1 is accompanied by activation of transient outward K+ channels, which are no longer active during phase 2. The K+ efflux diminishes during the plateau phase because the outward K+ current is now carried, instead, by a type of slow-activating channel called the inward rectifier.236,491 Inward rectifier K+ channels work like a trap door. To appreciate how inward rectifier K+ channels work and how the K+ flux interacts with the Ca2+ flux, several crucial factors and events must be kept in mind; they are listed below, and the main driving-force relations are restated in equations immediately after the list. Readers who are familiar with these factors and events may skip the list with no loss of continuity.
• There are two kinds of "forces" that drive the ion fluxes: a) the membrane potential (electrostatic force), which drives both K+ and Ca2+ from the side of the membrane with a positive potential to the negative side, and b) the concentration gradient, which drives a particular ion from the side with a high concentration of that
particular ion to the low-concentration side.
• The K+ and Ca2+ concentration gradients are poised in opposite directions: K+ is high inside, whereas Ca2+ is high outside, and, therefore, the concentration-driven K+ and Ca2+ fluxes always point in opposite directions. On the other hand, both K+ and Ca2+ are under the control of the same membrane potential — there is only one membrane potential in a given cell — and, therefore, the potential-driven K+ and Ca2+ fluxes always point in the same direction.
• The magnitude of the potential-driven K+ and Ca2+ fluxes is regulated by both the magnitude of the common membrane potential and the individual K+ and Ca2+ conductances, respectively, whereas the magnitude of the concentration-driven K+ and Ca2+ fluxes is regulated by the steepness of the individual K+ and Ca2+ concentration gradients and the individual K+ and Ca2+ conductances, respectively.
• The control law regulating an ion conductance is specific to each type of ion channel. The ion conductance of K+ or Ca2+ depends not only on the common membrane potential but also on an intrinsic property that determines how fast a given type of ion channel can be turned on or off by the membrane potential and whether it maintains a prolonged activation under a sustained membrane potential.
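The first and third items above can be written compactly in conventional electrophysiological notation (standard textbook relations rather than the formulation used in this chapter): the net current carried by an ion species is the product of its conductance and its net driving force, and the concentration-driven part of that force is summarized by the Nernst potential,

\[
I_{\mathrm{ion}} \;=\; g_{\mathrm{ion}}\,\bigl(V_m - E_{\mathrm{ion}}\bigr),
\qquad
E_{\mathrm{ion}} \;=\; \frac{RT}{zF}\,\ln\frac{[\mathrm{ion}]_{\mathrm{out}}}{[\mathrm{ion}]_{\mathrm{in}}} .
\]

Because K+ is high inside, E_K is strongly negative, whereas Ca2+ is high outside and E_Ca is strongly positive; the two concentration-driven terms therefore always pull in opposite directions, while the common membrane potential V_m enters both expressions in the same way.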
Although changes of individual concentrations are gradual, changes of the common membrane potential can take place rapidly (in the millisecond range). As a consequence, the direction of some, if not all, net ion fluxes — a net flux is defined as the algebraic sum of potential-driven and concentration-driven fluxes — can be reversed abruptly by changing the common membrane potential. On the other hand, the concentration-driven K+ and Ca2+ fluxes can never be reversed under physiological operating conditions; however, their magnitude can be changed rapidly by an abrupt change of conductances, caused by an even more abrupt change of the membrane potential. Also note the complex "circular" interactions: the change of any ion flux changes the common membrane potential, which, in turn, changes all ion conductances; the latter changes cause changes in individual fluxes, thus completing the feedback cycle (which can be either positive or negative). However, interludes of a steady state in which all three parameters — potential, conductances and fluxes — remain constant (barring fluctuations) are possible if the interactions lead to a negative feedback cycle. Thus, when the membrane potential is more negative than -70 mV, the large negative potential overcomes the transmembrane K+ concentration gradient, resulting in a net influx of K+ with considerable ease because of the large K+ conductance; the net K+ influx is large. When the membrane potential is less negative than -70 mV, the concentration gradient overpowers the membrane potential, resulting in a net efflux of K+ with considerable difficulty because of the diminished K+ conductance; the net K+ efflux is rather small. The disparity of the two magnitudes (influx vs. efflux) constitutes the one-way trap door or, rather, rectification. Furthermore, when the membrane potential approaches zero or turns positive, the K+ efflux becomes even smaller. In fact, it becomes sufficiently small to match the not-so-intense Ca2+ influx, thus averting a premature return of the membrane potential to the resting level and making possible the long plateau phase. The gate of these same inward rectifier K+ channels later turns on during phase 3 (because of a large negative potential), thus hastening the rapid repolarization. Had all ion channels been made to open and close abruptly in unison (as in Fig. 4D), the time course of the ventricular action potential would be too drastically altered to make excitation-contraction coupling possible, for the following reasons. Although the upstroke might still be sufficiently rapid and forceful — i.e., having a sufficiently high slew rate, in engineering jargon — to activate a sufficiently large number of Ca2+ channels, the elimination of the subtle interplay of various types of K+ channels and the drastically shortened duration of Ca2+ channel activation would abolish the prolonged plateau phase and prematurely initiate rapid repolarization. The ensuing rapid repolarization would further diminish the Ca2+ influx since Ca2+ channels depend on a near-zero or positive membrane potential to sustain their activation. The rapid repolarization also augments the K+ efflux since inward rectifier K+ channels also depend on a near-zero or positive potential to maintain a diminished K+ efflux. Once the balance is tipped in favor of repolarization, as normally occurs in the late phase 3 of the ventricular action potential, a positive feedback mechanism returns the membrane potential precipitously to the resting level (phase 4): repolarization diminishes Ca2+ influx and enhances K+ efflux, whereas diminished Ca2+ influx and augmented K+ efflux further hasten repolarization, thus establishing a vicious cycle. Thus, a (hypothetical) synchronized opening and closing of Ca2+
channels alone would eliminate the prolonged plateau phase even if Na+ and K+ channels were to maintain their probabilistic time course of fluctuations. The situation would become even worse if all ion channel fluctuations ceased to exist. In other words, synchronous opening and closing of ion channels in the ventricular muscle membrane would seriously compromise the function of the ventricular muscle unless the entire contractile machinery were radically "redesigned." The above scenario is not merely imagination or a thought experiment. In fact, a gentler manipulation of ion concentrations, so as to diminish the rate of the upstroke, causes a cardiac ventricular cell to exhibit an "abnormal" action potential that lacks a plateau phase and actually looks like that of a cardiac atrial cell. For other types of action potentials with shorter durations than the ventricular action potential, the (hypothetical) alterations of their time course caused by the synchronous opening and closing of ion channels might not be as dramatic as in the heart muscle. However, the subtle effects of the altered action potential on the neurotransmitter release at the axonal terminal might propagate beyond the synapse in an insidious way and might then reveal their effect when the propagating signal reached another bifurcation point in the subsequent chain of events. In fact, an ion channel disease (channelopathy) known as hyperkalemic periodic paralysis (an autosomal dominant disorder) is caused by a defective inactivation mechanism of Na+ channels.105 In the case caused by a point mutation (Met1592Val), the activation of Na+ channels is normal, but the channels fail to become inactivated rapidly, resulting in a burst of reopening activities and prolonged open durations (Fig. 5). There is little doubt that the noise exhibited by ion channel fluctuations is endogenous in nature and was recruited by evolution for specific functional purposes. From the "normal" functional point of view, ion channel fluctuations are also a necessity. For the unitary conductances of many individual Na+ channels of the same kind to collectively give rise to the well-defined time course of the known macroscopic Na+ conductance, with appropriate activation and inactivation, the channels must not open and close in unison. Rather, channels must open and close in accordance with a well-orchestrated control law in such a way that each channel takes its turn to open and close, so that the time-dispersion (variable delay parameter τ) of the unitary events collectively gives rise to a highly deterministic time course of activation and inactivation of the macroscopic Na+ conductance. Specifically, most channels must open during the first few milliseconds, whereas fewer and fewer channels should open later. That this is indeed the case is evident
Fig. 5. Impairment of inactivation of HyperPP Na+ channels. Unitary Na+ currents were elicited with depolarizing pulses in cell-attached patches on normal and HyperPP (Met1592Val) myotubes. Ensemble averages (bottom traces) show the increased steady-state open probability caused by disruption of inactivation in the mutant. (Reproduced from Ref. 105 with permission; Copyright by Annual Reviews)
by a close examination of the nine traces of opening and closing of Na+ channels shown in Fig. 18 of Chapter 1: opening and closing events in all nine traces take place within the first five milliseconds, and no opening or closing appears later than 5 milliseconds after the step depolarization. From the point of view of engineering simulations, the behavior of the Na+ conductance, as depicted in Fig. 4B, can be simulated, in a deterministic way, by an array of a large number of programmable gates, each of which represents a channel-opening unitary event with an appropriate delay
from the onset of stimulation. In this way, members of the gate array open in succession with increasing delays and/or concurrently, with many more gates turning on during the first few milliseconds, but with fewer and fewer gates turning on subsequently, much like the performance of members of a bell choir. However, this simulation is not an exact duplication of the real events and may never fully capture the rich dynamics of the real channels, just as parallel processing can be simulated sequentially by pseudo-parallel processing (Sec. 4.11; see also Rosen's view regarding simulation, Sec. 6.13).
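The gate-array analogy just described can be mimicked in a few lines of code: each "gate" (channel) opens once after a random delay weighted toward the first few milliseconds and stays open for a brief random duration, and summing a few hundred such irregular unitary events yields a smooth rise-and-fall reminiscent of Trace B of Fig. 4. The delay and duration distributions below are assumptions chosen purely for illustration, not a model of real Na+ channel kinetics.

import numpy as np

rng = np.random.default_rng(1)
n_channels = 300                   # roughly the number quoted for Trace D of Fig. 4
t = np.linspace(0.0, 10.0, 1000)   # time axis in ms (arbitrary resolution)

# Each channel opens exactly once: the delay is weighted toward the first few
# milliseconds and the open duration is brief.  Both are assumed distributions.
delays = rng.exponential(scale=1.5, size=n_channels)      # ms
durations = rng.exponential(scale=0.7, size=n_channels)   # ms

# Unitary event: 1 (open) between delay and delay + duration, 0 otherwise.
open_matrix = (t >= delays[:, None]) & (t < (delays + durations)[:, None])
macroscopic = open_matrix.sum(axis=0)   # number of channels open at each instant

peak_index = int(macroscopic.argmax())
print(f"peak of the summed current: {macroscopic[peak_index]} channels open "
      f"at t = {t[peak_index]:.2f} ms")
# The summed trace rises within about a millisecond and decays over a few
# milliseconds: a reproducible activation/inactivation-like time course built
# entirely from irregular unitary events that never open in unison.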
The above-suggested simulation using parallel elements to replace the fluctuations of a single element is not as absurd as it may sound. Even Nature did it in the heart muscle. Presumably, it would be difficult for Nature to "design" a single type of ion channel that exhibits the kind of complex "waxing-and-waning" and somewhat non-monotonic time course of the K+ current in the cardiac ventricular muscle. Thus, Nature resorted to the use of parallel elements: a variety of gates in an array identifiable as various types of K+ channels. In other words, Nature recruited specialists, but demanded that each and every specialist be somewhat versatile — a limited generalist — in its own way, as compared to their mechanical or electronic counterparts. Here, by being versatile I mean that each individual channel does not merely turn on once, thus covering only a tiny time interval, but, instead, turns on and off for a while and covers a significant time interval — e.g., the entire phase 2 — by virtue of channel fluctuations. Ion channel fluctuations are most likely an intrinsic property of the channels, instead of the consequence of interference by external independent events, i.e., they constitute endogenous noise. Partial randomness is thus an active participant in the process of shaping the time course of the switching event rather than a passive bystander or an unwanted but unavoidable external intruder. This role is quite different from the role played by noise in causing incomplete switching of metabolic pathways, as described in Fig. 3. We are not certain whether the incomplete switching of metabolic pathways is desirable or undesirable for biocomputing. However, in the case of the Na+ channels, there is little doubt that partial randomness is indispensable for normal channel operation. The mechanism of partial randomness in ion channel fluctuations is unknown. It is generally assumed to arise from conformational changes between various channel states (at least three states for Na+ channels: closed, open and inactivated). Using the approach of nonlinear dynamics, Chinarov et al.123 have shown that the bistability of ion channels can arise from interactions of the ion flux through ion channels with a conformational degree of freedom for some polar groups lining such pores. Hanyu and Matsumoto279 suggested that ion channel cooperativity mediated by membrane-cytoskeletons may be the source of randomness. Intriguingly, Lev et al.416 demonstrated that pores formed in a non-biological synthetic polymer membrane also exhibit fluctuations between high and low conductance states. The investigators proposed a mechanism based on the ionization of fixed charges within a channel or at its mouth.379 Here, the main source of partial randomness in channel kinetics is quite local (endogenous noise), and is not a consequence of intrusion by unwanted and unrelated external agents. Of course, fluctuations in ionization, being a thermal phenomenon, have a distinct contribution from the environment, via fluctuations of the body temperature. However, bioorganisms were "designed" to operate in a noisy environment anyway.

5.9. "Controlled" randomness in a hierarchical biocomputing system

A skeptical inquirer who believes in strong causality often feels uncomfortable with anything less than strict (absolute) determinism in the control laws. Behaviorism pioneer B. F. Skinner argued that the prerequisite of doing scientific research in human affairs is to assume that behaviors are lawful and determined, whereas the assertion of an internal "will," which has the power of interfering with causal relationships, makes the prediction and control of behaviors impossible (pp. 6-7 of Ref. 636). In other words, turning away from determinism is tantamount to abandoning the possibility of a science of human behaviors because, he thought, science is supposed to be deterministically predictive as well as explanatory. Earman thought that we were being presented with a false dichotomy: determinism versus non-lawful behavior, or determinism versus spontaneity and randomness (p. 243 of Ref. 183). In contrast, Earman indicated that he had seen not the slightest reason to think that the science of physics would be impossible without determinism, and that denying determinism does not push us over the edge of the lawful and into the abyss of the utterly chaotic and non-lawful. Most likely, Earman had relative determinism in mind, though he did not say so explicitly. Although it is risky to second-guess someone else's inner feeling, I suspect that the discomfort of advocates of determinism might be rooted in the lingering suspicion that small errors, in the midst of a successive stream of information processing steps, may eventually lead to drastic "divergence"
of the final outcome because of error propagation and subsequent amplification along a chain of events (the most dramatic example is the Butterfly Effect, Sec. 2.4). Advocates of absolute determinism perhaps feel that the only sure way to prevent its occurrence is a strict adherence to absolute determinism, except perhaps at the stage of input or output, as in digital computing, lest the "slippery slope" carry the case to the unwanted extreme. What rescues biocomputing from this perceived or imagined disaster is the nested hierarchical computing scheme (Sec. 4 of Chapter 1). A seemingly highly random opening and closing event of ion channels at the mesoscopic level re-converges to a highly deterministic event of nerve impulse generation. However, it is not that a random event miraculously reorganizes itself to become more orderly. Rather, the re-convergence takes place at a different hierarchical level of biocomputing. Alternation of analog and digital processing — hence alternation of weak causality and strong causality — is characteristic of biocomputing. Ruelle's concern about mixing probability with responsible decision making is relieved by this alternating scheme, and a responsible and decisive action is "shielded" from those random or partially random microscopic or mesoscopic events by virtue of the nested hierarchical organization of the biocomputing scheme.

5.10. Impossibility of proving or disproving the existence of free will
A major difficulty in analyzing the free will problem arises from the common practice of using statistical analysis to validate a scientific investigation. Often, a scientific hypothesis is tested by means of evaluating either a time-average or an ensemble-average of a series of repeated experiments and controls. For example, flipping a coin many times yields the time-average of the heads-to-tails distribution ratio, whereas flipping many coins simultaneously yields the corresponding ensemble-average. For "honest" coins and honest operators, the same ratio of 1 to 1 is expected. Yet, when a behavioral experiment designed to address the free will problem is subjected to statistical analysis, it becomes problematic. It is not possible to perform ensemble-averaging because the expression of free will is strongly influenced by an individual's personality: the use of a group of experimental subjects with different types of personalities is guaranteed to introduce sample heterogeneity, thus degrading the ensemble-averaging to a mere exercise in futility. Nor is it possible to perform time-averaging
because repeating the behavioral experiment on the same individual at a later time cannot guarantee identical experimental conditions. For example, the memory of the outcome of a previous experimental run and the accompanying hindsight will almost certainly render time-averaging of the experimental results impossible to interpret or meaningless. In view of the possible presence of prohibitive but surmountable physical or social constraints and their intricate interplay with free will, the outcome of a behavioral experiment is expected to be ambiguous and impossible to interpret. Thus, free will is so personality-dependent and history-dependent that, since there is only one life to live, the concept of probability, as applied to a free-will experiment, is inherently problematic. This is probably the reason why Schrödinger said: "[W]hich of the virtually possible events are to be called possible under the auspices of free-will? I would say, just the one that actually follows." Thus, we conclude that free will cannot be experimentally proved or disproved with the rigor required by conventional science. Unlike the compatibilists, who attempted to reconcile free will with determinism, the present analysis attempts to resolve the conflict by demonstrating that biocomputing complies not with absolute determinism but only with the weaker form of determinism — relative determinism — which encompasses situations ranging from highly random (weak causality) to highly deterministic (strong causality) but never absolute. As Ruelle575 casually alluded to in passing, a deterministic outcome carries a probability distribution (meaning a non-zero variance). He thus inadvertently implied the existence of one-to-many correspondence and, therefore, relative determinism. In my opinion, the compatibilists' argument did not succeed in resolving the conflict between free will and absolute determinism. Examples used by compatibilists inadvertently invoked relative determinism, as explained in Sec. 5.5. Relative determinism in biocomputing that is accompanied by variable and controllable randomness is fully compatible with the notion of free will. However, none of the above arguments should be construed as scientific proof of the existence of free will.

5.11. Quantum indeterminacy at the biological level
The advent of Heisenberg's uncertainty principle and quantum mechanics initially raised some hope that quantum mechanics would be more compatible with the notion of free will than Laplace's classical determinism, and Heisenberg was once hailed as the hero who freed humans from the bondage
of determinism. However, Schrödinger, the other cofounder of quantum mechanics, claimed that quantum indeterminacy plays no biologically relevant role (Sec. 5.4). A considerable number of investigators hold the view that the "randomness" associated with the microscopic world of atoms and electrons does not impinge directly upon the problems of biological information processing, and suggest that we ought to look for randomness beyond the microscopic world of atoms and electrons (see also p. 236 of Ref. 186). On the other hand, Hawking thought that it is relevant (see Chapter 12 of Ref. 286). Teller658 also thought that quantum mechanics may leave room for free will to shape the future. Determinism implied in quantum mechanics deserves a special comment. The temporal mapping of wavefunctions in quantum mechanics is one-to-one, and the boundary conditions are completely fixed. The time-dependent Schrödinger equation exhibits time-reversal symmetry, and, as far as the wavefunction is concerned, the control law of quantum mechanics is deterministic. However, wavefunctions are not directly observable. A deterministic wavefunction describes a probabilistic specification of position. As far as position is concerned, the temporal mapping is not one-to-one. Furthermore, the initial conditions cannot be uniquely determined because of Heisenberg's uncertainty principle. Therefore, quantum mechanics does not determine the time-evolution of the position and momentum of a particle in the absolute sense (relative determinism) (cf. Popper's view524,525).
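In symbols (standard quantum mechanics, restated in textbook notation): the wavefunction ψ evolves deterministically under the time-dependent Schrödinger equation, which, for a real Hamiltonian, is invariant under t → −t combined with complex conjugation of ψ, whereas the observable position is specified only probabilistically by the Born rule, and the initial conditions are constrained by the uncertainty relation:

\[
i\hbar\,\frac{\partial \psi(\mathbf{r},t)}{\partial t} \;=\; \hat{H}\,\psi(\mathbf{r},t),
\qquad
P(\mathbf{r},t) \;=\; \lvert \psi(\mathbf{r},t) \rvert^{2},
\qquad
\Delta x\,\Delta p \;\ge\; \frac{\hbar}{2} .
\]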
The above conclusion is contingent on the validity of the Copenhagen interpretation of the wavefunction, which is not universally accepted by all physicists (e.g., see discussion in Refs. 251 and 374, and Chapter 6 of Ref. 542). Haken explicitly acknowledged the presence of quantum fluctuations and virtually denied absolute physical determinism (see p. 21 of Ref. 275). In his book Protobiology, Matsuno explicitly stated that the law of motion of the one-to-many mapping type, supplemented by intrinsically partially fixed boundary conditions, gives rise to the dynamic tolerance entailed in relative determinism.444 At the time Schrödinger made the statement claiming that quantum mechanics is deterministic at the biological level, not much was known about the role played by quantum mechanical tunneling in biology. With the advent of modern photochemistry and photobiology, a counter-example to Schrödinger's claim can be found: long-distance electron transfers between macromolecules (or within a supramolecular complex) in the photosynthetic apparatus (Sec. 5.2 of Chapter 1). It is reasonably well established that electrons in aromatic molecules are delocalized and that their distribution is described by a quantum mechanical wavefunction. The primary reaction in a photosynthetic reaction center is a long-distance electron transfer from a chlorophyll dimer to its primary electron acceptor by means of quantum mechanical tunneling; it is governed by a probabilistic control law. Photosynthesis is a key biological process that sustains animal life by regenerating molecular oxygen. Advocates of absolute determinism must explain how absolute determinism can be strictly maintained in a human body with a probabilistic process (photosynthesis) intervening quite early in the chain of events that are vital to the existence and sustenance of Homo sapiens. We are thus led to the inevitable conclusion that quantum indeterminacy is present in some, if not all, key biological processes, if the quantum mechanical interpretation of photochemistry is correct. In view of chaos theory, quantum indeterminacy, however minute it may be, can, in principle, impinge upon biological processes at the macroscopic scale. Schrödinger's claim to the contrary was understandable because nothing about chaos theory was known in his time.

5.12. Microscopic reversibility and physical determinism
We shall now examine the notion of physical determinism and a related principle of microscopic reversibility, in the context of both classical and quantum mechanics. Like statistical mechanics, quantum mechanics utilizes probabilities to describe the microscopic world of atoms and molecules, albeit for different reasons. Statistical mechanics and quantum mechanics are now widely accepted and have become common staples of modern chemists in dealing with research problems on a routine basis. However, Einstein thought that a probabilistic description of the microscopic physical world is a contingent and tentative approach. He could not quite accept the Copenhagen interpretation of quantum mechanics.293 Einstein's concern is highlighted in his famous remark that he did not believe that God plays dice [with the universe] (p. 443 of Ref. 501). On the other hand, Hawking (cited in Ref. 428) said: "God not only plays dice. He also sometimes throws the dice where they cannot be seen." However, by 1954 Einstein appeared to have changed his mind and no longer considered the concept of determinism to be as fundamental as it was frequently held to be (footnote, p. 2 of Ref. 529). The validity of microscopic reversibility is also intimately tied to the concept of time. Strict validity of microscopic reversibility implies time symmetry at the microscopic level: there is no difference between the past
and the future. This led Einstein to say that time is an illusion.160,480 In contrast, the second law of thermodynamics implies that time is an arrow: the future is in the direction of increasing entropy. The compatibility of microscopic reversibility and macroscopic irreversibility has long been a controversial problem (Sec. 5.13). Thus, even eminent contemporary physicists could not come to a definite consensus (see discussion in Refs. 251 and 374). To scientists who are not physicists, this question must be treated as an unsettled problem. The readers are forewarned that the following arguments represent the author's highly personal view, which is inevitably plagued (and perhaps also blessed) with ignorance. The standard literature should be consulted for the mainstream thought. From an epistemological point of view, I suspect that absolute determinism may simply be a mathematical idealization of the real world, very much like the way the mathematical concepts of points, lines and surfaces (interfaces) are conceptual idealizations of geometrical objects. However, the idealization is actually a reductionist's luxury or illusion. A reductionist can pick suitable problems and can choose appropriate experimental conditions to make the variance of measurements vanishingly small. Regarding this idealization, physicist David Bohm78 presented, in his book Causality and Chance in Modern Physics, a particularly relevant and illuminating argument. He considered the three-body problem of a lunar eclipse. Over moderate periods of time, the lunar eclipse is a precisely predictable event, even without taking into account the perturbations caused by other planets, by ocean tides, or by still other essentially independent contingencies. However, the longer the period of time considered in the prediction, the more significant these perturbations become, and eventually even the molecular motion of the gaseous nebulae from which the Sun, the Moon and the Earth arose should be taken into account. The question about absolute physical determinism is thus transformed into the following one: Is the history of the universe uniquely determined by the initial conditions of the Big Bang (Sec. 5.17)? The prediction of a lunar eclipse represents an extreme case of classical mechanics, in which the degree of isolation from outside perturbations is high, and therefore the variance of predictions is extremely small, thus giving rise to the illusion of absolute determinism. Weather forecasting represents the other extreme, in which the variance due to chance fluctuations is so large that only short-term predictions are feasible, whereas long-term predictions are virtually impossible at present (low degree of isolation from
remote perturbations) (cf. Butterfly Effect, Sec. 2.4). Life processes, being a manifestation of complexity, seldom afford investigators a degree of isolation comparable to that of a lunar eclipse. In addition, life is an open system that constantly exchanges matter and energy with the environment. However, life is not as unpredictable as weather. Even complex phenomena such as emotion and behaviors have their neurophysiological and genetic basis, which underlies the governing control laws.292,157,410,618 Disciplines such as psychology and psychiatry owe their existence to the reasonably strong causality entailed in the generation of emotion and behaviors, normal or pathological (phenomenological control laws). These control laws are much more sophisticated and complex than those of inanimate objects. It is of interest to examine the widely accepted physical concept of microscopic reversibility in the light of Bohm's argument outlined above. I suspect that microscopic reversibility is just an excellent approximation, but an approximation nonetheless. If so, it is inappropriate to draw conclusions by invoking microscopic reversibility whenever its strict validity is required (as in the free will problem). The presence of dispersion, no matter how small, will ruin the validity of one-to-one correspondence, with regard to position and momentum, between two different time points, thus invalidating the time-reversal argument invoked to establish the conflict between free will and determinism. Furthermore, those who advocate strict microscopic reversibility are required to identify the spatial scale where (microscopic) reversibility meets (macroscopic) irreversibility: a point of abrupt transition or discontinuity. Apparently, microscopic reversibility is not applicable on the mesoscopic scale, where entropic changes are not considered exceptional. For example, the M1 to M2 transition of bacteriorhodopsin (a membrane-bound protein with 248 amino acid residues) is accompanied by a large entropic change.684 Therefore, the discontinuity must appear below the mesoscopic scale. But where and how? Schulman599 regarded the boundary between the microscopic and the macroscopic scale as one of the greatest mysteries. Apparently, he was not satisfied with the kind of conventional explanation offered in physics textbooks. If, however, we regard microscopic reversibility as a mathematical idealization and approximation, it becomes easy to address the problem regarding the point of transition from (apparent) reversibility to irreversibility. The transition can then be viewed as the gradual breakdown of the mathematical approximation, and the point of transition depends on how much error one can tolerate and is therefore not sharply and
uniquely defined. That the concept of microscopic reversibility is highly questionable, even on the spatial scale of a typical small organic molecule, becomes evident by considering a well-established fact regarding the fluorescence of a small organic dye molecule: the emitted photon always has a lower energy (longer wavelength) than the exciting photon (the Stokes shift). This is because an electron excited by the incoming photon to a higher electronic orbital first settles to a lower orbital by vibronic relaxation before a photon is emitted, and the energy difference of the two orbitals accounts for the loss of energy, which is dissipated as heat (radiationless transition). Microscopic reversibility, as applied to the fluorescence of organic molecules, contradicts well-established experimental observations, because time reversal of the photophysical event would exhibit an emitted photon more energetic than the exciting photon, thus necessitating extraction of thermal energy from the environment — a blatant violation of the second law of thermodynamics — and thus betraying the prohibited time reversal. Furthermore, if one insists that time is reversible on the microscopic scale, another difficulty arises. Matsuno444 pointed out that the one-to-one temporal mapping together with complete fixedness [fixation] of boundary conditions — i.e., absolute determinism — requires mechanical adjustments caused by local perturbations to be propagated at infinite velocities (cf. p. 34 of Ref. 183). In other words, microscopic reversibility implies that a cause is followed instantaneously by its effect with absolutely no delay. The limit imposed by the speed of light prohibits propagation of causes at an infinite speed. Since a cause and its effect cannot occur simultaneously, invoking microscopic reversibility leads to the reversal of a cause and its effect. More recently, biophysicists have begun to toy with the idea of fluctuation-driven ion transport,678,21,23 thus indirectly challenging the validity of microscopic reversibility. Non-equilibrium fluctuations can, in principle, drive vectorial ion transport along an anisotropic structure in an isothermal medium by biasing the effect of thermal noise (Brownian ratchet mechanism). The mechanism constitutes a flagrant violation of microscopic time-reversal symmetry. This apparent violation was conveniently explained away by exempting non-equilibrium cases from the requirement of microscopic reversibility. For example, biophysical experiments designed to test microscopic reversibility have adopted the criterion that a violation of detailed balance in ion transport through ion channels indicates — and is attributed to — the presence of an external energy source.649,573 In the same way, the violation of microscopic reversibility exhibited by fluorescence
can also be explained away by identifying light as an external energy source. However, this modified interpretation of the principle of microscopic reversibility is an affront to the time-reversal invariance stipulated by Newtonian mechanics; the practice is tantamount to cutting the feet to fit the shoes. Classical mechanics makes no such exemption of non-equilibrium cases, and the principle of microscopic reversibility should also apply to non-equilibrium cases if absolute determinism is strictly valid. Interestingly, Angelopoulos and coworkers16 have experimentally shown that the dynamics of the neutral-kaon system violates time-reversal invariance. However, these physicists did not attribute it to the presence of an external energy source. The theory of Brownian ratchets provides a new explanation of muscle contraction with regard to how a myosin globular head interacts with an adjacent actin filament. According to experimental observations made by Yanagida and coworkers,372 myosin and actin do not behave deterministically. These investigators found that the myosin globular head hopped stochastically in steps from 5.5 to 27.5 nm long. Each step was an integral multiple of 5.5 nm, which is equal to the distance separating two adjacent actin monomers in an actin filament (the polymeric form of actin). Furthermore, a step, no matter how long, corresponds to the consumption of a single ATP molecule. Myosin globular heads sometimes even jumped backward instead of forward, but mostly they moved forward. In other words, the myosin globular head was undergoing a biased two-dimensional random walk during muscle contraction, much like the stepping motion of a drunken sailor on a slope instead of on level ground. These findings are consistent with the theory of Brownian ratchets, thus lending support to microscopic irreversibility (see Yanagida's comment on p. 64 of Ref. 22). A related observation had previously been reported by Hatori et al.:283 fluctuating movements of an actin filament in both the longitudinal and transverse directions appeared in the presence of an extremely low concentration of ATP. Without a valid concept of microscopic reversibility, the following conclusions become inevitable: a) the validity of one-to-one temporal mapping is called into serious question, b) the argument leading to absolute physical determinism is seriously undermined, and c) the perceived conflict between free will and determinism cannot be established by invoking physical determinism.
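The "drunken sailor on a slope" picture invoked above can be caricatured as a biased random walk on a 5.5 nm lattice. The sketch below is a one-dimensional simplification with an assumed forward/backward bias, intended only to illustrate the statistics of a mostly-forward stochastic stepper, not the actual actomyosin mechanics reported by Yanagida and coworkers.

import random

STEP_NM = 5.5      # spacing of adjacent actin monomers along the filament
P_FORWARD = 0.8    # assumed bias: mostly forward, occasionally backward

def myosin_walk(n_hops, seed=0):
    """Biased one-dimensional random walk: each elementary hop is +/- 5.5 nm."""
    rng = random.Random(seed)
    position = 0.0
    for _ in range(n_hops):
        position += STEP_NM if rng.random() < P_FORWARD else -STEP_NM
    return position

for n in (10, 100, 1000):
    print(f"{n:5d} hops -> net displacement {myosin_walk(n):8.1f} nm")
# On average the net displacement grows as (2 * P_FORWARD - 1) * STEP_NM * n:
# the walk drifts forward even though individual hops are stochastic and
# occasionally point backward.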
5.13. Incompatibility of microscopic reversibility and macroscopic irreversibility

The debates about physical determinism keep coming back, even after classical mechanics was superseded by quantum mechanics, as exemplified by Einstein's plight. In a symposium designed to refute pseudoscience and antiscience, physicist Jean Bricmont91 made several explicit statements: chaos does not invalidate, in the least, the classical deterministic world-view but rather strengthens it, and chaos is not related in a fundamental way to irreversibility. Bricmont further stated that "when they are correctly presented, the classical views of Boltzmann perfectly account for macroscopic irreversibility on the basis of deterministic, reversible, microscopic laws." I believe that most people who are familiar with chaos would agree with Bricmont's explanation that chaos can arise under the control of a deterministic mechanical law of motion (deterministic chaos), and that unpredictable events are not necessarily non-deterministic (Sec. 5.4). However, chaos can also arise under the control of a non-deterministic law of motion. Furthermore, predictable events are not necessarily strictly deterministic. I disagree with Bricmont's view that Laplacian determinism is compatible with macroscopic irreversibility. First, let us examine Bricmont's argument (Sec. 3.2 of Ref. 91). Bricmont's explanation of irreversibility was based on two fundamental ingredients: a) the initial conditions, and b) many degrees of freedom in a macroscopic system. He pointed out that the outcome of a physical event depends not only on the underlying differential equation but also on the initial conditions. Even though the differential equation exhibits time-reversal symmetry, the initial conditions may render the solutions time-irreversible. There is some truth in this view, but an inconsistency to be revealed at a deeper level uproots the foundation of his reasoning (see later). He also reminded us that irreversible phenomena always involve a large number of particles. He further pointed out that there is a many-to-one correspondence between a large number of microscopic configurations and a state function such as density or total energy, thus giving rise to many degrees of freedom. Superficially, this "many-to-one correspondence" argument looks similar to the one-to-many correspondence that was alluded to by Matsuno444 (Sec. 5.4). Here, it is important to realize that the many degrees of freedom stipulated in Boltzmann's theory arise from the deliberate decision to relinquish keeping track of each and every particle in the ensemble. Bricmont's many-to-one correspondence is associated with thermodynamic state functions,
whereas Matsuno's one-to-many correspondence is associated with individual microscopic states. Bricmont also proposed a qualitative argument to convince the readers that irreversibility always involves a macroscopic system that contains a large number of particles. Specifically, he considered the motion of a (single) billiard ball on a frictionless billiard table, and reminded us that a movie that depicts its motion, if run backwards, would appear normal and would not reveal the time reversal. Bricmont's argument was misleading for the following reason. The principle of microscopic reversibility essentially stipulates that momentum reversal of the motion of a particle is indistinguishable from time reversal (see Sec. 5.3). Thus, the actual path engendered by time reversal would be indistinguishable from the actual path of a single billiard ball with its momentum reversed but without time reversal, if microscopic reversibility is strictly valid. However, the two events — time reversal and momentum reversal — are not shown side by side in Bricmont's argument. Thus, a small difference between the original path and the retraced path would not be obvious to a casual observer. Even if both events were shown, our naked eyes would not have the precision to detect a small difference between the two paths if the passage of time is not sufficiently long. An alternative way to detect irreversibility is to consider a few billiard balls, such as two piles of billiard balls with two contrasting colors (blue and red), and to allow the billiard balls to collide with one another and with the walls of the billiard table, resulting in mixing of the two colors of balls. Now, given the same assumption of a frictionless billiard table, running the movie backwards would show spontaneous unmixing (separation) of the blue and red balls, thus betraying the time reversal. In Bricmont's words, strict reversibility means that nearby initial conditions follow nearby trajectories (p. 136 of Ref. 91). Therefore, it is really not the lack of additional degrees of freedom that suppresses the irreversibility. It is the lack of the eyes' resolution (visual acuity), or the lack of reference to a nearby companion billiard ball, that conceals the irreversibility. Nevertheless, finding fault with Bricmont's argument does not automatically constitute proof of the opposite conclusion. Mackey432 considered the origin of the thermodynamic behavior captured by the second law. He analyzed two types of physical laws of motion: invertible and non-invertible. He demonstrated that invertible microscopic physical laws are incapable of explaining the second law. What Mackey referred to as invertibility can be regarded as synonymous with the time-reversal symmetry mentioned above. He considered three possible sources of irreversibility for invertible
dynamics: coarse graining, noise (from external deterministic processes), and traces. He dismissed all three of these processes as possible explanations of irreversibility and concluded that invertible deterministic physical laws of motion were incorrectly formulated, suggesting that something minute and experimentally undetectable had been omitted. Mackey believed that the dynamics of the universe is deterministic. However, he did concede that if the dynamics of the universe is composed of both deterministic and stochastic elements, then the problem is solved. We shall argue that this is indeed the case. The deterministic element is the mean of position and momentum specified by the law of motion, which gives the law its superficially deterministic behavior and predictability. The stochastic element is the dispersion of position and momentum, which grants dynamic tolerance and irreversibility. In the subsequent discussion, we shall present an analysis based on the consideration of microscopic states. We shall conduct an intuitive Gedanken (thought) experiment which appeared as early as 1867 in a discussion between Maxwell and his friends Tait and Thomson (Lord Kelvin), according to Brush.96 The discussion will be limited to individual microscopic states of gas molecules in an isolated system: a gas container with two equal compartments of 50 ml each. Here, a microscopic state is essentially a detailed record of the exact coordinates and momenta, as functions of time, of the individual molecules inside the container. In real life, gas molecules of the same kind are not individually distinguishable, but we shall assume, in this Gedanken experiment, that the record can track individual molecules as if they were distinguishable. Since it is an isolated system, there is no heat exchange with the environment through the container walls. There is also no temperature change, because the container is to be filled with non-reacting gases at room temperature and standard atmospheric pressure. There will be changes of entropy, but this macroscopic concept is not relevant to our Gedanken experiment because we are not investigating a statistical ensemble of microscopic states. We shall demonstrate that microscopic reversibility is not fully compatible with macroscopic irreversibility, and that the conventional explanation offered in most physics textbooks is unconvincing. We shall first treat the deterministic law of motion as Newtonian mechanics, and later lift this restriction so as to accommodate any unspecified deterministic law of motion. Let the initial state S0 (at time t0) of the isolated system be so constructed that the left compartment contains 50 ml of nitrogen gas, whereas the right compartment contains 50 ml of oxygen gas. An opening between the two compartments allows gas molecules to diffuse from one
compartment to the other, back and forth. After a sufficiently long time interval Δt has elapsed, fairly uniform mixing of the two gases will occur (state S1). At room temperature, this time interval should be reasonably short (for example, from a few hours to a few days, depending on the size of the opening). Now, reverse the momentum of each and every gas molecule at time t1 = t0 + Δt, and take this altered condition as the new initial condition (state S1'). Let the law of motion operate for another time interval Δt, and another state, S0', will be reached. If microscopic reversibility were strictly true, the position of each and every gas molecule in the state S0' would be precisely the same as in the state S0, but the momentum of each and every molecule would be precisely opposite to that in the state S0. In the phase space, S1' can be obtained from S1, and S0 from S0', by a "reflection" with respect to the position axes, i.e., by changing the signs of the three Cartesian components of all momentum vectors but keeping exactly the same three Cartesian components of all position vectors. Thus, S0' is symmetrical to S0, whereas S1' is symmetrical to S1, with respect to the position axes, and, furthermore, the correspondence is one-to-one. What happened would be a spontaneous unmixing of the two gases: each and every molecule would have retraced its previous trajectory in the reverse direction. Ostensibly, this outcome contradicts Boltzmann's kinetic theory of gases, and is historically known as Loschmidt's "reversibility paradox" (p. 83 of Ref. 95) or "velocity-reversal paradox" (p. 21 of Ref. 542). When Josef Loschmidt brought this paradox to Boltzmann's attention around 1876, rumor had it that Boltzmann responded by saying "Well, you just try to reverse them!" (p. 605 of Ref. 96). Of course, no one can do that, but can we wait for the system to evolve to the point of momentum (velocity) reversal? The next question is: How long do we have to wait? For that matter, we must consider another paradox, known as the "recurrence paradox," first brought up against Boltzmann's theory by Ernst Zermelo740 (an English text was reprinted in pp. 208-217 of Ref. 94). Zermelo's argument was based on a theorem previously published by Henri Poincaré, known as Poincaré's recurrence theorem520 (an English text was reprinted in pp. 194-202 of Ref. 94). In the present context, the theorem means that there are infinitely many ways to set up the initial configuration, specified by positions and momenta, of our Gedanken experiment so that the system will return infinitely many times, as close as one wishes, to its initial configuration, i.e., to S0 (not to S0'): almost complete restoration of the initial conditions. It was estimated that times enormously great compared with
10^10 years would be needed before an appreciable separation would occur spontaneously in a 100 ml sample of two mixed gases (see p. 158 of Ref. 670). According to Bricmont,91 Boltzmann responded to the recurrence paradox by saying, "You should live that long." Since S0' is symmetrical to S0, it is reasonable to assume that it takes a time of about the same order of magnitude to achieve a momentum reversal. However, in a serious argument to defend the theoretical consistency between the second law and microscopic reversibility, it is a strange position to accept "approximations," instead of a mathematically rigorous derivation, as a way of explaining away a fundamental inconsistency. Poincaré's recurrence theorem also states that there are infinitely many non-reversible solutions of the above problem, but these solutions can be regarded as "exceptional" and may be said to have zero probability. This theorem lends credence to Bricmont's claim that an unusual initial state at the time of the Big Bang could give us an irreversible universe. It is also a strange position to rely on the possible existence of this rare initial state while rejecting the probable existence of a rare recurrence, just to defend the consistency between the second law and microscopic reversibility. Of course, this strange position alone does not prove him wrong. However, cosmological models of the Big Bang are in a state of rapid modification (Sec. 5.17). Even a tentative settlement based on cosmology is unlikely in the near future. Therefore, we shall pursue the problem in other ways. As will be shown later, the untenability of the argument based on the initial state is a natural consequence of the flaw of absolute determinism. We shall defer the discussion of this flaw, which lies at the epistemological level. However, all these defenses on behalf of Boltzmann's theory — defenses that claimed no contradiction between Boltzmann's theory and Newtonian mechanics — were based on a probability argument and a practical approximation. When a success must be assured and a failure is absolutely unacceptable, it is cold comfort to be told that the chance of a failure is small but not zero; unlikely events can happen, and have happened, in a single trial run. That is the situation when judgment must be made regarding the validity of microscopic reversibility and absolute determinism. There is simply no room for a probability argument and/or a practical approximation. It is one thing to declare that there is no practical conflict, in real life, between the second law of thermodynamics and microscopic reversibility. It is another thing to say that the second law of thermodynamics is theoretically consistent with microscopic reversibility. The distinct theoretical possibility of spontaneous, complete unmixing of two different gases
in exactly the same amount of time taken for the prior mixing, after achieving the momentum reversal (however unlikely), is still intellectually disquieting. Now, let us address several issues regarding the significance of this Gedanken experiment. First, whether we can precisely measure the position and momentum of each and every gas molecule is irrelevant to the argument. What is relevant is that absolute determinism requires that future values of positions and momenta be uniquely determined by the law of motion, and that the values corresponding to the initial state S0, at time t0, be mapped to the values of the state S1, at time t1, in a one-to-one correspondence. The mapping from S1, through S1' and S0', back to S0 is also a one-to-one correspondence (microscopic reversibility). That correspondence holds for each and every gas molecule regardless of the fact that the experimenter could not keep track of the precise position and momentum of each and every molecule at all times. The same conclusion applies to any law of motion of the one-to-one mapping type, and is independent of the present uncertainty about the interpretation of quantum mechanics. Second, whether we can stop the second part of the Gedanken experiment (starting with S1') after a time interval of precisely Δt has elapsed is irrelevant. This target time point would be passed, and a brief but sufficiently long moment, around the target time, would allow for at least partial unmixing of the two different gases to be detected before the two gases became thoroughly re-mixed again. Third, let us consider the question: Is the state S1' such a theoretically rare occurrence that the possibility that it may arise in real life without a "divine" intervention can be practically ignored? Like his predecessors, Bricmont argued that it is: the calculated Poincaré recurrence time exceeds the age of the universe if the number of gas molecules is sufficiently large. However, as we shall see later, the estimated Poincaré recurrence time may not cover the whole story. We shall set this issue aside for the time being and shall not let the lingering doubt cloud our judgment in the following discussion. As indicated above, the primed states S0' and S1' are symmetrical to the unprimed states S0 and S1, respectively, with respect to the position axes. Thus, an unprimed state and its corresponding primed state form a conjugate pair of "momentum-mirror images" — for lack of a better term — in the phase space. Here, S0 and S1 exist in the Gedanken experiment, since S0 is set up by the experimenter, whereas S1 is derived from the time-evolution of S0. In contrast, S0' and S1' are theoretical constructs. Instead of
asking how long it takes for the state to go from S1 to S1' spontaneously, let us ask a slightly different question: Are these theoretical constructs S0' and S1' much less probable to exist in reality than the corresponding unprimed states S0 and S1, as implied by Bricmont's argument? The primed states differ from the unprimed states only in the reversal of their momenta. Their existence is, therefore, not prohibited by Newtonian mechanics. Newtonian mechanics stipulates that S0 and S1 are equally probable, since one can be derived from the other either by forward time-evolution or by time reversal. So are S0' and S1', for the same reason. However, are the primed states and the corresponding unprimed (conjugate) states equally probable, or are the primed states much less probable than the unprimed states, as Bricmont's argument implied? We shall prove that all four states, S0, S1, S0' and S1', are equally probable by means of reductio ad absurdum. For the sake of argument, we can tentatively assume that the primed states S0' and S1' are less probable than the corresponding unprimed states S0 and S1. Recall that the primed state S1' is not prohibited by Newtonian mechanics, in spite of the present tentative assumption that it is less likely to occur in reality than S1. We simply re-run the Gedanken experiment as many times as necessary so as to wait for the turn of S1' to show up as the starting state of one of the re-runs. (For reasons to be presented below, we avoid the move of just going ahead and taking S1' as the starting state at t = t0 of a single re-run of the Gedanken experiment.) Once S1' shows up as a starting state in the multiple re-runs, we mark the time as t0 and continue the Gedanken experiment. At time t = t0 + Δt, the state S0' is reached. Reverse the momenta to get the conjugate state S0, run the experiment for another time interval Δt, and eventually the state S1 is reached. Now invoke the beginning (tentative) assumption so as to conclude that the states S0 and S1 — the new mental constructs — are less probable to exist in reality than the primed states S0' and S1'. The contradiction thus invalidates the beginning assumption. In other words, by the symmetry argument, the unprimed and the primed states are shown to be equally probable to exist in reality if the control law stems from Newtonian mechanics. For an unspecified deterministic law, this is also true if the primed states are not prohibited; there is no known physical law that prohibits the primed states. Fourth, Bricmont's argument implied that S0' is extremely rare, but we have just shown that S0' is as probable as S0. We are thus prompted to ask the following question: Is S0 also extremely rare? Surprisingly or not so surprisingly, the answer is affirmative. The Gedanken experiment
requires the initial state S0 to satisfy certain macroscopic conditions: 50 ml of nitrogen gas at the left and 50 ml of oxygen gas at the right, both at room temperature and standard atmospheric pressure. A microscopic state that satisfies this condition can assume any possible configuration of positions and momenta of its constituent molecules, as long as nitrogen molecules are kept in the left and oxygen molecules are kept in the right compartment at t = t0. Although it is by no means difficult to set up a starting state that satisfies this requirement, it is virtually impossible to set up the starting state with a preconceived configuration of positions and momenta of the individual gas molecules, because the experimenter has no control over them. Essentially, the experimenter must pick the initial state S0 arbitrarily from infinitely many qualified microscopic states, and will be stuck with whichever one actually comes up at the moment of setting up S0. Thus, the starting state S0 is rare and improbable to exist by virtue of the above probability argument. In other words, if we were to specify, in advance, a particular state with a preconceived configuration of positions and momenta as the starting state at t = t0, the probability of setting up, in a single run, the starting state exactly as specified is infinitesimal though not exactly zero (see below). Thus, if we are to repeat the same Gedanken experiment many times over, it is extremely unlikely that we will duplicate the same state S0 exactly in any subsequent run. The peculiarity regarding the probability of occurrence of a particular microscopic state stems from two seemingly incompatible but coexisting features in classical mechanics: a) a continuous distribution of the physical parameters of the boundary conditions, and b) zero dispersion of these parameters, as predicted by the absolutely deterministic physical law. The combined effect of these two features demands that a particular occurrence occupy only an infinitesimal range in the continuous distribution. As a consequence, the probability of a discrete occurrence among infinitely many possible ones with a continuous distribution is always infinitesimal. This is symptomatic of mixing discrete events with a continuous distribution — a practice that has a suspicious ring of mathematical idealization. Thus, the proper way to define the likelihood of a specific occurrence of a discrete event with a continuous distribution is to define, instead, a probability density, i.e., the probability of a discrete occurrence, which is, by definition, confined to within an infinitesimal interval on the scale of the continuous distribution. A bona fide probability for occurrences within a finite (nonzero) interval can thus be obtained by integrating — by means of integral calculus — the probability density over the finite range being
considered. That is the reason why Poincaré's recurrence theorem was framed as "... return ... as close as one wishes to its initial position" rather than as "... revisit ... exactly its initial position" (p. 194 of Ref. 94). Therefore, if we require, in our Gedanken experiment, that the state exhibiting appreciable spontaneous unmixing of the two gases be very close — within a specified range — to S0 in the phase space, rather than exactly there, we can integrate the probability density over this specified range to obtain a nonzero probability; the greater the range of tolerable deviations from S0, the higher the probability. Note that this probability pertains to almost exact restoration of positions as well as almost exact momentum reversal of the initial conditions, if the specified range is sufficiently small. Fifth, let us get back to the undisputed conclusion that each run of the Gedanken experiment almost always has a different initial condition. A natural question to ask is: Do we have to set up only a single starting state and follow the time-evolution of the same starting state S0 all the way through until it returns very closely to its conjugate "momentum-mirror image" state S0', if our objective is to check the consistency between the second law and microscopic reversibility? The answer is obviously no. Consider the purpose of invoking Poincaré's recurrence theorem. We must ask ourselves: What is our objective? If the objective is to demonstrate that microscopic reversibility predicts that an isolated system will eventually revisit a previous initial condition, then by all means get as close as possible to that condition. On the other hand, if the objective is to check whether and when a predicted spontaneous separation of a previously mixed sample of two gases will ever occur, or to debunk the principle of microscopic reversibility, then the detailed requirement on the target state of spontaneous separation can be somewhat relaxed, without compromising our objective, in exchange for a shorter, more realistically achievable waiting (recurrence) time. Let us consider the following provisions to relax the requirements. First, the specification regarding momenta can be abandoned: spontaneous unmixing without the requirement of momentum reversal or restoration of momenta. We can then integrate over the range of all possible momenta in the calculation of the corresponding probability. The calculated probability will be increased by the additional degrees of freedom so gained. Accordingly, the waiting time for spontaneous unmixing will be shortened considerably. Second, since molecules of the same kind are indistinguishable, it is not necessary to require the same molecule to get back to where
it was at t = t0; any other molecule can take its place. The probability can be further increased by virtue of the permutation of a large number of constituent molecules, and the corresponding waiting time can be further shortened accordingly. In fact, it is not even necessary to require a molecule to be in a previously filled position, occupied by another molecule in the momentum-mirror image conjugate state. The positional degrees of freedom can be further increased by imposing only the minimum requirement of having nitrogen molecules go to the left compartment and oxygen molecules go to the right (never mind whether the position was previously taken by another molecule or not). Third, there is no need to require an almost full and complete spontaneous separation of the two previously mixed gases. A spontaneous partial separation would raise the specter of macroscopic reversibility, even though, strictly speaking, such a partial separation cannot be regarded as a recurrence or reversal of individual molecular trajectories. All these provisions can bolster the probability considerably by integrating the probability density over a wider and wider range in the phase space, thus shortening the waiting time for a spontaneous partial unmixing of the two previously mixed gases to occur. Here, we are looking for a "miracle" that is heretofore unobserved. By relaxing the requirements, we essentially settle for a "lower-degree" miracle, i.e., a half-way decent spontaneous unmixing instead of a full-fledged spontaneous unmixing. In this way, we can avoid being misled by the unrealistically long waiting time. If we can establish that this lower-degree miracle is not forthcoming within a reasonable waiting time, we can begin to cast serious doubt on Bricmont's claim that the second law is indeed consistent with microscopic reversibility, and even to suspect that microscopic reversibility may not be consistent with physical reality. Note that the first and second provisions essentially pool the data of an infinite number of separate runs of the Gedanken experiment, whereas the third provision allows for larger deviations from the idealized target configuration of complete restoration of positions and momenta, or of complete restoration of positions but complete reversal of momenta. Although this qualitative Gedanken experiment does not provide a hard number in terms of probability or waiting time, I suspect that the arguments based on the Poincaré recurrence time might have grossly overestimated the time that it takes for a spontaneous partial separation of two previously mixed gases to occur under Newtonian mechanics. Sixth, a discrepancy remains to be reconciled. On the one hand, spontaneous momentum reversal may take an unrealistically long time to occur,
in view of the probability argument. On the other hand, the symmetry argument establishes that the primed and the unprimed states are equally probable. The unprimed states S0 and S1 definitely take place in our lifetime by virtue of the experimental construction; the primed states S0' and S1' should also be just as likely to take place in our lifetime by virtue of the symmetry argument. This discrepancy between the two arguments seems to be rooted in our uncritical inference that if something actually happens, it is not improbable to exist in reality by default. However, this now-questionable "inference" is actually quite logical in view of the absolute determinism that is historically associated with Newtonian mechanics: if something does happen, it has been destined to happen even before it happens. As we shall see next, this contradiction seems inevitable. In the above discussion in terms of probability density, the state S0 that eventually happens was found to be improbable to happen before happening. At issue here is the conditional probability for the occurrence of a discrete event with a continuous distribution. Before the initial state is set up, a mortal human being has no clue as to which particular microscopic state is actually going to materialize. The best bet is to assume that all microscopic states are equally probable, since they are indistinguishable anyway. Note that this is the very same assumption which Boltzmann used to construct his theory of statistical mechanics. After the state S0 materializes, the conditional probability jumps to exactly unity; it becomes absolute certainty because the event is discrete, with no dispersion whatsoever. This is a logical consequence of deterministic physics: deterministic physics, in contrast to quantum mechanics, assigns a continuous probability distribution to discrete events, thus necessitating the use of a probability density instead of just a plain probability. However, the very fact that the conditional probability can assume two drastically extreme values, infinitesimal and unity, on the probability scale (which ranges from 0 to 1), before and after the occurrence, respectively, tacitly breaks the time-reversal symmetry at the microscopic level, contrary to the basic tenet of deterministic physics. On the other hand, as just mentioned, absolute determinism demands that the (conditional) probability of occurrence of an event be the same before and after its occurrence, because events are pre-destined. Thus, mixing probability arguments with deterministic physics inherently leads to a contradiction. The contradiction reflects an irreconcilable clash between the deterministic view and the probabilistic view. This latter comment applies to any unspecified deterministic physical law. Here, we merely reiterate what Prigogine has been preaching all along. As we shall see, contradictions are not merely
limited to the ontological level but also spread to the epistemological level. Last but not least, let us turn the tables and play the devil's advocate. Is it possible that there is an inherent asymmetry between the unprimed states and the corresponding primed states that is not included in Newtonian mechanics, other than the above-mentioned change of the conditional probability before and after the occurrence of a discrete event? Indeed, there may just be one, as pointed out by Bricmont: the initial state S0, in which the two gases were confined to two separate compartments, was deliberately set up by the experimenter, whereas the state S0' of spontaneous unmixing of the two gases could not be directly set up by the experimenter. Bricmont used this asymmetry to explain why S0 is probable but S0' is improbable to exist. Let us see how a personal intervention could assign different fates to a pair of conjugate "momentum-mirror image" states, which are symmetric as far as Newtonian mechanics is concerned. To uphold Bricmont's argument, it is necessary to demonstrate that, for every unprimed state, the corresponding primed state cannot exist in principle or is, at least, much less probable to exist than the corresponding unprimed state. However, these excluded primed states cannot be specified ahead of time. The only valid specification is to set up and name an initial state, of which the representative is S0. The law of motion then gives us a stream of ensuing states, S1, S2, S3, ..., etc., corresponding to the successive times t1, t2, t3, ..., etc., respectively, as the trajectory sweeps through the phase space deterministically. As a consequence, the corresponding conjugate states, S1', S2', S3', ..., etc., must then be deterministically rendered improbable to exist, one by one (otherwise S0' could evolve from any of them). It seems a strange coincidence that the corresponding primed states could be selectively and conveniently prohibited only after the onset of the Gedanken experiment. It is as if the experimenter could exert a downward causation on external inanimate matter, i.e., use the mind to affect the external physical world. However, the idea of downward causation is totally alien to the thinking of advocates of absolute determinism. Serious considerations of downward causation are usually restricted to life forms (Sec. 6.13). Even the most staunch, daring vitalists would find it "spooky" to have minds controlling inanimate matter by means of an action at a distance. From the above discussion, what makes the unprimed states so special as opposed to the primed states (but is not dictated or explained by Newtonian mechanics) is of course the intervention of the experimenter, as
pointed out by Bricmont: the special state S0 was willfully set up by the experimenter, whereas the state S0' exists only in the imagination of someone who performs the Gedanken experiment. However, Bricmont's argument is flawed at an even more fundamental level (the epistemological level). Actually, it would be impossible for Bricmont to maintain an overall consistency in his argument against microscopic irreversibility, because his argument was based on the concurrent validity of two incompatible (mutually exclusive) views: willful decision making by the experimenter to set up the special state S0, and microscopic reversibility. From the foregoing discussion, it is obvious that the existence of free will and the validity of microscopic reversibility/absolute determinism are mutually exclusive. In other words, the validity of Bricmont's argument depends on the validity of what his very argument was supposed to disprove, albeit indirectly. Obviously, he could not have it both ways. The logical inconsistency of Bricmont's argument is akin to the claim of Epimenides the Cretan: all Cretans are liars (p. 58 of Ref. 50; cf. Secs. 6.10 and 6.13). This latter comment also applies to any unspecified deterministic law of motion. In conclusion, invoking the argument that the Poincaré recurrence time is excessively long, as compared to a human being's lifetime, does not satisfy the required logical rigor to explain away the contradiction. In Popper's opinion, Boltzmann failed to defend his theory against the criticisms raised by the reversibility and the recurrence paradoxes, in spite of his steadfast insistence that the theory is consistent with Newtonian mechanics. His heroic effort to derive the law of entropy increase (dS/dt > 0) from mechanical and statistical assumptions — his H-theorem — failed completely; it ended up destroying what he had intended to rescue. He conceded that reversibility is possible but extremely improbable in real life. Popper thought that Boltzmann must have realized this inconsistency, and that his depression and suicide in 1906 might have been connected with it (p. 161 of Ref. 530). However, his statistical mechanics has survived and has been widely accepted, presumably for the lack of better theories of the kinetic behavior of gases. Boltzmann would have been better off had he conceded that a new assumption beyond Newtonian mechanics had been introduced — "so what!" — and let posterity judge the validity of his assumption. Of course, hindsight is always 20/20. At the height of Newton's spectacular success, who had the courage to hint at a possible incompleteness of Newtonian mechanics? Once quantum mechanics dealt a decisive blow to Newtonian mechanics, anyone could pick on it without fear. However, the whole debate about the two paradoxes has often been held
at the ontological level. From the above analysis, it is evident that difficulties of the principle of microscopic reversibility also appear at the epistemological level. As we shall see, none of the aforementioned difficulties — both ontological and epistemological — persists if we treat microscopic reversibility as a good approximation: microscopic quasi-reversibility. Thus, probability enters the deliberation in statistical mechanics not merely because we are practically incapable of keeping track of the positions and momenta of each and every molecule, but also because probability is an inherent feature of the law of motion. If we are willing to give up one of the most cherished tenets of "precision" physics, absolute determinism, we will see that, by invoking chaos, Boltzmann's theory can be shown to be fully compatible with microscopic quasi-reversibility.

5.14. Origin of macroscopic irreversibility

Let us take a moment to visualize how the mathematical idealization of microscopic quasi-reversibility leads to microscopic reversibility as an approximation. The trajectory of a single particle under the control of an absolutely deterministic law of motion is represented by a line, with no width, in a three-dimensional space. Reversal of momentum results in exact retracing of the trajectory in the reverse direction, thus ensuring strict microscopic reversibility. This is what Mackey referred to as invertible dynamics (Sec. 6.1 of Ref. 432). In contrast, the trajectory under a non-deterministic law of motion is represented not by a line, but rather by many lines emanating from the same initial point. The (mathematical) envelope of these trajectories is a cone with its apex located at the uniquely defined initial position (a mathematical point), and with its base pointing towards the future. In other words, a non-deterministic law of motion dictates that the trajectories must be confined to the interior of the trajectory cone. The thin divergent cone thus defines the dynamic tolerance, i.e., the range of dispersion tolerated by the non-deterministic law of motion. Initially, during a short time interval, the difference between this thin cone and a strict mathematical line is not apparent. However, the difference will gradually become apparent as time goes on. Furthermore, reversal of momentum does not result in exact retracing of the trajectory in the reverse direction. Since the trajectory cone continues to "fan" out, there is, at most, only one out of an infinite number of trajectories contained within the cone that can give rise to exact retracing of the original trajectory in the reverse direction. Therefore, the probability of retracing the original path
is infinitesimal, though not exactly zero. This is what Mackey referred to as non-invertible dynamics (Sec. 6.2 of Ref. 432). A momentum reversal bends the cone by exactly 180° and folds it over itself, just like partially peeling off the rind of a banana to expose its delicious core. On the microscopic scale with a limited "viewing field," a bent trajectory cone is not readily distinguishable from a strict mathematical line being bent back on top of itself. Mathematical idealization of this microscopic quasi-reversibility thus approaches microscopic reversibility as a very good approximation. A similar representation by a thin bent cone for microscopic quasi-reversibility can be extended to the phase space, where both the position and the momentum are represented by a single trajectory. Next, let us consider how chaos arises under a strictly deterministic law of motion, while using the analogy of frictionless billiard balls as a visual aid (Fig. 6). As frequently pointed out in the literature, deterministic chaos arises as a consequence of uncertainty of the initial conditions. Let us consider two nearby billiard balls, and let the separation of the two positions specify the maximum range of uncertainty in the specification of the initial position. For simplicity, we shall not include uncertainty regarding momenta, since the generalization can readily be made in the phase space to include uncertainty in both positions and momenta. Thus, the two trajectory lines are initially parallel to each other, and will remain parallel after a collision with a perfectly flat edge of the billiard table. Collisions near a corner require special consideration. As long as the two balls bounce off the same edge, parallel trajectories will remain parallel. If they bounce off different but adjacent edges near a strictly rectangular corner, parallel trajectories will still run in parallel but in opposite directions, i.e., anti-parallel directions (Fig. 6A). A second reflection around the same corner converts the anti-parallel trajectories back into the original parallel ones. Thus, nearby initial conditions follow nearby trajectories. There is no divergence of the trajectories of two nearby balls. However, depending on the detailed geometry of the four edges, other configurations are possible. For example, two nearby balls may bounce off opposite edges of a highly elongated billiard table, which looks more like a waveguide than a table; the two trajectories keep crossing each other, rather than assuming a parallel or anti-parallel configuration, as long as the two balls continue to bounce off the two long edges. Still, the two balls remain clustered, and there is no divergence of the trajectories. If, however, the corner is not strictly rectangular, the two trajectories, after a reflection from different edges at the same corner, will no longer
Fig. 6. Deterministic and non-deterministic chaos. Trajectories of a billiard ball undergoing elastic collisions with the edges of a billiard table are considered in four different cases. The four corners of the table are strictly rectangular in A and C, but non-rectangular in B and D. In A and B, the law of motion is absolutely deterministic. Two different initial positions, with identical initial momenta, indicate the uncertainty of the initial position. In C and D, the law of motion is relatively deterministic with a non-zero dispersion; there is no uncertainty in the initial condition (position and momentum). The deviations of the trajectories originate solely from a non-deterministic law of motion, and are confined to the interior of a "fan" (or a "cone" in three-dimensional space). The fan is the "envelope" of all permissible trajectories, and is delineated, in the diagram, by the two trajectories with the most extreme deviations. The uncertainty of the initial position, in A and B, and the divergence of the fan, in C and D, are exaggerated so as to enhance the readability of the diagrams. Assuming comparable uniform speeds in all four cases, the positions after the passage of a fixed time interval and five consecutive reflections are indicated by the tips of the end arrows. In A, the trajectories remain parallel most of the time but occasionally assume anti-parallel directions for a brief moment. The divergence of the trajectories in C increases linearly with the passage of time. Although the envelope "fan" in C is occasionally shattered by reflections near a rectangular corner, the same fan is restored after two consecutive reflections. Therefore, the trajectories in A and C are non-chaotic. In contrast, the parallel or the fan-shaped envelopes in B and D, respectively, are "shattered" after two consecutive reflections at different edges around a non-rectangular corner. Chaos thus arises in B (deterministic chaos) and D (non-deterministic chaos). (Reproduced from Ref. 326 with permission; Copyright by Elsevier)
remain parallel to each other, and a subsequent reflection may take place around different corners, thus greatly diminishing the probability of restoring the original parallel trajectories (Fig. 6B). In fact, such restoration would require a purely coincidental matching of the angles among these corners. The trajectories of two initially nearby balls will diverge exponentially
(exponential divergence). Sooner or later, chaos ensues. However, reversal of momentum allows the two balls to retrace exactly their respective original trajectories in the reverse direction; time-reversal symmetry is preserved in spite of the chaotic outcome. Therefore, chaos alone does not lead to irreversibility. What if the edges are not perfectly flat, but rather rugged like corrugated cardboard? The condition is tantamount to having a polygon with numerous corners, many of which are non-rectangular. Therefore, collisions with these rugged edges also contribute to the emergence of chaos. The larger the dimension of the edge irregularity (as compared to the size of the ball), the sooner chaos emerges. Since the ruggedness of the container walls usually matches the dimension of the colliding gas molecules, the mathematical idealization of having strictly rectangular corners with perfectly flat edges (or surfaces) is hardly applicable to a real-life gas container, and chaos thus inevitably arises under a strictly deterministic law of motion (deterministic chaos). Now, consider a law of motion that is non-deterministic but nearly deterministic (with a non-zero but small dispersion). Let us also reimpose the condition of perfectly flat edges and perfectly rectangular corners. For simplicity, let us consider only uncertainty of the law of motion but not uncertainty of the initial position. The trajectories are confined to the interior of a sharply pointed cone, as explained above. Each time upon hitting a perfectly flat edge, the trajectory cone is bent but its shape is well preserved. However, when the range of uncertainty of the trajectory cone happens to cover two different edges around the same rectangular corner, the trajectories within the cone may hit either edge. As a consequence, the cone will be temporarily split into two sub-cones going in roughly opposite directions, but will usually recombine back into the original cone after a second subsequent reflection (Fig. 6C). Again, depending on the geometry of the table, the two sub-cones may remain separated for a few subsequent reflections. In any case, the envelope cone continues to diverge at the same rate with the passage of time (linear divergence). The situation is non-chaotic. If the corners are not strictly rectangular and the trajectory cone covers both edges around the same corner, the cone will be split into two sub-cones, which are unlikely to recombine into a single one after the next reflection (Fig. 6D). Subsequent reflections around non-rectangular corners tend to shatter these sub-cones into topologically disjointed (non-contiguous) pieces (i.e., sub-sub-cones, sub-sub-sub-cones, etc.). Thus, the divergence is amplified exponentially, or even supra-exponentially (see later). The end result is
similar to what transpires in deterministic chaos. For lack of a better term, this will be referred to as non-deterministic chaos. However, reversal of momentum is extremely unlikely to result in exact retracing of the original trajectory. This is of course because the bent-over trajectory cone continues to diverge and offers an infinite number of possible paths. The path that actually materializes is extremely unlikely to retrace, in the reverse direction, the original path (prior to momentum reversal), which was also one among an infinite number of possible paths. The probability of retracing the original path is further diminished after each collision around a non-rectangular corner, if the trajectory cone happens to cover two neighboring edges. This probability is drastically diminished for the same reason that is responsible for the emergence of deterministic chaos: retracing the original path after a collision with two adjacent edges around a non-rectangular corner becomes even more unlikely than after a collision with a perfectly flat edge or a collision around a rectangular corner, because the possible paths tend to cover very different regions of the phase space and tend to be scattered all over it. In other words, situations that generate chaos are very effective in magnifying path uncertainty and causing irreversibility, provided that the law of motion is non-deterministic. In contrast, collisions cause no such "compounding" effect in deterministic chaos: uncertainty appears only at the very beginning, and no new path uncertainty is introduced after each collision. The above comment regarding collisions with a "corrugated" edge also applies to non-deterministic chaos, and will not be repeated here. Thus, unlike deterministic chaos, non-deterministic chaos is irreversible even on the microscopic scale. So far, we have deliberately ignored collisions between billiard balls, by limiting their number to one or two. Now, let us lift this restriction and consider the motion of a number of billiard balls under the control of a non-deterministic law of motion, but reimpose the condition of rectangular corners and perfectly flat edges. As long as collisions between balls are rare compared to collisions with the edges of the billiard table, the situation remains non-chaotic. Thus, for a sufficiently brief time interval and a sufficiently small number of colliding balls, the trajectory cones of two initially nearby balls will diverge linearly at the same rate and will be bent roughly the same way in space. Therefore, the positions of spatially contiguous balls will remain reasonably contiguous or, rather, "clustered." Likewise, the momenta of "spatially" contiguous balls will remain "spatially" contiguous in the phase space. That is, the dispersion of positions or momenta will not ruin the clustering of these parameters in the phase space.
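Before continuing, a rough numerical sketch may make this slow de-clustering concrete. It is only a caricature, in Python, of one-dimensional "balls" that all start from exactly the same initial condition; the dispersion parameter sigma, the Gaussian form of the noise, and the chosen numbers of steps are arbitrary stand-ins for the dynamic tolerance of the control law, not quantities taken from any of the cited work.

    import random
    import statistics

    def cluster_spread(n_balls, n_steps, sigma, seed=1):
        """All balls share an identical initial position and velocity; a small
        random term (the dispersion of the control law) is added at every step.
        Returns (mean position, standard deviation of positions)."""
        rng = random.Random(seed)
        xs, vs = [0.0] * n_balls, [1.0] * n_balls
        for _ in range(n_steps):
            for i in range(n_balls):
                vs[i] += rng.gauss(0.0, sigma)   # non-deterministic law of motion
                xs[i] += vs[i]
        return statistics.mean(xs), statistics.stdev(xs)

    for steps in (10, 100, 1000):
        mean_x, spread = cluster_spread(200, steps, sigma=1e-3)
        # the mean tracks the Newtonian prediction (x = t), while the spread
        # is negligible at first and grows steadily with time
        print(steps, round(mean_x, 1), round(spread, 3))

For short times the spread is buried in any realistic measurement error, which is why the cluster appears to behave deterministically.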
Reversing all of the momenta will lead to a subsequent time-evolution that is almost indistinguishable from time reversal, because the continuing dispersion (de-clustering) will not significantly ruin the spatial contiguity of positions and momenta in the phase space. Figuratively, it is difficult to tell the differences between a thin divergent cone, a mathematical line, and a thin convergent (or, rather, inverted) cone. In other words, for a short time interval, de-clustering of billiard balls looks like re-clustering, unless they are color-coded and made individually distinguishable. As a consequence, the principle of microscopic reversibility will appear to be approximately valid. However, the replacement of a divergent cone with a convergent cone upon time reversal breaks the symmetry between the past and the future, in Prigogine's words.542 Thus, strictly speaking, the motion of an individual ball is not reversible, but the irreversibility is not readily detectable by inspecting the trajectory of an individual ball or just a few balls. As the number of billiard balls increases, the frequency of collisions between balls, as compared to that of collisions with the table edges, will also increase, and can no longer be ignored. Collisions between balls are always chaotic, because the situation is akin to collisions with a highly "corrugated" edge. This is also true for collisions between balls under a deterministic law of motion, as vividly depicted in Fig. 38 of Prigogine and Stengers' book Order Out of Chaos543 (reproduced in Fig. 7A). A head-on collision between two balls, either of the same size or of different sizes, results in reversal of their momenta, much like a collision with a perfectly flat surface at an exactly perpendicular incidence angle, whereas a grazing collision results in a slight change of speed and a slight deviation of the direction of their respective trajectories, much like a collision with a perfectly flat surface at a very shallow incidence angle. Thus, after a collision, if the shape of the ball is perfectly spherical, parallel trajectories become enveloped by a divergent cone (Fig. 7A), whereas a trajectory cone becomes considerably more divergent, i.e., a thin cone may suddenly become a fat one after the collision (Fig. 7B). Consequently, the trajectories of two nearby balls do not usually remain nearby after a collision with another ball of the same or a different kind, and a cluster of nearby balls becomes rapidly "de-clustered." This conclusion is also valid if some or all of the colliding balls are not perfectly spherical. As a consequence, spatially contiguous balls no longer remain approximately contiguous to one another after a collision; this is true whether the law of motion is deterministic or non-deterministic. In the non-deterministic case, momentum reversal will no longer give rise to approximately the same outcome
Fig. 7. Chaos arising from collisions between molecules. A. The law of motion is absolutely deterministic, but there is uncertainty of the initial condition. The range of uncertainty is indicated by a black and a white ball, with the same initial momenta but slightly different initial positions. Deflections from slightly different contact points on a big ball lead to divergent trajectories, thus resulting in deterministic chaos after many collisions. B. The law of motion is not absolutely deterministic but has a small dispersion (relatively deterministic). The initial position and momentum are precisely defined. Initially, the dynamic tolerance of the control law allows the trajectories to spread within the confines of a sharply pointed envelope cone. After the first collision with a big ball, the trajectories are confined to a significantly more divergent cone. In the successive collisions, different big balls are involved, and the envelope cone is eventually "shattered," thus leading to non-deterministic chaos. (Reproduced and modified from Ref. 543 with permission; Copyright by Ilya Prigogine and Isabelle Stengers)
as the time reversal does. Irreversibility will now become more and more noticeable, and the validity of the approximation gradually breaks down. Now, let us examine Bricmont's explanation of irreversibility in terms of many degrees of freedom. From the above discussion, increasing the number of gas molecules in an ensemble increases the likelihood of collisions between gas molecules, thus hastening the emergence of chaos. When two different gases are allowed to mix, mixing follows closely upon the emergence of chaos. Mixing or unmixing of a small number of molecules undergoing Brownian motion is not apparent, since it is difficult to distinguish, by means of a simple and unsophisticated observation, the difference between unmixing and fluctuations of density. However, increasing the number of molecules (and, therefore, the degrees of freedom) facilitates differentiation between mixing and unmixing, but does not guarantee the occurrence of irreversibility, since deterministic chaos can be reversed by momentum reversal. Apparently, Bricmont confused deterministic chaos with irreversibility, despite his claim that chaos is not related to irreversibility. In spite of the similarity in nomenclature, non-deterministic chaos does not depend on deterministic chaos (or absolute determinism) for its
validity; chaos is not a monopoly of deterministic physics. Actually, non-deterministic chaos is more general than deterministic chaos, and includes the latter as a special case. A non-deterministic law of motion inevitably gives rise to uncertainty of future boundary (initial) conditions, and the uncertainty is greatly amplified after collisions between balls or collisions of balls with non-rectangular corners or corrugated edges, even if the initial condition has been strictly certain. Obviously, non-deterministic chaos engenders a faster trajectory divergence than deterministic chaos. As compared to deterministic chaos, non-deterministic chaos amplifies trajectory divergence supra-exponentially rather than just exponentially. Consequently, a non-deterministic law of motion is more robust in bringing about chaos than a deterministic one, and appears to be a requirement for generating irreversibility. Thus, the Poincaré recurrence time, which was defined for classical mechanics, becomes vastly prolonged if it is modified for the non-deterministic case. It is important to recognize that chaos alone is insufficient to generate macroscopic irreversibility, in view of the requirement of one-to-one correspondence imposed by strict microscopic reversibility. Thus, contrary to Bricmont's claim, chaos does not strengthen the deterministic world-view, and strict microscopic reversibility cannot give rise to macroscopic irreversibility, but microscopic quasi-reversibility can. Although Boltzmann's theory did not challenge classical mechanics, its validity does not depend on strict microscopic reversibility. Refusal to subscribe to a deterministic law of motion results in no loss in the content of Boltzmann's theory but, instead, eliminates the contradiction with reality. The objection raised by Boltzmann's detractors becomes irrelevant. As a bonus, the time-reversal illusion also vanishes. Sanity of our world-view is restored. Actually, the time-reversal symmetry has been broken even at the microscopic level, but the symmetry breaking becomes conspicuous only at the macroscopic level. Furthermore, the time symmetry has been broken at the trajectory level of an individual particle (individual description), but it becomes patently obvious only at the ensemble level of many particles (statistical description), in Prigogine's terminology. Thus, a legitimate physical law of motion must be non-deterministic. A strictly deterministic law of motion inevitably leads to contradictions with physical reality. A small but non-zero dispersion rescues it from these contradictions without ruining the predictability of the mean. Thus, the combined effect of microscopic irreversibility (intrinsic irreversibility due to dispersion of the control law) and chaos can account
for macroscopic irreversibility, as was first demonstrated by Prigogine and coworkers both analytically and in terms of computer simulations.543,492,542 Bricmont's argument was intended to refute Prigogine's idea. Ironically, we have demonstrated that Bricmont's argument, if enunciated logically, actually supports Prigogine's idea. Thus, the above chaos argument does not contradict Boltzmann's theory but rather strengthens it. In the above discussion, a potential source of confusion is the meaning of the word "prediction." A predictable outcome need not be strictly deterministic. A well-defined control law that exhibits relative determinism satisfactorily predicts the outcome, in the conventional sense. Thus, Newtonian classical mechanics predicts the mean values of position and momentum fairly accurately (the dispersion is easily masked by the imprecision of measurements or observations), thus giving rise to the impression or illusion that the law is strictly deterministic. However, Newton's law of motion makes no mention whatsoever of the variance or dispersion. Likewise, the Schrödinger equation predicts the wavefunction deterministically, but not the position and momentum. The position is described by the probability density function that is obtained by multiplying the wavefunction and its complex conjugate; the certainty of specifying momentum is limited by Heisenberg's uncertainty principle. Thus, quantum mechanics explicitly acknowledges the dispersion, but classical mechanics has been misconstrued as deterministic in the absence of any mention or discussion of the dispersion. The notion of time reversal or time invariance specifically refers to the mean value of the position and momentum in classical mechanics, and to the wavefunction (as a mathematical entity) in quantum mechanics. In other words, the temporal mapping of the mean position and momentum in classical mechanics and of the wavefunction in quantum mechanics is strictly a one-to-one correspondence. Therefore, these laws of motion are time-reversible, i.e., the transformation v → −v generates the same outcome as time reversal, t → −t, does. In contrast, the dispersion in both classical mechanics and quantum mechanics is not time-reversible, because time reversal of a divergent cone produces a convergent cone (or, rather, an inverted cone with its tip pointing in the direction of the future), thus ruining the time-reversal symmetry. In most discussions of microscopic reversibility using the billiard-ball metaphor, attention is usually paid to the predictability of the mean position and momentum to the neglect of possible dispersions. As a consequence, the microscopic event is misconstrued as reversible.
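To make the contrast concrete, here is a minimal numerical sketch (my own illustration; it is not taken from the text or from Prigogine's work, and the standard (Chirikov) map merely stands in for the billiard-ball dynamics discussed above). The noise-free map is exactly invertible, so reversing the dynamics retraces the trajectory; adding a tiny per-step dispersion to the control law destroys that one-to-one correspondence, even though the mean behavior remains well predicted.

import numpy as np

def forward(x, p, K, noise=0.0, rng=None):
    """One step of the area-preserving, exactly invertible standard map,
    optionally perturbed by a tiny random 'dispersion' in the momentum."""
    p_new = p + K * np.sin(x)
    if noise and rng is not None:
        p_new += rng.normal(0.0, noise)        # non-deterministic control law
    x_new = (x + p_new) % (2.0 * np.pi)
    return x_new, p_new

def backward(x, p, K):
    """Exact inverse of the noise-free map (the analogue of momentum reversal)."""
    x_old = (x - p) % (2.0 * np.pi)
    p_old = p - K * np.sin(x_old)
    return x_old, p_old

def reversal_error(noise, K=1.5, steps=40, seed=0):
    rng = np.random.default_rng(seed)
    x0, p0 = 1.0, 0.5
    x, p = x0, p0
    for _ in range(steps):                     # forward evolution
        x, p = forward(x, p, K, noise, rng)
    for _ in range(steps):                     # attempt to retrace the trajectory
        x, p = backward(x, p, K)
    return abs(x - x0) + abs(p - p0)

# The noise-free law retraces its own trajectory almost exactly (the residue is
# floating-point round-off); the dispersed law does not.
print("noise-free law :", reversal_error(noise=0.0))
print("with dispersion:", reversal_error(noise=1e-6))

The point of the sketch is only the contrast: strict reversibility survives the deterministic control law, whereas a small but non-zero dispersion, once amplified by chaotic stretching, cannot be undone by reversal.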
The constraint imposed by false dichotomies, such as strict determinism or complete randomness, strict reversibility or irreversibility, B. F. Skinner's concern regarding behaviorism and determinism, etc., reflects what Francis Bacon meant by "the ill and unfit choice of words wonderfully obstructs the understanding" ("prisoner of words" phenomenon, see Sec. 4.7). Subsequently, one must then pay the price of trying hard to awkwardly explain away the apparent paradox of generating macroscopic irreversibility from microscopic reversibility; skeptics remain so far unconvinced. The apparent time-reversal of the equation of motion is thus just an illusion made possible by neglecting the accompanying dispersion. In other words, Einstein's remark that "time is an illusion" may be by itself an illusion, which Einstein seemed to have retracted towards the end of his career (see discussion in Ref. 542; cf. Ref. 599). This kind of illusion is more likely to happen with well-defined control laws than with stochastic control laws (see Feynman's discussion of radioactive beta decay, p. 110 of Ref. 202). By explicitly addressing the predictability of mean values and the unpredictability of dispersions, and by explicitly distinguishing uncertainty in the initial conditions from uncertainty in the control laws, clarity of the thinking process can be maintained throughout the discussion and confusion can be avoided. So far, the above discussion implies that chaos appears only on the macroscopic scale. It is not really so. Interest in nanotechnology has brought investigators' attention to chaos on the scale where quantum mechanics is the relevant law of motion.270,532 Investigations of nanotransistors often require modeling with "particles in a box" with stationary or vibrating boundaries; both quantum and semiquantum chaos have been found on the atomic scale. Park and coworkers502 studied a unimolecular transistor using moving buckyballs (buckminsterfullerene, C60; see also Sec. 7.8) to trap electrons, and observed semiquantum chaos. Diggins and coworkers176 have also found semiquantum chaos in their study of a superconducting quantum-interference device (SQUID). With the benefit of hindsight, it is clear why predictions of an eclipse of the Sun or the Moon are more reliable than long-term weather forecasting. In the Sun-Moon-Earth three-body problem, collisions between these three celestial bodies have never happened so far, and collisions of these celestial bodies with asteroids hardly cause any significant change of momenta owing to the disparity in size. The situation is non-chaotic. Owing to the small number of planets, it is feasible to make corrections due to perturbations from other planets. In contrast, weather changes are driven primarily by the motion and collisions of nitrogen, oxygen and water molecules, and chaos is the predominant
feature mainly because the irregularity of the surfaces being bombarded by these gas molecules matches the molecular dimension of these gases, as explained above. Weather forecasting is not just a formidable many-many-body problem; chaos renders it intractable for two reasons. The first reason is well known: the inability to ascertain the initial condition to a high degree of accuracy in computer simulations. The second reason is the non-deterministic nature of the law of motion: the time-evolution of the weather condition itself is not strictly deterministic. In retrospect, had Prigogine not emphatically and categorically abandoned the notion of trajectories, his idea of microscopic irreversibility would have been more palatable for Bricmont and other detractors to "swallow." The detractors obviously have found that the notion of trajectories is a valid one in macroscopic systems that are in relatively good isolation (cf. Bohm's analysis, Sec. 5.12). On the other hand, the notion of trajectories of gas molecules in long-term weather forecasting is not useful, if not meaningless. Again, a dynamic gray scale of determinism is sorely needed to accommodate peculiar situations near both extremes as well as situations in between. Bricmont admitted that he could not prove [absolute] determinism, but he also hastened to point out that the opponents could not prove indeterminism either. Indeed, Laplace's determinism can neither be proved nor disproved, as will be demonstrated by logical reasoning in Sec. 5.16.
5.15. Enigmas of alternativism, intelligibility and origination
Now, let us consider the question, raised by Ruelle, whether partial randomness is compatible with a responsible decision made by an individual (Sec. 5.5). This concern was shared by many others, including Walter, who discussed the issue under the designation of intelligibility (Sec. 5.2). However, a discussion of the problem of intelligibility is incomplete without considering, at the same time, the issues of alternativism and origination. Walter pointed out that both determinism and indeterminism pose a problem for intelligibility (p. 191 of Ref. 698). Perlovsky pointed out that free will is opposite to determinism but is also opposite to randomness or chaos (pp. 422-423 and p. 435 of Ref. 514). An action based on deterministic causation does not meet the criterion of intelligibility because it has been causally predetermined; any alleged rational deliberation is superfluous and therefore illusory. Walter also rejected causation by probabilistic or "undetermined"
mined" [non-deterministic] laws because no deliberation is involved (p. 70 of Ref. 698). In the context of moral responsibility, this dilemma was often referred to as Hume's Fork338 (see also Chapter 10 of Ref. 519). By "undetermined" laws, Walter probably meant laws with complete, utter randomness: at least, this was what Hume originally meant. Walter found the notion of probabilistic laws of causation disturbing, presumably because the notion of probability conjures up a scheme of arbitrariness (a concern shared by Ruelle). I believe that this is a semantic problem partly rooted in the word "randomness," which is often associated with probability. The customary usage of the word "randomness" means complete, utter randomness rather than restricted randomness, which — in our present usage — is associated with a nearly deterministic physical law with small dispersion. Walter admitted quasi chance as a plausible explanation, but he did not explicitly link it to what we meant by relative determinism or a gray scale of determinism. Instead, he proposed that the chaotic brain provides possibilities of bifurcations in decision making (chaotic alternativism vs. indeterministic alternativism; p. 186 of Ref. 698). However, if physics is absolutely deterministic, neither chaos nor occasional occurrences of singularity satisfies the requirement of alternativism (see Sees. 5.14 and 5.4, respectively). In an attempt to reconcile alternativism with intelligibility, Walter sought sanctuary in intentionality found in biology (p. 195 of Ref. 698). However, intentionality is an attribute that is intimately linked to free will. Walter merely replaced an enigmatic problem with another; the circularity is apparent. In contrast, in a regime of relative determinism, a bifurcation point exhibits a precarious equilibrium; noise, acting as the tie-breaker, pushes it one way or another (cf. Sec. 5.8). Although deterministic chaos does not lead to alternativism, non-deterministic chaos does. Microscopic reversibility and chaos do not free the world, but microscopic quasi-reversibility and chaos do. Let us consider how a decision is made by an individual. It should be pointed out that alternativism is a rather subjective notion. A genuinely arbitrary choice among several neutral options made by an individual should be regarded as a free and uncoerced act, from the first-person (introspective) perspective. However, from a third-person (external observer's) perspective, if a swift decision is not forthcoming, the hesitation will be regarded as evidence of indecisiveness rather than evidence of free will at work. Recall that free will is not the sole determinant in decision making because free will is still subject to the constraint of physical laws and social customs (Sec. 5.2). From the first-person perspective, invoking free will makes lit-
tie difference when the external barrier is prohibitively high or absolutely insurmountable. However, the choice so made under the circumstance appears to be quite decisive without a trace of randomness (hesitation), and is often regarded as a manifestation of free will in action, from the thirdperson perspective. Actually, free will plays a minor or even negligible role in the deliberation because an alternative is either totally unacceptable or simply does not exist. Even when such a barrier is absent, a rational decision that is made to avoid punishments or to seek rewards is actually a coerced act that is hardly free, from the first-person perspective. Yet, an external observer tends to regard the latter decision as a perfectly normal decision made under free will. Perhaps we have got used to being so coerced in daily life that it becomes hardly noticeable even from the first-person's perspective. Likewise, intelligibility is also a subjective notion. In the absence of complete knowledge, the process of decision making may not be perfectly rational. That is, the decision may not be well informed and is, therefore, hardly "intelligible" (cf. Simon's notion of bounded rationality in administrative decision making.620'625) Out of ignorance, however, the decision may be regarded as intelligible and rational, from the first-person perspective. With the benefit of additional knowledge and hindsight, the decision may turn out to be irrational or unwise, from the third-person perspective. Finally, for those who deny the existence of free will, here is the ultimate disparity. The origination of free will is an illusion from the third-person perspective. However, it is a reality from the first-person perspective; the urge and decision to convince others that free will is an illusion is an introspective testimonial of its existence. Investigators who vehemently deny the existence of free will usually admit that they do plan ahead. They often become irritated when their will power (free will) to execute a plan is questioned (personal observation). In real life as opposed to theoretical idealization, element of randomness is not necessarily incompatible with rational deliberation, because, in the context of relative determinism, it does not imply complete randomness or complete arbitrariness. Although a small error may be amplified exponentially by virtue of chaos, the nested hierarchical levels of biocomputing guard the error from spreading beyond levels (Sec. 5.9). Here, we consider a few entry points for randomness. When rational deliberation plays a major role in decision making, randomness may sneak into the process of problem solving. For example, when fuzzy logic must be invoked to arbitrate two or more conflicting constraints or conditions, limited arbitrariness is often
unavoidable, however rational the deliberation may be. Even if the decision is nearly perfectly rational, it may sometimes be made with the aid of serendipity, which dawns on the individual who happens to recognize a subtle, accidental cue. An element of randomness intrudes in two ways. First, the occurrence of an accident that happens to serve as a strategic hint is not under the control of the problem solver. Second, since picture-based reasoning is often involved in the recognition process and recognition so accomplished cannot be systematically preprogrammed, randomness associated with the act of scanning a picture is inevitable. It is because of this inherent randomness and unpredictability that luck is always an indispensable element in serendipitous discoveries, no matter how insignificant its role may be. In contrast, externally derived accidents can also act in the opposite way. As a consequence of random external interferences (distractions rather than hints), one may fail to recall a previously known decisive factor at the crucial moment of decision-making, thus inadvertently committing an error by chance. Again, in reality, intelligibility need not be maintained at all times. Exercising reason to override an irrational temptation is a situation where an occasional partial departure from intelligibility is not entirely unexpected. As Simon pointed out, motivation and emotion are the mechanisms responsible for our allocation of attention to different tasks with varied urgencies (pp. 90-91 of Ref. 630). The individual's emotional state, which covers a continuous gray scale of intensity, may be instrumental in tipping the balance, one way or another, between two equally compelling but opposing options, or between a rational course and an irrational temptation. The obsessive-compulsive disorder is a case in point. Rational reasoning convinces the individual that the obsessive thought/compulsive act is absurd and is not in the best self-interest. Yet, the emotional state, instigated by a persistently recurrent message issued by a faulty "alarm"-sensing circuit, which comprises the orbital frontal cortex, the caudate nucleus and the thalamus, compels the individual to continue the practice of the irrational thought/act anyway (see Chapter 2 of Ref. 601). According to Schwartz and Begley, it is possible for a patient to exercise will power to overcome the persisting, debilitating condition. Essentially, the part of the brain which controls the emotional feeling triggered by a false alarm and the part that underlies the conscious awareness of its absurdity are engaging in a tug-of-war — hardly a deterministic event. In brief, alternativism is not incompatible with intelligibility. Invoking free will may make a difference when the external barrier is surmountable.
Free will can become the sole decisive factor at an individual's indecisive moment, when the external constraints are ambivalent towards either alternatives, i.e., a bifurcation point. In this indecisive moment, the impact of free will carries the same weight as the tie-breaking vote that a U.S. vice president occasionally casts in the Senate. Yet, conventional wisdom tends to associate a strong (free) "will" with the appearance of decisiveness when free will has not actually entered the process of deliberation. Ironically, a responsible decision made after an initial hesitation may actually be the triumph of free will against all odds (external deterrents). Presumably, it was this paradoxical observation that caused the confusion that led to Ruelle's argument against indeterminism or partial indeterminism. Relative determinism poses neither problem for intelligibility nor for alternativism. In our present terminology, the dispersion exhibited by relatively deterministic control laws is referred to as dynamic tolerance (Sec. 5.4). The proverbial "cone" of trajectories, alluded to in Sec. 5.14, specifies the constraint set by physics and the range of deviation (freedom) tolerated by physics. In contrast to the occasional occurrences of bifurcations in classical deterministic physics, dynamic tolerance offers a continuous availability of bifurcations and a gray scale of alternativism, thus providing a more effective escape from the straitjacket of determinism than occasional bifurcations in strictly deterministic physics. Besides, the bifurcations made available by deterministic physics are not true bifurcations at all. The conversion of a singularity point into a genuine bifurcation point requires the aid of noise, which, by virtue of the doctrine of classical determinism, is actually generated by an undisclosed deterministic process, conveniently hidden from discussions. This use of noise of undisclosed sources to convert a singularity into a bifurcation is a subtle act of cheating. If the control law governing the noise in question were included in the discussion, alternativism offered by a singularity point would be completely demolished because the outcome would be deterministic, and the alternative would simply be just an illusion. Strictly speaking, deterministic physics is utterly incompatible with alternativism, in spite of the occasional presence of singularity in a deterministic law of motion. The above discussion may alleviate the "anxiety" of some skeptics, but others may have a persistent discomfort beyond the above reassurance. As Pinker pointed out, the pundits that shunned a biological explanation of human consciousness "found it cold comfort to be told that a man's genes (or his brain or his evolutionary history) made him 99 per cent likely to kill his landlady as opposed to 100 per cent" (p. 177 of Ref. 519). Thus, relative
biological determinism fails to quench the discomfort regarding the erosion of moral responsibility. Pinker further pointed out that [absolute] environmental determinism — the opposite of [absolute] biological determinism — does not eliminate the concern regarding moral responsibility either. Perhaps even a half-and-half joint biological and environmental determinism may also cause the same concern. For those who insist on an omnipotent "mental power" or "mental force" so as to completely push biological and environmental influences out of the picture, they are reminded that they have long accepted, without any discomfort or concern, the inevitable limitations of human capabilities, e.g., the lack of ability to hear the ultrasound (which a bat has) and ability to identify narcotics concealed in the luggage of an airline passenger (which a sniffing dog has). On the other hand, those who expect something else in addition to [relative, not absolute] biological and environmental determinism do have a healthy dose of skepticism. The real concern is essentially the issue of origination. Let us see how machine intelligence may shed light on the issue. With the advent of agent technology, the computer programmer can delegate, to the computer (or, rather, the software agents), a significant fraction of the tasks of data gathering, problem solving and decision making (Sec. 7.4). The programmer does not micromanage the problem-solving process but only prescribes some general, global strategies. The agents are granted a considerable degree of freedom in formulating their detailed, local tactics, and are no longer completely obedient and passive slaves of the programmer, but are actually equipped with "motivation" so as to commit available resources to a relentless pursuit of their goals. These acts are not readily distinguishable from those of a genuine self-starter. Whenever the available options appear to be neutral and equally attractive after "thoughtful deliberation," a software agent simply goes ahead and makes a random (arbitrary) choice. We as well as the software programmer do not expect it to hesitate and lose the opportunity. However, most, if not all, people would not regard the computations performed by a software agent as acts of free will. The double standards applied to either case apparently stem from human's prior knowledge about the computer's construction and programming, and have little to do with the notion of arbitrariness or even the quality of the performance itself. Similar double standards were at work in judging the "creativity" exhibited by problem-solving computer programs, mentioned in Sec. 4.26. In point of fact, these programs did better than some, if not most, of our students trained to perform exclusively rule-based reasoning.
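The tie-breaking behavior attributed to software agents above can be caricatured in a few lines of code. The sketch below is my own illustration, not drawn from the text or from any particular agent framework; the agent, its goals and its options are hypothetical. It deliberates by scoring the options against its goals and, when several options turn out to be effectively equal, lets a random draw act as the tie-breaker — exactly the role assigned to noise at a bifurcation point earlier in this section.

import random

class DeliberativeAgent:
    """Toy deliberative agent: rational scoring first, random tie-breaking last.
    All names and numbers here are illustrative assumptions."""

    def __init__(self, goals, tolerance=1e-6, seed=None):
        self.goals = goals                      # mapping: feature -> weight
        self.tolerance = tolerance              # options within this margin count as "neutral"
        self.rng = random.Random(seed)

    def score(self, option):
        # "Thoughtful deliberation": weigh each option's features against the goals.
        return sum(self.goals.get(feature, 0.0) * value
                   for feature, value in option["features"].items())

    def decide(self, options):
        scored = [(self.score(opt), opt) for opt in options]
        best = max(score for score, _ in scored)
        # Options that are effectively equal after deliberation.
        neutral = [opt for score, opt in scored if best - score <= self.tolerance]
        # Arbitrary but uncoerced choice: the agent does not hesitate.
        return self.rng.choice(neutral)["name"]

agent = DeliberativeAgent(goals={"speed": 1.0, "cost": -0.5})
options = [
    {"name": "route A", "features": {"speed": 2.0, "cost": 2.0}},
    {"name": "route B", "features": {"speed": 1.0, "cost": 0.0}},  # same net score as A
    {"name": "route C", "features": {"speed": 0.5, "cost": 1.0}},
]
print(agent.decide(options))   # A or B, chosen at random; C is rationally excluded

An external observer who sees only the prompt, unhesitating choice between route A and route B has no behavioral grounds for distinguishing this arbitration from an exercise of free will; the double standard discussed above rests on knowing how the program was built.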
Arguably, the most famous problem-solving program was Deep Blue, a chess-playing program designed by an IBM team of programmers, which (or, rather, who) defeated the world chess champion Garry Kasparov in 1997 (Sec. 5.18). However, in spite of their spectacular performance, we clearly know that these programs could not come into being without the intervention of their designers/programmers. Therefore, they are not true agents that command the origination of their choices or "thinking." The ability to have beliefs, desires and intentions does not reflect true consciousness. The ability to make novel discoveries does not reflect true consciousness either. As Schwartz and Begley aptly put it, "[c]onsciousness is more than perceiving and knowing; it is knowing that you know" (p. 26 of Ref. 601). But then, if a computer is programmed to convey its own awareness of knowing, the act would be dismissed as faking rather than a demonstration of the presence of consciousness. Again, the real issue is origination. Origination was often a central issue of debates between strong AI supporters and their detractors, although it was seldom mentioned explicitly. I suspect that some debaters might not be conscious of it. Essentially, Deep Blue was preprogrammed and controlled by a Svengali — or, more precisely, a team of Svengalis — behind the scenes, even though the Svengali himself could not perform as well or not at all. With due respect, I must point out that the relationship between Deep Blue and its IBM team of programmers bears a striking resemblance to the relationship between Wolfgang Mozart and his father. Leopold Mozart, Wolfgang Amadeus' obscure composer father, was instrumental in shaping Wolfgang's education and career. According to Hildesheimer, Wolfgang himself did not have an aspiration for posthumous fame or a preoccupation with the eternal significance of his work. The concept of "posterity" probably seldom or never crossed his mind — he was too bound up with his work and his daily life — but his father, as the Svengali behind Wolfgang, had a clear vision and expectation for his prodigious son and had planned accordingly (p. 61 of Ref. 299). Metaphorically, Leopold Mozart was the programmer who programmed his son to be a software agent, by providing the global strategy while letting his son figure out the local tactics of how to compose fine music of lasting charm, not to mention that Wolfgang got part of his genes from his father (which indirectly and partially influenced Wolfgang's performance). However, Mozart was by no means a robot under his father's full control. At least, he defied his father's wish and married Constanze Weber (p. 250 of Ref. 299). In addition, Mozart's best works were completed after his father's death. Likewise, a software agent can, in principle, also defy its programmer's wish
and continue to perform after the death of its programmer. Thus, in order to be consistent, I must reach an absurd conclusion: Mozart had no free will because his father was his true origination — but strangely his father had free will. Thus, the issue of origination can be quite subtle: it is difficult to define the concept of "origination" clearly in words without becoming absurd, as I just did. Pardon me for the irreverent, though not irrelevant, reference to the relationship between Mozart and his father, but I hope that the point so made is clear. In summary, the real issue behind the disagreement between Simon and Wertheimer, and between strong AI protagonists and antagonists, is origination. On the one hand, humans know with reasonable certainty that the computer does not have true origination; humans still call the ultimate shots. On the other hand, we know very little about the origination of humans' free will and are divided in our opinions regarding its existence. Popper527 (see also p. 227 of Ref. 186) pointed out that indeterminism is a necessary, but not a sufficient, condition to allow for the existence of free will. The issue which Popper raised is essentially the origination of free will. Eccles asserted that "physics and physiology are still not adequately developed in respect of the immense patterned complexity of neuronal operation" (p. 222 of Ref. 185). A comprehensive description of free will must also include the elucidation of anatomical localization and physiological mechanism. Crick, in his book The Astonishing Hypothesis,150 presented an interesting discussion of free will in anatomical and physiological terms. More recently, Shidara and Richmond616 reported experimental evidence indicating that the anterior cingulate cortex may be the locus of "willed" control over executive selection of the appropriate behaviors. Schwartz and Begley described, in their book The Mind and The Brain,601 experimental observations that patients with obsessive-compulsive disorder could alter physiological activities in crucial parts of their brain by exerting will power. They proposed the concept of directed mental force — essentially an agent of downward causation or final cause (Sec. 6.13) — based on quantum mechanics. The experimental evidence is compelling, but the detailed interpretation and additional demystification must await further elucidation of the brain's executive control processes such as selective attention (Sec. 4.8). Schwartz and Begley suggested that the thought process of patients who suffered from obsessive-compulsive disorder might just be an ideal model to study such executive control processes. Presently, I have no choice but to suspend my judgment on the origination of free will. However, one thing seems certain from the brief history of cognitive science: a denial of our introspective
awareness may quench our anxiety temporarily, but the practice prevents or delays our quest for a deeper understanding and the ultimate peace of mind.
5.16. Laplace's "hidden cause" argument
In debates of physical determinism, advocates of absolute determinism often mentioned the following argument, which was originally enunciated by Laplace (Sec. 5.8). Although the control law under consideration appears to contain dispersion or appears to be probabilistic in nature, another heretofore unknown and "hidden" control law (hidden cause) may still dictate the individual occurrence of noise with absolute determinacy and, thus, may provide a fundamental explanation of the perceived noise.
Fig. 8. Flowchart explaining why Laplace's "hidden cause" argument cannot be refuted. The flowchart shows two loops in which Laplace's "hidden cause" argument can be invoked for an indefinite number of times as long as there are dispersions. Each time the outer loop is traversed, a major advance about our understanding of dispersions is made. The inner loop is traversed repeatedly until a new hidden cause is found. Exhaustion of existing dispersions is only tentative because improvements of measurement techniques may uncover new dispersions. Thus, Laplace's argument can neither be proved nor disproved. See text for further discussion. (Reproduced from Ref. 323 with permission; Copyright by Plenum Press)
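As a reading aid, the two loops of Fig. 8 can be paraphrased in a few lines of code. This is my own sketch, not the author's or the figure's; the toy data structure (a list of dispersions, each of which may or may not have a discoverable hidden cause) and the search_budget cut-off are illustrative assumptions.

import itertools

def hidden_cause_search(dispersions, search_budget=3):
    """Toy rendering of Fig. 8. Each entry of `dispersions` is a pair
    (label, rounds_needed): the number of search rounds after which a hidden
    cause turns up, or None if none will ever be found. In reality both are
    unknowable in advance, which is the whole point of the argument."""
    theory = ["initial interpretation"]
    for label, rounds_needed in dispersions:      # outer loop: one pass per observed dispersion
        for attempt in itertools.count(1):        # inner loop: keep searching for a hidden cause
            if rounds_needed is not None and attempt >= rounds_needed:
                theory.append(f"revised to explain {label}")  # major advance; re-examine the residue
                break
            if attempt >= search_budget:
                # Failure to find a hidden cause so far proves nothing, so the
                # search is merely suspended, never settled.
                break
    # "Exhaustion" of dispersions is only tentative: improved instruments may
    # append new entries and restart the whole process.
    return theory

print(hidden_cause_search([("dispersion A", 2), ("dispersion B", None)]))

The search_budget cut-off exists only so that the toy loop terminates; in the argument itself there is no such bound, which is precisely why the proposition can be neither proved nor refuted.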
It is impossible to refute Laplace's "hidden cause" argument, but there is no way to prove his argument either. An examination of the flowchart shown
in Fig. 8 will explain why. Given the observation of a new phenomenon, the first step is to establish a viable theory or interpretation (mechanism). If any dispersion is present in the measured value of a parameter (or, rather, a physical quantity) which the proposed theory predicts, one can invoke Laplace's "hidden cause" argument to question whether the dispersion is true noise. One then engages in the search for a hidden cause that can explain the dispersion. If such a cause is not found, the old interpretation is retained but the search for the hidden cause continues. If a hidden cause is found, one must then revise the theory or interpretation to include a new explanation of the former dispersion. One then must re-examine the problem to see whether there is any residue of dispersion that remains to be accounted for. If so, one invokes Laplace's argument again. When no more dispersion is found, the search stops. However, that does not prove Laplace's claim because, if new dispersions are found after improvements of the measurement techniques, the entire process starts all over again. It appeared that Laplace had arrived at his conclusion by induction; his claim was based on numerous successful predictions of Newtonian mechanics. However, it is now widely recognized that it is not possible to prove a general proposition by means of induction (Sec. 6.1 for a discussion). Just because the hidden cause of most known dispersions was found in the past is no guarantee that the hidden cause of a newly discovered dispersion will definitely be found in the future. On the other hand, just because a hidden cause could not be found after extensive searches by many of the most competent investigators, living or dead, is no good reason why it will never be found in the future. There are two loops in the flowchart. When a hidden cause is found, a major advance in our understanding is made (outer loop). If a new hidden cause is not found, then one traverses in the inner loop for an indefinite number of times. Thus, Laplace's argument cannot be refuted because the number of times one is required to traverse the two loops is infinite. On the other hand, Laplace's argument cannot be proved either because it would require infinite many successful examples of finding a hidden cause and because the search for successful examples should continue till eternity; the possibility that new dispersions will appear in the future as a consequence of improved instrumental resolution cannot be excluded ahead of time. A scientific proposition must be falsifiable by empirical evidence (Sec. 6.1). A proposition that can neither be proved nor disproved is not a scientific problem but, instead, an epistemological problem. The choice between the two alternatives, absolute or relative determinism, thus appears to be
equally acceptable, but science history indicates that Western science settled for absolute determinism in physics. This is tantamount to a cognitive equivalent of spontaneous symmetry breaking (cf. Sec. 3.5). It is instructive to see how the psychology of scientists and philosophers might affect science and philosophy history by virtue of this mechanism. The physics, as practiced by Galileo and Newton, strived for truths that explain natural phenomena with well-defined natural laws, i.e., control laws that yield single-valued predictions. Any observed dispersion was then treated as the result of the random distribution of known but uncontrolled variables (which were ignored) or unknown variables (which hopefully were randomly distributed and evenly dispersed across the two sides of the mean value). Statistical analysis of ensemble-averaged or time-averaged data then took care of the dispersion problem. Judging from Laplace's writing, it appeared that the spectacular prediction of the 1759 return of Halley's Comet had such an enormous impact on Laplace's optimism about future reduction of our ignorance that he chose absolute determinism out of his free will; his choice constituted a process of spontaneous symmetry breaking in the cognitive sense. The choice of absolute determinism was subsequently reinforced by the spectacular and continuing success of physics by virtue of a positive feedback (self-reinforcing, self-perpetuating, autocatalytic) mechanism, as applied cognitively (cf. Sec. 3.5). Probabilistic control laws, such as what governs beta decay, were too rarely observed and were discovered much too late to make a significant impact on epistemology, and, therefore, failed to seriously challenge the universal validity of absolute determinism. With an enormous momentum already gathered by advocates of absolute determinism, the dissident voices such as Prigogine's were simply swamped out. Had the cognitive symmetry been broken in the other way — rejection of absolute determinism instead of acceptance — too many unknown phenomena would have been easily explained away by dismissing any dispersion simply as noise, and physics and chemistry would not have become what they are today. Apparently, this is what Earman meant by the importance of determinism in the development of modern physics, as a guiding methodological principle (p. 243 of Ref. 183). In this way, we should all be grateful to Laplace's choice. Thus, the true significance of Laplace's argument is the role it plays as the devil's advocate. It is like a carrot hung in front of a mule at a fixed but unreachable distance: in striving to reach the impossible goal the mule makes "advances" anyway. Likewise, in our collective attempt to explain
free will in terms of physics and chemistry, we inevitably fail, but we may gain additional insights into biocomputing, and make some progress. There might be another reason why Laplace chose absolute determinism. In his treatise Essai philosophique sur les probabilités,406 Laplace
expressed his discomfort about leaving a process of decision making to "the blind chance of the Epicureans" and "without motives." However, Laplace's choice did not completely resolve the difficulty, for a motive that is preprogrammed before one's birth is no motive at all. This dilemma was enunciated by William James,347 and by many others. It seems that the only way out of James' dilemma is to adopt a gray scale of determinism (Sec. 3 of Chapter 1). Thus, it is no longer a pure blind chance but, metaphorically, a restricted freedom constrained by the thin "cone" mentioned in Sec. 5.14: the "mean" of an action does not violate the deterministic law of motion, thus asserting the apparently decisive will, but the "dispersion" of an action allowable within the thin cone provides ample room for freedom (cf. Matsuno's notion of dynamic tolerance, Sec. 5.4). Laplace's concern about events without a cause thus evaporates, and James' dilemma also vanishes.
5.17. Physical determinism and cosmology
Strictly speaking, cosmology is a topic beyond the scope of the present article. Nevertheless, the problem of biological determinism inevitably links to cosmology through physical determinism: the initial conditions at the moment of the Big Bang determined, more or less, what the universe is supposed to be subsequently. Furthermore, biological evolution can be viewed as an extension of cosmic evolution. Since no other systems can be considered absolutely isolated, it is relevant to look at the problem of determinism of the universe as a whole: all inanimate objects and life forms, taken together. In spite of a lack of expertise, personal ignorance is an insufficient reason to evade the problem. The subsequent discussion is based on two popular science books by the consummate cosmologist Stephen Hawking.286,287 As mentioned earlier, Hawking thought that quantum determinacy is relevant in biology, contrary to Schrödinger's view (Sec. 5.11). He also thought that God sometimes throws the dice where they cannot be seen, contrary to Einstein's view (Sec. 5.12). Together with Roger Penrose, Hawking was instrumental in establishing the Big Bang Hypothesis. The hypothesis was originally proposed by Karl Schwarzschild, who found that general relativity predicts a point of singularity, which was interpreted as the beginning of the expanding universe. In
his books, Hawking had extensive discussions about the initial conditions of the Big Bang. Hawking pointed out that physicists had traditionally concentrated on the study of physical laws governing causality (control laws in our present terminology) and had left the initial conditions to metaphysics or religion (p. 11 of Ref. 287). However, the laws that govern the initial conditions are just as important, for the following reason. Many of the known physical constants are delicately and precariously poised. Had some of these constants assumed slightly different values, the universe would not be the way it is and, perhaps, we would not be here to ask all these questions with regard to cosmology or free will. Thus, the investigation of the initial conditions becomes as important as the study of the control laws. Several detailed cosmological models were discussed in Ref. 287. Some of them demanded rather unique initial conditions that were thought to be prerequisites for the present universe to have evolved. Other models were less demanding, and various initial conditions were thought to be compatible with the present universe. The Big Bang theory is no longer questioned seriously, although the details remain in a state of flux.510,686 In a stunning about-face, Hawking suggested that the classical laws of general relativity ought to break down at or near the singularity of the Big Bang, because the density of matter was so high that the quantum effect could not be ignored. He thought that the correct approach is an intermarriage of general relativity and quantum mechanics. Although the goal had not yet been reached, Hawking outlined what such a comprehensive theory ought to be. One of the models depicts the space-time of the universe as devoid of a boundary, like the surface of a sphere. The implication is that the universe has neither a beginning nor an ending. Hawking had previously made major contributions to the study of black holes. The no-boundary model was in part based on his insight into the property of a black hole, which also has a high density of matter that demands the consideration of the quantum effect. Hawking acknowledged that he appeared to be undoing what he had helped establish in the past: the Big Bang. The highly tentative status of plausible cosmological models essentially forces non-experts to suspend judgment. Hawking viewed a cosmological theory as a model of the universe, or a restricted part of it; it exists only in our own minds and does not have any other reality (p. 10 of Ref. 287). He embraced Karl Popper's philosophy and viewed any physical theory as being always tentative (cf. Sec. 6.1). However, he is not an antirealist (cf. Sec. 6.7). He said that he is a realist; he thought that there is a universe out there waiting to be investigated and
understood (p. 44 of Ref. 286). He refused to be pigeonholed into any of the "isms" that inevitably carry fatal errors one way or another. His major concern is logical self-consistency. It is of interest to note that Conrad also considered general relativity and quantum mechanics in constructing the fluctuon model,138'139'140 as a physical foundation of life phenomena. Conrad dissociated his view from traditional nonlinear dynamic approaches, such as artificial life, whereas Hawking's view of the evolution of the universe and the life forms is essentially traditional.287 There appears to be no direct relation between the two lines of thought. However, Conrad's objection that linear physics is unlikely to offer explanation of highly nonlinear life phenomena should be taken seriously (Sec. 2.3). Here, I suggest that his objection may be resolved as follows. The linearity that Conrad alluded to is associated with the mean value specified by the control law, classical or quantum mechanical, but not with the variance (dispersion). It is suggested that the non-zero variance associated with the control law — dynamic tolerance — may be sufficient to provide the needed nonlinearity.
5.18. Free will and simulations of consciousness
The discussion thus leads to the question: Can the objective attributes of consciousness be simulated by means of digital computing? In the discussion about the concept of intelligent materials, I have emphasized the importance of control laws that link outputs to inputs in a coherent and concerted manner (Sec. 7.3 of Chapter 1). The major features of a control law governing the rudimentary intelligence in these materials are consistency, coherence and rationality. These are also the qualities that investigators use to judge the objective attributes of consciousness. In principle, it is possible to simulate many objective attributes of consciousness, if computational time and resources are not restricted. For example, emotional responses can be programmed as part of the output patterns. So does a speech output reporting the preprogrammed (faked) subjective feelings. The more attention the designer pays to the detail of a simulation, the more believable the output patterns will appear to the observer. Even if the simulation is implemented in a crude way so that some subtle discrepancies may become apparent to a keen observer, the performance may not be worse or less believable than a schizophrenic patient who exhibits grossly erratic control
laws such as incoherence of speech and thought, and the use of paleologic.° The judgment of simulatability of a particular conscious attribute is influenced by the observer's own experience and expectation, and is therefore somewhat subjective. It is easier to fool a naive judge than an experienced judge. Some years ago, Alan Turing designed a test of machine intelligence — the Turing Test — in which the performance of a machine is communicated to a judge behind a curtain by means of machine-readable and machine-writable communications679 (see also Refs. 305, 604 and 126 and p. 6 of Ref. 512). A machine is considered to pass the test if the judge cannot distinguish its performance from that of a human being. Can the computer simulation of consciousness pass the classical Turing Test? Such a test would yield clear-cut results when machine intelligence was in its infancy and simulations were sufficiently crude so that the dichotomy between "real" and "fake" was obvious to an average observer. Once machine intelligence became more sophisticated and simulations became more elaborate, the transition from a faked performance to a real one began to take on a finer "grain" of shades on the gray scale, and the threshold of distinction is no longer sharply defined but spreads out into a continuum of gradual changes. This is because consciousness is judged not by a single attribute but rather by the holistic picture comprising the sum-total of many conceivable attributes, and because different judges tend to assign different weights to various attributes. One can almost set up a gray scale of consciousness for various nonhuman animals, similar to the one used by anesthesiologists in a surgical operating room and the one set up by Baars for conscious attention (see Sec. 4.8). The same type of difficulty that has confounded some animal behavior scientists (behaviorism vs. cognitivism) will begin to plague some, if not all, judges of the Turing Test. The selection of the judge may be as important as the quality of the simulation being evaluated in determining the outcome of the Turing Test. The Turing judge will have to make subjective judgments on objective behaviors/performances; the judgment will be strongly influenced by the judge's background and experience. The example about Koko's behavior, mentioned in Sec. 4.18, is a case in point. Strong AI supporters believe that man-made machines may eventually pass the Turing Test (see the debates in Refs. 604 and 126; see also Chapter 5 of Ref. 127, Chapter 2 of Ref. 603 and Chapter 10 of Ref. 485).
° As indicated in Sec. 4.9, paleologic is an absurd kind of logic (pseudologic) which is often used by psychotic patients in reasoning without any self-awareness of the underlying incoherence and irrationality. However, the patients may appear intelligent otherwise — sometimes more intelligent than their psychiatrist.
Take the case of the debate between Wertheimer and Simon as an example (Sec. 4.26). Wertheimer had a point because there is still a performance gap between the strongest computer problem-solving programs and the most celebrated geniuses in human history. However, Simon's programs are impressive. If some of our high-achieving students trained to practice exclusively rule-based reasoning and some of Simon's programs were subjected to a Turing Test, I suspect that the former would not, but the latter would, pass the test, unless the judges had a strong background in AI and had the privilege of access to some of our biomedical students. The debate between Wertheimer and Simon can be readily resolved and their views reconciled if we adopt a gray scale of creativity (Sec. 4.16). Some AI detractors may argue that implementation of a believable simulation can become impractical because the enormous programming complexity would require time exceeding the age of the universe or computational resources exceeding what the entire universe can offer, or both. However, that seems irrelevant and beside the point because the limitations currently perceived may change for the better with future advances made in computer and software technology. The consummate AI grandmaster Raymond Kurzweil402 pointed out that the state of the art in computer technology is anything but static; computer capabilities that are emerging today were considered impossible one or two decades ago. It is unlikely that anyone can set a valid limit on the performance of any future computers, silicon-based or carbon-based, because such limits are often based on linear extrapolations from contemporary capability. As a case in point, Kurzweil cited computer chess. In February 1996, the historic chess tournament match in Philadelphia between world chess champion Garry Kasparov and an IBM computer named Deep Blue must have been a watershed for many, if not all, detractors of machine intelligence. Kasparov admitted that he narrowly escaped a defeat. It is true that Deep Blue did not think like Kasparov. Deep Blue practiced mainly rule-based reasoning. Although rule-based reasoning is adequate and well suited for chess games, an extraordinary capability of parallel processing is important; Deep Blue used 192 processors in parallel.487 Because of Deep Blue's enormous processing speed and capacity (in both the random access memory and the disk storage space), it could afford to practice exhaustive searching under suitable circumstances. This was especially advantageous for Deep Blue in the so-called endgame phase — the final phase when there are only a small number of chess pieces left. The problem of all
five-piece endgames has been solved. Therefore, the endgame phase favored Deep Blue when it advanced to the stage of five pieces or fewer. However, this phase also favors the human grandmasters, at least for endgames of more than five pieces, since many endgames involve subtle moves that cannot be "understood" by even the strongest computer programs. Computers also have good opening books (for the opening phase, which lasts five to fifteen moves) because of their access to vast databases which include records of major chess tournaments in the past. It was in the middle phase that Kasparov outplayed Deep Blue, because there were simply too many possible moves, even for Deep Blue, to resort to exhaustive searching. A combination of brute-force computing power, based on parallel processing, special-purpose hardware, and a clever (heuristic) search algorithm, afforded Deep Blue the time to explore a much bigger search space with a less intelligent mode of thinking than Kasparov did.487 What counts here is that Deep Blue was able to compensate for its weakness with its formidable computing speed and stamina, which Kasparov lacked. Deep Blue did get help, in certain steps of decision making, from the IBM team of human experts. However, human participation in the games might not always be a blessing but could sometimes be a liability. In retrospect, the human team made a mistake in rejecting Kasparov's offer of a draw, otherwise Kasparov could have entered the final (sixth) game with one victory, one loss and three draws in his hands and could have lost the tournament due to extreme emotional stress. In contrast, Deep Blue was immune to an ego threat (cf. Sec. 4.21). About a year later, Deep Blue defeated Kasparov in a rematch.
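The "clever (heuristic) search algorithm" mentioned above is, at its core, a depth-limited game-tree search with alpha-beta pruning. The sketch below is a generic textbook version, not Deep Blue's actual code; the toy game tree and evaluation function are illustrative assumptions. Pruning lets brute-force hardware skip branches that provably cannot affect the final choice, which is how raw speed is converted into a deeper search.

import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning.
    `children(state)` yields successor positions; `evaluate(state)` is a
    heuristic score from the maximizing player's point of view."""
    succ = children(state)
    if depth == 0 or not succ:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:      # remaining siblings cannot improve the outcome
                break
        return value
    else:
        value = math.inf
        for child in succ:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Toy game tree (nested lists stand in for chess positions; leaves are scores).
tree = [[3, 5], [2, [9, 1]], [0, 4]]
children = lambda s: s if isinstance(s, list) else []
evaluate = lambda s: s if isinstance(s, (int, float)) else 0
print(alphabeta(tree, depth=4, alpha=-math.inf, beta=math.inf,
                maximizing=True, children=children, evaluate=evaluate))

Deep Blue layered massively parallel, special-purpose evaluation hardware and large opening and endgame databases on top of this kind of basic scheme, but the division of labor described in the text — exhaustive where the tree is small, heuristic where it is not — is already visible here.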
Thus, it is not inconceivable that many, if not all, salient attributes of consciousness can ultimately be simulated, and that fooling an average judge of the Turing Test may not be impossible. What then is the unique attribute of consciousness that probably cannot be simulated or that is impossible for humans to judge in a simulation? In my opinion, few types of simulations can match the amazement and confusion elicited by a computer program designed to simulate free will. I have pointed out the impossibility of conducting a decisive behavioral experiment to prove or disprove the existence of free will (Sec. 5.10). Therefore, the verdict passed by a Turing judge is almost guaranteed to be controversial. As pointed out by Rosen, free will, like consciousness or life, cannot be formalized; any formal model or simulation can never fully capture all of the entailments (see Sec. 6.13). However, it may be able to capture some core aspects and simulate many of the known entailments of free will with a formal scheme (digital computer program), so that it may even fool most, if not all, Turing judges. In view of the recent advances made in agent technology, this is not merely a speculation (Sec. 7.4). Based on agent technology, a computer program (or, rather, a software agent) can now possess a considerable degree of autonomy and can develop, from self-directed information gathering and experience, its own beliefs, desires and intentions, make well-informed decisions without the programmer's or the user's direct intervention, and execute "voluntary" acts that are not irrational, thus raising the specter of possessing free will and having a concept of self. In other words, the components of alternativism and intelligibility of free will are now possible to simulate, yet the component of origination will probably remain beyond the reach of simulations (cf. Sec. 5.15). For those who believe that free will exists, the component of origination remains unsolved. Even if we eventually attain more insights into the nature of origination, the criteria of origination may be so subtle, and so spread out on a gray scale, that a consensus may be virtually impossible. Thus, no matter how well a machine simulates free will, even the free will advocates are likely to deny, by default, that a man-made machine possesses free will. Perhaps the impossibility of proving or disproving the existence of free will is the last line of defense that allows humans to practice self-denial by proclaiming that humans are still the masters after all, albeit the kind of masters that lose control of their hand-crafted monsters.
5.19. Critique of the new-mysterian view
Like cosmology, consciousness is one of the deepest problems which mankind has tried to grasp since the dawn of civilization. It is one of those problems that one feels like calling for suspended judgment on, yet one cannot help speculating about it. While awaiting more concrete data and knowledge concerning the anatomy and physiology of the brain to emerge, computer scientists have been engaging in simulation studies under the assumption that consciousness is an emergent phenomenon. This approach has been questioned by Conrad (see Sec. 2.3), McGinn, Chalmers and many others. While Conrad looked for remedies within the realm of physics, philosophers McGinn449 and Chalmers112 thought consciousness lies beyond the explanatory power of science. Rosen held a similar view but for a different reason (Sec. 6.13). In his book The Mysterious Flame,449 McGinn argued that human intelligence
would never unravel the mystery of the bond between the mind and the brain. Unlike mysterian views held by superstitious people in a primitive society out of sheer ignorance, McGinn's view was proposed against the backdrop of modern neuroanatomy and neurophysiology, out of our collective failure to comprehend consciousness. Philosopher Owen Flanagan206 coined a new term — the new-mysterian view — for it (see also Ref. 310). It should not be construed as a derogatory term. Casual readers who may dismiss McGinn's work as "pseudoscience" or "higher superstition" are referred to pp. 69-76 of his book. Here, I will comment on some points with which I disagree. I will also expand some points with which I agree. Despite my criticism to be presented below, I regard McGinn's treatise as a piece of serious and thought-provoking work. The most problematic point in McGinn's argument was the definition of the term "understanding" in the cognitive closure thesis: consciousness cannot possibly be completely understood by humans. Just how much understanding is sufficient to constitute full understanding? As discussed in Sec. 4.10, there exists a gray scale of understanding. Usually, when more details are known about a subject or when a more general theory is formulated to explain a greater variety of different phenomena, a deeper understanding is gained. On the other hand, understanding can also shift in the regressive direction, as demonstrated by the performance of some biomedical students of the current generation who regard successful keyword-matching, in a standardized test, as understanding (see Sec. 4.22). Compared to biologists, physicists are considerably more explicit in stating the goal of their quests for understanding. One of the major pursuits is the Grand Unification Theory that will ultimately permit the derivation of all of the physical laws governing gravitation, the electromagnetic force, the strong nuclear force, and the weak nuclear force from a common reduced set of physical laws.705 Physicists recognized that gaining an understanding at one level leads to additional questions at the next level. Ultimately, one has to investigate the law governing the initial conditions of the Big Bang (Sec. 5.17). Thus, by following McGinn's line of reasoning, one can conclude that human minds are not equipped to understand the deep problems in cosmology. Yet, physicists seem never to be bothered by such a concern; they simply charge ahead. In my opinion, McGinn's predicament arose from his deep discomfort in reconciling the subjective notion of self-consciousness with the current knowledge about the brain, which is so far insufficient to provide detailed knowledge of physical correlates that parallel the subjective feelings of consciousness.
sciousness. McGinn seemed to imply that the mystery of consciousness is unique and is not shared by the cosmology problem. In our alternative view, the entailed mystery is no less and no more than the mystery surrounding the origin of the universe. Metaphorically, cosmology and consciousness reside at the "edge" of our knowledge. Consider the notion of space and time. Humans presumably formulated the ordinary concept of space and time by induction of examples from their daily experience. For example, every object has a boundary which separates the interior from the exterior. Extrapolation of such a notion of space to cosmology is problematic. According to the current understanding of the expanding universe, the universe is finite (cf. Ref. 426). A question follows immediately: If the universe is finite then what is the space beyond its outer boundary? A physicist's answer to this question usually invoked the notion of curved space of general relativity and suggested that the question has no meaning. Even the inner boundary of the subatomic world is a problematic notion. What is inside those subatomic particles? Is it a homogeneous continuum? That is odd in a quantized microcosmos. Is it filled with finer discrete particles? That is also odd because the notion leads to what McGinn referred to as infinite regress. Perhaps the prospect of an infinite regress is not so odd. Apparently, finding a fine structure at the next (sub-subatomic) level depends on the availability of a particle smasher with sufficiently high energy to crack open subatomic particles and humans' ability to measure and recognize these structured patterns. Presumably, there is no compelling reason to believe that the inner limit has been reached, and calling for suspended judgment thus appears to be a wise policy. Physicists usually are more concerned with the availability of a particle smasher with sufficiently high energy to crack these subatomic particles open than with the above enigmatic questions which cannot be answered in the immediate future. On the other hand, such a question about space beyond the outer or inner boundary usually does not arise in our daily experience, because the question can always be deferred to the next level of hierarchy. During the time when the Earth was considered to be flat, the question of the boundary of the Earth (the world) was problematic, as demonstrated by the figures of "angels" near the margins of an ancient map. The notion of a spherical Earth solved the problem but deferred the question of the boundary to the next levels: the solar system, our own galaxy, etc. The question cannot be deferred any more at the highest level of cosmology — the boundary of the universe.
The notion of time is similarly problematic. For ordinary events, there is always a beginning, and there are always other events preceding the beginning. A similar strategy of deferment can be employed to deal with the boundary of time. At the onset of the Big Bang, the deferment strategy breaks down again. If the universe had a fixed time of beginning, what happened before the Big Bang? And what could have happened when nothing was there? Again, such a question is as "meaningless" or as enigmatic as the question of the space beyond the finite boundary of the universe, as discussed above.p Physicists seem to be quite at ease with these types of mind-boggling questions. Presumably, physicists in general are more concerned about the consistency and coherence of physical theories than with comfort to the mind. The notion of time-dilatation and space-shrinking in special relativity has been generally accepted, otherwise a higher price would have to be paid: a blatant inconsistency between the constancy of the speed of light (as demonstrated unequivocally by Michelson and Morley) and the universality of physical laws in various inertial systems.q It appears that enigmatic and counter-intuitive notions often accompany the "outer boundaries" of knowledge (or, in McGinn's words, the outer edge of the sayable), and perhaps consciousness is no exception.r By the same token, the enunciation of Gödel's theorem pertains to the "edge" effect of logical reasoning; it should not be construed as a summary indictment of logical systems in general, such as various branches of mathematics (see Sec. 6.13). Consciousness also has its fair share of the edge effect, as indicated by the difference in dealing with one's own consciousness (reflective consciousness of self) and others' consciousness (objective consciousness of non-self); the problem rides across the great divide of subjectivity and objectivity. As long as scientists and philosophers continue to refuse to admit subjectivity in scientific investigations, there is little hope of resolving the problem (cf. Sec. 6.13).
p Perhaps the question is not as meaningless as it appeared to be at first. String theory suggests otherwise.686 What appears to be meaningless may simply be a question for which any presently conceivable answer or explanation temporarily defies comprehension.
q Max Wertheimer claimed that the Michelson-Morley experiment was instrumental in Einstein's formulation of the special theory of relativity (p. 216 of Ref. 709). However, this claim contradicted Einstein's own account (see Footnote 16, p. 215 of Ref. 464).
r An interesting and concise account of the quantum origin of space and time that helps remove or "smooth" the "edge effect" accompanying the concept of space and time can be found in an article by Hogan309 (see also Ref. 308).
A dominant component of McGinn's argument is the notion of space. He was, however, remarkably silent about time; he briefly alluded to our experience of a spatio-temporal world but did not elaborate any further. Essentially, he ignored the process itself. He considered the assembly of material particles that form the brain and concluded that the brain is spatial but consciousness is not. Few would be so naive as to believe that the mere assembly of the molecules that constitute the brain, even if it could be done, would automatically allow the brain to spring to life. The mind certainly cannot be reduced to the brain, if one takes a static view. However, a dynamic view recognizes that the mind is the brain in action. Apparently, McGinn placed an extraordinary emphasis on seeing consciousness as a concrete entity or object. As he argued, since we cannot see consciousness in the same way that we see the brain, consciousness is not perceptible. As a specific example to demonstrate the point, he pointed out that a piece of brain tissue on a conveyor belt does not distinguish itself from an inanimate object or a kidney specimen. This example is somewhat misleading because an Intel Pentium II microprocessor (specifically, Slot 1 type) on a conveyor belt looks hardly different from a piece of hardware with more limited capabilities, such as an epoxy-encapsulated regulated power supply for AC to DC conversion. Clearly, what distinguishes a microprocessor from a power supply is its (electrically) energized state and the process going on in it under normal operating conditions. That is not to say an operating microprocessor has consciousness. However, the similarity between the two cases is striking. Why did we not marvel at the spectacle to such an extent as to declare that, due to cognitive closure, the understanding of the "behaviors" of a microprocessor is beyond all human minds (including that of its designer)? This is because even a non-engineer is aware of the fact that at least the designer knew all the possible intricate interactions between various components of the microprocessor, and there is no mystery left, just the amazement of a non-engineer. It is of interest to note that even the designer could not see the underlying process directly. However, any competent engineer can visualize the underlying process and various intervening states of action by hooking up a logic analyzer to the microprocessor. Actually, brain scientists are just beginning to do the same by means of various imaging techniques in real time. However, the activity is far from reaching the stage of completely cataloguing all of the possible interactions; the cataloguing of an open-ended process such as consciousness or biocomputing will perhaps never be complete, according to Conrad (Sec. 11 of Chapter 1).
The great enigma of an active brain is how the initial conditions have been achieved to begin with. For a conventional digital computer, the mere correct assembly of various parts together with the availability of software would not be sufficient to start a computer running. A sequence of bootstrapping (abbreviation: "boot" in computer terminology) is needed to start a computer. In an old-fashioned minicomputer such as PDP-8 (trademark of Digital Equipment Corporation), a small number of assembly language instruction steps must first be manually toggled, at the console, into the main random access memory (RAM). Running the short program then allows a short paper tape to be read and understood by the minicomputer. In turn, the operating system, stored at the system area of a digital tape or hard disk, is then read by the newly empowered computer and loaded into the main memory. An application program can then be loaded and run at the request of a computer operator (human user). Now the tedious process of bootstrapping in a personal computer is almost completely hidden from a casual user's view, because bootstrapping programs are now embedded in the firmware and controlled by a concise CMOS (complementary metal oxide semiconductor) setup utility program. However, one can still note that there are two kinds of booting: cold-booting from a state of power-off, and warm-booting with the power staying on. Thus, metaphorically, setting the initial conditions of the brain is tantamount to booting the brain. Interestingly, frozen bacteria can spring to life upon thawing. However, it is difficult to extrapolate the situation to a brain because of the latter's structural and functional complexities. No one has proved that it is possible to cold-boot a brain, nor has anyone proved it impossible. Cerebral resuscitation is not exactly cold-booting, but rather a kind of warm-booting. Perhaps electroconvulsive therapy in a psychiatric practice can also be regarded as warm-booting which may happen to reset the initial (boundary) conditions, but the actual mechanism of its efficacy is still unknown.203 McGinn also took issue with the limits of our sensory faculty. Quite the contrary to his claim, people who are born blind do form mental images that depict spatial relationship of objects in the outside world (p. 334 of Ref. 389). Admittedly, the mental images so formed, on the basis of other sensory (tactile, kinesthetic and auditory) modalities, by congenitally blind subjects cannot match, in quality, the mental images formed by seeing subjects. For example, the size adjustment due to distance differences (perspective effects) is missing in the mental images of congenitally blind subjects. However, humans are capable of compensating for the limitations
of their sensory faculty. In fact, humans use the scanning tunneling microscope and a host of other related instruments to visualize the spatial arrangement of atoms in a specimen (see Sec. 7.8). McGinn's argument based on the echolocation ability of a bat and that based on a human born to see only black and white were also flawed. Though humans cannot appreciate the subjective feeling of what it is like to be a bat, and what it feels like to have the sense of ultrasound, humans know considerably more about ultrasound and the mechanism of echolocation than a bat; the bat almost certainly does not understand Fourier analysis (cf. Ref. 484). As for a human who is born to see only black and white, it is still possible to appreciate the notion of color on the basis of some optical interference experiments or even on the basis of some psychophysical experiments on a bird's color vision. McGinn must have taken the notion that seeing is believing too literally. Biophysicists have been investigating photosynthesis for quite some time solely on the basis of observable biophysical processes. When the atomic structure of the photosynthetic reaction center was eventually revealed, the "picture" previously inferred by means of biophysical studies was not radically wrong. In cosmology, black holes are accessible to study because of the gravitational pull on neighboring stars, even though a black hole gives rise to almost no observable electromagnetic radiation. The key to the resolution of McGinn's predicament about consciousness seems to lie in the consideration of underlying processes and the brain states which the processes represent. Thus, consciousness does not occupy space, but it does require the mediation of the space-occupying brain for all the relevant processes and states; the brain forms the substrate or infrastructure for all of the mental processes to take place. The high degree of complexity exhibited by the brain is a prerequisite for it to provide ample interactions so as to give rise to attributes that are not present in a simpler structure, such as a two-neuron reflex arc. That is not to say increasing structural and functional complexity automatically guarantees the emergence of consciousness (cf. Sec. 6.13). It is perhaps as difficult to design an intelligent machine with a top-down approach as to understand a ready-made intelligent machine with a rule-based thinking process. Perhaps it is not feasible for the designer of a neural network to make sense of each and every mundane step of information processing, but it is not necessary for the designer to micromanage such steps. The understanding of the designer is limited to the structure of the particular neural network, and the control laws governing the underlying computational processes.
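The last point can be made concrete with a toy sketch. In the hypothetical example below (not taken from the text; the architecture, learning rate and training data are arbitrary choices), the designer specifies only the structure of the network and the control law, namely plain gradient descent; the individual weights that end up doing the computing, and hence each "mundane step" of information processing, emerge from training rather than from the designer's micromanagement.

```python
import numpy as np

rng = np.random.default_rng(0)

# The designer's decisions: the architecture (2-4-1) and the control law
# (plain gradient descent on a squared-error loss, with a fixed learning rate).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr, epochs = 0.5, 20000

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR truth table

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    hidden = sigmoid(inputs @ W1 + b1)
    return hidden, sigmoid(hidden @ W2 + b2)

print("loss before training:", float(np.mean((forward(X)[1] - y) ** 2)))

for _ in range(epochs):
    h, out = forward(X)
    # Backpropagation: the update rule is designed; the weight values are not.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print("loss after training:", float(np.mean((forward(X)[1] - y) ** 2)))
print("outputs:", np.round(forward(X)[1].ravel(), 2))  # typically close to 0, 1, 1, 0
```

Inspecting the trained weight matrices would reveal nothing that reads like a designed rule, yet the loss decreases and the network typically ends up computing the target function; the designer's understanding resides in the architecture and the update rule, not in the individual numbers.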
McGinn pointed out that consciousness is not a particularly advanced mental faculty that exists only in higher animals. Lower animals need it to evade predators. McGinn correctly implied that there are many different attributes of consciousness, and that lower animals do not possess all of the attributes that are present in humans (Sec. 5.1). However, it was contradictory of him to claim that consciousness is not explained by the selection pressure of evolution. Is reflective consciousness needed for survival? Obviously not. Some bacteria have survived for more than a billion years without the benefit of consciousness. Does creativity require reflective consciousness? No, not necessarily. Modern problem-solving computer programs are capable of H-creativity without consciousness (Sec. 4.26). The pertinent question to ask is not "Is consciousness necessary?" but rather "Is consciousness evolutionarily advantageous?" In considering the question of evolutionary advantages, our attention is immediately drawn to the topic of selective attention. Selective attention allows an animal to be aroused by noise generated by approaching predators, and to discontinue routine activities in favor of strategies of fight or flight. Selective attention is a neural mechanism that is essential for the implementation of effective heuristic searching in creative problem solving. Although heuristic searching can be implemented in a rule-based computer program, selective attention is required to endow the animal with the capability of random access to a subset of the search space for the purpose of heuristic searching and the capability of shifting the search to a previously neglected subspace, again in a random-access fashion (Sec. 4.8). It is apparent that the existence of reflective consciousness is a prerequisite for the implementation of volition, motivation and, ultimately, free will. It is the presence of reflective consciousness that distinguishes free will from its faked simulation (but this statement may soon be invalidated by future advances in computer simulation research). Furthermore, consciousness also endows an individual with the sense of self. A selfless creature may seem advantageous to society in view of the fact that many sins are rooted in human selfishness. However, a selfless machine with no internal value system is a dangerous and unpredictable creature when the designer/programmer ultimately loses control of it. As stipulated by Korner and Matsumoto's theory,381,382 the archicortex encodes the endowed initial value system, such as fear of the face of a menacing predator. The awareness of perceived danger has a survival value and is certainly favored by selection pressure. The organism becomes an active
seeker of survival rather than being passively manipulated by evolution. Korner and Matsumoto also pointed out that such a genetically encoded system is closest to the notion of self. It is certainly not a simple avoidance behavior mediated by primitive neuronal reflexes. The emergence of consciousness requires the development of a memory system — a self-referential system that allows newly acquired information to be integrated into an existing knowledge structure, in a consistent, coherent and rational way. That consistency, coherence and rationality are requirements of the utmost importance can be appreciated by examining the behavior of a schizophrenic patient who is not in touch with reality. It is certainly mind-boggling to conceive and perceive the intricate interrelationship that guarantees such consistency, coherence and rationality, and it is natural to infer a supernatural master mind behind the design or to relinquish humans' ability to comprehend it. Following the line of thinking of Korner and Matsumoto, the addition of the paleocortex and neocortex allowed higher animals to refine their consciousness, thus transforming a selfish organism into one with "vision." The ability to foresee and anticipate the future thus empowers the organism to override short-term selfishness with long-term selfishness. It is not an exaggeration to regard morality as the ultimate selfishness that enhances the chance of long-term survival of the group, the species Homo sapiens, or even other species that co-inhabit this blue planet. Possession of reflective consciousness thus confers an enormous advantage on species that have it. McGinn suggested that our science is limited by our own cognitive makeup. In my opinion, our science is shaped and constrained by the epistemology that we choose. For example, human beings could have chosen a kind of circular reasoning instead of the open-ended logic of reasoning. The mind-brain problem could then be disposed of as the necessity that ensures harmony between mankind and Nature, and the case could be closed to further inquiry. Thus, the epistemological choice of Laplace's classical determinism is incompatible with the notions of free will and macroscopic irreversibility; insistence upon absolute determinism renders both free will and time illusory. Rosen569,570 claimed that the question "What is life?" is refractory to the attack of contemporary physics because of the current epistemology adopted by Western science. He proposed a new approach — relational biology — based on a new kind of epistemology (Sec. 6.13). McGinn also raised the issue of whether science is a frivolous activity of humans. It is a pertinent question in view of the success of American pragmatism. Facing dwindling resources for public financial support, fine
art and music have become primary targets of reduction or elimination in resource allocation, while scientific disciplines that are not immediately convertible into practical applications have become secondary targets for attrition (e.g., the history and philosophy of science). Are these human endeavors frivolous activities? Modularity that is present in each and every hierarchical level of biocomputing provides some clues. Many superficially frivolous activities are actually the manifestation of fundamentally vital modules of cognitive abilities. For example, Calvin101 pointed out that dancing utilizes the same cognitive ability that enabled a prehistoric hunter to predict the trajectory of a hunting device. As analyzed in detail in Sec. 4, creative problem solving requires the concurrent use of many different modules of cognitive abilities. The impetus to pursue science is apparently rooted in mankind's curiosity to explore the environment as well as itself. The exploratory instinct is of course an important factor for creative problem solving, thus conferring an evolutionary advantage. Furthermore, scientific activities add an element of exploration to practical technological problem solving. The fruits of scientific research provide a short-cut to technological innovations, thus constituting a heuristic search. Just imagine the time and effort that would be needed to discover, by random searching alone, the kind of high-speed miniature switch with a low rate of power consumption that is required to transform an abacus into a modern digital supercomputer! However, just like fitting a number of candidate templates to a given pattern, not all results of scientific research are immediately fit for technological applications. It is difficult, if not impossible, to predict what kind of scientific activities will eventually yield fruits of technology. Policy makers' lack of understanding of creative problem solving has led to the implementation of highly focused research policies — e.g., goal-directed or mission-oriented research — that may sometimes delay rather than speed up technological innovations. Mission-oriented research works best when all the supporting infrastructure (including basic research) is in place, but not before. Obviously, few scientists think that science is a frivolous activity, but a surprising number of scientists think that philosophy is. It is obvious, in the present context, that philosophy is not a frivolous vocation or endeavor. Regarding consciousness and cosmology, philosophers deal with the vital problems located at the edge of our knowledge. It is sobering to recall a remark made by David Hawkins: "Philosophy may be ignored but not escaped; and those who most ignore least escape" (quoted on p. 45 of Ref. 568). By the same token, music and art are not frivolous activities.
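The contrast drawn above between blind random searching and a heuristically guided short-cut can be illustrated with a toy search problem (a hypothetical example with an arbitrary fitness landscape, not taken from the text): both strategies seek the peak of the same landscape, but the hill-climbing heuristic exploits local structure and typically needs far fewer evaluations than the blind sampler.

```python
import random

random.seed(1)

def fitness(x):
    """A smooth one-dimensional "landscape" whose single peak lies at x = 73."""
    return -(x - 73) ** 2

SPACE = range(0, 1001)                 # the search space: integers 0..1000
PEAK = max(SPACE, key=fitness)

def random_search():
    """Blind search: sample points at random until the peak is hit."""
    evaluations = 0
    while True:
        evaluations += 1
        if random.choice(SPACE) == PEAK:
            return evaluations

def hill_climb(start=0):
    """Heuristic search: always step toward the better neighbour."""
    x, evaluations = start, 1
    while True:
        neighbours = [n for n in (x - 1, x + 1) if n in SPACE]
        evaluations += len(neighbours)
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x, evaluations      # no better neighbour: a peak has been reached
        x = best

print("random search evaluations:", random_search())
print("hill climbing (peak found, evaluations):", hill_climb())
```

Started at x = 0, the hill-climber reaches the peak after roughly 150 evaluations, whereas blind sampling needs, on average, about as many draws as there are points in the space (about a thousand here). On a rugged landscape a simple hill-climber can, of course, be trapped on a local peak, which is where the capability of shifting the search to a previously neglected subspace, discussed earlier in connection with selective attention, becomes essential.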
Music and art activities share the same core faculty underlying human intelligence. Music and art appreciation provides an alternative to indulgence in hard drugs and other self-destructive human activities, which, ostensibly, are side effects of reflective consciousness. The above analysis demonstrates that McGinn's proof of his thesis of cognitive closure is flawed. However, it does not prove that his thesis is wrong. It is just premature to adopt the new-mysterian view and accept the cognitive closure as the final verdict. 5.20. Readiness potential and subjective feeling of volition Now, let us return to the question raised by McGinn: Is consciousness necessary? This is a serious question in light of a strange finding associated with the investigation of the "readiness potential," which is an electric potential that was found to precede a voluntary motor action. The following discussion is based on a review by Libet.420 The experimental details can be found in an earlier article by Libet and coworkers.421 The readiness potential (RP) is a slow negative shift in electrical potential that precedes a self-paced, apparently voluntary motor act by a second or more. It is recorded from the scalp and its magnitude is maximum when recorded at the vertex of the head. The exact nature of the RP is not clearly known. Its amplitude is so small that it is overshadowed by noise. However, modern digital recording techniques allow such a small signal to be "signal-averaged." Unlike noise filtering techniques that tend to distort signals, signal averaging preserves the signal per se while diminishing random noise by partial self-cancellation. However, the procedure of signal averaging requires a synchronizing trigger that allows the time zero of all repeated traces of signal recording to be aligned with one another. Libet and coworkers used the onset of the electromyogram (EMG), recorded from a selected skeletal muscle, as the synchronizing trigger. The experimental subjects were asked to perform a specified voluntary movement (either flexion of the wrist or fingers of the right hand) at the beginning of a trial session. There were no externally imposed restrictions or compulsions that directly or immediately controlled the subjects' initiation and performance of the act. The subjects felt introspectively that they were performing the act on their own initiative and that they were free to start or not to start the act as they wished. Thus, abortive trials were allowed. That is, the contraction was not a direct consequence of an external stimulus. However, indirectly, the movement was the consequence
of the experimenter's request. Such a movement is operationally defined as a fully endogenous self-initiated voluntary act. These authors made clear that this definition was not committed to or dependent upon any specific philosophical view of the mind-brain relationship. The experimental subjects were asked to report the moment of initial awareness of intending or wanting to move (W). The report was made in retrospect by remembering a clock position that was seen to coincide with the moment of W. An alternative way of reporting, by a verbal response or by pushing a button, might have complicated the measurement by initiating a separate RP associated with the motor act of reporting. In order to evaluate the reliability of reporting, control experiments, in which a near-threshold skin stimulus was delivered irregularly, measured the reported time of the initial awareness of the skin sensation (signal S). This time S was used to obtain the corrected W time. The analyzed experimental data indicated that the uncorrected W signal appears at approximately -200 ms, whereas the RP signal appears at approximately -550 ms. Thus, the average onset of the RPs preceded the average W by approximately 350 ms. This lead time became 400 ms after correcting the W signal with the S signal. In other words, the time sequence of appearance was: RP, W, and then the EMG signal. It is important to realize that the sequence of appearance held not just for the average values of all series but for each individual series of 40 self-initiated acts in which RP and W were recorded simultaneously. The RPs associated with self-initiated endogenous voluntary acts were designated as "Type II" RP signals. For comparison, Libet and coworkers also recorded RPs that were associated with pre-planned acts, which they labeled as "Type I" RP signals. The onset of "Type I" RPs was found to be -1050 ms relative to the EMG signal. Based on these results, Libet proposed that voluntary acts can be initiated by unconscious cerebral processes before conscious attention appears. However, Libet also proposed that conscious volitional control may operate not to initiate the volitional process but to select and control it, either by permitting or triggering the final motor outcome of the unconsciously initiated process or by vetoing the progression to actual motor activation. The work of Libet and coworkers constitutes a major attempt to interface externally observable signals of the brain state with the events that are accessible only through introspection. Bridgeman appropriately pointed out that the attempt was "nothing less than a beginning of the physiology of free will" (p. 540 of Ref. 420). Bridgeman also pointed out that the sub-
jects' "wills" were not as free as Libet implied; the subjects had received instructions from the experimenter. This point has been discussed in Sec. 5.2 and will not be repeated here. It is not unexpected that the experimental work of Libet and coworkers, as well as their interpretation, was greeted with controversy. In spite of its potentially important implication to the mind-brain problem, the work was not mentioned in authoritative textbooks such as Principles of Neural Science.355 Walter essentially dismissed Libet's work, citing that the RP experiments were "based not only on an inadequate theory of consciousness, but also on a naive, workaday notion of mental causation" (p. 297 of Ref. 698). There are two separate issues surrounding the controversy: the reliability of the experimental results and the validity of the interpretation. The validity of experimentally timing the onset of an introspective feeling, as yet, cannot be objectively or independently verified. In addition, the nature of RP is still poorly understood. These problems were extensively discussed in the main text and the open peer commentary section of Libet's review article.420 Here, we shall treat the experimental results with the full benefit of the doubt, and focus on the second issue: the interpretation of an anomalous result. This does not mean that I advocate an uncritical acceptance of the work. More work in the same or similar direction is obviously needed to consolidate or refute the work. However, a preliminary discussion of the implication may help set the stage for additional experimental work. I just do not want to reject the work prematurely, and to ignore a potentially important topic. Taken at face value, the experimental results imply that the physical correlates of volition precede the introspective conscious awareness. This novel finding elicited diverse responses. Vanderwolf advocated the behaviorists' advice of abandoning the introspective mentalistic approach to behaviors, thus making the problem disappear (p. 555 of Ref. 420). Here, I share Libet's view: the logical impossibility of direct verification of introspective reports was an insufficient reason to avoid studying a primary phenomenological aspect of our human existence in relation to brain function. Dennett and Kinsbourne171 preferred to accept the challenge of making sense of the experimental results. We thus encounter an unsettling prospect of an effect preceding its cause. One of the escapes is to advocate Descartes' dualism, but that is contrary to the contemporary approach of finding physical correlates with mental activities. In fact, Libet was labeled, in one of the open commen-
taries, as a dualist, which Libet denied in his response. Taking the challenge to reconcile the apparent discrepancy means that we regard this problem as an apparent paradox in the sense that it may just be our temporary confusion. Here, it should be pointed out that taking this problem seriously is a tacit admission of accepting the notion of free will. The observed anomaly poses no conceptual difficulty if one takes the position of absolute determinism, because the so-called conscious awareness and the so-called voluntary act are both the deterministic effects of the common first cause (s), which can be construed as the initial conditions at the time of the Big Bang. At best, conscious awareness can be regarded as an epiphenomenon, and the conscious self a mere spectator. With this deterministic view, the relative timing of the conscious awareness and the motor act is inconsequential, as long as the time discrepancy between the two is not grossly noticeable (much like the situation encountered in dubbing voices to a movie after the original production). However, this view presents another apparent paradox. Velmans685 pointed out that consciousness appears to be an epiphenomenon from the third-person (objective) perspective but not from the first-person (subjective) perspective. Libet's interpretation of the RP as the unconscious decision made by the brain requires additional explanation or demystification. Kinsbourne371 interpreted the findings as evidence in support of Freud's original theory of the unconscious, and the manifestation of the brain's capability of parallel processing. Dennett and Kinsbourne171 considered two models of how consciousness treats subjective timing: a) the standard "Cartesian Theater" model of a central observer that judges all sensory modalities together, and b) the alternative "Multiple Drafts" model in which discriminations are distributed spatially in the brain and temporally in the brain processes. Dennett and Kinsbourne favored the "Multiple Draft" model, and used it to interpret the strange time sequence as the consequence of parallel distributed processing. In an open commentary published along with Dennett and Kinsbourne's article, Baars pointed out that the choice of the "Multiple Draft" model over the "point center" conception of consciousness — i.e., Cartesian Theater, or Global Theater (see Sec. 4.8) — effectively denies consciousness any integrative or executive function at all. Baars therefore emphasized that consciousness is associated with both central and distributed processes; a merger of both models is in order. Thus, multiple drafts are prepared but, eventually, only a single draft is finally submitted. The merged model reflects the modularity of biocomputing, whereas the central integration reflects the hierarchical control with numerous two-
way regulatory processes. Of course, a peculiar lack of integration occurs in split-brain subjects whose corpus callosum has been severed644'645 (see also Chapter 2 of Ref. 646). As Dennett and Kinsbourne pointed out, failure to integrate the distributed processes leads to disorders such as multiple personalities, hemispheric neglect, etc. The merged model is also consistent with the self-referential model proposed by Korner and Matsumoto381'382 (Sec. 4.17). Recall that sensory recognition involves both top-down and bottom-up processes: the archicortex makes an initial guess, with subsequent refinements being assisted by the paleocortex and the neocortex. It is therefore quite likely that decision making can be formulated at these three different levels, whereas any ensuing conflict can be resolved subsequently. The commonly held notion of reason-overriding-emotion may be construed as a triumph of the neocortex over the archicortex, whereas hesitation and indecision may reflect an impasse to a resolution of the conflict. This interpretation is consistent with Libet's interpretation of a triggering or vetoing function of consciousness. In this regard, the conventional view of a single conscious self (at least for psychologically normal persons) may have to be expanded to include Poincare's subliminal self, which we may now associate with the self exhibited either by the archicortex, by the right hemisphere, or by processing centers other than the one that dominates at waking hours with full alertness. Here, an additional interpretation is speculated. In our discussion of creative problem solving, the unconscious prior work is associated with the parallel processes that take place in the search-and-match phase of problem solving (primary-process thinking, Sec. 4.8). It is possible that the RP reflects a similar primary process of a parallel nature. If so, then the delay of conscious awareness may be explained as follows. Since the timing criteria of the W signal were given in spoken or written words by the experimenter, the recognition (conscious awareness) of the matching between the subjective feeling and the timing criteria might also be carried out in the verbal form (silent speech). The observed time delay may be caused by (internal) verbalization — i.e., parallel-to-serial conversion — of the unconscious feeling, thus resulting in the apparent paradox (see Sec. 4.10 for actual examples illustrating such a delay). In the above discussion which summarizes and attempts to merge several different views, the unconscious detection or recognition of analog patterns is assumed to be non-recallable. Yet, during a problem-solving session, we sometimes detect the "gut feeling" of impending illumination prior to the process of dawning upon a solution. Can an unconscious recognition pro-
cess be brought to consciousness under a suitable condition? Or, is the "gut feeling" itself a process that precedes or accompanies silent verbalization of an unconscious process? If Libet's interpretation of conscious triggering/vetoing of a motor act is correct, why is the process not accompanied or preceded by an RP or a similar signal? While these questions and many others remain to be answered, it is not difficult to recognize that the work of Libet's group constitutes a landmark achievement in the search for the profound nature of consciousness. The speculation just presented above is, however, plagued with a serious flaw: the interpretation is not consistent with another controversial discovery by Libet's group, which is known as the backwards referral of the timing of a sensory experience.422 Libet and coworkers compared the timing of subjective awareness of a sensory stimulus applied directly to the primary sensory cortex at the postcentral gyrus, and another stimulus applied to the skin. It is known that a train of threshold-level electric stimuli applied to the sensory cortex registers a subjective sensation after a 500 ms delay (known as neuronal adequacy). It was therefore expected that the subjective awareness of the skin stimulus would incur the same delay plus the transmission delay from the skin to the primary sensory cortex (about 15 ms). Yet, the experimental results showed that the skin experience takes place about 300 ms before the cortical stimulus. There is, however, an important difference. Although the skin stimulus generates a (primary) evoked potential that can be recorded from the cortical cells, direct cortical stimulation evokes no such potential. As a comparison, the stimulus can be applied to the ascending nerve fibers in the medial lemniscus, which is the pathway that transmits the skin stimulus to the primary sensory cortex. The resultant subjective awareness also shows a 500 ms delay, but a primary evoked potential is also detectable in the primary sensory cortex. It appears that, between the skin receptor and the primary sensory cortex, there may be two branching pathways. The skin stimulus that leads to a backwards referral of its timing may not go through the lemniscal system. Yet, the lemniscal pathway is considered to be more direct than any other known branching somatic sensory pathway. Therefore, the transmission through the lemniscal system should incur the least delay. Perhaps a heretofore unknown but slower pathway may provide faster access to conscious awareness by bypassing the required neuronal adequacy. It is also important to realize that the relative timing involved in the backwards referral experiment was inferred by means of a masking effect (see Ref. 422 for details). Although the masking effect appears to be more
reliable in determining relative timing than subjective recall of a clock position, the apparently logical interpretation may turn out to be oversimplified when more detailed sensory network information is revealed in the future. Thus, the actual situation may be more complex than what has transpired so far. Readers are referred to the debate between Libet and Churchland in Philosophy of Science.128,419 In defense against the criticism of his experimental methodology, Libet pointed out that his article had passed the stern tests of scientific quality that are normally applied by the authoritative editors of the journal Brain. However, in the arbitration of a controversial subject of scientific truths, authority is an unreliable judge (see, for example, p. 136 of Ref. 130). Churchland could not accept Libet's data, presumably on philosophical grounds, but she was unsuccessful in dismissing the experiments on technical grounds. It is the right moment to suspend our judgment. Experimental results that we cannot understand for the time being may not always turn out to be wrong. From the discussion of Sec. 4.13, it is risky to refute a questionable interpretation on the basis of another shaky interpretation. Leaving all alternative interpretations viable while suspending judgment for the time being is a way of expanding the search space for the ultimate solution. It is better to wait for future evidence and clarification before the verdict is hastily and prematurely rendered. 6. Digression on Philosophy and Sociology of Science In the discussion of the controversy between Koestler and Medawar, the experimental data and observations were regarded as the pattern of natural phenomena, and a physical (or mathematical) theory was treated as a template for the pattern (Sec. 4.13). Since several templates can often fit a given pattern, it is not surprising that "more than one theoretical construction can always be placed upon a given collection of data" (see p. 76 of Ref. 399). Poincare was among the first to explicitly point out this important revelation (see G. B. Halsted's remark, p. x of Ref. 521), after the scientific community had been thoroughly impressed by Newton's spectacular success and blinded by his disclaimer "I feign no hypotheses." Henceforth, scientific truths took on a new meaning. 6.1. Falsifiability and non-uniqueness of scientific theories
If, in principle, more than one scientific theory can fit a given set of scientific observations, no theory can claim to be unique. A well-established
and widely accepted theory can be suddenly called into serious question, as new data subsequently emerge. Several examples come to mind easily: the replacement of Ptolemy's epicycle theory of planetary movements by Copernicus' heliocentric theory, and the emergence of special relativity and quantum mechanics as Newtonian mechanics met the limits of ultra-high speeds and ultra-small sizes, respectively. These abandoned or superseded theories survived a long time because they were not utterly flawed. Even in hindsight, the epicycle theory was ingenious, and it differs only slightly from the heliocentric theory (Chapter 7 of Ref. 439 provides a concise tutorial on Ptolemy's theory).s Newtonian mechanics fueled several hundred years of industrial revolution and continues to enjoy its success in the macroscopic world, which encounters speeds much slower than the speed of light. Thus, an abandoned theory might not be radically wrong and a superseded theory might not be totally discredited; they might just be less satisfactory or less general. Figure 9 illustrates the point. The competing theories for explaining a planet's apparent wandering ("looping") motion, as projected on the two-dimensional celestial hemisphere, can be regarded metaphorically as the ligands in Fig. 9. Protein X represents the observed astronomical data available at the time of Ptolemy. Protein Y represents the precision data collected by the Danish astronomer Tycho Brahe. Ligand a represents the initial epicycle theory. The fit was not perfect (note the shorter "leg"). Therefore, Ptolemy tinkered with variations of the theory and added additional epicycles and oscillations in order to enhance the fit, resulting in a refined theory represented by Ligand b. Copernicus proposed the heliocentric theory, represented by Ligand c. The vast body of precision data later collected by Tycho Brahe, which Protein Y represents, provided the means to discriminate among the competing theories. Both Ligand b and Ligand c fit the pre-Tycho Brahe-era data (Protein X) reasonably well. The new data Protein Y, by merely adding two "stops" at the outer rims of the binding pocket, imposed a greater constraint than the previous
s The main difference between Copernicus' heliocentric theory and the epicycle theory (in its original primitive form, as applied to an inner planet such as Venus) is the presence and the absence, respectively, of the Earth's orbiting motion. Here, the orbit of an inner planet corresponds to the epicycle, whereas the Sun's apparent path — the ecliptic — corresponds to the deferent on which the center of the epicycle is located. Given no absolute inertial frame of reference (as dictated by the theory of special relativity), the two theories are formally identical if the Earth's orbital motion is ignored. Ptolemy's subsequent elaboration by adding additional epicycles and oscillations was tantamount to the common modern practice of using additional adjustable parameters in order to force-fit a cherished physical model to experimental data.
Fig. 9. Specificity metaphor of pattern recognition. Protein X is able to bind all three ligands, which have slightly different structures, but the binding sites are similar. Ligand a has slightly less affinity than the other two ligands because of a shorter "leg." Protein Y has more stringent specificity requirements for matching than Protein X; the additional protuberances at the outer rim of the binding pocket exclude both Ligand a and Ligand b. Regarding ligands as competing theories, Protein X reveals less data about Nature's behavioral pattern than Protein Y does. (Reproduced and modified from Ref. 683 with permission; Copyright by McGraw-Hill)
data, and were instrumental in ending the 1,500 years of unchallenged reign of Ptolemy's theory: among the three choices, Ligand c best fits Protein Y. The story repeated itself, when Johannes Kepler proposed elliptical orbits for the planets to replace Copernicus' circular ones. Modern astronomy
has provided convincing evidence that the Sun is not stationary but actually moves with respect to the center of our own "galaxy." It is thus necessary to replace Copernicus' theory with yet another one, which describes the Earth's orbit as a spiral one. I have previously demonstrated the non-uniqueness of physical models with actual examples from membrane biophysics.316 Oreskes et al.499 subsequently demonstrated the non-uniqueness of numerical (mathematical) models. I also presented actual examples of two different physical models sharing the same mathematical model; one of the two models had to be eliminated by a self-contradiction.322'325 Undoubtedly, more such examples can be found in just about every scientific discipline. Although this pitfall seems to be widely known and has been mentioned in Carl Sagan's books of popular science, a surprising number of modern biomedical scientists are not aware of it. A reason will be speculated later. In view of the non-uniqueness of a theory, achieving experimental proof must not be regarded as a single act but rather as a continual and on-going process of eliminating the less satisfactory alternative scientific theories, as new data and/or new theories continue to emerge. This conclusion follows from a more general formulation of Popper:526 a proposition cannot be proved with absolute certainty by means of induction, but it can be falsified — i.e., disproved — by a single counter-example. For example, the Sun has been observed to rise in the East by a countless (but finite) number of observers and over thousands and thousands of years of recorded history in the past, but the induction does not prove that the Sun will always rise in the East till eternity. In reality, the Sun will die eventually according to modern cosmology. In plain English, Popper's formulation means: demonstrating the consistency of a theory with a finite amount of data or a finite number of observations does not prove a scientific theory in the absolute sense. Paraphrasing science historian Thomas Kuhn, absolute proof of a given theory cannot be accomplished by checking the theory against experimental data at a finite and limited number of "points of contact" (see p. 30 of Ref. 399). These points of contact correspond to the non-covalent bond binding sites in the metaphor of molecular recognition (Sec. 6 of Chapter 1). However, a tentative proof of a given theory can be achieved by eliminating all currently existing competing theories by the latter's inconsistencies with experimental data or observations. This view of "proof" by eliminating alternative theories with more and more accurate data is compatible with the view of the Bayesian school
(e.g., see Refs. 58 and 350). The Bayesian school views probability not as the conventional notion of frequency of occurrences of a certain event in an infinite series of trials, but rather as the degree of belief that a certain event will occur in a single future trial. Thus, the Bayesian analysis does not cast a hypothesis as being correct or incorrect but rather in terms of its plausibility. 6.2. Rise of postmodernism The tentative nature of a physical theory or hypothesis may cast doubt on the value and usefulness of a physical theory that cannot be proved in an absolute sense. This "handicap" may have contributed to the rise of the postmodernist sociology of knowledge, and of science in particular.408 Although postmodernism has its inherent merits, it has inadvertently fostered an extreme view regarding science and knowledge. Various versions of the postmodernist view go under the designation of constructivism, social constructionism, antirealism, or antirealist epistemology. Here, we recognize that these various formulations cover such a broad range on the gray scale that total dismissal may be as problematic as total acceptance. A thorough review is beyond the scope of this article. Here, we shall base our discussion on a lucid and insightful summary by Held.294 A brief recapitulation of her summary on the distinction between realism and antirealism will set the stage. The realist doctrine claims that the knower can attain some knowledge of an independent reality — reality that is objective in the sense that it does not originate in the mind of the knower, or knowing subject. In contrast, the antirealist doctrine asserts that the knower cannot under any circumstances attain knowledge of a reality that is objective, independent of the knower, or how the world really is. The postmodernist view of science thus asserts that scientists do not discover objective reality as it is; rather, they make, invent, constitute, create, construct, or narrate, in language, their own subjective "realities." In the most radical form, scientific knowledge is regarded as a social construct that serves as the ideological tool of a particular class of scientists: Western white male scientists. Thus, all knowledge is relative and without objective validity, and so can never be true but only useful. This view has had a devastating influence on the public perception of science. The adoption of this view makes all conflicting scientific theories equally valid or equally invalid. Of course, not all postmodernists hold this extreme view. Held further analyzed the distinction between constructivism
and social constructionism. However, we shall not pursue this fine line of distinctions among the various versions of the postmodernist view of science. Instead, an attempt will be made to assess the postmodernist view of science in an even-handed manner. In order to do so, an insightful analysis by Gauch227 will be presented next. A more extensive and technical discussion about the subject was included in Sec. 20 of Ref. 325. 6.3. Gauch's analysis Gauch227 examined two aspects of a theoretical model: postdiction and prediction. Postdiction refers to the agreement between the model and the data on which the model is based. Prediction refers to the agreement between the model and any future data, especially data collected by radically different approaches or pertaining to radically different aspects of the model. Here, the "future" data could have been data generated in the past in other independent laboratories, or the data to be generated by a future experiment suggested by the model's predictions or designed to debunk the model. Loosely speaking, the predictive and postdictive performance of a theoretical model corresponds to its predictive and explanatory power, respectively. Most theoretical models perform well in postdiction for obvious reason. Prediction is harder than postdiction because the proponent (s) and the advocates of a theory cannot possibly foresee future data during the construction of the theory and during the theory's early life. There are two kinds of predictions, roughly classified as interpolation and extrapolation. Interpolation can be interpreted as a prediction governing data that are obtained in a similar and conventional way and are, therefore, more likely to be consistent with the model being tested. Extrapolation can be interpreted in the literal sense, or as a prediction governing future data obtained by means of a radically different experimental design. Extrapolation allows the experimenter relatively little control over the manipulation of parameters of a theoretical model. Extrapolation makes a model more vulnerable than interpolation does. It is the extrapolative prediction that offers the greatest challenges to a model, and thus carries the biggest weight in the validation of a theoretical model if successful, or in its rejection if unsuccessful. Gauch's notion of postdiction and prediction can be appreciated by referring to Fig. 9. Again, all three ligands are competing theories to explain a particular behavior of Nature. For the sake of argument, let us assume that we know ahead of time that Ligand c is the best fit for the "complete"
pattern of Nature's behavior. Protein X is the data used to construct a theoretical model (template, or ligand). It is possible to come up with at least three competing theories a, b, and c. Note that Protein X presents only the affinity and specificity requirements of the binding pocket but no information or restrictions regarding the lengths of the "sidearms." Thus, all three ligands fit Protein X reasonably well (postdiction). However, only Ligand c predicts the length of the "sidearms" well. Note that the "backside" of the ligands is not tested in most cases of molecular recognition. A notable exception is the binding of 2,3-diphosphoglycerate to the hemoglobin tetramer; the tetramer "hugs" the ligand's entire surface (Sec. 7.3 of Chapter 1). Here, Ligand b incorrectly predicts the "backside," even though it does better than Ligand a in predicting the binding pocket. Only Ligand c predicts the shape of the "backside" as well as the length of the "sidearms" correctly. Predictions of the "sidearms" or the "backside" constitute extrapolation, whereas a refined prediction inside the binding pocket is interpolation. It is thus readily seen that a theory enjoys a higher level of confidence if it has survived challenges of the experimental data from many radically different angles rather than from the same type of experiment over and over again. 6.4. Fallibility of falsification It is evident from Gauch's analysis that the nature of the experimental data invoked to support a theory critically affects its credibility. A mere corroboration by independent laboratories is insufficient to prove a scientific theory, if the experimental verifications are of postdiction or interpolative prediction in nature. A theory can usually be treated as proved beyond reasonable doubt only after extensive testing of its extrapolative predictions, but still there is no guarantee that it will remain valid forever. On the other hand, the nature and quality of experimental data invoked to falsify a theory are also important. Falsification can sometimes be anything but straightforward. Let us consider the following common criticism of Popper's philosophy. If a scientific theory can be falsified by a single piece of counter-evidence, according to Popper, then most theories, including those that become well established subsequently, can be eliminated at an early stage of development; a preliminary theory prior to adequate "debugging" is usually full of defects (see, for example, p. 146 of Ref. 399). Here, I must point out that the validity of Popper's proposition is contingent on the absolute certainty
of the counter-evidence. However, counter-evidence cannot be absolutely ascertained. Prior to full or nearly full elucidation of a phenomenon under investigation, data that are misunderstood or not explainable in the framework of current knowledge could be misconstrued as counter-evidence, because the judgment of a given set of data as valid counter-evidence is a process of pattern recognition at the next nested hierarchical level, and is therefore not necessarily error-free. What constitutes counter-evidence can be a rather subjective matter, and false alarms (or, rather, examples of premature falsification) are by no means rare. Thus, with the benefit of hindsight, a piece of counter-evidence can sometimes be re-interpreted as supporting evidence instead (see, for example, Sec. 11.2 of Ref. 325). The dismissal of the right brain's role in creative problem solving is probably a case of premature falsification that has critically deflected (or even derailed) the course of inquiry about creativity (Sec. 4.15). Therefore, misunderstanding contributes both to premature acceptance of a bad theory and to premature rejection of a good theory. In other words, subjectivity cannot be completely eliminated from the judgment of acceptance or rejection of a theory. There is inevitable circularity since the validity of evidence or counter-evidence often remains uncertain without the benefit of hindsight, even if one accepts the more modest criterion of "proof beyond reasonable doubt." Regarding the circularity in ascertaining evidence, the well-known controversy between Luigi Galvani and Alessandro Volta is a case in point. During the 1780s Galvani conducted a series of investigations on "animal electricity."218 He discovered that a contact of two different metals with the muscle of a frog resulted in muscle contraction. Galvani did exhaustive control experiments. Superficially, it seemed that Galvani's experimental elimination of ordinary and atmospheric electricity as the possible alternative causes of muscle contraction was sufficient proof for animal electricity. Unbeknownst to Galvani, dissimilar metals in contact is yet another (new) way of generating electricity. Volta began experimenting in 1794 with metals alone and found that animal tissue was not needed to produce an electric current, thus leading to his invention of the Volta pile, a forerunner of modern batteries. Volta declared his successful disproof of Galvani's claim of animal electricity. However, Volta also fell into a similar pitfall. Although Volta was correct in refuting Galvani's original interpretation that overlooked the possibility of generating electricity by means of metal contacts, he was wrong in prematurely excluding the possibility of a concurrent presence of animal electricity.
Hindsight from the history of science tells us that the Volta potential — as it has subsequently been called — can be generated by connecting two metals with differing electron-donating (redox) potentials. The potential served as the stimulus to trigger the muscle action potential, which is a manifestation of electricity of animal origin. That is, both types of electricity exist, rather than just one type, as both Galvani and Volta had assumed. However, neither Volta nor Galvani had the privilege of this hindsight. Nor had any of their contemporaries. Thus, Volta and Galvani had to make inferences from rather incomplete knowledge of electrochemistry and electrophysiology (they deserve the titles of father of electrochemistry and father of electrophysiology, respectively). Galvani subsequently performed a decisive "metal-less" experiment to uphold his discovery of animal electricity, but failed to convince Volta's followers. By that time, attention had already been shifted to Volta's newly invented batteries. To add insult to injury, Galvani was stripped of his university position because of his refusal to pledge allegiance to the Cisalpine Republic established by Napoleon (Volta did swear the allegiance). Galvani died — a broken and bitter man — in 1798.

6.5. Science of conjecture

It is instructive to examine how scientists at the dawn of civilization wrestled with uncertain or inaccurate evidence and attempted to make an educated guess. A comprehensive review of the subject has been presented by Franklin in his book The Science of Conjecture.212 I shall take advantage of this resource to recapitulate some of the key points, but I will restrict the discussion to the so-called "hard science" since Aristotle. Aristotle used the Not-by-Chance argument — a qualitative statistical argument — in arbitrating between rival theories with regard to whether the stars in their daily revolutions around the heavens move independently or whether they are all fixed to some sphere (p. 133 of Ref. 212). It is observed that those stars that move in large circles near the celestial equator take the same amount of time to rotate as those near the polestar (Polaris of the constellation Ursa Minor). Aristotle concluded that it is unlikely that the stars move independently, since all stars near the celestial equator must move considerably faster than those around the polestar, and for independently moving stars to do so would require too many coincidences. Essentially, Aristotle supported his subjective judgment with a primitive probability theory. With this line of reasoning, Greek astronomers routinely took averages of not-so-accurate astronomical
measurements to arrive at what they thought was a valid measurement. Ptolemy's practice was somewhat different from that of his predecessors. He allowed theory to confront the set of all observations in a more holistic fashion, instead of taking means of individual observations or trying to make the theory fit all the observations, as his predecessor Hipparchus did. He was engaged in the dangerous practice of selecting data, thus significantly increasing the influence of subjectivity. Franklin pointed out that it is a practice allowed even in modern statistics under the name of rejection of outliers, to deal with the awkward variability of real data. The latter practice is justified if the control law is well-behaved, i.e., defined by a continuous curve (function); it will be problematic if the control law happens to be probabilistic. Ptolemy also considered the vexed subject of simplicity of theories and began the discussion as to why, and to what extent, one should prefer simple theories: he considered it "a good principle to explain the phenomena by the simplest hypotheses possible." Here, we see the forerunner of what we now refer to as Ockham's razor. As Jefferys and Berger350 indicated, the underlying principle of the ostensibly subjective notion of Ockham's razor is Bayesian probability (see later). It is a human attempt at seeking a point of balance between extreme subjectivity and extreme objectivity. As demonstrated by Gauch227 and echoed in Franklin's treatise, a more complicated model can fit the data better, but fitting more noise degrades the predictive performance of the model — a well-known fact of modern intelligent systems theory. The same principle seems to have been carried over to the level of decision making in rejecting a good theory or retaining a theory under siege, in particular, for large-scale physical theories, for which conflicts with some data are inevitable. This is the widely accepted Duhem thesis (p. 193 of Ref. 180; see also Chapter 5 of Ref. 237), asserting that observation confronts theory only on a holistic basis: "an experiment in physics can never condemn an isolated hypothesis but only a whole theoretical group." In other words, falsification is not conducted on the basis of isolated incidents. Again, it is a human attempt to maintain a balance between stability and flexibility in the quest for the truths of scientific knowledge. This explains why major research paradigms shift in steps and a threshold exists both for the rejection of inadequate theories and for the acceptance of radically new ones, in accordance with science historian Thomas Kuhn's notion of the paradigm shift399 (Sec. 4.16).
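The trade-off just described — a more complicated model fits the data at hand better, yet fitting noise degrades its predictions — can be made concrete with a minimal numerical sketch. The sketch below is illustrative only: the assumed underlying law, the noise level, and the polynomial degrees are arbitrary choices, not figures taken from Gauch or Franklin.

```python
# Illustrative sketch: a more flexible model fits noisy observations better
# (postdiction) but predicts new observations worse.  All numbers are
# arbitrary assumptions chosen for the demonstration.
import numpy as np

rng = np.random.default_rng(0)

def underlying_law(x):
    return 1.0 + 2.0 * x          # the assumed simple "control law"

x_fit = np.linspace(0.0, 1.0, 12)                        # observations used for fitting
y_fit = underlying_law(x_fit) + rng.normal(0.0, 0.3, x_fit.size)
x_new = np.linspace(0.0, 1.0, 200)                       # future observations
y_new = underlying_law(x_new) + rng.normal(0.0, 0.3, x_new.size)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_fit, y_fit, degree)            # fit a polynomial model
    err_fit = np.mean((np.polyval(coeffs, x_fit) - y_fit) ** 2)
    err_new = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree}: fit error {err_fit:.3f}, prediction error {err_new:.3f}")

# Typically the degree-9 polynomial shows the smallest fit error but a larger
# prediction error than the simple degree-1 model, in the spirit of Ockham's razor.
```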
Regrettably, modern students can seldom afford the benefit of science
history. Modern intellectuals tend to forget that there are two ways of applying probability theory in real life (see also Sec. 6.7). Most people are familiar with the conventional way of collecting a large sample and applying standard statistical methodology. The second way is akin to what the Bayesian school preaches. When a die shows the same face five times in a row, it is likely to be "loaded," because probability theory tells us that the likelihood that such a run appears by pure chance is only about 1 in 7,776 (i.e., 1 in 6^5). This practice may seem a bit careless if applied to serious matters such as criminology. However, the practice of using DNA fingerprinting to identify crime suspects is actually based on the same principle. Only this time, the die has four faces: A, C, G, T; the number of throws of the die is the number of base pairs in a typical strand of the DNA sample. The likelihood of a spurious match by pure coincidence is now reduced to 1 in 4^n, where n is the number of base pairs. Thus, if the distribution is completely random, for n = 16, the odds are 1 in 4,294,967,296, or less than one expected match among all living persons in the entire world. The assumption of a completely random distribution may not hold for every region of the genome. However, for the size of DNA commonly used in criminology, the likelihood of spurious matches is reduced to such a universally acceptable level that the practice constitutes proof beyond reasonable doubt, especially when it is corroborated with other evidence as well as a confession made in a lucid and sane condition without physical or mental torture. In contrast, an unusual pattern can sometimes be recognized during the second encounter if the first encounter has generated a sufficiently striking impression. Of course, the first occurrence is merely an incident, but the second one establishes a "recognizable" pattern. Examples? I could have used a popular excuse made by school children: "my dog ate my homework." However, I am not certain about its recognizability upon the first two occurrences since its inception. So I will mention another excuse that I have personally witnessed twice and only twice so far. I was deeply impressed with a memorable sound-bite made by a professor of internal medicine in response to a question posed by an inquisitive pathologist during a clinicopathological conference: "That's exactly the question I wanted to ask." Several decades later, when a student responded to my inquiry by saying "That's exactly the question I wanted to ask," I had no difficulty recognizing the repeating pattern, even though a long time separated the two incidents and two different languages with dissimilar grammatical structures were used. The interpretation is of course another matter that requires
verification. My interpretation of the semantic meaning of this remark was indirectly verified, since the student blushed upon hearing my recounting of a similar past experience. Recognition upon the second exposure to a repeating pattern is encouraged or expected under certain circumstances, as exemplified by the following adage: "Fool me once, shame on you; fool me twice, shame on me." If cheating or deceiving is regarded as a habit (a repeating behavioral pattern), then the advice makes good cognitive sense. Recognizing a repeating pattern upon the second exposure in a time sequence of evolving events may be premature most of the time, but it suggests an effective way of quickly formulating a working hypothesis. A hypothesis can be tentatively formulated as an initial guess upon limited exposures to a potentially recognizable pattern. The hypothesis is then subjected to repeated verifications or revisions upon subsequent exposures. In this way, the hypothesis is continually evolving as incoming information continues to accumulate. The plausibility of the evolving hypothesis improves by virtue of continually integrating incoming new knowledge with accumulated prior knowledge. The drawback of misinterpreting early observations — "jumping to conclusions" — can thus be avoided or alleviated by an equally continual process of verifications or revisions of the working hypothesis. In essence, the Simontonian process alternates between the searching-and-matching phase and the verification phase in an iterative fashion (method of "successive approximations"). This is reminiscent of the brain process stipulated in the theory of Korner and Matsumoto381,382 (Sec. 4.17). I suspect that this mode of induction might be the preferred mode used by many past creative scientists, because it offers the advantage of induction through foresight. In contrast, induction by means of examining a large pool of samples or observations constitutes induction through hindsight. Of course, there is no reason that one cannot mix the two modes of induction in a judicious fashion called for by a particular problem. I further suspect that the same strategy might have been used by a potential prey (and its fellow members of the same species) for the purpose of early detection of a heretofore unknown potential predator; the penalty for not doing so might be extinction of the particular species. The taboo against the second way of using probability theory was probably established as a necessity to prevent or minimize false conclusions stemming from the human unconscious tendency to accept or remember favorable evidence but downplay or ignore unfavorable evidence ("confirmation bias"; see Ref. 482 and p. 116 of Ref. 481). However, the conventional way of using probability theory is not without pitfalls. Let us take an even-handed look
at the two practices. The conventional practice examines a sufficiently large number of observations or samples so as to draw a conclusion by induction — a process akin to ensemble-averaging. The second approach examines sufficiently frequent repetitions of a pattern so as to draw a conclusion by induction — a process akin to time-averaging. The individual "trials" (samples or observations) must be independent in both cases. But independence of "trials" can only be ascertained in hindsight. Operationally, the practices require that deviations from the norm ("noise") be completely random.* But then again, randomness can only be ascertained in hindsight: random sampling is easier said than done. In the conventional practice, it is well known that samples taken from different populations or different (spatial) regions may exhibit sample heterogeneity. The same can be said about the second approach. Aside from confirmation bias, sample heterogeneity can be caused by underlying time-dependent processes: a systematic bias is gradually introduced as time passes. For the same reason, the conventional approach can also suffer from the same time-dependent systematic bias, if the samples are taken from different past periods. In conclusion, there is no fundamental difference between the two approaches. In either approach, sample heterogeneity stems from non-randomness (unsuspected systematic differences) in different regions of the "sample space" from which samples are collected; here the term "region" means either a geographic entity or a time period. Thus, the conventional approach is not inherently more reliable than the second approach, as we are often led to believe. When data collected at different times or from different spatial regions are pooled in a conventional statistical analysis, part of the information regarding peculiarity in the sequence of appearance — e.g., sample heterogeneity — is lost. On the other hand, this particular information may not escape the attention of the sample collector (observer) provided that the observer is willing to formulate a tentative hypothesis during — rather than after — the session of data collection or accumulation. The accidental discovery of pulsars by Jocelyn Bell and Anthony Hewish was an example regarding temporal heterogeneity, whereas the identification of (thyroid) goiter as an endemic disease (of iodine deficiency) was an example regarding spatial heterogeneity.

*To firm believers in absolute physical determinism, no two events are causally independent. But non-deterministic chaos may generate sufficient randomness to meet the requirement of statistical methodology.
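The figures quoted earlier in this section can be checked with a minimal sketch, assuming a fair die and a DNA sample in which A, C, G and T are equally likely and independent at every position — the idealization already noted above.

```python
# Illustrative sketch: the chance figures quoted in the text, computed under
# the stated idealizations (a fair die; equally likely, independent base pairs).
from fractions import Fraction

# Five throws of a fair die all showing one pre-specified face.
p_run = Fraction(1, 6) ** 5
print("five identical throws (specified face):", p_run)      # 1/7776

# Spurious DNA match over n independent, uniformly distributed base pairs.
def p_spurious_match(n: int) -> Fraction:
    return Fraction(1, 4) ** n

print("spurious match, n = 16:", p_spurious_match(16))        # 1/4294967296
```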
Even if one adopts the conventional approach, there is no reason why one cannot take a preliminary examination of incoming data rather than wait to do so until data collection has been completed. On the other hand, such an attempt is quite natural if one adopts the second approach. Detecting a peculiar non-random temporal or spatial sequence often requires a subjective judgment on the part of the observer. This is particularly problematic when sample heterogeneity results in partial cancellation of opposing effects; an unsuspicious mind is often oblivious to such heterogeneity even in hindsight (see examples presented in Sec. 4.21). The taboo against the second approach inadvertently suppresses the exploratory instinct of humans. If one wishes to lift this taboo, then the saving grace that must come with the change is to treat an observed pattern resulting from a small number of observations as a hypothesis or speculation — instead of a foregone conclusion or fact — and to insist upon subsequent verification with the aid of conventional statistics, especially in the case of serious matters. However, the latter practice does not completely eliminate errors; it merely lowers the likelihood of false conclusions. (Besides, statistics provides only the likelihood of correlation rather than causality.) The threshold of acceptable errors must be set in accordance with the seriousness of matters being evaluated (flexible and adjustable thresholds). In this way, rationality is safeguarded. For example, two types of court cases demand different thresholds. A conviction of criminal offenses requires proof beyond reasonable doubt so that the possibility of convicting innocent people of a serious crime by pure coincidence is sharply reduced. On the other hand, a conviction of civil offenses requires only a preponderance of evidence. Setting safety standards poses a dilemma. Risk assessment regarding public policies related to the environment and health is a complex matter, because policies designed to protect the environment and human health have financial repercussions. In an article with the title "Precautionary principle debate," Hileman300 chronicled the history of environmental conflicts between U.S. regulators and their counterparts in the European Union (EU). The precautionary principle is the guiding principle to deal with uncertainties, due to limited knowledge and information, in risk assessments. It appears that EU regulators tend to err on the safe side of protecting public health and the environment, whereas U.S. regulators tend to err on the safe side of minimizing disruption of economic health. This is not a simplistic issue of the public versus big corporations, since a global economic decline also hurts the public. The same article also pointed to a major difference between the U.S.
policy makers and their EU counterparts. It cited George M. Gray, acting director of the Harvard Center for Risk Analysis (p. 24 of Ref. 300): "There are a lot of subtleties involved in making decisions that the precautionary principle doesn't help you address. A blanket principle can't be used as a decision-making tool. Traditional risk assessment — which uses animal studies and human health effects and exposure data to make quantitative estimates of risk — is still the best approach as a trigger for action in deciding how and when to regulate chemicals, substances, or practices." Superficially, it looks as if Gray was addressing the "big" picture instead of mundane local rules, drafted by bureaucrats, and he was attempting to adjust the above-mentioned "threshold" — the critical level that triggers action. A closer look reveals what Gray was really pointing at, i.e., a rigid number just like the threshold set for a passing grade. Thus, if your grade is 60.00 you pass, but if your grade is 59.99 it is "just too bad"! However, the article also pointed out that, in the case of asbestos, there is no known threshold below which an exposure can be considered safe. What caused such a "cultural" difference between the U.S. and the EU? A side-bar entitled "Continental divide: a question of semantics?" cited an illuminating explanation, which was proposed by Joel A. Tickner of the Lowell Center for Sustainable Production at the University of Massachusetts: "In the U.S., any law or regulation developed under uncertainty is likely to be challenged in court, whereas European society is not so litigious." U.S. regulators are therefore forced to rely on quantitative risk assessments for their defense. In other words, U.S. regulators are forced to opt for quantitative, rule-based criteria rather than a holistic judgment based on combined rule-based and picture-based reasoning, no matter how absurd the rules or criteria may be. Alas, the prevalent practice of exclusively rule-based reasoning may just be a defensive move to combat the "trigger-happiness" of the American public, exhibited as frivolous litigation in numerous lawsuits. After all, rule-based reasoning prevails in court. Lawyers often use hair-splitting arguments to win a court case. As a counter-measure, a defense attorney must summon "hair-splitting"-resistant numbers in order to deflect the attack. Unfortunately, reliance on quantitative risk assessments may not be as objective as the public is led to believe. It is well known that potential risks can easily be "glossed over" by quantitative data from experiments or surveys that ignore sample heterogeneity when a small sub-population (such as
children) exhibits a significantly higher susceptibility than the general population (Sec. 4.21). As Ahlbom and Feychting4 pointed out, uncertainties in risk identification pose an even greater problem than the determination of safety levels, because the mechanism of the implicated health hazard is still poorly understood; our ignorance forces us to deal with the problem in a judgmental and informal way. The health hazard of extremely low-frequency electromagnetic radiation is a typical example (for an overview of biological effects of electromagnetic fields, see Refs. 73 and 321). To deal with uncertainty in risk identification, Ahlbom and Feychting suggested a Bayesian approach to the handling of pooled evidence from epidemiological data, experimental data, and other sources. This approach helps point out where subjective judgments come into play. Incidentally, Tickner also pointed out that the U.S.-EU dispute is, to an extent, motivated by trade barriers. The U.S. accuses the EU of making regulations to protect its markets rather than health or the environment. When big money is involved, the issue gets murky. However, Hileman also pointed out that it is not a simple issue of money versus health and the environment. Historically, ignoring credible early warnings, even though the risk was not completely understood or scientifically proved, often made the problem far more costly to fix in the aftermath; it was not even good for long-term economic health. Regarding Tickner's allegation, it is curious to think from the EU's point of view. Supposing that the U.S. adopted a significantly lower standard for a particular product, the EU would have no choice but to ban its import in order to uphold the EU's own standard. The U.S. would then view the ban as a trade barrier, but there might be a different motive behind the EU's decision: unwillingness to water down the EU's own higher standard. It is apparent that a country that adopts a significantly lower standard than other countries may have to face the penalty of trade barriers, and/or the necessity to purchase advanced environmental technologies from abroad subsequently, in order to clean up its act, because of the earlier lack of incentive for a domestic research and development effort. However, in a general culture of instant gratification and in a corporate culture of emphasizing quarterly profits instead of long-term growth, even global and long-term economic considerations have to take a backseat. The success of 20th century science has fostered an attitude of political correctness towards objectivity. It is about time that we examine the role of subjectivity in value judgment.
6.6. Role of subjectivity in creative problem solving and value judgment

It is instructive to consider how subjectivity intrudes into the cognitive process of novel problem solving in science. Subjectivity often enters at the search-and-match phase of problem solving, if heuristic searching is elected. A hypothesis is almost always subjectively formulated unless it is stolen or permanently borrowed from a fellow investigator;418 absolute objectivity demands trial and error of a virtually infinite number of possibilities. If picture-based reasoning is invoked, judging the goodness of match between a potential hypothesis and relevant experimental data is also a subjective process because a Gestalt (holistic) judgment is called for. Insistence upon absolute objectivity often condemns one to become a practitioner of exclusively rule-based reasoning. Despite his confessed and widely publicized handicap in languages, Einstein said it eloquently (p. 236 of Ref. 100): "For, if a researcher were to approach things without a pre-conceived opinion, how would he be able to pick the facts from the tremendous richness of the most complicated experiences that are simple enough to reveal their connections through [natural] laws?" Let us interpret Einstein's remark, quoted above, in terms of pattern recognition. Here, Einstein specifically referred to the process of discovering Nature's behavioral "pattern": If an investigator does not have a preconceived "template," which no one else knows, how is he or she going to recognize the "pattern" when he or she stumbles on it? It is difficult, if not impossible, to make sense of data (especially of a complex phenomenon) unless one has a preconceived idea, which constitutes an initial guess or hypothesis. This point was most spectacularly demonstrated by a consummate primatologist's attempts to interpret chimpanzees' facial expressions and gestures without the benefit of the apes' introspective reports (unlike Koko and Chimpsky, these apes did not know any sign languages). To a completely objective casual observer, many of these fine emotional expressions will appear meaningless and go unnoticed. As de Waal explained, the situation was not too different from how a chessboard pattern appears to an ignorant non-player (pp. 17-18 of Ref. 162). One needs to first learn to recognize certain basic patterns (modules) before one can make any sense of a complex sequence of patterns that constitute a meaningful political behavior, such as forging an alliance, seeking reconciliation, or maneuvering to dethrone a reigning "alpha" male chimpanzee. The Gestalt perception of the whole patterns can then be used to refine interpretations
of the individual modules. The rise on the objectivity scale of a given interpretation is achieved by repeated refinements to accommodate continually accumulating data and increasingly complex social interactions among various large groups of chimpanzees, i.e., perpetual verifications and revisions (Griffin called this procedure informed inferences; see p. 256 of Ref. 264). The verification phase in creative problem solving is expected to be completely objective because rule-based reasoning must be exercised. That may be true as far as creative problem solving in science and mathematics is concerned. It becomes somewhat problematic when value judgment is involved. As discussed in Sec. 4.20, subjectivity enters even at the verification phase of a creative act in art and music, because the process requires value judgment regarding "beauty" instead of "truth." Essentially, pattern-based reasoning is required in passing judgment on qualities of art and music. No one expects a beauty contest to be judged solely on the basis of objective criteria, even though some superficially objective rules are usually imposed to restrict the freedom of the judges in exercising their subjective judgment. Judgment of students' performance and judgment of a scientist's work also belong to the same category; excellence has an inherent esthetic element in addition to a utilitarian element. Unlike the practice in a beauty contest, subjectivity has been deliberately suppressed in the name of fairness. Attempts to enhance objectivity often lead to quantification of qualities: a quantitative "point" system is established to evaluate qualities. Is subjectivity completely eliminated? Superficially, yes; in reality, no: subjectivity simply becomes hidden or disguised. The following examples show why and how. A colleague from another department was once proud of their department's criteria of merit in research productivity. Their merit system boasted of including both "quantity" and "quality" in the evaluation process: it is not a simplistic process of merely counting numbers of publications. According to the point system, abstracts submitted to scientific meetings are assigned a certain number of points, whereas formal publications are assigned a higher number. In fact, formal publications are further classified according to the venues of publications: refereed journals, non-refereed journals, book chapters, conference proceedings, etc. When I questioned the rationale of their choice of that particular distribution of points among various categories (e.g., "why 3 instead of 2 points?"), he suddenly realized that the criteria were not completely objective: they were somewhat arbitrarily determined but were regarded as objective because no one had raised any objection. Another colleague from Finland once told me that his university had
asked each department to submit a set of criteria of merit in research productivity, but these diverse submissions all favored the home department at the expense of other departments. Apparently, these criteria, which had been formulated by local consensus within each department, lacked global consensus across the university because of the diversity of the university's constituent departments. Of course, criteria set by consensus among the rank and file are usually considered better than criteria set by a single "boss" in a dictatorial fashion, but it is not always so. It is well known that criteria of grade performance that answer the wishes of the majority of students have the deleterious effect of "grade inflation." The use of multiple criteria, instead of a single one, leads to another drawback. Although the intent may be to address the issue of diversity and fairness, the end result does not always lead to excellence, as we know it, and fairness, as we expect it. Instead, it encourages — in John Holland's words (Sec. 7.2) — "highly optimized mediocrity." This is because weaknesses in certain aspects are allowed to cancel strengths in other aspects, often in bizarre proportions. Because of the arbitrariness of point distributions and the nonlinearity of the scale of evaluation, an extraordinary strength in one aspect may not be sufficient to offset a modest weakness in another aspect. For example, promptness in ending lectures on time has been used as an important criterion of teaching excellence. Although a failure to do so is a relatively innocuous offense from an educational point of view — it is, however, a major irritation to students which the school administration wishes to avoid — it carries sufficient weight to cancel a merit point attributed to enthusiasm in the classroom. In a medical school nowadays, the holding of substantial research grants that return lavish indirect-cost money — as a "kickback" — to the home institution is almost universally considered a necessary condition for tenure and promotion of academic positions. This criterion, if applied to Einstein, would have precluded the possibility of his first faculty appointment at the University of Zurich in 1909. At that time, he not only did not hold a research grant, but he was not even a full-time academic scholar by profession or by definition; he was a low-level clerk at a Swiss patent office in Bern, with only a short stint as a high school mathematics teacher and a part-time non-faculty teaching position at the University of Bern to be added to his meager resume.501 Yet, four years prior to the appointment, he published five masterpieces in Annalen der Physik, including the special theory of relativity, which made him a household legend, and a photon theory regarding the photoelectric effect of solids, which led to the award of a Nobel prize in
physics, not to mention the paper that contained the formula E = mc². Is the prevailing system of using the "impact factor" of the venue of publications to evaluate research excellence any better than other point systems? Superficially, objectivity is maximized by virtue of nearly universal consensus and arbitrariness is minimized by virtue of the authority of editors and editorial boards. In reality, it is far from being perfect. In order for the system to work properly, a roster of perfect referees with consistently unfailing ability to judge must be assembled — an almost certain impossibility. Normally, a journal in good standing contains mostly good articles but occasionally inferior articles slip through the cracks. An obscure journal sometimes contains an article of exceptional quality. If sociologists' research regarding conformity also applies to an organization, then the first thing that most officials of a prestigious journal want to do is preserve the journal's hard-earned reputation, in part because big money is often involved in the publishing industry (cf. Sec. 4.21). Being anonymous, referees have a tendency to reject what they do not quite understand ("It has never been done before, therefore it is not possible"). Given this tendency, an article of "paradigm-shift" quality would have a hard time winning the approval of the referees, not only because it is over their heads or beyond their imagination but also because it rocks the boat and jeopardizes their careers and livelihoods. Thus, conformity, as applied to the publishing world, tends to weed out the very worst and the very best. Practitioners of the impact factor game inadvertently promote a medieval value system that advocates the notion of guilt-by-association; each and every article in a journal of low standing must share the same stigma. Invoking impact factors in judging the worth of a publication is as repugnant as racial discrimination: an article is not judged on the basis of its intrinsic merit but rather its "appearance" — the label attached to it. The impact factor game is discriminatory not only against individual investigators but also against individual scientific disciplines. Years ago, I had an opportunity to discuss issues pertinent to a journal with its publishing editor. This publishing editor informed me of an interesting observation. According to the particular publisher's own investigation, authors in the field of molecular biology tend to cite articles within a small handful of journals. This practice reflects, more or less, the tendency of these authors to concentrate their publications within this same small number of journals. The "clustering" effect thus tends to increase the impact factor of
these journals. On the other hand, the particular journal, represented by this publishing editor, tends to contain articles that cite references from a wide variety of journals. Without identifying this particular journal, I shall simply say that it is multi-disciplinary in nature. The typical authors of this journal are routinely exposed to a wider range of knowledge, and often publish in a wider variety of journals than molecular biologists. This kind of practice tends to lower the impact factor of the "home" journal, i.e., the official organ of the professional society with which an investigator is primarily affiliated. However, this is not a simple issue of loyalty to the home journal. The very nature of inter-disciplinary/multi-disciplinary work makes it necessary for the investigators to reach out and to network with fellow investigators in related disciplines. Such networking activities should not be discouraged at a time when the coherence of science is increasingly threatened by fragmentation of knowledge, as a consequence of increasing specialization. Discrimination against certain disciplines has another grave impact. Various disciplines and subdisciplines in science constitute an ecosystem because they share and compete for the same resources and, at the same time, are mutually supportive. Thus, mathematics, physics and even engineering feed biology with new concepts and new tools. Biology inspires new endeavors in mathematics, physics and engineering (see Secs. 2.4 and 7.2). Two new trends helped create something tantamount to an ecosystem crisis in science. Arguably, the advent of molecular biology was one of the greatest, if not the greatest, scientific achievements in the 20th century. Just like what was pointed out by Gibbs and Fox235 regarding innovation in the computer industry, the rise of molecular biology and biotechnology did not result in a vastly increased demand for high-level innovative molecular biologists; rather, it increased the employment of mid-level scientists who can build the much needed databases quickly, and of highly skilled technicians who fill the needs of the biotechnology industry. However, the concentration of personnel in molecular biology and biotechnology to the exclusion of other areas of biomedical sciences may eventually lead to an ecosystem crisis; few competent people will be around to handle the rest of the scientific problems, not to mention teaching critical subjects such as neuroanatomy and physical organic chemistry. That trend alone would not matter as much, if scientists were really free to pursue their chosen research subjects. With the dwindling availability of research funds, the balance of the scientific ecosystem becomes shaky as scientists
are increasingly attracted to — as Willie Sutton eloquently alluded to — "where the money is." The process of pursuing funds, on the part of scientists, and enhancing productivity, on the part of administrators of research institutions, thus acquires an autocatalytic (positive feedback) effect: the rich get richer and the poor get poorer. Thus, "hot" topics tend to attract a great deal of funding and a large number of investigators, whereas important topics that are deemed "old-fashioned" attract neither funding nor investigators. The outcome is tantamount to what is called monoculture in agriculture. In the name of productivity, the same crop is planted year after year, without rotating the land and letting the land have an opportunity to recuperate (see Chapter 2 of Ref. 57). Needless to say, the impact factor game helps reinforce the trend of monoculture in scientific research. Thus, the impact factor game has yet another impact. As mentioned in Sec. 11 of Chapter 1, many diseases have a complex etiology: both genetic and environmental factors contribute to the susceptibility. Rees559 warned against the dominance of molecular medicine in medical research, at the expense of clinical discovery and patient-oriented research. However, in view of the popularity of the impact factor game in the research community, Rees' plea will almost certainly fall on deaf ears. Thus, the impact factor game is also bad for human health. Of course, the impact factor game is not the only villain and can, in principle, be modified to minimize its harm. Perhaps someone could come up with some normalization factors — fudge factors, so to speak — to compensate for the inequity across disciplines, and to give some justice to multi-disciplinary subjects. However, I am not going to waste my time promoting such an idea, because the impact factor game is inherently flawed. The above example suggests that subjectivity also enters into the deliberation of social policies because it involves value judgment regarding the public "good." It is well known that the distribution of the weight of various criteria in deciding a public policy is usually not arbitrary, but is shaped to a large extent by various special interest groups even in a democratic society. The distortion of weight distribution so engendered seriously undermines the democratic principle of one-person-one-vote. However, nothing cripples the system more devastatingly than a failing educational system, which produces a large crop of intellectuals who practice exclusively rule-based reasoning. These intellectuals often have trouble thinking rationally but usually have no self-awareness of their own handicap. Sorting out a complex public issue among the thicket of propaganda, launched by various factions
of the political spectrum, often depends critically on the ability of individual voters to cast independent judgment, based on critical and rational thinking. Therefore, a failing educational system not only harms science and technology but also jeopardizes the existence and persistence of democracy. It is apparent that quantification of qualities does not automatically guarantee objectivity. Quantification of qualities is tantamount to replacing a parallel process of holistic judgment with a sequential process of evaluating a few (often arbitrarily selected) features with little regard to the overall balance of weight distribution among them (see Rosen's analysis of semantic and syntactic entailments, Sec. 6.13). How one weighs the relative importance of these features critically affects the outcome of evaluation; just imagine a beauty contest judged solely on the criteria of preconceived ideal anatomical dimensions of various body parts, with an equal or arbitrary weight distribution. Although no one can say categorically that it is impossible to develop a satisfactory point system of merit evaluation, the task is a formidable one for contemporary university administrators who, unlike Robert Hutchins (Sec. 4.22), are often recruited on the basis of their fund-raising ability instead of a profound understanding of what it takes to become a creative scholar and what it takes to educate an enlightened citizen. A merit system with a point distribution designed primarily to maximize "profit" and to minimize expenditure carries so much distortion that it tends to breed mediocrity rather than foster excellence. After all, pattern recognition is a subjective experience of perception. It becomes difficult to recognize the "pattern of excellence" when quantification of qualities fractures and distorts perceived patterns in a wholesale manner. Conventional wisdom tends to associate objectivity with rationality, and subjectivity with irrationality. However, as illustrated by several examples cited in previous sections of this article, exclusively rule-based reasoning often leads to superficial objectivity without genuine rationality. It seems to me that an easy way to restore sense to value judgment is to reinstate subjectivity. It is possible to be subjective but still fair and rational. In other words, a rise on the gray scale of objectivity can be achieved by careful and rational deliberations. A careful and rational deliberation must include, but is not limited to: a) paying attention to the big picture, b) paying attention to opponents' opinions, or even thinking from the opponents' point of view, c) verifying one's thought or tentative conclusions against available evidence, d) not exercising double standards in evaluating
favorable and unfavorable evidence,* and e) not letting one's emotion or selfishness cloud one's judgment. Complete objectivity in value judgment may be an impossible goal, but making an effort to reduce subjectivity stemming from irrational thought processes often goes a long way in enhancing objectivity. Last but not least, rational deliberations can be expected only from an enlightened citizenry. It is obvious that the current practice, in the United States, of mass-producing highly optimized examination-taking biomachines can hardly meet this need, and does an incredible disservice to society. Delon Wu, a classmate of mine in medical school and now a cardiologist and university/hospital administrator, once said, "Your objectivity is nothing but another kind of subjectivity." It took me several decades to fully grasp the significance of his remark.

*Some intellectuals exhibit a tendency to accuse others of being subjective when they disagree with others' opinions, while they embrace the opposite view uncritically (confirmation bias). Ironically, subscribing to double standards is itself an irrational act stemming from self-serving subjectivity.

6.7. Critiques of science fundamentalism and postmodernism

Franklin, as well as Sokal and Bricmont,640 blamed Popper, Kuhn and their school for the rise of postmodernism and the gradual waning of faith in scientific objectivity (p. xiii of Ref. 212). I disagree. The postmodernists simply misinterpreted Popper and Kuhn's philosophy, because they did not understand that there is a gray scale regarding the level of confidence in scientific knowledge. However, I suspect that the popularity of the postmodernists' wholesale distrust in scientific knowledge might not have much to do with the teaching of Popper and Kuhn's school. Rather, the distrust might merely manifest an objection and overreaction to the opposite view advocated by a subpopulation of scientists and science philosophers, whom I call science fundamentalists, for lack of a better term. Intriguingly, like the postmodernists, the science fundamentalists also have no sense of gray scale in their overzealous defense of scientific truths. They believe that scientific truths are absolute. Apparently, postmodernist sociologists of science have been aware of how scientific theories are constructed, and have been able to see through the smoke screen (or, rather, fig leaf) set up by science fundamentalists (see, for example, Chapter 1 in Part B of Ref. 408). Naturally, they harbored a distrust of the fundamentalist
view, thus adopting the opposite view. Often, extremists at the opposite ends of the opinion spectrum make strange bedfellows. Unwittingly, the postmodernists and the science fundamentalists have conspired to create an oppressive climate of "political correctness" to sustain their opposing views that would be otherwise untenable. Thus, they have breathed life into each other, by furnishing each other with a reason to exist and persist. How did science fundamentalism arise? I suspect that the view was rooted in the success of reductionism, which brought about detailed knowledge and sophisticated methodology in modern science (modern biomedical sciences, in particular). Modern biologists are fortunate to work with well-established methodology. Barring technical errors, experimental results are mostly reliable and often unambiguous. Detailed knowledge in molecular biology further exerts a top-down constraint to limit the number of viable alternative interpretations. As a consequence, some biologists have gradually been lulled into such an uncritical mind-set that they sincerely believe that absolute proof in science is possible. Worse still, some of them even took a limited number of experimental tests as the ultimate proof of their interpretation (or theory), without bothering to consider the possibility that better alternative interpretations might be found in the future or might have already existed (e.g., see Ref. 322 for examples). The harm inflicted by science fundamentalism has spread beyond the mere deterioration of the quality of a bench scientist's work. The writing of science philosopher Bunge gives us a glimpse.97 Admittedly, some of Bunge's objections to postmodernism are valid, but some others are problematic and misleading. For example, his view regarding Bayesian analysis was an affront to common sense. He labeled Bayesian analysis as subjective probability and classified it as academic pseudoscience. First, probability theory is not science but a branch of mathematics. Unlike science, a mathematical construct is not falsifiable by experiments but is held accountable for its internal logical consistency. (Of course, it takes more than internal logical consistency to make a good piece of mathematical invention: non-trivial logical conclusions.) Second, Bunge's orthodox version of probability can be applied to real life with strict validity only if the samples or events under consideration are infinite in number; otherwise, the conclusion so drawn must be accepted with reservation. For example, the common practice of using the p value at 0.05 as the dichotomous cutoff between significance and nonsignificance in hypothesis testing is arbitrary, and was based on a subjective decision (other choices at 0.01 or 0.001 are less common). It can be problematic under certain conditions.59
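The contrast between a fixed significance cutoff and the Bayesian alternative can be made concrete with a minimal sketch of Bayesian updating for a die suspected of being loaded, anticipating the loaded-die example taken up in the next paragraph. The prior probability (1%) and the assumed bias of a loaded die (a six half of the time) are illustrative assumptions, not figures from the text.

```python
# Illustrative sketch: Bayesian updating of a prior belief that a die is loaded.
# The prior and the assumed bias are subjective inputs of exactly the kind
# discussed in the text.
p_loaded = 0.01            # prior probability that the die is loaded (assumed)
p_six_loaded = 0.5         # assumed P(six | loaded)
p_six_fair = 1.0 / 6.0     # P(six | fair)

for throw in range(1, 6):                      # a six observed on five consecutive throws
    joint_loaded = p_six_loaded * p_loaded
    evidence = joint_loaded + p_six_fair * (1.0 - p_loaded)
    p_loaded = joint_loaded / evidence         # Bayes' rule; the posterior becomes the new prior
    print(f"after throw {throw}: P(loaded) = {p_loaded:.3f}")

# The posterior climbs with every repeated six; refusing to revise the prior in the
# face of such evidence is the "foolish mental inflexibility" discussed below.
```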
Bayesian analysis was proposed to deal with real-life situations in which only a finite number of trials or samples is possible.58 Bunge's major objection was the subjectivity involved in the assignment of prior probability. It is true that there is an inevitable subjective component in the assessment of prior probability, which is defined as a conditional probability (e.g., see Ref. 350). For example, the knowledge of the likelihood that a particular die is "loaded" will certainly affect one's (subjective) assessment of the prior probability in a game of chance. In the absence of any evidence to the contrary, the die must be treated as an "honest" one initially. However, in light of a subsequent "reduction of uncertainty or ignorance," the conditional probability should be adjusted accordingly, in order to avoid being duped. Continuing to treat the die as a regular one, in spite of strong subjective or objective evidence to the contrary, is not an admirable scientific practice but rather a manifestation of foolish mental inflexibility. More recently, Kording and Wolpert380 demonstrated that sensorimotor learning also invokes Bayesian integration of a priori knowledge with information from sensory feedback. When we learn a new motor skill, such as hitting an approaching tennis ball, both our sensory feedback and our motor act possess variability. Since our sensors provide imperfect information, we must rely on our estimates of the ball's velocity. Bayesian theory suggests that an optimal estimate results from combining prior knowledge about the distribution of velocities with information provided by our sensory feedback. In executing an integration, the brain weighs the two sources of information according to their respective uncertainties. When the uncertainty of sensory feedback increases, such as in the presence of fog, reliance on prior knowledge of the velocity distribution increases accordingly. Statistically, not all velocities are a priori equally probable. A subjective estimate can be made. Any attempt to be completely objective would dictate an assumption of equal probabilities for all velocities, from zero to infinity or, even more inclusively, from minus infinity to plus infinity and also in all other (three-dimensional) directions; all other estimates are subjective. Yet, it is extremely unlikely for a pitcher to throw balls in the opposite direction, and certainly impossible to throw balls with a velocity of infinity in any direction. Thus, the objective estimate is absurd and misleading, and objectivity is just an illusion in this case. In computer simulation studies of the evolution of the genetic code, Freeland and Hurst213,214 invoked two kinds of probabilities to evaluate the performance of various variant (hypothetical) genetic codes in terms of error rates of mutation-generated or mistranslation-generated protein
damage. They found that from a sample of one million alternative codes only about 100 codes had a lower error value than the natural genetic code if a random error distribution was assumed, as dictated by objective probability. In fact, translation errors in the third position of a codon are more likely to occur than in the other two positions because of a weaker noncovalent bond interaction between a messenger RNA and a transfer RNA in the third position (dubbed the "wobble" phenomenon). However, when they incorporated additional restrictions — i.e., they used prior knowledge to modify the error distribution — to reflect observed patterns in the way DNA tends to mutate and the ways in which mistranslations occur, the natural code outperformed all but one in a million of the alternative codes. In this case, insistence on objectivity led to a reduction of insight rather than a reduction of ignorance. For further detail, see Sec. 7.5 of Chapter 1. That some investigators shun the use of prior probability or prior belief yet have no qualms about the use of assumptions is probably a consequence of the power of words which "plainly force and overrule the understanding," as Francis Bacon once pointed out. Thus, invoking prior probability conjures up the specter of making subjective choices, but making assumptions, however arbitrarily, is an acceptable scientific practice since all past masters did so routinely. Regarding the illusion of objectivity, Irving J. Good put it aptly but bluntly: "The subjectivist states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science" (cited on p. 110 of Ref. 58; see also Ref. 59 for additional discussions). Apparently, Bunge pushed the rules of orthodox probability theory beyond its domain of validity. Readers who are interested in seeing how some scientists misused or abused statistics in real-life applications should read a report by Bailar.35 Bunge was also opposed to the application of chaos theory in political science. He pointed out that "[James N. Rosenau, a well-known political scientist,] did not write, let alone solve, any nonlinear differential or finite difference equation for political processes; all he did was some hand-waving" (p. 104 of Ref. 97). However, numerous political events exhibit the Butterfly Effect: the enactment of a policy or the occurrence of an incident — often innocuous or minor from the point of view of the initially involved parties — triggers a chain of events that eventually precipitates a major political crisis, e.g., the collapse of the former Soviet Union, the collapse of the Ferdinand Marcos regime in the Philippines, and the forced resignation of the former U.S. president Richard Nixon. So, what is wrong with ignoring the underlying mathematics and just using chaos theory in a qualitative manner?
Bunge ridiculed Rosenau's attempt as "loose talk," "science without its substance," and "pseudoscience." Apparently, Rosenau's work lacked the mathematical rigor and elegance that Bunge expected. But Einstein said: "If you are out to describe the truth, leave elegance to the tailor" (p. 81 of Ref. 431). As we shall see, social science defies quantification and cannot be fully formalized (mathematized) (Sec. 6.13). However, chaos theory certainly can be used to construct a "semantic" model, instead of a mathematical model, to describe and predict certain truths in social science.

6.8. Level of confidence in scientific knowledge
Perhaps it would be a sobering experience for both radical realists and science fundamentalists to read Alcock's article The Propensity to Believe,8 which appeared along with Bunge's article in the same proceedings volume. Alcock pointed out that the internal representational model of the outside world is not always isomorphic with what is really "out there," i.e., the relationship is not a one-to-one correspondence and distortion is inevitable. The distortion of our visual perception starts at the level of the retina where edge enhancement of an image formed upon the retinal photoreceptors is implemented by means of a built-in distorting mechanism known as lateral inhibition.282 Such a mechanism specifically allows an animal to detect the contours of an object in the absence of a complete and detailed image, and is of great survival value. Thus, the brain is an active knower that often tries to make predictions based on incomplete information. The initial guess is often laden with emotional inputs from the limbic system, and is thus tainted with a heavy dose of subjectivity (see the theory of Korner and Matsumoto, Sec. 4.17). However, the brain is also equipped to detect logical inconsistencies and holistic incongruities. Subsequent iterative corrections, which also involve the paleocortex and the neocortex, reduce the degree of subjectivity but can never completely eliminate it. Likewise, a scientific theory is a product of active modeling of Nature — a mental construct so to speak. The initial hypothesis reflects the investigator's bias and educated guess (Sec. 6.6). However, a theory may not be completely subjective, especially after thorough revisions with the aid of honest public debates. In summary, although theories are mental constructs of scientists, some theories fit experimental data better than others, and some theories make better predictions than others. Consequently, there exists a gray scale of scientific truths, and the level of confidence in scientific knowledge reflects
this gray scale. Although absolute truths are theoretically unattainable, they can be approached asymptotically by repeated attempts at successive approximation. Held294 pointed out that the word "construct" (noun or verb) is behind much of the confusion about active knowing and passive knowing (prisoner of words phenomenon, Sec. 4.7). The fact that knowing involves an active process on the part of the knower/investigator — the process of proposing a hypothesis — does not make all knowers proponents of purely subjective reality, nor does it render all knowledge purely subjective.

6.9. Sociological aspects of science

Although sometimes social pressures or authoritarian dictates may sustain a faulty theory and suppress a valid dissident theory, a meritorious but suppressed theory may eventually be redeemed, and a faulty theory abandoned, when the initial pressures subside along with a reversal of fortune. The futility of Galileo's coerced recantation is a case in point. There are certain important factors that constructionists/constructivists have overlooked: the merit of a theory, and the rationality and intellectual honesty of some, if not all, scientists. Rationality allows a scientist to judge critically and independently the fallacy that the science authority attempts to thrust down the collective throats of fellow scientists. Rationality also enables a scientist to recognize the merit of a scientific theory that the science authority attempts to discredit or suppress. Intellectual honesty gives a scientist the courage to defend a meritorious but unpopular theory proposed even by a total stranger, even under the threat of retaliation by the authority. Thus, by collective, repeated attempts to arrive at a finer and finer theory — a process of successive approximations — objective truth can be approached "asymptotically." In this way, scientists lose complete control of the fate of their "pet theory." If we are to borrow a phrase from sociologists Berger and Luckmann,60 a theorist cannot "wish away" all future unfavorable evidence, especially after his or her death. Nor does the "reality" of unfavorable evidence that confronts a theorist during his or her lifetime vanish from all others' perception upon his or her death, as radical antirealists are apt to think. The last assertion that I have just made implies that I am a realist, albeit a realist based on faith alone, without the benefit of rigorous scientific proof. Descartes' famed "proof" of his own existence may never be considered satisfactory by radical antirealists. Antirealists can always claim
that the introspective feeling of "Cogito ergo sum" was just Descartes' own imagination, or, in modern AI language, a simulation or virtual reality in the mind (brain) of Descartes. Nevertheless, if objective reality does exist after all, the readers will have an opportunity to examine any proposition or theory and judge it on its own merits, quite independently of the protagonists and the antagonists. I accept the existence of objective reality on faith, because this is the only way that I can maintain self-consistency with the kind of philosophy that I endorse and the kind of world-view that I adopt.
6.10. Logical inconsistencies of antirealism
The logical inconsistencies of radical antirealists' claim have often been pointed out (see, for example, Ref. 294). The most serious flaw of radical antirealism is the impossibility for antirealists to maintain self-consistency unless they maintain silence and keep their view to their own. Bertrand Russell once told the following amusing story with regard to solipsism — an extreme form of antirealism — which claims that the self can know nothing but its own modifications and states (p. 180 of Ref. 582; also cited in p. 54 of Ref. 640): "I once received a letter from an eminent logician, Mrs. Christine Ladd Franklin, saying that she was a solipsist, and was surprised that there were no others." The antirealists' proclamation and insistence that all theories are relative is similar to the claim of Epimenides the Cretan: all Cretans are liars. Radical antirealists thus committed an elementary error: self-contradiction by virtue of self-reference440'638 (cf. Sec. 6.13). For radical antirealists, Russell had the following harsh words (p. xi of Ref. 582): Skepticism, while logically impeccable, is psychologically impossible, and there is an element of frivolous insincerity in any philosophy which pretends to accept it. Moreover, if skepticism is to be theoretically defensible, it must reject all inferences from what is experienced; a partial skepticism, such as the denial of physical events experienced by no one, or a solipsism which allows events in my future or in my unremembered past, has no logical justification, since it must admit principles of inference which lead to beliefs that it rejects.
6.11. Objective knowledge: Popper's third world
There remains an important discrepancy to resolve. Popper, whose philosophy we adopt in this discussion, claimed the existence of objective knowledge in a book of the same title.528 What Popper did was elevate the part of knowledge that had been considered well-established to a level that could be categorized as belonging to the third world — a world independent of the first world (the physical world) and the second world (the world of the human mind). His point was that knowledge can be detached from the mind of the proponents or discoverers and can become accessible to future readers. This potentially allows a reader to reconstruct the whole idea long after the death of its proponents. Did Popper change his mind? Not at all. In the same book, Popper gave a concise and lucid recapitulation of his book Logik der Forschung in plain English (Chapter 1 of Ref. 528). In his reaffirmation, knowledge, however well established, remains tentative and conjectural. So, where has the subjective element gone? How did Popper make it vanish? What Popper did in assigning objective knowledge to the third world is tantamount to partitioning our knowledge into an objective and public part, and a subjective and private part. However, as Rosen pointed out, the act of partitioning our immediate universe of impressions and percepts, into a public part and a private part, is by itself a private [subjective] act (p. 84 of Ref. 570). In other words, the element of subjectivity in Popper's theory resides in the step of making the decision of granting a given theory or proposition the sanctified status of objective knowledge in the third world. However, we suspect that the status (or, rather, the membership in the third world) will forever remain tentative even for a well-established theory, although a revocation of that membership seems unlikely for the time being. The reason was stated by Popper himself: there was never, and perhaps never will be, a theory as well-established as Newton's classical mechanics; yet, it did not escape the fate of being superseded and becoming an approximation of special relativity (and quantum mechanics).
6.12. Method of implicit falsification: Is psychoanalysis unscientific?
Among the casualties of the overreaction of science fundamentalists to the threat of postmodernism was the Freudian theory of psychoanalysis (e.g., see Ref. 149). Appraisals of psychoanalysis from various angles can be found in a philosophical critique by Grünbaum266 and the open peer commentaries
following the main text. The key ingredient of Freud's theory — the notion of the unconscious — is not externally observable. The conclusion drawn from the introspective accounts of the patient is not directly falsifiable. In addition, free associations of the patient may be tainted with the analyst's suggestions. Thus, in accordance with the Popperian doctrine, psychoanalysis was deemed unscientific. Is psychoanalysis really unscientific? This is a question that is intimately related to the admissibility of subjective judgment in science. Absent the subjective aspect, the science of neuropsychology is a science that "excludes the psyche" (p. 9 of Ref. 641). More recently, attempts have been made to reconcile neurological findings with Freud's theory of the mind.642 In my opinion, it is too early to write Freud off. After all, Freud's theory, being phenomenological in nature, consists of control laws governing the mind at the behavioral level, whereas findings in modern neuroscience provide control laws at the molecular or the neuronal circuitry levels. Both types of control laws are products of evolution, and both must conform to the attributes of consistency, coherence and rationality in normal individuals, whereas deviations from these control laws lead to pathological states with manifestations at the molecular and/or the behavioral levels. A reductionist explanation alone or a purely phenomenological "big-picture" explanation alone can hardly do justice to the complex topic of the mind. Neither description can replace the other; both must work together to achieve a better understanding of how the mind works. Falsification of psychoanalytic data is not impossible, but the procedure of falsification is not as straightforward as that routinely encountered in hard science. In this regard, soft science is actually harder than hard science. With regard to the question of whether Freud's method of psychoanalysis was scientific, Pribram recounted a true story (pp. 14-15 of Ref. 538): One evening at a dinner hosted by the San Francisco Psychoanalytic Society, I was seated between Ken Colby and Alan [sic] Newell. I asked them to compare the psychotherapeutic process to that by which computer scientists program chess games. I asked: Doesn't the computer scientist develop a "theory," test it against an opponent, note the theory's failings, incorporate a solution provided by the opponent, test the amended theory against a more proficient opponent, and then repeat the process until satisfactory winning strategies and tactics are developed? Substitute psychotherapist for computer scientist and patient for opponent and doesn't this
describe the therapeutic processes? Newell agreed to the correctness of the chess analogy; Colby stated that was exactly what he was doing (which I knew, but Newell didn't) in simulating, by computer program, his therapeutic procedure and testing it against his patients' productions the following week. We all agreed as to the similarity of the two processes: ergo, either psychoanalysis as Freud proposed it in the Project and psychotherapy are indeed both "scientific" procedures or else computer programming as used in developing chess strategies fails to be "scientific." Thus, the validity of a psychoanalytic therapeutic procedure is implicitly falsifiable by repeated iterations (cf. Körner and Matsumoto's theory, Sec. 4.17). In fact, this is exactly the procedure used in hard science when the Popperian falsification is not straightforward. When the counter-evidence to be used in falsifying a given proposition cannot be explicitly and unequivocally established, falsification tests need to be applied to the counter-evidence itself, in a nested hierarchical fashion. Thus, when well-established counter-evidence is called into serious question, it is necessary to send the entire line of reasoning back to the drawing board. As demonstrated in Secs. 5.13 and 5.14, a perennial debate about the consistency of Boltzmann's statistical physics and deterministic classical mechanics begs to send the widely accepted principle of microscopic reversibility for a re-trial in light of new insight, and the outcome threatens to overturn the century-old verdict. The implicit and iterative method of falsification is also exactly the method that ethologists used to attain a valid subjective interpretation of the animal's objective behaviors via informed inferences (see earlier comments on de Waal's work, Sec. 6.6). As indicated in Sec. 5.20, admission of subjectivity is indispensable if we are to make progress in the study of consciousness. It is also a necessity if natural scientists aspire to reach out to social scientists and to build a bridge towards consilience so as to achieve the unity of knowledge.
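Pribram's chess analogy amounts to an iterative test-and-revise loop. As an illustration only (nothing below is taken from Ref. 538), the loop can be sketched in a few lines of Python, with the theory, the opponents and the judgment functions all supplied as placeholders:

    # Sketch of the test-and-revise loop in Pribram's chess analogy.
    # 'mismatch' and 'revise' stand in for the analyst's or programmer's judgment;
    # they are hypothetical placeholders, not anything from the cited works.
    def refine(theory, opponents, mismatch, revise, max_rounds=100):
        """Test a tentative theory against progressively harder opponents,
        folding each observed failure back into the theory."""
        for opponent in opponents:
            for _ in range(max_rounds):
                failures = mismatch(theory, opponent)   # note the theory's failings
                if not failures:
                    break                               # good enough for this opponent
                theory = revise(theory, failures)       # incorporate the "solution"
        return theory                                   # still tentative (cf. Popper)

The loop never certifies the theory; it merely returns the latest revision that has so far survived testing, which is the sense in which the procedure is implicitly falsifiable.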
6.13. Life itself: epistemological considerations
Among a number of claims that physics is not adequate to explain life and consciousness, Rosen's claim deserves special attention. Rosen provided one of the most comprehensive and compelling explanations in his book Life Itself569 and a companion volume Essays on Life Itself.570 Here, Rosen restricted his discussion to contemporary physics rather than the ideal physics that it aspires to be. Rosen took on the issue of duality (dichotomy) between
"hard" science and "soft" science, i.e., quantitative and qualitative science, or "exact" and "inexact" science. Rosen quoted two diametrically different positions (pp. 2-3 of Ref. 569). The first position is due to physicist Ernest Rutherford: "Qualitative is nothing but poor quantitative." Francis Galton (1879) also thought that "until the phenomena of any branch of knowledge have been subjected to measurement and number, it cannot assume the status and dignity of a science" (reprinted in Ref. 153, p. 24). The second position is due to educator and administrator Robert Hutchins: "A social scientist is a person who counts telephone poles." Here, Hutchins tacitly accepted Rutherford's notion of science as quantitative descriptions of natural phenomena. Thus, the mere juxtaposition of the two terms "sociology" and "science" reduces them (social science) to trivial activities. These two rather derogatory remarks beg serious explanations, which Rosen eloquently offered. They represent two extreme views that surround the distinction between "hard" or quantitative science and "soft" or qualitative science. Thus, physics and chemistry are hard science, whereas sociology and politics are soft science, with biology (and psychology) positioned in between. Rutherford obviously believed that every percept or quality with regard to science is quantifiable and can be expressed in numbers. Anything short of that is inexact and perhaps inferior. Hutchins' view asserts the opposite: quantitative is poor qualitative. The features or qualities of a social structure or, a piece of art that are of interest or importance are precisely those that are not quantifiable; anything we can count is either trivial or irrelevant. Rosen argued that the gap between the two perspectives which some scientists hoped to close is, in principle, unbridgeable: some qualitative features are inherently not quantifiable regardless of technical prowess. Recalling our analysis in Sec. 4, this gap is the same gap that renders step-by-step algorithms unsuitable for solving a problem of pattern recognition, because there is no clear-cut threshold of recognition that can be specified as a set of numerical values of some judiciously chosen parameters. Most conclusions reached in hard science can be verified in terms of step-by-step manipulations of words and symbols (rule-based reasoning), whereas conclusions reached in judging a piece of art or music is at best vaguely expressible in words or symbols (picture-based reasoning) (Sec. 4.20). Rosen pointed out that there are two kinds of truths: syntactic truth and semantic truth. He defined a purely syntactic system as: a) a finite set of meaningless symbols, an alphabet, b) a finite set of rules for combining these symbols into strings or formulas, and c) a finite set of production
rules for turning given formulas into new ones. A system that is amenable to being reduced to syntactic truths easily comes to mind: mathematics. A semantic mathematical truth allows a term such as "triangle" to refer to an actual geometric object rather than merely a pure mathematical object. Ideally, one starts, in constructing a mathematical structure, from a set of axiomatic rules and definitions, and then formulates a logical system, i.e., a system of syntactic inferential entailments. Such a procedure is called formalization. A number of mathematical systems have been successfully formalized in the 19th century. One of mathematician David Hilbert's ambitious dreams was to formalize each and every mathematical system, in which internal consistency could be assured. Hilbert believed that allowing semantic truths into mathematics gave rise to trouble in mathematics. He and his formalistic school argued that a semantic truth could always be effectively replaced by more syntactic rules. It was tantamount to the prevalent tenet in engineering that the art of creating something can be converted into a craft of fabricating something. This craft of fabrication is achieved by knowing more and more about the control of the underlying physico-chemical process, e.g., what clean room technology brought to integrated circuit fabrication. This dream of the formalist program was shattered by the advent of the Incompleteness Theorem of Godel.243'483'161 As Rosen put it, Godel's theorem essentially stated that no matter how one tries to formalize a particular part of mathematics such as Number Theory, the totality of syntactic truths in the formalization does not coincide with the entire set of truths about numbers (pp. 7-8 of Ref. 569). The set of all syntactic truths is only part of the entire set of truths; there is always a semantic residue that cannot be accommodated by that syntactical scheme. Thus, formalizations are part of mathematics but not of all mathematics: one cannot forget that Number Theory is about numbers rather than simply a bunch of meaningless self-consistent rules and inferents that happen to bear the nomenclature and terminology of numbers. The semantic parts are external referents that give Number Theory its real-world meaning. If mathematics fared so badly in the attempt of formalization, we can only expect natural science to fare even worse, not to mention social science. In Western science, the universe is partitioned into self and ambience (everything else; the external world of objective reality) — a partition that Rosen referred to as the first basic dualism. Science requires both elements: an external, objective world of phenomena, and the internal, subjective world of the self, which perceives, organizes, acts, and understands [not
just knows and then memorizes, as some modern biomedical students are apt to do]. He pointed out that the fact that inner, subjective models of objective phenomena exist connotes the most profound things about the self, about its ambience, and above all, the relations between them. What Rosen then called the second basic dualism pertains to the traditional partition of the ambience into systems and their environments. Unlike the first dualism, this second partition rests on a consensus imputed to the ambience, rather than on some objective and directly perceptible property of the ambience. Rosen thought that this was a fateful and decisive step taken by science. Henceforth, systems get described by states, which are determined by observations; environments are then characterized by their effects on the system in question. Fundamental trouble begins to creep in: the difficulty brought about by reductionism (see later; see also Ref. 484). Let us take a close look at what Rosen envisioned as science, and, in particular, natural science. Here is Rosen's conception of Natural Law, which makes two separate assertions about the self and its ambience (pp. 58-59 of Ref. 569): (1) The succession of events or phenomena that we perceive in the ambience is not entirely arbitrary or whimsical; there are relations (e.g., causal relations) manifest in the world of phenomena. (2) The relations between phenomena that we have just posited are, at least in part, capable of being perceived and grasped by the human mind, i.e., by the cognitive self. As Rosen pointed out, the first part of Natural Law is what permits science to exist in the abstract. The second part of Natural Law is what allows scientists to exist. Clearly, concrete science requires both. Let us rephrase Rosen's assertions in the language used in this survey. First, patterns exist in Nature that can be recognized by human minds. Second, it is possible to explain and predict at least some of these patterns. Next, we must refer to Fig. 10 to see how science is implemented by means of modeling relations. Here, N represents a natural system, which refers to the outside world (ambience), whereas F represents a formal system, which exists in the cognitive self, i.e., a mental construct of humans. The arrows within each system represent the respective entailment structures: causality in N (path 1) and inference in F (path 3). The natural system N contains a collection of causal entailments about the natural phenomena, whereas the formal system F contains a collection of inferential entailments about the propositions.
In order to model the natural system N with the formalism F, a dictionary is required to encode the phenomena in N into the propositions in F and another for decoding from propositions in F back to phenomena in N (represented by paths 2 and 4, respectively). The encoding arrow (path 2) is naturally associated with the notion of measurement. The decoding arrow (path 4) can be regarded as an "inverse" measurement or, rather, the prediction of a subsequent measurement. Codings and decodings are acts of associating the phenomena with a limited set of abstract numbers. This finite set of numbers corresponds to what Thomas Kuhn referred to as "points of contact" (p. 30 of Ref. 399). As explained in Sec. 4, the validation of the modeling relation between F and N depends on an overall judgment of goodness of match in pattern recognition. It is a judgment of how well the outcomes of paths 2, 3, and 4 match the outcome of path 1, based on a finite number of contact points between the template (F) and the pattern (N).
Fig. 10. Modeling relation between a natural system and a formal system. See text for explanation. (Reproduced from Ref. 569 with permission; Copyright by Columbia University Press)
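As an illustration only (not Rosen's own formalism), the four paths of Fig. 10 can be mimicked in code, with a toy exponential-growth formula serving as the formal system F and a simulated culture serving as the natural system N; every number and name below is invented for the example:

    import math

    # Natural system N: its causal entailment (path 1), hidden from the modeler.
    def natural_system(t):
        return 1000.0 * math.exp(0.52 * t)

    # Path 2 (encoding): a measurement maps a phenomenon in N to a number in F.
    def encode(t):
        return round(natural_system(t))        # a finite "point of contact"

    # Path 3 (inference within F): the formal model N(t) = N0 * exp(r * t).
    def infer(n0, r, t):
        return n0 * math.exp(r * t)

    # Path 4 (decoding): read a proposition in F back as a predicted measurement in N.
    def decode(x):
        return round(x)

    # Goodness of match: compare paths 2-3-4 against path 1 at a few contact points;
    # judging whether the match is "good enough" remains with the modeler.
    n0, r = encode(0), 0.5                     # the modeler's estimated parameters
    for t in (1, 2, 4, 8):
        predicted, observed = decode(infer(n0, r, t)), encode(t)
        print(t, predicted, observed)

The same few lines, with the rate reinterpreted, would serve equally well for a bank balance under compound interest, which is the homomorphic, non-isomorphic character of the modeling relation discussed next.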
The modeling relation between a formal system and a natural system is a homomorphism but not an isomorphism. In other words, more than one natural system can be modeled by the same formal system — a situation called analogy. Thus, the same "exponential-growth" formalism can describe the population explosion or the growth of a bank account under compound interest. Analogy permits the use of the same equation to describe diffusion of molecules and conduction of heat. Furthermore, more than one formal system may have a modeling relation with the same natural system N, with varied degrees of goodness of match (cf. Sec. 6.1).
A natural system may contain more causal entailments than the corresponding inferential entailments in a matching formal system. The attempts to discover bigger and bigger formal models that contain more and more syntactic inferential entailments to match the causal entailments in the natural system constitute the major scientific endeavors of theoreticians. Can one, in principle, find the biggest formal model that allows all causal entailments to be formalized? Rosen argued not. The difficulty arises from Western science's insistence on objectivity. That is, the inferential entailments in the formal system must be context-independent, and, therefore, must be syntactic in nature and devoid of any external referents. Thus, causation must come from within the formal system. The upshot is: among the four causal categories posited by Aristotle, only the material, formal and efficient causes are allowed (e.g., see Ref. 195 and p. 301 of Ref. 145). In biology, there is often an additional cause: the end or goal, or as Aristotle put it, "that for the sake of which" something is done. In biology, every structural part serves a function intended for bodily needs. Thus, in response to the question "why does the heart pump blood?", one of the acceptable answers is: ... because it circulates blood around the body so as to supply tissues with oxygen and nutrients while removing carbon dioxide and wastes. This is the final cause or functional cause that Aristotle also emphasized. In complexity research regarding emergence phenomena (emergentism) in a multi-level hierarchical system (such as life), two processes are considered: a semiotic process, which is deterministic, rule-based, and localized at lower levels, and a dynamic process, which is statistical and distributed at upper levels.13,505 The semiotic process, which conforms to microscopic reversibility, exerts an upward causal effect on higher levels or from parts to wholes, thus encompassing the material, formal and efficient causes. In contrast, the dynamic process exerts a downward causal effect on lower levels or from wholes to parts, thus encompassing the final cause. Finnemann204 argued that the interactions between upper and lower levels are neither completely deterministic nor completely random, but somewhere in between, i.e., relative determinism. The philosophical foundation of downward causation was reviewed by Bickhard66 and Kim.370 In our present formulation, the semiotic process is no longer absolutely deterministic. How this modification will affect the formulation of the concept of downward causation remains to be seen. It is necessary to invoke external referents in order to include the final cause. However, in Western science, final causes are dismissed as teleological
and therefore excluded for the sake of maintaining objectivity. Any such claim carries the stigma of vitalism. Thus, causal chains must always flow from parts to wholes, never from wholes to parts. The concept of intelligent materials, discussed in Sec. 7.3 of Chapter 1, implies a final cause: the purpose of an intelligent material dictates its design. However, such a final cause for naturally occurring intelligent materials can always be deferred to evolution and the inherent intelligence is thus more virtual than real. Nevertheless, the concept of intelligent materials motivates bioscientists to adopt a new style of inquiry. One is no longer content with finding out just the function and mechanism of biological structures. One must also do some hard thinking about why Nature designed these structures the way they are. This point is best illustrated by the writing of Seneca the Elder (Lucius Annaeus Seneca) about two millennia ago.v In his treatise entitled "On the Shortness of Life (De Brevitate Vitae)" Seneca the Elder pointed out the difference between the Romans and the Etruscans in their style of thinking when they responded to the sight of a flock of geese flying by. The Romans wondered how the birds managed to fly, whereas the Etruscans wondered for what reasons the birds were flying. It is apparent that, in pursuing the research of intelligent materials, the investigators have inadvertently transformed their inquiry from the Roman style to the Etruscan style; the notion of final causes no longer appears so offensive. Rosen claimed that the insistence on objectivity thus makes understanding of life impossible because final causes are disallowed. Emmeche et al.195 analyzed three forms of downward causation. They concluded that the "strong" version of downward causation is in conflict with contemporary science, whereas the "medium" and the "weak" versions can co-exist with contemporary science. For example, the medium version may describe thoughts — i.e., free will — constraining neurophysiological states. The medium version is incompatible with absolute determinism but is compatible with relative determinism. The medium version of downward causation is thus intimately related to the issue of origination of free will (Sec. 5.15). By far the strongest evidence in favor of downward causation was furnished by Schwartz and Begley.601 The weak version is compatible with absolute determinism but may not, in practice, be a feasible description of the mind-brain relation.
v The author is indebted to Arnošt Kotyk who mentioned this writing in his closing lecture in the 11th School on Biophysics of Membrane Transport (May 1992, Koscielisko-Zakopane, Poland).
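The two-process picture invoked above, an upward, rule-based semiotic process and a downward, statistical dynamic process, can be caricatured in a toy model; the code below is an invented illustration and is not taken from Refs. 13, 505 or 204:

    import random

    # Invented toy of upward and downward causation under relative determinism.
    random.seed(1)
    parts = [random.choice([-1, 1]) for _ in range(100)]   # lower-level components

    for step in range(50):
        # Upward causation: the state of the whole is entailed by the parts.
        order_parameter = sum(parts) / len(parts)
        # Downward causation: the whole biases, but does not dictate, each part's update.
        for i in range(len(parts)):
            p_up = 0.5 + 0.4 * order_parameter             # probability of the +1 state
            parts[i] = 1 if random.random() < p_up else -1

    print(sum(parts) / len(parts))                         # the settled, emergent order

The stochastic update is what keeps the lower level only relatively, not absolutely, deterministic, in the sense of the relative determinism invoked above.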
Rosen argued that, just like mathematics, a natural (physical) system, in principle, may contain semantic entailments that cannot be formalized. Of course, one can always have a model with semantic entailments but the model is no longer a formal system. For example, Simonton's chance-configuration theory, as well as our present rendition in terms of pattern recognition, is a semantic model (Sec. 4). The triad of search, match and verification phases constitutes a pseudo-algorithm rather than a bona fide algorithm. There is no way to faithfully implement the parallel processes embedded in the search-and-match phase by a sequential algorithm, although it can be approximated or mimicked by an algorithm of rapidly alternating sequential searches that constitute pseudo-parallel processing, as explained in Sec. 4.11 (simulation). The coding and decoding processes that link a formal system to a natural system are not part of the entailment structures in either system. Although codings and decodings can sometimes be expressed as syntactic statements, the judgment of goodness of match almost always contains a semantic residue (subjective judgment). Codings and decodings are therefore outside of the realm of the natural and the formal systems, and are part of the creative act of human minds: pattern recognition that links a formal system to a natural system. In hard science, such as physics, the formal systems that constitute theories usually contain only syntactic inferential entailments. Thus, it is possible to deploy a sequential process in the verification phase of the discovery of a scientific theory (Sec. 4). In soft science, it is much harder, though not impossible, to parametrize the natural systems, i.e., to encode the natural systems in numerical attributes (cf. Ref. 429). The inferential entailments in a model system are sometimes stated in natural languages by necessity. Of course, the model is then no longer a formal system. Natural languages contain a syntactic and a semantic part. Some investigators believed that semantic entailments can be replaced with more elaborate syntactic entailments. As Rosen pointed out, these efforts all failed. Furthermore, he claimed that the failure was not caused by technical difficulties but rather by conceptual difficulties (p. 37 of Ref. 569). In retrospect, this is understandable. Whereas syntactic entailments can be manipulated by rule-based sequential processes, semantic entailments must be evaluated as a whole by parallel picture-based processes. Although languages are primarily sequential in nature, a new language cannot be learned effectively with a coding and decoding instruction manual alone — a dictionary and a grammar book. As discussed in Sec. 4.15, a new language must be learned
with the entire context as a whole. In addition, the situation of the ambience is part of the semantics that must be learned as part of the new language. Biology is partly hard science and partly soft science. Traditional research in biology consists of breaking down the biological structure into component parts and then analyzing their structure and function in isolation (reductionist approach). In a sufficiently "reduced" system, such as model bacteriorhodopsin membranes, it is possible to create a formal (mathematical and physical) model with syntactic entailments that describe the natural system with physics-like precision.325,328 This could be done because the bacteriorhodopsin (purple) membrane is machine-like (essentially a proton pump). Most of its function is preserved and remains operational while in isolation; its separation from ATP synthase does not render its proton pumping activity totally irrelevant and meaningless (Sec. 5.4 of Chapter 1).
In contrast, life and consciousness cannot be studied without the intact whole. The brain in isolation is very different than the same brain in situ and in vivo; essentially it loses its mind. The situation is analogous to the operational difference of a resistor (a simple component) and a microprocessor (a complex component). Although a resistor can be studied in relative isolation by simply hooking it up to a voltage source and monitoring it with a current meter (ammeter), the microprocessor can only be studied by means of a logic analyzer provided that the microprocessor is also connected to the bus of a motherboard so that it is properly linked to several essential components: the power supply, the random access memory (RAM), and at least a keyboard and a display monitor or printer. In other words, in order to study a microprocessor, a number of critical entailment loops must be closed. Even though its function can be algorithmized, a microprocessor in isolation (without closed entailment loops) loses its intelligence though it has no mind to lose. This analogy illustrates the limitation of reductionism in biology, which Rosen emphasized in his two books. However, the microprocessor is a bad analogy for the brain with regard to consciousness because consciousness cannot be studied by hooking up an isolated brain to a life supporting system. Consciousness is more than just intelligence. Owing to complex organization in biology, some isolated biological components operate very differently as compared to what the same components do in situ and in vivo. Specifically, the component that used to interact with many other now-purged components must then interact, in isolation, with the environment instead. It is like lifting a critical sentence
out of context. In doing so, a big portion of the semantic meaning is lost. Thus, a complex biological component is, in general, very different from a machine part. A conventional machine part does not significantly interact with the environment (it is inert, though it sometimes rusts), but only with a limited number of other matching parts. In the case of a microprocessor, the interactions are extensive but still limited and can be studied in a diminutive system with only a reduced set of critical functional loops closed. In the case of the brain, it is virtually impossible to close, in isolation, the functional loops, at least not yet with a sequential assembly process that can be used by humans. This is due to the subtle and transient properties of some of the functional loops. Presumably, these functional loops are not just connecting lumped parts but also distributed parts; the functional loops are not itemizable. In fact, these loops do not form a conventional discrete network but rather a kind of dynamic network exemplified by the myriad of biochemical reactions in the cytoplasm. Some semantic residues are guaranteed to be eliminated if one attempts to model a distributed system with a lumped system — a well-known problem in equivalent circuit analysis of a distributed electric system (see, for example, Sec. 20 of Ref. 325). In the case of life, there are too many functional loops to close. The structural organization of an organism that died recently is approximately the same as that of a living one. However, a recently dead organism is not approximately alive. Presumably, the event of death subtly and irrevocably altered the massively and densely distributed functional entailment loops between various components and between the organism and its environment. In order to appreciate the subtlety involved in maintaining consistency, coherence and rationality of intertwining entailment loops, let us consider an example: the gradient strategy of molecular recognition (Fig. 10 of Chapter 1). It is a syntactic model showing the discrete interactions of two docking macromolecules which explore each other for an alignment with the maximum number of matching non-covalent bonds. The diagram depicts a fitness landscape, in which the mutual search attempts to reach the highest peak. Owing to technical difficulty experienced by the author, the shape factor is ignored in the model and only a single kind of non-covalent bond interaction is considered. Furthermore, the search is limited to a single dimension. The author relied on sheer luck to arrive at such a relatively simple diagram. Even so, this man-made fitness landscape of top-down design is deeply flawed. Various peaks depicted in the diagram are separated by increasingly deep valleys as the highest peak is being approached. The search for an optimal match can end up being stranded at various local peaks.
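The stranding problem, and the remedy of allowing the search to move in steps of two or three, can be made concrete with a small greedy hill-climbing sketch on an invented one-dimensional landscape (the numbers are arbitrary and are not taken from Fig. 10 of Chapter 1):

    # Invented 1-D fitness landscape; higher is better, global peak at index 8.
    landscape = [1, 3, 2, 5, 3, 6, 4, 7, 9, 2]

    def hill_climb(landscape, start, max_step):
        """Greedy search: jump to the best site within max_step; stop when none is higher."""
        position = start
        while True:
            lo = max(0, position - max_step)
            hi = min(len(landscape) - 1, position + max_step)
            best = max(range(lo, hi + 1), key=lambda i: landscape[i])
            if landscape[best] <= landscape[position]:
                return position                  # stranded at a local peak (or at the top)
            position = best

    print(hill_climb(landscape, start=0, max_step=1))   # stops at index 1, a minor local peak
    print(hill_climb(landscape, start=0, max_step=3))   # wider steps cross the "crevices": index 8

With the single-step rule the search is trapped exactly as described above; widening the step makes the intervening valleys effectively invisible.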
Thus, some entailment loops are left open with gaping "holes." Attempts to close these entailment "loopholes" seemed to always end up disrupting the intended gradient design. For the sole purpose of explaining the gradient strategy, we certainly can assume that, given sufficient thermal agitation, the search can move in steps of two or three so that the "cracks" or "crevices" (valleys) appear too small to strand the search in a local peak. The diagram is flawed in yet another sense because the search steps and the matching sites are assumed to be discrete. The diagram depicts a discrete and formal model that does not capture all salient features of a real-life functioning gradient strategy. Inclusion of the shape factor (in three dimensions) introduces a gray-scale nature to the interactions since the force of a non-covalent bond interaction fades away gradually with increasing distance rather than vanishing abruptly. Inclusion of other types of non-covalent bond interactions further helps convert a discrete model into a virtual-continuum model, i.e., a semantic model. These additional elaborations will reduce some semantic residues, which perhaps cannot be completely eliminated. Evolutionarily derived fitness landscapes would not be plagued with the kinds of flaws mentioned above. With regard to the protein folding problem, Rosen suggested that the real potential function [in the free-energy landscape] must have a single deep minimum, a single well corresponding to the active conformation [of a folded protein] (p. 270 of Ref. 569). Modern nucleation theory of protein folding affirms this prophetic view (Sec. 7.6 of Chapter 1). The investigation of consciousness encounters an unusual obstacle. Again, Rosen's scheme of analysis provides an illuminating explanation. The study of consciousness requires subjective evaluation. This means that the cognitive self must be pulled into the ambience and further into the formal system, thus becoming entangled with the inferential entailments. The situation is akin to letting an umpire participate in the same game which he or she is supposed to referee. The necessary processes of self-reference thus result in a situation of impredicativity (what Bertrand Russell referred to as "vicious circle"; pp. 59-102 of Ref. 440 and pp. 3-31 of Ref. 638), as highlighted by the celebrated Incompleteness Theorem of Gödel (see also Hofstadter's entertaining discussion in Chapter 1 of Ref. 304). Thus, it is, in principle, impossible to construct a consistent formal system to model consciousness. This is why the study of consciousness is so refractory to the attack of contemporary physics. Does this mean that it is hopeless to study consciousness and self? Apparently not. But why?
Consider the tools needed to investigate consciousness and self. The most unique tool is unquestionably the human brains. In order to study consciousness and self, we must use our own brains to study our brain and its functional states. This is the predicament pointed out by the new mysterians: our brains may be powerful but we humans are unable to understand how our brain works; metaphorically, Hercules might be powerful but he could not lift himself up. This would be true for Hercules only if he used his bare hands as the only tool. However, Hercules certainly could do it with the aid of a pole and a pivoting point. Where do we find the equivalents of a pole and a pivoting point? Perhaps introspection is a way for the self to examine self, much like looking into a mirror to see the reflection of oneself's image (pardon the pun). However, all images are subject to distortion. So is introspection. The most serious kind of distortion is self-deception, to which few people are immune, the author included. Can introspection be rendered more objective (assuming a gray scale of objectivity)? Apparently, yes — by our brains we do not have to mean my own brain alone; instead, we mean our collective brains. That is, we can let other selves also examine the content of a particular self's introspection, such as reported by Henri Poincare (Sec. 4.8). This is indeed a very powerful approach. Hadamard examined Poincare's introspection and found himself to have a similar experience (see JohnsonLaird's remark in p. ix of Ref. 272). So, he zoomed in and locked onto Poincare's analysis. We readers subsequently joined the effort, and many, if not all, of us also had a similar experience. It is the evaluation performed by the collective selves that imparts an element of objectivity to an otherwise highly subjective and private observation. (In this regard, objections raised by some investigators who had never had a similar experience do not effectively falsify the claim that the experience did occur to some, though not all, individuals.) In this way, while a self is being pulled into the ambience, the collective selves can still observe it and its "reflection," so as to be able to detect any inconsistencies. That this approach is possible, after all, is due to two important factors. First, we assume the equivalence between this self and other selves, at least with the help of ensemble-averages (integrating diverse opinions) and time-averages (standing the test of time). Although this assumption is difficult, if not impossible, to prove, only a diehard egocentric would dispute its validity. The second factor is what Popper referred to as the third world, i.e., written or recorded information that transcends the barrier of space and time and becomes detached from the original self that was
responsible for its generation/creation (Sec. 6.11). However, all recorded images or information are subject to alterations. Elevation to the status of the third world does not guarantee validity. For reasons mentioned in Sec. 6.1, periodic re-evaluation of the conclusions of investigations about self is necessary, especially when new evidence arises or a new theory is proposed (cf. Griffin's notion of informed inferences). In view of the limitations of the contemporary approach in biological research that can be regarded as the extension of Newtonian mechanics, Rosen proposed the alternative approach of what he called relational biology, which he modestly credited to Nicolas Rashevsky. A traditional reductionist approach advocates disruption of the organization of biological structures in favor of component analysis and the subsequent recapture of organization only after the analysis. In contrast, the approach of relational biology preserves the organization of biological structures at the expense of the components, and aspires to realize the relation model by additional material components at a later stage. Rosen presented a compelling argument but the task had barely been initiated; much remains to be done. Rosen dismissed Descartes' machine metaphor as irrelevant for understanding biology. He also emphasized the distinction between simulation (mimesis) and realization (actualization) (p. 116 of Ref. 570): As I shall employ the term in what follows, mimesis involves classification of things on the basis of a perceived sharing properties, or behaviors, or attributes, independent of any shared causal basis for these attributes. Science, on the other hand, is concerned with the causal underpinnings of those attributes on which mimesis depends; it classifies systems according to these underpinnings and cannot rest content with the seductive superficialities of mimesis. A machine is usually assembled by means of a sequential process, whereas biological components are assembled by means of the embryonic development, which is a partly sequential and partly parallel process. Some readers may raise an objection here: the fabrication of a microprocessor requires parallel processes in addition to sequential processes; the growth of silicon crystals is a parallel process. However, there is a major difference. It is possible to segregate parts of a manufacturing process into modules of locally parallel processes that can then be linked together to form a primarily sequential assembly process. Here, the particular parallel processes can be regarded as "piece-meal" parallel or piecewise parallel, for lack of a better term, rather than massively and densely parallel. Similarly, the
parallel processing involved in almost all digital computer systems (including parallel computers) is piecewise parallel or pseudo-parallel. It would be extremely difficult, or impossible as Rosen claimed, to do so in a massively parallel distributed system such as a living organism. It appears that Rosen succeeded in ruining the dream of strong AI aspirants without offering a new hope. But is simulation really a worthless approach in machine intelligence? I think not — at least, not completely worthless (cf. Ref. 47). I believe that the reductionist approach and the machine metaphor are still of limited use, because some biological components remain very much functional in relative isolation. Thus, the machine metaphor remains valid in certain restricted aspects of life (the heart as a pump and bacteriorhodopsin as a proton pump, for example), so as to shed light on our understanding of biology. The AI concept of sequential/parallel processing turns out to be enlightening in our quest for understanding human creativity (Sec. 4). Owing to efforts made in simulations, those problems that were once considered intractable, such as protein folding471 and ab initio protein structure prediction,79 have begun to yield to theoretical attacks (Sec. 7.6 of Chapter 1). Perhaps the surprising prospect begs an explanation. Mirny and Shakhnovich pointed out the following (p. 362 of Ref. 471): Although these [simplified] models do not match the full complexity of real protein architecture, they capture a core aspect of the physical protein folding problem: Both real proteins and simplified lattice and off-lattice proteins find a conformation of the lowest energy out of an astronomically large number of possible conformations without prohibitively long exhaustive search. In other words, both these simplified approaches and real-life protein folding attempt to perform heuristic searching towards a solution that fulfils the requirement of the respective task. Moreover, the decision made by theoreticians to tackle simplified lattice and off-lattice models first was by itself an act of heuristic searching! Nevertheless, Rosen's messages serve as an important reminder, lest we get carried away: simulation (mimesis) should not be construed as duplication (realization) and analogy should not be equated with identity (cf. Ref. 604). Note that analogies are seldom perfect. A perfect analogy is identity, but identity is a useless analogy. Likewise, simulation in engineering is not intended to be duplication. A genuine duplication of human beings is either trivially natural, such as normal childbirths, or dangerously
unacceptable socially, such as attempted cloning of human beings. Thus, from the point of view of technological applications, the situation is not as bleak as Rosen suggested. Simulation remains a fruitful approach in constructing man-made intelligent machines. Technological applications are concerned more with utility and less with causality. Imitation is not the ultimate goal of technological applications, but rather a means to an end — a starting point to gather "inspiration" from Nature. In other words, humans' effort in reverse-engineering Nature is not just limited to a copycat's act alone but, with the aid of insight stemming from a deeper understanding of Nature, can also score a major innovation. Rosen cited the difference between the design of an airplane's wing and that of a bird's wing to highlight the difference between simulation and duplication. This example also serves to illustrate the difference between a pure copycat's act and an innovation through insight. It might be true that if humans were to design a flapping wing for the airplane, we might never have got off the ground. However, the machine metaphor was not totally lost. In fact, the airplane's wing resembles a bird's wing in the sense that both parts rely on Bernoulli's principle for floatation. The difference lies in propulsion: a bird uses its wings for the purpose of both floatation and propulsion, whereas an airplane uses propellers or jet engines, instead, for propulsion. This modification was probably necessitated by the limitation of material strengths, and a thorough understanding made it possible to do so. The machine metaphor can be exploited, for technological applications, at the level of functional principles rather than at the level of morphological superficialities.320 In addition, technological applications seldom ask for theoretical perfection. Rosen's theory stipulates that it is theoretically impossible to formalize semantic entailments. However, an approximation may be possible. This point was vividly demonstrated by Arif Selçuk Öğrenci.w Öğrenci proposed to solve the problem of reaching an apple by taking the following prescribed steps, which essentially emulated one of Zeno's paradoxes: Achilles and the Tortoise.585 In this scheme, the apple was located a fixed distance away. Öğrenci must first take a step and proceed to the midpoint between the current position and the destination, and then repeat the same procedure again and again, in discrete steps. In other words, if the distance between the starting point and the destination was taken to be 1, he was proceeding in steps that form a sequence of geometric progression: 1/2, 1/4, 1/8, 1/16, ..., etc. Mathematically speaking, it would take an infinite number of steps to reach the destination. Theoretically, it is thus impossible to reach the apple. Practically, it is an entirely different matter. After making a small number of steps in acting out the scheme, Öğrenci proceeded to the point where the apple was within his reach. So he stopped and simply picked up the apple and, while eating the apple, made a remark: "Close enough is good enough."
w This demonstration was performed during the presentation of a paper at an international school sponsored by NATO Scientific Affairs Division.498
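The arithmetic behind the demonstration is worth a glance; the arm's-reach threshold below is an invented figure, not one reported in Ref. 498:

    # Halving the remaining distance at every step: the gap shrinks geometrically,
    # so a practical "close enough" threshold is met after a handful of steps,
    # even though the destination is never reached exactly.
    distance = 1.0        # normalized distance to the apple
    arms_reach = 0.05     # invented "close enough" threshold

    steps = 0
    while distance > arms_reach:
        distance /= 2     # proceed to the midpoint
        steps += 1

    print(steps, distance)   # 5 steps, remaining distance 0.03125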
Öğrenci's message can be generalized as follows. Although it is theoretically impossible to fully capture the essence of a parallel process by means of sequential processing, it is possible to simulate its essence by means of pseudo-parallel processing (Sec. 4.11). Otherwise, there would be no such thing as literary works. The writer of a literary work only supplies the syntactic part explicitly. The semantic part that the writer intends to convey has to be filled in by the readers. In other words, reader participation is required to make a literary work meaningful. In essence, the readers must fire up their imagination and conjure up the scenarios loosely sketched in words by the writer. A good writer knows how to achieve the intended "pictorialization" — or, rather, a serial-to-parallel conversion — by means of artistically crafted strategic sentences, whereas a bad writer often generates misunderstanding. That reader participation is a requirement can be appreciated by examining the following introspective report: speed reading of a poem renders it tasteless (J. K. H. Horber, personal communication). The simple explanation is: speed reading of a poem leaves no time for a serial-to-parallel conversion to take place, thus making it impossible to reconstruct the semantic part intended by the poet. It is interesting to note that German composer Felix Mendelssohn once made an attempt to describe a visual scene in terms of a composition known as Fingal's Cave Overture (Die Hebriden, Op. 26). Richard Wagner subsequently praised it as a vivid and accurate "portrait," upon his visit to the Hebrides. Again, listener participation is essential in the reconstruction of the visual scene intended by the composer. Wagner confirmed that Mendelssohn had succeeded in his undertaking. Likewise, Rosen's analysis poses only a theoretical limit but not a practical one on technological applications of computer simulations. Theoretically, it is impossible to duplicate the act of protein folding by a sequential process. Practically, by simplifying the models, investigators arrived at a crude but reasonable simulation that captures sufficient essence of protein folding to be able to make useful predictions. This was also the essence of creativity simulations, mentioned in Sec. 4.26. Although it is arguable whether
those simulations exhibit true creativity, "close enough" might just be good enough. Sometimes, partial fulfillment of the totality of a causal entailment of a natural system is all it takes to make a significant technological advance. However, there is always the other side of the coin. The success of digital computers helped exacerbate information explosion, thus creating a need of using digital computers to manage and utilize information. This is especially acute in the area of molecular biology. The success of reductionism has generated a glut of information so vast that integrationists have no choice but to resort to computer simulations in order to make sense of it. As pointed by Gibbs,234 the simplest living cell is so complex that supercomputer models may never simulate its behavior perfectly, but even imperfect models could shake the foundations of biology. However, the limitations of these supercomputer models also reaffirm Rosen's misgiving. Effective management of knowledge depends on two key technologies: knowledge representation and agent technology (Sec. 7.4). The success of knowledge representation lends credentials to the usefulness of rule-based (logic-based) reasoning. However, the warning given in Sec. 4 regarding the pitfalls of rule-based reasoning remains appropriate, especially in education and health care (see below). The success of agent technology illustrates the usefulness of formalizing such introspective attributes as beliefs, desires, and intentions. Even the once intractable problem of capturing semantic residues with a sequential algorithm seems to find a possible alleviation because of the progress made in knowledge representation and agent technology. It is quite possible that the future application of agent technology in the so-called Semantic Web will revolutionize the use of the World-Wide Web (WWW).62 Rosen's misgiving is particularly relevant in two areas: education and health care. It is quite obvious that the practice of exclusively rule-based reasoning, which is prevalent among biomedical students, is tantamount to discarding semantic entailments in reasoning and learning. Similarly, private U.S. health-care insurance companies have been tacitly enforcing rule-based reasoning in the delivery of health care (managed care system). 179 In their book Dangerous Diagnostics, Nelkin and Tancredi revealed the following practices by many health maintenance organizations (HMOs) (pp. 60-61 of Ref. 486): The process of diagnosis, once dependent on insight, observation, and personal judgment, increasingly relies on diagnostic tests that
minimize individual decision making. Technology is considered a more efficient way to decipher patient symptoms, and health care providers are turning to standardized testing sequences that minimize individual discretion. The use of algorithms and decision trees illustrates the tendency to seek more scientific grounds for diagnostic decisions through the systematic use of tests. Algorithms help to categorize and channel patients according to statistical probabilities related to their complaints. Those complaining of certain pains, for example, are given predetermined tests in a sequence that is statistically most likely to reveal relevant information. The sequence of tests is based on the information already known about a condition, the importance of its detection, the penalty for delay, and the least time, risk, and inconvenience for the patient. Clinicians also use decision trees to help them in patient prognosis. Decision trees derive from operations research and game theory. The tree is used to represent the strategies available to the physician and to calculate the likelihood of specific outcomes if a particular strategy is employed. The relative value of each outcome is described numerically, and a statistical conception of normality helps define proper patient care, (emphasis added) I do not deny the values of the operational approach as a supplementary and complementary aid to diagnosis and treatments. On the one hand, I applaud the accomplishment of AI professionals in enriching the resources of health care. On the other hand, I deplore the mindless and heartless decision made by HMO executives to exclude a physician's holistic judgment in the diagnosis and treatments of illnesses. Essentially, the decision trees, also known as Treatment Pathways, are sequential algorithms constructed by means of primarily rule-based reasoning. The statistical conception of normality, mentioned in the above quotation, was established by metaanalyses of data published in the clinical literature. Glossing over by means of the statistical treatment conveniently provided the superficial objectivity and uniformity of methodology in patient managements, which might help withstand the test in court in case of malpractice lawsuits. However, the subjectivity of lumping together data from diverse sources in the clinical literature was cleverly hidden from the full view of casual observers that are not familiar with the methodology or are familiar with it but do not exercise picture-based reasoning in formulating judgment (see Sec. 4.21 for Lepper and coworkers' reservation about meta-analyses). Human lives are
reduced to mere gambling chips in the game of cost containment. The practice is tantamount to replacing partially subjective semantic processes in decision making with purely objective syntactic rote processes of algorithms (algorithmic medicine). The above discussion leaves an impression that insurance company executives conspired to undermine the traditional medical practice, and physicians and medical educators were merely reluctant accomplices or innocent bystanders. In reality, there was a co-conspirator from within the medical community: a group of clinical epidemiologists at McMaster University who aspired to the practice of evidence-based medicine (EBM).197'583 Boasting to have achieved a "paradigm shift," EBM proponents proclaimed (p. 2420 of Ref. 197): A NEW paradigm for medical practice is emerging. Evidence-based medicine de-emphasizes intuition, unsystematic clinical experience, and pathophysiological rationale as sufficient grounds for clinical decision making and stresses the examination of evidence from clinical research. Evidence-based medicine requires new skills of the physician, including efficient literature searching and the application of formal rules of evidence evaluating the clinical literature. In Sec. 4.10, we have associated intuition with picture-based reasoning, which is characterized by heuristic searching, parallel processing and random access ("unsystematic" retrieval) of information. In contrast, rulebased reasoning is characterized by systematic, sequential searching for information as well as applications of formal rules in reasoning. What EBM attempts to de-emphasize is essentially picture-based reasoning and genuine understanding based on pathophysiology. A physician is now trained as an agent that goes in between the patients (who furnish their complaints, symptoms and signs) and a rich depository of information and formal rules in the computer database. This is exactly the role played by the agents in John Searle's Chinese Room argument, mentioned in Sec. 4.22. Essentially, it is formalization of human biology (and medicine) which Rosen once denounced vigorously. However, EBM advocates were quick to issue a disclaimer: they were not opposed to the use of clinical experience and intuition, to understanding of basic investigations and pathophysiology, and to standard aspects of clinical training; they just wanted to add the EBM approach on top. They correctly dismissed the traditional "cookbook" medicine and heavy reliance on the teaching of authority. They even had a sense of gray scale regarding
the reliability of clinical studies, and expected physician training to build the ability to question the authority and make critical assessments — critical appraisal — on the validity of clinical studies on their own. At the foundation of EBM is the literature of clinical research based on systematic and rigorous methodology acquired over the past 30 years. The tools to access this literature include meta-analysis and a profusion of articles on how to access, evaluate, and interpret the medical literature. Regrettably, EBM advocates cannot have it both ways. As explained in previous sections, medical students have already been overloaded with domain-specific knowledge. It is extremely unlikely that they will be able to regain their intuition when they are lured with an additional temptation to relinquish their judgment and delegate it to digital computers. The Evidence-Based Medicine Working Group197 pointed out that physicians often read just the introduction and the discussion sections, while skipping the method section, when they consulted the clinical literature. They thought that the practice deprives physicians of their ability to judge the validity of what they had read in the clinical literature. As a comparison, let us consider the teaching of physics. Without consulting the original physics literature, motivated students who study physics are usually able to make valid judgment — to a reasonable extent — of the content of a physics textbook. A good textbook reduces the readers' reliance on the authority, although it does not completely eliminate the reliance; that goes a long way. What EBM advocates should have done is write better textbooks to make cryptic clinical methodology intuitively understandable to medical students, as was done in the past by authors of physics textbooks on behalf of physics students. Instead, EBM advocates came up with a bunch of rule-based algorithms to help physicians make judgment, thus evading their responsibility as medical educators. For reasons already presented in previous sections, the EBM approach is inherently flawed because of its reliance on computer-based literature searching and a set of formal rules to make critical appraisals. The EBM approach will not steer physicians away from "cookbook" medicine but instead towards it; the computer database merely replaces their traditional "cookbook." It will not diminish the reliance of practicing physicians on authority but actually subject them to the dictatorship of digital computers and invisible programmers and clinical experts whom they cannot readily question. The EBM advocates thus fell into the same pitfalls which they had pledged to avoid. Eventually, practicing physicians must relinquish their independent
judgment, and rely upon the terse summaries or formal rules and simplified criteria provided by the authority. Ultimately, they will have no choice but to stop thinking. Residency training of physicians thus dwindles to the production of a new breed of "robotized" physicians — robodocs — who practice exclusively rule-based reasoning with the help of a digital computer and invisible experts. In other words, the EBM approach provides physicians with a legitimate escape from their responsibility of judgment, and that suits the HMO executives well. EBM advocates probably would not admit it, in view of their disclaimer. Let us see how the writer Ayn Rand, through the words of her alter ego John Galt, criticized their evasion (p. 935, Part Three, Chapter VII of Ref. 550): 'Thinking is man's only basic virtue, from which all the others proceed. And his basic vice, the source of all his evils, is that nameless act which all of you practice, but struggle never to admit: the act of blanking out, the willful suspension of one's consciousness, the refusal to think — not blindness, but the refusal to see; not ignorance, but the refusal to know. It is the act of unfocusing your mind and inducing an inner fog to escape the responsibility of judgment ...'
It is well known that the handling of digital information that does not make intuitive sense is prone to errors. That is probably the reason why there was an increase in the incidence of "friendly fire" accidents on battlefields as a consequence of the advent of high-tech weaponry: it is difficult to recognize that one is actually aiming at friends instead of foes, or even at oneself, by just looking at the numerical coordinates that control the launching of a cruise missile. Besides, a cruise missile only takes orders from its controller. It harbors neither affection for friends nor hatred towards foes, and is essentially selfless. It is not clear how EBM advocates manage to minimize errors and make error corrections, especially if the physicians staffing the front line are forced to overwork without adequate rest. The likelihood of blunders committed by an exhausted medical staff is significantly increased by the advent of managed care systems, whose executives are well known for their so-called bean-counter mentality. I suspect that the incidence of "friendly poisonings or injuries," and even "friendly killings," of patients will rise sharply with the advent of EBM, in spite of all good intentions.
Even the proclaimed objectivity of EBM may be called into question. Sussman, in an editor's note in the journal Primary Psychiatry, pointed out a major source of bias in randomized controlled trials of drug efficacy:
positive trials are more likely to be published than negative trials.656 This tendency stems in part from a human instinct: people accept more responsibility for successes than for failures, even though the failure may not be due to their own fault ("self-serving bias"; p. 94 of Ref. 481). It also stems in part from the academic culture of glorifying positive findings and belittling negative ones, even though a negative finding may sometimes be more important than a positive one. The refereeing system of academic journals has been designed specifically to prevent "false valid" (false positive) conclusions from sneaking into print rather than to prevent "false invalid" (false negative) conclusions from getting lost in oblivion. The fallibility of human referees thus contributes to yet another systematic bias, as was also hinted at by Sussman. Our position may seem inconsistent. If we value so much the accomplishment achieved by Simon's problem-solving programs, why are we opposed so much to the EBM approach, which merely advocates using problemsolving programs to deliver health care? The key factor is: to err on the safe side. Failure of a problem-solving program in making novel scientific discoveries is no big deal, whereas failure of a problem-solving program in delivering health care means someone may have to die unnecessarily. This is a consideration that health-care professionals must keep in mind. No faulty strategies can come into being and flourish without having some redeeming values. Without any doubt, EBM is an effective weapon against no-holds-barred claims commonly made by "quackery," such as often associated with alternative medicine. EBM also serves as an effective defense of frivolous litigation in malpractice lawsuits. Since the late 20th century, physicians have been forced to conduct extensive routine laboratory tests in order to fend off potential malpractice suits. This costly and time-consuming practice of systematic searching is now replaced with an almost instantaneous and inexpensive practice of virtual systematic searching, e.g., a 30-minute computer search at a cost of $2.68 (Canadian dollars), according to Ref. 197 (Sec. 4.3). Thus, some pundits claimed that EBM is needed to hold down skyrocketing of the health-care cost. EBM prevents fraudulent claims made by some ill-motivated patients and/or unscrupulous physicians but, at the same time, restricts the legitimate freedom of well-meaning physicians in safeguarding individuals' health. The situation is analogous to what has been encountered in environmental and health hazard issues: quantified data and rule-based reasoning are needed to combat Americans' national penchant for litigation, but, at the same time, undermine legitimate precautions against poorly understood environmen-
tal hazards (Sec. 6.5). In both cases, the solution seems worse than the problem. Perhaps it is a timely wake-up call for all Americans. Their relentless and often groundless pursuits of litigation come with a big price-tag: the prevalent practice of exclusively rule-based reasoning may ultimately cost them their collective and individual health. The trend can, in principle, lead to the untimely demise of democracy and Western civilization, for reasons already elaborated. People who abuse their privilege (of freedom and good health-care services) simply do not deserve the privilege and may eventually lose it. I do not object to the use of EBM as supplementary and complementary aids by physicians who still can call the ultimate shot. However, common sense tells us that the outcome is going to be something similar to a consequence of the prevalent use of hand-held calculators: a high-achiever in our graduate program was unable to calculate, with or without a calculator, the product of 100,000,000,000,000 and 0.00000000000001 simply because the number of digits exceeds the mantissa of her calculator! The irony is: the cognitive profile of an increasing number of medical graduates produced by our failing educational system fits the EBM job description. It looks like that a faulty educational system and a faulty paradigm of medical practice are going to sustain each other by becoming symbiotic. The practice of EBM will breed more incompetent physicians, and the plethora of incompetent physicians will need EBM to cover their collective incompetence. It makes one wonder whether it was the egg first or the chicken first. Rumor had it that New York Times Magazine hailed EBM as one of the most influential ideas of the year 2001. EBM advocates believed that physicians have the moral imperative to practice EBM, and began to act with a religious fervor and forcibly demand a conversion. Based on our understanding of biocomputing principles, EBM advocates threatened to undo what humans have gained in cerebral lateralization through evolution. If EBM ever makes its way into the mainstream medical practice, it will probably earn the reputation of the most devastating calamity that befalls human health in the 21st century. There is, however, a possibility that this dire prediction could be wrong: future generations of computers — biocomputers — may be capable of thinking like humans, without the serious handicaps of digital computers, mentioned in this survey (cf. Sec. 7). In order for EBM to work satisfactorily, these futuristic computers must also be capable of explaining the rationale behind its decision to the physicians in charge and the physicians
in charge must be able to understand what the computers attempt to explain. Computers that can only process formal rules and medical graduates that can only perform exclusively rule-based reasoning simply will not do. Until then, EBM should be treated as a nascent basic research topic; the technology is not yet ripe for harvest and ready for clinical applications. Presently, neither the most advanced computers nor physicians trained to master exclusively rule-based reasoning are qualified for the task. A similar trend of formalization has emerged in association with the popularity of distance learning and computer-assisted education (euphemism: virtual university and virtual classrooms). I suspect that the impetus to replace teachers with computers was rooted in the intent of cost-saving and/or profit-making rather than in the ideal of making higher education widely available. For similar reasons, technology also opens the door for a new breed of "fast degree" education: a diploma can be obtained without having to go through the drudgery of learning. As Thirunarayanan663 pointed out, these "virtual" degrees, offered through the Internet by both for-profit and non-profit institutions of higher education, "may taste great to their students but are likely to be less filling," just like fast food. Sadly, as a consequence of such attempts to "formalize" education and health care, the meaning and purpose of these activities gradually become marginalized and forgotten. Human costs are no longer part of the deliberation; only financial costs and profits matter. The strategy of simulations, and hence, of formalization, may be acceptable for technological applications and for the purpose of calculating life insurance premiums. However, this strategy is totally unacceptable when it is used in nurturing human minds and in deciding the matter of life and death of other human beings. Contrary to Rosen's advice, the analyses presented in this two-part survey do not depart significantly from the traditional approach which Rosen eloquently denounced. However, there is one line of reasoning that seems to dovetail with Rosen's revolutionary idea nicely: the re-introduction of what Matsuno called "dynamic tolerance" without abandoning the old physics structure. By refusing to accept Laplace's absolute determinism on faith and by giving up the restriction imposed by purely syntactic formalism, the missing semantic components are re-introduced. By admitting a gray scale of determinism, the order so characteristic of biology is not destroyed by the replacement with complete randomness. In exchange for the limited price paid, both the illusion of time reversal and the conflict between free will and determinism evaporate without a trace. I found that the sanity so regained outweighs the loss of giving up some cherished age-old tenets, but,
as Rosen was apt to say, that is for the readers to assess.

6.14. Unity of knowledge or great divide: the case of Harris versus Edwards

In view of the accelerating trend of fragmentation of knowledge, concerned scientists, such as Edward O. Wilson, have championed a movement to unify knowledge (consilience).719,158 However, the cultural barrier between science and the humanities remains high. Art teacher Betty Edwards' encounter with science is a case in point. Edwards had been influenced by Roger Sperry's split-brain research,644,645 and was among the first to attempt to apply the results of Sperry's research to art practice. She proposed a novel instruction method to transform non-artists into believable budding artists within a surprisingly short period.190 Yet, her interpretation of artistic creativity in terms of the function of the right brain was "dispelled at once" by experts (p. 204 of Ref. 361). In teaching drawing to novices, Edwards' approach was to minimize the left brain's interference, because she thought that it is this interference that ruins a novice's inborn talent for drawing pictures. Apparently, Springer and Deutsch were not convinced, but they said that they would not "quarrel with success" (pp. 298-299 of Ref. 646).
Springer and Deutsch's writing actually offered a fascinating glimpse into why some experts were reluctant to attribute creativity, at least partly, to the right hemisphere. They cited Carl Sagan, the famed astronomer-biologist and popular science writer, who had accepted the distinction that the left hemisphere is analytic and the right hemisphere is intuitive (pp. 190-191 of Ref. 584): There is no way to tell whether the patterns extracted by the right hemisphere are real or imagined without subjecting them to left-hemisphere scrutiny. On the other hand, mere critical thinking, without creative and intuitive insights, without the search for new patterns, is sterile and doomed. To solve complex problems in changing circumstances requires the activity of both cerebral hemispheres: the path to the future lies through the corpus callosum. Except for the somewhat poetic reference to the corpus callosum, Springer and Deutsch essentially agreed with what Sagan had said above. So do I: Sagan pretty much summed up the main message of Sec. 4 in a single paragraph.
Now, let us consider the three major components for problem solving:
domain-specific knowledge, logical reasoning, and intuition (or creative insight). The first two components are under the control of one's will power: logical reasoning can be made rigorous by training, and domain-specific knowledge can be acquired by hard work. The third component, intuition, is not something that one can summon at will when it is needed (cf. YerkesDodson law), or something that one can acquire consistently through years of schooling and hard work. Quite the contrary, conventional schooling, if coupled with a strong desire to succeed academically, is one of the most potent ways to restrain curiosity and enforce exclusively rule-based reasoning, thus suppressing intuition and diminishing creativity. Therefore, even though all three components are indispensable in problem solving and may even be equally important, the current educational system renders intuition a rare commodity. Intuition, being hard to come by, thus becomes the decisive factor in defining creativity. In other words, it is the proficiency in the search-and-match phase that distinguishes a genius from a non-genius even though the proficiency in the verification phase is just as important. Thus, the right hemisphere does play a critical role in creativity, although creativity is not its exclusive prerogative. Experts often cautioned against oversimplification of the left-brain/right-brain dichotomy and thought that there is little evidence supporting the dichotomy of preferred left or right cognitive style (p. 210 of Ref. 86). However, time has changed. The preference of either cognitive style — picture-based or rule-based — might not be obvious a century ago. Information explosion and fierce competition have accentuated the contrast of these preferences, and have steered the preference of "educated" people towards the wrong side (Sec. 4.23). To regard a one-time valid observation as a universal and eternal truth is dogmatism at its worst. The price to pay would be to miss an important insight towards educational reform. Apparently, some experts' steadfast refusal to admit this intuitively obvious conclusion probably has to do with their mind habit of valuing behavioral experiments over introspection, anecdotes, and common sense. Some of them instinctively defend a behavioral experiment at all costs, no matter how questionable its design may be, and stick to the interpretation based on statistical analysis of the results so obtained even if it defies common sense. Yet, interpretations of behavioral experiments are often theory-dependent, and the peril of arriving at something akin to circular reasoning is often grave. However, some investigators sincerely believe that, notwithstanding anecdotes and common sense, only statistics will tell the ultimate truth. Some investigators' superstition about the mighty power of statistics can
be astonishing and disturbing. Here, if we do not stick to the "letter" of what Betty Edwards said but to the "spirit" of what she did, her approach of encouraging the practice of picture-based reasoning seemed reasonable and was perfectly in line with our present understanding of creativity. In principle, her approach was not unlike what I routinely did to biomedical students who sought helps. I often ask students to close their eyes and refresh their memory of a certain lecture's content in terms of visual imagery. In one of the most dramatic cases, a positive result began to manifest in less than five minutes. Of course, I was not always successful. Of practitioners of exclusively rule-based reasoning, especially highly educated ones, excessive confidence seems to be one of the most recognizable hallmarks (personal observation). They usually could not quite accept the suggestion that their thinking process could still be improved, much less come to terms with their handicap. Naturally, they resisted my attempt to convert and rehabilitate them, or were even offended. Thus, a high rate of success was not one of the strengths of my approach. In this regard, Edwards was criticized perhaps for a similar reason. Harris281 was particularly critical to Edwards' approach. Harris failed to replicate Edwards' results, and suspected that Edwards might have selected only successful cases to report in her book, while withholding unsuccessful ones from publication. Whether that was the case or not could not be ascertained without a first-hand investigation, since Edwards, a non-scientist, might not have known that the alleged practice is prohibited. However, even if the allegation was true, Harris still missed two important points. The first point is practical, the pundits often forgot that an approach to improve students' artistic or cognitive performance is valuable even if it does not score a high rate of success. After all, education is not supposed to be a "for-profit" business, and, therefore, educators are not supposed to adopt a bean-counter's mentality. In order to appreciate why this is important for educators, just consider a missionary's work — another non-profit enterprise. For an earnest and enthusiastic missionary, souls are supposed to be saved one at a time, rather than all at once or none at all, and a success that happens once in a while is valued no less than a mass conversion. The pundits' mind-set was similar to what detractors of gun-control laws have: if a legislation merely cuts down the rate of gun-related crimes rather than eliminate them completely, then forget it. So goes the same reasoning: if an environmental law merely cuts down the morbidity rate of a certain type of cancer rather than completely eliminate it, then forget it entirely. In order to be consistent, perhaps the pundits should also discourage the practice of
body-weight reduction as a means to reduce the risk of hypertension and diabetes mellitus simply because the success is not guaranteed. The second point is fundamental, but is often neglected by investigators regarding the burden of proof of a negative observation. When someone discovers a new effect, but the effect could not be replicated by others, we automatically suspect that something was wrong with the original investigation. It is true that there are many ways to go wrong, such as careless mistakes, improper methodology and analysis, or even an outright fraud, etc. However, in behavioral experiments, there is an additional consideration: sample heterogeneity may obscure an otherwise positive effect. But sample heterogeneity is usually an after-the-fact revelation following a pursuit of inconsistent replications. In this regard, the table should be turned around: whoever failed to replicate Edwards' original observation should bear the burden of proof of the negative observation for the following reason. First, Edwards never claimed that her approach worked for every prospective student. Therefore, the presence of negative cases would not refute her claim. Second, a lone negative observation by Harris was insufficient to refute Edwards' claim, because absence of evidence is not evidence of absence.10 Harris must prove that her attempt to replicate Edwards' result had failed for each and every possible prospective student — a theoretically impossible task. On the other hand, there are several possible explanations for Harris' failure to replicate the claimed effect. First, she might have inadvertently chosen a population of "students" that was resistant to training instead of one that was amenable to training. Second, having an obvious intent to debunk Edwards, Harris probably would have stopped at her first failure to replicate Edwards' results. Had she tried hard, she might have found a positive case. Third, her failure might have been due to a technical failure. Interested readers are invited to take a first-hand look at sample drawings presented in Edwards' book.190 My common sense tells me that Edwards' approach requires a great deal of technical skills to implement. Perhaps it is not something that a non-artist can implement by following a "cookbookrecipe" procedure. Interestingly, a different scenario would emerge had we merely reversed the chronology of Edwards' and Harris' reporting. Supposing Harris first reported a negative observation on Edwards' proposed right brain training program. Edwards then came along to report a positive observation, instead. Now, although it was impossible for Harris to prove that her negative claim is valid for each and every student, all it takes was a single
counter-example — a single valid positive observation of Edwards — to debunk Harris' claim. Here, we merely followed the simple reasoning that Karl Popper had taught us (Sec. 6.1). However, we must pay attention to the now-obvious pitfall: the counterexample itself might be false. If we take the opposite (hypothetical) scenario, and view Edwards' report as an attempt to debunk Harris' claim, a false counter-example means data fabrication rather than data selection, because data selection alone would not invalidate the counter-example, whereas data fabrication would. However, it is difficult to find a credible motive for Edwards to commit a fraud of data fabrication. Fabricating data would not help her business since she eventually had to back up her claim with visible results. I am therefore willing to give her the benefit of the doubt. Of course, as a non-expert of both art and cognitive science, I must suspend my ultimate judgment. I hope that I will not hesitate to change my mind when I am confronted with new evidence to the contrary. I also hope that the above lesson is not lost amidst the controversy. Next, Harris went so far as to dismiss the role played by visual imagery in the thought processes of "creative scientists" such as Einstein (the quotation marks were originally added by Harris in an apparent gesture of contempt). Like many anti-imagery cognitive scientists — who regarded mental imagery as an epiphenomenon546'547 — before her, she trivialized the reports documented by Hadamard272 and by others as being "anecdotal." She further characterized the term "creativity" as one of those "vague words whose frequency of use lies in inverse proportion to the carefulness of use." In plain English, she meant that, being amateurish, Hadamard was not qualified to identify creative elements or factors, whereas experts, such as herself, knew better how to define creativity properly than non-expert outsiders, such as Hadamard the mathematician. However, my own experience in writing this survey convinced me that Hadamard's anecdotes and the introspections of Poincare and Einstein are more reliable than orthodox behavioral experiments that were carried out, according to the "book," by someone who had never experienced a first-hand act of creativity. Keep in mind that the use of anecdotes and introspections in the search-and-match phase of creative problem solving helps expand the search space without compromising scientific rigor, if they are treated as mere hypotheses. Of course, treating anecdotes and introspections as foregone conclusions without adequate verification is unscientific. On the other hand, the practice of a wholesale distrust of anecdotes and introspections is an unnecessary selfimposed constraint (dogmatism) rather than a reliable approach to guar-
antee scientific rigor. Miller thought that the anti-imagists had missed a golden opportunity. A concise account of the pro-imagery/anti-imagery debate was given by Miller (pp. 223-233 of Ref. 464). As pointed out in Sec. 4.21, mixing the two disparate groups, such as social leaders and scientific creators, together in a behavioral experiment contributes to serious sample heterogeneity in creativity research. In contrast, anecdotes and introspections are all "isolated" cases with no risk of incurring sample heterogeneity if they are judged individually, especially by someone who had a similar experience. It is particularly note-worthy if anecdotes and introspections are combined. In the preface to the paper back edition of Hadamard's book, P. N. Johnson-Laird pointed out that Hadamard himself had an experience similar to what Poincare described during his geological excursion, as did Gauss, Helmoholtz, and others (p. ix of Ref. 272). Hadamard might be biased but was biased with judgment and was also in good company. By dismissing Hadamard's analysis and ignoring Einstein's introspection, Harris inadvertently pruned off the most fruitful "branches" of the search tree for creativity research, thus reaching an erroneous conclusion. Harris also dismissed the notion that "the right hemisphere is a sort of endangered species that needs special exercise and encouragement to survive." She claimed that "the right hemisphere gets plenty of exercises," and the best evidence is that "the cerebral hemispheres are inherently specialized for their respective functions, work normally from the outset, and receive ample opportunities to do what they are genetically programmed to do merely in the course of normal everyday living in an environment that is open to all of our senses every day of our lives." She must be a true believer of orthodox behaviorism and hard-core biological determinism, thus becoming totally oblivious to the availability of two choices during the search-and-match phase of problem solving: either rule-based reasoning, which pertains to the function of the left hemisphere, or picture-based reasoning, which pertains to that of the right hemisphere. Harris also had no sense of the brain's functional hierarchy. One can use the right brain for viewing a picture or movie and for recognizing printed words or symbols, thus getting some exercises of the right brain at the level of shape recognition. But one can still use rule-based reasoning exclusively in search of a solution that matches a particular problem, thus getting no exercise of the right brain at this functional level. Therefore, Harris's best evidence is not sufficient evidence at all. The prevalence of the practice of exclusively rule-based reasoning in our
younger generations means that a large number of people have begun to scale back their right-brain exercises. The sorry state of the current U.S. educational system bears painful testimony that the right hemisphere is now on a path of "regression" or "atrophy" by disuse (at least figuratively, if not literally). Harris must have been either out of touch or so lucky as to be surrounded by exceptionally good students at all times. Categorically, however, Harris accused the entire "right-brain movement" of anti-literacy, anti-intellectualism, anti-rationality, and, last but not least, "blatant hucksterism." Actually, the "right-brain movement" was not against literacy. True literacy requires mastery of both the syntactic and semantic parts of a language, and semantic comprehension requires ample participation of the right hemisphere (Sec. 4.15). Advocates of literacy should not have felt so threatened. Likewise, advocates of mathematics proficiency should have welcomed the "right-brain movement." Without participation of the right hemisphere, mathematics is nothing but strange rote procedures and meaningless formulas that make no intuitive sense, and it is soon forgotten once one gets out of school or even gets past the examination. Who was anti-rational and anti-intellectual, Edwards or Harris? That is for the readers to decide.
Here, I have no intention of defending everything done by advocates of the right-brain movement. As mentioned in Sec. 4.19, some advocates missed the point and used the practice of visual thinking merely to enhance long-term memory. Furthermore, Harris' warning against "dichotomization" and oversimplification of scientific facts is deemed valid. Judged with the benefit of hindsight, earlier proponents of the "right-brain movement" were guilty of oversimplification. So were some of the rest of us (myself included), from time to time, about many important issues. However, their sin should not have been used as an excuse to completely obliterate their important messages.

7. Technological Applications

7.1. Expert systems in artificial intelligence
In problem solving, the search space of a modestly difficult problem is often so large that it is not practical to launch a systematic or exhaustive search in the entire search space. This restriction is less prohibitive for a digital computer than for a human being. Given a set of explicit instructions in terms of programming algorithms and a rich depository of reference materials (database) in a digital computer's random access memory and in its mass storage devices (e.g., hard disks), the computer can carry out
decision-making and problem-solving tasks faithfully, without mistake or fatigue, all at blinding speed. Early artificial intelligence (AI) research took advantage of these features. It culminated in the development of rule-based, knowledge-based expert systems (specialist AI programs).233,342 The apparent intelligence of these systems belies the "brainless" computational procedures hidden beneath. It is apparent to anyone with experience in assembly-language programming that a digital computer performs multiplication and division of two numbers by repeated additions and subtractions, respectively. Its decision making is little more than comparing numbers, two at a time, with the sequential comparisons linked by the logic operations AND, OR, NEGATION, exclusive OR, etc., and by procedural branching in terms of conditional jump routines. It uses the "brute-force" approach of sequentially searching the entire database until a match is achieved, or until the entire database is exhausted and the search ends without a match. Because of the high-speed and accurate number-crunching capabilities of the computer, all of these mundane operations are largely hidden from the view of a non-programmer or even a high-level-language programmer.
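The flavor of this brute-force procedure can be conveyed by a minimal sketch in Python; the symptom names, the three rules and the consult routine below are hypothetical illustrations invented for this survey, not features of any actual expert system. The program simply scans an explicit rule list in sequence, combining elementary comparisons with logic operations, until a rule fires or the list is exhausted.

# Minimal sketch of a rule-based lookup (all rules are hypothetical).
# The engine scans the rule list sequentially -- the "brute-force" search --
# until one rule matches or the list is exhausted without a match.
RULES = [
    ("R1", lambda f: f["fever"] and f["cough"],      "suspect respiratory infection"),
    ("R2", lambda f: f["fever"] and not f["cough"],  "order further tests"),
    ("R3", lambda f: not f["fever"],                 "no action from this rule set"),
]

def consult(findings):
    """Compare the findings against every rule in turn (AND/OR logic only)."""
    for name, condition, conclusion in RULES:
        if condition(findings):              # conditional branch, as in a jump routine
            return name, conclusion
    return None, "no matching rule"          # database exhausted without a match

print(consult({"fever": True, "cough": False}))  # -> ('R2', 'order further tests')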
The above discussion of the shortcomings of sequential searching may be somewhat exaggerated. First, in view of the high speed of data search, a sequential search can be designed to canvass a vast database, thus achieving what was referred to as pseudo-parallel processing in Sec. 4.11. Second, the search tree can be pruned in such a way as to give preferential consideration to certain more probable search rules (heuristics, Sec. 3.4). If properly programmed, the computer can even learn and incorporate new heuristics into its internal database and make sophisticated discoveries (Sec. 4.26; see also Chapter 8 of Ref. 75).
Expert systems work wonderfully for the types of problems that can be solved by means of well-defined (explicit) rules, which are usually generated by human experts. However, for problems with ill-defined rules, or with rules that cannot readily be broken down into sequential steps, it is awkward to construct an expert system. Thus, knowledge-based expert systems are ill-suited for solving problems of pattern recognition and other subtle decision-making processes such as value judgment (Sec. 6.6).
State-of-the-art AI has come a long way since the early days and has overcome many obstacles.731 However, the predicaments of early AI serve to illustrate the very different nature of information processing in a living organism, as compared to that in a digital computer. Pattern recognition requires assessment of the whole and often defies decomposition into a step-by-step, assembly-line-style combination of operations. Rosen's analysis explains why the semantic entailments underlying the process of pattern recognition cannot be fully replaced by sequential rule-based (syntactic) entailments (Sec. 6.13). However, the prospect turned out to be brighter than what Rosen had envisioned. Nevertheless, the recognition of the limitations of rule-based systems ushered in a change of strategy: attempts to simulate the human brain in ever increasing detail.
A combined top-down and bottom-up scheme is involved in evolution, in embryonic development, and in humans' creative problem solving. In contrast, expert systems are based on a top-down design, and thus often incur an enormous software overhead. The fact that expert systems work at all has a lot to do with the ever increasing speed and capacity of digital computers; enhanced hardware performance shoulders part of the software task and reduces the software overhead.

7.2. Neural network computing

Artificial neural network research was inspired by neuroscience.332 It was launched as a "counter-culture" to the AI research that gave rise to expert systems. According to Rosen,570 the original idea of the neural network was suggested by Nicolas Rashevsky in the 1930s,553 and was later recast in the language of Boolean algebra by McCulloch and Pitts.447 For the benefit of novice readers, we shall present a brief and elementary description. A succinct account of artificial neural networks and an explanation of their cognitive abilities can be found in an article by Churchland.124 A short list of monographs is suggested for further reading: Refs. 120, 216, 269, 514 and 337.
A typical neural network consists of three layers of neurons with interconnections, called synapses, between various neurons (Fig. 11). The middle layer is also known as the hidden layer. A neuron typically receives several synaptic inputs from other neurons, and each synaptic input carries a synaptic weight. Firing of a neuron constitutes the output, which imposes a synaptic input on the subsequent neurons in synaptic contact. The control law is a step function or a sigmoid-shaped threshold curve, as in a real neuron (see Fig. 17B of Chapter 1).
Fig. 11. Neural network. A. A neuron is shown with three inputs, s1, s2, s3, with synaptic weights, w1, w2, w3, respectively. B. The strength of the output s0 is related to the total input E by a sigmoid-shaped control law. C. A neural network with an input layer, a hidden layer and an output layer is shown. The property of a network depends on the number of neurons in each layer, their interactions and the control laws governing their interactions. (Reproduced from Ref. 124 with permission; Copyright by MIT Press)
In the back-propagation model,708 training of a neural network is implemented by presenting a set of data as the input. The progress of learning is monitored by an error function that evaluates the difference between the performance and the known goal. The individual neuron in the network then adjusts its synaptic weight in order to minimize the performance error. Thus, a neural network is essentially a virtual machine within a conventional sequential digital computer. Variations of the design are only limited by the imagination of its designer.
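A minimal numerical sketch of this training scheme is given below; the toy data set, the layer sizes and the learning rate are arbitrary choices made purely for illustration, not parameters taken from the cited literature. A small three-layer network of sigmoid units repeatedly adjusts its synaptic weights and biases by gradient descent so as to reduce a squared-error function.

import numpy as np

# Toy three-layer network (2 inputs, 3 hidden units, 1 output) trained by
# back-propagation on an arbitrary target pattern (here, exclusive OR).
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])    # input patterns
T = np.array([[0.], [1.], [1.], [0.]])                    # target outputs
W1, b1 = rng.normal(scale=0.5, size=(2, 3)), np.zeros(3)  # input -> hidden weights
W2, b2 = rng.normal(scale=0.5, size=(3, 1)), np.zeros(1)  # hidden -> output weights

def sigmoid(z):                                           # sigmoid control law
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    H = sigmoid(X @ W1 + b1)                 # hidden-layer activations
    Y = sigmoid(H @ W2 + b2)                 # network output
    dY = (Y - T) * Y * (1 - Y)               # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)           # error back-propagated to the hidden layer
    W2 -= 0.5 * H.T @ dY;  b2 -= 0.5 * dY.sum(axis=0)     # synaptic weight adjustments
    W1 -= 0.5 * X.T @ dH;  b1 -= 0.5 * dH.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # outputs move towards the targets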
Here, we shall examine a few salient features that are pertinent to our discussion of biocomputing. Consider the ill-defined problem of odor discrimination (e.g., see Ref. 508). The capability of odor discrimination is difficult to implement with traditional knowledge- and rule-based AI, because the rules for making odor discrimination are not explicitly known. This type of problem can be handled effectively by an artificial neural network. The characteristics of each odor are eventually learned by the network as a distributed property of synaptic weights over various neurons in the network. In order to visualize how a network discriminates, let us consider the synaptic weights for odorant j as an n-dimensional vector for n neurons, (x1j, x2j, x3j, ..., xnj). Each odor is thus represented by a position in the n-dimensional space, as indicated by the position vector that points from the origin of the coordinate system to the location representing odorant j. This mode of coding the information is referred to as vector coding.125
In order to appreciate how increasing the dimension of the representation improves the ability of the network to discriminate various odors, we shall consider the opposite question: how can decreasing the dimension diminish or suppress the ability to discriminate? For simplicity, let us assume that it is feasible to discriminate two odors among three neurons, so that the vector space is three-dimensional. This means that two spatially distinct points represent the two odors: methanol and ethanol (Fig. 12). In reality, the two odors are represented by two regions, each with a finite volume rather than a volume-less mathematical point, because of the presence of measurement errors. In the situation where these two spatially distinct regions project geometrically to the same region (or nearly the same region) on one of the planes, e.g., the H2-H3 plane, elimination of the H1 axis would also eliminate the ability to discriminate the two odors. An increase of dimensionality is potentially helpful in averting such an overlap or partial overlap (cf. extradimensional bypass, Sec. 7.5 of Chapter 1). Furthermore, clustering of such regions in a multidimensional space will help classify and identify closely related odors. This is essentially the design principle of an "Artificial Electronic Nose."509
The discriminatory ability can be further enhanced by some ostensibly top-down rules, such as fuzzy rules735,385,384,736 or "hints." Abu-Mostafa2 has demonstrated the usefulness of hints in visual pattern recognition of human faces by a neural network. By suggesting to the network that there is a bilateral symmetry (across the sagittal plane of the head) in human faces, the artificial neural network learned much more efficiently than it did without the suggestion. These top-down rules are equivalent to the initial hypothesis stipulated in the self-referential model enunciated by Korner and Matsumoto (see Sec. 4.5).
Fig. 12. Odor discrimination as an ill-defined problem. A. A neural network for odor discrimination. The network has a hidden layer. Two different odors are presented to the input neurons (labeled methanol and ethanol in the output), each with a characteristic profile of input strength distribution. The network learns to discriminate the two patterns by adjusting the distribution of synaptic weights. B. Hidden-unit activation-vector space. The profile of synaptic weights is characterized by a vector in the activation-vector space, which is an n-dimensional space, where n designates the number of neurons in the hidden layer. Each odor thus occupies a region in such a space. Discrimination is possible if there is no overlap of the two regions. For simplicity, only three dimensions are considered in the diagram. Two separate spherical regions that do not overlap allow for discrimination. However, both regions project onto the H2-H3 plane with a considerable overlap. Therefore, if the H1 unit is eliminated, the ability to discriminate will be lost. (Reproduced and modified from Ref. 124 with permission; Copyright by MIT Press)
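The geometric point of Fig. 12 can be restated numerically. In the sketch below, the two hidden-unit activation vectors are invented for illustration: they are well separated in the full three-dimensional space spanned by H1, H2 and H3, but their projections onto the H2-H3 plane coincide, so discarding the H1 unit abolishes the discrimination.

import numpy as np

# Hypothetical hidden-unit activations (H1, H2, H3) representing two odors.
methanol = np.array([0.9, 0.4, 0.5])
ethanol  = np.array([0.1, 0.4, 0.5])    # differs from methanol mainly along H1

def separation(a, b, axes):
    """Distance between the two odor representations, keeping only the
    listed axes (0 = H1, 1 = H2, 2 = H3)."""
    return float(np.linalg.norm(a[list(axes)] - b[list(axes)]))

print(separation(methanol, ethanol, (0, 1, 2)))  # 0.8 -> discriminable in three dimensions
print(separation(methanol, ethanol, (1, 2)))     # 0.0 -> overlap after dropping H1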
Conversely, rule-based systems can be implemented as a virtual machine in the software environment with a bottom-up design, as exemplified by Holland's genetic algorithm.311 The algorithm operates on a population of rules (or agents) which are allowed to be mutually conflicting. The rules are then subjected to natural selection by imposing selection rules that reward intelligent behaviors in the simulated environment. Rules that perform well will be allowed to reproduce and proliferate in succeeding generations. Mutations and recombination of traits from two parents are introduced to generate variations in order to avoid ending up with highly optimized
mediocrity, which is a major drawback of "inbreeding." For a good performance that requires cooperation of several rules arranged in a sequential succession, Holland implemented a reward scheme called the bucket-brigade algorithm to distribute the rewards among participating rules, not only to the rule that happens to score a success by implementing the last step to consummate the transaction, but also to other rules that set the stage. The distributed nature of synaptic weights in artificial neural networks offers a very different kind of memory scheme. The memory regarding the discriminatory feature of a pattern is distributed among a collection of synaptic weights rather than stored in a fixed location, as is in a conventional digital computer. There are at least three advantages of this scheme. First, when the memory is partially altered, the discriminatory ability in pattern recognition is not lost abruptly but rather degrades gradually and gracefully (fault tolerance). Second, the retrieval of the memory content stored in the network does not require addressing a fixed memory location. Instead, the memory may be addressed by association with the memory content (content-addressable memory). This type of memory addressing protocol suits well the problem-solving scheme discussed in Sec. 4. Last but not least, memory retrieval is a parallel process with random access, and is therefore much less time-consuming. In this regard, it is apparent that, in spite of the fact that most biochemical reactions are slow (in the millisecond time range), as compared to the clock cycle of a digital computer, the efficiency in pattern recognition more than makes up for the lost time. Modern artificial neural networks are rather sophisticated and exhibit an increasing degree of intelligence. They are no longer rigid rule-based systems but are capable of improving their performance through experience or training. That is, they are capable of learning through exposures to realworld inputs. There are two types of learning: supervised and unsupervised learning (e.g., Ref. 477 and Chapter 3 of Ref. 216). In pattern recognition, the machine is supposed to learn how to recognize patterns that belong to the same class but exhibit almost infinite shades of variation, such as different accents of pronouncing the same word. In supervised learning, the neural network program is exposed to training samples that have already been classified by the "teacher" or "supervisor." A neural network simply learns to minimize errors of its outputs in accordance with the furnished class standards, but is not allowed to question the correctness of these standards, much like military training in a boot camp. As a consequence, the neural network is trained to follow, like a faithful copycat, the prejudice or bias of the programmer or whoever has passed the ultimate judgment on
the classification of samples. In contrast, no such class information is provided in unsupervised learning. The neural network must keenly observe any regularity or recurrence of certain patterns exhibited by the training samples and extract these features accordingly. The program must formulate hypotheses, in an inductive way, according to clustering of vectors in the vector space (of the vector coding diagram). For example, in monitoring changes of body temperature, human experts can certainly recognize striking temporal patterns such as a spiking fever in the afternoon and spatial patterns such as a patch of regionally elevated temperature on one of the four extremities. But a neural network program can learn to differentiate more subtle differences in spatial and temporal patterns with high resolution, such as a tiny heat-producing focus and an extremely short boost of temperature as well as subtle spatial distributions and temporal variations that tend to elude a human expert. Neural networks capable of unsupervised learning can thus perform subtle pattern recognition and differentiation far beyond human's innate capability. The capability of an artificial neural network is intimately linked to the programmer's ingenuity. The capability also improves with the increasing processor speed, the increasing capacity of RAM memory and mass storage devices, and other enhanced hardware capabilities. Speech recognition and hand-written character recognition, which were once considered intractable, are now routine capabilities of personal computers.
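The contrast between the two learning modes can be made concrete with a toy sketch. The "temperature profile" samples below are fabricated for illustration, and the bare-bones k-means loop is offered merely as a generic stand-in for unsupervised learning: the program receives no class standards from a "teacher" and must group the samples according to the regularities they themselves exhibit.

import numpy as np

# Fabricated, unlabeled temperature profiles drawn from two hidden regimes.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(37.0, 0.2, (20, 3)),    # near-normal profiles
                  rng.normal(39.5, 0.3, (20, 3))])   # febrile profiles
rng.shuffle(data)

# Bare-bones k-means clustering: the two classes must be discovered, not taught.
centers = np.array([data.min(axis=0), data.max(axis=0)])
for _ in range(50):
    labels = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
    centers = np.array([data[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centers, 1))   # two cluster centers recovered without any labels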
7.3. Animat path to artificial intelligence
In an effort to alleviate the limitations incurred by the top-down design of knowledge-based AI systems, an approach based on robotics addresses the problem by building specific-purpose robotic modules with a bottomup design philosophy. These robotic modules are called animats and the approach is called the animat path to artificial intelligence.669 In this combined top-down and bottom-up approach, each module consists of sensors, processors (behavior generating subsystems) and actuators. This feature is reminiscent of the intelligent materials (Sec. 7.3 of Chapter 1) except that the intelligent module is implemented with robotic hardware and software. The ability of a robotic module to adapt to the environment and the task requirements (specified by human designers) can be improved by subjecting the robot to training and learning. The environment to which the animat learns to adapt can be the real world instead of an "artificial world."
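The sensor-processor-actuator organization of such a module can be caricatured in a few lines of Python. The wall-avoiding behavior below is an invented stand-in rather than a description of any published animat: a sensor reads the surroundings, a simple behavior-generating rule processes the reading, and an actuator executes the resulting motor command.

import random

class Animat:
    """Toy sensor-processor-actuator module living in a one-dimensional world."""

    def __init__(self, world_size=10):
        self.world_size = world_size
        self.position = world_size // 2

    def sense(self):
        # Sensor: distance to the nearest wall.
        return min(self.position, self.world_size - self.position)

    def decide(self, wall_distance):
        # Processor (behavior-generating subsystem): back away from a nearby
        # wall, otherwise wander at random.
        if wall_distance <= 1:
            return -1 if self.position > self.world_size // 2 else 1
        return random.choice((-1, 1))

    def act(self, step):
        # Actuator: execute the chosen motor command.
        self.position += step

    def live(self, steps=20):
        for _ in range(steps):
            self.act(self.decide(self.sense()))
        return self.position

print(Animat().live())   # final position after twenty sense-decide-act cycles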
7.4. Agent technology

Agent technology was hailed as an important breakthrough in the 1990s. Its development was intimately related to another branch of AI: knowledge representation. Essentially, knowledge representation is a current AI effort in algorithmizing knowledge.560,303,588 Just as databases are separated from data-processing procedures, knowledge representation allows knowledge to be modularized and separated from the reasoning procedures that process knowledge and make decisions. If knowledge representation constitutes the database arm by furnishing well-organized knowledge modules, then agent technology can be regarded as the procedural arm that utilizes these knowledge modules.
An intelligent software agent transforms a computer program from a dumb receiver of the user's requests into an equal partner of the user. It is capable of handling chores and solving problems for the user of a computer or the owner of an Internet web server, without the necessity of the programmer's or the user's micromanagement. Outside of the AI community, one of the best known examples is the cookie, which a web server sends and plants in the computer of a user who has requested access to the web. "Implanting" of cookies is often done without the user's consent and knowledge. A cookie keeps track of the user's path of web navigation and decision making, and can subsequently send the information back to the server for the purpose of constructing a user profile.
A universally accepted definition of a software agent is lacking because there are many types of them. Murch and Johnson479 summarized a number of different definitions that depend on the mind-set of the designers. It is easier to define agents implicitly by listing some of their major attributes. Jennings and Wooldridge351 defined an intelligent agent as a computer system that is capable of flexible autonomous action. By flexible, they meant that the system must be (see the sketch after this list):
• responsive: capable of perceiving its environment and responding in a timely fashion,
• proactive: capable of exhibiting opportunistic, goal-directed behavior and taking the initiative where appropriate, and
• social: capable of interacting, when it deems appropriate, with other artificial agents and humans in order to complete its own problem solving and to help others with their activities.
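A skeletal rendering of these three attributes might look as follows; the hooks (sense, goals, peers) and the toy wiring at the bottom are placeholders invented for illustration and do not correspond to any particular agent framework.

class FlexibleAgent:
    """Skeleton of flexible autonomous action; every hook is a placeholder."""

    def __init__(self, sense, goals, peers):
        self.sense = sense        # callable returning an environmental event, or None
        self.goals = list(goals)  # callables returning True when they made progress
        self.peers = list(peers)  # callables accepting a request from this agent

    def step(self):
        event = self.sense()
        if event is not None:                          # responsive: react to the environment
            print("reacting to", event)
        progressed = [goal() for goal in self.goals]   # proactive: pursue its own goals
        if not any(progressed):
            for peer in self.peers:                    # social: enlist other agents for help
                peer("need assistance")

# Toy wiring, purely for illustration:
agent = FlexibleAgent(sense=lambda: None,
                      goals=[lambda: False],
                      peers=[lambda msg: print("peer received:", msg)])
agent.step()   # prints: peer received: need assistance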
It is readily recognized that agent technology attempts to equip the computer with the ability of creative problem solving, and with "motivation" and "will power" to pursue its entrusted goals. From the discussion of Secs. 4 and 5, it is apparent that the price of doing so is to give the computer a certain degree of freedom (autonomy) — a rudimentary form of free will or, rather, pseudo-free will. Proceeding along this line of thinking, AI designers eventually included attitudes such as beliefs (B), desires (D), and intentions (I) in the design of rational agents. The design principle, known as BDI architectures,551,229 is based on Cohen and Levesque's133 formalization of some of the philosophical aspects of Bratman's theory of intention and action.88,89 Based on a BDI architecture, a software agent can automatically gather information (from the Internet, for example), harbor beliefs, manifest desires and intentions, formulate detailed tactics and actions by exploring various options in accordance with some general heuristic guidelines, and commit computing resources towards the realization of its goals. Thus, in spite of Rosen's objection to the formalization of attributes of an introspective nature, such as intention and motivation, BDI architectures turned out to be relevant for the design of rational software agents that meet the needs of problem solving in an ever-changing, dynamic, uncertain, and complex world.

7.5. Neuromolecular brain model: multi-level neural network

The scheme of the M-m dynamics of biocomputing led Conrad and coworkers towards a different and improved implementation of connectionist neural networks.143,136,137 This approach exploits the nested hierarchical architecture of computational dynamics. Essentially, it is a neural network with internal dynamics. The approach is also termed the neuromolecular brain model to distinguish it from the conventional neural network, which captures the macroscopic dynamics only. Conrad and coworkers developed several computer simulation models, each addressing a separate aspect of the dynamics, and then various combinations of these aspects. We shall describe one of these models here in detail.116,117,118 This model — the multi-level neuromolecular model — is a combination of two previously separate models known as the reference neuron model and the cytoskeletal model. The neural networks consist of two types of neurons: the reference neurons and the primary neurons, with different hierarchical controls. The input-output control laws of the primary neurons are regulated by the cytoskeletal dynamics that was modeled in terms of cellular
automata. In this model, three levels of evolutionary learning are possible: a) at the level of proteins that alter the structure of the cytoskeletal network, b) at the level of readout proteins on the neural membrane, which respond to cytoskeletal signals, and c) at the level of the reference neuron selection and primary neuron orchestration. This multi-level brain is required to learn the strategy of navigation through a maze-like artificial world with barriers and food. There are important lessons to be learned from this model. An integrated system like this, with three operational levels of evolutionary changes, can always perform more complicated tasks than the individual components can perform collectively but in isolation. The different modes of learning are synergetic. This is indeed what Gestaltists cherish: the whole performs better than the sum of the parts. In one type of navigation, one of the rules dictated that an organism must die upon bumping into a barrier, but in reincarnation the lesson is remembered. The movement of the organism exhibited a long delay every time it was required to make a turn before it finally negotiated the turn. Here, it should be pointed out that the maze in this particular artificial world required infrequent turns. Apparently, the organism recognized (learned) that making a turn almost always led to death, and therefore appeared to be extremely reluctant to do so. In contrast, if the selection rule was modified in such a way that the organism that bumps into a barrier has a small chance of surviving the mistake, the long delay in negotiating a turn vanished.116 The message is obvious: the behavior of the artificial organism is reminiscent of habituation of cigarette smoking. Although it is widely known that cigarette smoking is linked to the development of a subtype of lung cancer, the development of the cancer is not guaranteed. The built-in uncertainty encourages new smokers to begin and discourages old smokers from quitting. Likewise, the uncertainty of penalty encouraged the organism to explore the artificial world, thus facilitating its learning. This simulation result also suggests that bold and divergent thinking facilitates the acquisition of novel solutions (Sec. 4.1). Interestingly, this simulation might be among the first to demonstrate that extrinsic rewards and threats of excessive punishment inhibit cognitive performances of a machine (cf. Sec. 4.21). What are the strengths and weakness of the neuromolecular computing approach? This approach and the conventional neural network approach share the same weakness, namely, the necessity to create a virtual machine
What are the strengths and weaknesses of the neuromolecular computing approach? This approach and the conventional neural network approach share the same weakness, namely, the necessity to create a virtual machine within a conventional digital machine. The basic incompatibility is even more acute for neuromolecular simulations. For example, the internal subneuronal dynamics dictates a continuum treatment of both space and time. Each type of biochemical reaction has its own "clock-cycle" (kinetics), and the reaction network is parallel and distributed in nature. A simulation model in a digital environment requires discretization of both space and time, thus necessitating the use of cellular automata (with discrete cells) and the use of discrete time intervals for data updating. A bona fide nested hierarchical network architecture requires that the network architecture at different nested hierarchical levels be sufficiently different from one another; otherwise it is not different from a conventional neural network, for the following reason. Any three-dimensional network can be topologically collapsed onto a plane. This is exemplified by an electronic engineer's circuit diagram, which represents a three-dimensional circuit by means of a pseudo-two-dimensional diagram. Of course, special provisions must be made for the circuit lines that appear to cross each other but are not actually connected. In this pseudo-two-dimensional representation, there is no fundamental difference between a neuromolecular model and a conventional neural network model. Taking the above consideration into account, in order for a neuromolecular simulation to be believable, the software overhead has to be vastly inflated. Consequently, limitations of computer speed and memory capacity allow only a small fraction of the rich multi-level dynamics to be captured. In view of the above limitations, Conrad and coworkers offered several possible remedies.143,137 First, a reasonably powerful computer can be dedicated to the evolutionary development of a given neuromolecular brain model and, from time to time, the useful "dynamical products" are harvested. Once the specifications — i.e., optimal control laws — of the dynamics are found, a dedicated hardware VLSI (very large scale integration) circuit can be constructed in order to enhance the real-time performance (the biocomputing equivalent of the use of a math-coprocessor, instead of software subroutines, for number crunching). Incidentally, analog electronic circuits that simulate the organization and function of nervous systems have been implemented in VLSI configurations.177 However, these devices are not based on digital implementation of neural networks or neuromolecular models. Of course, the VLSI version of a neuromolecular model is no longer capable of evolutionary learning (unless it is made programmable); evolutionary learning is limited to the drawing board only. The immediate contribution of the neuromolecular model is thus more
conceptual than utilitarian or practical: it offers a vision for future machine intelligence. The vision led Conrad and coworkers to make another daunting suggestion: the molecular implementation of biocomputing. Molecular implementation is thus one of the few remaining options because only then can the potential of biocomputing dynamics be fully unleashed. This latter approach is known as molecular electronics, and the computing process as molecular computing. Since molecular components of nanometer dimensions are called for, molecular electronics is intimately related to nanotechnology.

7.6. Embryonics: evolvable hardware
Bio-inspired principles can also be implemented in conventional silicon-based hardware that is capable of self-replication and self-repair. The research and development of this kind of evolvable hardware is known as embryonic electronics or embryonics.587,635,437,435 There are three sources of inspiration:
• the phylogenetic level: the temporal evolution of the genetic programs (Darwinism);
• the ontogenetic level: the embryonic development of a single multicellular organism;
• the epigenetic level: the learning processes during an individual organism's lifetime.
The goal of the Embryonics projects at the Logic Systems Laboratory at the Swiss Federal Institute of Technology is to develop highly robust integrated circuits, endowed with the capabilities of self-repair (cicatrization) and self-replication (cloning or budding).436 The Embryonics architecture is based on four hierarchical levels of organization implemented in the two-dimensional world of integrated circuits on silicon: a) the basic primitive is the molecule, a multiplexer-based element of a novel programmable circuit, b) a finite set of molecules makes up a cell, essentially a small processor with an associated memory, c) a finite set of cells makes up an organism, an application-specific multiprocessor system, and d) the organism can itself replicate, thus giving rise to a population of identical organisms.
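The genome-in-every-cell principle can be sketched schematically as follows (an illustrative toy in Python; real Embryonics cells are multiplexer-based circuit elements, and the four roles in the toy "genome" below are invented for the example): every cell stores the complete genome, its coordinate selects which part of the genome it expresses, and self-repair amounts to recomputing coordinates so that a spare cell takes over the role of a failed one.

# Schematic sketch of the "same genome in every cell" idea behind Embryonics.
# Purely illustrative: real Embryonics cells are multiplexer-based circuit
# elements; the names and the toy "genome" below are hypothetical.

GENOME = {0: "fetch", 1: "decode", 2: "execute", 3: "spare"}  # role by column

class Cell:
    def __init__(self, column):
        self.genome = GENOME      # every cell stores the complete genome
        self.column = column      # its coordinate within the organism
        self.alive = True

    def role(self):
        # Differentiation: the coordinate selects which part of the genome
        # the cell expresses (cf. the register state described below).
        return self.genome[self.column]

def organism(cells):
    return [c.role() for c in cells if c.alive]

def self_repair(cells, faulty_index):
    # Cicatrization: kill the faulty cell and shift the coordinates of the
    # cells to its right, so that the spare column takes over the lost role.
    cells[faulty_index].alive = False
    for c in cells[faulty_index + 1:]:
        c.column -= 1

cells = [Cell(i) for i in range(4)]
print(organism(cells))        # ['fetch', 'decode', 'execute', 'spare']
self_repair(cells, 1)         # the 'decode' cell fails
print(organism(cells))        # ['fetch', 'decode', 'execute'] -- spare recruited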
Mange and coworkers made clear that their goal of bio-inspiration is not the modelization or the explication of actual biological phenomena. Thus, Rosen's objection, mentioned in Sec. 6.13, is not relevant here. Instead, the goals are: a) to meet the scientific challenge of implementing the original specifications formulated by John von Neumann for the conception of a self-replicating automaton,690 b) to meet the technical challenge of realizing very robust integrated circuits capable of self-repair and self-replication, and c) to meet the biological challenge of attempting to show that the microscopic architectures of artificial and natural organisms, i.e., their genomes, share common properties. The artificial genome was implemented with an identical program in each cell. Differentiation of each cell was effected by activation of the state of the cell, i.e., the contents of its registers. The molecules were made of field-programmable gate arrays (FPGAs). Self-repair was implemented by means of reprogramming spare molecules or cells, and the repair could be conducted at either the molecular or the cellular level. These investigators constructed a modulo-4 reversible counter (a unicellular organism) and a timer (a complex multicellular organism). They also constructed a special-purpose Turing machine capable of self-reproduction and self-repair: a parenthesis checker.434

7.7. A successful example of molecular computing: solving the directed Hamiltonian path problem

That molecular systems can be used for computation purposes has been elegantly demonstrated by Adleman.3 Adleman used standard tools of molecular biology to solve a case of the directed Hamiltonian path problem with 7 vertices (or, rather, 7 cities). Simply put, the problem called for the path of a traveling salesman to go from a home city to a destination city via all other cities in such a way that he visits each city exactly once. What Adleman invoked to solve this problem is a non-deterministic algorithm with the following steps:
(1) Generate random paths through the cities.
(2) Keep only those paths that begin with the home city and end with the destination city.
(3) If the graph has 7 vertices (cities), then keep only those paths that enter exactly 7 vertices.
(4) Keep only those paths that enter all of the vertices of the graph at least once.
(5) If any paths remain, say "yes"; otherwise say "no."
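Before turning to the molecular encoding, the logic of this generate-and-filter scheme can be mimicked in a few lines of conventional code (a sketch only — Adleman carried out these steps chemically and in massive parallelism; the small example graph below is arbitrary and is not Adleman's 7-city instance):

# Brute-force mimic of Adleman's generate-and-filter algorithm for the directed
# Hamiltonian path problem.  Illustrative only: the real computation was carried
# out in parallel by DNA molecules in a test tube, not by this loop, and the
# graph below is an arbitrary example rather than Adleman's 7-city instance.
import random

edges = {(0, 1), (1, 2), (2, 3), (3, 4), (1, 3), (0, 2), (2, 4)}
n, home, destination = 5, 0, 4

def random_path(length):
    """Step 1: grow a random path by following randomly chosen outgoing edges."""
    path = [random.randrange(n)]
    for _ in range(length - 1):
        choices = [b for (a, b) in edges if a == path[-1]]
        if not choices:
            break
        path.append(random.choice(choices))
    return path

random.seed(1)
candidates = [random_path(n) for _ in range(100000)]             # step (1)
candidates = [p for p in candidates
              if p[0] == home and p[-1] == destination]          # step (2)
candidates = [p for p in candidates if len(p) == n]              # step (3)
candidates = [p for p in candidates if len(set(p)) == n]         # step (4)
print("yes" if candidates else "no", candidates[:1])             # step (5)

The exponential cost of the brute-force enumeration is hidden in the number of random paths generated; in Adleman's experiment that burden was borne by vast numbers of DNA molecules reacting simultaneously in a test tube.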
The strategy is based on the complementary base-pair recognition of DNA. An oligonucleotide of 20 monomeric units with a distinct but otherwise random sequence was chosen to represent each of the 7 vertices. Since each oligonucleotide possesses a 3'- and a 5'-end, a polarity sense is preserved. Thus, an edge — i.e., the directed (polarized) connection between two cities — could be represented by an oligonucleotide of the same length, with 10 nucleotides taken from the 3'-end of one vertex and 10 from the 5'-end of the next vertex connected by the edge. In this way, each edge, with its sense of direction preserved, was represented by a distinct oligonucleotide. Another oligonucleotide that is complementary to each vertex was also synthesized. Mixing of complementary "vertex"-oligonucleotides and "edge"-oligonucleotides in ligation reactions allowed for linking of edges to form a continuous path, by virtue of the complementary vertex oligonucleotide acting as a "splint." A large number of paths were then randomly generated. Note that even though the reactions were random, only meaningful paths (with no discontinuity) were generated because of the use of complementary vertex oligonucleotides (heuristic searching). Also note that some of these randomly generated paths might pass a certain vertex more than once. A selection process must be instituted to perform steps 2-4, so as to reduce the number of possible candidate samples and to perpetuate (reproduce) only those selected. The strategy was to use primers as the tools of selection and to use the polymerase chain reaction (PCR) as the tool of perpetuation (reproduction). It is well known that PCR can produce a highly enriched sample of any selected oligonucleotide, starting from a minute amount of the original template. By means of cleverly designed experimental protocols from the repertoire of standard techniques of molecular biology, Adleman was able to purify the oligonucleotide that encoded the correct path. The impressive feat has helped launch the field of DNA computing. More recently, Adleman and his coworkers solved a 20-variable NP-complete three-satisfiability problem on a DNA computer.87 The unique answer was found after searching more than 1 million possibilities.

7.8. Prospects of molecular electronics in biocomputing

In general, the molecular implementation of neuromolecular computing — molecular electronics — would provide an escape from the straitjacket of having to create a virtual machine within a conventional digital computer. However, the challenges are now shifted to materials science and fabrication technology. The quest for novel materials to meet the demands for performance and sophistication in information processing has spurred a flurry of activities in two areas: supramolecular chemistry412,394,395,396,413,561 and biomaterials
research.9 Advances made in organic synthesis research have allowed complex interlocking organic molecules to be synthesized. For example, artificial enzymes were constructed by means of directed assembly of molecular components.455 In addition, advances made in the research of conducting polymers (also known as synthetic metals) and other synthetic organic molecules have generated a number of classes of organic molecules suitable for applications in molecular electronics. These novel organic molecules include buckminsterfullerene C60 and its analogue C70 (called bucky balls), tubular molecules (called nanotubes), and branching molecules (called dendrimers). Using moving bucky balls, Park and coworkers502 constructed a unimolecular transistor. Self-replicating molecules have been synthesized to mimic the evolutionary process, thus providing a paradigm for life.557 A strategy similar to that of DNA computing has been used to construct molecules with specific geometrical and topographical features (DNA nanotechnology).607,608 More recently, scientists at the IBM T. J. Watson Research Center constructed a carbon-nanotube-based field-effect transistor.726 On the forefront of biomaterials research, the successful utilization of bacteriorhodopsin for device construction marked the advent of biomolecules as functional materials. Vsevolodov and coworkers693,694,692 constructed an imaging device called Biochrom film. These investigators exploited a property of bacteriorhodopsin, known as the photochromic effect: the ability to change color upon illumination. Bacteriorhodopsin is a pigment protein found in the purple membrane of Halobacterium salinarum496 (Sec. 5.4 of Chapter 1). It possesses a legendary stability and is capable of withstanding high temperatures (up to 140°C)612 and high acidity (around pH 0).314 Vsevolodov and coworkers used chemically modified bacteriorhodopsin to fine-tune its photochromic effect in order to meet the requirements of the imaging function. Modifications of protein function via conventional chemical methodology offer a limited range of manipulation. The advent of recombinant DNA technology greatly enhances the range of manipulation. Hampp and coworkers277,90,278,276 subsequently constructed a dynamic hologram using a genetic variant of bacteriorhodopsin. Birge and coworkers70,71,72 have devoted a great deal of effort to pursuing several different designs of bacteriorhodopsin-based optical memory devices. Both chemically modified and genetic mutant bacteriorhodopsins were explored for possible applications. It is apparent that the major challenge in utilizing biomaterials for device construction is fine-tuning a particular functional property. The pho-
tochromic effect of bacteriorhodopsin is not exploited by Halobacteria for physiological functions. It is a latent function — a by-product of the proton pumping function. Such properties must be fine-tuned and optimized for device construction. The advent of recombinant DNA technology inspired Conrad135 to propose a genetic-evolutionary protocol, called the molecular computer factory, to generate and mass-produce a wide variety of molecular building blocks. New mutant proteins are produced by variations through site-directed mutagenesis. Subsequently, the proteins are subjected to performance evaluations, selection, and reproduction. However, this top-down approach must rely on systematic searching for the correct protein structures: effective systematic searching via site-directed mutagenesis may require time exceeding the age of the universe, not to mention the required material resources. Although computer simulations have been routinely used in the design of small molecules in the pharmaceutical industry, simulation of medium-size proteins such as bacteriorhodopsin is still a formidable problem. A compromise approach is a combined top-down and bottom-up strategy: random mutations. In this way, only those mutant proteins that can actually fold into a stable conformation are evaluated for performance. However, recent progress made in protein folding simulations471 and ab initio theoretical predictions of protein structures79 has revived the hope that computer simulations may eventually provide an avenue of heuristic searching for specific functional modifications via site-directed mutagenesis (Sec. 7.6 of Chapter 1).
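To put the combinatorial objection in concrete terms, exhaustively enumerating even a 100-residue protein would mean sampling 20^100 ≈ 10^130 sequences. The combined strategy can be caricatured by a simple mutate-filter-select loop (a hypothetical sketch, not Conrad's molecular computer factory protocol; the "foldability" filter and the scoring function below are arbitrary stand-ins for folding simulations and laboratory assays):

# Caricature of a combined bottom-up/top-down protein search: random point
# mutations (bottom-up) filtered by a crude stability check, then scored and
# selected (top-down).  fold_is_stable() and performance() are hypothetical
# stand-ins; a real protocol would use folding simulations and actual assays.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, n_mutations=1):
    seq = list(seq)
    for _ in range(n_mutations):
        seq[random.randrange(len(seq))] = random.choice(AMINO_ACIDS)
    return "".join(seq)

def fold_is_stable(seq):
    # Stand-in bottom-up filter: demand a minimal hydrophobic content.
    return sum(seq.count(a) for a in "AILMFVW") / len(seq) > 0.25

def performance(seq):
    # Stand-in top-down objective: reward an arbitrary "functional" composition.
    return seq.count("W") + 0.1 * seq.count("Y")

def evolve(seed, rounds=20, offspring=50):
    parent = seed
    for _ in range(rounds):
        pool = [mutate(parent) for _ in range(offspring)]
        pool = [s for s in pool if fold_is_stable(s)]    # discard unfoldable variants
        if pool:
            best = max(pool, key=performance)            # evaluate and select
            if performance(best) >= performance(parent):
                parent = best                            # "reproduce" the winner
    return parent

random.seed(0)
seed = "".join(random.choice(AMINO_ACIDS) for _ in range(60))
print(performance(seed), performance(evolve(seed)))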
Electrostatic modeling, which is essential for understanding the biophysical aspect of molecular interactions, used to be limited to small molecules with fewer than 50,000 atoms. Baker et al.39 have extended the practice to molecular systems that are orders of magnitude larger in size, such as microtubules and the ribosome, by using a clever algorithm, called parallel focusing, and a supercomputer. Fabrication of molecular devices in accordance with biocomputing principles is technically demanding. Presently, no computational model of even modest complexity has been successfully implemented with molecular construction. Most research activities center around the construction of prototype molecular sensors or actuators as well as simple devices, such as diodes and transistors.558,423 The construction of molecular diodes or transistors, especially with single molecules, has been used as a benchmark to demonstrate the feasibility of molecular electronics.24,462,115,726 There is a huge gap between current fabrication technology and the expectations of the sophisticated computational paradigms envisioned by researchers in neural networks and in neuromolecular simulations. Some mainstream advocates of biocomputing dismissed the efforts to make molecular equivalents of transistors, diodes, etc. They often cited the difference in computational dynamics (the lock-key paradigm vs. the switching mechanism of logic gates and flip-flops) as the basis of their objections. In my opinion, the efforts devoted to the development of molecular gates and transistors will not be lost, because these activities constitute heuristic searches in the learning process of attempting to address the difficult problem of molecular fabrication.320 Inspiration from biology indicates that the key to the solution of fabrication technology is to exploit the self-assembly capability of molecules (cf. Sec. 7.6). In the construction of biosensors, a popular approach is to immobilize enzyme proteins on a solid support (usually a metal electrode). There are two main techniques exploiting molecular self-assembly. The first one is the Langmuir-Blodgett (LB) technique.564 This technique relies on the amphiphilic properties of molecules such as fatty acids and phospholipids. An amphiphilic molecule possesses both hydrophilic and hydrophobic domains, usually at two separate ends of the molecule. Multiple layers of oriented amphiphilic molecules can be deposited on a glass substrate that can be pre-coated with a transparent metal layer acting as an electrode. The LB technique can be supplemented with other assembly techniques for stronger bonding (such as the silanization technique703 and the avidin-biotin coupling technique716) or for better specificity (such as the monoclonal antibody technique) than non-specific non-covalent bond interactions can provide. Other techniques borrowed from conventional integrated circuit fabrication include spin coating and molecular beam epitaxy. Electropolymerization has also been used to deposit polymer molecules on a solid surface (in situ electropolymerization). Additional techniques based on the amphiphilic properties of phospholipids include several methods of forming bilayer lipid membranes (BLM).475,666,667 An intriguing material of bacterial origin — the S layer — has been exploited as a matrix for building molecular devices.545 Most of these techniques provide layered thin films with heterogeneity in the perpendicular direction (perpendicular to the solid support) but not in the lateral direction. Nanophotolithography of photosensitive multilayers constructed with any of the above methods provides lateral heterogeneity (e.g., Ref. 181). Of course, nanolithography can be combined with various kinds of supporting matrices to create both perpendicular and lateral heterogeneity.
A method for assembling molecules one at a time was discovered by Foster et al.210 These investigators found that it is possible to move a molecule, deposited on a solid graphite surface, by applying a pulse of electric voltage (or current) via the tip of a scanning tunneling microscope. Scanning tunneling microscopy (STM), together with several related techniques (such as atomic force microscopy, friction force microscopy, and photonic force microscopy), can also be used to characterize a deposited molecular thin film on the nanometer scale.334,335 The advent of these tools has fueled the development of nanotechnology. There were different opinions regarding the feasibility of nanotechnology — the technology of fabricating devices of nanometer dimensions. Drexler178 has been the champion of nanotechnology, whereas Smalley639 thought it could never be realized. The difficulty in molecular implementation of the "dynamical products" discovered by neuromolecular simulations lies in the inherent demand of a top-down design: the discovery of the algorithm and its control laws comes first, whereas the selection of materials and fabrication techniques comes second. In other words, the construction of a device with a specified algorithm and computing function requires the use of a) the right kinds of molecules with the required physicochemical interactions (specifically tuned functionality), b) the appropriate assembly techniques to put each kind of molecule in the specified spatial relationship and in a specific temporal sequence, just like assembling a three-dimensional jigsaw puzzle, and c) the appropriate implementation of specifically designated initial conditions so as to jump-start the machine in a consistent, coherent and rational way. Numerous functional entailment loops must be closed, and this is difficult to accomplish by means of sequential, piecewise-parallel, or pseudo-parallel assembly procedures (Sec. 6.13). At the present stage, both materials science research and fabrication technology research are the rate-limiting steps.

8. General Discussion and Conclusion

Part 2 of this survey examines life processes at the systems and evolutionary levels, and the technological applications of biocomputing. As compared to lower life forms and plants, the most conspicuous manifestations of humans and nonhuman primates at the systems level are consciousness and cognition. Human-like creative cognition is the most coveted capability that computer scientists and engineers hope to implement in computers, whereas scientists and philosophers continue to debate whether it is possible to sim-
ulate reflective consciousness in a man-made machine. Metaphorically, evolution is tantamount to collective problem solving that involves generations after generations of living organisms for long-term survival. Therefore, evolution is also considered in this article, with an emphasis on the parallelism between evolution and creative problem solving. A parallel also exists between creative problem solving and the formulation of scientific theories in human's quest of understanding physical reality. Therefore, the philosophy and the sociology of science are also discussed. In fact, Donald T. Campbell was the chief proponent of combining all three parallels into an evolutionary epistemology — all-purpose explanation of fit, which Wuketits paraphrased as follows (p. 178 of Ref. 733): "There is a fit between the organisms and their environment, between vision (and other types of perception) and the outer world, and between scientific theories and the aspects of reality they purport to explain — and these fits are results of selection processes that bring forth ever-better adapted organisms, eyes, and theories." The common thread that ties all three parallels together is pattern recognition. Here, I cannot resist the temptation to point out that Campbell's identification of these three parallels in the formulation of his evolutionary epistemology is by itself an act of pattern recognition. We continue the tone established in Part 1 of the survey (Chapter 1): to examine the relevance of relative determinism and the role of "error" or noise in biological information processing. As frequently pointed out by Conrad,136'137 a digital computer is material-independent; all internal dynamic processes of the hardware components are suppressed so as to conform to only two discrete states of binary data coding. In contrast, the brain utilizes its internal dynamics and, therefore, operates in both analog and digital modes. The mixing of the two modes of operation in a nested hierarchical system of multi-level dynamics contributes to the differences between how the human brain and a digital computer process information. Incorporating analog processing makes biocomputing vulnerable to error intrusion. However, it turns out that the inclusion of errors in biocomputing is a small price to pay, and the return is true intelligence unmatched by a digital computer, and perhaps the freedom of will that distinguishes humans (and perhaps also higher animals) from a deterministically preprogrammed digital machine. The adoption of relative determinism in biocomputing constitutes a departure from the dichotomous thinking that dominated Western science until the late 20th century. This comment must not be construed as a wholesale condemnation of Western science, as social constructivists had
attempted to do. Had Western science adopted a gray-scale thinking in the first place, the determination to seek hard truths would have been compromised; investigators would have been lulled into accepting half-truths or outright fallacies. The lack of real progress of the so-called alternative science and alternative medicine is living testimony. Thus, Western science made progress because of — rather than in spite of — Descartes' dualism and Laplace's adherence to absolute determinism. It thus appeared easier for science to proceed from a clear-cut dichotomy — absolute determinism or complete randomness — to a gray scale of determinism, whereas to proceed in the opposite direction could have caused a great deal of unnecessary confusion. This trend was reflected in music history: the strife for recognizable patterns came first, and the strife for perpetual novelty which blurs the patterns came later. More recently, both Baars and LeDoux independently reached the same conclusion: one is better off if one first studies the clear-cut cases on the gray scale while deferring the study of those covering the intermediate region to the future. Baars29 chose to investigate consciousness at the two extremes on the gray scale: clearly conscious and clearly unconscious. LeDoux411 suggested focusing on psychologically well-defined aspects of emotion, such as fear, and avoiding vague and poorly defined concepts such as "affect," "hedonic tone," or "emotional feelings." Nevertheless, one should not forget to periodically re-examine what one had missed in the intermediate regions on the gray scale, during persistent and relentlessly hot pursuits of hard truths. This is probably the most precious lesson that one can learn from the study of biocomputing. Under the scheme of relative determinism, biocomputing resorts to exploration in decision making. However, biocomputing seldom goes as far as complete randomness. Certain degrees of order is maintained both structurally and functionally in bioorganisms. The principle of balancing aimless exploration and inflexible targeting, exhibited by molecular recognition, is manifest at many different hierarchical levels. Relative determinism in biocomputing is thus characterized by freedom with restraints (controlled randomness) and by determinacy with tolerance of small deviations (dynamic tolerance). The same strategy seemed to have been invoked in evolution and in cognition: both bottom-up self-organization and top-down restriction, as well as the resulting heuristic searching had been at work. The generality of the principle spills over to science philosophy, and threatens to spread even farther. In this regard, I could not resist the temptation to mention an obvious metaphor: between the two extremes of totalitar-
ianism/authoritarianism, on the one hand, and anarchism, on the other hand, there is democracy that combines the best of both worlds, while excluding the shortcomings of either extreme. Even within the framework of democracy, the wisdom of striking a balance between two extremes prevails. Arthur M. Schlesinger, Jr., once said: "The answer to the runaway Presidency is not the messenger-boy Presidency. The American democracy must discover a middle ground between making the President a czar and making him a puppet" (p. 744 of Ref. 50). As described in Chapter 1, modularity and nested hierarchical organizations are the predominant ordered features in biocomputing. The presence of modularity and hierarchy in protein folding implies that evolutionary improvements are not always accomplished by random or exhaustive searching.41,42 Heuristic searching at the evolutionary level allows for existing modules to be recruited and modified, thus avoiding the reinvention of the proverbial wheel. The most outrageous example of Nature's modular design was the capture of ancient bacteria as permanent prisoners by eukaryotic cells. In the hierarchical organization, individual modules are not always rigidly linked. Thus, graded and dynamic network connectivity at the cellular level degrades the determinacy of control laws. Graded connectivity is also manifest in the two systems of learning: behaviorism and cognitivism. The noncognitive, habit-forming system is harder to reprogram but can be altered by training or environmental influences. This part has great stability and is responsible for traits such as personality. The cognitive memory system is more flexible, easy to reprogram, and is intimately related to the manifestation of intelligence. Cerebral lateralization might be Nature's solution to the dilemma of having to compromise between computational stability and reliability, on the one hand, and computational flexibility and versatility, on the other hand. It is of interest to note that self-organization and modular organization are also evident in ecology and in social interactions, including economics.19,20 Modularity seems to recur at various levels of organization, and is reminiscent of self-similarity in fractal geometry in a metaphoric sense. Thus, the recurrence of modularity and hierarchical organizations implies "unity of knowledge," as championed by Edward Wilson. Yet, philosopher Bunge was strongly opposed to the application of chaos theory in political science, citing the lack of mathematical rigor as the reason. This science fundamentalist attitude was unfortunate; it discourages cross-fertilization and unity of knowledge. Physiologists of past generations freely used the engineer-
ing concept of positive and negative feedbacks even though perhaps only Hodgkin and Huxley's theory of nerve excitation, as an attempt to formalize a major biological phenomenon, had attained sufficient mathematical rigor to satisfy purists like Bunge. The analogy between evolution and human cognition is striking, especially with regard to the crucial role of error. Without copy errors in gene duplication, evolution would not have been possible. Though less apparent, error is also essential for the emergence of intelligence and creativity. The human frailty of making occasional — or not so occasional — mistakes turns out to be the main source of human creativity. After all, a completely deterministic machine can only reflect the intelligence of its creator. It is not surprising that creative problem solving resembles evolutionary learning. Most, if not all, models of creative problem solving were inspired by Henri Poincare's introspective account. However, Poincare did not speak the language (jargon) familiar to experts. His view was either ridiculed or dismissed as the vague account of a non-expert witness. In the present article, Simonton's chance-configuration theory is singled out for extensive analysis. The theory is recast as pattern recognition in the framework of parallel and sequential processing. A contemporary interpretation of Freud's concept of the unconscious in terms of selective attention allows Poincare's introspective account to be understood in terms of Simonton's theory. The improved rendition of Simonton's theory applies equally well to both science and humanities. It turns out that the creative process practiced by geniuses is not fundamentally different from everyday ingenuity practiced by ordinary folks. A genius simply pushes the process to its limit. By bringing the distinction between parallel and sequential processing into focus, the mystery of high creativity is reduced to a level that is congruent with our present degree of ignorance of the brain function; no additional mysterious process needs to be invoked. What Bailin alluded to as excellent thinking can now be explained in less elusive terms. A demystified theory of creativity suggests where improvements are possible by training, and may guide educational policies in a concrete way. What we have arrived at is essentially a non-elitist view of creativity. However, unlike the non-elitist view of Weisberg and of Hayes, the creative process is non-algorithmic and not preprogrammable because of the underlying parallel processes. Creativity cannot be summoned at will. Therefore, motivation, hard work and possession of a large body of domain-specific knowledge are necessary but not sufficient conditions for creativity. The creative process is difficult to articulate because of the underlying parallel processes. Central to the creative
process is the ability to invoke both pattern-based reasoning in terms of parallel processes and rule-based reasoning in terms of sequential processes — a multi-track mind so to speak. With the benefit of hindsight, we can now enumerate the important insights that helped demystify the creative process: • The common pattern, exhibited by all creative processes or discoveries, is a process of "pattern recognition." • There are two ways — via holistic pictures or algorithmic rules — of matching a candidate solution (template) to the problem (pattern). • Picture-based reasoning is a parallel, analog, semantic, and nonverbalizable process, whereas rule-based reasoning is a sequential, digital, syntactic, and verbalizable process. However, the most important approach, above all, is to invoke picturebased reasoning because it is all about recognizing the "pattern" exhibited by the creative process, which itself is a kind of pattern recognition. It would be difficult, if not impossible, to attain such insights by means of rule-based reasoning alone, since the above-mentioned keywords mean differently in their respective original context and do not have anything in common superficially (e.g., comparing the words "digital" and "syntactic" in the context of computer technology and natural languages, respectively). However, if we allow these terms to be "pictorialized" and their meaning to be liberally "stretched" or distorted, it becomes apparent that both the words "digital" and "syntactic" are related by being deterministic and discrete, without an associated gray scale, and both pertain to something that is easy to enumerate, itemize, or quantify — easy to verbalize — in a sequential fashion. In contrast, both the words "analog" and "semantic" are related by carrying a continuum of gray scale, and both pertain to something that is difficult to enumerate, itemize, or quantify — difficult to articulate in concrete terms — and can only be felt as whole in actual perceptive experience rather than be dissected into parts (components), be assigned a numerical score to each and every part of it, and be evaluated by means of a single summed total score. It is awkward to handle an analog process in a sequential fashion, because something so spread out in the continuum may fall through the "cracks." To appreciate this point, let us imagine how a graphic scanner digitizes — "discretizes" — a picture by canvassing the picture in a sequential (pseudo-parallel) fashion. Let us also imagine how a silkworm spins silk — a sequential process — and attempts to wrap itself up in a cocoon. What
transpires in either case is the formation of a sheet of meshwork, which is not airtight and always lets something through. However, the size of meshes ("holes") between "wires" (or "cords") and "nodes" of a meshwork can be reduced by means of a computer technique called "interleaving," in the case of image displaying on a video monitor, or by means of a silkworm instinct called "interweaving," in the case of silk spinning. The meshwork can be made waterproof, if not completely airtight, in both cases. This is how pseudo-parallel processing works. Again, picture-based reasoning must be invoked in the comparison of the two rather different scenarios: picture scanning and silk spinning. In this way, I hope that the entire picture is now crystal clear. In retrospect, the obstacle to demystification of high creativity appeared to be the difficulty in conferring concrete meanings to elusive terms such as intuition, insight and primary-process thinking. When Sternberg and Davidson wrote, "what we need most in the study of insight are some new insights!", they did it for a good practical reason: the enigma of creative problem solving is not a routine problem but rather an "insight" problem. Essentially, they used a term, of which they had no clue about the exact meaning, in a definitive way (as if they knew its meaning exactly), to express a valid opinion regarding how we should go about uncovering the hidden meaning of that term. The irony is: we, the readers, knew exactly what they meant when we read the remark, even though we were equally in the dark. We understood the semantics of their remark even though its syntax sounded like tautology. In other words, we all knew the meaning of insight implicitly but could not verbalize its meaning other than substituted the term with another equally elusive term. Sternberg and Davidson's remark sounds suspiciously self-referential, almost like a situation known colloquially as "Catch-22." The circularity goes like this. Because we wanted to discover how we discover, we need to have what we do not know how to describe in order to describe it. But the circularity is not exact, because the latter statement is not equivalent to the next one: we need to have what we do not have in order to have it. Of course, the real situation was not as bleak as the near-circularity had suggested, because here the "we" is a collective we. We certainly could use someone else's insight, such as offered by Poincare through his introspection, to solve this insight problem. As we shall see, the cryptic and somewhat humorous remark of Sternberg and Davidson furnishes an important insight (see below). Even so, we did get dangerously close to self-reference. In hindsight, we know that we needed to invoke picture-based reasoning
in order to understand what insight really is, as explained above. In hindsight, we also know that picture-based reasoning is what empowers one to have intuition or insight. But if we did not know this "connection," how could we know that we had to make the connection, anyway, in order to find that connection? Even if we stumbled on it by chance, how could we manage to recognize something that we did not expect (our mind was not "primed" for the recognition)? The way out of this near-circularity is: both our action (making connections) and our recognition must come together cooperatively and concurrently, getting close to each other a little bit at a time, in an iterative fashion. The recognition process involved several intermediate stages, in which the problem "pattern" were stretched successively in order to facilitate matching (recognition) with as many potential candidate "templates" as possible. In this regard, we must point out that, although the ill choice of words obstructs the understanding as Francis Bacon once claimed, an appropriate choice of words eases the transition of concepts, thus leading to the ultimate elucidation. This means that we must stretch the meaning of each term and then replace a crude term with a sharper and sharper term, thus shifting the meaning in the right direction. Successive stages of terminology replacements involve the following pairs of contrasting terms: verbal/visual, rule-based/pattern-based, algorithmic/holistic, sequential/parallel, digital/analog, discrete/continuous, etc. Note that, except for the first two pairs, most of the above pairs of terms are neither synonymous nor equivalent to one another. However, they do share certain overlapping shades of meaning with their nearest "neighbors." In contrast, the first and the last pairs have almost nothing in common. Successive stages of replacements were needed to make a smooth transition. The shift of shades of meaning from one pair of terms to another pair is thus gradual and insidious. Replacing "verbal" with "rule-based" allows the chain to propagate away from cognitive science and shift towards computer science and artificial intelligence. It is also helpful to include the following pairs of terms in the above list: verbalizable/non-verbalizable and syntactic/semantic. The linkage of the first pair to the previous list is obvious, but the second pair owes its linkage to the help of another pair of terms "formalizable/non-formalizable," according to the usage of Rosen. The latter's linkage to the pair "rulebased/picture-based" or the pair "algorithmic/holistic" is natural. However, a direct linkage between the pair "verbalizable/non-verbalizable" and the pair "syntactic/semantic" is rather tenuous, since a natural language
itself consists of both syntactic and semantic elements. But the readers are expected to be so forgiving, in the present context, as to exclude semantics from the notion of verbalizability. Thus, the adjective "verbalizable" means easy to articulate with syntax alone. It would be an entirely different matter if semantics are included (see below). Once we recognized that two options of reasoning are available during the search-and-match phase of creative problem solving, the linkage of intuition as well as related concepts, such as insight and primary-process thinking, to picture-based reasoning became apparent. As a matter of fact, the above-mentioned remark of Sternberg and Davidson carries a hidden but useful message. We did understand the term "insight" implicitly through our gut feeling. Exactly how we did it is reflected in a famed remark made by U.S. Supreme Court associate justice Potter Stewart, regarding the meaning of the term "obscenity": "I know it when I see it" (p. 622 of Ref. 211). Indeed, we all recognize insight when we encounter it. Sternberg and Davidson's remark suggested that insight is a process that requires a holistic appreciation just like the subtle distinction between fine art and pornography does. In other words, the remark alone helped us recognize that insight is a parallel process rather than a sequential process, once we "prime" our mind to differentiate between picture-based [holistic] and rule-based [algorithmic] processes. The fact that we understand their remark through its semantic content, in spite of its empty syntactic content, also suggested the same, once we identified the semantic process with a parallel process and the syntactic process with a sequential process. The sudden "convergence" of all these pairs of dichotomous terms used in different scientific disciplines, made possible by our own picture-based reasoning, constitutes Wallas' illumination phase in regard to demystification of the process underlying intuition and insight as well as creativity: Eureka! Speaking about the "eureka" or "aha" experience, it is crucial to recognize that there are two options of reasoning in the search-and-match phase. Without this distinction, it would be difficult to explain why some people ever have an "aha' experience in their discovery process, whereas others never have. Picture-based reasoning is conducive to the occurrence of a snapping action that gives rise to the feeling of "Eureka!", whereas Simon's explanation of the "aha" experience in terms of rule-based reasoning is rather tenuous. Simon refused to acknowledge the subtle distinction between pseudo-parallel processing and true parallel processing. On the other hand, Simon's success was rooted in his approach of using pseudo-parallel processing to closely emulate parallel processes in human's thought: a tri-
umph of the cognitive approach in computer-based problem solving. An obvious question is: Have we succeeded in presenting a rigorous definition of "intuition" and the "aha" phenomenon? Emphatically not. Only a partial demystification has been accomplished. Since the main thrust of the demystification is to associate intuition with a non-verbalizable, parallel process, to claim that we did articulate intuition in unambiguous terms would be a blatant self-contradiction: we were not supposed to be able verbalize what we thought to be non-verbalizable. Nevertheless, the identification of intuition with picture-based reasoning or, more accurately, with parallel processing has demystified the process to the extent that it became possible to devise effective ways to enhance our own intuition and to verbally communicate our intuitive feeling. Vague descriptions, such as sagacity and imaginativeness, are now identifiable with concrete personality traits or mind habits, thus enabling us to design educational approaches to foster creativity and to avoid policies that inadvertently suppress it. The identification of intuition with picture-based reasoning suggests a heuristic approach to help us "localize" or "pinpoint" our gut feeling. Instead of examining verbal memories, one simply focuses on remembered patterns in the Gestalt field of perception in search of clues that may be responsible for our gut feeling. The demystification also made it possible to articulate the main reason why standardized testing is harmful: enforcing rule-based reasoning and suppressing intuition. The demystification of intuition was akin to the use of pseudo-parallel processing to simulate a parallel process; we did not actually verbalize (duplicate) it but we articulated (simulated) it with sufficient comprehensiveness to make it useful. In summary, the process of elucidation of the meaning of intuition was not a straightforward sequential process but rather a tortuous parallel process, with all related concepts being concurrently distorted and "stretched" until they matched one another in a single snapping action. It was akin to how one solves a kind of puzzle that contains mutually interacting and mutually dependent pieces: several pieces must be snapped together cooperatively and concurrently, in a self-consistent manner, rather than one piece at a time in a definite sequential order. It was also akin to solve an equation implicitly by repeated iterations until the solutions converge to a stable value by successive approximations. Moreover, it was akin to protein folding; metaphorically, the postcritical state was reached when we recognized what Poincare meant by his introspection and when we recognized the hidden message of Sternberg and Davidson's remark. In view of the elucidated meaning of intuition and primary-process
thinking presented in this survey, it is evident that Kris' theory is essentially correct, with only a minor modification to be made. Kris associated Wallas' preparation and verification phases with secondary-process thinking, and associated the incubation and illumination phases — i.e., the search-and-match phase — with primary-process thinking. The difference lies in the search-and-match phase: primary-process thinking is the only option in Kris' formulation, whereas primary-process and secondary-process thinking are two available options in our current rendition of Simonton's model. Kris' omission is understandable because the rampant epidemic of exclusively rule-based reasoning, which I had the misfortune or privilege to witness, was a relatively recent occurrence. Had I not had this personal experience, I would never have suspected that rule-based reasoning could be a preferred option during the search-and-match phase. This reminds me of a remark aptly made by someone: the best way to study physiology is to study pathology instead. Translated into the present setting, the best way to study creativity is not to study geniuses' propensity for it but, instead, to study some of our high-achievers' spectacular lack of it.
The realization that certain people practice exclusively rule-based reasoning whereas others practice combined rule- and picture-based reasoning ought to put the imageless thought controversy to rest (p. 2 of Ref. 386). Imageless thought is made possible by exclusively rule-based reasoning. The persistence of the controversy suggests that whoever insisted that thoughts are imageless had to be practitioners of exclusively rule-based reasoning, for otherwise an introspection would provide a first-hand counter-example (unless one let one's distrust of introspection override one's own experience). Likewise, those who regarded imagery as an epiphenomenon were probably also practitioners of exclusively rule-based reasoning, for the same reason.
Again, with the benefit of hindsight, it is apparent that creative problem solving frowns at extremism. To maximize the likelihood of creative problem solving, several conflicting requirements must be met:
• to use picture-based reasoning so as to maximize the probability of finding novel solutions, but to use rule-based reasoning so as to increase the speed of thinking,
• to focus on problem solving so as to enhance the retrievability of key techniques and knowledge, but to defocus on problem solving so as to avoid getting trapped in an unfruitful search path,
• to subdue one's attention during a problem-solving session so as to optimize one's affect, but to extend one's attention beyond a formal
problem-solving session so as to reap the benefit of hypnagogia and serendipity,
• to perform heuristic searching so as to select a manageable search space, but to avoid premature shrinking of the search space,
• to "zoom in" and pay attention to details, but to "zoom out" and pay attention to big pictures,
• to be highly explorative so as to expand the search space, but to have sufficient task involvement so as not to spread oneself too thin,
• to be sufficiently confident to assert an unpopular view, but not to be so excessively confident as to overlook clues provided by critics or opponents,
• to be sufficiently disciplined to perform rigorous logical deductions and to play by the rules forged by social consensus, but to be sufficiently undisciplined to defy authority whenever necessary — a "mood swing" between the traditionalist and iconoclast dispositions, in Simonton's words (see below), and
• to be able to flexibly and dynamically switch between two opposing modes of action, as listed above.
Getzels spoke of the "paradox in creative thought": creative thinking entails child-like playfulness, fantasy and the non-rationality of primary-process thinking as well as conscious effort, rationality, reality orientation and logic.231 Csikszentmihalyi mentioned the conflicting requirements of openness and critical judgment. Simonton emphasized the well-adjusted trade-off between the traditionalist and iconoclast dispositions (p. 5 of Ref. 633). Hayes regarded divergent thinking and heuristic searching as two mutually conflicting approaches: one seeks to generate more candidate solutions, and the other seeks to limit the number of candidate solutions (p. 141 of Ref. 289). He decided that heuristic searching is more important than divergent thinking and thought that divergent thinking is unrelated to high-level creative activities. However, there is no compelling reason that the conflicting requirements must be fulfilled simultaneously. The resolution of these conflicting demands calls for a departure from the traditional dichotomous thinking. Besides, a creative mind is not a static mind; it is dynamic and flexible. Therefore, there is no real paradox — just our temporary confusion — and there is hardly any trade-off or compromise — just well-timed mood swings between extreme randomness and extreme determinacy in the thought process. One certainly can exercise divergent thinking in only a certain selected
direction and maximize the number of candidate solutions only in that direction, thus enlisting both divergent thinking and heuristic searching. Furthermore, one can exercise divergent thinking to generate a large number of possible directions and then judiciously select only the most promising directions for further scrutiny, and, in case of an impasse, re-examine those directions which were initially rejected. I see no real conflict between divergent thinking and heuristic searching. Invoking heuristic searching without divergent thinking leads to excessive restrictions of the search space: a symptom of dogmatism. Furthermore, one can be extremely speculative and subjective at the stage of formulating a hypothesis, but later becomes extremely logical, methodical and objective at the stage of verification. In contrast, individuals with low creativity often fall short of both extremes and practice a static compromise (trade-off) between the two extremes, instead. The dynamic flexibility in cognition, or the lack of it, reflects an individual's mind habit, which is part of character or personality. Einstein was right when he once said: "Most people say that it is the intellect which makes a great scientist. They are wrong: it is the character" (p. 81 of Ref. 431). The creative character traits are simply mind habits, and are, therefore, not something that can be casually summoned simply by motivation to be creative, as Hayes once claimed. However, a creative mind habit can probably be acquired through diligent practicing of — for example — visual thinking. The creative character traits are probably partly inborn but can also be remolded by nurture in either directions. All normal children are born to be creative. It is probably the indoctrination, especially schooling, that diminishes a creative child to a mediocre adult: the well-known "dumbing-down" effect of education. Yet, schooling is almost indispensable for acquiring a significant body of domain-specific knowledge. The challenge to educators boils down to resolving this dilemma. With the elucidation of the concrete meaning of intuition, it is now understandable why intuition is the major source of inspiration in creative acts. Since intuition is a private feeling to begin with, it is also understandable why Poincare's introspective account is instrumental in the elucidation of the creative process. However, intuition and introspection are prone to error, verification is mandatory for every hypothesis so derived. Modern science tends to downplay the value of introspection, and modern education tends to distance itself from intuition, and to opt for the safe haven of logical deduction. Social pressures tend to discourage speculation. Speculation (which is tantamount to making a hypothesis) based on a limited number of
observations often invites the remark: "Do not jump to conclusions." However, investigators that recognize an emerging pattern after a small number of exposures certainly have a better chance to make a discovery than those who refuse to recognize a pattern until the pattern becomes validated by an inordinate number of repetitions and safely sanctioned by an orthodox statistical analysis. Likewise, a prehistoric hunter who recognized the distinction of patterns between a dangerous predator and an easy prey after a small number of exposures certainly had a better chance to survive and to pass on the "intelligence genes" to his offsprings than others who had to lose their life in order to learn the lesson. Human cognition is also organized in a nested hierarchical fashion, and modularity is a prominent feature both structurally and functionally. Conceptual information, which is modularized knowledge, is preferentially stored in long-term memory. Novel exploratory functions are predominantly performed by the right hemisphere, whereas "routinized" cognitive functions are transferred to the left cerebral hemisphere. Although the popularly held dichotomy of cerebral lateralization turned out to be oversimplified, it had provided a good starting point prior to reaching the next level of sophistication. Modularity persists even during the process of pattern recognition, an ostensibly right hemispheric specialty: a complex pattern is decomposed into familiar components (to be processed by the left hemisphere), and a set of relationships stipulating the linkage of these modules (to be processed by the right hemisphere). Thus, what appeared to be counter-intuitive at first sight can be understood in terms of functional modularity and hierarchy. That good science is counter-intuitive, as is commonly claimed, may be a symptom of half-baked science, instead. Simonton's chance-configuration theory should not be treated as simply a three-step procedure of creative problem solving. Rather, it is a nested hierarchical process that invokes both parallel and sequential processes. In other words, inductive and deductive reasoning are thoroughly mixed and integrated throughout the cognitive process. Failure to appreciate the nested hierarchical nature of the creative process was responsible for Medawar's denial of the existence of the inductive process in cognition. Likewise, attempts to demonstrate the dominance of the right hemisphere preference in creativity often led to ambiguous results presumably because the nested hierarchical nature of cognitive processes requires frequent switching between the right and the left hemispheres. Significant asymmetric hemispheric activities might have been masked by constant switching between the two cognitive modes of reasoning and a lack of sufficient time resolu-
tion of the method of detection. Intriguingly, the hope of demonstrating the hemispheric asymmetry in the thought process may lie in the use of a peculiar sample, known as biomedical students, who have a penchant for practicing exclusively rule-based reasoning. The behavioral pattern of this subpopulation of students is so strikingly different from that of other subpopulations — and its recurrence in a series of observations is so consistent — that an attentive teacher hardly needs the help of orthodox statistical methodology to recognize it.

Ironically, advances in machine intelligence research with the accompanying automation have helped exacerbate information explosion. Instead of replacing old and obsolete knowledge with new and essential knowledge, some curricular planners kept expanding the curricular content presumably under the misguided notion that knowledge means power; the more the better. The pundits somehow neglected the important fact that it is the skill of manipulating knowledge that transforms knowledge into power. Metaphorically, it is the process of oil refinement and the skillful use of internal combustion or diesel engines that release energy in a useful form from crude oil; merely burning it generates only heat — a disordered form of energy — and pollution. Essentially, knowledge was dispensed in a highly efficient but ineffective top-down fashion in almost total disregard of students' cognitive development. As a consequence, students gained knowledge at the expense of knowledge-manipulating skills, and became overloaded and overwhelmed with a large body of raw information being shoved down their reluctant throats. Teachers' excessive emphasis on domain-specific knowledge together with increasing competition in school admission and in job hunting had a chilling effect of "dumbing down" — as the Americans aptly put it — the students. Encountering a dilemma of whether to learn or to attain high grades, many of them opted for rule-based learning, thus accepting truths on faith most of the time. Widely used standardized testing further fostered the practice of exclusively rule-based reasoning. This is because standardized testing encourages students to use deductive reasoning at the expense of inductive reasoning, discourages exploration, confines learning to the narrow bounds of the narrative thoughts presented by the examiners, ruins the good habit of verification, and fosters a passive work ethic. In a desperate attempt to cope with information explosion and fierce competition that had not been significant at the time of the installation of standardized testing more than half a century ago, students were forced to practice exclusively rule-based reasoning instead of combined rule- and
picture-based reasoning, thus forsaking a major advantage of cerebral lateralization. Furthermore, the deductive ability of students diminished in spite of their penchant for exclusively rule-based reasoning. At least, these students managed to learn how to manipulate knowledge at the rule level. Worse still, some students were not even capable of rule-based learning. Many of them simply abandoned thinking altogether and succumbed to the authority of teachers and textbooks. All they could do was engage in the behaviorist activity of rote memorization and regurgitation on command or on demand. Fear of failure forced them to trust their memory of what teachers had preached more than the insight of what their mind had comprehended. The only accomplishment appeared to be a mass production of — if John Holland is again paraphrased — highly optimized mediocre examination-taking biomachines. In this regard, it is sobering to hear what Ayn Rand had to say through the mouth of John Gait, her alter ego (pp. 973-974, Part Three Chapter VII of Ref. 550): 'Do not say that you're afraid to trust your mind because you know so little. Are you safer in surrendering to mystics and discarding the little that you know? Live and act within the limit of your knowledge and keep expanding it to the limit of your life. Redeem your mind from the hockshops of authority. Accept the fact that you are not omniscient, but playing a zombie will not give you omniscience — that your mind is fallible, but becoming mindless will not make you infallible — that an error made on your own is safer than ten truths accepted on faith, because the first leaves you the means to correct it, but the second destroys your capacity to distinguish truth from error. In place of your dream of an omniscient automation [automaton], accept the fact that any knowledge man acquires is acquired by his own will and effort, and that that is his distinction in the universe, that is his nature, his morality, his glory.' Mindless learning has another serious consequence. The resulting degradation of academic performance exacerbates the situation and makes an individual even less capable of coping with information explosion. Excessively heightened affect (anxiety) caused by poor performances further diminishes the effectiveness of learning. Essentially, modern biomedical students seem to be so hopelessly trapped in the morass of the stimulus-response (SR) learning scheme of behaviorism that they become overwhelmed by the vicious cycle of information overload.
The practice of exclusively rule-based learning is nothing new, but it has never reached such an all-time epidemic, especially among the subpopulation of premedical students: the legendary high-achievers that the public at large, as well as almost every concerned parent, admires. The intellectual inflexibility exhibited by these dumb high-achievers — an oxymoronic term — also happens to characterize the long-standing image of the inhabitants of the ivory tower: academic scholars. Dewey pointed out the contempt felt by the practical and successful executives for the "mere theorist," thus prompting the depreciatory practice of making distinctions between the terms abstract, theoretical, and intellectual and the term intelligent (see p. 138 of Ref. 174). In my opinion, it was not the fault of theories or theoreticians, but the rule-based academic interpreters who were either too dogmatic and conservative to generalize a given rule to a wider domain of usefulness or too liberal and undisciplined to stick to the range of validity of a given theory; a picture-based understanding of rules minimizes the errors of either extreme. In brief, it was the consequence of the inability of some academic interpreters to deal with the cases that fall between the "cracks" of dichotomy set by the well-established rules; it is a well-known symptom of digital processing and inflexible minds. However, Dewey also pointed out that habitual practices of depreciating theories can make these executives "too practical, as being so intent upon the immediate practical as not to see beyond the end of one's nose or as to cut off the limb upon which one is sitting." Need we go far to find good examples in the new corporate culture of cost-cutting and downsizing? Modern educational theories such as constructivism are much more compatible with how the human brain works. However, constructivism is currently under attack by public opinion in the United States.119 In my opinion, constructivism is theoretically sound but results of its implementation were disappointing so far. Constructivism is designed to foster picture-based reasoning and to cultivate cognitive flexibility through re-discoveries. In other words, it is a bottom-up approach that requires a self-organizing capability of the students. Strangely, its implementation has been top-down in nature, and there has not been any explicit and systematic effort to convince students or to enlist their cooperation. Apparently, the thinking of behaviorism is so deeply entrenched in the educational community and society at large that the evaluation methods that were originally designed for the purpose of enforcing behaviorism remain largely intact. A surprising number of educators continue to believe that learning can only be achieved by repeated drills, and continue to think
that every bit of useful information must be taught and memorized, as if the students were robots or old-generation expert systems constructed during the early phase of AI research. In order to achieve high grades while being flooded and overwhelmed with information, students often find clever ways — they are not robots after all — to defeat the purpose of education, i.e., to perform spectacularly in standardized testing without actually mastering the skill of utilizing the acquired knowledge. In brief, constructivism has the right principle but lacks the right procedure to implement its ideals.

In recent decades, it has become fashionable to advocate computer-assisted education and Internet-based distance education. Although the idea appears superficially innocuous or even progressive, it is a questionable assumption that students will be able to self-assemble the available information into usable organized knowledge. Perhaps the educational community should learn the lessons entailed in the process of protein folding: a randomly synthesized peptide simply will not fold most of the time. Some constituent amino acid residues must be placed in strategic locations (e.g., folding nucleus) with apparent foresight in order to ensure proper folding. Even so, assistance by chaperones and other corrective measures is often needed to ensure the correct folding. Similarly, merely enforcing small group discussions alone may result in a regression of the instruction into a conventional top-down approach. The biocomputing principle provides a scientific basis and offers an easily understood rationale for picture-based learning. An understanding of the principle is conducive to the design of simple prescriptions that are so easy to follow that repeated indoctrinations, as advocated by behaviorists, are seldom necessary. Understanding the biocomputing principle thus offers some hope in educational reform.

Understanding the cognitive process is also needed in implementing health-care reform. In the interest of cost-cutting, U.S. health-care providers have implemented a procedure of dispensing health care that is rooted in exclusively rule-based reasoning. The powerful executives of the health-care industry apparently believe that medical diagnoses can be made exclusively by rule-based reasoning. Regrettably, some investigators proposed evidence-based medicine, and claimed it to be a major paradigm shift in medical practice. I suspect that institutionalizing exclusively rule-based reasoning in medical training and practice will seriously undermine the quality of health care and exacerbate the already-skyrocketing health-care cost, contrary to the declared intent of the proponents. In spite of my misgivings about computerizing education and health care, some of the shortcomings may be avoidable. Insights gained from
cerebral lateralization suggest the following remedies. The common element of the aforementioned shortcomings is robotization and the accompanying loss of analog information. To de-robotize the process is to give back what has been taken away rather than go back to square one and undo the entire process of computerization: just restore analog pattern recognition while maintaining digital pattern recognition concurrently. It is not difficult to minimize friendly fire accidents on the battlefield: one just has to re-convert the digital coordinate information back to analog forms so that the decision makers have a backup system for error detection and correction. Likewise, for health care and education, cerebral lateralization again suggests that visual thinking should be restored and reinforced rather than discouraged and eliminated. For Internet-based distance education and computer-assisted diagnosis and treatments, there is at least one technology that is ripe to serve our intended purpose: virtual reality. In order to give students back the capability of parallel processing and random access of information, one simply has to make "virtual-reality" books available to students so that a student can browse the book and flip through the pages. Most importantly, it is time for the education community to understand that it is not how much one knows that counts but rather how one ties together various knowledge modules to solve both routine and novel problems that really matters.

Intelligence and consciousness are emergent phenomena that are characterized by many attributes. Furthermore, some, if not all, of these attributes possess a gray-scale nature. Thus, as one proceeds from inanimate objects, through biomolecules and lower life forms, to human beings at the top of the evolutionary scale, these attributes emerge in an insidious and continuous manner. Likewise, as one proceeds downward on the evolutionary scale from human beings down to unicellular organisms, consciousness gradually fades into non-consciousness, and intelligence fades into genetically endowed instinct without a clear-cut demarcation. Lacking criteria with explicit dichotomy to deal with these diverse attributes, investigators had no choice but to resort to somewhat subjective judgment, thus leading to a general lack of consensus. Historically, these emergent phenomena, which require subjective judgment, tended to be excluded from the realm of scientific investigation in a desperate attempt to maintain scientific objectivity. The potential technological applications of consciousness studies motivate scientists to directly confront the problem rather than evade it. We all know, by means of introspection, that we have consciousness. We also know, by means of introspection, that our intelligence is different
from animal instinct because of our ability to plan ahead in solving a problem. Most of us unconditionally transfer these attributes to other human beings rather than dismiss them as preprogrammed behavioral patterns. Of course, to do otherwise would be much too egocentric to be socially acceptable. However, we are more reluctant to grant these attributes to nonhuman animals. Animal intelligence and consciousness, if any, can only be ascertained by subjective judgment on observable objective behaviors of nonhuman animals. Consensus is virtually impossible because different investigators tend to assign different weights to those attributes that are impossible to evaluate with itemizable criteria or numerical scores. The notion of planning, motivation and will power makes sense only if one has the power to alter a preprogrammed course of action in order to evade an anticipated difficulty or to seek a desirable outcome. The presumption of mankind's ability to plan ahead implies a tacit acceptance of the existence of free will. Free will has a prominent subjective component of introspective nature. However, a significant number of eminent scientists and philosophers denied the existence of free will, and dismissed it as an illusion. Presumably, these investigators considered only a dichotomous choice between two alternatives: a process is either absolutely deterministic or completely random. So, they reasoned: if it is absolutely determined it is not free, but if it is completely random there is no will (see, for example, p. 1136 of Ref. 729). In neither case can mankind's cherished notion of moral responsibility be tenable. This dilemma, enunciated by Popper,529 by McGinn,449 by James,347 and by many others, can be resolved by admitting relative determinism in physical laws. In this way, the physical laws allow for sufficient dynamic tolerance to grant a restricted freedom, while at the same time providing sufficient constraints to make possible the formulation of will. The crucial issue in the conflict of free will and classical determinism is whether physical determinism is absolute or not. If biocomputing strictly abided by absolute determinism, then the conflict would be real and inevitable. In my opinion, the argument proposed by compatibilists to evade the conflict is erroneous because they confused relative determinism with absolute determinism. On the other hand, our analysis of biocomputing control laws suggests that absolute biological determinism is not valid. In this article, an argument at the ontological level is proposed to demonstrate that, contrary to conventional wisdom, microscopic reversibility contradicts macroscopic irreversibility. By attempting to derive the second law of thermodynamics from various possible (microscopic) physical laws of motion,
Mackey432 previously concluded that the microscopic laws of physics are incorrectly formulated though the violation of these laws must be very minute. Alternatively, by treating microscopic reversibility merely as an excellent approximation rather than an exact physical law, the minute deviation from ideality can be amplified by the chaotic behavior of molecular collisions, thus leading naturally to macroscopic irreversibility. Again, contrary to conventional wisdom, I suspect that quantum mechanical uncertainty may be relevant in at least part of life processes that involve long-distance electron transfer reactions. An objection that advocates of absolute biological determinism may legitimately raise is the prospect that errors, once admitted into biocomputing, may be amplified beyond bounds, as implied by chaos theory. This possible detrimental effect is prevented by the self-organization of biosystems in terms of M-m-m dynamics, proposed by Conrad (Sec. 4 of Chapter 1). This nested hierarchical organization makes it possible to convert weak determinism, at a given computing dynamic level, into strong determinism, at another (usually higher) dynamic level. It is this re-convergence at a higher level of hierarchy that prevents randomness from being propagated beyond bounds. Absolute physical determinism, as enunciated by Laplace in his "hidden cause" argument for classical mechanics, can neither be proved nor disproved. This conclusion also holds true for the quantum mechanical version of Laplace's dictum. Being not falsifiable, absolute determinism is therefore an epistemological choice instead of a scientific fact. Likewise, the existence of free will can neither be proved nor disproved experimentally. Humans are thus free to accept or deny the existence of free will, because neither choice is prohibited by conventional science. Thus, the choice is a philosophical one. The scientific endeavor is a perpetual effort of seeking a deeper and deeper understanding. Verifying a scientific theory with experimental evidence is similar to the process of molecular recognition: a candidate theory is a "template" which is proposed to "shape-fit" a set of experimental observations. In molecular recognition, a receptor ("pattern") often fits more than one ligand ("template"). While molecular recognition is the goal, molecular deception by toxins or antagonists is a constant hazard. Likewise, a set of data may fit more than one competing theory, since a given set of experimental data can make only a finite number of "points of contact" with the theory being tested. Thus, a mere agreement with a finite set of experimental data does not automatically constitute proof of a theory; a
"misfit" between a theory and a set of experimental observations is not uncommon, especially in the absence of a competing theory. Because of the non-uniqueness of scientific theories (including mathematical or numerical models), the experimental proof of a given theory is a continual and on-going process of eliminating the less satisfactory alternatives. This conclusion follows from a more general formulation of Popper: a scientific theory can only be falsified but cannot be proved with absolute certainty. Yet, a surprising number of competent scientists were found to be unawareness of the difference between a scientific proof and a mathematical proof. A mathematical proof can be absolute because the axioms (mathematical counterparts of scientific hypotheses) are the pre-conditions of a mathematical theorem and are therefore exempt from a test of falsification; logical consistencies are all that matter. However, logical consistencies in a formal system may not be maintained at all times. Although some inherently undecidable propositions exist, as demonstrated by Kurt Godel, mathematics is not invalidated by Godel's theorem in a wholesale manner. Likewise, the fact that scientific theories cannot be proved in the absolute sense should not shatter our faith in science. The Popperian (non-uniqueness) view of scientific theories is actually compatible with the thinking of the Bayesian school. The Bayesian analysis does not cast a hypothesis as being correct or incorrect, but rather in terms of its plausibility. Thus, a mathematical model proposed along with a scientific theory, if verified by experiments, is more plausible than a nonmathematical model, since the former provides significantly more "points of contact" with experiments than the latter, and is more vulnerable to experimental challenges than the latter. Thus, a good theory is considered to be a good approximation to the scientific truth rather than the truth itself. With this softer view of scientific truths, the conflict of classical determinism and free will is a mere consequence of the limitations of existing scientific theories. Thus, in light of the Popperian view, the foregoing discussion on the conflict of free will and determinism may thus be regarded as being superfluous since a physical law is not supposed to be taken as an absolute truth any way. Nevertheless, I do not think that the debate about absolute determinism is a futile exercise, because a thorough debate often leads to demystification (partial elucidation) of the problem and, regardless of our initial position, we gain valuable new insights as a spin-off of the effort. Furthermore, Laplace's argument plays a useful role by constantly discouraging temptation to explain away observed dispersion as fundamental noise.
Regrettably, Popper's epistemology, as well as the subjectivity involved in Bayesian probability, has inadvertently led social constructivists and others who practice epistemic relativism to champion the view that treats all scientific knowledge as being relative and, in the extreme case, regards science as mere social constructions for promoting a preconceived ideology. In a desperate search for scapegoats, scientists, such as Sokal and Bricmont, criticized Popper's Logik der Forschung for emphasizing falsifiability of science at the expense of verification. In my opinion, doing so was tantamount to shooting the messenger that had delivered the bad news. On the one hand, Sokal and Bricmont seemed to recognize the impossibility of absolute scientific proof. On the other hand, they seemed to have forgotten, from time to time, that, strictly speaking, verification — i.e., consistency check — only eliminates weaker competing alternatives. We should not forget the historical background: Popper's philosophy was proposed to debunk the belief that absolute verification is possible. Under that circumstance, had Popper emphasized verification, his message would have been seriously weakened by the superfluous distraction. Of course, Sokal and Bricmont's advice not to take Popper's epistemology to the extreme is well justified and well taken. By adopting a gray scale of scientific truths, a modest version of verification is tenable. But then the responsibility of verification (proof beyond reasonable doubt) must rest with the individual investigators.

Popper's message, once announced, becomes so patently obvious that it is puzzling why many scientists, often major players in a given field, continue to treat a casual check of consistency, often non-quantitative in nature, as adequate proof. Thus, an ad hoc theoretical model that appeared to be consistent with only a single type of experimental data, usually of postdiction — in Gauch's terminology — in nature, was often regarded by peers as confirmed and well established. Worse still, some biomedical scientists even treated a valid statistical correlation as validation of a hypothesis; the possibility of casual indirect correlations without valid causality perhaps had never crossed their mind. Furthermore, first-rate journals or reputed monographs continued to publish theoretical articles that made vague predictions that were often non-quantitative in nature; it is difficult to experimentally determine whether the theory offers better predictions than competing theories. In view of the above-mentioned dubious scientific practices, Popper's emphasis on falsifiability is fully justified. In view of the fact that it is still possible to find fault with the widely accepted principle of microscopic reversibility, Popper's stubborn refusal to accept the notion of
"confirmation" of a theory or principle seemed to be a healthier scientific attitude than Bricmont's self-assured but unfounded confidence, to say nothing of his overt contempt towards the alternative view. No matter how well established a theory becomes, one should be constantly reminded of the tentative nature of a widely accepted theory. On the other hand, premature suppression of alternative ideas stifles science and technology. The premature falsification of the right brain's role in creativity offers a sobering lesson. It might have contributed to the decline of the American educational system. However, we should not condone postmodernists' excess of going to the other extreme by abusing Popper's epistemology. In spite of my misgivings, many incisive views of Sokal and Bricmont are agreeable and must be explicitly acknowledged here. Although I disagree with some of Sokal and Bricmont's views, I do not advocate a wholesale condemnation of their views.

Sokal and Bricmont also blamed Thomas Kuhn's The Structure of Scientific Revolutions for breeding postmodernist relativism. Specifically, they took issue with Kuhn's idea that our experience of the world is radically conditioned by our cherished theories, which, in turn, depend on the prevalent paradigm. This element of subjectivity provided a good excuse for scientific or epistemic relativism. However, there is quite a bit of truth in Kuhn's notion, and supporting examples abound. For example, Liebovitch's fractal model was at odds with the prevalent practice of ion channel analysis (Sec. 8 of Chapter 1). The success of Hodgkin and Huxley's celebrated voltage clamp experiments led investigators to inappropriately transplant the classical method to photobiological investigations; the presence of light-sensitive elements invalidates the experimental approach (discussed in Ref. 325). Perhaps one of the most obvious examples of reigning paradigms that persist in spite of growing skepticism is the principle of microscopic reversibility. Our present survey offers the readers a convenient venue to evaluate the two opposing views and cast independent judgment on a paradigm that has reigned for centuries. Bruno Latour cited examples from the actual biomedical literature in which two opposing camps habitually downplayed the validity of unfavorable evidence provided by their opponents — confirmation bias (see Chapter 1 in Part B of Ref. 408). The same tactic was actually used by Sokal and Bricmont in their defense of microscopic reversibility. Sokal and Bricmont mentioned that most physical laws exhibit invariance under time inversion with the exception of weak interactions between subatomic particles (pp. 149-150 of Ref. 640). Then, they hastened to add that "it is the non-reversibility of
the laws of the 'weak interactions' .... that is new and at present imperfectly understood." In other words, they suggested that the readers should not take the exception too seriously, because the jury is still out. Nevertheless, the probabilistic control law governing the beta decay in particle physics was an observation that could not be readily dismissed by invoking an imperfect understanding at present; they apparently confused measurements with theories. Perhaps Sokal and Bricmont should provide evidence that the measurement technique was flawed and/or the observers were incompetent. Or, perhaps they should provide evidence that uncontrolled factors are deterministically causing the blatant dispersion of the observed data and quote Laplace's dictum to back up their claim. What Sokal and Bricmont did and said is but one of the many examples showing that scientists are conditioned by theories that they accepted or believed to have been proved beyond reasonable doubt. Scientists are more critical of data that contradict their cherished beliefs than of data that support their favored theories (confirmation bias); it is human nature (this author is no exception). This is the reason why a widely accepted but flawed theory is so difficult to debunk. This is also the reason why an original and better alternative theory has such a hard time gaining attention, let alone acceptance ("hysteresis" of scientific views).

However, it would not be an even-handed treatment on my part, if the other side of the "coin" were not also examined. This hysteresis phenomenon is partly rooted in the Duhem thesis that has been adopted in handling large-scale theories in physical science: a theory must be judged as a whole and not condemned by an isolated fault. This thesis ensures stability by damping the frequency of possible "flip-flops" in the acceptance or rejection of a theory, as new evidence continues to unfold. Since the holistic judgment inevitably involves subjectivity, one can err on either side when one slides down the "slippery slope" in opposite directions. As Timpane668 pointed out, the leap to acceptance requires a non-rational step. In my opinion, the step is not non-rational, but rather a step based on what Freud referred to as primary-process thinking: non-verbalizable parallel processing in judging the overall goodness of fit between a theory and all known data. Contrary to the suggestion of postmodernists, there is nothing scandalous about it, as long as opposing views are not actively suppressed by irrational means and as long as one is willing to change one's mind when counter-evidence becomes overwhelming. In view of the above-mentioned common scientific practices, I cannot help but find Latour's criticism to be valid. However, scientists sometimes
become their own worst enemy. Bunge's overzealous defense of Western science inadvertently exposed his own questionable views and dubious practices that were denounced by Latour or even by Sokal and Bricmont (e.g., Chapter 12 of Ref. 640). Bunge would not be able to convince postmodernists by resorting to the tactic of intimidation, much less convert them.x In our attempt to curb the excess of one extreme, need we scientists go to the other extreme? Bunge's battle call for the expulsion of postmodernist sociologists from academia was particularly repugnant. Here, Voltaire's famed remark may serve as a sobering reminder: "I disapprove of what you say, but I will defend to the death your right to say it."y I tend to agree with the assessment of Sokal and Bricmont: postmodernists are not a threat to Western civilization — certainly no more a threat than science fundamentalists.

The real threat comes from the continuing deterioration of the educational system, and the cultivation of rule-based learning that the system entails. The biggest casualty is the loss of innocence, and the resulting loss of the ability to cast independent, critical and rational judgment. The loss of innocence is most likely a major factor that is responsible for the decline of inborn creativity — a hallmark of developing children. It is the reason why it took an innocent child, depicted in Hans Christian Andersen's fairy tale, to see that the emperor actually had no clothes, whereas adults were oblivious to the fact. It is also no wonder that some over-trained scientists failed to appreciate the first-hand introspective account of the creative process, made by Poincare and by Koestler. What happened to Poincare and Koestler was not an isolated case. Betty Edwards applied the theory of cerebral lateralization to art teaching with impressive success. But scientists ignored her, or even turned hostile towards her. Likewise, Don Campbell's practice of promoting the Mozart Effect was largely ignored by experts in spite of his practical success.103 Perhaps the interpretation offered by these outsiders is primitive and lacking the rigor and technical elegance expected by experts. However, the reported effectiveness of their approach should not be dismissed casually, and their fresh perspective, untainted by dogmas, may be inspiring. Why do we want to throw the baby out with the bath water?

x Stephen Hawking286 also complained about his opponents' tactic of refutation by denigration.
y The sentence is not Voltaire's, but was a paraphrase, by S. G. Tallentyre (née E. Beatrice Hall), of Voltaire's words in the Essay on Tolerance (p. 307 of Ref. 50).

Some scientists' penchant for territorial claims and cultural inbreeding
is self-defeating since it may lead to a diminution of the search space in solving a scientific problem. The breaking away of the psychoanalytical school from the biologically oriented cognitive science is a case in point. The fateful decision made by some of Freud's prominent disciples to divorce the school from brain science essentially allowed psychoanalysis to, at least partly, detach from reality and prompted dissident psychiatrists to establish what is known as biological psychiatry. Biological psychiatry has demonstrated the impressive latitude of managing psychiatric illness with psychomimetic drugs. In following this trend, the educational establishment has inadvertently gone too far in de-emphasizing psychological counseling and attempting to manage students' learning problems with Ritalin and other drugs, thus masking many educational problems discussed in this survey. Scientists' long-standing distrust of anecdotal observations by outsiders was perhaps a major factor that drove the dissident outsiders to alternative science or medicine. It was an unnecessary loss to science and to society. The distinction between orthodox science and alternative science lies not so much in the discovery process (the search-and-match phase) but rather in the verification process. The segregation between orthodox science and alternative science was an artifact, created, at least in part, by the bigotry of some scientists. Scientists could have "assimilated" their ideas by converting their conclusions into scientifically falsifiable hypotheses without accepting their questionable interpretations. In my personal opinion, there is only one way to do science: the rational way. Sociologists of science can better serve mankind by distinguishing science from scientists: they are justified to criticize some scientists but they should not condemn science in a wholesale manner. On the other hand, natural scientists should welcome sociologists' criticism in a wholesale manner, whether it is valid or not. Even if the criticism is not valid, gaining insights into why the opponents or critics have been wrong constitutes a valuable enhancement in understanding. One should never dismiss a criticism simply because the criticism is not understandable and apparently makes no sense. Attempts to see and think from the point of view of the critics or the opponents are the fastest way to uncover one's own blind spots (heuristic searching). Although truths may not be determined by popular vote and time is at best an unreliable rectifier of historical wrongs, I maintain a cautious optimism for Western science, provided that the educational system is not allowed to run its natural course towards an untimely death. Perhaps the "free will" of individuals with a rational mind can make a crucial
difference. The rational way to end the so-called science war seems to be to adopt a gray scale of scientific truths, and to cultivate good judgment in differentiating good science from bad science, and in differentiating a well-established theory from a highly speculative one. However, even well-established theories may not be infallible. It is necessary to maintain a healthy degree of skepticism, and to undertake periodic re-assessment — periodic "inventory" — of the validity of scientific theories, especially when they are threatened by new conflicting evidence, or challenged by a new theory. The procedure of formulating and verifying a theory or theoretical model and subsequent refinement is similar to the self-referential scheme proposed by Korner and Matsumoto. For an inquiring mind, information in the form of observations or data prompts the archicortex to formulate an initial hypothesis. The hypothesis is further modified and shaped by the detailed analysis performed by the paleocortex and the neocortex. It is no wonder that an initial hypothesis may sometimes be wrong, because it is a crude one after all. Perhaps it is not too farfetched to state that human brains are genetically wired to be suitable for making scientific inquiries, whereas empiricism is a sound approach to refine the understanding of our universe and our own mind.

This is perhaps the right place to comment on my unorthodox approach that permeates the entire article. Aside from freely using introspections and anecdotes either to formulate hypotheses (or speculations) or to support proposed hypotheses, I also used a rather limited number of examples of bona fide scientific experiments as the starting point of induction. It is my hope that the readers will find more examples either to substantiate the claims or refute them. A frequently raised criticism is: What if the experimental examples eventually turn out to be wrong? The validity of induction, being a process of analog pattern recognition, is more fault-tolerant and less sensitive to the validity of individual examples than digital pattern recognition, provided that valid examples outnumber and/or outweigh dubious ones. Those dubious examples (in hindsight) can just be treated as noise in the process of pattern recognition, and the main conclusion may survive anyway (this is why I was not too concerned about the allegation of certain cases of faked serendipity). On the other hand, there is no guarantee that conclusions or interpretations based on highly trusted experimental data are necessarily correct, since more than one kind of conclusion or interpretation may fit the same set of data. This is why we need to adopt a
gray scale of scientific truths, and a dynamic view about them: if in doubt, go back to the drawing board. Otherwise, one may become too timid to make any scientific progress or too uncritical to detect or suspect scientific fallacies.

In the quest of understanding consciousness, research in nonlinear dynamic analysis and computer simulations leads to important insights: self-organization seems to be the driving force leading to the nested hierarchical organization in a complex system. Whether there is a threshold of complexity for the appearance of emergent phenomena remains unsettled. We must not confuse simulation with duplication, and analogy with identity. Self-organization may be a necessary factor for life to evolve efficiently but other overlooked factors or processes may also be involved, in view of Rosen's analysis. The recent resurgence of interest in consciousness has rekindled heated discussions. While investigators like Crick150 maintained an optimism about the possibility of understanding consciousness in terms of physics and chemistry, others like Penrose512'511'513 claimed that a new kind of physics would be required to explain consciousness. McGinn claimed that humans will never be able to understand consciousness because of the limitation imposed by our evolutionarily determined cognitive abilities (new-mysterian view). There is some truth in the new-mysterian view. Understanding a particular aspect of consciousness often leads to deeper questions, and understanding appears to be a moving target. It is desirable to adopt a gray scale of understanding: be willing to tolerate ambiguity, suspend judgment and live with limited understanding, be content with successive stages of demystification, and be willing to treat full elucidation as a limit that can only be approached asymptotically in the mathematical sense, i.e., one can, in principle, get arbitrarily close to it but never quite reach it. As pointed out by Sokal and Bricmont,640 this was indeed the position adopted by Laplace. Rosen also claimed that it is impossible to understand life (and consciousness) in terms of contemporary physics not because of humans' cognitive limit but because of the kind of epistemology we have chosen to abide by. All is well if we stay away from the "edge" of knowledge. Pushing the quest of our knowledge to the extreme may inevitably reveal the limitations of our chosen epistemology. At present, standing as a formidable stumbling block is scientists' steadfast refusal to give subjective experience an appropriate and legitimate status. Until recently, discussions of free will and consciousness have been greeted with contempt, if not outright hostility. Thus, scientists are not free from the bondage of the ideological
straitjacket. No wonder it was so emotionally unsettling when postmodernists pointed that out. While Rosen emphasized the limitation of the machine metaphor and the simulation approach in understanding life (and consciousness), the prospect of technological advances is brighter than what Rosen envisioned, in my opinion. The biomimetic approach can still yield handsome technological dividends. It is not inconceivable that future generations of computers and robots may be able to feign (simulate) free will. Since free will can neither be proved nor disproved by conventional behavioral experiments, humans will have a hard time making a distinction between "real" and simulated free will. In other words, it will become increasingly difficult to judge the claim of strong AI and the detractors' counter-claim. In all fairness, both strong AI supporters and their detractors have scored important points. I believe that future digital machines can simulate consciousness to the extent of passing the Turing Test, but the underlying sequential processes may never fully capture the essence of human consciousness. Thus, one should refrain from the projection that we are fully equipped to understand the ultimate mystery of consciousness. Nor should one make the premature resignation that consciousness is beyond humans' comprehension.

Biocomputing has inspired the research and development of machine intelligence since its inception. Early artificial intelligence research led to the development of expert systems: knowledge- and rule-based systems implemented in conventional digital computers. These systems used top-down rules in design and were limited to solving problems in which rules and algorithms had been explicitly stated. The imperative to overcome this limitation ushered in artificial neural network research. By emulating the distributive nature of biological information processing in a digital environment, artificial neural network research was able to overcome many shortcomings of rule-based systems. With the advent of agent technology, computers have begun to exhibit traits that were once regarded as exclusive attributes of the human brain: creativity as well as belief, desire and intention. Armed with insights from cognitive science, modern digital-computing-based artificial intelligence has made impressive progress that was beyond the wildest dreams, except in the realm of science fiction, of only about half a century ago. In view of the diminishing capacity for critical and rational thinking among students trained to practice exclusively rule-based reasoning, the cognitive ability of an average student may no longer be able to compete with a digital computer. It is not entirely inconceivable that some day a digital
computer may pass the Turing Test, whereas some, if not most, human beings may fail the test.

The quest for flexibility has also influenced the research and development of computer hardware. The objective of embryonics research is to empower silicon-based hardware with the capability of self-assembly and self-repair. Several prototypes that are capable of learning at the phylogenetic, the ontogenetic, and the epigenetic levels have been built. These prototypes demonstrated that John von Neumann's concept of a self-replicating automaton can be realized with conventional hardware. Neuromolecular computing models, pioneered by Conrad and coworkers, simulated computational networks of a multi-level hierarchical architecture, again in a digital environment. The inherent vertical integration of biocomputing dynamics allows increasingly rich dynamics to be captured as computational resources. However, the limitations due to a mismatch between the neuromolecular models and digital computing suggest that the real hope of implementing biocomputing paradigms lies in the use of synthetic organic molecules and biomolecules as the building blocks: molecular electronics. At present, major research activities in molecular electronics are focused on the development of advanced molecular materials and fabrication technologies. Several prototypes of sensors, actuators, memories, transistors and other component-level devices have been constructed. No large-scale molecular neural networks have been successfully developed. A notable exception is DNA computing. This approach allowed some traditionally intractable computing problems to be solved by non-algorithmic methods. Another new approach — tissue engineering, using cultured neurons for device construction (neural devices) — has also attracted a great deal of attention from investigators.5'6

Molecular materials are fragile, but Nature chose them as building blocks of living organisms because Nature compensated for this shortcoming by evolving mechanisms of self-repair. The capability of self-repair is intimately related to biomaterials' inherent property of self-assembly. Therefore, self-repair and self-assembly have been recognized as the keys to success in building commercially viable molecular electronic devices.714 In fact, early fabrication technology research took advantage of some biomolecules' capability of self-assembly. Considerable efforts have been devoted to immobilizing functionally active protein molecules, which are capable of self-assembly, on solid substrates. However, immobilization eliminates the effect of membrane fluidity, thus forsaking a major advantage of molecular interactions (Sec. 5.2 of Chapter 1). Nevertheless, I believe that immobilization is
still a valuable tentative approach in the midst of the learning curve. It is an intermediate step which investigators took while struggling towards the development of major techniques required for achieving the goal of harnessing the ultimate power of biocomputing. While immobilization constitutes a top-down effort, an appreciation of the bottom-up approach has led to efforts in the exploitation of noise in artificial neural network research. For example, Chinarov122 showed that noise influences the function and dynamics of neural networks. The performance of the multi-level neuromolecular model of Chen and Conrad was improved by the introduction of noise in the decision-making process. Fluctuation-driven ion transport (Brownian ratchet) has been considered for technological applications.21'23'682

At the present stage, one of the best approaches in machine intelligence that may meet both short-term requirements and the demands of long-term growth is the adoption of hybrid designs, which encompass traditional AI, neural networks, robotics, and molecular devices. Molecular devices such as biosensors and molecular actuators can be interfaced with conventional digital computers for front-end processing. Molecular devices may also be developed to solve specific problems that are intractable for digital computing, as exemplified by what DNA computing has accomplished. Nanotechnology has begun to take the lead because of its readiness to dovetail with conventional microelectronics technology.

From the point of view of systems performance of machine intelligence, bottom-up designs are superior to top-down designs. Like all autocratic or bureaucratic systems, top-down designs are plagued with inherent rigidity and inflexibility. However, from the point of view of engineering, bottom-up designs are technically formidable to implement. A functioning and self-sustaining molecular computing system demands not only interlocking parts but also interlocking control laws (kinetics). In addition, finding a procedure to assemble these components in a correct sequence and to start the assembled system with matching initial conditions, in a consistent, coherent and rational way, remains a formidable task for designers because, in Rosen's words, there are simply too many entailment loops to close in a top-down fashion. As suggested by Pribram and, more recently, by Korner and Matsumoto, a combination of top-down and bottom-up approaches provides the needed escape hatch. The question is how to come up with designs that combine the best of both approaches while avoiding the shortcomings. Molecular electronics as a science is in an exciting stage of development. However, molecular electronics as a technology is still in its infancy.
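Returning to the role of noise mentioned above, the benefit of injecting noise into a decision-making process can be illustrated with a deliberately crude optimization sketch. The Python toy below is my own illustrative assumption, not a reconstruction of the Chen-Conrad neuromolecular model or of Chinarov's networks: a purely greedy hill-climber gets stranded on a local fitness peak, whereas the same climber with noisy proposals eventually reaches the higher peak, for the same reason that a premature narrowing of search directions can leave a search stranded on a local fitness peak.

```python
# Hypothetical toy example: noise lets a greedy search escape a local fitness peak.
# The fitness landscape and all parameters are arbitrary assumptions for illustration.
import math
import random

def fitness(x):
    # Two peaks: a local one near x = -1 and a higher, global one near x = +2.
    return math.exp(-(x + 1) ** 2) + 2.0 * math.exp(-(x - 2) ** 2)

def climb(noise, steps=5000, x=-1.5, step=0.1):
    random.seed(0)                      # reproducible run
    for _ in range(steps):
        # Propose a small local move, optionally perturbed by Gaussian noise.
        candidate = x + random.uniform(-step, step) + random.gauss(0.0, noise)
        if fitness(candidate) > fitness(x):   # greedy rule: accept only improvements
            x = candidate
    return x, fitness(x)

print("noiseless: x = %.2f, fitness = %.2f" % climb(noise=0.0))  # typically stuck near x = -1
print("noisy:     x = %.2f, fitness = %.2f" % climb(noise=1.0))  # typically reaches x near +2
```

The noiseless climber converges on the nearer, lower peak and stays there; the noisy climber occasionally proposes a long jump that lands on the slope of the higher peak, after which greedy refinement takes over.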
Obviously, establishing the supportive infrastructure is a prerequisite for fruitful research and development of molecular electronics. Organized research is traditionally divided into two categories: basic and applied research. However, molecular electronics research has blurred the distinction. The importance of basic research is widely recognized in molecular electronics research, and many investigators pursuing molecular device development have made significant contributions to basic research. The role of basic and applied research is best understood in the context of biocomputing, since both are tantamount to problem solving at the societal, national, or international level. Applied research is more focused, whereas basic research is more exploratory. Applied research without adequate support from basic research must resort to random searching in device development. Applied research that is fully integrated with basic research can rely upon heuristic searching for the development of novel devices. Purists in basic science research often scorn applied research and try hard to distance themselves from it. The reality is: interactions between basic and applied research tend to foster a kind of cross-fertilization that often leads to significant upgrading of basic research.

Organized research is traditionally guided by a top-down policy because of the involvement of taxpayers' money. It seems desirable to add a bottom-up component to policy-making with regard to the detailed layout of the desired research directions as well as the optimal allocation of resources between basic and applied research. Essentially, all viable ideas and directions should be encouraged and explored. Any premature attempt to focus on a few seemingly promising options may inadvertently foreclose future opportunity, in very much the same way as getting stranded on a local fitness peak.

Acknowledgments

This work was supported in part by a contract from Naval Surface Warfare Center (N60921-91-M-G761) and a Research Stimulation Fund of Wayne State University. The author thanks the following individuals for critical reading and helpful discussion of the manuscript: Stephen Baigent, Martin Blank, Krzystof Bryl, Vladimir Chinarov, Leonid Christophorov, Donald DeGracia, Stephen DiCarlo, Piero Foa, J. K. Heinrich Horber, Hans Kuhn, Gabriel Lasker, Gen Matsumoto, Koichiro Matsuno, James Moseley, Raymond Paton, Nikolai Rambidi, Uwe Sleytr, Juliusz Sworakowski, Harold Szu, Ann Tate and Klaus-Peter Zauner. The author is indebted to the
late Professor Michael Conrad of Wayne State University for his profound influence. The author thanks Arif Selguk Ofrenci for furnishing a short problem, cited in Sec. 4.7. The author also thanks Martin Blank, Deborah Conrad, Shih-Wen Huang, Djuro Koruga, Hwan C. Lin, Raymond Paton, Uwe Sleytr, Christof Teuscher and Wouter Wijngaards for help in locating key references. Special thanks are due to Maestro Ya-Hui Wang of Akron Symphony Orchestra for commenting on Sec. 4.20. The editorial help provided by Christine Cupps, Stephen DiCarlo, Crystal Hill, Filbert Hong, David Lawson, David Rodenbaugh, Taitzer Wang and Timothy Zammit during the preparation of the manuscript was indispensable and is deeply appreciated. Last but not least, I wish to thank my editor Vladimir B. Bajic for his guidance, editorial help and extraordinary patience.

References

1. R. H. Abraham and C. D. Shaw, Dynamics: a visual introduction, in Self-Organizing Systems: The Emergence of Order, Ed. F. E. Yates (Plenum, New York and London, 1987), pp. 543-597. 2. Y. S. Abu-Mostafa, Machines that learn from hints. Sci. Am. 272(4), 64-69 (1995). 3. L. M. Adleman, Molecular computation of solutions to combinatorial problems. Science 266, 1021-1024 (1994). 4. A. Ahlbom and M. Feychting, A Bayesian approach to hazard identification: the case of electromagnetic fields and cancer, in Uncertainty in the Risk Assessment of Environmental and Occupational Hazards: An International Workshop, Annal. NY Acad. Sci. Vol. 895, Eds. A. J. Bailer, C. Maltoni, J. C. Bailar III, F. Belpoggi, J. V. Brazier and M. Soffritti (New York Academy of Sciences, New York, 1999), pp. 27-33. 5. M. Aizawa, Molecular interfacing for protein molecular devices and neurodevices. IEEE Eng. Med. Biol. Magaz. 13(1), 94-102 (1994). 6. M. Aizawa, Y. Yanagida, T. Haruyama and E. Kobatake, Genetically engineered molecular networks for biosensing systems. Sensors and Actuators B52, 204-211 (1998). 7. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts and J. D. Watson, Molecular Biology of the Cell, 3rd edition (Garland Publishing, New York and London, 1994). 8. J. E. Alcock, The propensity to believe, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 64-78. 9. M. Alper, H. Bayley, D. Kaplan and M. Navia, Eds., Biomolecular Materials by Design, Materials Research Society Symposium Proceedings, Vol. 330 (Materials Research Society, Pittsburgh, PA, 1994). 10. D. G. Altman and J. M. Bland, Absence of evidence is not evidence of absence. Brit. Med. J. 311, 485 (1995).
11. T. M. Amabile, Children's artistic creativity: detrimental effects of competition in a field setting. Personality and Social Psychology Bulletin 8, 573-578 (1982). 12. T. M. Amabile, Motivation and creativity: effects of motivational orientation on creative writers. J. Personality and Social Psychology 48, 393-399 (1985). 13. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen, Eds., Downward Causation: Minds, Bodies and Matter (Aarhus University Press, Aarhus, Denmark, 2000). 14. F. H. Anderson, Ed., Francis Bacon: The New Organon and Related Writings (Prentice-Hall, Upper Saddle River, NJ, 1960). 15. P. W. Anderson and D. L. Stein, Broken symmetry, emergent properties, dissipative structures, life: are they related? in Self-Organizing Systems: The Emergence of Order, Ed. F. E. Yates (Plenum, New York and London, 1987), pp. 445-457. 16. A. Angelopoulos, A. Apostolakis, E. Aslanides, G. Backenstoss, P. Bargassa, O. Behnke, A. Benelli, V. Bertin, F. Blanc, P. Bloch, P. Carlson, M. Carroll, E. Cawley, S. Charalambous, M. B. Chertok, M. Danielsson, M. Dejardin, J. Derre, A. Ealet, C. Eleftheriadis, L. Faravel, W. Fetscher, M. Fidecaro, A. Filipcic, D. Francis, J. Fry, E. Gabathuler, R. Garnet, H.-J. Gerber, A. Go, A. Haselden, P. J. Hayman, F. Henry-Couannier, R. W. Hollander, K. Jon-And, P.-R. Kettle, P. Kokkas, R. Kreuger, R. Le Gac, F. Leimgruber, I. Mandic, N. Manthos, G. Marel, M. Mikuz, J. Miller, F. Montanet, A. Muller, T. Nakada, B. Pagels, I. Papadopoulos, P. Pavlopoulos, A. Policarpo, G. Polivka, R. Rickenbach, B. L. Roberts, T. Ruf, C. Santoni, M. Schafer, L. A. Schaller, T. Schietinger, A. Schopper, L. Tauscher, C. Thibault, F. Touchard, C. Touramanis, C. W. E. Van Eijk, S. Vlachos, P. Weber, O. Wigger, M. Wolter, D. Zavrtanik and D. Zimmerman, First direct observation of time-reversal non-invariance in the neutral-kaon system. Phys. Lett. B444, 43-51 (1998). 17. R. Arnheim, Visual Thinking (University of California Press, Berkeley, Los Angeles and London, 1969). 18. A. P. Arnold and S. W. Bottjer, Cerebral lateralization in birds, in Cerebral Lateralization in Nonhuman Species, Ed. S. D. Glick (Academic Press, Orlando, San Diego, New York, London, Toronto, Montreal, Sydney and Tokyo, 1985), pp. 11-39. 19. W. B. Arthur, Self-reinforcing mechanisms in economics, in The Economy as an Evolving Complex System, Santa Fe Institute Studies in the Sciences of Complexity, Vol. V, Eds. P. W. Anderson, K. J. Arrow and D. Pines (Addison-Wesley, Redwood City, CA, Menlo Park, CA, Reading, MA, New York, Don Mills, Ont., Wokingham, UK, Amsterdam, Bonn, Sydney, Singapore, Tokyo, Madrid and San Juan, 1988), pp. 9-31. 20. W. B. Arthur, Positive feedbacks in the economy. Sci. Am. 262(2), 92-99 (1990). 21. R. D. Astumian, Thermodynamics and kinetics of a Brownian motor. Science 276, 917-922 (1997).
22. R. D. Astumian, Making molecules into motors. Sci. Am. 285(1), 56-64 (2001). 23. R. D. Astumian and I. Derenyi, Fluctuation driven transport and models of molecular motors and pumps. Eur. Biophys. J. 27, 474-489 (1998). 24. A. Aviram and M. A. Ratner, Molecular rectifiers. Chem. Phys. Lett. 29, 277-283 (1974). 25. A. Aviram and M. Ratner, Eds., Molecular Electronics: Science and Technology, Annal. NY Acad. Sci. Vol. 852 (New York Academy of Sciences, New York, 1998). 26. A. Aviram, M. Ratner and V. Mujica, Eds., Molecular Electronics II, Annal. NY Acad. Sci. Vol. 960 (New York Academy of Sciences, New York, 2002). 27. R. Axelrod, The Evolution of Cooperation (Basic Books, New York, 1984). 28. B. J. Baars, The Cognitive Revolution in Psychology (Guilford Press, New York and London, 1986). 29. B. J. Baars, A Cognitive Theory of Consciousness (Cambridge University Press, Cambridge, New York, New Rochelle, Melbourne and Sydney, 1988). 30. B. J. Baars, In the Theater of Consciousness (Oxford University Press, New York and Oxford, 1997). 31. A. Baddeley, The concept of working memory: a view of its current state and probable future development. Cognition 10, 17-23 (1981). 32. A. D. Baddeley, Working memory. Phil. Trans. R. Soc. Lond. B302, 311-324 (1983). 33. A. D. Baddeley, N. Thomson and M. Buchanan, Word length and the structure of short-term memory. J. Verbal Learning and Verbal Behavior 14, 575-589 (1975). 34. A. Bader, Loschmidts graphische Formeln von 1861: Vorläufer der modernen Strukturformeln, in Symposium zum 100. Todestag von Josef Loschmidt (1821-1895) (University of Vienna, Vienna, 1995). 35. J. C. Bailar, III, Science, statistics, and deception. Ann. Int. Med. 104, 259-260 (1986). 36. S. Bailin, Achieving Extraordinary Ends: An Essay on Creativity (Kluwer Academic Publishers, Dordrecht, Boston and Lancaster, 1988). 37. P. Bak and K. Chen, Self-organized criticality. Sci. Am. 264(1), 46-53 (1991). 38. P. Bak, C. Tang and K. Wiesenfeld, Self-organized criticality. Physical Rev. A38, 364-374 (1988). 39. N. A. Baker, D. Sept, S. Joseph, M. J. Holst and J. A. McCammon, Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98, 10037-10041 (2001). 40. J. M. Baldwin, Mental Development in the Child and the Race: Methods and Processes, 3rd revised edition (Macmillan, New York and London, 1925). Reprinted (Augustus M. Kelley Publishers, New York, 1968). 41. R. L. Baldwin and G. D. Rose, Is protein folding hierarchic? I. local structure and peptide folding. Trends Biochem. Sci. 24, 26-33 (1999). 42. R. L. Baldwin and G. D. Rose, Is protein folding hierarchic? II. folding intermediates and transition states. Trends Biochem. Sci. 24, 77-83 (1999).
43. E. Bamberg, J. Tittor and D. Oesterhelt, Light-driven proton or chloride pumping by halorhodopsin. Proc. Natl. Acad. Sci. USA 90, 639-643 (1993). 44. J. Barber, Rethinking the structure of the photosystem two reaction centre. Trends Biochem. Sci. 12, 123-124 (1987). 45. J. Barber, Are the molecular electronics of the reaction centres of bacteria and photosystem two comparable? in Proceedings of the 12th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, November 1-4, 1990, Philadelphia, PA, Eds. P. C. Pedersen and B. Onaral (IEEE, Washington, DC, 1990), pp. 1792-1793. 46. J. Barber and B. Andersson, Too much of a good thing: light can be bad for photosynthesis. Trends Biochem. Sci. 17, 61-66 (1992). 47. H. Barlow, The mechanical mind. Annu. Rev. Neurosci. 13, 15-24 (1990). 48. D. M. Barry and J. M. Nerbonne, Myocardial potassium channels: electrophysiological and molecular diversity. Annu. Rev. Physiol. 58, 363-394 (1996). 49. L. Barsanti, V. Evangelista, P. Gualtieri, V. Passarelli and S. Vestri, Eds., Molecular Electronics: Bio-sensors and Bio-computers, NATO Science Series II, Vol. 96 (Kluwer Academic Publishers, Dordrecht, Boston and London, 2003). 50. J. Bartlett and J. Kaplan, Familiar Quotations: A Collection of Passages, Phrases, and Proverbs Traced to Their Sources in Ancient and Modern Literature, 16th edition (Little, Brown and Co., Boston, Toronto and London, 1992). 51. J. B. Bassingthwaighte, L. S. Liebovitch and B. J. West, Fractal Physiology (Oxford University Press, New York and Oxford, 1994). 52. T. Bastick, Intuition: How We Think and Act (John Wiley and Sons, Chichester, New York, Brisbane, Toronto and Singapore, 1982). 53. R. C. Beason and P. Semm, Neuroethological aspects of avian orientation, in Orientation in Birds, Ed. P. Berthold (Birkhauser Verlag, Basel and Boston, 1991), pp. 106-127. 54. W. Beck and W. Wiltschko, The magnetic field as a reference system for genetically encoded migratory direction in pied flycatchers (Ficedula hypoleuca Pallas). Z. Tierpsychol. 60, 41-46 (1982). 55. E. T. Bell, Men of Mathematics: The Lives and Achievements of the Great Mathematicians from Zeno to Poincare (Simon and Schuster, New York, London, Toronto, Sydney, Tokyo and Singapore, 1986). 56. J. C. I. Belmonte, How the body tells left from right. Sci. Am. 280(6), 46-51 (1999). 57. J. M. Benyus, Biomimicry: Innovation Inspired by Nature (William Morrow and Co, New York, 1997). 58. J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd edition (Springer-Verlag, New York, Berlin and Heidelberg, 1985). 59. J. O. Berger and D. A. Berry, Statistical analysis and the illusion of objectivity. American Scientist (Sigma Xi) 76, 159-165 (1988). 60. P. L. Berger and T. Luckmann, The Social Construction of Reality: A Treatise in the Sociology of Knowledge (Anchor Books, Doubleday, New York,
London, Toronto, Sydney and Auckland, 1989). 61. E. R. Berlekamp, J. H. Conway and R. K. Guy, Winning Ways for Your Mathematical Plays, Volume 2: Games in Particular (Academic Press, London, New York, Paris, San Diego, San Francisco, Sao Paulo, Sydney, Tokyo and Toronto, 1982). 62. T. Berners-Lee, J. Hendler and O. Lassila, The semantic web. Sci. Am. 284(5), 34-43 (2001). 63. E. Berscheid and E. Walster, Physical attractiveness, in Advances in Experimental Social Psychology, Vol. 7, Ed. L. Berkowitz (Academic Press, New York and London, 1974), pp. 157-215. 64. T. G. Bever and R. J. Chiarello, Cerebral dominance in musicians and nonmusicians. Science 185, 537-539 (1974). 65. S. M. Bezrukov and I. Vodyanoy, Noise-induced enhancement of signal transduction across voltage-dependent ion channels. Nature 378, 362-364 (1995). 66. M. H. Bickhard (with D. T. Campbell), Emergence, in Downward Causation: Minds, Bodies and Matter, Eds. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen (Aarhus University Press, Aarhus, Denmark, 2000), pp. 322-348. 67. M. L. Bigge, Learning Theories for Teachers, 4th edition (Harper and Row, New York, 1982). 68. R. M. Bilder and F. F. LeFever, Eds., Neuroscience of the Mind on the Centennial of Freud's Project for a Scientific Psychology, Annal. NY Acad. Sci. Vol. 843 (New York Academy of Sciences, New York, 1998). 69. V. P. Bingman, L. V. Riters, R. Strasser and A. Gagliardo, Neuroethology of avian navigation, in Animal Cognition in Nature: The Convergence of Psychology and Biology in Laboratory and Field, Eds. R. P. Balda, I. M. Pepperberg and A. C. Kamil (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1998), pp. 201-226. 70. R. R. Birge, Protein-based optical computing and memories. IEEE Computer 25(11), 56-67 (1992). 71. R. R. Birge, Protein-based three-dimensional memory. American Scientist (Sigma Xi) 82, 348-355 (1994). 72. R. R. Birge, Protein-based computers. Sci. Am. 272(3), 90-95 (1995). 73. M. Blank, Ed., Electromagnetic Fields: Biological Interactions and Mechanisms, Advances in Chemistry Series, No. 250 (American Chemical Society, Washington, DC, 1995). 74. K. Blaukopf, Gustav Mahler, translated by I. Goodwin (Praeger Publishers, New York and Washington, DC, 1973). Original German version: Gustav Mahler oder der Zeitgenosse der Zukunft (Verlag Fritz Molden, Wien-München-Zürich, 1969). 75. M. A. Boden, The Creative Mind: Myths and Mechanisms (Basic Books, New York, 1991). 76. M. A. Boden, Computer models of creativity, in Handbook of Creativity, Ed. R. J. Sternberg (Cambridge University Press, Cambridge, New York and Melbourne, 1999), pp. 351-372. 77. J. E. Bogen and G. M. Bogen, Split-brains: interhemispheric exchange in
creativity, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 571-575. 78. D. Bohm, Causality and Chance in Modern Physics (University of Pennsylvania Press, Philadelphia, 1957). 79. R. Bonneau and D. Baker, Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct. 30, 173-189 (2001). 80. M.-A. Bouchiat and L. Pottier, An atomic preference between left and right. Sci. Am. 250(6), 100-111 (1984). 81. A. Boucourechliev, Stravinsky, translated by M. Cooper (Victor Gollancz, London, 1987). Original French version: Stravinsky (Librairie Artheme Fayard, 1982). 82. A. Boukreev and G. W. DeWalt, The Climb: Tragic Ambitions on Everest (St. Martin's Griffin, New York, 1999). 83. A. Bouma, Lateral Asymmetries and Hemispheric Specialization: Theoretical Models and Research (Swets and Zeitlinger, Amsterdam, Lisse, Rockland, MA, and Berwyn, PA, 1990). 84. R. B. Bourret, K. A. Borkovich and M. I. Simon, Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu. Rev. Biochem. 60, 401-441 (1991). 85. D. D. Boyden, An Introduction to Music (Alfred A. Knopf, New York, 1967). 86. J. L. Bradshaw, Hemispheric Specialization and Psychological Function (John Wiley and Sons, Chichester, New York, Brisbane, Toronto and Singapore, 1989). 87. R. S. Braich, N. Chelyapov, C. Johnson, P. W. K. Rothemund and L. Adleman, Solution of a 20-variable 3-SAT problem on a DNA computer. Science 296, 499-502 (2002). 88. M. E. Bratman, Intention, Plans, and Practical Reason (Harvard University Press, Cambridge, MA, and London, 1987). 89. M. E. Bratman, D. J. Israel and M. E. Pollack, Plans and resource-bounded practical reasoning. Computational Intelligence 4, 349-355 (1988). 90. C. Brauchle, N. Hampp and D. Oesterhelt, Optical applications of bacteriorhodopsin and its mutated variants. Adv. Mater. 3, 420-428 (1991). 91. J. Bricmont, Science of chaos or chaos in science? in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 131-175. 92. D. E. Broadbent, Perception and Communication (Pergamon Press, New York, London, Paris and Los Angeles, 1958). 93. R. A. Brooks and P. Maes, Eds., Artificial Life IV (MIT Press, Cambridge, MA, and London, 1994). 94. S. G. Brush, Kinetic Theory, Volume 2: Irreversible Processes (Pergamon Press, Oxford, London, Edinburgh, New York, Toronto, Sydney, Paris and Braunschweig, 1966). 95. S. G. Brush, The Kind of Motion We Call Heat: A History of the Kinetic Theory of Gases in the 19th Century, Book 1: Physics and the Atomists
(North-Holland Publishing, Amsterdam, New York and Oxford, 1976). 96. S. G. Brush, The Kind of Motion We Call Heat: A History of the Kinetic Theory of Gases in the 19th Century, Book 2: Statistical Physics and Irreversible Processes (North-Holland Publishing, Amsterdam, New York and Oxford, 1976). 97. M. Bunge, In praise of intolerance to charlatanism in academia, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 96-115. 98. K. Bushweller, Lessons from the analog world: what tomorrow's classrooms can learn from today. (Reprinted from Electronic School, September 2000, National School Boards Association) American Educator (American Federation of Teachers) 25(3), 30-33,45 (2001). 99. R. Byrne, The Thinking Ape: Evolutionary Origins of Intelligence (Oxford University Press, Oxford, New York and Tokyo, 1995). 100. A. Calaprice, Ed., The Expanded Quotable Einstein (Princeton University Press, Princeton, NJ, and Oxford, 2000). 101. W. H. Calvin, The emergence of intelligence. Sci. Am. 271(4), 100-107 (1994). 102. J. Cameron and W. D. Pierce, Reinforcement, reward, and intrinsic motivation: a meta-analysis. Rev. Educational Research 64, 363-423 (1994). 103. D. Campbell, The Mozart Effect: Tapping the Power of Music to Heal the Body, Strengthen the Mind, and Unlock the Creative Spirit (Avon Books, New York, 1997). 104. D. T. Campbell, Blind variation and selective retention in creative thought as in other knowledge processes. Psychological Rev. 67, 380-400 (1960). 105. S. C. Cannon, Sodium channel defects in myotonia and periodic paralysis. Annu. Rev. Neurosci. 19, 141-164 (1996). 106. F. L. Carter, Ed., Molecular Electronic Devices (Marcel Dekker, New York and Basel, 1982). 107. F. L. Carter, Ed., Molecular Electronic Devices II (Marcel Dekker, New York and Basel, 1987). 108. F. L. Carter, R. E. Siatkowski and H. Wohltjen, Eds., Molecular Electronic Devices (North-Holland Publishing, Amsterdam, New York, Oxford and Tokyo, 1988). 109. C. F. Chabris, Prelude or requiem for the 'Mozart effect'? Nature 400, 826-827 (1999). 110. R. Chagneux and N. Chalazonitis, Evaluation de l'anisotropie magnetique des cellules multimembranaires dans un champ magnetique constant (segments externes des batonnets de la retine de grenouille). C. R. Acad. Sci. Paris Ser. D274, 317-320 (1972). 111. N. Chalazonitis, R. Chagneux and A. Arvanitaki, Rotation des segments externes des photorecepteurs dans le champ magnetique constant. C. R. Acad. Sci. Paris Ser. D271, 130-133 (1970). 112. D. J. Chalmers, The Conscious Mind: In Search of a Fundamental Theory (Oxford University Press, New York and Oxford, 1996).
113. W. G. Chase and H. A. Simon, Perception in chess. Cognitive Psychology 4, 55-81 (1973). 114. W. G. Chase and H. A. Simon, The mind's eye in chess, in Visual Information Processing, Ed. W. G. Chase (Academic Press, New York, San Francisco and London, 1973), pp. 215-281. 115. J. Chen, M. A. Reed, A. M. Rawlett and J. M. Tour, Large on-off ratios and negative differential resistance in a molecular electronic device. Science 286, 1550-1552 (1999). 116. J.-C. Chen, Computer Experiments on Evolutionary Learning in a Multilevel Neuromolecular Architecture, Ph.D. Dissertation (Wayne State University, Detroit, MI, 1993). 117. J.-C. Chen and M. Conrad, A multilevel neuromolecular architecture that uses the extradimensional bypass principle to facilitate evolutionary learning. Physica D75, 417-437 (1994). 118. J.-C. Chen and M. Conrad, Learning synergy in a multilevel neuronal architecture. BioSystems 32, 111-142 (1994). 119. L. V. Cheney, Whole hog for whole math. Wall Street Journal, February 3 (1998). 120. M. Chester, Neural Networks: A Tutorial (PTR Prentice-Hall, Englewood Cliffs, NJ, 1993). 121. A. Chiabrera, E. Di Zitti, F. Costa and G. M. Bisio, Physical limits of integration and information processing in molecular systems. J. Phys. D: Appl. Phys. 22, 1571-1579 (1989). 122. V. A. Chinarov, Noisy dynamics and biocomputing with nonequilibrium neural networks, in Proceedings of the 5th International Symposium on Bioelectronic and Molecular Electronic Devices and the 6th International Conference on Molecular Electronics and Biocomputing, November 28-30, 1995, Okinawa, Japan, Ed. M. Aizawa (Research and Development Association for Future Electron Devices, Tokyo, 1995), pp. 243-246. 123. V. A. Chinarov, Y. B. Gaididei, V. N. Kharkyanen and S. P. Sit'ko, Ion pores in biological membranes as self-organized bistable systems. Physical Rev. A46, 5232-5241 (1992). 124. P. M. Churchland, Cognitive activity in artificial neural networks, in An Invitation to Cognitive Science, Volume 3: Thinking, Eds. D. N. Osherson and E. E. Smith (MIT Press, Cambridge, MA, and London, 1990), pp. 199-227. 125. P. M. Churchland, The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain (MIT Press, Cambridge, MA, and London, 1995). 126. P. M. Churchland and P. S. Churchland, Could a machine think? Sci. Am. 262(1), 32-37 (1990). 127. P. M. Churchland and P. S. Churchland, On the Contrary: Critical Essays, 1987-1997 (MIT Press, Cambridge, MA, and London, 1998). 128. P. S. Churchland, On the alleged backwards referral of experiences and its relevance to the mind-body problem. Philosophy of Science 48, 165-181 (1981).
129. P. S. Churchland and T. J. Sejnowski, The Computational Brain (MIT Press, Cambridge, MA, and London, 1992). 130. D. V. Cicchetti, The reliability of peer review for manuscript and grant submissions: a cross-disciplinary investigation. Behavioral and Brain Sciences 14, 119-186 (1991). 131. S. Clark, Polarized starlight and the handedness of life. American Scientist (Sigma Xi) 87, 336-343 (1999). 132. G. Cohen, Hemispheric differences in serial versus parallel processing. J. Exp. Psychol. 97, 349-356 (1973). 133. P. R. Cohen and H. J. Levesque, Intention is choice with commitment. Artificial Intelligence 42, 213-261 (1990). 134. M. Conrad, Microscopic-macroscopic interface in biological information processing. BioSystems 16, 345-363 (1984). 135. M. Conrad, Towards the molecular computer factory, in Molecular Electronics: Biosensors and Biocomputers, Ed. F. T. Hong (Plenum, New York and London, 1989), pp. 385-395. 136. M. Conrad, Molecular computing, in Advances in Computers, Vol. 31, Ed. M. C. Yovits (Academic Press, Boston, San Diego, New York, London, Sydney, Tokyo and Toronto, 1990), pp. 235-324. 137. M. Conrad, Toward an artificial brain. IEEE Computer 25(11), 79-80 (1992). 138. M. Conrad, Fluctuons — I. operational analysis. Chaos, Solitons and Fractals 3, 411-424 (1993). 139. M. Conrad, Fluctuons — II. electromagnetism. Chaos, Solitons and Fractals 3, 563-573 (1993). 140. M. Conrad, Fluctuons — III. gravity. Chaos, Solitons and Fractals 7, 1261-1303 (1996). 141. M. Conrad, Origin of life and the underlying physics of the universe. BioSystems 42, 177-190 (1997). 142. M. Conrad, Quantum gravity and life. BioSystems 46, 29-39 (1998). 143. M. Conrad, R. R. Kampfner, K. G. Kirby, E. N. Rizki, G. Schleis, R. Smalz and R. Trenary, Towards an artificial brain. BioSystems 23, 175-218 (1989). 144. R. Conti and T. Amabile, Motivation/drive, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 251-259. 145. J. Cottingham, Ed., Western Philosophy: An Anthology (Blackwell Publishers, Oxford and Cambridge, MA, 1996). 146. J. Cottingham, R. Stoothoff and D. Murdoch, Eds., Descartes: Selected Philosophical Writings (Cambridge University Press, Cambridge, 1988). 147. N. Cowan, The magic number 4 in short-term memory: a reconsideration of mental storage capacity. Behavioral and Brain Sciences 24, 87-185 (2000). 148. L. A. Coward, Pattern Thinking (Praeger, New York, Westport, CT, and London, 1990). 149. F. Crews, Freudian suspicion versus suspicion of Freud, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996),
pp. 470-482. 150. F. Crick, The Astonishing Hypothesis: The Scientific Search for the Soul (Charles Scribner's Sons, New York, 1994). 151. F. Crick and C. Koch, The problem of consciousness. Sci. Am. 267(3), 152-159 (1992). 152. A. J. Cropley, Fostering creativity in the classroom: general principles, in The Creativity Research Handbook, Vol. 1, Ed. M. A. Runco (Hampton Press, Cresskill, NJ, 1997), pp. 83-114. 153. H. F. Crovitz, Galton's Walk: Methods for the Analysis of Thinking, Intelligence, and Creativity (Harper and Row, New York, Evanston and London, 1970). 154. R. Crutchfield, Conformity and creative thinking, in Contemporary Approaches to Creative Thinking: A Symposium Held at the University of Colorado, Eds. H. E. Gruber, G. Terrell and M. Wertheimer (Atherton Press, New York, 1962), pp. 120-140. 155. M. Csikszentmihalyi, Creativity: Flow and the Psychology of Discovery and Invention (HarperCollins, New York, 1996). 156. D. M. Dacey, Parallel pathways for spectral coding in primate retina. Annu. Rev. Neurosci. 23, 743-775 (2000). 157. A. R. Damasio, Descartes' Errors: Emotion, Reason, and the Human Brain (G. P. Putnam's Sons, New York, 1994). 158. A. R. Damasio, A. Harrington, J. Kagan, B. S. McEwen, H. Moss and R. Shaikh, Eds., Unity of Knowledge: The Convergence of Natural and Human Science, Annal. NY Acad. Sci. Vol. 935 (New York Academy of Sciences, New York, 2001). 159. C. Darwin, The Origin of Species: By Means of Natural Selection Or The Preservation of Favored Races in the Struggle for Life, Modern Library Paperback edition (Random House, New York, 1998). 160. P. Davies, That mysterious flow. Sci. Am. 287(3), 40-47 (2002). 161. J. W. Dawson, Jr., Godel and the limits of logic. Sci. Am. 280(6), 76-81 (1999). 162. F. de Waal, Chimpanzee Politics: Power and Sex among Apes, revised edition (Johns Hopkins University Press, Baltimore, MD, and London, 1998). 163. F. B. M. de Waal, The end of nature versus nurture. Sci. Am. 281(6), 94-99 (1999). 164. B. E. Deal and J. B. Talbot, Principia Moore. Electrochem. Soc. Interface 6(1), 18-23 (1997). 165. E. L. Deci, Why We Do What We Do: The Dynamics of Personal Autonomy (G. P. Putnam's Sons, New York, 1995). 166. E. L. Deci, R. Koestner and R. M. Ryan, A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychol. Bull. 125, 627-668 (1999). 167. E. L. Deci, R. Koestner and R. M. Ryan, The undermining effect is a reality after all — extrinsic rewards, task interest, and self-determination: reply to Eisenberger, Pierce, and Cameron (1999) and Lepper, Henderlong, and Gingras (1999). Psychol. Bull. 125, 692-700 (1999).
168. E. L. Deci and R. M. Ryan, Intrinsic Motivation and Self-Determination in Human Behavior (Plenum, New York and London, 1985). 169. J. Deisenhofer and H. Michel, The photosynthetic reaction center from the purple bacterium Rhodopseudomonas viridis. Science 245, 1463-1473 (1989). 170. D. C. Dennett, Consciousness Explained (Little, Brown and Co., Boston, New York, Toronto and London, 1991). 171. D. C. Dennett and M. Kinsbourne, Time and the observer: the where and when of consciousness in the brain. Behavioral and Brain Sciences 15, 183-247 (1992). 172. R. Desimone and J. Duncan, Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193-222 (1995). 173. D. Deutsch, Ed., The Psychology of Music (Academic Press, New York, London, Paris, San Diego, San Francisco, Sao Paulo, Sydney, Tokyo and Toronto, 1982). 174. J. Dewey, How We Think (Prometheus Books, Amherst, NY, 1991). Original version (D. C. Heath, Lexington, MA, 1910). 175. C. L. Diaz de Chumaceiro, Serendipity, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 543-549. 176. J. Diggins, J. F. Ralph, T. P. Spiller, T. D. Clark, H. Prance and R. J. Prance, Chaotic dynamics in the rf superconducting quantum-interference-device magnetometer: a coupled quantum-classical system. Physical Rev. E49, 1854-1859 (1994). 177. R. Douglas, M. Mahowald and C. Mead, Neuromorphic analogue VLSI. Annu. Rev. Neurosci. 18, 255-281 (1995). 178. K. E. Drexler, Machine-phase nanotechnology. Sci. Am. 285(3), 74-75 (2001). 179. R. A. Dudley and H. S. Luft, Managed care in transition. N. Eng. J. Med. 344, 1087-1092 (2001). 180. P. Duhem, The Aim and Structure of Physical Theory, translated by P. P. Wiener (Princeton University Press, Princeton, NJ, 1954). Original French version: La Théorie Physique: Son Objet, Sa Structure, 2nd edition (Marcel Riviere and Cie, Paris, 1914). 181. C. S. Dulcey, J. H. Georger, Jr., V. Krauthamer, D. A. Stenger, T. L. Fare and J. M. Calvert, Deep UV photochemistry of chemisorbed monolayers: patterned coplanar molecular assemblies. Science 252, 551-554 (1991). 182. S. D. Durrenberger, Mad genius controversy, in Encyclopedia of Creativity, Vol. 1, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 169-177. 183. J. Earman, A Primer on Determinism, University of Western Ontario Series in Philosophy of Science, Vol. 32 (D. Reidel Publishing, Dordrecht, Boston, Lancaster and Tokyo, 1986). 184. J. C. Eccles, The Neurophysiological Basis of Mind: The Principles of Neurophysiology (Oxford University Press, Oxford, 1953). 185. J. C. Eccles, The Understanding of the Brain, 2nd edition (McGraw-Hill,
New York, 1977). 186. J. C. Eccles, The Human Mystery (Springer-Verlag, Berlin, Heidelberg and New York, 1979). 187. J. C. Eccles, How the Self Controls Its Brain (Springer-Verlag, Berlin, Heidelberg and New York, 1994). 188. G. M. Edelman, The Remembered Present: A Biological Theory of Consciousness (Basic Books, New York, 1989). 189. G. M. Edelman and G. Tononi, A Universe of Consciousness: How Matter Becomes Imagination (Basic Books, New York, 2000). 190. B. Edwards, Drawing on the Right Side of the Brain: A Course in Enhancing Creativity and Artistic Confidence (Jeremy P. Tarcher, Los Angeles, 1979). Revised edition (1989). 191. R. Eisenberger, S. Armeli and J. Pretz, Can the promise of reward increase activity? J. Personality and Social Psychol. 74, 704-714 (1998). 192. R. Eisenberger and J. Cameron, Detrimental effects of reward: reality or myth? American Psychologist 51, 1153-1166 (1996). 193. R. Eisenberger and J. Cameron, Reward, intrinsic interest, and creativity: new findings. American Psychologist 53, 676-679 (1998). 194. R. Eisenberger, W. D. Pierce and J. Cameron, Effects of reward on intrinsic motivation — negative, neutral, and positive: comment on Deci, Koestner, and Ryan (1999). Psychol. Bull. 125, 677-691 (1999). 195. C. Emmeche, S. Køppe and F. Stjernfelt, Levels, emergence, and three versions of downward causation, in Downward Causation: Minds, Bodies and Matter, Eds. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen (Aarhus University Press, Aarhus, Denmark, 2000), pp. 13-34. 196. D. Erwin, J. Valentine and D. Jablonski, The origin of animal body plans. American Scientist (Sigma Xi) 85, 126-137 (1997). 197. Evidence-Based Medicine Working Group, Evidence-based medicine: a new approach to teaching the practice of medicine. J. Amer. Med. Assoc. (JAMA) 268, 2420-2425 (1992). 198. S. L. Farber, Identical Twins Reared Apart: A Reanalysis (Basic Books, New York, 1981). 199. A. Faris, Jacques Offenbach (Charles Scribner's Sons, New York, 1981). 200. E. A. Feigenbaum, The simulation of verbal learning behavior, in Computers and Thought, Eds. E. A. Feigenbaum and J. Feldman (McGraw-Hill, New York, San Francisco, Toronto and London, 1963), pp. 297-309. 201. E. A. Feigenbaum and H. A. Simon, EPAM-like models of recognition and learning. Cognitive Science 8, 305-336 (1984). 202. R. Feynman, The Character of Physical Law (MIT Press, Cambridge, MA, and London, 1967). 203. M. Fink, Electroshock revisited. American Scientist (Sigma Xi) 88, 162-167 (2000). 204. N. O. Finnemann, Rule-based and rule-generating systems, in Downward Causation: Minds, Bodies and Matter, Eds. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen (Aarhus University Press, Aarhus,
Denmark, 2000), pp. 278-302. 205. R. Flacks, Should colleges drop the SAT?: yes, it's a move that will improve college admissions. AFT On Campus (American Federation of Teachers) 20(8), 4 (2001). 206. O. Flanagan, Consciousness Reconsidered (MIT Press, Cambridge, MA, and London, 1992). 207. D. B. Fogel, Ed., Special Issue — Prisoner's Dilemma. BioSystems 37, 1-176 (1996). 208. D. B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, 2nd edition (IEEE, New York, 2000). 209. S. Forrest, P. Burrows and M. Thompson, The dawn of organic electronics. IEEE Spectrum 37(8), 29-34 (2000). 210. J. S. Foster, J. E. Frommer and P. C. Arnett, Molecular manipulation using a tunnelling microscope. Nature 331, 324-326 (1988). 211. L. R. Frank, Ed., Random House Webster's Quotationary (Random House, New York, 2001). 212. J. Franklin, The Science of Conjecture: Evidence and Probability before Pascal (Johns Hopkins University Press, Baltimore and London, 2001). 213. S. J. Freeland and L. D. Hurst, The genetic code is one in a million. J. Mol. Evol. 47, 238-248 (1998). 214. S. J. Freeland and L. D. Hurst, Evolution encoded. Sci. Am. 290(4), 84-91 (2004). 215. S. Freud, The Interpretation of Dreams, translated by A. A. Brill (Random House Modern Library, New York, 1950). 216. L. Fu, Neural Networks in Computer Intelligence (McGraw-Hill, New York, St. Louis, San Francisco, Auckland, Bogota, Caracas, Lisbon, London, Madrid, Mexico City, Milan, Montreal, New Delhi, San Juan, Singapore, Sydney, Tokyo and Toronto, 1994). 217. J. M. Fuster and J. P. Jervey, Inferotemporal neurons distinguish and retain behaviorally relevant features of visual stimuli. Science 212, 952-955 (1981). 218. L. Galvani, De Viribus Electricitatis in Motu Musculari (Arnaldo Forni Editore, Bologna, 1998). Accompanying English text: Galvani's Commentary on the Effects of Electricity on Muscular Motion, translated by M. G. Foley (Burndy Library, Norwalk, CT, 1953). 219. B. T. Gardner and R. A. Gardner, Two-way communication with an infant chimpanzee, in Behavior of Nonhuman Primates, Vol. 4, Eds. A. M. Schrier and F. Stollnitz (Academic Press, New York and London, 1971), pp. 117-184. 220. D. P. Gardner, Ed., A Nation At Risk: The Imperative for Educational Reform (U.S. Government Printing Office, Washington, DC, 1983). 221. H. Gardner, Frames of Mind: The Theory of Multiple Intelligences (Basic Books, New York, 1983). 222. H. Gardner, The Mind's New Science: A History of the Cognitive Revolution (Basic Books, New York, 1985). 223. H. Gardner, The Unschooled Mind: How Children Think and How Schools Should Teach (Basic Books, New York, 1991).
224. H. Gardner, Creating Minds: An Anatomy of Creativity Seen Through the Lives of Freud, Einstein, Picasso, Stravinsky, Eliot, Graham, and Gandhi (Basic Books, New York, 1993). 225. H. Gardner, Intelligence Reframed: Multiple Intelligences for the 21st Century (Basic Books, New York, 1999). 226. M. Gardner, Mathematical games: the fantastic combinations of John Conway's new solitaire game "life." Sci. Am. 223(4), 120-123 (1970). 227. H. G. Gauch, Jr., Prediction, parsimony and noise. American Scientist (Sigma Xi) 81, 468-478 (1993). 228. D. Gentner, K. J. Holyoak and B. N. Kokinov, Eds., The Analogical Mind: Perspectives from Cognitive Science (MIT Press, Cambridge, MA, and London, 2001). 229. M. Georgeff and A. Rao, Rational software agents: from theory to practice, in Agent Technology: Foundations, Applications, and Markets, Eds. N. R. Jennings and M. J. Wooldridge (Springer-Verlag, Berlin, Heidelberg and New York, 1998), pp. 139-160. 230. N. Geschwind and A. M. Galaburda, Cerebral Lateralization: Biological Mechanisms, Associations, and Pathology (MIT Press, Cambridge, MA, and London, 1987). 231. J. W. Getzels, Creativity and human development, in The International Encyclopedia of Education: Research and Studies, Vol. 2, 1st edition, Eds. T. Husen and T. N. Postlethwaite (Pergamon Press, Oxford, New York, Toronto, Sydney, Paris and Frankfurt, 1985), pp. 1093-1100. 232. B. Ghiselin, Ed., The Creative Process: A Symposium (University of California Press, Berkeley, Los Angeles and London, 1985). 233. J. Giarratano and G. Riley, Expert Systems: Principles and Programming (PWS-KENT Publishing, Boston, 1989). 234. W. W. Gibbs, Cybernetic cells. Sci. Am. 285(2), 52-57 (2001). 235. W. W. Gibbs and D. Fox, The false crisis in science education. Sci. Am. 281(4), 86-93 (1999). 236. W. R. Giles and Y. Imaizumi, Comparison of potassium currents in rabbit atrial and ventricular cells. J. Physiol. 405, 123-145 (1988). 237. D. Gillies, Philosophy of Science in the Twentieth Century: Four Central Themes (Blackwell, Oxford and Cambridge, MA, 1993). 238. A. G. Gilman, G proteins and dual control of adenylate cyclase. Cell 36, 577-579 (1984). 239. J. Glanz, How not to pick a physicist? Science 274, 710-712 (1996). 240. J. Gleick, Chaos: Making a New Science (Penguin Books, New York, 1987). 241. J. Gleick, Genius: The Life and Science of Richard Feynman (Vintage Books, New York, 1993). 242. J. A. Glover, R. R. Ronning and C. R. Reynolds, Eds., Handbook of Creativity (Plenum, New York and London, 1989). 243. K. Godel, On Formally Undecidable Propositions of Principia Mathematics and Related Systems, translated by B. Meltzer (Dover, New York, 1992). Original German version: Uber formal unentscheidbare Satze der Principia Mathematica und verwandter Systeme I. Monatshefte fur Mathematik und
Physik 38, 173-198 (1931). 244. E. Goldberg and L. D. Costa, Hemispheric differences in the acquisition and use of descriptive systems. Brain and Language 14, 144-173 (1981). 245. E. Goldberg and K. Podell, Lateralization in the frontal lobe, in Epilepsy and the Functional Anatomy of the Frontal Lobe, Eds. H. H. Jasper, S. Riggio and P. S. Goldman-Rakic (Raven Press, New York, 1995), pp. 85-96. 246. E. Goldberg, K. Podell and M. Lovell, Lateralization of frontal lobe functions and cognitive novelty. J. Neuropsychiatr. 6, 371-378 (1994). 247. E. Goldberg, H. G. Vaugham and L. J. Gerstman, Nonverbal descriptive systems and hemispheric asymmetry: shape versus texture discrimination. Brain and Language 5, 249-257 (1978). 248. A. Goldman, Action and free will, in An Invitation to Cognitive Science, Volume 2: Visual Cognition and Action, Eds. D. N. Osherson, S. M. Kosslyn and J. M. Hollerbach (MIT Press, Cambridge, MA, and London, 1990), pp. 317-340. 249. P. S. Goldman-Rakic, Working memory and the mind. Sci. Am. 267(3), 110-117 (1992). 250. P. S. Goldman-Rakic, S. Funahashi and C. J. Bruce, Neocortical memory circuits. Cold Spring Harbor Symp. Quant. Biol. 55, 1025-1038 (1990). 251. S. Goldstein, Quantum philosophy: the flight from reason in science, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 119-125. 252. D. Gooding, Experiment and the Making of Meaning: Human Agency in Scientific Observation and Experiment (Kluwer Academic Publishers, Dordrecht, Boston and London, 1990). 253. D. Gooding, Mapping experiment as a learning process: how the first electromagnetic motor was invented. Science, Technology and Human Values 15, 165-201 (1990). 254. M. Goodman, Ed., Macromolecular Sequences in Systematic and Evolutionary Biology (Plenum, New York and London, 1982). 255. M. Goodman, R. E. Tashian and J. H. Tashian, Molecular Anthropology: Genes and Proteins in the Evolutionary Ascent of the Primates (Plenum, New York and London, 1976). 256. H. W. Gordon, Cognitive asymmetry in dyslexic families. Neuropsychologia 18, 645-656 (1980). 257. H. W. Gordon, The learning disabled are cognitively right, in Brain Basis of Learning Disabilities, Topics in Learning and Learning Disabilities, Vol. 3, No. 1, Ed. M. Kinsbourne (Aspen Systems Corporation, Rockville, MD, 1983), pp. 29-39. 258. T. Grandin, Thinking in Pictures: And Other Reports From My Life With Autism (Vintage Books/Random House, New York, 1996). 259. R. G. Grant, Flight: 100 Years of Aviation (DK Publishing, London, New York, Munich, Melbourne and Delhi, 2002). 260. C. Green and C. McCreery, Lucid Dreaming: The Paradox of Consciousness During Sleep (Routledge, London and New York, 1994).
261. E. Greenbaum, J. W. Lee, C. V. Tevault, S. L. Blankinship and L. J. Mets, CO2 fixation and photoevolution of H2 and O2 in a mutant of Chlamydomonas lacking photosystem I. Nature 376, 438-441 (1995). 262. R. J. Greenspan, Understanding the genetic construction of behavior. Sci. Am. 272(4), 72-78 (1995). 263. D. R. Griffin, The Question of Animal Awareness: Evolutionary Continuity of Mental Experience, 2nd edition (Rockefeller University Press, New York, 1981). 264. D. R. Griffin, Animal Minds: Beyond Cognition to Consciousness, 2nd edition (University of Chicago Press, Chicago and London, 2001). 265. S. Grossberg, Competitive learning: from interactive activation to adaptive resonance. Cognitive Science 11, 23-63 (1987). 266. A. Griinbaum, Precis of The Foundations of Psychoanalysis: A Philosophical Critique. Behavioral and Brain Sciences 9, 217-284 (1986). 267. J. P. Guilford, Traits of creativity, in Creativity and Its Cultivation, Ed. H. H. Anderson (Harper, New York, 1959), pp. 142-161. 268. J. P. Guilford, Intelligence, Creativity, and Their Educational Implications (R. R. Knapp, San Diego, CA, 1968). 269. K. Gurney, An Introduction to Neural Networks (University College London Press, London, 1997). 270. M. C. Gutzwiller, Quantum chaos. Sci. Am. 266(1), 78-84 (1992). 271. E. Gwinner and W. Wiltschko, Endogenously controlled changes in migratory direction of the garden warbler, Sylvia borin. J. Comp. Physiol. 125, 267-273 (1978). 272. J. Hadamard, The Mathematician's Mind: The Psychology of Invention in the Mathematical Field (Princeton University Press, Princeton, NJ, 1996). Original version: The Psychology of Invention in the Mathematical Field (Princeton University Press, Princeton, NJ, 1945). 273. P. A. Haensly and C. R. Reynolds, Creativity and intelligence, in Handbook of Creativity, Eds. J. A. Glover, R. R. Ronning and C. R. Reynolds (Plenum, New York and London, 1989), pp. 111-132. 274. H. Haken, Synergetics: An Introduction: Nonequilibrium Phase Transitions and Self-Organization in Physics, Chemistry, and Biology, 3rd edition (Springer-Verlag, Berlin, Heidelberg, New York and Tokyo, 1983). 275. H. Haken, Advanced Synergetics: Instability Hierarchies of Self-Organizing Systems and Devices (Springer-Verlag, Berlin, Heidelberg, New York and Tokyo, 1983). 276. N. Hampp, Bacteriorhodopsin as a photochromic retinal protein for optical memories. Chemical Rev. 100, 1755-1776 (2000). 277. N. Hampp, C. Brauchle and D. Oesterhelt, Bacteriorhodopsin wildtype and variant aspartate-96 —> asparagine as reversible holographic media. Biophys. J. 58, 83-93 (1990). 278. N. Hampp and D. Zeisel, Mutated bacteriorhodopsins: versatile media in optical image processing. IEEE Eng. Med. Biol. Magaz. 13(1), 67-74,111 (1994). 279. Y. Hanyu and G. Matsumoto, Spatial long-range interactions in squid giant
axons. Physica D49, 198-213 (1991). 280. D. P. Harland and R. R. Jackson, Cues by which Portia fimbriata, an araneophagic jumping spider, distinguishes jump-spider prey from other prey. J. Exp. Biol. 203, 3485-3494 (2000). 281. L. J. Harris, Right-brain training: some reflections on the application of research on cerebral hemispheric specialization to education, in Brain Lateralization in Children: Developmental Implications, Eds. D. L. Molfese and S. J. Segalowitz (Guilford Press, New York and London, 1988), pp. 207-235. 282. H. K. Hartline, H. G. Wagner and E. F. MacNichol, Jr., The peripheral origin of nervous activity in the visual system. Cold Spring Harbor Symp. Quant. Biol. 17, 125-141 (1952). 283. K. Hatori, H. Honda, K. Shimada and K. Matsuno, Staggered movement of an actin filament sliding on myosin molecules in the presence of ATP. Biophys. Chem. 70, 241-245 (1998). 284. M. D. Hauser, What do animals think about numbers? American Scientist (Sigma Xi) 88, 144-151 (2000). 285. M. D. Hauser, Wild Minds: What Animals Really Think (Henry Holt and Company, New York, 2000). 286. S. Hawking, Black Holes and Baby Universes and Other Essays (Bantam Books, New York, Toronto, London, Sydney and Auckland, 1993). 287. S. Hawking, A Brief History of Time, updated and expanded 10th edition (Bantam Books, New York, London, Toronto, Sydney and Auckland, 1998). 288. B. Hayes, Randomness as a resource. American Scientist (Sigma Xi) 89, 300-304 (2001). 289. J. R. Hayes, Cognitive processes in creativity, in Handbook of Creativity, Eds. J. A. Glover, R. R. Ronning and C. R. Reynolds (Plenum, New York and London, 1989), pp. 135-145. 290. J. R. Hayes and H. A. Simon, Understanding written problem instructions, in Knowledge and Cognition, Ed. L. W. Gregg (Lawrence Erlbaum Associates, Potomac, MD, 1974), pp. 167-200. 291. R. A. Hegstrom and D. K. Kondepudi, The handedness of the universe. Sci. Am. 262(1), 108-115 (1990). 292. W. Heiligenberg, The neural basis of behavior: a neuroethological view. Annu. Rev. Neurosci. 14, 247-267 (1991). 293. W. Heisenberg, Physics and Philosophy: The Revolution in Modern Science (Harper and Row, New York, 1958). 294. B. S. Held, Constructivism in psychotherapy: truth and consequences, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 198-206. 295. J. B. Hellige, Hemispheric Asymmetry: What's Right and What's Left (Harvard University Press, Cambridge, MA, and London, 1993). 296. G. R. Hendren, Using sign language to access right brain communication: a tool for teachers. J. Creative Behavior 23, 116-120 (1989). 297. B. R. Hergenhahn, An Introduction to Theories of Learning, 2nd edition (Prentice-Hall, Englewood Cliffs, NJ, 1982).
298. C. Heyes and D. L. Hull, Selection Theory and Social Construction: The Evolutionary Naturalistic Epistemology of Donald T. Campbell (State University of New York Press, Albany, NY, 2001). 299. W. Hildesheimer, Mozart, translated by M. Faber (Farrar Straus Giroux, New York, 1982). Original German version: Mozart (Suhrkamp Verlag, Frankfurt am Main, 1977). 300. B. Hileman, Precautionary principle debate. ACS Chemical and Engineering News 80(16), 24-26 (2002). 301. D. W. Hilgemann, Cytoplasmic ATP-dependent regulation of ion transporters and channels: mechanisms and messengers. Annu. Rev. Physiol. 59, 193-220 (1997). 302. A. L. Hodgkin and A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500-544 (1952). 303. J. P. E. Hodgson, Knowledge Representation and Language in AI (Ellis Horwood, New York, London, Toronto, Sydney, Tokyo and Singapore, 1991). 304. D. R. Hofstadter, Metamagical Themas: Questing for the Essence of Mind and Pattern (Basic Books, New York, 1985). 305. D. R. Hofstadter, Godel, Escher, Bach: An Eternal Golden Braid (Vintage Books, New York, 1989). 306. D. R. Hofstadter, Ed., Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought (Basic Books, New York, 1995). 307. D. R. Hofstadter and D. C. Dennett, Eds., The Mind's I: Fantasies and Reflections on Self and Soul (Basic Books, New York, 1981). 308. C. J. Hogan, The Little Book of the Big Bang: A Cosmic Primer (SpringerVerlag, New York, 1998). 309. C. J. Hogan, Observing the beginning of time. American Scientist (Sigma Xi) 90, 420-427 (2002). 310. J. Hogan, Can science explain consciousness? Sci. Am. 271(1), 88-94 (1994). 311. J. H. Holland, K. J. Holyoak, R. E. Nisbett and P. R. Thagard, Induction: Processes of Inference, Learning, and Discovery (MIT Press, Cambridge, MA, and London, 1986). 312. E. Holmes, The Life of Mozart: Including His Correspondence, Ed. C. Hogwood (The Folio Society, London, 1991). Original edition (Chapman and Hall, London, 1845). 313. K. J. Holyoak, Problem solving, in An Invitation to Cognitive Science, Volume 3: Thinking, Eds. D. N. Osherson and E. E. Smith (MIT Press, Cambridge, MA, and London, 1990), pp. 117-146. 314. F. H. Hong, M. Chang, B. Ni, R. B. Needleman and F. T. Hong, Genetically modified bacteriorhodopsin as a bioelectronic material, in Biomolecular Materials by Design, Materials Research Society Symposium Proceedings, Vol. 330, Eds. M. Alper, H. Bayley, D. Kaplan and M. Navia (Materials Research Society, Pittsburgh, PA, 1994), pp. 257-262. 315. F. T. Hong, Magnetic anisotropy of the visual pigment rhodopsin. Biophys.
J. 29, 343-346 (1980). 316. F. T. Hong, Electrochemical evaluation of various membrane reconstitution techniques, in Proceedings of the 13th Annual Northeast Bioengineering Conference, March 12-13, 1987, Philadelphia, PA, Ed. K. R. Foster (IEEE, Washington, DC, 1987), pp. 304-307. 317. F. T. Hong, Relevance of light-induced charge displacements in molecular electronics: design principles at the supramolecular level. J. Mol. Electronics 5, 163-185 (1989). 318. F. T. Hong, Does nature utilize a common design for photoactive transport and sensor proteins? in Molecular Electronics: Materials and Methods, Ed. P. I. Lazarev (Kluwer Academic Publishers, Dordrecht, Boston, and London, 1991), pp. 291-310. 319. F. T. Hong, Intelligent materials and intelligent microstructures in photobiology. Nanobiol. 1, 39-60 (1992). 320. F. T. Hong, Molecular electronics: science and technology for the future. IEEE Eng. Med. Biol. Magaz. 13(1), 25-32 (1994). 321. F. T. Hong, Magnetic field effects on biomolecules, cells, and living organisms. BioSystems 36, 187-229 (1995). 322. F. T. Hong, Molecular sensors based on the photoelectric effect of bacteriorhodopsin: origin of differential responsivity. Mater. Sci. Engg. C4, 267-285 (1997). Reprinted with erratum, Mater. Sci. Engg. C5, 61-79 (1997). 323. F. T. Hong, Control laws in the mesoscopic processes of biocomputing, in Information Processing in Cells and Tissues, Eds. M. Holcombe and R. Paton (Plenum, New York and London, 1998), pp. 227-242. 324. F. T. Hong, A survival guide to cope with information explosion in the 21st century: picture-based vs. rule-based learning ( ). 21st Webzine 3(4), Speed Section (1998). 325. F. T. Hong, Interfacial photochemistry of retinal proteins. Prog. Surface Sci. 62, 1-237 (1999). 326. F. T. Hong, Towards physical dynamic tolerance: an approach to resolve the conflict between free will and physical determinism. BioSystems 68, 85-105 (2003). 327. F. T. Hong, The enigma of creative problem solving: a biocomputing perspective, in Molecular Electronics: Bio-sensors and Bio-computers, NATO Science Series II, Vol. 96, Eds. L. Barsanti, V. Evangelista, P. Gualtieri, V. Passarelli and S. Vestri (Kluwer Academic Publishers, Dordrecht, Boston and London, 2003), pp. 457-542. 328. F. T. Hong, Interfacial photochemistry of biomembranes, in Handbook of Photochemistry and Photobiology, Vol. 4, Ed. H. S. Nalwa (American Scientific Publishers, Stevenson Ranch, CA, 2003), pp. 383-430. 329. F. T. Hong, The early receptor potential and its analog in bacteriorhodopsin membranes, in CRC Handbook of Organic Photochemistry and Photobiology, 2nd edition, Eds. W. Horspool and F. Lenci (CRC Press, Boca Raton, London, New York and Washington, DC, 2004), pp. 128-1-128-25. 330. F. T. Hong, D. Mauzerall and A. Mauro, Magnetic anisotropy and the
orientation of retinal rods in a homogeneous magnetic field. Proc. Natl. Acad. Sci. USA 68, 1283-1285 (1971). 331. W. K. Honig and R. K. R. Thompson, Retrospective and prospective processing in animal working memory, in The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 16, Ed. G. H. Bower (Academic Press, New York, London, Paris and San Diego, 1982), pp. 239-283. 332. J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982). 333. B. L. Horan, The statistical character of evolutionary theory. Philosophy of Science 61, 76-95 (1994). 334. J. K. H. Horber, Local probe techniques, in Atomic Force Microscopy in Cell Biology, Methods in Cell Biology, Vol. 68, Eds. B. P. Jena and J. K. H. Horber (Academic Press, Amsterdam, Boston, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney and Tokyo, 2002), pp. 1-31. 335. J. K. H. Horber and M. J. Miles, Scanning probe evolution in biology. Science 302, 1002-1005 (2003). 336. M. J. A. Howe, Fragments of Genius: The Strange Feats of Idiots Savants (Routledge, London and New York, 1989). 337. Y. H. Hu and J.-N. Hwang, Handbook of Neural Network Signal Processing (CRC Press, Boca Raton, London, New York and Washington, DC, 2002). 338. D. Hume, A Treatise of Human Nature, 2nd edition (analytical index by L. A. Selby-Bigge, and text revision and notes by P. H. Nidditch) (Oxford University Press, Oxford and New York, 1978). 339. K. Ihara, T. Umemura, I. Katagiri, T. Kitajima-Ihara, Y. Sugiyama, Y. Kimura and Y. Mukohata, Evolution of the archaeal rhodopsins: evolution rate changes by gene duplication and functional differentiation. J. Mol. Biol. 285, 163-174 (1999). 340. M. F. Ippolito and R. D. Tweney, The inception of insight, in The Nature of Insight, Eds. R. J. Sternberg and J. E. Davidson (MIT Press, Cambridge, MA, and London, 1995), pp. 433-462. 341. S. W. Itzkoff, The Decline of Intelligence in America: A Strategy for National Renewal (Praeger, Westport, CT, and London, 1994). 342. P. Jackson, Introduction to Expert Systems, 3rd edition (Addison-Wesley, Harlow, UK, Reading, MA, New York, Amsterdam, and Tokyo, 1999). 343. R. R. Jackson, Portia spider: mistress of deception. National Geographic Magaz. 190(5), 104-115 (1996). 344. R. R. Jackson and A. D. Blest, The biology of Portia fimbriata, a web-building jumping spider (Araneae, Salticidae) from Queensland: utilization of webs and predatory versatility. J. Zool. (London) 196, 255-293 (1982). 345. R. R. Jackson and R. S. Wilcox, Spider-eating spiders. American Scientist (Sigma Xi) 86, 350-357 (1998). 346. M. Jacobs, On the nature of discovery. ACS Chemical and Engineering News 80(28), 3 (2002). 347. W. James, The dilemma of determinism, in The Will to Believe and Other
Essays in Popular Philosophy (Longmans, Green and Co., London, New York and Toronto, 1937). Reprinted along with Human Immortality: Two Supposed Objections to the Doctrine, 2nd edition (Dover, New York, 1956), pp. 145-183. 348. K. R. Jamison, Touched with Fire: Manic-Depressive Illness and the Artistic Temperament (Free Press, New York, and Maxwell Macmillan Canada, Toronto, 1993). 349. K. R. Jamison, Manic-depressive illness and creativity. Sci. Am. 272(2), 62-67 (1995). 350. W. H. Jefferys and J. O. Berger, Ockham's razor and Bayesian analysis. American Scientist (Sigma Xi) 80, 64-72 (1992). 351. N. R. Jennings and M. J. Wooldridge, Applications of intelligent agents, in Agent Technology: Foundations, Applications, and Markets, Eds. N. R. Jennings and M. J. Wooldridge (Springer-Verlag, Berlin, Heidelberg and New York, 1998), pp. 3-28. 352. N. Juel-Nielsen, Individual and Environment: Monozygotic Twins Reared Apart, revised edition (International Universities Press, New York, 1980). 353. C. G. Jung and M.-L. von Franz, Eds., Man and his Symbols (Doubleday, New York, London, Toronto, Sydney and Auckland, 1964). 354. Y. Y. Kagan and L. Knopoff, Stochastic synthesis of earthquake catalogs. J. Geophys. Res. 86, 2853-2862 (1981). 355. E. R. Kandel, J. H. Schwartz and T. M. Jessell, Principles of Neural Science, 4th edition (McGraw Hill, New York, London, Milan, Sydney, Tokyo and Toronto, 2000). 356. I. Kant, Critique of Pure Reason, translated by N. K. Smith (St. Martin's Press, New York, 1965). Original German version: Kritik der reinen Vernunft (Johann Friedrich Hartknoch, Riga, 1st edition (1781), 2nd edition (1787)). 357. B. Kantrowitz and D. McGinn, When teachers are cheaters. Newsweek 135(25), 48-49 (2000). 358. R. Kapral and K. Showalter, Eds., Chemical Waves and Patterns (Kluwer Academic Publishers, Dordrecht and Boston, 1995). 359. S. Kastner and L. G. Ungerleider, Mechanisms of visual attention in the human cortex. Annu. Rev. Neurosci. 23, 315-341 (2000). 360. A. Katchalsky and P. F. Curran, Nonequilibrium Thermodynamics in Biophysics (Harvard University Press, Cambridge, MA, 1967). 361. A. N. Katz, Creativity and the cerebral hemisphere, in The Creativity Research Handbook, Vol. 1, Ed. M. A. Runco (Hampton Press, Cresskill, NJ, 1997), pp. 203-226. 362. S. A. Kauffman, Antichaos and adaptation. Sci. Am. 265(2), 78-84 (1991). 363. S. A. Kauffman, The Origins of Order: Self-Organization and Selection in Evolution (Oxford University Press, New York and Oxford, 1993). 364. S. A. Kauffman, At Home in the Universe: The Search for the Laws of Self-Organization and Complexity (Oxford University Press, New York and Oxford, 1995). 365. L. Keszthelyi, Asymmetries of nature and the origin of biomolecular hand-
edness. BioSystems 20, 15-19 (1987). 366. L. Keszthelyi, Origin of the homochirality of biomolecules. Quarterly Rev. Biophys. 28, 473-507 (1995). 367. R. W. Keyes, Physical limits in digital electronics. Proc. IEEE 63, 740-767 (1975). 368. R. Keynes and R. Krumlauf, Hox genes and regionalization of the nervous system. Annu. Rev. Neurosi. 17, 109-132 (1994). 369. C. A. Kiesler, Should colleges drop the SAT?: no, don't deify the SAT, but don't demonize it either. AFT On Campus (American Federation of Teachers) 20(8), 4 (2001). 370. J. Kim, Making sense of downward causation, in Downward Causation: Minds, Bodies and Matter, Eds. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen (Aarhus University Press, Aarhus, Denmark, 2000), pp. 305-321. 371. M. Kinsbourne, Taking the Project seriously: the unconscious in neuroscience perspective, in Neuroscience of the Mind on the Centennial of Freud's Project for a Scientific Psychology, Annal. NY Acad. Sci. Vol. 843, Eds. R. M. Bilder and F. F. LeFever (New York Academy of Sciences, New York, 1998), pp. 111-115. 372. K. Kitamura, M. Tokunaga, A. H. Iwane and T. Yanagida, A single myosin head moves along an actin filament with regular steps of 5.3 nanometers. Nature 397, 129-134 (1999). 373. H. Klein, P. Berthold and E. Gwinner, Der Zug europaischer Garten- und Monchsgrasmucken (Sylvia borin und S. atricapilla). Vogelwarte 27, 73-134 (1973). 374. D. Kleppner, Physics and common nonsense, in The Flight from Science and Reason, Annal. NY Acad. Sci. Vol. 775, Eds. P. R. Gross, N. Levitt and M. W. Lewis (New York Academy of Sciences, New York, 1996), pp. 126-130. 375. R. D. Knight, S. J. Freeland and L. F. Landweber, Selection, history and chemistry: the three faces of the genetic code. Trends Biochem. Sci. 24, 241-247 (1999). 376. A. Koestler, The Act of Creation (Arkana, Penguin Books, London, 1989). Original edition (Hutchinson, London, 1964). 377. W. Kohler, The Mentality of Apes, translated by E. Winter (Kegan Paul, Trench, Trubner and Co., London, 1925). Reprinted (Routledge, London, 2000). 378. D. K. Kondepudi and G. W. Nelson, Weak neutral currents and the origin of biomolecular chirality. Nature 314, 438-441 (1985). 379. Y. E. Korchev, C. L. Bashford, G. M. Alder, P. Y. Apel, D. T. Edmonds, A. A. Lev, K. Nandi, A. V. Zima and C. A. Pasternak, A novel explanation for fluctuations of ion current through narrow pores. FASEB J. 11, 600-608 (1997). 380. K. P. Kording and D. M. Wolpert, Bayesian integration in sensorimotor learning. Nature 427, 244-247 (2004). 381. E. Korner, U. Korner and G. Matsumoto, Top-down selforganization of semantic constraints for knowledge representation in autonomous systems:
a model on the role of an emotional system in brains. Bull. Electrotechnical Lab. (Tsukuba) 60(7), 405-409 (1996). 382. E. Korner and G. Matsumoto, Cortical architecture and self-referential control for brain-like computation: a new approach to understanding how the brain organizes computation. IEEE Eng. Med. Biol. Magaz. 21, 121-133 (2002). 383. D. Koruga, Creativity as an experience of the beautiful and sublime, in Creativity and World of Man, Proceedings of the 9th International Congress of Aesthetics, Ed. M. Damnjanovic (Inter-University Center, Dubrovnik, 1990), pp. 320-326. 384. B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence (Prentice-Hall, Englewood Cliffs, NJ, 1991). 385. B. Kosko, Fuzzy Thinking: The New Science of Fuzzy Logic (Hyperion, New York, 1993). 386. S. M. Kosslyn, Image and Mind (Harvard University Press, Cambridge, MA, and London, 1980). 387. S. M. Kosslyn, Aspects of cognitive neuroscience of mental imagery. Science 240, 1621-1626 (1988). 388. S. M. Kosslyn, Mental imagery, in An Invitation to Cognitive Science, Volume 2: Visual Cognition and Action, Eds. D. N. Osherson, S. M. Kosslyn and J. M. Hollerbach (MIT Press, Cambridge, MA, and London, 1990), pp. 73-97. 389. S. M. Kosslyn, Image and Brain: The Resolution of the Imagery Debate (MIT Press, Cambridge, MA, and London, 1994). 390. N. Krauß, W.-D. Schubert, O. Klukas, P. Fromme, H. T. Witt and W. Saenger, Photosystem I at 4 Å resolution represents the first structural model of a joint photosynthetic reaction centre and core antenna system. Nature Struct. Biol. 3, 965-973 (1996). 391. E. Kris, Psychoanalytic Explorations in Art (International Universities Press, New York, 1952). 392. P. K. Kuhl, J. E. Andruski, I. A. Chistovich, L. A. Chistovich, E. V. Kozhevnikova, V. L. Ryskina, E. I. Stolyarova, U. Sundberg and F. Lacerda, Cross-language analysis of phonetic units in language addressed to infants. Science 277, 684-686 (1997). 393. P. K. Kuhl, F.-M. Tsao, H.-M. Liu, Y. Zhang and B. de Boer, Language/culture/mind/brain: progress at the margins between disciplines, in Unity of Knowledge: The Convergence of Natural and Human Science, Annal. NY Acad. Sci. Vol. 935, Eds. A. R. Damasio, A. Harrington, J. Kagan, B. S. McEwen, H. Moss and R. Shaikh (New York Academy of Sciences, New York, 2001), pp. 136-174. 394. H. Kuhn, Systems of monomolecular layers — assembling and physicochemical behavior. Angew. Chem. Internat. Edit. 10, 620-637 (1971). 395. H. Kuhn, Molecular engineering — a begin and an endeavour, in Proceedings of the International Symposium on Future Electron Devices: Bioelectronic and Molecular Electronic Devices, November 20-21, 1985, Tokyo, Ed. M. Aizawa (Research and Development Association for Future Electron De-
Bicomputing Survey II
555
vices, Tokyo, 1985), pp. 1-6. 396. H. Kuhn, Organized monolayers — building blocks in constructing supramolecular devices: the Forrest L. Carter Lecture, in Molecular Electronics: Biosensors and Biocomputers, Ed. F. T. Hong (Plenum, New York and London, 1989), pp. 3-24. 397. H. Kuhn and J. Waser, Hypothesis: on the origin of the genetic code. FEBS Lett. 352, 259-264 (1994). 398. H. Kuhn and J. Waser, A model of the origin of life and perspectives in supramolecular engineering, in The Lock-and-Key Principle: The State of the Art — 100 Years On, Ed. J.-P Behr (John Wiley and Sons, Chichester and New York, 1994), pp. 247-306. 399. T. S. Kuhn, The Structure of Scientific Revolutions, enlarged 2nd edition (University of Chicago Press, Chicago, 1970). 400. L. Kuhnert, K. I. Agladze and V. I. Krinsky, Image processing using lightsensitive chemical waves. Nature 337, 244-247 (1989). 401. R. Kurzweil, The Age of Intelligent Machines (MIT Press, Cambridge, MA, and London, 1990). 402. R. Kurzweil, The Age of Spiritual Machines: When Computers Exceed Human Intelligence (Penguin Books, New York, London, Victoria, Toronto and Auckland, 1999). 403. S. LaBerge and H. Rheingold, Exploring the World of Lucid Dreaming (Ballantine Books, New York, 1990). 404. P. Langley, H. A. Simon, G. L. Bradshaw and J. M. Zytkow, Scientific Discovery: Computational Explorations of the Creative Processes (MIT Press, Cambridge, MA, and London, 1987). 405. C. G. Langton, Ed., Artificial Life (Addison-Wesley, Redwood City, CA, 1989). 406. P. S. Laplace (marquis de), A Philosophical Essay on Probabilities, translated from the 6th French edition by F. W. Truscott and F. L. Emory (Dover, New York, 1951). Original French version: Essai philosophique sur les probabilites (Gauthier-Villars, Paris, 1814). 407. J. H. Larkin and H. A. Simon, Why a diagram is (sometimes) worth ten thousand words. Cognitive Science 11, 65-99 (1987). 408. B. Latour, Science in Action: How to Follow Scientists and Engineers Through Society (Harvard University Press, Cambridge, MA, 1987). 409. H. A. Lea, Gustav Mahler: Man on the Margin (Bouvier Verlag Herbert Grundmann, Bonn, Germany, 1985). 410. J. E. LeDoux, The Emotional Brain: The Mysterious Underpinnings of Emotional Life (Simon and Schuster, New York, 1996). 411. J. E. LeDoux, Emotion circuits in the brain. Annu. Rev. Neurosci. 23, 155184 (2000). 412. J.-M. Lehn, 1988, Supramolecular chemistry — scope and perspectives: molecules, supermolecules, and molecular devices. Angew. Chem. Int. Ed. Engl. 27, 89-112 (1988). 413. J.-M. Lehn, Toward self-organization and complex matter. Science 295, 2400-2403 (2002).
556
F. T. Hong
414. M. LeMay, Asymmetries of the brains and skulls of nonhuman primates, in Cerebral Lateralization in Nonhuman Species, Ed. S. D. Glick (Academic Press, Orlando, San Diego, New York, London, Toronto, Montreal, Sydney and Tokyo, 1985), pp. 233-245. 415. M. R. Lepper, J. Henderlong and I. Gingras, Understanding the effects of extrinsic rewards on intrinsic motivation — uses and abuses of metaanalysis: comment on Deci, Koestner, and Ryan (1999). Psychol. Bull. 125, 669-676 (1999). 416. A. A. Lev, Y. E. Korchev, T. K. Rostovtseva, C. L. Bashford, D. T. Edmonds and C. A. Pasternak, Rapid switching of ion current in narrow pores: implications for biological channels. Proc. R. Soc. Lond. B252, 187-192 (1993). 417. I. B. Levitan, Modulation of ion channels by protein phosphorylation and dephosphorylation. Ann. Rev. Physiol. 56, 193-212 (1994). 418. J.-M. Levy-Leblond, Science's fiction. Nature 413, 573 (2001). 419. B. Libet, The experimental evidence for subjective referral of a sensory experience backwards in time: reply to P. S. Churchland. Philosophy of Science 48, 182-197 (1981). 420. B. Libet, Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences 8, 529-566 (1985). 421. B. Libet, C. A. Gleason, E. W. Wright and D. K. Pearl, Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential): the unconscious initiation of a freely voluntary act. Brain 106, 623-642 (1983). 422. B. Libet, E. W. Wright, Jr., B. Feinstein and D. K. Pearl, Subjective referral of the timing for a conscious sensory experience: a functional role for the somatosensory specific projection system in man. Brain 102, 193-224 (1979). 423. C. M. Lieber, The incredible shrinking circuit. Sci. Am. 285(3), 58-64 (2001). 424. E. N. Lorenz, Deterministic nonperiodic flow. J. Atmosph. Sci. 20, 130-141 (1963). 425. E. Lorenz, The Essence of Chaos (University of Washington Press, Seattle, WA, 1993). 426. J.-P. Luminet, G. D. Starkman and J. R. Weeks, Is space finite? Sci. Am. 280(4), 90-97 (1999). 427. G. Luo, Three Kingdoms: A Historical Novel, abridged and translated by M. Roberts (Foreign Languages Press/University of California Press, Beijing, Berkeley, Los Angeles and London, 1999). 428. M. MacCallum, News and views: the breakdown of physics? Nature 257, 362 (1975). 429. F. Machlup, Are the social sciences really inferior? in How Economists Explain: A Reader in Methodology, Eds. W. L. Marr and B. Raj (University Press of America, Lanham, MD, New York and London, 1982), pp. 3-23. 430. R. J. Maclntyre, Ed., Molecular Evolutionary Genetics (Plenum, New York and London, 1985). 431. A. L. Mackay, A Dictionary of Scientific Quotations (Institute of Physics
Bicomputing Survey II
557
Publishing, Bristol and Philadelphia, PA, 1991). 432. M. C. Mackey, Microscopic dynamics and the second law of thermodynamics, in Time's Arrows, Quantum Measurement and Superluminal Behavior, Eds. C. Mugnai, A. Ranfagni, L. S. Schulman (Consiglio Nazionale Delle Ricerche, Roma, 2001), pp. 49-65. 433. B. B. Mandelbrot, The Fractal Geometry of Nature (W. H. Freeman, New York, 1982). 434. D. Mange, D. Madon, A. Stauffer and G. Tempesti, Von Neumann revisited: a Turing machine with self-repair and self-production properties. Robot. Autonom. Syst. 22, 35-58 (1997). 435. D. Mange, M. Sipper and P. Marchal, Embryonic electronics. BioSystems 51, 145-152 (1999). 436. D. Mange, M. Sipper, A. Stauffer and G. Tempesti, Toward robust integrated circuits: the embryonics approach. Proc. IEEE 88, 516-541 (2000). 437. D. Mange and M. Tomassini, Eds., Bio-Inspired Computing Machines: Towards Novel Computational Architectures (Presses Polytechniques et Universitaires Romandes, Lausanne, Switzerland, 1998). 438. C. Mao, W. Sun, Z. Shen and N. C. Seeman, A nanomechanical device based on the B-Z transition of DNA. Nature 397, 144-146 (1999). 439. H. Margolis, Paradigms and Barriers: How Habits of Mind Govern Scientific Beliefs (University of Chicago Press, Chicago and London, 1993). 440. R. C. Marsh, Ed., Bertrand Russell: Logic and Knowledge, Essays 19011950 (George Allen and Unwin, London, 1956). 441. C. Martindale, Cognition and Consciousness (Dorsey Press, Homewood, IL, 1981). 442. C. Martindale, Personality, situation, and creativity, in Handbook of Creativity, Eds. J. A. Glover, R. R. Ronning and C. R. Reynolds (Plenum, New York and London, 1989), pp. 211-232. 443. C. Martindale, Biological basis of creativity, in Handbook of Creativity, Ed. R. J. Sternberg (Cambridge University Press, Cambridge, New York and Melbourne, 1999), pp. 137-152. 444. K. Matsuno, Protobiology: Physical Basis of Biology (CRC Press, Boca Raton, FL, 1989). 445. A. Mavromatis, Hypnagogia: The Unique State of Consciousness between Wakefulness and Sleep (Routledge and Kegan Paul, London and New York, 1987). 446. J. Maynard-Smith, Evolution and the Theory of Games (Cambridge University Press, Cambridge, London and New York, 1982). 447. W. S. McCulloch and W. Pitts, A logical calculus of the ideas of immanent in nervous activity. Bull. Math. Biophys. 5, 115-133 (1943). 448. L. C. McDermott, How we teach and how students learn, in Promoting Active Learning in the Life Science Classroom, Annal. NY Acad. Sci. Vol. 701, Eds. H. I. Modell and J. A. Michael (New York Academy of Sciences, New York, 1993), pp. 9-20. 449. C. McGinn, The Mysterious Flame: Conscious Minds in a Material World (Basic Books, New York, 1999).
558
F. T. Hong
450. W. J. McKeachie, Recent research on university learning and teaching: implications for practice and future research. Academic Medicine 67(10), S84S87 (1992). 451. P. B. Medawar, The Art of the Soluble (Methuen, London, 1967). 452. S. A. Mednick, The associative basis of the creative process. Psychological Rev. 69, 220-232 (1962). 453. G. A. Mendelsohn, Associative and attentional processes in creative performance. J. Personality 44, 341-369 (1976). 454. W. H. Merigan and J. H. R. Maunsell, How parallel are the primate visual pathways? Annu. Rev. Neurosci. 16, 369-402 (1993). 455. M. L. Merlau, M. del Pilar Mejia, S. T. Nguyen and J. T. Hupp, Artificial enzymes formed through directed assembly of molecular square encapsulated epoxidation catalysts. Angew. Chem. Int. Ed. Engl. 40, 4239-4242 (2001). 456. S. B. Merriam and R. S. Caffarella, Learning in Adulthood: A Comprehensive Guide, 2nd edition (Jossey-Bass Publishers, San Francisco, 1999). 457. R. K. Merton, On the Shoulders of Giants: A Shandean Postscript, PostItalianate edition (University of Chicago Press, Chicago and London, 1993). 458. R. K. Merton and E. Barber, The Travels and Adventures of Serendipity: A Study in Sociological Semantics and the Sociology of Science (Princeton University Press, Princeton and Oxford, 2004). 459. J. Metcalfe, Feeling of knowing in memory and problem solving. J. Exp. Psychol.: Learning, Memory and Cognition 12, 288-294 (1986). 460. J. Metcalfe, Premonitions of insight predict impending error. J. Exp. Psychol.: Learning, Memory and Cognition 12, 623-634 (1986). 461. J. Metcalfe and D. Wiebe, Intuition in insight and noninsight problem solving. Memory and Cognition 15, 238-246 (1987). 462. R. M. Metzger, B. Chen, U. Hopfner, M. V. Lakshmikantham, D. Vuillaume, T. Kawai, X. Wu, H. Tachibana, T. V. Hughes, H. Sakurai, J. W. Baldwin, C. Hosch, M. P. Cava, L. Brehmer and G. J. Ashwell, Unimolecular electrical rectification in hexadecylquinolinium tricyanoquinodimethanide. J. Am. Chem. Soc. 119, 10455-10466 (1997). 463. H. Michel and J. Deisenhofer, Relevance of the photosynthetic reaction center from purple bacteria to the structure of photosystem II. Biochemistry 27, 1-7 (1988). 464. A. I. Miller, Imagery in Scientific Thought: Creating 20th-century Physics (Birkhauser, Boston, Basel and Stuttgart, 1984). 465. A. I. Miller, Insights of Genius: Imagery and Creativity in Science and Art (Copernicus/Springer-Verlag, New York, 1996). 466. E. K. Miller, L. Li and R. Desimone, A neural mechanism for working and recognition memory in inferior temporal cortex. Science 254, 1377-1379 (1991). 467. G. A. Miller, The magic number seven, plus or minus two: some limits on our capacity for processing information. Psychological Rev. 63, 81-97 (1956). 468. L. K. Miller, Musical Savants: Exceptional Skill in the Mentally Retarded (Lawrence Erlbaum Associates, Hillsdale, NJ, Hove and London, 1989).
Bicomputing Survey II
559
469. R. L. Millstein, Random drift and the omniscient viewpoint. Philosophy of Science 63, S10-S18 (1996). 470. M. Minsky, The Society of Mind (Simon and Schuster, New York, London, Toronto, Sydney, Tokyo and Singapore, 1986). 471. L. Mirny and E. Shakhnovich, Protein folding theory: from lattice to allatom models. Annu. Rev. Biophys. Biomol. Struct. 30, 361-396 (2001). 472. A. Miyake and P. Shah, Toward unified theories of working memory: emerging general consensus, unresolved theoretical issues, and future research directions, in Models of Working Memory: Mechanisms of Active Maintenance and Executive Control, Eds. A. Miyake and P. Shah (Cambridge University Press, Cambridge, 1999), pp. 442-481. 473. Y. Miyashita, Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335, 817-820 (1988). 474. F. Moss and K. Wiesenfeld, The benefits of background noise. Sci. Am. 273(2), 66-69 (1995). 475. P. Mueller, D. O. Rudin, H. T. Tien and W. C. Wescott, Reconstitution of cell membrane structure in vitro and its transformation into an excitable system. Nature 194, 979-980 (1962). 476. Y. Mukohata, K. Ihara, T. Tamura and Y. Sugiyama, Halobacterial rhodopsins. J. Biochem. (Tokyo) 125, 649-657 (1999). 477. K.-R. Miiller, S. Mika, G. Ratsch, K. Tsuda and B. Scholkopf, An introduction to kernel-based learning algorithms, in Handbook of Neural Network Signal Processing, Eds. Y. H. Hu and J.-N. Hwang (CRC Press, Boca Raton, London, New York and Washington, DC, 2002), pp. 4-1-4-40. 478. M. D. Mumford and P. P. Porter, Analogies, in Encyclopedia of Creativity, Vol. 1, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 71-77. 479. R. Murch and T. Johnson, Intelligent Software Agents (Prentice Hall PTR, Upper Saddle River, NJ, 1999). 480. G. Musser, A hole at the heart of physics. Sci. Am. 287(3), 48-49 (2002). 481. D. G. Myers, Intuition: Its Powers and Perils (Yale University Press, New Haven, CT, and London, 2002). 482. C. R. Mynatt, M. E. Doherty and R. D. Tweney, Confirmation bias in a simulated research environment: an experimental study of scientific inference. Quarterly J. Experimental Psychology 29, 85-95 (1977). 483. E. Nagel and J. R. Newman, Godel's Proof (New York University Press, New York, 1958). 484. T. Nagel, What is it like to be a bat? Philosophical Rev. 83, 435-450 (1974). 485. T. Nagel, Other Minds: Critical Essays 1969-1994 (Oxford University Press, New York and Oxford, 1995). 486. D. Nelkin and L. Tancredi, Dangerous Diagnostics: The Social Power of Biological Information (University of Chicago Press, Chicago and London, 1994). 487. M. Newborn, Kasparov versus Deep Blue: Computer Chess Comes of Age (Springer-Verlag, New York, Berlin and Heidelberg, 1997). 488. A. Newell, J. C. Shaw and H. A. Simon, The processes of creative thinking,
560
489.
490. 491. 492. 493. 494. 495.
496. 497. 498.
499. 500. 501. 502. 503.
F. T. Hong
in Contemporary Approaches to Creative Thinking: A Symposium Held at the University of Colorado, Eds. H. E. Gruber, G. Terrell and M. Wertheimer (Atherton Press, New York, 1962), pp. 63-119. A. Newell and H. A. Simon, GPS, a program that simulates human thought, in Computers and Thought, Eds. E. A. Feigenbaum and J. Feldman (McGraw-Hill, New York, San Francisco, Toronto and London, 1963), pp. 279-293. A. Newell and H. A. Simon, Human Problem Solving (Prentice-Hall, Englewood Cliffs, NJ, 1972). C. G. Nichols and A. N. Lopatin, Inward rectifier potassium channels. Annu. Rev. Physiol. 59, 171-191 (1997). G. Nicolis and I. Prigogine, Exploring Complexity: An Introduction (W. H. Freeman, New York, 1989). J. Nield, C. Funk and J. Barber, Supermolecular structure of photosystem II and location of the PsbS protein. Phil. Trans. R. Soc. Lond. B355, 13371344 (2000). L. D. Noppe, Unconscious, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 673-679. D. L. Oden, R. K. R. Thompson and D. Premack, Can an ape reason analogically?: comprehension and production of analogical problems by Sarah, a chimpanzee (Pan troglodytes), in The Analogical Mind: Perspectives from Cognitive Science, Eds. D. Gentner, K. J. Holyoak and B. N. Kokinov (MIT Press, Cambridge, MA, and London, 2001), pp. 471-497. D. Oesterhelt and W. Stoeckenius, Rhodopsin-like protein from the purple membrane of Halobacterium halobium. Nature New Biol. 233, 149-152 (1971). . D. Oesterhelt and J. Tittor, Two pumps, one principle: light-driven ion transport in Halobacteria. Trends Biochem. Set. 14, 57-61 (1989). A. S. Ogrenci, System level requirements for molecular computing architectures in realizing information processing systems, in Proceedings of the NATO International School on Molecular Electronics: Bio-sensors and Biocomputers, June 24 - July 4, 2002, Pisa, Italy, Ed. S. Vestri (NATO Advanced Study Institute, Pisa, Italy, 2002), p. 58. N. Oreskes, K. Shrader-Frechette and K. Belitz, Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641-646 (1994). R. Ornstein, The Right Mind: Making Sense of the Hemispheres (Harcourt Brace, San Diego, New York and London, 1997). A. Pais, 'Subtle is the Lord . . . ': The Science and the Life of Albert Einstein (Oxford University Press, Oxford and New York, 1982). H. Park, J. Park, A. K. L. Lim, E. H. Anderson, A. P. Alivisatos and P. L. McEuen, Nanomechanical oscillations in a single-Cgo transistor. Nature 407, 57-60 (2000). D. Partridge and J. Rowe, Computers and Creativity (Intellect Books, Oxford, 1994).
Bicomputing Survey II
561
504. R. Paton and I. Neilson, Eds., Visual Representations and Interpretations (Springer-Verlag, London, Berlin, Heidelberg, New York, Barcelona, Hong Kong, Milan, Paris, Santa Clara, Singapore and Tokyo, 1999). 505. H. H. Pattee, Causation, control, and the evolution of complexity, in Downward Causation: Minds, Bodies and Matter, Eds. P. B. Andersen, C. Emmeche, N. O. Finnemann and P. V. Christiansen (Aarhus University Press, Aarhus, Denmark, 2000), pp. 63-77. 506. F. G. P. Patterson and R. H. Cohn, Self-recognition and self-awareness in lowland gorillas, in Self-awareness in Animals and Humans: Developmental Perspectives, Eds. S. T. Parker, R. W. Mitchell and M. L. Boccia (Cambridge University Press, Cambridge, New York and Melbourne, 1994), pp. 273-290. 507. F. Patterson and E. Linden, The Education of Koko (Holt, Rinehart and Winston, New York, 1981). 508. T. C. Pearce, Computational parallels between the biological olfactory pathway and its analogue 'The Electronic Nose': Part I. biological olfaction. BioSystems 41, 43-67 (1997). 509. T. C. Pearce, Computational parallels between the biological olfactory pathway and its analogue 'The Electronic Nose': Part II. sensor-based machine olfaction. BioSystems 41, 69-90 (1997). 510. P. J. E. Peebles, Making sense of modern cosmology. Sci. Am. 284(1), 54-55 (2001). 511. R. Penrose, Precis of The Emperor's New Mind: Concerning computers, minds, and the laws of physics. Behavioral and Brain Sciences 13, 643-705 (1990). 512. R. Penrose, The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics (Penguin Books, New York and London, 1991). 513. R. Penrose, Shadows of the Mind: A Search for the Missing Science of Consciousness (Oxford University Press, Oxford, New York and Melbourne, 1994). 514. L. I. Perlovsky, Neural Networks and Intellect: Using Model-Based Concepts (Oxford University Press, New York and Oxford, 2001). 515. H. L. Petri and M. Mishkin, Behaviorism, cognitivism and the neuropsychology of memory. American Scientist (Sigma Xi) 82, 30-37 (1994). 516. J. Piaget, The Origins of Intelligence in Children, translated by M. Cook (International Universities Press, New York, 1952). 517. J. Piaget, The Construction of Reality in the Child, translated by M. Cook (Basic Books, New York, 1954). 518. S. Pinker, The Language Instinct (William Morrow, New York, 1994). 519. S. Pinker, The Blank Slate: The Modern Denial of Human Nature (Viking Penguin, New York, London, Camberwell (Australia), Toronto, New Delhi, Auckland and Johannesburg, 2002). 520. H. Poincare, Sur le probleme des trois corps et les equations de la dynamique. Acta Mathematica 13, 1-270 (1890). 521. H. Poincare, The Foundations of Science: Science and Hypothesis, The Value of Science, Science and Method, translated by G. B. Halsted (Science Press, New York and Garrison, NY, 1913).
562
F. T. Hong
522. E. Policastro, Intuition, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 89-93. 523. E. Policastro and H. Gardner, Prom case studies to robust generalizations: an approach to the study of creativity, in Handbook of Creativity, Ed.R. J. Sternberg (Cambridge University Press, Cambridge, New York and Melbourne, 1999), pp. 213-225. 524. K. R. Popper, Indeterminism in quantum physics and in classical physics: part I. Brit. J. Phil. Sci. 1, 117-133 (1950). 525. K. R. Popper, Indeterminism in quantum physics and in classical physics: part II. Brit. J. Phil. Sci. 1, 173-195 (1950). 526. K. R. Popper, The Logic of Scientific Discovery, revised edition (Hutchinson, London, 1968). Reprinted (Routledge, London and New York, 1992). Original German version: Logik der Forschung (Vienna, 1934). 527. K. R. Popper, Indeterminism is not enough. Encounter 40(4), 20-26 (1973). 528. K. R. Popper, Objective Knowledge: An Evolutionary Approach, revised edition (Oxford University Press, Oxford, 1979). 529. K. R. Popper, The Open Universe: An Argument for Indeterminism (Rowman and Littlefield, Totowa, NJ, 1982). Reprinted (Routledge, London and New York, 1998). 530. K. Popper, Unended Quest: An Intellectual Autobiography, revised edition (Routledge, London, 1992). 531. K. R. Popper and J. C. Eccles, The Self and Its Brain (Routledge, London and New York, 1977). 532. M. A. Porter and R. L. Liboff, Chaos on the quantum scale. American Scientist (Sigma Xi) 89, 532-537 (2001). 533. M. I. Posner and S. Dehaene, Attentional networks. Trends Neurosci. 17, 75-79 (1994). 534. M. I. Posner and S. E. Petersen, The attention system of the human brain. Anna. Rev. Neurosci. 13, 25-42 (1990). 535. M. C. Potter, Remembering, in An Invitation to Cognitive Science, Volume 3: Thinking, Eds. D. N. Osherson and E. E. Smith (MIT Press, Cambridge, MA, and London, 1990), pp. 3-32. 536. R. Prentky, Creativity and psychopathology: gamboling at the seat of madness, in Handbook of Creativity, Eds. J. A. Glover, R. R. Ronning and C. R. Reynolds (Plenum, New York and London, 1989), pp. 243-269. 537. K. H. Pribram, Brain and Perception: Holonomy and Structure in Figural Processing (Lawrence Erlbaum Associates, Hillsdale, NJ, Hove and London, 1991). 538. K. H. Pribram, A century of progress? in Neuroscience of the Mind on the Centennial of Freud's Project for a Scientific Psychology, Annal. NY Acad. Sci. Vol. 843, Eds. R. M. Bilder and F. F. LeFever (New York Academy of Sciences, New York, 1998), pp. 11-19. 539. K. H. Pribram and M. M. Gill, Freud's 'Project' Re-assessed: Preface to Contemporary Cognitive Theory and Neuropsychology (Basic Books, New York, 1976).
Bicomputing Survey II
563
540. I. Prigogine, Introduction to Thermodynamics of Irreversible Processes, 3rd edition (John Wiley and Sons, New York, London and Sydney, 1967). 541. I. Prigogine, The microscopic meaning of irreversibility. Z. Phys. Chemie (Leipzig) 270, 477-490 (1989). 542. I. Prigogine, The End of Certainty: Time, Chaos, and the New Laws of Nature (Free Press, New York, London, Toronto, Sydney and Singapore, 1997). 543. I. Prigogine and I. Stengers, Order Out of Chaos: Man's Dialogue with Nature (Bantam Books, New York, Toronto, London, Sydney and Auckland, 1984). 544. J. W. S. Pringle, On the parallel between learning and evolution. Behaviour 3, 174-215 (1951). 545. D. Pum and U. B. Sleytr, The application of bacterial S-layer in molecular nanotechnology. Trends Biotech. 17, 8-12 (1999). 546. Z. W. Pylyshyn, What the mind's eye tells the mind's brain: a critique of mental imagery. Psychol. Bull. 80, 1-24 (1973). 547. Z. W. Pylyshyn, Computation and cognition: issues in the foundations of cognitive science. Behavioral and Brain Sciences 3, 111-169 (1980). 548. W. G. Quinn and R. J. Greenspan, Learning and courtship in Drosophila: two stories with mutants. Annu. Rev. Neurosci. 7, 67-93 (1984). 549. N. G. Rambidi, Non-discrete biomolecular computing: an approach to computational complexity. BioSystems 31, 3-13 (1993). 550. A. Rand, Atlas Shrugged, Random House, New York, 1957. The 35th anniversary edition (Penguin Putnam, New York, 1992). 551. A. S. Rao and M. P. Georgeff, Modeling rational agents within a BDIarchitecture, in Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, April 22-25, 1991, Cambridge, MA, Eds. J. Allen, R. Fikes and E. Sandewall (Morgan Kaufmann Publishers, San Mateo, CA, 1991), pp. 473-484. 552. R. P. N. Rao and D. H. Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neurosci. 2, 79-87 (1999). 553. N. Rashevsky, Mathematical Biophysics: Physico-Mathematical Foundations of Biology, 3rd edition, Vols. 1 and 2 (Dover, New York, 1960). 554. F. H. Rauscher and G. L. Shaw, Key components of the Mozart effect. Percept. Motor Skills 86, 835-841 (1998). 555. F. H. Rauscher, G. L. Shaw and K. N. Ky, Music and spatial task performance. Nature 365, 611 (1993). 556. F. H. Rauscher, G. L. Shaw and K. N. Ky, Listening to Mozart enhances spatial-temporal reasoning: towards a neurophysiological basis. Neurosci. Lett. 185, 44-47 (1995). 557. J. Rebek, Jr., Synthetic self-replicating molecules. Sci. Am. 271(1), 48-55 (1994). 558. M. A. Reed and J. M. Tour, Computing with molecules. Sci. Am. 282(6), 86-93 (2000). 559. J. Rees, Complex disease and the new clinical sciences. Science 296, 698-
564
F. T. Hong
701 (2002). 560. H. Reichgelt, Knowledge Representation: An AI Perspective (Ablex Publishing, Norwood, NJ, 1991). 561. D. N. Reinhoudt and M. Crego-Calama, Synthesis beyond the molecule. Science 295, 2403-2407 (2002). 562. K.-H. Rhee, E. P. Morris, J. Barber and W. Kiihlbrandt, Three-dimensional structure of the plant photosystem II reaction centre at 8 A resolution. Nature 396, 283-286 (1998). 563. H. Rheingold, Excursions to the Far Side of the Mind: A Book of Memes (Beech Tree Books, William Morrow, New York, 1988). 564. G. Roberts, Ed., Langmuir-Blodgett Films (Plenum, New York and London, 1990). 565. R. M. Roberts, Serendipity: Accidental Discoveries in Science (John Wiley and Sons, New York, Chichester, Brisbane, Toronto and Singapore, 1989). 566. W. H. Roetzheim, Enter the Complexity Lab: Where Chaos Meets Complexity (Sams Publishing, Indianapolis, IN, 1994). 567. R. S. Root-Bernstein, Setting the stage for discovery. The Sciences (New York Academy of Sciences) 28(3), 26-35 (1988). 568. R. Rosen, Anticipatory Systems: Philosophical, Mathematical and Methodological Foundations (Pergamon Press, Oxford, New York, Toronto, Sydney, Paris and Frankfurt, 1985). 569. R. Rosen, Life Itself: A Comprehensive Inquiry Into the Nature, Origin, and Fabrication of Life (Columbia University Press, New York, 1991). 570. R. Rosen, Essays on Life Itself (Columbia University Press, New York, 2000). 571. A. Rosenberg, Is the theory of natural selection a statistical theory? in Philosophy and Biology, Canadian Journal of Philosophy Supplementary, Vol. 14, Eds. M. Matthen and B. Linsky (University of Calgary Press, Calgary, Alberta, Canada, 1988), pp. 187-207. 572. A. Rosenberg, Instrumental Biology or The Disunity of Science (University of Chicago Press, Chicago and London, 1994). 573. B. S. Rothberg and K. L. Magleby, Testing for detailed balance (microscopic reversibility) in ion channel gating. Biophys. J. 80, 3025-3026 (2001). 574. A. Rothenberg, Creativity and Madness (Johns Hopkins University Press, Baltimore and London, 1990). 575. D. Ruelle, Chance and Chaos (Princeton University Press, Princeton, NJ, 1991). 576. M. Runco, Divergent Thinking (Ablex Publishing, Norwood, NJ, 1991). 577. M. A. Runco, Ed., The Creativity Research Handbook, Vol. 1 (Hampton Press, Cresskill, NJ, 1997). 578. M. A. Runco, Divergent thinking, in Encyclopedia of Creativity, Vol. 1, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 577-582. 579. M. A. Runco and S. R. Pritzker, Eds., Encyclopedia of Creativity, Vols. 1 and 2 (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999).
Bicomputing Survey II
565
580. J. Rushton, The Musical Language of Berlioz (Cambridge University Press, Cambridge, London, New York, New Rochelle, Melborne and Sydney, 1983). 581. S. W. Russ, Affect and Creativity: The Role of Affect and Play in the Creative Process (Lawrence Erlbaum Associates, Hillsdale, NJ, Hove and London, 1993). 582. B. Russell, Human Knowledge: Its Scope and Limits (Simon and Schuster, New York, 1948). 583. D. L. Sackett, S. E. Straus, W. S. Richardson, W. Rosenberg and R. B. Haynes, Evidence-Based Medicine: How to Practice and Teach EBM, 2nd edition (Churchill Livingstone, Edinburgh, London, New York, Philadelphia, St. Louis, Sydney and Toronto, 2000). 584. C. Sagan, The Dragons of Eden: Speculations on the Evolution of Human Intelligence (Ballantine Books, New York, 1977). 585. W. C. Salmon, Ed., Zeno's Paradoxes (Bobbs-Merrill Co., Indianapolis, IN, 1970). Reprinted (Hackett Publishing, Indianapolis, IN, and Cambridge, 2001). 586. R. Samples, Neurophysiology and a new look at curriculum. Thrust for Education Leadership 5(3), 8-10 (1976). 587. E. Sanchez and M. Tomassini, Eds., Towards Evolvable Hardware: The Evolutionary Engineering Approach, Lecture Notes in Computer Science, No. 1062 (Springer-Verlag, Berlin, Heidelberg and New York, 1996). 588. E. Sandewall, Features and Fluents: The Representation of Knowledge about Dynamical Systems, Vol. 1 (Clarendon Press, Oxford, 1994). 589. J. Sasaki, L. S. Brown, Y.-S. Chon, H. Kandori, A. Maeda, R. Needleman and J. K. Lanyi, Conversion of bacteriorhodopsin into a chloride ion pump. Science 269, 73-75 (1995). 590. K. Schmidt-Koenig, Internal clocks and homing. Cold Spring Harbor Symp. Quant. Biol. 25, 389-393 (1960). 591. K. Schmidt-Koenig, Avian Orientation and Navigation (Academic Press, London, New York and San Francisco, 1979). 592. J. W. Schooler and T. Y. Engstler-Schooler, Verbal overshadowing of visual memories: some things are better left unsaid. Cognitive Psychology 22, 3671 (1990). 593. J. W. Schooler and J. Melcher, The ineffability of insight, in The Creative Cognitive Approach, Eds. S. M. Smith, T. B. Ward and R. A. Finke (MIT Press, Cambridge, MA, and London, 1995), pp. 97-133. 594. J. W. Schooler, S. Ohlsson and K. Brooks, Thoughts beyond words: when language overshadows insight. J. Exp. Psychol.: General 122, 166-183 (1993). 595. E. Schrodinger, Indeterminism and free will. Nature, July 4 Issue, pp. 13-14 (1936). 596. E. Schrodinger, What is Life?: The Physical Aspect of the Living Cell (Cambridge University Press, Cambridge, and Macmillan Company, New York, 1945). 597. W.-D. Schubert, O. Klukas, N. Kraufi, W. Saenger, P. Fromme and H. T. Witt, Photosystem I of Synechococcus elongatus at 4 A resolution: compre-
566
F. T. Hong
hensive structure analysis. J. Mol. Biol. 272, 741-769 (1997). 598. D. Schuldberg and L. A. Sass, Schizophrenia, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 501-514. 599. L. S. Schulman, Time's Arrows and Quantum Measurement (Cambridge University Press, Cambridge, New York and Melbourne, 1997). 600. W. Schultz and A. Dickinson, Neuronal coding of prediction errors. Annu. Rev. Neurosci. 23, 473-500 (2000). 601. J. M. Schwartz and S. Begley, The Mind and the Brain: Neuroplasticity and the Power of Mental Force (ReganBooks/HarperCollins, New York, 2002). 602. J. R. Searle, Minds, brains, and programs. Behavioral and Brain Sciences 3, 417-457 (1980). 603. J. Searle, Minds, Brains and Science: The 1984 Reith Lectures (British Broadcasting Corporation, London, 1984). 604. J. R. Searle, Is the brain's mind a computer program? Sci. Am. 262(1), 26-31 (1990). 605. J. R. Searle, Chinese room argument, in The MIT Encyclopedia of the Cognitive Sciences, Eds. R. A. Wilson and F. C. Keil (MIT Press, Cambridge, MA, and London, 1999), pp. 115-116. 606. J. R. Searle, Consciousness. Annu. Rev. Neurosci. 23, 557-578 (2000). 607. N. C. Seeman, DNA nanotechnology: novel DNA constructions. Annu. Rev. Biophys. Biomol. Struct. 27, 225-248 (1998). 608. N. C. Seeman, DNA engineering and its application to nanotechnology. Trends Biotech. 17, 437-443 (1999). 609. J. Shear, Ed., Explaining Consciousness: The 'Hard Problem' (MIT Press, Cambridge, MA, and London, 1997). 610. D. Shekerjian, Uncommon Genius: How Great Ideas Are Born (Viking, Penguin Books, New York, London, Victoria, Markham and Auckland, 1990). 611. K. M. Sheldon, Conformity, in Encyclopedia of Creativity, Vol. 1, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 341-346. 612. Y. Shen, C. R. Safinya, K. S. Liang, A. F. Ruppert and K. J. Rothschild, Stabilization of the membrane protein bacteriorhodopsin to 140° C in twodimensional films. Nature 366, 48-50 (1993). 613. R. N. Shepard, Externalization of mental images and the act of creation, in Visual Learning, Thinking, and Communication, Eds. B. S. Randhawa and W. E. Coffman (Academic Press, New York, San Francisco and London, 1978), pp. 133-189. 614. R. N. Shepard, The mental image. American Psychologist 33, 125-137 (1978). 615. M. Shermer, The Captain Kirk Principle. Sci. Am. 287(6), 39 (2002). 616. M. Shidara and B. J. Richmond, Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296, 1709-1711 (2002). 617. J. Shields, Monozygotic Twins Brought Up Apart and Brought Up Together: An Investigation into the Genetic and Environmental Causes of Variation in Personality (Oxford University Press, London, New York and Toronto,
Bicomputing Survey II
567
1962). 618. J. C. Shih, K. Chen and M. J. Ridd, Monoamine oxidase: from genes to behavior. Annu. Rev. Neurosci. 22, 197-217 (1999). 619. F. J. Sigworth and E. Neher, Single Na + channel currents observed in cultured rat muscle cells. Nature 287, 447-449 (1980). 620. H. A. Simon, Theories of bounded rationality, in Decision and Organization, Eds. C. B. Radner and R. Radner (North-Holland Publishing, Amsterdam, 1972), pp. 161-176. Reprinted in Models of Bounded Rationality, Volume 2: Behavioral Economics and Business Organization, Ed. H. A. Simon (MIT Press, Cambridge, MA, and London, 1982), pp. 408-423. 621. H. A. Simon, Does scientific discovery have a logic? Philosophy of Science 40, 471-480 (1973). 622. H. A. Simon, How big is a chunk? Science 183, 482-488 (1974). 623. H. A. Simon, Models of Discovery, Boston Studies in the Philosophy of Science, Vol. 54 (D. Reidel Publishing, Dordrecht and Boston, 1977). 624. H. A. Simon, Models of Thought (Yale University Press, New Haven, CT, and London, 1979). 625. H. A. Simon, Rational decision making in business organizations. American Economic Review 69, 493-513 (1979). Reprinted in Models of Bounded Rationality, Volume 2: Behavioral Economics and Business Organization, Ed. H. A. Simon (MIT Press, Cambridge, MA, and London, 1982), pp. 474-494. 626. H. A. Simon, The role of attention in cognition, in The Brain, Cognition, and Education, Eds. S. L. Friedman, K. A. Klivington and R. W. Peterson (Academic Press, Orlando, San Diego, New York, Austin, London, Montreal, Sydney, Tokyo and Toronto, 1986), pp. 105-115. 627. H. A. Simon, The information processing explanation of Gestalt phenomena. Computers in Human Behavior 2, 241-255 (1986). 628. H. A. Simon, Models of Thought, Volume II (Yale University Press, New Haven, CT, and London, 1989). 629. H. A. Simon, Scientific discovery as problem solving, in Economics, Bounded Rationality and the Cognitive Revolution, Eds. M. Egidi and R. Marris (Edward Elgar Publishing, Hants, UK, and Brookfield, VT, 1992), pp. 102-119. 630. H. A. Simon, Administrative Behavior: A Study of Decision-Making Processes in Administrative Organizations, 4th edition (Free Press, New York, London, Toronto, Sydney and Singapore, 1997). 631. H. A. Simon and A. Newell, Heuristic problem solving: the next advance in operations research. Operations Research 6, 1-10 (1958). Reprinted in Models of Bounded Rationality, Volume 1: Economic Analysis and Public Policy, Ed. H. A. Simon (MIT Press, Cambridge, MA, and London, 1982), pp. 380-389. 632. D. K. Simonton, Genius, Creativity, and Leadership: Histriometric Inquiries (Harvard University Press, Cambridge, MA, and London, 1984). 633. D. K. Simonton, Scientific Genius: A Psychology of Science (Cambridge University Press, Cambridge, New York, Port Chester, Melbourne and Sydney, 1988). 634. S. Singh, Fermat's Enigma (Anchor Books/Doubleday, New York, London,
568
F. T. Hong
Toronto, Sydney and Auckland, 1997). 635. M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Perez-Uribe and A. Stauffer, A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems. IEEE Trans. Evolutionary Computation 1, 83-97 (1997). 636. B. F. Skinner, Science and Human Behavior (Macmillan, New York, 1953). 637. B. F. Skinner, About Behaviorism (Alfred A. Knopf, New York, 1974). 638. J. G. Slater, Ed., Bertrand Russell: Logical and Philosophical Papers, 19091913, The Collected Papers of Bertrand Russell, Vol. 6 (Routledge, London and New York, 1992). 639. R. E. Smalley, Of chemistry, love and nanorobots. Sci. Am. 285(3), 76-77 (2001). 640. A. Sokal and J. Bricmont, Fashionable Nonsense: Postmodern Intellectuals' Abuse of Science (Picador USA/St. Martin's Press, New York, 1998). 641. M. Solms, Before and after Freud's Project, in Neuroscience of the Mind on the Centennial of Freud's Project for a Scientific Psychology, Annal. NY Acad. Sci. Vol. 843, Eds. R. M. Bilder and F. F. LeFever (New York Academy of Sciences, New York, 1998), pp. 1-10. 642. M. Solms, Freud returns. Sci. Am. 290(5), 82-89 (2004). 643. R. L. Solso, Ed., Mind and Brain Sciences in the 21st Century (MIT Press, Cambridge, MA, and London, 1997). 644. R. W. Sperry, Lateral specialization in the surgically separated hemispheres, in The Neurosciences: Third Study Program, Eds. F. O. Schmitt and F. G. Worden (MIT Press, Cambridge, MA, and London, 1974), pp. 5-19. 645. R. Sperry, Some effects of disconnecting the cerebral hemisphere. Science 217, 1223-1226 (1982). 646. S. P. Springer and G. Deutsch, Left Brain, Right Brain, 3rd edition (W. H. Freeman, New York, 1989). 647. S. A. Stahl, Different strokes for different folks?: a critique of learning styles. American Educator (American Federation of Teachers) 23(3), 27-31 (1999). 648. K. M. Steele, S. D. Bella, I. Peretz, T. Dunlop, L. A. Dawe, G. K. Humphrey, R. A. Shannon, J. L. Kirby, Jr. and C. G. Olmstead, Prelude or requiem for the 'Mozart effect'? Nature 400, 827 (1999). 649. I. Z. Steinberg, On the time reversal of noise signals. Biophys. J. 50, 171-179 (1986). 650. R. J. Sternberg, Ed., Handbook of Creativity (Cambridge University Press, Cambridge, New York and Melbourne, 1999). 651. R. J. Sternberg and J. E. Davidson, Eds., The Nature of Insight (MIT Press, Cambridge, MA, and London, 1995). 652. R. J. Sternberg and J. E. Davidson, Insight, in Encyclopedia of Creativity, Vol. 2, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 57-69. 653. R. J. Sternberg and T. I. Lubart, Defying the Crowd: Cultivating Creativity in a Culture of Conformity (Free Press, New York, London, Toronto, Sydney, Tokyo and Singapore, 1995). 654. R. J. Sternberg and L. A. O'Hara, Creativity and intelligence, in Handbook
Bicomputing Survey II
655. 656. 657. 658. 659. 660. 661. 662. 663. 664. 665.
666.
667. 668. 669. 670. 671. 672. 673.
569
of Creativity, Ed. R. J. Sternberg (Cambridge University Press, Cambridge, New York and Melbourne, 1999), pp. 251-272. G. Stix, A license for copycats? Set. Am. 284(6), 36 (2001). N. Sussman, Misuse of "evidence": an open secret. Primary Psychiatry 10(9), 14 (2003). W. Sutton (with E. Linn), Where the Money Was (Viking Press, New York, 1976). E. Teller, Science and morality. Science 280, 1200-1201 (1998). H. S. Terrace, Nim (Alfred A. Knopf, New York, 1979). H. Terrace, Thought without words, in Mind-waves: Thoughts on Intelligence, Identity, and Consciousness, Eds. C. Blakemore and S. Greenfield (Basil Blackwell, Oxford, 1987), pp. 123-137. H. S. Terrace, L. A. Petitto, R. J. Sanders and T. G. Bever, Can an ape create a sentence? Science 206, 891-902 (1979). N. Tesla, Moji Pronalasci/My Inventions, in both Serbian and English (Skolska Knjiga, Zagreb, 1977). Reprinted English version (Nikola Tesla Museum, Belgrade, 2003). M. O. Thirunarayanan, Higher education's newest blight: degree inflation. AFT On Campus (American Federation of Teachers) 21(2), 14 (2001). J. M. T. Thompson and H. B. Stewart, Nonlinear Dynamics and Chaos (John Wiley and Sons, Chichester, New York, Brisbane, Toronto and Singapore, 1986). H. T. Tien, Introduction, Symposium: Lipid Monolayer and Bilayer Models and Cellular Membranes (Part II), conducted by The American Oil Chemists' Society at its 58th Annual Spring Meeting, New Orleans, Louisiana, May 7-10, 1967. J. Amer. Oil Chemists' Society 45, 201 (1968). H. T. Tien and A. Ottova-Leitmannova, Membrane Biophysics: As Viewed from Experimental Bilayer Lipid Membranes (Planar Lipid Bilayers and Spherical Liposomes) (Elsevier, Amsterdam, Lausanne, New York, Oxford, Shannon, Singapore and Tokyo, 2000). H. T. Tien, Z. Salamon and A. Ottova, Lipid bilayer-based sensors and biomolecular electronics. Crit. Rev. Biomed. Eng. 18, 323-340 (1991). J. Timpane, How to convince a reluctant scientist. Sci. Am. 272(1), 104 (1995). P. M. Todd, The animat path to intelligent adaptive behavior. IEEE Computer 25(11), 78-81 (1992). R. C. Tolman, The Principles of Statistical Mechanics (Oxford University Press, Glasgow, New York, Toronto, Melbourne and Wellington, 1938). Reprinted (1967). G. Tononi and G. M. Edelman, Consciousness and complexity. Science 282, 1846-1851 (1998). G. Toplyn, Attention, in Encyclopedia of Creativity, Vol. 1, Eds. M. A. Runco and S. R. Pritzker (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1999), pp. 141-146. D. A. Treffert, Extraordinary People: Understanding "Idiot Savants" (Harper and Row, New York, 1989).
570
F. T. Hong
674. P. Trout, Flunking the test: the dismal record of student evaluations. Academe: Bull. Amer. Assoc. Univ. Prof. 86(4), 58-61 (2000). 675. J. Z. Tsien, Building a brainier mouse. Sci. Am. 282(4), 42-48 (2000). 676. R. W. Tsien, P. Hess, E. W. McCleskey and R. L. Rosenberg, Calcium channels: mechanisms of selectivity, permeation, and block. Annu. Rev. Biophys. Biophys. Chem. 16, 265-290 (1987). 677. R. W. Tsien, D. Lipscombe, D. V. Madison, K. R. Bley and A. P. Fox, Multiple types of neuronal calcium channels and their selective modulation. Trends Neurosci. 11, 431-438 (1988). 678. T. Y. Tsong, D.-S. Liu, F. Chauvin, A. Gaigalas and R. D. Astumian, Electroconformational coupling (ECC): an electric field induced enzyme oscillation for cellular energy and signal transductions. Bioelectrochem. Bioenerg. 21, 319-331 (1989). 679. A. M. Turing, Computing machinery and intelligence. Mind Vol. 59, No. 236 (1950). Reprinted in Minds and Machines, Ed. A. R. Anderson (PrenticeHall, Englewood, NJ, 1964). Excerpted in The Mind's I: Fantasies and Reflections on Self and Soul, Eds. D. R. Hofstadter and D. C. Dennett (Basic Books, New York, 1981). 680. H. Urban, What can a flawed test tell us, anyway? Newsweek 138(8), 9 (2001). 681. P. van Inwagen, An Essay on Free Will (Oxford University Press, Oxford, 1983). 682. A. van Oudenaarden and S. G. Boxer, Brownian ratchets: molecular separations in lipid bilayers supported on patterned arrays. Science 285, 10461048 (1999). 683. A. Vander, J. Sherman and D. Luciano, Human Physiology: The Mechanisms of Body Function, 8th edition (McGraw Hills, Boston, New York and London, 2001). 684. Gy. Varo and J. K. Lanyi, Thermodynamics and energy coupling in the bacteriorhodopsin photocycle. Biochemistry 30, 5016-5022 (1991). 685. M. Velmans, Is human information processing conscious? Behavioral and Brain Sciences 14, 651-726 (1991). 686. G. Veneziano, The myth of the beginning of time. Sci. Am. 290(5), 54-65 (2004). 687. P. E. Vernon, Ed., Creativity: Selected Readings (Penguin Books, Harmondsworth, Middlesex, UK, Baltimore, MD, and Victoria, Australia, 1970). 688. P. E. Vernon, The nature-nurture problem in creativity, in Handbook of Creativity, Eds. J. A. Glover, R. R. Ronning and C. R. Reynolds (Plenum, New York and London, 1989), pp. 93-110. 689. C. R. Viesti, Jr., Effect of monetary rewards on an insight learning task. Psychonomic Sci. 23, 181-183 (1971). 690. J. von Neumann, Theory of Self-Reproducing Automata, edited and completed by A. W. Burks (University of Illinois Press, Urbana, IL, and London, 1966). 691. J. von Neumann and O. Morgenstern, Theory of Games and Economic Be-
Bicomputing Survey II
571
havior, 3rd edition (Princeton University Press, Princeton, NJ, 1953). 692. N. N. Vsevolodov, Biomolecular Electronics: An Introduction via Photosensitive Proteins (Birkhauser, Boston, Basel and Berlin, 1998). 693. N. N. Vsevolodov, A. B. Druzhko and T. V. Djukova, Actual possibilities of bacteriorhodopsin application in optoelectronics, in Molecular Electronics: Biosensors and Biocomputers, Ed. F. T. Hong (Plenum, New York and London, 1989), pp. 381-384. 694. N. N. Vsevolodov and T. V. Dyukova, Retinal-protein complexes as optoelectronic components. Trends Biotechnol. 12, 81-88 (1994). 695. R. Wagner, Die Meistersinger von Nurnberg (C. F. Peters, Leipzig, ca. 1910). Reprinted (Dover, New York, 1976). 696. M. M. Waldrop, Complexity: The Emerging Science at the Edge of Order and Chaos (Simon and Schuster, New York, London, Toronto, Sydney, Tokyo and Singapore, 1992). 697. G. Wallas, The Art of Thought (Harcourt, Brace and Company, New York, 1926). 698. H. Walter, Neurophilosophy of Free Will: From Libertarian Illusions to a Concept of Natural Autonomy, translated by C. Klohr (MIT Press, Cambridge, MA, and London, 2001). Original German version: Neurophilosophie der Willensfreiheit, 2nd edition (Mentis Verlag, Paderborn, 1999). 699. E. Wasserman, Behaviorism, in The MIT Encyclopedia of the Cognitive Sciences, Eds. R. A. Wilson and F. C. Keil (MIT Press, Cambridge, MA, and London, 1999), pp. 77-80. 700. J. Wasserstein, L. E. Wolf and F. F. LeFever, Eds., Adult Attention Deficit Disorder: Brain Mechanisms and Life Outcomes, Annal. NY Acad. Sci. Vol. 931 (New York Academy of Sciences, New York, 2001). 701. T. Watanabe, J. E. Nanez and Y. Sasaki, Perceptual learning without perception. Nature 413, 844-848 (2001). 702. J. B. Watson, Behaviorism, revised edition (University of Chicago Press, Chicago, 1930). 703. H. H. Weetall, Covalent coupling methods for inorganic support materials. Meth. Enzymol. 44, 134-148 (1976). 704. D. M. Wegner, The Illusion of Conscious Will (MIT Press, Cambridge, MA, and London, 2002). 705. S. Weinberg, Unified theories of elementary-particle interaction. Sci. Am. 231(1), 50-59 (1974). 706. R. W. Weisberg, Creativity: Beyond the Myth of Genius (W. H. Freeman, New York, 1993). 707. R. W. Weisberg, Creativity and knowledge: a challenge to theories, in Handbook of Creativity, Ed. R. J. Sternberg (Cambridge University Press, Cambridge, New York and Melbourne, 1999), pp. 226-250. 708. P. J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting (John Wiley and Sons, New York, Chichester, Brisbane, Toronto and Singapore, 1994). 709. M. Wertheimer, Productive Thinking, enlarged edition (University of Chicago Press, Chicago, IL, 1982). Original edition (Harper and Row, New
572
F. T. Hong
York, 1945). 710. M. Wertheimer, A Gestalt perspective on computer simulations of cognitive processes. Computers in Human Behavior 1, 19-33 (1985). 711. J. B. West, Thoughts on teaching physiology to medical students in 2002. The Physiologist 45, 389-393 (2002). 712. K. M. West, The case against teaching. J. Medical Education 41, 766-771 (1966). 713. T. G. West, In the Mind's Eye, updated edition (Prometheus Books, New York, 1997). 714. G. M. Whitesides and B. Grzybowski, Self-assembly at all scales. Science 295, 2418-2421 (2002). 715. K. Wiesenfeld and F. Moss, Stochastic resonance and the benefits of noise: from ice ages to crayfish and SQUIDs. Nature 373, 33-36 (1995). 716. M. Wilchek and E. A. Bayer, Eds., Avidin-Biotin Technology, Methods in Enzymology, Vol. 184 (Academic Press, San Diego, New York, Boston, London, Sydney, Tokyo and Toronto, 1990). 717. R. S. Wilcox and R. R. Jackson, Cognitive abilities of araneophagic jumping spiders, in Animal Cognition in Nature: The Convergence of Psychology and Biology in Laboratory and Field, Eds. R. P. Balda, I. M. Pepperberg and A. C. Kamil (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1998), pp. 411-434. 718. S. Wilkinson, Chemistry laureates help celebrate Nobel centennial. ACS Chemical and Engineering News 78(49), 75-76 (2000). 719. E. O. Wilson, Consilience: The Unity of Knowledge (Vintage Books, New York, 1999). 720. R. Wiltschko, D. Nohr and W. Wiltschko, Pigeons with a deficient sun compass use the magnetic compass. Science 214, 343-345 (1981). 721. W. Wiltschko and E. Gwinner, Evidence for an innate magnetic compass in Garden Warblers. Naturwissenschaften 61, 406 (1974). 722. W. Wiltschko and R. Wiltschko, Magnecitc compass of European robins. Science 176, 62-64 (1972). 723. W. Wiltschko and R. Wiltschko, Magnetic orientation in birds, in Current Ornithology, Vol. 5, Ed. R. F. Johnston (Plenum, New York and London, 1988), pp. 67-121. 724. W. Wiltschko and R. Wiltschko, The navigation system of birds and its development, in Animal Cognition in Nature: The Convergence of Psychology and Biology in Laboratory and Field, Eds. R. P. Balda, I. M. Pepperberg and A. C. Kamil (Academic Press, San Diego, London, Boston, New York, Sydney, Tokyo and Toronto, 1998), pp. 155-199. 725. W. Wiltschko, R. Wiltschko and W. T. Keeton, Effects of a "permanent" clock-shift on the orientation of young homing pigeons. Behav. Ecol. Sociobiol. 1, 229-243 (1976). 726. S. J. Wind, J. Appenzeller, R. Martel, V. Derycke and Ph. Avouris, Vertical scaling of carbon nanotube field-effect transistors using top gate electrodes. Appl. Phys. Lett. 80, 3817-3819 (2002). 727. S. Wolfram, Computer software in science and mathematics. Sci. Am.
Bicomputing Survey II
573
251(3), 188-203 (1984). 728. S. Wolfram, Theory and Applications of Cellular Automata (World Scientific, Singapore, 1986). 729. S. Wolfram, A New Kind of Science (Wolfram Media, Champaign, IL, 2002). 730. W. B. Wood, Left-right asymmetry in animal development. Annu. Rev. Cell Dev. Biol. 13, 53-82 (1997). 731. M. J. Wooldridge and M. Veloso, Eds., Artificial Intelligence Today: Recent Trends and Developments, Lecture Notes in Artificial Intelligence, No. 1600 (Springer-Verlag, Berlin, Heidelberg and New York, 1999). 732. S. Wright, The roles of mutation, inbreeding, crossbreeding and selection in evolution, in Proceedings of the 6th International Congress of Genetics 1, 356-366 (1932). 733. F. M. Wuketits, The philosophy of Donald T. Campbell: a short review and critical appraisal. Biology and Philosophy 16, 171-188 (2001). 734. R. H. Wurtz, M. E. Goldberg and D. L. Robinson, Behavioral modulation of visual responses in the monkey: stimulus selection for attention and movement, in Progress in Psychobiology and Physiological Psychology, Vol. 9, Eds. J. M. Sprague and A. N. Epstein (Academic Press, New York, London, Toronto, Sydney and San Francisco, 1980), pp. 43-83. 735. R. R. Yager, S. Ovchinnikov, R. M. Tong and H. T. Nguyen, Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh (John Wiley and Sons, New York, Chichester, Brisbane, Toronto and Singapore, 1987). 736. R. R. Yager and L. A. Zadeh, Eds., Fuzzy Sets, Neural Networks, and Soft Computing (Van Nostrand Reinhold, New York, 1994). 737. R. M. Yerkes and J. D. Dodson, The relation of strength of stimulus to rapidity of habit-formation. J. Comparative Neurol. and Psychol. 18, 459482 (1908). 738. B. Yu, W. Zhang, Q. Jing, R. Peng, G. Zhang and H. A. Simon, STM capacity for Chinese and English language materials. Memory and Cognition 13, 202-207 (1985). 739. R. J. Zatorre and I. Peretz, Eds., The Biological Foundations of Music, Annal. NY Acad. Sci. Vol. 930 (New York Academy of Sciences, New York, 2001). 740. E. Zermelo, Ueber einen Satz der Dynamik und die mechanische Warmetheorie. Annal. der Physik 57, 485-494 (1896). 741. G. Zhang and H. A. Simon, STM capacity for Chinese words and idioms: chunking and acoustical loop hypotheses. Memory and Cognition 13, 193201 (1985).
CHAPTER 3

MODELS FOR COMPLEX EUKARYOTIC REGULATORY DNA SEQUENCES
Uwe Ohler
Massachusetts Institute of Technology, USA
ohler@mit.edu

Martin C. Frith
Institute for Molecular Bioscience, University of Queensland, Australia
and RIKEN Yokohama Institute, Japan
[email protected]
1. Introduction

Most cells of a multi-cellular organism contain all genetic information at all times, but only a fraction of it is active in a given tissue at any one time. The concerted and differentiated expression of genes is necessary for the existence of complex living beings with an intricate development that requires precise control of the expression of information. Understanding the regulation of gene expression is therefore undoubtedly one of the most interesting challenges in molecular biology today, but one in which much remains unclear. We are only beginning to elucidate the impressive logic and organization of the tightly interwoven players that a cell uses to determine the active state of every component in it37,84,153. About 10% of human and Drosophila genes are estimated to be used only to control the expression of other genes1,140. To enable subtle patterns of gene expression, control mechanisms appear at many different levels. One of the most important control levels is the first step of gene expression, the transcription of a gene into its working copies, the messenger RNAs. Here, the transcriptional machinery of the cell binds to regulatory DNA sequences often called promoters,
ultimately recognizing and opening up the beginning of a gene and thus controlling the synthesis of mRNAs. It is intuitively clear that errors in this machinery, leading to mis-expression of genes, are an important link in genetically based diseases. It is therefore important to locate the exact regulatory regions so that they can be examined in detail, either computationally or experimentally, to learn the mechanisms that control the expression of genes. Conversely, an evaluation of which features improve the quality of predictions can help us understand how gene regulation is actually organized in the cell.

This chapter deals with the computer-based identification of transcriptional regulatory regions and focuses on the eukaryotic genes transcribed by the RNA polymerase II enzyme. Regulatory regions of other eukaryotic genes, as well as those of prokaryotic organisms, have a very different organization and are not considered further. After a short overview of the biological concepts, we focus on modeling (1) the proximal or core promoter region, and (2) distal cis-regulatory sequences. We restrict ourselves to the detection of regulatory regions in the genome; a related problem, not discussed here, is the identification of common putative regulatory sequence elements in a group of co-regulated genes102,143. Other recent reviews have also covered some of this material11,118,146,151.

2. Some Biology of Transcription Regulation

The promoter of a protein-encoding gene can be divided into a core promoter, a proximal promoter region, and distal enhancers, all of which contain transcription factor binding sites (TFBSs): short DNA sequence patterns, 5-15 nucleotides long, that are targeted by specific auxiliary proteins called transcription factors. Transcription initiation by the RNA polymerase II (pol-II) enzyme is regulated by those factors interacting with TFBSs, with pol-II, and with each other, and by an open chromosomal structure that enables the factors to access the DNA.

2.1. The Basal Transcription Machinery
The common and best-characterized part of promoters is the core promoter, which is responsible for guiding the polymerase to the correct transcription start site (TSS). Accurate initiation of transcription depends on assembling a complex containing pol-II and at least six general transcription factors, which have been identified over the past 20 years (see the book by Latchman79 and recent reviews64,128). This complex machinery is generally
preserved in all eukaryotic species. The first binding site of a general transcription factor to be identified was an AT-rich sequence element around position -30. This so-called TATA box is a target of transcription factor (TF) IID or, more specifically, of one component of TFIID, the TATA binding protein (TBP). TFIID contains at least a dozen other components known as TAFs (TBP-associated factors), which also interact, directly or indirectly, with other sequence elements. Upon binding of TBP to a TATA box, the DNA is strongly distorted, and sequences up- and downstream of the TATA box are brought into close proximity. Transcription factor IIA stabilizes this complex, which is then recognized by transcription factor IIB. This orients the growing complex towards the transcription start site and may guide the polymerase to the exact start position. Additional TFs and finally pol-II itself are recruited by protein-protein interactions.

This sequence of events is far from a universal mechanism of pol-II recruitment. For example, the promoters of house-keeping genes (i.e. genes that are always "switched on") do not contain anything resembling the TATA box. In these cases, the binding of TFIID is mediated by the initiator sequence element right at the TSS, which is not only present in TATA-less promoters. Some of the TAFs are able to recognize this element directly29. Furthermore, so-called TBP-related factors (TRFs) substitute for TBP in the TFIID complex and contribute to the activation of specific subsets of genes. TRF1 binds to TATA-box-like sequences in Drosophila; TRF2 is present in vertebrates as well but does not show sequence-specific interactions. Instead, a study on fly genes showed that it occurs in a complex with the DNA replication element binding factor (DRE factor65), and a computational analysis identified the DRE as one of the most prevalent sequence elements in Drosophila core promoters105. However, contrary to other core promoter sequence elements, its location appears to be unrestricted relative to the start site.

In Drosophila as well as vertebrates, sequences downstream of the initiator were also found to influence basal transcription activity. A number of short sequence patterns are significantly over-represented in downstream sequences of Drosophila6. Experimental evidence for a specific downstream promoter element (DPE77) suggests that the DPE is as widely used as the TATA box but is less well conserved. Its core motif is located exactly 28 to 33 base pairs downstream of the TSS and is recognized by two distinct TAFs. A striking preference for the initiator consensus in promoters that contain a DPE suggests a strong co-dependency of both
elements. Figure 1 shows two different interactions in TATA versus DPE containing promoters. A second distinct downstream element called MTE, located at positions 17-22, has been computationally predicted and experimentally verified86'105, and is also thought to interact with parts of TFIID. Despite evidence for downstream vertebrate elements, current knowledge suggests that DPE and MTE play a less important role in these organisms. In general, eukaryotic genes are transcribed individually. In C. elegans however, about 10% of all genes are found as parts of polycistronic transcripts containing multiple genes under the control of a common promoter19. Some dicistronic transcripts are known in Drosophila as well98. On the other hand, genes may be controlled by several alternative promoters, leading to transcript initiation from multiple distinct start sites. It should finally be noted that in TATA-less promoters, different transcription starts from several neighboring bases have been observed, and a transcription start "site" as such does not exist. If the promoter elements are not well conserved, it is possibly a general rule that the transcription start varies within a limited range.
Fig. 1. Interaction of TFIID with the core promoter elements. Two distinct interactions with TATA-driven (by TBP) and DPE-driven (by TAFs 60 and 40) promoters are shown in this model77. (Inr: initiator)
Sequence Patterns in the Core Promoter The most prevalent sequence patterns by which interactions with transcription factors occur in the core promoter, and which can be exploited in a computational promoter finding system, are the TATA box, the initiator, DRE, and the two downstream promoter elements. These are all known to be directly targeted by TFIID components. Bucher22 was the first to systematically study the patterns of TATA box and initiator in vertebrates, and this approach was extended to sequence elements of Drosophila, including DPE6. Several studies found that the TATA box is present in at most 40-50% of Drosophila and vertebrate promoters6'77'105'139. Other sequences
show organism specific patterns of conservation: the initiator is much better conserved in fly promoters compared to mammals, and the DPE is much more frequent in Drosophila and appears to play a role as a downstream counterpart of the TATA box. Hence, the machinery of transcription is well conserved throughout the whole eukaryotic kingdom, but the ways in which it is employed in transcription regulation are not. This makes it vital to use different models for the prediction of promoters in different organisms. TFIIA and TFIIB also have direct contact with DNA, but it has been widely believed that these contacts are not sequence-specific. More recent experiments78 suggest that TFIIB binding in humans is at least partly influenced by a sequence motif directly upstream of the TATA box, but this is not well characterized so far. Detailed studies of sequence motifs in Drosophila6'77'105 revealed the TATA, initiator, DPE, and MTE motifs, but failed to detect any motif resembling the TFIIB response element. This sequence pattern consists almost exclusively of guanines and cytosines, and human promoters have a very high overall GC content. This gives rise to the suspicion that the TFIIB element may only reflect the overall human promoter sequence composition. Why is the core promoter architecture so variable, given that it should only serve to recruit the polymerase to the start of the gene? Some of the variability serves to regulate different subsets of genes, e.g. through the usage of TBP and TRFs. In addition, different core promoter architectures allow for communicating selectively with distal regulatory regions and specific transcription factors25. As an example of a synergistic interaction of two transcription factors with two TAFs142, the two Drosophila factors bicoid and hunchback interact with specific TAFs and lead independently to an already improved transcription activation, which is nonlinearly increased when both factors are present. Therefore, the complex structure of TFIID, and of the core promoter regions, is not only necessary to bind to the DNA and recruit the polymerase, but rather serves as a modular machinery that offers a vast number of possibilities to interact with and integrate information from different regulatory modules. 2.2. Chromatin Structure in Regulatory Regions The large number of genes found in eukaryotic genomes would render it very impractical should all of them compete for the components of the basal transcription machinery at the same time. Most of them are transcribed only inside a specific tissue or under rarely occurring circumstances. Evolution
has found a way to effectively shut down large regions of the genome that are not needed at a particular time: Eukaryotic DNA is divided into several molecules, the chromosomes, and wrapped up in chromatin. Chromatin consists of the DNA itself and protein complexes, mainly histones, around which the DNA is coiled up forming nucleosomes. This tight packing is able to regulate the accessibility of regions in the genome at a high level, and provides a means by which all cells of a tissue stay committed to expressing the same genes. Methylation and CpG Islands In vertebrates, open chromatin structure is closely associated with DNA methylation: some cytosines are chemically modified and bear an additional methyl group. In 90% of the cases, methylation occurs in cytosines that are part of the dinucleotide CG. It was found that some CG sites are always methylated whereas for others, this pattern keeps changing in a tissue-specific manner, and active genes appear to be unmethylated. It was postulated4 that the upstream regions of all constitutively (i.e., constantly) expressed genes, and also a substantial portion of other genes, are correlated with clusters of CG dinucleotides, so-called CpG islands54. Methylated CG dinucleotides are a hot spot for mutations in which the cytosine is wrongly replaced by a thymine, which over the course of time leads to CG depleted regions. Indeed, the CG dinucleotide occurs much less frequently in vertebrate genomes than expected from the mono-nucleotide composition. CpG islands therefore hint at generally lowly methylated regions. DNA methylation has no direct effect on the chromatin structure, but evidence for specific protein interactions with methylated regions that are associated with chromatin structure has been reported18. DNA methylation is also known to be stable during cell division because the CG dinucleotide on the opposite strand of a methylated one is also methylated. Methylation can therefore explain the stable commission to certain groups of active genes within specific tissues. Histone Modifications and DNA Structure Methylation can explain a tissue-specific commitment in vertebrates, but in invertebrates such as Drosophila it hardly occurs92. In some vertebrate cases, differences in methylation between expressing and nonexpressing tissues can also not be detected. Other features of active chromatin structure concern biochemical modifications of the histones. Histone modifications can affect histone association with each other, the DNA, or protein interaction partners. One component of the TFIID enzyme as well
as other transcription factors have the ability to acetylate histones which leads to an opening of chromatin. The opposite case of de-acetylation and therefore a negative effect on regulation is also observed. In either way, chromatin structure can thus be changed. Further studies showed that the DNA in the regulatory regions of active genes is very sensitive to DNA digestion, an assay to measure the openness of a region. These hypersensitive sites are a result of either loss or modification of nucleosomes. They are not only concentrated in the core promoter region, but also exist in other regulatory regions described below. Transcription factors binding to those regions are often associated with proteins that have the capability of displacing or modifying the nucleosomes. A common mechanism in gene regulation is therefore the attraction of nucleosome displacing factors which enables the binding of other factors and finally the polymerase itself. The DNA in promoter regions is furthermore likely to exist in a different conformation, the so-called Z-DNA. This conformation occurs in DNA with alternating purine and pyrimidine nucleotides and offers an increased accessibility to the single strands of DNA, which means that it is easier for proteins to interact with Z-DNA than with normal DNA. 2.3. Specific Gene Regulation: Sequence Elements and Transcription Factors We have so far dealt with the basal transcription machinery, describing how the transcription start site is recognized, and with the chromatin structure that enables the access to genes in the first place. In eukaryotes, where most mRNA transcripts encode for only one gene, this cannot explain how genes whose protein products are needed in parallel are co-regulated. Coordinately expressed genes do often not even reside at close positions in the genome, but rather on different chromosomes. Britten and Davidson21 published an early working model of such coordinated gene expression that, at an abstract level, still holds (Figure 2). They proposed that genes regulated in parallel, in response to a particular signal, contained a common regulatory element which caused the activation of these genes. Moreover, genes could contain more than one element, each shared with a different group of genes. A signal would then act by stimulating a specific "integrator gene" whose product would interact with a specific sequence element in several genes at once. A gene would finally be activated if all its sequence elements had been "switched on" by integrator gene products. In current terminology, the integrator gene is considered
Fig. 2. The Britten and Davidson model for coordinated gene regulation21,79. Sensor elements A, B, and C detect changes that require a different expression and therefore switch on appropriate integrator genes x, y, and z. The products of genes x, y, and z then interact with control elements, coordinately switching on appropriate genes P, Q, and R. Alternatively, x, y, and z can be proteins undergoing a conformational change in the presence of specific signals, which enables them to interact with the control elements.
as encoding a transcription factor which binds to regulatory sequence elements and activates or suppresses a specific group of genes37. Adding to the original theory, a transcription factor can be activated not only by de novo synthesis but also by changing the inactive state of the existing protein into an active one by means of post-transcriptional regulation. Just to illustrate how biology defeats generalization, we note that some parasitic, single-celled eukaryotes appear to have very few specific transcription factors (Plasmodium35), and regulate most of their genes post- rather than pre-transcriptionally (trypanosomes32). The Proximal Promoter Region Many of the regulatory elements serving as transcription factor targets are located in the proximal promoter region, i.e. directly upstream of the core promoter. In particular, binding sites in yeast are usually located within a few hundred base pairs upstream of their target gene, and they are sometimes termed upstream activating sequences. These factors can influence (suppress as well as assist) the binding of the core promoter components as well as the chromatin structure, or both at the same time. The first group thus interacts with components of the general initiation factors,
such as the TATA box binding protein or its associated factors, or assists the binding of other factors which then interact with the basal machinery. This only works when considering the 2-d or 3-d DNA structure : in a linear DNA sequence, the binding sites are too far away from each other to enable direct contacts of their TFs. Some transcription factors work in a non-specific way, i.e. they merely serve to increase the production rate of the basal machinery and can thus be found in a variety of genes. Sequence elements that interact with these factors are the CCAAT and GC boxes in vertebrates, or the GAGA box in Drosophila. Other factors work in a very specific way and are contained in only a small number of promoters. Transcription elements can be present in several copies in one promoter and are sometimes organized in dyad symmetry, i.e. one of two identical sequence parts is contained on the sense and the other on the anti-sense strand, thus forming a palindrome. For geometric reasons, homodimeric transcription factors tend to bind sequences that exhibit dyad symmetry. Orientation therefore does not matter, but this is also true for many non-palindromic patterns. Enhancers, Silencers and Locus Control Regions Animal genes often have regulatory sequences in the proximal region, but sequences as far away as many kilobases can have a major influence on transcription. These enhancers or silencers can be upstream or downstream of the gene or in introns, and can be separated from their target promoter by intervening genes26. A particular enhancer can affect more than one promoter, and can exert its influence also on the transcription of other genes when transferred into their neighborhood. Enhancers can even regulate homologs of their target genes on a different chromosome in trans, in a process termed transvection41. As with many TFBSs, the orientation of the sequence, i.e. whether it is on the sense or anti-sense strand, is also not important for its functionality. Enhancers and silencers often exhibit a tissue-specific activity, and are able to modulate the activity of transcription by up to three orders of magnitude. They are often composed of the same sequence elements found in proximal promoters that mediate tissuespecific expression. Like transcription factors binding to promoters, factors binding to enhancer elements influence gene expression both by changing the chromatin structure and by interaction with proteins of the transcription apparatus. Because of the very large distance of the enhancers from the affected promoters, the second mechanism is especially puzzling, and the most commonly accepted explanation in concordance with experimental
results involves the looping out of intervening DNA. The cis-regulatory DNA sequences that control gene expression are often organized into discrete modules7. The most impressive study in this respect concerns the elucidation of the promoter modules of the sea urchin gene endol154. Another classic example is provided by the fruit fly gene even-skipped. This gene is expressed in seven stripes in the fly embryo, under the control of five enhancers. The eve 1, eve 2, and eve 5 enhancers drive expression in the first, second and fifth stripes, and the remaining two enhancers drive expression in pairs of stripes (3+7 and 4+6) 34 . There is not always such a clean connection between sequential and functional continuity, however. The fruit fly gene hairy is expressed in similar stripes to even-skipped, but the regulatory sites that control its expression are largely interspersed with one another7. All of these elements are thought to function as arrays of binding sites for regulatory proteins. Enhancers are of the order of 500 base pairs long, and experimental dissection typically identifies about ten binding sites for at least three different transcription factors81. Enhancers tend to interact with transcription factors from a variety of structural families7, which may be necessary for regulatory specificity since proteins from the same structural family often have very similar DNA binding properties121. There are also regulatory sequences that influence the interaction of enhancers with promoters. Some genes have tethering elements near the TSS, which recruit distant enhancers to the promoter81. Insulators are elements that prevent enhancers located on one side from interacting with promoters located on the other side. A high level of control of the expression of several genes at once is achieved by so-called locus control regions (LCRs). These regions were found to be crucial for the activity of all the genes in a cluster, e. g. the a- or /3-globin genes. They act independently of their position and over a large distance, and without them no single promoter in a cluster can attract the polymerase in vivo. As with enhancers, some elements that are present in promoter regions are also found in LCRs, and they are likely to have a long-range influence on the chromatin structure. Several LCRs were also found to contain sequences which are involved in the attachment of chromatin domains to a protein scaffold, the so-called nuclear matrix. An LCR controlled region may therefore constitute one chromosomal loop which is regulated as a single unit. It also serves as an insulator to block the activity of outside enhancers.
3. Core Promoter Recognition

The following section turns to the identification of regulatory DNA sequences by computational methods. We will first concentrate on the general identification of proximal promoter regions, before discussing related approaches to model distal regulatory regions. Existing methods for general promoter prediction fall into two categories: ab initio methods based on single or multiple species, and alignment-based methods based on sequence information from ESTs and cDNAs.

3.1. Ab initio Prediction
Computational methods that aim at the identification of promoters ab initio tackle the task by establishing a model of promoters — and possibly nonpromoters as well — and using this model to search for an unknown number of promoters in a contiguous DNA sequence. Depending on how the model captures promoter features, different sub-groups of ab initio predictors can be distinguished: • Search-by-signal algorithms make predictions based on the detection of core promoter elements such as the TATA box or the initiator, and/or TFBSs outside the core. • Search-by-content algorithms identify regulatory regions by using measures based on the sequence composition of promoter and nonpromoter examples. There are also methods that combine both ideas: looking for signals and for regions of specific composition. To achieve exact promoter localization, a system output should include the prediction of the transcription start site. Search-by-content methods do not provide good TSS predictions because they do not consider any positionally conserved signals. In the following, we will discuss some important publications in more detail. For a detailed description of the underlying algorithms, the reader is referred to text books dealing with machine learning and biological sequence analysis14'42. Search-by-signal Search-by-signal approaches are based on models of specific patterns in promoters, i.e. models of TFBSs. Thereby, either extensive lists of binding sites from transcription factor databases are used, or an exact modeling of the core promoter is attempted.
A transcription factor typically interacts with DNA sequences that reflect a common pattern, or motif, characteristic of the factor. Such a motif can be represented by a consensus sequence, or less crudely by a W x 4 matrix q, where W is the motif's size in bases and each matrix element q(k, X) is the probability of observing nucleotide X (A, C, G, or T) at position k in the motif. It is then possible to scan this matrix along a sequence, assigning a similarity score to each W-long subsequence using a standard log likelihood ratio formula132. Typically, any subsequence with a similarity score above some threshold is counted as a 'match'. These matrices do not, however, contain sufficient information to locate functional in vivo binding sites accurately: at thresholds low enough to recover genuine binding sites, spurious matches occur at a high rate111. Transcription factors must be guided to their in vivo binding sites by contextual factors such as chromatin structure and interactions with other transcription factors, in addition to the innate DNA binding preferences. In any case, knowledge of the motifs is not in itself adequate to elucidate transcriptional control mechanisms. More information on models for single TFBSs can be found in reviews150 and recent publications15,66.

The first computational promoter studies concerned specific subclasses of genes31. After it was found that the detection of single patterns such as the TATA box is far from being sensitive and specific enough to locate eukaryotic promoters in general22, the combination of several patterns was pioneered by Prestridge112. After determining the occurrences of known TFBSs compiled from the literature, the hit ratio within promoters and non-promoters is used as a measure of reliability. To look for promoters, scores of binding site hits within a window of 250 bases are summed up, and the sum is increased by a fixed value if a TATA box is observed. A promoter is predicted if the sum exceeds a pre-defined threshold. The main problem of this approach lies in the way the TFBSs are used: they are represented as strings and therefore require an exact match to count, which is certainly too inflexible for many degenerate binding sites. Building on these ideas, a later approach30 built a recognition system on all over-represented oligonucleotides, in addition to those weight matrices for frequent TFBSs that have a considerably larger number of hits within promoters than in non-promoters. Further studies at the same time33,101,109 pioneered the use of hidden Markov models for promoter recognition, modeling either a combination of core promoter elements or the longer upstream promoter sequences, but no complete systems for promoter recognition were reported.
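To make the weight-matrix scan described above concrete, the sketch below (in Python) turns a count matrix into a W x 4 probability matrix, scores every W-long window with the log likelihood ratio against a uniform background, and reports windows above a threshold. The toy TATA-like counts, the pseudocount, and the threshold are illustrative assumptions, not values taken from any published matrix or tool.

```python
import math

def pwm_from_counts(counts, pseudocount=1.0):
    """Convert per-position nucleotide counts into a probability matrix."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in "ACGT") + 4 * pseudocount
        pwm.append({b: (col[b] + pseudocount) / total for b in "ACGT"})
    return pwm

def score_window(pwm, window, background=0.25):
    """Log likelihood ratio of the window under the motif vs. a uniform background."""
    return sum(math.log(pos[base] / background) for pos, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Yield (position, score) for every window scoring above the threshold."""
    w = len(pwm)
    for i in range(len(sequence) - w + 1):
        window = sequence[i:i + w]
        if set(window) <= set("ACGT"):          # skip windows with ambiguous bases
            s = score_window(pwm, window)
            if s >= threshold:
                yield i, s

# Toy TATA-like counts (illustrative only, not a published matrix).
counts = [{"A": 1, "C": 1, "G": 1, "T": 17},
          {"A": 17, "C": 1, "G": 1, "T": 1},
          {"A": 1, "C": 1, "G": 1, "T": 17},
          {"A": 17, "C": 1, "G": 1, "T": 1}]
pwm = pwm_from_counts(counts)
for pos, s in scan(pwm, "GGCTATATAAGGC", threshold=3.0):
    print(pos, round(s, 2))
```

As the surrounding text stresses, a scan like this produces many spurious matches at any usable threshold; it is the combination of such matches, not single hits, that carries predictive power.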
The exact modeling of core promoters is in many cases based on artificial neural networks. NNPP116 employs two time-delay networks for the TATA box and the initiator element, which are able to detect a pattern even if it does not occur at a fixed position within the input windows. This is achieved by receptive fields in which the nodes use the same shared weights. The two nets are combined in another net to allow for non-linear weighting of the two patterns. The networks are carefully trained on large sets of representative positive and negative samples. A different architecture73 is based on an ensemble of multi-layer perceptrons for binding sites, and the individual networks are supposed to learn the most prominent sequence patterns in an unsupervised way. Apart from input nodes for sequences, each of the networks has an input node fed with the strongest activation of the other networks on the same input. This prevents the networks from modeling the same sequence motif. The most reliable performance is achieved when initializing the nets with the known motifs of TATA box, initiator, GC and CCAAT elements22. An alternative to neural networks, Eponine40 uses an exhaustive probabilistic approach to discover the most prevalent features and their co-occurrence in mammalian promoters, and uses these in a classifier to scan the genome for potential promoter regions. A surprisingly limited number of features are found to be crucial: a TATA box surrounded by GC rich motifs, with another GC motif upstream. Search-by-content In contrast to the above approaches which model individual binding sites, this group of approaches considers sequence features derived from long promoter and non-promoter sequences irrespective of their relative location, and uses the established model to calculate scores on moving sequence windows. Most promoter sets use regions of 250-300 bases upstream of the TSS, based on the observation that the greatest increase in TFBS density occurs in the region from -200 onwards113. For instance, Audic and Claverie train Markov chain models of different orders to represent promoter and non-promoter sequences8, and classify a sliding window using Bayes' rule. They use only one background Markov chain model trained on both intron and exon sequences, and only sequences from the initiator to -250 are included, therefore neglecting downstream promoter sequences. A similar approach is based on oligomers of length six67. This model distinguishes between two background classes for coding and non-coding sequences, and the promoter sequences range from -300 to the start site. Two discriminative measures for enrichment of oligomers in promoters versus
background are used for classification. 196 of the 200 top-rated hexamers enriched in promoters versus non-coding sequences contain at least one CG dinucleotide, showing the strong bias for non-methylated regions in the promoters. As an additional restriction, the sequence is assumed to contain exactly one promoter.

Discriminative counts are again employed by PromoterInspector123. It builds a classifier for two classes by exclusively assigning sequence groups to one class if their occurrence ratio in sequences of this class versus the other class exceeds a preset threshold. Sequence groups contain a motif of a certain width plus a spacer of arbitrary nucleotides that may occur within the motif. These classifiers are then used together to classify a sequence window of 100 bases: the window is regarded as a promoter if the promoter class has the largest number of group hits of all classes (promoters and the background classes coding, non-coding, and 3' UTR). Neighboring promoter windows are merged, and a region is reported if it extends for at least 200 bases. The promoter training sequences range from -500 to +50. The authors noted that their approach consistently causes "shadow predictions", i.e. predictions at the same position on the opposite strand; the detected regions are not strand specific and are also quite large, roughly between 500 and 2,000 base pairs. This system gained popularity because it was the first one to be applied to the complete sequence of a human chromosome124. As with the earlier discriminative approach67, PromoterInspector predictions are highly correlated with the presence of CpG islands.

Features related to CpG islands are the most prominent characteristics of vertebrate promoter regions, and thus largely dominate pure search-by-content methods. A different kind of search-by-content69 therefore concentrates on CpG island associated promoters only, and uses features that help to distinguish between promoter-associated and other CpG islands. A CpG island is generally defined by a minimum length of 200 bp, a GC content of at least 50%, and a ratio of observed to expected CG dinucleotide frequency of more than 0.6. CpG islands containing a TSS are found to have a greater average length, higher GC content and higher CG ratio than other CpG islands. The combination of these three features with quadratic discriminant analysis leads to successful recognition.
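The CpG-island definition just quoted (at least 200 bp, GC content of at least 50%, observed/expected CpG ratio above 0.6) translates directly into a window test. The sketch below, with its fixed 200-bp window and naive merging of overlapping hits, is a minimal illustration under those cut-offs; it is not how the published CpG-island or promoter classifiers are actually implemented.

```python
def cpg_island_stats(seq):
    """GC fraction and observed/expected CpG ratio for one window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")
    gc_frac = (c + g) / n
    expected = (c * g) / n if c and g else 0.0
    obs_exp = cg / expected if expected else 0.0
    return gc_frac, obs_exp

def find_cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Slide a fixed window and merge adjacent windows that pass the test."""
    hits = []
    for i in range(0, len(seq) - window + 1):
        gc, ratio = cpg_island_stats(seq[i:i + window])
        if gc >= min_gc and ratio > min_obs_exp:
            if hits and i <= hits[-1][1]:        # overlaps the previous island: extend it
                hits[-1] = (hits[-1][0], i + window)
            else:
                hits.append((i, i + window))
    return hits

island = "CG" * 150                      # strongly CpG-rich stretch
background = "ATTTAATTAA" * 40
print(find_cpg_islands(background + island + background))
```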
Physical Properties

All of the above approaches are based exclusively on sequence properties of DNA and do not exploit physical properties reflecting the distinct chromatin structure observed within promoters. Several scales have been published that relate the di- or trinucleotides in a sequence to a real value describing physical DNA properties such as bendability, curvature, duplex stability, or preponderance in regions of Z-DNA conformation or nucleosome association. These scales can be used to transform a DNA sequence into a real-valued profile describing a property along the sequence. The first classifier based on such physical properties of DNA87 deals with E. coli, i.e. prokaryotic, sequences. Profiles for four different properties are computed and divided into five non-overlapping consecutive segments defined by the two sequence elements present in prokaryotic promoters: the -35 and -10 boxes. The mean value of each property in each segment then serves as a feature variable for a linear discriminant analysis. The authors achieved good classification performance on a set of promoters and coding sequences, but did not use their system to look for new promoters in long sequences, nor did they integrate sequence with profile features. Property profiles of eukaryotic promoter regions were studied9,110, but the first attempt to classify eukaryotic promoters by means of profile features was published later104. Profiles are calculated for several segments in the promoter region, and the values of each segment are combined with a multi-layer perceptron. Furthermore, principal component analysis provides a means to combine several properties. An independent report82 also used structural features to model invertebrate promoter regions.

Combining Signal and Content

Many recent approaches use features related to both the signal and the content framework. Zhang157 divides a promoter region into several consecutive segments of 30 or 45 bases, ranging from -160 to +80. The sequence within each segment is compared to the sequences within the left and right neighbor segments using discriminative word counts. The mean discriminative count of each segment serves as a feature variable, and these are combined using quadratic discriminant analysis. Zhang does not use non-promoter sequences in his model and describes this predictor as a tool to locate the core promoter within a window of 1,000-2,000 base pairs. The promoter finding system TSSG/TSSW129 combines a TATA box weight matrix, triplet preferences in the initiator, hexamer frequencies in three regions (-1 to -100, -101 to -200, and -201 to -300), and hits of transcription factor binding sites by linear discriminant analysis. It therefore integrates specific models for binding sites inside and outside the core promoter with general statistics that are also used by search-by-content approaches. This approach is so far the only one that has been extended to take the cross-species conservation
of the features between human and rodent sequences into account130, but the evaluation was carried out on such a small set that an objective measure of the improvement still lies in the future. The McPromoter system103 consists of a generalized hidden Markov model of a promoter sequence, with six interpolated Markov chain submodels106 representing different segments of the promoter sequence from -250 to +50: upstream 1 and 2, TATA box, spacer, initiator, and downstream (see Figure 3). It calculates the best division of a sequence into these six segments during training and evaluation and does not use fixedsized windows like other systems129'157. It is optionally augmented by a set of Gaussian densities describing profiles of DNA properties in the six segments104. The likelihoods of sequence and profile segments are combined in a neural network classifier. As non-promoter classes, two mixture models for coding and non-coding sequences for both strands of the DNA sequence are employed. To localize promoters in genomic sequences, an input window of 300 base pairs is shifted along both sides of the sequence, and the classifier score as well as the position of the initiator segment is stored for each window. After smoothing the score graph along the sequence, local maxima are reported if they exceed a pre-set threshold on the score. The system has been adapted to vertebrate as well as fly sequences. On the original small Drosophila data sets, the use of DNA physical properties lead to significant improvements in classification; however, the effect diminishes when using the currently available larger data set. This mirrors the fact that property profiles are obtained by simple transformations of the DNA sequence, and that given enough data, this information can be extracted from the sequence itself. Hannenhalli and Levy58 combine most of the above sequence features in a predictor including features on sequence composition, core promoter elements, and automatically identified motifs. The features are combined with linear discriminant analysis, which also allows to judge the importance of the features based on the weight assigned to it during the LDA learning. Finally, Dragon Promoter Finder13 looks at the frequency of the most overrepresented oligonucleotides at all positions in proximal promoters versus background non-coding and coding sequences. It encodes them in a feature vector which is then used as input for several neural network classifiers adjusted to different sensitivity settings.
Fig. 3. Schematic diagram of the McPromoter system. A window is shifted along the sequence and evaluated by models of promoter and background sequence classes. The difference between promoter and best non-promoter class is smoothed and plotted along the sequence coordinates, and local maxima are reported as TSSs.
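The localization scheme sketched in Figure 3 — score a shifted window, smooth the resulting profile along the sequence, and report local maxima above a threshold — can be prototyped independently of the underlying classifier. In the sketch below the classifier is a deliberately crude stand-in (window GC fraction), and the window size, step, smoothing radius and threshold are assumptions chosen for illustration, not McPromoter's actual parameters.

```python
def smooth(values, radius=5):
    """Simple moving-average smoothing of a per-window score profile."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def call_peaks(scores, threshold):
    """Report local maxima of the smoothed profile that exceed a threshold."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] >= threshold and scores[i - 1] < scores[i] >= scores[i + 1]]

def predict_tss(sequence, score_window, window=300, step=10, threshold=0.5):
    """Shift a fixed window along the sequence, score it, smooth, pick peaks."""
    positions, raw = [], []
    for start in range(0, len(sequence) - window + 1, step):
        positions.append(start)
        raw.append(score_window(sequence[start:start + window]))
    smoothed = smooth(raw)
    return [positions[i] for i in call_peaks(smoothed, threshold)]

def gc_score(window):
    """Hypothetical stand-in classifier: G+C fraction of the window."""
    return (window.count("G") + window.count("C")) / len(window)

seq = "AT" * 400 + "GC" * 200 + "AT" * 400
print(predict_tss(seq, gc_score))
```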
Gene Finding vs. Transcription Start Finding

Because eukaryotic mRNAs usually contain only one transcribed gene, it is tempting to use a suitable promoter model as one state of a probabilistic model for ab initio gene finding. In this way, the admissible search region is restricted to the upstream regions of detected genes, while, on the other hand, reliable promoter recognition could help to recognize the border between two neighboring genes. In practice, this idea is hampered by gene finders' inability to predict the non-coding exons at the 5' and 3' ends of a gene, which do not contain specific patterns. It has also turned out that promoter recognition is a problem that equals if not exceeds the complexity of gene recognition. This does not come as a real surprise if one considers that promoters are located within double-stranded DNA in chromatin, whereas the patterns used in gene finding are still present in linear single-stranded mRNAs. A simple promoter recognition module such as the one used in the Genscan system24 is thus much less reliable than the other modules. Gene finders with integrated cDNA alignment are able to considerably restrict the admissible region of promoter predictions and are more successful117. In big annotation projects, the algorithms are thus applied independently of each other, and a human curator deals with the combination of the results. Two rather recent approaches for mammalian genomes attempt to combine the best of both worlds by predicting the first exon of a gene (FirstEF38) or the 5' start of a gene (Dragon GeneStartFinder12), instead of the promoter region.
These approaches augment the set of features used in full-blown promoter prediction systems with features related to the often non-coding first exon: length, sequence composition, proximity of a 3' splice site, and presence of CpG islands. As expected, the use of additional information increases the performance of promoter prediction; however, the cell effectively does not use this information, and these systems cannot help us to elucidate the mechanism of transcription regulation.

3.2. Alignment Approaches
As with computational methods for predicting the intron-exon structure of genes117, the prediction of promoters can be greatly aided by cDNA sequence information. cDNAs are copies of the mRNAs in a cell and can be obtained and sequenced in a high-throughput fashion. However, promoter prediction is complicated by the fact that most cDNA clones do not extend to the TSS. Recent advances in cDNA library construction methods that utilize the 5'-cap structure of mRNAs have allowed the generation of so-called "cap-trapped" libraries with an increased percentage of full-length cDNAs138. Such libraries have been used to map TSSs in vertebrates by aligning the 5'-end sequences of individual cDNAs to genomic DNA137,139. However, it is estimated that even in the best libraries only 50-80% of cDNAs extend to the TSS135,138, making it unreliable to base conclusions on individual cDNA alignments. A more cautious approach for identifying TSSs requires the 5' ends of the alignments of multiple, independent cap-selected cDNAs to lie in close proximity105, and a similar approach was used for mammals56. For both mammals and Drosophila, several thousand TSSs have now been mapped experimentally in this way. These large, but far from complete, sequence collections have been screened for the occurrence of enriched core promoter motifs105,139, and provide a much richer data set for parameter estimation and improvement of ab initio approaches105. These cDNA alignments can also be used to restrict the admissible search region for core promoter predictions88. More recently, the fusion of cap-trapping and Serial Analysis of Gene Expression (SAGE) has produced "CAGE", wherein 20 nucleotide long tags from the 5' ends of mRNAs are concatamerized and sequenced on a large scale63,126,149.
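A minimal version of the "multiple independent 5' ends in close proximity" rule described above might look as follows; the maximum gap of 20 bp, the minimum support of three cDNAs, and the example coordinates are arbitrary placeholders rather than the thresholds used in the cited studies.

```python
def call_tss_clusters(five_prime_ends, max_gap=20, min_support=3):
    """
    Group aligned cDNA 5'-end coordinates (one strand at a time) into clusters:
    consecutive ends closer than max_gap join the same cluster, and a cluster
    is reported as a putative TSS only if enough independent cDNAs support it.
    """
    ends = sorted(five_prime_ends)
    if not ends:
        return []
    clusters, current = [], [ends[0]]
    for pos in ends[1:]:
        if pos - current[-1] <= max_gap:
            current.append(pos)
        else:
            clusters.append(current)
            current = [pos]
    clusters.append(current)
    calls = []
    for cluster in clusters:
        if len(cluster) >= min_support:
            representative = max(set(cluster), key=cluster.count)  # modal coordinate
            calls.append({"tss": representative,
                          "support": len(cluster),
                          "span": (cluster[0], cluster[-1])})
    return calls

# Illustrative 5'-end coordinates of cap-selected cDNA alignments.
ends = [10452, 10455, 10455, 10461, 20988, 30771, 30772, 30774, 30790]
print(call_tss_clusters(ends))
```

Reporting a span rather than a single base also reflects the observation made earlier that, in TATA-less promoters, transcription tends to start over a limited range of neighboring positions.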
4. Prediction of Regulatory Regions by Cross-species Conservation

An often-used principle is that functionally important sequences tend to be conserved across different species. The idea is that evolutionary pressure keeps the regulatory patterns free of mutations, whereas the surrounding DNA without specific function will accumulate mutations. Its popularity has increased with the advent of complete sequences from several model organisms. Thus many studies have searched for cis-regulatory elements by aligning homologous DNA sequences, in particular upstream regions of related genes from two or more organisms, and identifying patches of unusually strong conservation.

When comparing distantly related species, such as mammals versus fish, most genomic sequence is not meaningfully alignable, and any meaningful alignment almost certainly indicates a functional element of some sort. Here, "meaningful" means that 1) the alignment is statistically significant compared to a null model of random sequences, and 2) the alignment is not comprised of so-called low-complexity sequence, such as long mononucleotide tracts, which occur frequently in many genomes. When comparing human and mouse, about 40% of each genome is alignable to the other, which is thought to represent all of the genome of their common ancestor that is retained in both species. The remaining 60% represents lineage-specific insertions and deletions, much of it repetitive elements such as LINEs and SINEs. The alignable portion includes many ancestral repeats, which are usually thought to be non-functional148. Thus significant alignments are not sufficient to indicate functional elements: it is necessary to find patches of the alignment that are more conserved than the background rate. This task is complicated by the fact that the "background rate" appears to vary from place to place across the genome62. By comparing the divergence levels of 50 base pair long windows within and outside ancestral repeats, it was estimated that about 5% of each genome is evolving more slowly than ancestral repeats, and is presumably functional148. Unfortunately, this analysis did not reveal which 5%. Since protein-coding elements comprise only about 1.5% of the genome, more than half of the conserved sequences are non-coding, and many of these may be cis-regulatory elements. In analogy to "footprinting" wet-lab experiments, which digest the bulk of DNA but leave intact the sequences in regions where proteins bind, the identification of conserved functional non-coding DNA has been dubbed phylogenetic footprinting.
An early review43 describes phylogenetic footprinting applications, and other excellent reviews on these approaches in the context of genome-wide analyses are available46,111. Phylogenetic footprinting is sometimes carried out by specialized alignment algorithms, as standard local or global alignment approaches do not obey the additional biologically meaningful restrictions that can be imposed: sequences should contain conserved blocks that represent one or more binding sites, and very low similarity otherwise70,158. Two practical examples where human-mouse phylogenetic footprinting was one step leading to the identification of regulatory elements have been described61,147.

Whether and how comparative genomics can be applied to identifying cis-regulatory sequences depends on their level of conservation and particular architecture. In fact, enhancers exhibit highly variable degrees of conservation. For example, an enhancer of the embryonic gene DACH is > 98% identical over 350 base pairs in human, mouse and rat. Moreover, a 120 base pair core region of this enhancer has only six substitutions among human, mouse, rat, chicken, frog, and three species of fish20. This level of conservation is much higher than that of typical protein-coding sequences, and hard to understand in terms of an array of degenerate binding sites. On the other hand, an enhancer in the first intron of the fruit fly vnd gene appears to be functionally conserved in mosquito, but there is no simple sequence similarity45. Many human oestrogen-responsive sequences are not notably conserved in mouse. Hence comparisons at a range of evolutionary distances are appropriate for identifying different kinds of enhancer, and some cis-regulatory sequences may not be straightforwardly amenable to detection by comparative genomics.

A process known as binding site turnover presents an important limitation to comparative genomics. Individual transcription factor binding sites are so short, and their sequence requirements so weak, that new binding sites can easily arise by random mutation. Destructive mutations in an old binding site may then be tolerated, as the new site takes on the function of the old. In this way, the order of binding sites within an enhancer can become scrambled over time, and simple sequence similarity can be lost, although the enhancer's function may remain unchanged91. A survey of mammalian transcription factor binding sites found that 32%-40% of human sites are not functional in rodents, implying a high degree of turnover at this relatively short evolutionary distance39.

Various criteria have been suggested for discriminating regulatory DNA from neutral sites in human-mouse (or similar) comparisons. Sequences with > 100 base pairs and 70% identity have proven to be a reasonable threshold in many studies20.
Complex Eukaryotic Regulatory DNA Sequences
595
Elnitski et al. collected alignments of known enhancers and neutral sequences (ancestral repeats), and systematically searched for criteria that optimally distinguish them. They found that simple alignment quality criteria do not cleanly separate these datasets. They obtained better discrimination by comparing frequencies of different kinds of nucleotide pairing (e.g. transitions) between the datasets. The discrimination improved further when they considered frequencies of short pairing patterns, e.g. hexamer blocks of paired nucleotides44. This final method has been used to assign "regulatory potential scores" across the human genome based on its alignment with mouse and rat (http://genome.ucsc.edu/).

Multiple species comparisons may have advantages over pair-wise comparisons. Species too similar for pair-wise analyses, such as primates, can become informative when considered as a group, because the collective evolutionary divergence between multiple primates is similar to that between humans and mice. This approach identified elements that regulate transcription of the medically important gene apolipoprotein (a), which only exists in humans, apes, and Old World monkeys, and thus cannot be studied by comparing more distant species20. There are likely to be many primate-specific cis-regulatory elements underlying various aspects of primate biology, but on the other hand, if the collective evolutionary divergence among primates is as great as that between humans and mice, there may be an equal number of elements that are present in humans and mice but missing in some of the primates: there is no free lunch here. An indisputable advantage of closely related species is that the sequences can be aligned more accurately.

There are numerous examples of conserved sequences that have been tested and found to exhibit cis-regulatory activity, in a variety of animal species5,16,90,155. There are also cases where a conserved element has failed to exhibit cis-regulatory activity, perhaps because it was not tested in a suitable cell type, and of enhancers that seem to lack simple sequence conservation. Another limitation of comparative genomics at present is that while it can predict functional elements, it does not employ a model reflecting the function of the elements. A recent study predicts functional similarities among conserved elements by identifying groups of conserved sequences that share similar frequencies of conserved hexanucleotides55. The large set of classification methods for the identification of core promoters, as outlined above, may turn out to be a useful resource to address this problem. Obtaining large genomic sequences from multiple organisms is still an expensive proposition.
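The simple length-and-identity criterion mentioned above (windows of more than 100 base pairs at roughly 70% identity) can be applied to a pairwise alignment with a sliding sum, as in the sketch below. Gap columns are simply counted as mismatches here, which is an assumption of this illustration; published phylogenetic footprinting tools treat gaps, window boundaries and local rate variation more carefully.

```python
def conserved_windows(aligned_a, aligned_b, min_len=100, min_identity=0.70):
    """
    Scan two aligned sequences (same length, '-' for gaps) and report
    alignment windows of at least min_len columns whose identity, counting
    gap columns as mismatches, reaches min_identity.  Overlapping qualifying
    windows are merged into maximal blocks of alignment coordinates.
    """
    assert len(aligned_a) == len(aligned_b)
    n = len(aligned_a)
    ident = [1 if a == b and a != "-" else 0
             for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    blocks = []
    running = sum(ident[:min_len])
    for start in range(0, n - min_len + 1):
        if start > 0:                                # slide the window by one column
            running += ident[start + min_len - 1] - ident[start - 1]
        if running / min_len >= min_identity:
            end = start + min_len
            if blocks and start <= blocks[-1][1]:
                blocks[-1] = (blocks[-1][0], end)
            else:
                blocks.append((start, end))
    return blocks

a = "ACGT" * 50
b = "ACGT" * 30 + "TTTT" * 20          # second half has diverged
print(conserved_windows(a, b))
```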
Fortunately, there is a way of doing comparative genomics without directly obtaining all the sequences. Frazer et al. hybridised DNA from horse, cow, pig, dog, cat, and mouse to high-density oligonucleotide arrays containing probes for 365 kb of human DNA around the SIM2 gene. The spots that light up indicate conserved sequences. Some human sequences were conserved in many other mammals, and some were conserved in a few. Some of the conserved sequences were tested for enhancer function, and surprisingly, the sequences that were conserved in a few mammals were just as likely to be functional as those conserved in many. Furthermore, one of these apparently functional elements is deleted in the chimpanzee genome, and another is deleted in the rhesus macaque47. Thus enhancers seem to evolve in a rather dynamic way, and any pair-wise comparison will miss many functional elements.

5. Searching for Motif Clusters

If enhancers are arrays of protein binding sites, the most direct approach to finding them is to take a pre-determined set of protein-binding motifs and search for clusters of them in the genome. The simplest method is to count the number of motif matches within sequence windows of a certain length (e.g. 700 bp), and predict a regulatory element when there are sufficiently many motifs, or sufficiently many of each type of motif17,57,85,94,115. This approach has identified several bona fide enhancers in the fruit fly genome. Rebeiz et al. and Lifanov et al. explored a range of parameter values (e.g. window length and threshold motif number) and report those that work well for their problem cases85,115. Statistically significant clusters of motif matches can be found by considering the statistics of consecutive Poisson events or r-scans133,144.
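The simplest window-counting strategy just described can be written down in a few lines: precompute the match positions of each motif, slide a fixed window, and report windows in which every motif reaches a minimum count. The exact-string matching, the 700-bp window, the per-motif threshold and the two "binding sites" are all illustrative assumptions; the cited tools use weight matrices and carefully tuned parameters.

```python
def motif_positions(sequence, motif):
    """All (possibly overlapping) exact-match start positions of a motif."""
    hits, i = [], sequence.find(motif)
    while i != -1:
        hits.append(i)
        i = sequence.find(motif, i + 1)
    return hits

def cluster_windows(sequence, motifs, window=700, step=50, min_per_motif=2):
    """
    Predict candidate cis-regulatory windows: every motif in `motifs` must
    occur at least `min_per_motif` times inside the window.
    """
    all_hits = {m: motif_positions(sequence, m) for m in motifs}
    predictions = []
    for start in range(0, max(1, len(sequence) - window + 1), step):
        end = start + window
        counts = {m: sum(start <= p < end for p in hits)
                  for m, hits in all_hits.items()}
        if all(c >= min_per_motif for c in counts.values()):
            predictions.append((start, end, counts))
    return predictions

# Hypothetical binding-site strings, for illustration only.
bcd, hb = "TAATCC", "TTTTTG"
seq = ("A" * 1000) + (bcd + "AGCT" + hb + "GGCC") * 4 + ("A" * 1000)
for start, end, counts in cluster_windows(seq, [bcd, hb]):
    print(start, end, counts)
```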
Fig. 4. A hidden Markov model for finding clusters of transcription factor binding motifs. The circles inside the dotted rectangles represent nucleotides within transcription factor binding sites. Every such nucleotide has a characteristic probability of being A, C, G or T. These probabilities are pre-determined from empirical data on the DNA binding preferences of the transcription factor. The topmost circle represents a nucleotide in a spacer region between binding sites: its probability of being A, C, G or T is estimated from the global or local genomic average. The arrows represent transitions between the different types of nucleotide. If a choice of transitions is available, the number beside the arrow indicates the probability that this transition is selected. The parameter a controls the average length of spacer regions between binding sites. The transition probabilities can be estimated by finding the values that make the model best fit a training set of enhancer sequences. Although this figure only shows two types of binding site, an arbitrary number of sites can be added in the obvious fashion. In practice, each type of binding site needs to appear in the model twice, once for each orientation in the DNA duplex.
Statistical model-based techniques have been developed which find sequence segments that match a model of motif clusters better than they match a null model of random DNA36,49,50,51,114. Hidden Markov models are used to represent motif clusters (Figure 4). A method known as the Forward algorithm can be used to calculate the probability of observing a candidate sequence segment given the cluster model, and this is compared to the probability of observing the segment given a null model of random, independent nucleotides42. This approach uses more information than simple motif counting methods: it takes into account the strength of each motif match (i.e. its degree of similarity to the optimal consensus sequence), and it considers all conceivable matches, no matter how weak. It also combines evidence from overlapping matches in a robust fashion, and allows the optimal size of each cluster to be discovered automatically, rather than specifying a window size in advance. A potential problem is that applying the Forward algorithm to every subsequence requires time proportional to the square of the sequence length, making this technique impractical for large genomic sequences. Fortunately, there is a linear-time heuristic implemented in the Cluster-Buster program that effectively mimics the full Forward algorithm50. One drawback of hidden Markov models is that it is hard to assess the statistical significance of the results, i.e. how unlikely it is that a motif cluster would appear in a random sequence just by chance.
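For concreteness, the sketch below implements the Forward calculation for a stripped-down version of the Figure 4 model: a single motif type in a single orientation, one spacer state, and a fixed motif-entry probability. It returns the log-odds of a candidate segment under this cluster HMM versus an i.i.d. background, as described above. Real implementations handle several motifs, both strands, and work in log space for numerical stability; the emission probabilities and the entry probability used here are made-up illustrative numbers.

```python
import math

def favour(base, p=0.97):
    """Emission distribution that strongly favours one nucleotide (toy values)."""
    other = (1.0 - p) / 3
    return {b: (p if b == base else other) for b in "ACGT"}

def forward_log_odds(seq, pwm, p_motif=0.01, bg=0.25):
    """
    Forward probability of `seq` (A/C/G/T only) under a one-motif cluster HMM
    (spacer state plus one chain of motif states, as in Figure 4 but with a
    single site type and orientation), compared with an i.i.d. background.
    Returns log2( P(seq | cluster HMM) / P(seq | null) ).
    """
    W = len(pwm)
    # state order: [spacer, motif_1, ..., motif_W]
    f = [(1 - p_motif) * bg] + [0.0] * W
    f[1] = p_motif * pwm[0][seq[0]]
    for base in seq[1:]:
        from_spacer_like = f[0] + f[W]          # spacer or just-finished motif
        new = [bg * (1 - p_motif) * from_spacer_like,
               pwm[0][base] * p_motif * from_spacer_like]
        for i in range(2, W + 1):               # deterministic walk through the motif
            new.append(pwm[i - 1][base] * f[i - 1])
        f = new
    p_hmm = sum(f)
    p_null = bg ** len(seq)
    return math.log2(p_hmm / p_null)

pwm = [favour(b) for b in "TGACTCA"]            # toy AP-1-like motif, illustrative only
print(round(forward_log_odds("GGTGACTCAAATGACTCAGG", pwm), 2))   # clustered matches
print(round(forward_log_odds("GGGGGGGGGGGGGGGGGGGG", pwm), 2))   # no matches
```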
For the model in Figure 4, estimates of significance can be obtained analytically if we use the Viterbi instead of the Forward algorithm. This means that just the optimal path through the model is considered, rather than all possible paths, and so overlapping and very weak motifs are ignored. The algorithm then has a simple interpretation: we scan the sequence from left to right, keeping a running score which is increased whenever we pass a motif match, and decreased by a constant "gap penalty" otherwise. Subsequences with maximal score increase correspond to motif clusters, and their significance can be evaluated by a variant of the segment score statistics used in BLAST72. This technique was implemented in the COMET program, and it was found that many known enhancers do not contain statistically significant clusters of motifs, given what is known about the types of binding site in them51. Thus either our models of protein binding motifs are inadequate, or, more likely, these enhancers contain additional signals that we are unaware of.
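The left-to-right running-score scan described above reduces to a Kadane-style maximal-segment search once motif matches and their scores are given. The sketch below assumes precomputed (position, score) match pairs and an arbitrary per-base gap penalty and score cut-off; it reports one maximal stretch per positive-scoring run rather than all locally maximal segments, so it is a simplification of the published COMET method and omits the significance statistics.

```python
def max_scoring_segments(seq_length, matches, gap_penalty=0.1, min_score=5.0):
    """
    Left-to-right scan: the running score gains the match score at each motif
    hit and loses gap_penalty for every intervening base.  Maximal positive-
    scoring stretches (reset-at-zero) are reported as candidate motif clusters.
    """
    matches = sorted(matches)                       # (position, score) pairs
    segments = []
    running, seg_start, best, best_end = 0.0, None, 0.0, None
    prev_pos = 0
    for pos, score in matches + [(seq_length, 0.0)]:   # sentinel flushes the last run
        running -= gap_penalty * (pos - prev_pos)
        prev_pos = pos
        if running <= 0 or seg_start is None:
            if best >= min_score:                   # flush the previous run if good enough
                segments.append((seg_start, best_end, round(best, 2)))
            running, seg_start, best, best_end = 0.0, pos, 0.0, None
        running += score
        if running > best:
            best, best_end = running, pos
    if best >= min_score:
        segments.append((seg_start, best_end, round(best, 2)))
    return segments

# Hypothetical motif hits: (position, match score).
hits = [(120, 3.0), (150, 4.0), (165, 3.5), (820, 2.0), (4000, 3.0)]
print(max_scoring_segments(5000, hits, gap_penalty=0.05, min_score=6.0))
```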
Other classification techniques similar to those used for core promoter identification, such as logistic regression analysis76,89,145 and support vector machines108, can be trained from positive and negative examples to classify windows of DNA sequence as regulatory or non-regulatory, based on motif occurrences and their organization in the sequence. ModelInspector searches for complex models of motif modules with detailed order and spacing rules48. As described above, several techniques for pol-II promoter prediction consider the density of transcription factor binding motifs, often in conjunction with other factors11,30,74. MCAST is an extension of the MAST tool, which assesses the significance of multiple motif matches by combining the P-values of the individual matches10. MSCAN finds motif clusters by combining strengths of motif matches within a window in a unique way71. While no detailed comparison of motif cluster detection methods is available, those comparisons that have been performed do not reveal great differences in accuracy at detecting regulatory sequences10,71.

The motif cluster approach requires prior knowledge of protein binding motifs. Descriptions of several hundred transcription factor binding motifs are available in collections such as TRANSFAC97 and JASPAR122, with much greater coverage of some taxa (e.g. mammals) than others (e.g. protists). TRANSFAC aims at complete coverage of published data, and thus includes multiple, redundant motifs for some factors, and low- as well as high-quality motifs. JASPAR is a smaller, non-redundant collection with generally high-quality motifs. Since mammals probably have a few thousand transcription factors, only a small fraction of them have known DNA binding preferences. However, transcription factors fall into a limited number of families based on the structures of their DNA binding domains, and with the notable exception of zinc finger proteins, transcription factors from the same family tend to bind highly similar target sequences121. Thus the effective coverage of the motif collections is greater than it seems.

Searching for clusters of all known protein binding motifs does not seem to predict enhancers successfully, so it is necessary to pick a limited subset of motifs that are suspected to co-occur in cis-regulatory elements. This is probably the most serious limitation of the motif cluster approach. One solution is to gather a set of functionally similar regulatory sequences and determine which motifs are common to them. There are tools to scan the known protein binding motifs against a set of sequences and determine which of the motifs are statistically overrepresented in them53 (and references therein). Alternatively, there are many ab initio pattern discovery algorithms that search for novel motifs common to a set of sequences52 (and references therein). The latter methods are more general, but have greater difficulty detecting weak motifs that do not stand out above the background of randomly occurring patterns. Another solution is to scan the known motifs against the whole genome, or a large set of gene upstream regions, and examine which motifs occur close together more often than expected by chance59,75,134. The boundary between de novo motif identification and cluster prediction becomes increasingly blurry, as some of these approaches combine known binding sites with predicted enriched motifs.

Comparative genomics and motif cluster searches have complementary advantages and disadvantages. Comparative genomics can identify many kinds of functional element, but provides little if any information on what their function is. Motif cluster searches are generally targeted at limited classes of cis-regulatory elements, but with a specific prediction of function.

6. Perspective

Many additional eukaryotic genomes are being sequenced at present, which is shifting comparative genomics towards multiple species comparisons. Tools that combine comparative genomics and motif cluster identification are being developed83,127: the challenge is to combine the strengths and not the weaknesses of each approach. There are experimental techniques to determine the DNA binding preferences of transcription factors in a high-throughput fashion23,119. Accurate DNA binding models for every transcription factor in a genome would be an enormous, perhaps paradigm-altering, benefit for the field.
Another trend is to look for patterns of organisation among binding sites in an enhancer, rather than treating them as amorphous clusters: a grammar of cis-regulation. In some cases binding sites exhibit periodic spacing preferences, especially multiples of 10 bp, corresponding to complete turns of the DNA double helix68,93. However, no organisational patterns are apparent for cooperating estrogen response elements and Sp1 binding sites. Erives et al. noted patterns in the spacing and orientation of motifs in four fruit fly neurogenic enhancers, and used them to identify an orthologous enhancer in mosquito, although they had to relax the motif and pattern definitions somewhat45. However, the small number of enhancers available for study and the large number of conceivable organisational patterns prevent these results from being fully convincing at this point. Thus the importance of motif organization in enhancers is not yet clear. In terms of core promoters, the availability of several closely related species will soon show to what extent specific architectures of core promoters (i.e., TATA- vs. DPE-containing promoters) are conserved, as this is assumed to play a crucial role in core-promoter/enhancer interaction128. Preliminary studies also show that the performance of systems like McPromoter increases significantly when several core promoter models are included. Comparative genomics will also help with identifying alternative TSSs and their functional implications in the gene regulatory cascade95,152,156.

Experimental methods for identifying cis-regulatory sequences in a high-throughput manner are being developed. Harafuji et al. attached random fragments of sea squirt DNA to a reporter gene, and tested for constructs that produced localized expression patterns in the embryo. They conclude that it is feasible to extend this approach to cover the whole sea squirt genome60. A similar study has been performed with sea urchin27. Many types of cis-regulatory element exhibit hypersensitivity to DNA digesting enzymes: Sabo et al. exploited this property by purifying and sequencing DNA flanking DNaseI cut sites. These sequences did not correspond perfectly with DNaseI hypersensitive sites, but clustering the sequences based on genomic proximity provided better discrimination120. Chromatin immunoprecipitation (ChIP) can be used to enrich for DNA segments bound by a specific transcription factor. This DNA can then be hybridised to a microarray containing probes for genomic sequence (ChIP-on-chip). This has been done on a whole-genome scale in yeast80, but the human genome is much larger, and so far only gene upstream regions100 or individual small chromosomes and regions28,96,136 have been placed on the microarray. These initial studies have had quite unexpected outcomes, as the binding sites are clearly not as restricted to the upstream proximal region, in the widest sense, as is still the common belief.
Table 1. Resources for regulatory sequence analysis: databases of promoter regions and transcription factors; systems to predict core promoters and transcription start sites; and programs to analyze and/or identify composite cis-regulatory regions.

Databases:
dbTSS137  http://dbtss.hgc.jp
Eukaryotic Promoter Database125  http://www.epd.isb-sib.ch
JASPAR122  http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl
MPromDb  http://bioinformatics.med.ohio-state.edu/MPromDb
PromoSer56  http://biowulf.bu.edu/zlab/promoser
Transfac97  http://www.gene-regulation.com

Core promoter finders:
Dragon Promoter / Gene Start Finder12,13  http://research.i2r.a-star.edu.sg/promoter/
Eponine40  http://servlet.sanger.ac.uk:8080/eponine/
FirstEF38  http://rulai.cshl.org/tools/FirstEF/
McPromoter104  http://genes.mit.edu/McPromoter.html
PromoterInspector123  http://www.genomatix.de/cgi-bin/promoterinspector/promoterinspector.pl

Cis-regulatory analysis tools:
Cis-analyst17  http://rana.lbl.gov/cis-analyst/
Cister, COMET, Cluster-Buster49,50,51  http://zlab.bu.edu/zlab/gene.shtml
Fly Enhancer94  http://flyenhancer.org/Main
MCAST10  http://mcast.sdsc.edu/
ModelGenerator/Inspector48  http://www.gsf.de/biodv/modelinspector.html
MSCAN71  http://tfscan.cgb.ki.se/cgi-bin/MSCAN
RSAT141  http://rsat.ulb.ac.be/rsat/
Target Explorer131  http://trantor.bioc.columbia.edu/Target.Explorer
TOUCAN2,3  http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php
There is no guarantee that biology follows universal rather than ad hoc rules. In the near term, it is therefore unlikely that any one method will "solve" cis-regulatory sequence identification. Impressive as they are, none of these experimental techniques is likely to identify and characterise all cis-regulatory sequences in a genome perfectly. They will provide additional input to
computational methods rather than supplant them. Combinations of experimental and computational techniques will massively accelerate promoter and enhancer finding, and help us to understand the rules and cellular circuits adopted by living organisms to ensure accurate control of their genetic program. Currently available online resources for this task are compiled in Table 1.

Acknowledgements

MCF would like to thank his Ph.D. mentors Zhiping Weng, Ulla Hansen, and John Spouge. UO thanks his Ph.D. and postdoctoral advisors Heinrich Niemann, Gerry Rubin and Chris Burge for past and current support, and Martin Reese for pushing him in the right direction. We also thank Gabriel Kreiman for detailed comments, and Ben Berman, who should have been the third author.

References

1. M. D. Adams, et al., The genome sequence of Drosophila melanogaster. Science 287(5461), 2185-2195 (2000). 2. S. Aerts, et al., Computational detection of cis-regulatory modules. Bioinformatics 19 Suppl 2, II5-II14 (2003). 3. S. Aerts, et al., Toucan: deciphering the cis-regulatory logic of coregulated genes. Nucleic Acids Res. 31(6), 1753-1764 (2003). 4. F. Antequera and A. Bird, Number of CpG islands and genes in human and mouse. Proc. Natl. Acad. Sci. USA 90(24), 11995-11999 (1993). 5. S. Aparicio, et al., Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc. Natl. Acad. Sci. USA 92(5), 1684-1688 (1995). 6. I. R. Arkhipova, Promoter elements in Drosophila melanogaster revealed by sequence analysis. Genetics 139(3), 1359-1369 (1995). 7. M. I. Arnone and E. H. Davidson, The hardwiring of development: organization and function of genomic regulatory systems. Development 124(10), 1851-1864 (1997). 8. S. Audic and J. M. Claverie, Detection of eukaryotic promoters using Markov transition matrices. Comput. Chem. 21(4), 223-227 (1997). 9. V. N. Babenko, et al., Investigating extended regulatory regions of genomic DNA sequences. Bioinformatics 15(7-8), 644-653 (1999). 10. T. L. Bailey and W. S. Noble, Searching for statistically significant regulatory modules. Bioinformatics 19 Suppl 2, II16-II25 (2003). 11. V. B. Bajic and V. Brusic, Computational detection of vertebrate RNA polymerase II promoters. Methods Enzymol. 370, 237-250 (2003). 12. V. B. Bajic and S. H. Seah, Dragon gene start finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res. 13(8), 1923-1929 (2003).
Complex Eukaryotic Regulatory DNA Sequences 13. V. B. Bajic, et al., Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters. Bioinformatics 18(1), 198-199 (2002). 14. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach. 2 ed: Bradford (2001). 15. P. V. Benos, M. L. Bulyk and G. D. Stormo, Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res. 30(20), 4442-4451 (2002). 16. C. M. Bergman, et al., Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biol. 3(12), RESEARCH0086 (2002). 17. B. P. Berman, et al., Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99(2), 757-762 (2002). 18. A. Bird, DNA methylation patterns and epigenetic memory. Genes Dev. 16(1), 6-21 (2002). 19. T. Blumenthal and K. S. Gleason, Caenorhabditis elegans operons: form and function. Nat. Rev. Genet. 4(2), 112-120 (2003). 20. D. Boffelli, M. A. Nobrega and E. M. Rubin, Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 5(6), 456-465 (2004). 21. R. J. Britten and E. H. Davidson, Gene regulation for higher cells: a theory. Science 165(891), 349-357 (1969). 22. P. Bucher, Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 212(4), 563-578 (1990). 23. M. L. Bulyk, et al., Quantifying DNA-protein interactions by doublestranded DNA arrays. Nat. Biotechnol. 17(6), 573-577 (1999). 24. C. Burge and S. Karlin, Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268(1), 78-94 (1997). 25. J. E. Butler and J. T. Kadonaga, The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16(20), 2583-2592 (2002). 26. V. C. Calhoun and M. Levine, Long-range enhancer-promoter interactions in the Scr-Antp interval of the Drosophila Antennapedia complex. Proc. Natl. Acad. Sci. USA 100(17), 9878-9883 (2003). 27. R. A. Cameron, et al., cis-Regulatory activity of randomly chosen genomic fragments from the sea urchin. Gene Expr. Patterns 4(2), 205-213 (2004). 28. S. Cawley, et al., Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). 29. G. E. Chalkley and C. P. Verrijzer, DNA binding site selection by RNA polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. Embo J. 18(17), 4835-4845 (1999). 30. Q. K. Chen, G. Z. Hertz and G. D. Stormo, PromFD 1.0: a computer program that predicts eukaryotic pol II promoters using strings and IMD matrices. Comput. Appl. Biosci. 13(1), 29-35 (1997). 31. J. M. Claverie and I. Sauvaget, Assessing the biological significance of pri-
mary structure consensus patterns using sequence databanks. I. Heat-shock and glucocorticoid control elements in eukaryotic promoters. Comput. Appl. Biosci. 1(2), 95-104 (1985). 32. C. E. Clayton, Life without transcriptional control? From fly to man and back again. Embo J. 21(8), 1881-1888 (2002). 33. P. Clote and R. Backofen, Computational Molecular Biology: An Introduction: John Wiley & Sons (2000). 34. D. E. Clyde, et al., A self-organizing system of repressor gradients establishes segmental complexity in Drosophila. Nature 426(6968), 849-853 (2003). 35. R. M. Coulson, N. Hall and C. A. Ouzounis, Comparative Genomics of Transcriptional Control in the Human Malaria Parasite Plasmodium falciparum. Genome Res. 14(8), 1548-1554 (2004). 36. E. M. Crowley, K. Roeder and M. Bina, A statistical model for locating regulatory regions in genomic DNA. J. Mol. Biol. 268(1), 8-14 (1997). 37. E. Davidson, Genomic Regulatory Systems: Development and Evolution: Academic Press (2001). 38. R. V. Davuluri, I. Grosse and M. Q. Zhang, Computational identification of promoters and first exons in the human genome. Nat. Genet. 29(4), 412-417 (2001). 39. E. T. Dermitzakis and A. G. Clark, Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol. Biol. Evol. 19(7), 1114-1121 (2002). 40. T. A. Down and T. J. Hubbard, Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 12(3), 458-461 (2002). 41. I. W. Duncan, Transvection effects in Drosophila. Annu. Rev. Genet. 36, 521-556 (2002). 42. R. Durbin, et al., Biological sequence analysis: Cambridge University Press (1998). 43. L. Duret and P. Bucher, Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7(3), 399-406 (1997). 44. L. Elnitski, et al., Distinguishing regulatory DNA from neutral sites. Genome Res. 13(1), 64-72 (2003). 45. A. Erives and M. Levine, Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl. Acad. Sci. USA 101(11), 3851-3856 (2004). 46. K. A. Frazer, et al., Cross-species sequence comparisons: a review of methods and available resources. Genome Res. 13(1), 1-12 (2003). 47. K. A. Frazer, et al., Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14(3), 367-372 (2004). 48. K. Frech, J. Danescu-Mayer and T. Werner, A novel method to develop highly specific models for regulatory units detects a new LTR in GenBank which contains a functional promoter. J. Mol. Biol. 270(5), 674-687 (1997). 49. M. C. Frith, U. Hansen and Z. Weng, Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17(10), 878-889 (2001).
50. M. C. Frith, M. C. Li and Z. Weng, Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 31(13), 3666-3668 (2003). 51. M. C. Frith, et al., Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences. Nucleic Acids Res. 30(14), 3214-3224 (2002). 52. M. C. Frith, et al., Finding functional sequence elements by multiple local alignment. Nucleic Acids Res. 32(1), 189-200 (2004). 53. M. C. Frith, et al., Detection of functional DNA motifs via statistical overrepresentation. Nucleic Acids Res. 32(4), 1372-1381 (2004). 54. M. Gardiner-Garden and M. Frommer, CpG islands in vertebrate genomes. J. Mol. Biol. 196(2), 261-282 (1987). 55. Y. H. Grad, et al., Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D. melanogaster and D. pseudoobscura. Bioinformatics, (2004). 56. A. S. Halees, D. Leyfer and Z. Weng, PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res. 31(13), 3554-3559 (2003). 57. M. S. Halfon, et al., Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12(7), 1019-1028 (2002). 58. S. Hannenhalli and S. Levy, Promoter prediction in the human genome. Bioinformatics 17 Suppl 1, S90-96 (2001). 59. S. Hannenhalli and S. Levy, Predicting transcription factor synergism. Nucleic Acids Res. 30(19), 4278-4284 (2002). 60. N. Harafuji, D. N. Keys and M. Levine, Genome-wide identification of tissuespecific enhancers in the Ciona tadpole. Proc. Natl. Acad. Sci. USA 99(10), 6802-6805 (2002). 61. R. C. Hardison, Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16(9), 369-372 (2000). 62. R. C. Hardison, et al., Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 13(1), 13-26 (2003). 63. S.I. Hashimoto, et al., 5'-end SAGE for the analysis of transcriptional start sites. Nat. Biotechnol. 22(9), 1146-1149 (2004). 64. A. Hochheimer and R. Tjian, Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression. Genes Dev. 17(11), 1309-1320 (2003). 65. A. Hochheimer, et al., TRF2 associates with DREF and directs promoterselective gene expression in Drosophila. Nature 420(6914), 439-445 (2002). 66. H. Huang, et al., Determination of local statistical significance of patterns in Markov sequences with application to promoter element identification. J. Comput. Biol. 11(1), 1-14 (2004). 67. G. B. Hutchinson, The prediction of vertebrate promoter regions using differential hexamer frequency analysis. Comput. Appl. Biosci. 12(5), 391-398 (1996). 68. I. Ioshikhes, E. N. Trifonov and M. Q. Zhang, Periodical distribution of
transcription factor sites in promoter regions and connection with chromatin structure. Proc. Natl. Acad. Sci. USA 96(6), 2891-2895 (1999). 69. I. P. Ioshikhes and M. Q. Zhang, Large-scale human promoter mapping using CpG islands. Nat. Genet. 26(1), 61-63 (2000). 70. N. Jareborg, E. Birney and R. Durbin, Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9(9), 815-824 (1999). 71. O. Johansson, et al., Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm. Bioinformatics 19 Suppl 1, 1169-1176 (2003). 72. S. Karlin and S. F. Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring-schemes. Proc. Natl. Acad. Sci. USA 87(6), 2264-2268 (1990). 73. S. Knudsen, Promoter2.0: for the recognition of PolII promoter sequences. Bioinformatics 15(5), 356-361 (1999). 74. Y. V. Kondrakhin, et al., Eukaryotic promoter recognition by binding sites for transcription factors. Comput. Appl. Biosci. 11(5), 477-488 (1995). 75. G. Kreiman, Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res. 32(9), 2889-2900 (2004). 76. W. Krivan and W. W. Wasserman, A predictive model for regulatory sequences directing liver-specific transcription. Genome Res. 11(9), 1559-1566 (2001). 77. A. K. Kutach and J. T. Kadonaga, The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters. Mol. Cell Biol. 20(13), 4754-4764 (2000). 78. T. Lagrange, et al., New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev. 12(1), 34-44 (1998). 79. D. Latchman, Gene Regulation: A Eukaryotic Perspective. 4 ed: Nelson Thornes (2002). 80. T. I. Lee, et al., Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594), 799-804 (2002). 81. M. Levine and R. Tjian, Transcription regulation and animal diversity. Nature 424(6945), 147-151 (2003). 82. V. G. Levitsky, et al., Nucleosome formation potential of eukaryotic DNA: calculation and promoters analysis. Bioinformatics 17(11), 998-1010 (2001). 83. S. Levy and S. Hannenhalli, Identification of transcription factor binding sites in the human genome sequence. Mamm. Genome 13(9), 510-514 (2002). 84. H. Li and W. Wang, Dissecting the transcription networks of a cell using computational genomics. Curr. Opin. Genet. Dev. 13(6), 611-616 (2003). 85. A. P. Lifanov, et al., Homotypic regulatory clusters in Drosophila. Genome Res. 13(4), 579-588 (2003). 86. C. Y. Lim, et al., The MTE, a new core promoter element for transcription
by RNA polymerase II. Genes Dev. 18(13), 1606-1617 (2004). 87. S. Lisser and H. Margalit, Determination of common structural features in Escherichia coli promoters by computer analysis. Eur. J. Biochem. 223(3), 823-830 (1994). 88. R. Liu and D. J. States, Consensus promoter identification in the human genome utilizing expressed gene markers and gene modeling. Genome Res. 12(3), 462-469 (2002). 89. R. Liu, R. C. McEachin and D. J. States, Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res. 13(4), 654-661 (2003). 90. G. G. Loots, et al., Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288(5463), 136-140 (2000). 91. M. Z. Ludwig, et al., Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403(6769), 564-567 (2000). 92. F. Lyko, DNA methylation learns to fly. Trends Genet 17(4), 169-172 (2001). 93. V. J. Makeev, et al., Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information. Nucleic Acids Res. 31(20), 6016-6026 (2003). 94. M. Markstein, et al., Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. USA 99(2), 763-768 (2002). 95. J. A. Martens, L. Laprade and F. Winston, Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429(6991), 571-574 (2004). 96. R. Martone, et al., Distribution of NF-kappaB-binding sites across human chromosome 22. Proc. Natl. Acad. Sci. USA 100(21), 12247-12252 (2003). 97. V. Matys, et al., TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31(1), 374-378 (2003). 98. S. Misra, et al., Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 3(12), RESEARCH0083 (2002). 99. R. O'Lone, et al., Genomic targets of nuclear estrogen receptors. Mol. Endocrinol. 18(8), 1859-1875 (2004). 100. D. T. Odom, et al., Control of pancreas and liver gene expression by HNF transcription factors. Science 303(5662), 1378-1381 (2004). 101. U. Ohler, Polygramme und Hidden Markov Modelle zur DNASequenzanalyse, Friedrich-Alexander-Universitat: Erlangen (1995). 102. U. Ohler and H. Niemann, Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet. 17(2), 56-60 (2001). 103. U. Ohler, et al., Stochastic segment models of eukaryotic promoter regions. Pac. Symp. Biocomput. 380-391 (2000). 104. U. Ohler, et al., Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17 Suppl 1, S199-206 (2001).
105. U. Ohler, et al., Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3(12), RESEARCH0087 (2002). 106. U. Ohler, et al., Interpolated markov chains for eukaryotic promoter recognition. Bioinformatics 15(5), 362-369 (1999). 107. U. Ohler, Computational Promoter Recognition in Eukaryotic Genomic DNA, Berlin: Logos (2002). 108. P. Pavlidis, et al., Promoter region-based classification of genes. Pac. Symp. Biocomput 151-163 (2001). 109. A. G. Pedersen, et al., Characterization of prokaryotic and eukaryotic promoters using hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4, 182-191 (1996). 110. A. G. Pedersen, et al., DNA structure in human RNA polymerase II promoters. J. Mol. Biol. 281(4), 663-673 (1998). 111. L. A. Pennacchio and E. M. Rubin, Genomic strategies to identify mammalian regulatory sequences. Nat. Rev. Genet. 2(2), 100-109 (2001). 112. D. S. Prestridge, Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249(5), 923-932 (1995). 113. D. S. Prestridge and C. Burks, The density of transcriptional elements in promoter and non-promoter sequences. Hum. Mol. Genet. 2(9), 1449-1453 (1993). 114. N. Rajewsky, et al., Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3(1), 30 (2002). 115. M. Rebeiz, N. L. Reeves and J. W. Posakony, SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc. Natl. Acad. Sci. USA 99(15), 9888-9893 (2002). 116. M. G. Reese, Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput. Chem. 26(1), 51-56 (2001). 117. M. G. Reese, et al., Genie-gene finding in Drosophila melanogaster. Genome Res. 10(4), 529-538 (2000). 118. S. Rombauts, et al., Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiol. 132(3), 1162-1176 (2003). 119. E. Roulet, et al., High-throughput SELEX SAGE method for quantitative modeling of transcription-factor binding sites. Nat. Biotechnol. 20(8), 831835 (2002). 120. P. J. Sabo, et al., Genome-wide identification of DNasel hypersensitive sites using active chromatin sequence libraries. Proc. Natl. Acad. Sci. USA 101(13), 4537-4542 (2004). 121. A. Sandelin and W. W. Wasserman, Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J. Mol. Biol. 338(2), 207-215 (2004). 122. A. Sandelin, et al., JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32 Database issue, D91-94 (2004).
123. M. Scherf, A. Klingenhoff and T. Werner, Highly specific localization of promoter regions in large genomic sequences by Promoterlnspector: a novel context analysis approach. J. Mol. Biol. 297(3), 599-606 (2000). 124. M. Scherf, et al., First pass annotation of promoters on human chromosome 22. Genome Res. 11(3), 333-340 (2001). 125. C. D. Schmid, et al., The Eukaryotic Promoter Database EPD: the impact of in silico primer extension. Nucleic Acids Res. 32 Database issue, D82-85 (2004). 126. T. Shiraki, et al., Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Nail. Acad. Sci. USA 100(26), 15776-15781 (2003). 127. S. Sinha, E. Van Nimwegen and E. D. Siggia, A probabilistic method to detect regulatory modules. Bioinformatics 19 Suppl 1, 1292-1301 (2003). 128. S. T. Smale and J. T. Kadonaga, The RNA polymerase II core promoter. Annu. Rev. Biochem. 72, 449-479 (2003). 129. V. Solovyev and A. Salamov, The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc. Int. Con}. Intell. Syst. Mol. Biol. 5, 294-302 (1997). 130. V. V. Solovyev and I. A. Shahmuradov, PromH: Promoters identification using orthologous genomic sequences. Nucleic Acids Res. 31(13), 3540-3545 (2003). 131. A. Sosinsky, et al., Target Explorer: An automated tool for the identification of new target genes for a specified set of transcription factors. Nucleic Acids Res. 31(13), 3589-3592 (2003). 132. G. D. Stormo and D. S. Fields, Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci. 23(3), 109-113 (1998). 133. X. Su, S. Wallenstein and D. Bishop, Nonoverlapping clusters: approximate distribution and application to molecular biology. Biometrics 57(2), 420426 (2001). 134. P. Sudarsanam, Y. Pilpel and G. M. Church, Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res. 12(11), 1723-1731 (2002). 135. Y. Sugahara, et al., Comparative evaluation of 5'-end-sequence quality of clones in CAP trapper and other full-length-cDNA libraries. Gene 263(12), 93-102 (2001). 136. L. V. Sun, et al., Protein-DNA interaction mapping using genomic tiling path microarrays in Drosophila. Proc. Natl. Acad. Sci. USA 100(16), 94289433 (2003). 137. Y. Suzuki, et al., DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs. Nucleic Acids Res. 30(1), 328-331 (2002). 138. Y. Suzuki, et al., Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA library. Gene 200(1-2), 149-156 (1997). 139. Y. Suzuki, et al., Identification and characterization of the potential promoter regions of 1031 kinds of human genes. Genome Res. 11(5), 677-684 (2001).
140. R. Tupler, G. Perini and M. R. Green, Expressing the human genome. Nature 409(6822), 832-833 (2001). 141. J. van Helden, Regulatory sequence analysis tools. Nucleic Acids Res. 31(13), 3593-3596 (2003). 142. C. P. Verrijzer and R. Tjian, TAFs mediate transcriptional activation and promoter selectivity. Trends Biochem. Sci. 21(9), 338-342 (1996). 143. J. Vilo and K. Kivinen, Regulatory sequence analysis: application to the interpretation of gene expression. Eur. Neuropsychopharmacol. 11(6), 399411 (2001). 144. A. Wagner, Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15(10), 776-784 (1999). 145. W. W. Wasserman and J. W. Fickett, Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278(1), 167-181 (1998). 146. W. W. Wasserman and A. Sandelin, Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5(4), 276-287 (2004). 147. W. W. Wasserman, et al., Human-mouse genome comparisons to locate regulatory sites. Nat. Genet. 26(2), 225-228 (2000). 148. R. H. Waterston, et al., Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-562 (2002). 149. C. L. Wei, et al., 5' Long serial analysis of gene expression (LongSAGE) and 3' LongSAGE for transcriptome characterization and genome annotation. Proc. Nati Acad. Sci. USA 101(32), 11701-11706 (2004). 150. T. Werner, Identification and functional modelling of DNA sequence elements of transcription. Brief. Bioinform. 1(4), 372-380 (2000). 151. T. Werner, The state of the art of mammalian promoter recognition. Brief. Bioinform. 4(1), 22-30 (2003). 152. Q. Wu, et al., Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 11(3), 389-404 (2001). 153. J. J. Wyrick and R. A. Young, Deciphering gene expression regulatory networks. Curr. Opin. Genet. Dev. 12(2), 130-136 (2002). 154. C. H. Yuh, H. Bolouri and E. H. Davidson, Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science 279(5358), 1896-1902 (1998). 155. C. H. Yuh, et al., Patchy interspecific sequence similarities efficiently identify positive cis-regulatory elements in the sea urchin. Dev. Biol. 246(1), 148161 (2002). 156. M. Zavolan, et al., Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res. 13(6B), 1290-1300 (2003). 157. M. Q. Zhang, Identification of human gene core promoters in silico. Genome Res. 8(3), 319-326 (1998). 158. J. Zhu, J. S. Liu and C. E. Lawrence, Bayesian adaptive sequence alignment algorithms. Bioinformatics 14(1), 25-39 (1998).
CHAPTER 4 AN ALGORITHM FOR AB-INITIO DNA MOTIF DETECTION
Enli Huang, Liang Yang, Rajesh Chowdhary, Ashraf Kassim
National University of Singapore, Singapore
{stuhe, stuyl, rajesh}@i2r.a-star.edu.sg, {ashraf}@nus.edu.sg

Vladimir B. Bajic
Institute for Infocomm Research, Singapore
bajicv@i2r.a-star.edu.sg

We present here a fast, efficient algorithm that aims at detecting statistically significant motifs in a set of unaligned DNA sequences. The algorithm resembles the expectation maximization (EM) algorithm, but is derived using heuristics rather than probabilistic models. It is implemented in the Dragon Motif Builder system and evaluated on sets of 11,502 mouse and 18,326 human promoter sequences, each of length 1500 bp. The web server with this algorithm implemented is freely available to academic and non-profit users via http://research.i2r.a-star.edu.sg/DRAGON/Motif-Search/.
1. Introduction

Several methods have been developed for ab-initio motif detection in DNA and protein sequences, as reviewed in refs. [8,9]. The most prominent techniques are based on Gibbs sampling4 and expectation maximization3,6 (EM). Generally speaking, however, the methods developed so far are not very effective when applied to very large datasets. We outline here an algorithm, implemented in Dragon Motif Builder (http://research.i2r.a-star.edu.sg/DRAGON/Motif-Search/), that partly solves this problem. The algorithm allows large datasets to be processed in acceptable time and detects homogeneous sets of motifs. To demonstrate its usefulness, we also present parts of the results from two examples in which we analyzed: (a) 11,502 promoter sequences from the FANTOM2 mouse data collection7,
and (b) 18,326 human promoters.

2. Algorithm

Our algorithm, implemented in Dragon Motif Builder, resembles the EM process. It makes initial guesses of motifs in the set S of s unaligned sequences, allowing a user-definable number d of initial estimates to be made. The information contained in the sets M_{0,k}, k = 1, ..., d, of selected motifs is summarized in position weight matrices (pwm_{0,k}), which are normalized1. The information content (IC) of a pwm = (p_{ij}) is calculated according to

IC = \sum_{j} \left( 2 + \sum_{i} p_{ij} \log_2 p_{ij} \right),

where i runs over the four nucleotides and j over the motif positions. The pwm with the highest IC is selected to proceed with the algorithm. In the next step, the selected pwm is used to reassess the motifs in the set S by scanning the sequences from S and matching them against the pwm. The algorithm from ref. [1] is used for calculating scores:

score = \sum_{j} p_{i,j} \otimes n_{j},

where \otimes represents the operator matching nucleotide n_j with the matrix element p_{i,j}. The motif with the highest score in each of the reassessed sequences is included in the new population of motifs. The process is then repeated by calculating the pwm of the new motif collection, and the reassessment is made again. It is possible to make the reassessment in only a proportion P_k of the sequences in S. These proportions can include any number of sequences in the range from 1 to d. When the number s_k of sequences included in P_k is less than d, all pwms obtained are memorized together with their IC, and the reassessment is made with another selection P_k of sequences, until all sequences from S have been covered. Then the pwm with the highest IC is selected and used for a new scan of all sequences in S. This process is repeated until there is no significant change in the generated pwm. When selecting the motifs, it is also possible to choose the optimal motif length, and users have the option to choose between two versions of the algorithm. One heuristically adjusts the length of the motifs and selects the length that results in a motif group that is more coherent and covers more sequences in S, while the other checks all motif lengths between the user-specified shortest and longest lengths.
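As an illustration of the iteration just described, the sketch below builds a column-normalized pwm from a motif set, computes its IC as above, and performs one reassessment pass. Variable names and the simple sum-of-probabilities match score are our own assumptions standing in for the matching operator of ref. [1]; this is not the exact Dragon Motif Builder implementation.

```python
# Hedged sketch of one reassessment iteration of an EM-like motif search.
# The match score here simply sums the PWM probabilities of the observed
# bases (an assumption standing in for the matching operator of ref. [1]).
import math

BASES = "ACGT"

def build_pwm(motifs, pseudocount=0.5):
    """Column-normalized position weight matrix from equal-length motifs."""
    w = len(motifs[0])
    pwm = [{b: pseudocount for b in BASES} for _ in range(w)]
    for m in motifs:
        for j, b in enumerate(m):
            pwm[j][b] += 1
    for col in pwm:
        total = sum(col.values())
        for b in BASES:
            col[b] /= total
    return pwm

def information_content(pwm):
    """IC = sum over columns of (2 + sum_i p_ij * log2 p_ij)."""
    return sum(2 + sum(p * math.log2(p) for p in col.values() if p > 0)
               for col in pwm)

def best_site(seq, pwm):
    """Highest-scoring window of the sequence under the current PWM."""
    w = len(pwm)
    def score(i):
        return sum(pwm[j][seq[i + j]] for j in range(w))
    best = max(range(len(seq) - w + 1), key=score)
    return seq[best:best + w]

def reassess(sequences, pwm):
    """One iteration: pick the best-matching site from every sequence,
    then rebuild the PWM from the new motif population."""
    new_motifs = [best_site(s, pwm) for s in sequences]
    return build_pwm(new_motifs)
```

Iterating reassess and recomputing the IC until it no longer changes appreciably mimics the convergence criterion described above.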
Fig. 1. Part of the summary report for the first detected motif in mouse promoters.
3. Experiments

The purpose of these experiments is to demonstrate that the algorithm is capable of detecting (a) homogeneous groups of motifs in large subsets of promoters in reasonable time, and (b) new motifs. These experiments are not aimed at a comparison with other algorithms. To evaluate the time required for processing large datasets, we applied the algorithm to detect the 10 most prominent motifs in a set of 11,502 promoter sequences covering the range [-1000, +500] relative to transcription start sites (TSSs). The TSSs were determined using the PromoSer5 and FIE2.12 tools from the mouse FANTOM2 data collection7. We evaluated the time needed for the algorithm to generate the top 10 ranking motifs of length 10 nucleotides each, using forward and reverse complement search. The set size is 34,506,000 nucleotides in a double-stranded search. It took 298 min on a Pentium 4 PC with a 3 GHz processor running Windows XP to generate these top-ranked motifs. To the best of our knowledge, no reported pattern finding algorithm can handle such a volume of data in such a short time.
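The double-stranded search referred to above simply considers each promoter together with its reverse complement. A minimal illustration, reusing best_site from the sketch in Section 2 (the naming is ours, not the system's):

```python
# Minimal illustration of a double-stranded (forward + reverse complement)
# scan; best_site and the PWM layout follow the sketch in Section 2.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_site_double_stranded(seq, pwm):
    """Best-matching site on either strand of the sequence."""
    def site_score(site):
        return sum(pwm[j][b] for j, b in enumerate(site))
    forward = best_site(seq, pwm)
    reverse = best_site(reverse_complement(seq), pwm)
    return max(forward, reverse, key=site_score)
```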
For illustration purposes we present a snapshot of a fragment of the report file for the first-ranked motif group. The summary information for this motif group contains the binding site of the androgen receptor (AR) transcription factor. It is present in 5546 promoters and shows a strong positional bias for the region overlapping the TSS and downstream of it, as can be observed from the rough positional distribution (Figure 1). For the collection of the top 10 motif groups, the IC ranges from 12 to 17, suggesting that the motif groups are highly homogeneous.

In another experiment, we evaluated the proportion of new patterns discovered by the algorithm. We used the same dataset of mouse promoters and searched for the 100 most prominent patterns. We also used a similar set of 18,326 human promoters containing more than 54 million nucleotides. On average, we found that about 25% of the motifs found do not correspond to already known transcription factor binding sites and thus represent potentially new binding sites in the analyzed promoters.

References

1. V. B. Bajic, A. Chong, S. H. Seah, and V. Brusic. Intelligent System for Vertebrate Promoter Recognition. IEEE Intelligent Systems Magazine, 17(4):64-70, 2002.
2. A. Chong, G. Zhang, and V. B. Bajic. FIE2: A program for the extraction of genomic DNA sequences around the start and translation initiation site of human genes. Nucleic Acids Res. 31(13):3546-53, 2003.
3. C. E. Lawrence and A. A. Reilly. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, 7:41-45, 1990.
4. C. E. Lawrence, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208-14, 1993.
5. A. S. Halees and Z. Weng. PromoSer: improvements to the algorithm, visualization and accessibility. Nucleic Acids Res. 32(Web Server issue):W191-4, 2004.
6. G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley and Sons, Inc., 1997.
7. Y. Okazaki, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420:563-73, 2002.
8. A. Sandelin. In silico prediction of cis-regulatory elements. Ph.D. thesis, Karolinska Institutet, Stockholm, 2004.
9. G. Thijs. Probabilistic methods to search for regulatory elements in sets of coregulated genes. Ph.D. thesis, Katholieke Universiteit Leuven, Leuven, 2003.
CHAPTER 5 DETECTING MOLECULAR EVIDENCE OF POSITIVE DARWINIAN SELECTION
Konrad Scheffler
SANBI, University of the Western Cape, South Africa
[email protected]

Cathal Seoighe
Computational Biology Group, University of Cape Town, South Africa
cathal@science.uct.ac.za
1. Introduction

Living organisms have adapted to an astonishing range of environments and in doing so have managed to develop extremely complex solutions to challenges posed by their non-living surroundings as well as by other organisms with which they interact (competitors, predators, prey, hosts or parasites). When a species is faced with a challenge that requires a change in its heritable phenotype, this change can only be brought about through an alteration in the DNA sequence of the individuals of the species. If this alteration occurs, then the DNA sequence of the species has, in some sense, learnt about the new feature of the environment that posed the challenge. If one species learns a solution to a challenge posed by a second species in the same environment, this may often, in turn, pose a reciprocal challenge to the second species, resulting in a cycle of responses and counter-responses as each species learns the strategies that are employed by the species with which it interacts. Uncovering specific DNA mutations that have enabled species to adapt to their environments or to gain or maintain an advantage over interacting species has been an area of intense interest in recent years.
1.1. Molecular Evolution Research in a Time of Genomes

Rapid advances in the generation of biomolecular data (gene, protein and genome sequences) and the simultaneous development of powerful mathematical and computational methods to analyze it have yielded spectacular opportunities for molecular evolution research64, enabling researchers to make quite specific inferences about how organisms have evolved. The fossil record contains many examples of the gradual evolution of species that can help to reconstruct the origins of extant organisms. In many cases it is easy to speculate but difficult to prove that the gradual phenotypic shifts that can be inferred are a result of evolutionary selection that favored the newly evolved or modulated traits, rather than simply a result of a random process of drift. It is also not possible, from the fossil record alone, to identify the combination of DNA mutations and selective pressures that acted upon them (i.e., the evolutionary process), resulting in adaptation to the environment in which the organisms lived. The availability of gene and, more recently, complete genome DNA sequences has made it possible for the first time to identify some specific mutations upon which evolutionary selective pressure has acted to bring about adaptive phenotypic changes.

1.2. Some Examples

The earliest traces of human cultures testify to humankind's longstanding fascination with stories that explain the origin of things or provide reasons that things are the way they are. Darwin's theory of natural selection provided new ways in which these stories could be told and allowed many plausible but untestable evolutionary explanations of observations about the living world to gain currency (see the classic paper by Gould and Lewontin19 for a critique). The advent of a deluge of molecular data has made it possible to propose evolutionary or adaptationist explanations for observations about the natural world that are backed up by data. However, molecular data are not a complete safeguard against specious adaptationist explanations, and it remains important to be cautious in the face of the seductive plausibility of adaptationist tales. Examples of adaptation at the molecular level from a wide range of organisms have been reported in the scientific literature at an ever increasing rate over recent years. Human genes that have been linked to recent adaptation include the lactase gene (the persistence of the ability to digest lactose in milk appears to have evolved recently and under the influence of strong positive selection in European populations5), the ASPM gene, which may have
played a role in the evolution of human brain size10, the FOXP2 gene that has been suggested to be involved in the evolution of language capability9, as well as human genes responsible for the evolution of resistance to malaria47. Many examples of pathogen genes that have evolved under positive selection have also been reported. The human immunodeficiency virus (HIV), for example, has been shown to be evolving under the influence of positive selection in response to challenges posed to the virus by the human immune response as well as by drug treatment30,71. A brief glance through the table of contents of almost any recent issue of a molecular evolution journal reveals a great many more examples of genes that have been inferred to have evolved under the influence of positive selection, from across the kingdoms of life. Several examples of positive selection at the molecular level that can be attributed to the impact of human behavior on the environments of other organisms have been reported44, including the critical area of antibiotic resistance (see44 for a review), widespread resistance to pesticides in insects31,38, and chloroquine resistance in malaria66.
1.3. Chapter Overview

In this chapter we discuss the context in which methods have been developed to identify DNA sequences that have evolved adaptively, including a brief review of the neutral theory of molecular evolution, which provides an important null hypothesis against which hypotheses of selection can be tested. We provide an overview of many of the methods that are used to detect positive selection, as well as a discussion of the kinds of positive selection that can be detected with each method. We describe some of the outstanding examples of adaptation at the molecular level that have been uncovered in recent years and, finally, try to draw some parallels and distinctions between genetic algorithms and machine learning on the one hand and selection acting on molecules over evolutionary timescales on the other. A technical exposition of all of the methods that can be used to infer positive Darwinian selection is beyond the scope of this chapter, and readers are referred for this purpose to the reviews and excellent textbooks referenced here. A more detailed introduction to the field of molecular evolution can be found in some of the recommended textbooks23,32,42.
2. Types of Adaptive Evolution

2.1. Episodic Positive Selection

We are interested in adaptive evolution at the genotypic level and in differences between the fitness of individuals with different genotypes. The fitness of a genotype is defined as a measure of the tendency of individuals carrying the genotype to contribute offspring to the next generation. An allele is said to be evolving under positive selection if it has a tendency to increase in frequency as a result of a higher average fitness of individuals carrying the allele27. We will use the terms positive selection and adaptive evolution interchangeably. An allele that tends to decrease in frequency as a result of a lower than average fitness of individuals that carry the allele is said to be evolving under negative selection. Selection that acts to conserve an already optimal sequence is referred to as purifying selection. At the molecular level a distinction is drawn between episodic positive selection that results in a change in the function of a protein, possibly enabling the protein to perform a new function (referred to as directional selection), and continuous rapid evolution at the molecular level that leaves the protein function substantially unchanged (diversifying selection)73. The evolution of foregut fermentation in colobine monkeys provides a good example of directional selection. The lysozyme enzyme has been recruited to kill bacteria in the stomachs of monkeys that have evolved the ability to ferment plant material7,54. This novel role for the enzyme is associated with several amino acid replacements that arose rapidly under the influence of positive selection in the lineage leading to colobine monkeys, in response to a change in diet36,54,68.

2.2. Diversifying Selection - The Biological Arms Races

Many examples of diversifying selection come from the so-called arms-race co-evolution of pairs of interacting organisms, such as predator and prey or host immune system and antigenic regions of parasites. Examples from the animal kingdom include the rapid evolution of venoms from snakes40, aphids29, scorpions74 and predatory snails8, among a great many others. Very rapid evolution of venom in these species could result in arms-race type evolution between predator and prey, but evidence for this in the form of equally rapidly evolving pathways that enable the prey to become resistant to the venom has not yet been reported. Alternatively, diversifying selection on the venom proteins could result from changes in the availability of prey species8. Rapidly evolving gene sequences are also frequently found
among pathogens, which may often depend on rapid evolution to escape host immune responses and even drug treatment. For example, the fastest rate of evolution yet reported was found in the envelope gene of HIV63. Genes expressed in reproductive tissues also evolve very rapidly. Proteins involved in gamete recognition, in particular, can often be shown to be evolving under diversifying selection60. The development of reproductive barriers during the process of speciation involves an interesting interplay between the genes that cause reduced fitness of hybrids and the genes that cause barriers to cross-fertilization (either through mate selection or gamete recognition), which may result in a positive feedback loop that drives the evolution of both classes49.

3. The Neutral Theory of Molecular Evolution

The success of Darwin's theory of natural selection in explaining the high degree of adaptation of most organisms to their environment led to the "selectionist" view that most changes in an organism are adaptive. This idea was extended from phenotypic evolution to molecular evolution, where it was interpreted to mean that the majority of mutations that reach fixation in a population do so because they increase the fitness of the organism in its current niche. The majority of polymorphisms were understood to be maintained in a population as a result of selection that favoured polymorphism (e.g., overdominant selection, where the heterozygote is more fit than either homozygote).

3.1. Cost of Natural Selection

The development of the neutral theory of molecular evolution26,28 heralded a mini-revolution in evolutionary theory. In 1957, before the development of the neutral theory, J. B. S. Haldane20 estimated the cost to the overall reproductive success of a species incurred by the fixation of a beneficial allele (Haldane argued that fixation of a novel beneficial allele implies some failure to reproduce in individuals without the favourable allele, and hence an overall reduction in the reproductive capacity of the species). Later, Motoo Kimura27 made use of these calculations to show that the levels of polymorphism and sequence divergence that were becoming apparent from molecular data implied reductions in reproductive capacity that were incompatible with life if it was assumed that most of the polymorphism and sequence divergence was adaptive. Haldane20 estimated that the maximal rate at which a species was likely to fix adaptive mutations was one per
300 generations. Kimura suggested that most of the mutations that rise to fixation in a population are neutral or nearly neutral and drift to fixation by chance, and that most polymorphisms are not maintained by selection but instead are intermediate states on the way to being fixed or lost from the population26,27. This model of evolution helped to explain some of the contradictions that were becoming apparent in the selectionist view with the increasing availability of molecular data, and aspects of the neutral theory of evolution are now well established, although important details, particularly concerning the overall proportion of substitutions that are adaptive, remain unclear13. The neutral theory does not deny the importance of adaptation but instead asserts that most changes at the molecular level have nothing to do with adaptation. Rather than taking away from the importance of positive Darwinian selection, the neutral theory has helped to stimulate great interest in distinguishing the relatively few positively selected differences between species from the overwhelming numbers of neutral or nearly neutral differences.

3.2. Recent Tests of the Neutral Theory

While the neutral theory has provided a useful antidote to the selectionist view that considered all change as predetermined by its effect on fitness, the pendulum has to some extent begun to swing away from pure forms of the theory, and debate continues about whether very recent estimates of the proportion of nucleotide mutations that are brought to fixation as a result of selection are indeed consistent with what should be expected under the neutral theory6,14,15. According to Smith and Eyre-Walker53, approximately 45% of all amino acid replacements in Drosophila species have been fixed through positive selection (more recently this estimate was revised downwards to about 25%6). Fay, Wyckoff and Wu14 estimate that 35% of amino acid replacements between humans and old world monkeys have been fixed through positive selection. This represents one adaptive substitution every 200 generations14. While this is a very significant proportion of the amino acid replacements, it represents a very small minority of the nucleotide substitutions, as the majority of the human genome does not encode protein. It is also not clear where to draw the line in deciding what level of positively selected amino acid replacements is incompatible with the neutral theory. Interestingly, this estimated rate of adaptive substitution since the divergence of humans and old-world monkeys is slightly higher than the
maximal rate estimated by Haldane on the basis of the reproductive cost of natural selection to a species20,32. This chapter is essentially concerned with the task of distinguishing specific nucleotide substitutions that have risen to fixation through the action of (possibly weak) positive selection from the noisy background of genetic drift.
Neutrality
Several methods have been developed to detect departures from neutral evolution, without necessarily determining whether the departure from neutrality is adaptive or a result of negative selection. Many of these methods make use of the expected distributions of polymorphisms of a given frequency in a neutrally evolving population of constant size. The frequency of deleterious polymorphisms is expected to be skewed to the left (i.e. higher proportion of the mutant alleles at low frequency) of the distribution for neutral polymorphisms while a greater proportion of high-frequency alleles is expected for positively selected alleles (Figure 1). Several test statistics have been proposed that can help to distinguish between these different modes of evolution, many of which are based on alternative estimates of the key population genetic parameter, 9 = 4iV/i where N isa, measure of the population size (called the effective population size) and fj, is the mutation rate. Tajima61 compared two different estimates of 6, one based on counting the number of polymorphic sites in a locus and the other based on the average number of differences between pairs of sequences sampled at random from the population. Both of these estimates are based on the assumption of neutral evolution. If the assumption of neutrality is violated these different estimates of 9 tend to differ significantly. This is because the estimate of 9 based on the numbers of differences between randomly chosen pairs of sequences is influenced most by alleles at intermediate frequencies while the estimate based on the number of polymorphic sites is not affected by the frequencies of the variants. The proportion of polymorphisms at intermediate (and high) frequencies is higher for positively selected alleles than for neutral alleles while a higher proportion of deleterious alleles, compared to neutral alleles, are kept at low frequency by purifying selection (Figure 1). Tajimas D statistic is a measure of the amount of deviation in these alternative estimates of 9. The sign of the difference between these estimates of 0, permits inference of whether the departure from neutrality is a result of positive or purifying selection (for a more detailed review see32).
Several alternative test statistics, many of which are generalizations of Tajima's D and which are sensitive to departures from the neutral assumption, have been developed (see16 for a general discussion). Hudson, Kreitman and Aguadé21 developed a powerful test of neutrality based on a simultaneous analysis of multiple loci in two species. Their test statistic measures the difference between the observed and expected levels of polymorphism and sequence divergence at several loci given a model of neutral evolution.
Fig. 1. The figure shows the expected number of polymorphisms at a given frequency under different selection regimes from strong purifying selection (2Ns = -100) to strong positive selection (2Ns = 100). (This figure is reproduced from Fay, Wyckoff and Wu with permission).
4. Selective Sweeps and Genetic Hitchhiking

Polymorphic loci that are located physically close to one another on a chromosome are not independent. In the absence of a recombination event with a breakpoint between the two loci, the allele present at one locus provides information about the allele present at a neighbouring locus. Alleles of separate loci on the same chromosome that co-occur more frequently in gametes than would be expected by chance given their individual frequencies in the population are said to be in linkage disequilibrium. Recombination occurs more rarely between loci that are very close together on a chromosome, and alleles at these loci are more likely to be in linkage disequilibrium than alleles of loci that are far apart.
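The notion of linkage disequilibrium introduced here can be made concrete with a small calculation. The function below computes the classical D, D' and r² measures for two biallelic loci from a list of phased haplotypes; the input conventions (tuples of alleles, with the alleles of interest passed explicitly) are our own choice for illustration.

```python
# Hedged sketch: linkage disequilibrium between two biallelic loci from
# phased haplotypes, given as a list of (allele_at_locus1, allele_at_locus2)
# pairs. D, D' and r^2 follow the standard definitions.
def linkage_disequilibrium(haplotypes, a1="A", a2="A"):
    n = len(haplotypes)
    p1 = sum(h[0] == a1 for h in haplotypes) / n        # freq of allele a1
    p2 = sum(h[1] == a2 for h in haplotypes) / n        # freq of allele a2
    p12 = sum(h == (a1, a2) for h in haplotypes) / n    # haplotype freq
    d = p12 - p1 * p2
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

D measures the raw excess of the joint haplotype over the expectation under independence, while D' and r² normalise it to the allele frequencies.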
If a novel allele rises to fixation so rapidly that there is no time for recombination to occur in the neighbourhood of the selected locus then, in this extreme case, all variation in the neighbourhood of the novel allele is eliminated (Figure 2). A novel allele that yields a large increase in fitness might sweep through an entire population rapidly, and such a selective sweep can be detectable through its effect on linked diversity. Alleles at neighbouring loci with no effect or even a negative effect on fitness can rise to high frequency or fixation through linkage with the selected allele (referred to as genetic hitchhiking3,52).
Fig. 2. A selective sweep reduces diversity at nearby loci while recombination can maintain much of the pre-existing diversity particularly at loci that are at greater genomic distances from the selected locus. Each block represents data from four individuals. Light and dark characters represent alternative alleles at a locus. On the left a strong selective sweep with no recombination has eliminated all diversity in neighbouring loci. On the right recombination has restored some of the diversity, particularly at sites that are far from the selected locus. New mutations can also restore diversity but this is not depicted.
4.1. Detecting Selective Sweeps

Sabeti et al.47 carried out a survey of the human genome to detect traces of recent selective sweeps. Their proposed method is based on identifying common human haplotypes at a genomic region of interest. A haplotype is a specific combination of alleles at linked polymorphic sites that co-occur on
the same chromosome. For each haplotype they measure the rate of decay of linkage as a function of distance from the region of interest. This gives an indication of the age of the haplotype (the degree to which linkage can be broken down by recombination is a function of the time since the origin of the haplotype). Young haplotypes at high frequency (i.e., in linkage disequilibrium with distant loci) suggest the action of positive selection (equations 1 and 2). Using this method Sabeti et al.47 have detected evidence of strong selective sweeps in humans, affecting genes that are responsible for resistance to the malaria-causing parasite Plasmodium falciparum. The parasite side of the same host-parasite equation has also provided very impressive examples of positively selected genes that are detectable because of the hitchhiking effect. Mutations that render P. falciparum resistant to the common antimalarial drug, chloroquine, have occurred independently a number of times, and strong evidence indicates that some of these mutations have risen to high frequencies as a result of selection66. These selected mutations in a key chloroquine resistance gene, pfcrt, have resulted in reduced diversity in the genomic regions flanking the pfcrt gene. This provides an impressive example of very recent selection that has resulted directly from human intervention to combat the parasite and represents one of many examples44 of positive selection that results directly from human activities. In a more recent contribution, Toomajian et al.62 developed a more involved method that is also based on detecting young alleles at higher than expected frequency. Allele age is estimated using a method similar to that of Sabeti et al.47, and simulations were used to test whether the observed allele frequencies were likely to result from neutral evolution under a range of different neutral evolution scenarios (given the allele ages). Using this method Toomajian et al.62 inferred that a mutation responsible for a large proportion of hemochromatosis (a disease characterized by overabsorption of iron) cases has evolved under positive selection, possibly as a result of selection to increase absorption efficiency in populations with iron-poor diets.
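The decay-of-linkage idea behind these haplotype-based tests can be sketched as an extended-haplotype-homozygosity style calculation: among chromosomes carrying a given core allele, how often are two of them identical over all markers out to a given distance? The function below is a simplified illustration with input conventions of our own choosing, not the published test statistic.

```python
# Hedged sketch of an extended-haplotype-homozygosity style statistic:
# among chromosomes carrying a given core allele, the probability that two
# randomly chosen chromosomes are identical over all markers out to a given
# distance from the core. Input conventions are ours for illustration.
from collections import Counter

def ehh(haplotypes, core_index, core_allele, max_offset):
    """haplotypes: list of equal-length allele strings (phased chromosomes).
    Homozygosity is computed over markers core_index..core_index+max_offset."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    if len(carriers) < 2:
        return 0.0
    segment_counts = Counter(h[core_index:core_index + max_offset + 1]
                             for h in carriers)
    pairs_identical = sum(c * (c - 1) // 2 for c in segment_counts.values())
    total_pairs = len(carriers) * (len(carriers) - 1) // 2
    return pairs_identical / total_pairs
```

Plotting this quantity for increasing distances, for the candidate allele versus the other alleles at the same site, gives the kind of decay curve used to judge whether a common haplotype is unexpectedly young for its frequency.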
4.2. Correlation Between Local Recombination Rates and Diversity

As shown in Figure 2, recombination opposes the tendency of selective sweeps to produce genomic regions with reduced diversity. One of the predictions of the diversity-restoring effect of recombination is that regions with low rates of recombination should recover diversity more slowly after selective sweeps, and therefore should have reduced diversity, while regions with high recombination rates should show higher diversity25. This prediction has been borne out in Drosophila4 and humans39, suggesting that selective sweeps have played a role in the evolution of these organisms. However, removal of deleterious mutations (referred to as background or purifying selection) also results in a reduction of diversity and in the coupling of diversity and local recombination rates. Distinguishing between the effects of background selection and hitchhiking due to positive selection poses a significant and current challenge in population genetics. If it is possible to distinguish between these alternative causes of the correlation between nucleotide diversity and recombination rates, then the size of this correlation could be used to estimate the number of adaptive substitutions that have occurred in the relatively recent history of a species. Nachman39 estimates in this way that up to 30,000 adaptive substitutions may have occurred since the divergence of human and chimpanzee. This is a very approximate estimate that depends on several assumptions and may in fact provide an upper bound on the number of adaptive substitutions, as the estimate explicitly neglects the contribution of background selection.

4.3. Distinguishing Complex Demographic Histories or Background Selection from Positive Selection

Periods of population growth and decline, or migration between isolated populations, can distort the frequency spectrum of polymorphisms and result in apparent departures from neutrality for neutrally evolving alleles. For example, under neutral evolution a population bottleneck, in which the population size decreases sharply and subsequently recovers, causes an increase in the number of rare mutant alleles and a decrease in the number of alleles present in the population at intermediate frequencies34. Two distinctions between the effects of population bottlenecks and selective sweeps make it possible to distinguish between them. The first involves comparing the number of high-frequency alleles with the number of alleles at intermediate frequencies. Fay and Wu12 argue that an excess of mutant alleles that have risen to high frequency is a unique signature of a selective sweep and is not expected to the same extent following a population bottleneck, and they have developed a statistic, H, to measure this excess.
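The H statistic can be computed directly from an unfolded site frequency spectrum, i.e. from the number of sampled chromosomes carrying the derived allele at each segregating site. The sketch below is a minimal illustration of that calculation; it assumes the ancestral state at each site is known (for example from an outgroup), uses invented counts, and omits the simulation-based significance testing used in practice.

```python
def fay_wu_H(derived_counts, n):
    """Fay and Wu's H from an unfolded site frequency spectrum.

    derived_counts: for each segregating site, the number of chromosomes
    (out of n sampled) carrying the derived allele (1 <= count <= n-1).
    H = theta_pi - theta_H; strongly negative values indicate an excess of
    high-frequency derived alleles, as expected after a selective sweep.
    """
    denom = float(n * (n - 1))
    theta_pi = sum(2.0 * i * (n - i) for i in derived_counts) / denom
    theta_H = sum(2.0 * i * i for i in derived_counts) / denom
    return theta_pi - theta_H

# Toy example: 20 sampled chromosomes; several sites with high-frequency
# derived alleles (counts of 17-19) pull H strongly negative.
print(fay_wu_H([1, 2, 2, 5, 17, 18, 19, 19], n=20))
```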
A second distinction between the effects of a selective sweep and a population bottleneck becomes apparent when data from multiple loci are available. A population bottleneck is expected to have the same effect on the diversity at all loci (albeit with a large statistical variance), while selective sweeps are expected to affect different loci differently, depending on proximity to a selected locus, the strength of selection and the local rate of recombination. Galtier et al.17 used a statistical model of evolution called the coalescent (see 46 for a review) that allows efficient estimation of evolutionary parameters by considering many different evolutionary histories (or genealogies) of the data. They approximated the effect of selective sweeps by population bottlenecks of differing strengths and durations occurring at different times in the past, and developed a maximum likelihood method to test whether the improved fit of their selective sweep model justified its increased complexity (i.e. additional parameters)17. Their method favoured a contribution from selective sweeps to the recent evolution of Drosophila melanogaster.
The recent evolutionary history of a population can involve a combination of selective sweeps, population bottlenecks and migrations, as well as purifying selection to remove deleterious alleles. Separating out the effects of each of these processes in the recent evolutionary history of a population as accurately as possible remains a significant challenge.

5. Codon-based Methods to Detect Positive Selection

In this section we focus on methods that allow positive selection to be inferred through a comparison of the rates of neutral and non-neutral nucleotide substitutions in protein-coding nucleotide sequences. These methods are quite powerful at detecting positive selection that has resulted in the fixation of a number of beneficial mutations in a population and are particularly useful for identifying diversifying (or arms-race) selection. Codon-based methods have revealed numerous biologically important examples of positive selection. These methods rely on identifying sequences (or positions within sequences) that have rates of nucleotide substitution higher than expected under neutrality. In order to do this we must be able to estimate the neutral rate of evolution: this is where synonymous substitutions are important. A key aspect of the genetic code is its redundancy. There are on average about three different triplet codons for each amino acid. Changes in the second codon position always change the amino acid encoded, while changes in the third codon position frequently do not affect the amino acid encoded.
This phenomenon is known as "third position wobble".
Fig. 3. Types of mutations in protein-coding DNA: (a) synonymous mutation; (b) non-synonymous mutation.
The result of the redundancy in the genetic code is that a mutation in a protein-coding nucleotide sequence can be one of two types (Figure 3):
• Synonymous mutation: a nucleotide mutation that does not alter the encoded amino acid.
• Non-synonymous mutation: a nucleotide mutation that alters the encoded amino acid.
It seems reasonable to assume that, since it does not alter the encoded protein, a synonymous mutation will have no effect on fitness and will therefore not be subject to selective pressure. This is the key assumption on which the methods described in this section are based:
Assumption: Synonymous mutations are selectively neutral.
It is well known that this assumption is not strictly correct50 and that fitness can be affected by factors other than the encoded protein. For example, unequal availability of the different transfer RNAs does cause selection favouring some codons over their synonymous variants. Nevertheless, this effect is slight, and the assumption that synonymous mutations are neutral provides a practical starting point for the methods that follow. While codon-based methods of inferring positive selection are extremely powerful and widely used, most positively selected nucleotide substitutions are unlikely to be detectable by comparing synonymous and non-synonymous codon substitution rates. For example, single nucleotide mutations that rise to fixation very rapidly as a result of a strong positive effect on the fitness of an organism may represent significant examples of evolutionary adaptation but will not normally result in a significant excess of the non-synonymous substitution rate over the synonymous rate.
5.1. Counting Methods

When data from a number of different species are available it is often reasonable to consider only divergence between species (ignoring polymorphism within species and treating each species as homogeneous). This is reasonable when the amount of divergence between sequences derived from individuals of the same species is far less than the average divergence between sequences derived from different species, so that the number of polymorphisms is negligible compared to the number of fixed substitutions. After making the assumption that synonymous mutations are selectively neutral, the natural continuation is to estimate the average synonymous and non-synonymous substitution rates, designated dS and dN, between a pair of sequences. The symbols KS and KA are also commonly used in the literature as alternative notation for dS and dN. These rates are defined as the number of synonymous substitutions per synonymous site and the number of non-synonymous substitutions per non-synonymous site, respectively. A synonymous site is a nucleotide site at which no substitution alters the encoded amino acid, whereas a site is considered to be non-synonymous if any substitution would alter the amino acid encoded. Sites at which some substitutions would and some would not alter the encoded amino acid are counted as a fraction of a synonymous site and a fraction of a non-synonymous site. A number of subtly different methods have been proposed to estimate dS and dN by counting synonymous and non-synonymous sites, as well as synonymous and non-synonymous differences, between a pair of protein-coding sequences. Some additional details about these methods can be obtained from the online material associated with this chapter or by consulting a molecular evolution text book32,42. Most of the distinctions between them concern which model of nucleotide substitution is used to correct for multiple substitutions at the same nucleotide position, how to sum over the different possible orders in which multiple mutations at the same codon might have occurred (different mutation orders can imply different numbers of non-synonymous and synonymous substitutions, Figure 4) and how to count synonymous and non-synonymous sites for codon positions that are neither completely synonymous nor completely non-synonymous. By comparing the substitution rate ratio dN/dS to 1, an indication is obtained of the selective pressures that have acted on the sequences following their divergence from a common ancestor. Positive selection can be inferred if dN/dS is significantly greater than 1, as indicated by a statistical hypothesis test.
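The counting approach can be illustrated with a deliberately simplified sketch. It counts fractional synonymous and non-synonymous sites per codon, classifies single-nucleotide codon differences, and returns the ratio of the resulting proportions. Codon pairs differing at more than one position are skipped (so the pathway averaging of Figure 4 is not needed), stop codons are ignored, and no correction for multiple substitutions at a site is applied; real implementations such as the Nei-Gojobori method41 handle all of these properly. The two input sequences are invented.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[i] for i, (a, b, c) in enumerate(
    (a, b, c) for a in BASES for b in BASES for c in BASES)}

def syn_nonsyn_sites(codon):
    """Fractional numbers of synonymous and non-synonymous sites in a codon."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn += 1.0 / 3.0
    return syn, 3.0 - syn

def pn_ps_ratio(seq1, seq2):
    """Crude ratio of non-synonymous to synonymous change between two
    aligned, in-frame coding sequences (uncorrected proportions)."""
    S = N = syn_diff = nonsyn_diff = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # skip stop codons
        if sum(a != b for a, b in zip(c1, c2)) > 1:
            continue  # skip codons hit at more than one position in this sketch
        s1, n1 = syn_nonsyn_sites(c1)
        s2, n2 = syn_nonsyn_sites(c2)
        S += (s1 + s2) / 2.0
        N += (n1 + n2) / 2.0
        if c1 != c2:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    return (nonsyn_diff / N) / (syn_diff / S)

print(pn_ps_ratio("ATGGCTAAATTG", "ATGGCCAGATTA"))  # well below 1 for this toy pair
```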
Fig. 4. Illustration of the two different maximally parsimonious paths along which the codon TTG (leucine) can evolve into the codon ATA (isoleucine).
5.1.1. Window-based Methods

The methods discussed above estimate the average substitution rate ratio for an entire gene. When this ratio is significantly larger than 1, it provides strong evidence that the gene is under positive selection. In the vast majority of protein-coding genes, however, most amino acid sites evolve under purifying selection, resulting in an average substitution rate ratio that is much less than 1. Thus even when a gene contains sites under positive selection, they are unlikely to be detected by these methods. Hughes and Nei24 took a step towards finding specific loci that are under positive selection by applying the method of Nei and Gojobori41 to segments of a gene, showing that the selective pressure is different in different exons. This idea extends naturally to window-based approaches, such as the sliding-window method of Fares et al.11, which attempts to optimise the size of the segment under investigation so as to obtain the maximum possible resolution from the available data.

5.1.2. Site-specific Methods

The extreme case of a window-based method is when the window size is one codon. This is not feasible when only two sequences are being used, but the method of Nei and Gojobori41 is easily extended to make use of multiple sequences by first inferring a phylogeny and working out the sequences at the ancestral nodes. This can be done using several methods, with the parsimony and maximum likelihood approaches being popular. Suzuki and Gojobori55 used inferred ancestral sequences to obtain pairwise estimates of dS and dN for each branch in a phylogeny, averaging the results over the entire tree. Finally, they used a statistical hypothesis
test to assess whether the null hypothesis of neutral evolution could be rejected, assuming a binomial distribution for both dS and dN under the null hypothesis. Source code for this method is available as the ADAPTSITE program59.
5.2. Probabilistic Methods
In contrast to the family of counting methods, probabilistic methods start with an explicit, detailed model of the process of evolution, from which the probability of a given set of observationsa can be calculated. In the case of the phylogenetic models of interest here, a set of observations consists of a multiple in-frame alignment of protein-coding nucleotide sequences. As in the case of counting methods, the availability of data from multiple sequences allows stronger and more detailed conclusions to be drawn. In the case of probabilistic models, it also allows the models to be more sophisticated without incurring the problem of data sparsity: if the assumptions made by the model are justified, this should lead to more reliable conclusions. Once the model has been specified, inferences can be made by straightforward application of the rules of probability theory. Thus these methods lack the ad hoc flavour of counting methods: all assumptions and approximations are stated explicitly, and the answer produced by the method is the optimal or correct answer given those assumptions and approximations. Most of the models of sequence evolution that have been constructed describe the evolution of sequences at either the nucleotide or the amino acid level. In 1994, Goldman and Yang18 and Muse and Gaut37 independently extended these models to create codon models, which take into account the individual nucleotide substitutions required for a change from one codon to another as well as the difference between synonymous and non-synonymous changes. These models were further extended by the model of Nielsen and Yang43, which allows site-specific estimation of the ratio of non-synonymous to synonymous substitution rates and is discussed below.
a When this probability of a set of observations given a set of model parameters is treated as a function of the parameters, it is termed the likelihood.
5.2.1. Site-specific Methods

The key quantity in a codon model is the rate matrix Q, which consists of elements q_ij (i ≠ j) denoting the instantaneous substitution rate from codon i to codon j. For mathematical convenience, the diagonal elements are chosen such that the matrix rows sum to zero: defining them in this way causes the substitution probabilities over time t to be given as P(t) = e^{Qt}, which can be calculated by diagonalising the rate matrix33. The substitution rates for the model used by Nielsen and Yang43 are defined as follows:

\[
q_{ij} = \begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at two or three nucleotide positions;}\\
\pi_j, & \text{if } i \text{ and } j \text{ differ by one synonymous transversion;}\\
\kappa\pi_j, & \text{if } i \text{ and } j \text{ differ by one synonymous transition;}\\
\omega\pi_j, & \text{if } i \text{ and } j \text{ differ by one non-synonymous transversion;}\\
\omega\kappa\pi_j, & \text{if } i \text{ and } j \text{ differ by one non-synonymous transition;}
\end{cases}
\]
where π_j is the equilibrium frequency of codon j, κ is the transition/transversion rate ratio and ω is the non-synonymous/synonymous rate ratio. Note that the ω parameter plays the same role as the dN/dS (or KA/KS) ratio used in counting methods.
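The relation P(t) = e^{Qt} is easy to verify numerically. The sketch below builds a small toy rate matrix (a 4-state, nucleotide-style matrix with arbitrary illustrative rates, rather than a full 61 x 61 codon matrix), sets the diagonal so that each row sums to zero, and obtains the substitution probabilities by diagonalisation as described above.

```python
import numpy as np

# Toy 4-state rate matrix; the off-diagonal rates are arbitrary illustrative
# values, not estimates from data.
Q = np.array([[0.0, 0.3, 0.1, 0.1],
              [0.3, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.3],
              [0.1, 0.1, 0.3, 0.0]])
np.fill_diagonal(Q, -Q.sum(axis=1))  # rows now sum to zero

def transition_probabilities(Q, t):
    """P(t) = exp(Qt), computed by diagonalising Q = U diag(lam) U^-1."""
    lam, U = np.linalg.eig(Q)
    return (U @ np.diag(np.exp(lam * t)) @ np.linalg.inv(U)).real

P = transition_probabilities(Q, t=0.5)
print(P)
print(P.sum(axis=1))  # each row of P(t) sums to 1
```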
Fig. 5. Discrete distributions for ω under (a) the selection model and (b) the neutral model of Nielsen and Yang.
As mentioned before, however, positive selection typically occurs at only a small number of sites in a gene: the average substitution rate ratio is usually close to zero, irrespective of whether positive selection is present at some of the sites. The model of Nielsen and Yang allows for rate heterogeneity by allowing the ω parameter, and thus the rate matrix, to vary across sites in the sequence. Separate sites are treated as independent, but with an ω parameter drawn from a distribution of which the parameters need to be
estimated. Nielsen and Yang used a discrete distribution for this (Figure 5(a)), which has the effect of dividing the sites up into a small number of site classes. In this case, three site classes are used, with associated ω parameters as follows: ω−, constrained to be smaller than 1, for sites under purifying selection; ωN = 1, for sites under neutral drift; and ω+, constrained to be larger than 1, for sites under positive selection. Inference can now be carried out by estimating the maximum likelihood values not only of these parameters and the relative frequencies of the site classes (p−, pN and p+ respectively), but also of the posterior probabilities for each site in the sequence of belonging to the various site classes. Thus even if a sequence has only a single site under positive selection, it may still be possible to detect this if that site is inferred to have a high probability of belonging to the site class for positive selection. An alternative neutral model is defined using a distribution that disallows the third (positive selection) site category (Figure 5(b)). This model will yield the same results as the selection model if no sites are under positive selection. This is useful for deciding whether sites classified by the selection model as being under positive selection are real examples of positive selection or merely artifacts of an inappropriate model. By performing a likelihood ratio test, it can be determined whether the addition of a positive-selection site category in the selection model leads to a statistically significant improvement. Only if this is the case is there solid evidence that a real example of positive selection has been detected. Since it is not clear what type of distribution for ω would be a good assumption, a large number of different distributions (both continuous and discrete) have been investigated70. For more reliable results a number of different tests, using different variants of neutral and selection models, can be conducted. Source code for all of these models is available as part of the PAML program package67. Despite the widespread popularity of this method, it has a number of potential problems. In fact, Suzuki and Nei2 have claimed that it gives misleading results, falsely detecting positive selection when there is none. The following are issues that may affect the method (a numerical sketch of the likelihood ratio test itself follows this list):
• Violation of assumptions: The models make several assumptions which may not be correct. For instance, most codon models assume that the synonymous rate dS is constant for the entire sequence under investigation. When this is not the case, false detection of positive selection may result. Similarly, codon models
(as well as counting methods) assume that the sequences are free of recombination. It has been shown1,51 that recombination can lead to false detection of positive selection.
• Convergence of maximum likelihood parameter estimation: In order to estimate the maximum likelihood values of model parameters in a computationally tractable way, it is necessary to use algorithms such as gradient descent or expectation maximisation. These algorithms are not guaranteed to find the globally optimal parameter values. If the search space is complex, the algorithm may converge on suboptimal parameter values, leading to unreliable inferences. Suzuki and Nei58 reported experiencing such problems with the models recommended by Yang et al.70, and in particular found that the results were very sensitive to the values used to initialise the algorithm.
• Likelihood ratio test: The use of the likelihood ratio test for determining statistical significance requires the test statistic to have a chi-squared distribution. Anisimova et al.1 showed that in fact it does not have this distribution, resulting in an overly conservative test. This problem could be corrected by simulating the actual distribution, but researchers have tended to welcome the fact that this makes the test more conservative, especially since other model assumptions are claimed to make it too liberal58. Thus this does not appear to be a serious weakness.
• Uncertainty in estimated parameter values and tree topology: The use of maximum likelihood estimates of tree topologies and parameter values suffers from the weakness that it ignores the uncertainty inherent in these estimates, which may in turn affect the inference of positive selection. While Yang et al.52,70 claim that their method is not sensitive to the assumed tree topology, this has been disputed by Suzuki and Nei2, who obtained different results when using different criteria to estimate their trees. This problem has been addressed by Huelsenbeck and Dyer22, who used a Bayesian approach to average out the uncertainties in the estimates. The Bayesian approach also avoids the convergence problems that the maximum likelihood approach may experience, but at the cost of increasing the computational complexity of the method.
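Numerically, the likelihood ratio test referred to above amounts to comparing twice the improvement in log-likelihood with a chi-squared distribution whose degrees of freedom equal the number of extra parameters in the selection model (here taken as two, for p+ and ω+). The sketch below uses invented log-likelihood values; in practice these would be obtained by fitting both models with software such as PAML67, and, as noted above, the chi-squared reference distribution is itself only an approximation.

```python
from scipy.stats import chi2

def lrt_p_value(lnL_neutral, lnL_selection, extra_params=2):
    """Likelihood ratio test for nested models: 2*(lnL1 - lnL0) is compared
    with a chi-squared distribution with extra_params degrees of freedom."""
    statistic = 2.0 * (lnL_selection - lnL_neutral)
    return statistic, chi2.sf(statistic, df=extra_params)

# Hypothetical log-likelihoods from fitting both models to the same alignment.
stat, p = lrt_p_value(lnL_neutral=-2347.8, lnL_selection=-2339.1)
print(stat, p)  # a small p would favour the model with a positive-selection class
```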
5.2.2. Lineage-specific Methods
Instead of asking which sites in a sequence are under selection, we could also ask which lineages in a phylogeny experienced selection. This is of interest in cases where a particular species in the taxonomy evolved to fill a particular niche: in such a case one could observe directed evolution of a gene for which the optimal form in the new niche is different from the optimal form in the environment inhabited by the other species. Yang and Nielsen69 proposed a set of codon models that allow positive selection at individual sites in a subset of the lineages of the phylogeny, referred to as the foreground branches. The remaining lineages (the background branches) are not allowed to have positive selection. As in the site-specific case, the maximum likelihood parameters are estimated and, if the foreground branches are found to have sites with ω > 1, a likelihood ratio test is used to determine whether the improvement over the neutral model is significant. If so, it is concluded that the foreground branch has evolved under positive selection. Once again, a more correct but computationally more expensive solution can be reached using Bayesian methods. Seo, Kishino and Thorne48 developed a Bayesian approach to this problem, using Markov chain Monte Carlo methods to average out the uncertainties in the parameter estimates. Their approach uses a slightly different framework that allows separate estimation of dN and dS, without assuming a constant synonymous substitution rate dS.
5.2.3. Detecting Selection in Non-coding Regions

It is also possible to use codon models to investigate selection in non-coding regions, if the non-coding regions in question are close enough to available coding regions that a constant neutral rate can be assumed. Wong and Nielsen65 proposed a combined model for coding and non-coding regions that includes a parameter, ζ, representing the ratio between the substitution rate at a non-coding site and the synonymous substitution rate in coding regions. They used this to show that positive selection in non-coding regions is rare or absent in the viral sequences they examined. It should be noted that, due to the very large number of non-coding sequences evolving without selective constraints, this approach is likely to find examples of positive selection only if it is possible to identify in advance specific non-coding regions that are likely to have undergone adaptive evolution.
5.3. Comparison of Counting and Probabilistic Approaches to Comparative Methods
Since their introduction in 1994, probabilistic methods seem to have taken over from counting methods as the method of choice, partly because of their more sophisticated theoretical framework. However, in recent years they have started coming under criticism for resulting in overly liberal inferences56,57,58,72. In the light of this, it is worth taking a closer look at exactly how these two types of method compare: the most important differences lie not so much in the theoretical framework under which the method is described as in their underlying assumptions. Counting methods make no assumptions about the relationships between different sites in the sequence, except that they share a common tree topology. Accordingly, they perform a completely separate analysis for each site. Probabilistic methods, on the other hand, assume that a single set of parameters can be used to describe all sites and use data from the entire sequence to estimate these parameters. These include the constant transition/transversion ratio, codon equilibrium frequencies and phylogenetic branch lengths assumed for all sites, as well as the parameters describing the probability distribution from which the site-specific values are drawn. For instance, if the codon AAA is observed at both ends of a phylogenetic branch, a counting method will infer that no substitutions have taken place, irrespective of what is observed at other sites in the sequence. The inference made by a probabilistic method, however, will depend on the rest of the sequence: if the branch length is estimated to be long, it may turn out that two or more successive substitutions are a more likely explanation. By using the data in this way, probabilistic methods gain a potential advantage in that they make efficient use of the available information. When sufficient data is available and the model assumptions are justified, this should lead to more reliable results than those obtained using counting methods. However, there are two ways in which this approach can be counterproductive. First, if the amount of available data is insufficient, the estimated parameter values will be unreliable. This is called over-parameterisation: the more parameters there are in a model, the larger the data set required to estimate them accurately. Thus the best model to describe a particular phenomenon usually depends on the amount of available data. Second, if the assumption that a single model can be used to describe the entire data set is incorrect, the additional information being used will be false,
resulting in false inferences. For this reason, probabilistic methods are unreliable if the sequence under investigation is inhomogeneous in some way that has not been modeled explicitly, as in the case of recombination or a variable mutation rate.

5.4. Codon Volatility

A much simpler idea was proposed by Plotkin, Dushoff and Fraser45, who used the notion of codon volatility to learn about the most recent substitutions in a single sequence. The idea is that some codons are volatile in that many of the possible substitutions which they could undergo are non-synonymous, while others are less volatile in that many of the possible substitutions are synonymous. A random non-synonymous substitution is more likely to result in a volatile codon, while codons undergoing only synonymous substitutions are more likely to be non-volatile. By measuring the levels of codon volatility in particular genes against those of the entire genome, one can evaluate whether the selective pressure in the gene in question is more positive or negative than that of the genome as a whole. Using this method to analyse the genomes of Mycobacterium tuberculosis and Plasmodium falciparum, they identified a number of surface proteins that appear to be under positive selection. The notion of volatility is automatically incorporated in the comparative sequence analysis methods discussed thus far, since they take into account the different paths along which one sequence can evolve into another. A consequence of this is that these methods will be more likely to associate volatile codons with non-synonymous substitutions and non-volatile codons with synonymous substitutions. However, they use single genes from multiple species, while with the codon volatility method it is possible to obtain results using multiple genes of a single species, using far fewer data than for comparative sequence analysis methods. The simplicity of the codon volatility method also means that very few assumptions need to be made: the assumptions that lead to debate between the counting and probabilistic approaches are avoided, and issues such as recombination and population size are irrelevant.
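The volatility of an individual codon is straightforward to compute from the genetic code: it is the fraction of the codon's single-nucleotide neighbours that encode a different amino acid. The sketch below adopts one reasonable convention (neighbours that are stop codons are ignored); the gene- and genome-level comparisons performed by Plotkin et al.45 would then aggregate these per-codon values and assess their significance, which is not shown here.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[i] for i, (a, b, c) in enumerate(
    (a, b, c) for a in BASES for b in BASES for c in BASES)}

def volatility(codon):
    """Fraction of a codon's single-nucleotide neighbours (stop codons
    excluded) that encode a different amino acid."""
    changed = total = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbour = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[neighbour] == "*":
                continue
            total += 1
            changed += CODON_TABLE[neighbour] != CODON_TABLE[codon]
    return changed / total

print(volatility("CTG"))  # leucine: several synonymous neighbours, lower volatility
print(volatility("ATG"))  # methionine: every neighbour is non-synonymous (1.0)
```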
There are two caveats to bear in mind when considering the codon volatility method:
• It focuses only on the most recent evolutionary events affecting the sequence. Thus the method is suitable for studying which portion of a genome is currently under adaptive or constraining selective pressure, but not for studying the evolutionary history of an organism.
• It does not produce an absolute estimate of the selective pressure, but only an estimate of the pressure relative to the rest of the genome. Since most genes are mainly under negative selection, this usually means that examples of negative selection found by this method will indeed be strongly conserved, but examples of positive selection may in fact be evolving neutrally rather than under adaptive pressure.

5.5. Codon-based Methods that use Polymorphism Data
When data from multiple individuals within one or more species are available, a population-based approach can be used to take polymorphism data into account. The McDonald-Kreitman test35 is based on the fact that, under a neutral model of evolution, the ratio of non-synonymous to synonymous substitutions should equal the ratio of non-synonymous to synonymous polymorphisms. However, advantageous mutations are expected to reach fixation more rapidly (spending less time as polymorphisms), so that if a large number of non-synonymous substitutions confer a selective advantage on the organism, then non-synonymous changes would be expected to be relatively more common among fixed substitutions than among polymorphisms. McDonald and Kreitman analysed data from the Adh locus in three species from the Drosophila melanogaster species subgroup. They constructed a phylogeny containing within-species branches accounting for diversity within species (polymorphism), and between-species branches accounting for fixed differences between species (substitutions). By measuring the polymorphisms and substitutions reconstructed using this tree, they demonstrated a statistically significant deviation from the neutral expectation and concluded that the Adh locus in Drosophila is under positive selection. A number of approaches based on this framework have been proposed and used to detect positive selection. The extension proposed by Williamson63, based on that of Smith and Eyre-Walker53, distinguishes between rare and common polymorphisms. It is assumed that positively selected mutations are not observed as rare polymorphisms. The number of positively selected substitutions (a_d) and positively selected common polymorphisms (a_p) can then be estimated as
\[
a_d = D_n - D_s\,\frac{R_n}{R_s}, \qquad a_p = C_n - C_s\,\frac{R_n}{R_s},
\]
where Dn and Ds are the numbers of non-synonymous and synonymous substitutions, Cn and Cs are the numbers of non-synonymous and synonymous common polymorphisms, and Rn and Rs are the numbers of non-synonymous and synonymous rare polymorphisms. Using this approach to analyze the HIV-1 env gene, Williamson estimated that the majority (55%) of non-synonymous substitutions and high-frequency polymorphisms in this gene are adaptive. He also found that patients with longer asymptomatic periods have virus populations with higher adaptation rates, which is presumably related to a stronger immune response in these patients.
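In its basic form the McDonald-Kreitman comparison reduces to a 2 x 2 table of non-synonymous and synonymous counts for fixed differences versus polymorphisms, which can be tested with, for example, Fisher's exact test. The sketch below uses invented counts (not the Adh or env data) and also reports the neutrality index (Pn/Ps)/(Dn/Ds), a commonly used summary of the table; it is not the Williamson63 estimator described above.

```python
from scipy.stats import fisher_exact

def mcdonald_kreitman(Dn, Ds, Pn, Ps):
    """Test the 2x2 McDonald-Kreitman table.

    Under neutrality Dn/Ds is expected to equal Pn/Ps; a significant excess of
    non-synonymous fixed differences suggests positive selection.  Returns the
    neutrality index NI = (Pn/Ps)/(Dn/Ds) and the Fisher exact p-value."""
    _, p_value = fisher_exact([[Dn, Ds], [Pn, Ps]])
    ni = (Pn / Ps) / (Dn / Ds)
    return ni, p_value

# Hypothetical counts of fixed differences (D) and polymorphisms (P).
ni, p = mcdonald_kreitman(Dn=30, Ds=15, Pn=10, Ps=25)
print(ni, p)  # NI < 1 with a small p indicates an excess of adaptive substitutions
```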
6. Discussion and Future Prospects

Evolution is a ubiquitous biological learning process: it occurs in most populations of organisms most of the time, as they continually learn to adapt to changing environments. It is a process very similar to many computational learning methods: genetic algorithms, for instance, are a class of computational methods that are based on a detailed model of the evolutionary learning process. However, the sheer scale of the problem faced by biological organisms causes an enormous difference between the complexity of computational and biological learning systems: biological systems have to learn and maintain solutions to a vast number of different (and continually changing) problems on a scale with which no known computational method could cope. Another difference between biological and computational learning is that in the latter we have direct access to the computation and can therefore understand exactly how the learning takes place. An understanding of the learning process in evolution, on the other hand, can only be achieved indirectly, as we have no direct access to either the computations involved or the details of the problems they solve. Indeed, in many cases it is the desire to learn more about the biology of the problems being solved that motivates our efforts to understand the evolutionary process. Fortunately, we do have a large number of clues that allow us a glimpse of what is going on. In this chapter we have described a selection of the many methods that have been developed to extract information from these clues. However, currently available methods remain imperfect in that they make assumptions that are often unjustified and may frequently lead to incorrect inferences. For example, the effect of recombination, which can invalidate codon-based methods that make use of phylogenetic trees, is often neglected. Similarly, complex demographic histories can provide competing explanations for positive selection inferred from some population-based methods. It should therefore be emphasised that, while it is easy to construct plausible stories of evolution, such stories could in many cases be completely wrong. As theory advances, some previously inferred examples of positive selection may need to be reconsidered, as many false positive results may have been published. The question of how best to decipher the clues left over from the evolutionary learning process remains an intriguing and rapidly developing field of research.
References 1. M. Anisimova, J. P. Bielawski and Z. Yang, Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18, 1585-1592 (2001). 2. M. Anisimova, R. Nielsen and Z. Yang, Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164, 1229-1236 (2003). 3. N. H. Barton, Genetic hitchhiking. Philos. Trans. R. Soc. Lond B Biol. Sci. 355, 1553-1562 (2000). 4. D. J. Begun and C. F. Aquadro, Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519-520 (1992). 5. T. Bersaglieri, et al., Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74, 1111-1120, (2004). 6. N. Bierne and A. Eyre-Walker, The genomic rate of adaptive amino acid substitution in Drosophila. Mol. Biol. Evol. 21, 1350-1360 (2004). 7. D. E. Dobson, E. M. Prager and A. C. Wilson, Stomach lysozymes of ruminants. I. Distribution and catalytic properties. J. Biol. Chem. 259, 1160711616 (1984). 8. T. F. Jr. Duda and S. R. Palumbi, Molecular genetics of ecological diversification: duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc. Natl. Acad. Sci. USA 96, 6820-6823 (1999). 9. W. Enard, et al., Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418, 869-872 (2002). 10. P. D. Evans, et al., Adaptive evolution of ASPM, a major determinant of cerebral cortical size in humans. Hum. Mol. Genet. 13, 489-494 (2004). 11. M. A. Fares, et al., A sliding window-based method to detect selective constraints in protein-coding genes and its application to RNA viruses. J. Mol. Evol. 55, 509-521 (2002).
12. J. C. Fay and C. I. Wu, Hitchhiking under positive Darwinian selection. Genetics 155, 1405-1413 (2000). 13. J. C. Fay and C. I. Wu, (2001). The neutral theory in the genomic era. Curr. Opin. Genet. Dev. 11, 642-646 (2001). 14. J. C. Fay, G. J. Wyckoff and C. I. Wu, Positive and negative selection on the human genome. Genetics 158, 1227-1234 (2001). 15. J. C. Fay, G. J. Wyckoff and C. I. Wu, Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415, 1024-1026 (2002). 16. Y. X. Fu, Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147, 915-925 (1997). 17. N. Galtier, F. Depaulis and N. H. Barton, Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155, 981-987 (2000). 18. N. Goldman and Z. Yang, A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725-736 (1994). 19. S. J. Gould and R. C. Lewontin, The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond B Biol. Sci. 205, 581-598 (1979). 20. J. B. S. Haldane, The cost of natural selection. Journal of Genetics 55, 511524 (1957). 21. R. R. Hudson, M. Kreitman and M. Aguade, A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153-159 (1987). 22. J. P. Huelsenbeck and K. A. Dyer, Bayesian Estimation<>f Positively Selected Sites. J. Mol. Evol. 58, 661-672 (2004). 23. A. L. Hughes, Adaptive evolution of genes and genomes. New York: Oxford University Press, (1999). 24. A. L. Hughes and M. Nei, Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335, 167-170 (1988). 25. N. L. Kaplan, R. R. Hudson and C. H. Langley, The "hitchhiking effect" revisited. Genetics 123, 887-899 (1989). 26. M. Kimura, Evolutionary rate at the molecular level. Nature 217, 624-626 (1968). 27. M. Kimura, The neutral theory of molecular evolution. Cambridge: Cambridge University Press (1983). 28. J. L. King and T. H. Jukes, Non-Darwinian evolution. Science 164, 788-798 (1969). 29. M. Kutsukake, et al., Venomous protease of aphid soldier for colony defense. Proc. Natl. Acad. Sci. USA 101, 11338-11343 (2004). 30. D. S. Leal, E. C. Holmes and P. M. Zanotto, Distinct patterns of natural selection in the reverse transcriptase gene of HIV-1 in the presence and absence of antiretroviral therapy. Virology 325, 181-191 (2004). 31. T. Lenormand, D. Bourguet, T. Guillemaud and M. Raymond, Tracking the evolution of insecticide resistance in the mosquito Culex pipiens. Nature 400, 861-864 (1999). 32. W. H. Li, Molecular Evolution. 1 ed. Sinauer Assoc, (1997). 33. P. Lio and N. Goldman, Models of molecular evolution and phylogeny.
Genome Res. 8, 1233-1244 (1998). 34. G. T. Marth, E. Czabarka, J. Murvai and S. T. Sherry, The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166, 351-372 (2004). 35. J. H. McDonald and M. Kreitman, Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652-654 (1991). 36. W. Messier and C. B. Stewart, Episodic adaptive evolution of primate lysozymes. Nature 385, 151-154 (1997). 37. S. V. Muse and B. S. Gaut, A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11, 715-724 (1994). 38. A. Mutero, M. Pralavorio, J. M. Bride and D. Fournier, Resistance-associated point mutations in insecticide-insensitive acetylcholinesterase. Proc. Natl. Acad. Sci. USA 91, 5922-5926 (1994). 39. M. W. Nachman, Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17, 481-485 (2001). 40. K. Nakashima, et al., Accelerated evolution in the protein-coding regions is universal in crotalinae snake venom gland phospholipase A2 isozyme genes. Proc. Natl. Acad. Sci. USA 92, 5605-5609 (1995). 41. M. Nei and T. Gojobori, Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418-426 (1986). 42. M. Nei and S. Kumar, Molecular evolution and phylogenetics. New York Oxford University Press, (2000). 43. R. Nielsen and Z. Yang, Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929-936 (1998). 44. S. R. Palumbi, Humans as the world's greatest evolutionary force. Science 293, 1786-1790 (2001). 45. J. B. Plotkin, J. Dushoff and H. B. Fraser, Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428, 942-945 (2004). 46. N. A. Rosenberg and M. Nordborg, Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3, 380-390 (2002). 47. P. C. Sabeti, et al., Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832-837 (2002). 48. T. K. Seo, H. Kishino and J. L. Thome, Estimating absolute rates of synonymous and nonsynonymous nucleotide substitution in order to characterize natural selection and date species divergences. Mol. Biol. Evol. 21, 1201-1213 (2004). 49. M. R. Servedio and G. P. Saetre, Selection as a positive feedback loop between postzygotic and prezygotic barriers to gene flow. Proc. R. Soc. Lond B Biol. Sci. 270, 1473-1479 (2003). 50. D. C. Shields, P. M. Sharp, D. G. Higgins and F. Wright, "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous
codons. Mol. Biol. Evol. 5, 704-716 (1988). 51. D. Shriner, D. C. Nickle, M. A. Jensen and J. I. Mullins, Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet. Res. 81, 115-121 (2003). 52. J. M. Smith and J. Haigh, The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23-35 (1974). 53. N. G. Smith and A. Eyre-Walker, Adaptive protein evolution in Drosophila. Nature 415, 1022-1024 (2002). 54. C. B. Stewart, J. W. Schilling and A. C. Wilson, Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330, 401-404 (1987). 55. Y. Suzuki and T. Gojobori, A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16, 1315-1328 (1999). 56. Y. Suzuki and M. Nei, Reliabilities of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 18, 2179-2185 (2001). 57. Y. Suzuki and M. Nei, Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 19, 1865-1869 (2002). 58. Y. Suzuki, and M. Nei, False-positive selection identified by ML-based methods: examples from the Sigl gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21, 914-921 (2004). 59. Y. Suzuki, T. Gojobori and M. Nei, ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17, 660-661 (2001). 60. W. J. Swanson and V. D. Vacquier, The rapid evolution of reproductive proteins. Nat. Rev. Genet. 3, 137-144 (2002). 61. F. Tajima, Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585-595 (1989). 62. C. Toomajian, et al., A method for detecting recent selection in the human genome from allele age estimates. Genetics 165, 287-297 (2003). 63. S. Williamson, Adaptation in the env gene of HIV-1 and evolutionary theories of disease progression. Mol. Biol. Evol. 20, 1318-1325 (2003). 64. K. H. Wolfe and W. H. Li, Molecular evolution meets the genomics revolution. Nat. Genet. 33 Suppl, 255-265 (2003). 65. W. S. Wong and R. Nielsen, Detecting selection in noncoding regions of nucleotide sequences. Genetics 167, 949-958 (2004). 66. J. C. Wootton, et al., Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature 418, 320-323 (2002). 67. Z. Yang, PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555-556 (1997). 68. Z. Yang, Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568-573 (1998). 69. Z. Yang and R. Nielsen, Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908-917 (2002). 70. Z. Yang, R. Nielsen, N. Goldman and A. M. Pedersen, Codon-substitution
models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431-449 (2000). 71. P. M. Zanotto, E. G. Kallas, R. F. de Souza and E. C. Holmes, Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153, 1077-1089 (1999). 72. J. Zhang, Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21, 1332-1339 (2004). 73. J. Zhang and H. F. Rosenberg, Diversifying selection of the tumor-growth promoter angiogenin in primate evolution. Mol. Biol. Evol. 19, 438-445 (2002). 74. S. Zhu, F. Bosmans and J. Tytgat, Adaptive evolution of scorpion sodium channel toxins. J. Mol. Evol. 58, 145-153 (2004).
CHAPTER 6

MOLECULAR PHYLOGENETIC ANALYSIS: UNDERSTANDING GENOME EVOLUTION
Alan Christoffels
Computational Biology, Temasek Life Sciences Laboratory, Singapore
calan@imcb.a-star.edu.sg
Genome duplication has been postulated to be the driving force in shaping the evolution of complexity and novelty within an organism13. Furthermore, it has been postulated that at least two rounds of ancient genome duplication have contributed to the complexity of vertebrate genomes, a proposal known as the 2R hypothesis10,11. However, several studies, including the first whole-genome analyses addressing the 2R hypothesis, have demonstrated clear evidence for at least one genome-doubling event early in the evolution of vertebrates8,12. In addition, large-scale analysis of Fugu rubripes has supported the view that ray-finned fishes might differ from land vertebrates because of additional duplication events in their evolutionary past. The availability of sequenced vertebrate genomes, and of those in the pipeline for sequencing, allows large-scale computational analyses to investigate the contribution of duplication events to the structuring of vertebrate genomes. The following criteria need to be assessed in order to prove or refute species-specific genome duplication events: (a) Can species-specific duplicate genes be identified? (b) Does the age of species-specific genes predate the divergence of a sister group? (c) Can we identify intact genome segments containing the above duplicate genes? This chapter looks at the identification of duplicated genes, focusing on the principles behind the reconstruction of phylogenetic trees.

1. What is Phylogenetics?

Phylogenetics is the science of estimating the evolutionary past and, in the case of molecular phylogeny, of studying the evolutionary relationships between DNA or protein sequences1. The use of phylogenetic analyses has contributed enormously to the field of comparative genomics, gleaning insights into genome evolution and, specifically, gene function2,12.
2. What is a Phylogenetic Tree?

A phylogenetic tree captures the evolutionary relationships between genes. A tree is composed of branches and nodes (Figure 1). Branches connect nodes, which in turn represent the points at which two or more branches diverge. An internal node corresponds to the last common ancestor of the lineages that descend from it. Terminal nodes correspond to the sequences from which the tree was derived (also referred to as operational taxonomic units, or OTUs for short).
Fig. 1. Duplication topologies. (a) Gene 1 duplication event in human. (b) Duplication event occurring before human and fish diverged. Black dot indicates a node.
The root of a phylogenetic tree is at its base and is the oldest point in the tree. The root also implies the branching order in a tree, namely which sequences share a recent common ancestor. The tree can only be rooted with an 'outgroup', an external point of reference. An outgroup is anything that is not a natural member of the group of interest; humans, for example, cannot serve as an outgroup to animals. In the absence of an outgroup, the best alternative is to place the root in the middle of the tree or not to root the tree at all.

3. Identifying Duplicate Genes

Expansion within a gene family occurs by gene duplication events. These duplicated genes are referred to as paralogs, in contrast to orthologs, which arise from a common ancestor as a result of a speciation event. The evolutionary relationships between paralogous genes are best illustrated with the aid of a phylogenetic tree. Trees can represent multigene families, in which case the internal nodes correspond to gene duplication events. The tree topology distinguishes duplication events specific to one species (Figure 1(a)) from those shared by multiple species (Figure 1(b)). The following strategy is recommended to identify duplicated genes: generate protein families; construct multiple alignments; reconstruct phylogenetic trees; and screen tree topologies in high throughput.
3.1. Generate Protein Families

The correct assignment of proteins into families (clusters) is hampered by problems such as multi-domain proteins, fragmented proteins and low-complexity domains. Different algorithms have been implemented to reduce the false positive rate for protein families, including TribeMCL3, used in the Ensembl pipeline. An alternative approach implements an all-versus-all proteome comparison using BLASTP. Various BLAST parameters have been tweaked in order to modify the strictness of protein family membership2,12. Strict criteria would eliminate certain family members but would produce more accurate alignments for automated downstream processing.

3.2. Multiple Sequence Alignments

Multiple sequence alignments are generally constructed by a progressive sequence alignment approach such as that implemented in ClustalW16. This method builds an alignment up stepwise, starting from the most similar sequences and progressively adding the most divergent sequences with the help of a guide tree. Insertions or deletions (indels) within a gene can change the reading frame or introduce a stop codon. Indels are represented by gaps in a multiple sequence alignment. The size of a gap is much less important than the fact that the gap is there at all; alignment programs therefore have separate penalties for opening a gap and for extending it. Gaps within a sequence can have an undue influence on the tree. For example, a 6-nucleotide insertion would contribute 6 shared characters for the OTUs that have it, which is inappropriate because a gap is really a single evolutionary event regardless of its size. In practice, the full alignment can be adjusted 'by eye' to minimize indels using software such as BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Automated strategies for gap removal in an alignment have focused on removing any column where at least 10% of sequences contain a gap, together with the adjacent columns that are not well aligned17.

3.3. Reconstructing Phylogenetic Trees

As discussed, there is a variety of analysis methods at each step. Commonly used methods for reconstructing phylogenetic trees from molecular data include (1) distance methods, (2) parsimony methods, and (3) likelihood methods. We will focus on a distance method, namely neighbor joining.
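As a bridge to the distance-based tree building discussed next, the sketch below computes a matrix of uncorrected p-distances (the proportion of differing positions, skipping gap columns) from a small toy alignment; in practice a substitution-model correction for multiple hits, as mentioned below, would normally be applied before tree reconstruction. The sequences and labels are invented.

```python
def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ,
    skipping any column where either sequence has a gap."""
    diffs = valid = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        valid += 1
        diffs += x != y
    return diffs / valid

def distance_matrix(alignment):
    """Return OTU names and a square matrix of pairwise p-distances."""
    names = list(alignment)
    return names, [[p_distance(alignment[i], alignment[j]) for j in names]
                   for i in names]

# Toy alignment of four OTUs (hypothetical sequences).
aln = {"Human-1a": "ATGGCTAGCT", "Human-1b": "ATGGCAAGCT",
       "Fugu": "ATGACTAGTT", "Drosophila": "TTGACTCGTA"}
names, D = distance_matrix(aln)
for name, row in zip(names, D):
    print(name, ["%.2f" % d for d in row])
```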
In distance methods, evolutionary distances are computed for all pairs of taxa and a phylogenetic tree is reconstructed by considering the relationships among these distance values. In 1987, Saitou and Nei14 developed an efficient tree-building method that does not examine all possible topologies but instead applies a minimum evolution principle at each stage of taxon clustering. The minimum evolution principle refers to the selection, out of all possible topologies, of the topology for which the sum of the branch lengths is minimal. Two taxa that are connected by a single node in an unrooted tree are called neighbors. For example, taxa 1 and 2 are neighbors joined by node A (Figure 2(a)). It is possible to successively join neighbors and produce new pairs of neighbors when defining a topology. For example, taxa 1 and 2 together are neighbors to taxon 3, joined at node B (Figure 2(a)). Construction of a neighbor joining (NJ) tree begins with a star tree that is produced under the assumption that there is no clustering of taxa (Figure 2(b)). The sum of all branch lengths (S0) for the star tree should be larger than the sum (Sfinal) for the final tree. If we choose taxa 1 and 2 as neighbors and construct the tree as in Figure 2(c), then the sum (S12) of all branch lengths should be smaller than S0. We consider all pairs of taxa as potential neighbors and compute the sum of branch lengths (Sij) for the i-th and j-th taxa using a topology similar to that shown in Figure 2(c). We then choose the taxa i and j that give the smallest Sij value. The selected pair of neighbors is combined into one composite taxon and the procedure is repeated until the final tree is produced.
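A compact version of the neighbor-joining procedure is sketched below. At each step it joins the pair of taxa that minimises the standard Saitou-Nei criterion (equivalent to choosing the pair giving the smallest total branch length Sij), computes the branch lengths to the new internal node, and finally emits a Newick string. It is a bare-bones illustration that omits the refinements of production implementations, and the distance matrix is invented.

```python
def neighbor_joining(names, D):
    """Minimal neighbor joining: names is a list of labels and D a symmetric
    list-of-lists distance matrix.  Returns an unrooted tree in Newick format."""
    labels = list(names)
    D = [row[:] for row in D]
    while len(labels) > 2:
        n = len(labels)
        r = [sum(row) for row in D]  # net divergence of each taxon
        # Pick the pair (i, j) minimising the NJ criterion.
        best = bi = bj = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the joined taxa.
        li = 0.5 * D[bi][bj] + (r[bi] - r[bj]) / (2.0 * (n - 2))
        lj = D[bi][bj] - li
        new_label = "(%s:%.3f,%s:%.3f)" % (labels[bi], li, labels[bj], lj)
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distances from every remaining taxon to the new internal node.
        new_row = [0.5 * (D[bi][k] + D[bj][k] - D[bi][bj]) for k in keep]
        labels = [labels[k] for k in keep] + [new_label]
        D = [[D[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        D.append(new_row + [0.0])
    # Join the last two nodes, splitting the remaining distance evenly.
    return "(%s:%.3f,%s:%.3f);" % (labels[0], D[0][1] / 2.0, labels[1], D[0][1] / 2.0)

names = ["Human-1a", "Human-1b", "Fugu", "Drosophila"]
D = [[0.0, 0.1, 0.3, 0.6],
     [0.1, 0.0, 0.3, 0.6],
     [0.3, 0.3, 0.0, 0.6],
     [0.6, 0.6, 0.6, 0.0]]
print(neighbor_joining(names, D))
```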
Fig. 2. Generating pairs of neighbors for NJ tree construction. (a) Phylogeny of four sequences. (b) Star tree showing no clustering of taxa. (c) Unrooted tree showing sequences 1 and 2 as the closest pair during clustering.
Trees are tested with more than one phylogenetic method for consistency before any information is extrapolated from them. Phylogenetic tree reconstruction is also hampered by the fact that the true evolutionary difference between two sequences is obscured by multiple mutations at the same site. Such rapidly evolving sites cause evolutionary distances to be underestimated.
Various models have been developed to estimate the true difference between sequences, based on amino acid substitution matrices such as PAM and BLOSUM, or on gamma correction7, where more weight is given to changes at slowly evolving sites.

4. Assessing the Accuracy of Phylogenetic Trees

The accuracy of a phylogenetic tree is tested using bootstrapping5. Bootstrapping tests whether the whole dataset supports the tree or whether there is a slightly better tree compared to many alternatives. This is done by taking random subsamples of the dataset while allowing repeat sampling of sites. The frequency with which various parts of a tree are reproduced in each of the random samples is calculated and reported as a percentage. For example, if a specific topology is found in two thirds of all random samples then the bootstrap support is 67%. Bootstrap values of 70% or higher are likely to indicate reliable groupings9. Another problem associated with phylogenetic tree reconstruction is long branch attraction4,6, which refers to the tendency of highly divergent sequences to group together in a tree regardless of their true relationships. Reasons for, and possible solutions to, long branch attraction have been reviewed by Baldauf1 and Felsenstein5.

5. High-throughput Screening of Tree Topologies

Trees are represented in a computer-readable form using the NEWICK format. The rooted tree in Figure 1 can be represented in NEWICK format as follows: (Drosophila,(Fugu,(Human-1a,Human-1b))). An interior node is represented as a pair of matched parentheses. The tips of the tree are represented by their names. Branch lengths can be incorporated into a tree by putting a real number after a node, preceded by a colon, for example (Drosophila:4.0,(Fugu:2.0,(Human-1a,Human-1b):7.0)). The BioPerl open source initiative15 maintains a collection of PERL wrappers that facilitate automated phylogenetic tree reconstruction for high-throughput analysis. Additional PERL routines exist to manipulate and extract data from NEWICK-formatted trees. These routines can be modified to identify species-specific topologies such as that shown in Figure 1(a).
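As one possible way to automate such a topology screen (here using Biopython's Bio.Phylo, an alternative to the BioPerl routines mentioned above), the sketch below parses a NEWICK string and asks whether all tips from one species form their own clade, i.e. whether the tree matches the species-specific duplication topology of Figure 1(a).

```python
from io import StringIO
from Bio import Phylo

def is_species_specific_duplication(newick, species_prefix="Human"):
    """True if all tips whose names start with species_prefix form a clade,
    i.e. the duplication post-dates the divergence from the other taxa."""
    tree = Phylo.read(StringIO(newick), "newick")
    species_tips = [tip for tip in tree.get_terminals()
                    if tip.name and tip.name.startswith(species_prefix)]
    ancestor = tree.common_ancestor(species_tips)
    return set(ancestor.get_terminals()) == set(species_tips)

print(is_species_specific_duplication("(Drosophila,(Fugu,(Human-1a,Human-1b)));"))  # True
print(is_species_specific_duplication("(Drosophila,(Human-1a,(Fugu,Human-1b)));"))  # False
```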
6. Concluding Remarks

Genes are duplicated together with their upstream and downstream regulatory elements. Yet the fates of duplicated genes can differ: one copy can be lost over time, or both copies can be retained but one copy can acquire a different function. To understand the mechanisms by which duplicated genes acquire modified functions, we need to analyse the corresponding regulatory regions. Our ability to pinpoint duplicated genes within newly sequenced genomes provides the starting material for the application of biocomputing protocols, such as comparative analysis, to examine the regulatory regions of duplicated genes.
References

1. S. L. Baldauf, Phylogeny for the faint of heart: a tutorial. Trends in Genetics 19, 345-351 (2003). 2. A. Christoffels, et al., Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Molecular Biology and Evolution 21(6), 1146-1151 (2004). 3. A. J. Enright, S. Van Dongen and C. A. Ouzounis, An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research 30(7), 1575-1584 (2002). 4. J. Felsenstein, Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401-410 (1978). 5. J. Felsenstein, Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39, 783-791 (1985). 6. S. Gribaldo and H. Philippe, Ancient phylogenetic relationships. Theor. Popul. Biol. 61, 391-408 (2002). 7. X. Gu and J. Zhang, A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14, 1106-1113 (1997). 8. X. Gu, Y. Wang and J. Gu, Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31, 205-209 (2002). 9. D. M. Hillis and J. J. Bull, An empirical testing of bootstrapping as a method for assessing confidence in phylogenetic analyses. Syst. Biol. 42, 182-192 (1993). 10. P. W. Holland, Vertebrate evolution: Something fishy about Hox genes. Curr. Biol. 7, R570-R572 (1997). 11. L. G. Lundin, D. Larhammar and F. Hallbook, Numerous groups of chromosomal regional paralogues strongly indicate two genome doublings at the root of vertebrates. J. Struct. Funct. Genomics 3, 53-63 (2003). 12. A. McLysaght, K. Hokamp and K. H. Wolfe, Extensive genomic duplication during early chordate evolution. Nat. Genet. 31, 200-204 (2002). 13. S. Ohno, Evolution by gene duplication. Springer, New York (1970). 14. N. Saitou and M. Nei, The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406-425 (1987). 15. J. Stajich, et al., The Bioperl Toolkit: Perl modules for the life sciences. Genome Res. 12, 1611-1618 (2002).
16. J. D. Thompson, D. G. Higgins and T. J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680 (1994).
17. K. Vandepoele, et al., Major events in the genome evolution of vertebrates: Paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc. Natl. Acad. Sci. USA 101, 1638-1643 (2004).
CHAPTER 7 CONSTRUCTING BIOLOGICAL NETWORKS OF PROTEIN-PROTEIN INTERACTIONS
See-Kiong Ng
Knowledge Discovery Department, Institute for Infocomm Research, Singapore
skng@i2r.a-star.edu.sg
Current efforts in genomics and proteomics have already helped discover many new genes and proteins, but simply knowing the existence of these biological molecules is inadequate for understanding the biological processes in which they participate. At the core of the biological circuits of the various biocomputing systems in the cell are networks of protein-protein interactions. It is thus essential to understand how various proteins interact with each other in the cell's bio-molecular networks to decipher the complex bio-information processing principles and mechanisms underlying the cellular processes. In this chapter, we provide an overview of the various computational detection methods for reconstructing biological networks of protein-protein interactions from various biological and experimental data.

1. Introduction

While the complete sequencing of the genomes of hundreds of organisms has revealed hundreds of thousands of new genes, it is the proteins that have become the bio-molecules of focus in the post-genome era. Proteins dominate the biological circuits underlying the various biocomputing systems in our cells for the continuous development, maintenance, regulation, and responsiveness of our body. A healthy body is a complex integration of biocomputing systems that perform various biological functions based on the continuous interplay of thousands of proteins acting together in just the right amounts and in just the right places. It is through the networks of interactions of these proteins that our genes—which merely provide the recipes for making the proteins—influence almost everything about us, including how tall we are, how efficiently we process foods, and how we
respond to infections and medications. To understand the complex biological information processing principles and mechanisms adopted by the living biocomputing systems in us, we must first unravel the networks of protein-protein interactions that form the core of the underlying biological circuit diagrams in our cells' biocomputing systems. In terms of nomenclature development, the term proteome was coined in 1994 as a linguistic equivalent to the concept of "genome" to describe the complete set of proteins expressed by the entire genome in a cell. A related term, proteomics, was also created—in the same vein as the term "genomics"—to refer to the study of the proteome using technologies for large-scale protein separation and identification. Along the same lines, the phrase functional proteomics is used to refer to the set of technologies that produce protein interaction data on a large scale, given that the biological function of a protein is not solely an individual property of the molecule itself but is defined in terms of its biochemical interactions with its partners in the cell. These developments in nomenclature reflect the recent metamorphosis of laboratory molecular biology from humble single-molecule investigations into grand whole-genome interrogations. The timely birth of bioinformatics further fueled the rapid progress of molecular biology into the grand scale: it created a brand new dimension for biological investigations, an in silico dimension. Together, these bio-technological revolutions have brought about many new possibilities that were only dreamed of in the pre-genomic days. The construction of networks of protein-protein interactions—necessary to reveal the biological circuits of life—is now possible due to the advent of large-scale, high-throughput functional proteomics and bioinformatics. In this chapter, we review several representative technological advances in bioinformatics for constructing biological networks of protein-protein interactions.

2. Bioinformatic Approaches

By enabling the detection of protein-protein interactions in a separate in silico dimension, bioinformatic methods complement the experimental approaches, especially since even the best and most popular experimental methods for detecting protein-protein interactions are not without their limitations1,2,3. Moreover, bioinformatic methods are easily amenable to whole-genome interrogations, making them particularly suitable for screening entire genomes for putative protein-protein interactions for constructing
networks of protein interactions. In fact, the study by von Mering and co-workers (2002)2 showed that in silico methods have higher coverage and higher accuracy than the majority of the experimental methods that they examined. Bioinformatic methods can therefore be expected to play a critical role in the construction of protein-protein interaction networks. We review here several popular bioinformatic methods for in silico construction of biological networks of protein-protein interactions.

2.1. Homology

Since interactions between proteins are essentially biophysical processes in which the shapes of the molecules play a major role, the proteins' structural information is the most informative source for predicting protein-protein interactions. From a computational perspective, the first question to ask is: How do we predict that two proteins interact solely based on their structures? As in many bioinformatic approaches, we apply the principle of homology—in this case, structural homology—to infer interaction between proteins: if protein A interacts with protein B, and two new proteins X and Y are structurally similar to proteins A and B respectively, then we may predict that protein X might also interact with protein Y. Unfortunately, there are currently still no high-throughput methods for determining protein structures efficiently. This technological limitation greatly hinders the use of structural homology for constructing networks of protein-protein interactions, as the structural data for most proteins are still unavailable. The next question to ask from the computational perspective is: How do we predict interactions between two proteins solely based on their sequences? Can we exploit the current abundance of sequence information and use conventional sequence homology to predict protein interactions? In other words, if protein A interacts with protein B, we infer that the orthologs of A and B in another species are also likely to interact. Matthews et al. (2001)4 investigated the effectiveness of this approach by conducting a study to evaluate the extent to which a protein interaction map experimentally generated in one species (S. cerevisiae) can be used to computationally predict interactions in another species (C. elegans) based on the interacting orthologs or "interologs" principle. They found that only 31% of the high-confidence interactions detected in S. cerevisiae were also detected in C. elegans. Although their study confirmed that some interactions are indeed conserved between organisms by sequence homology, the poor
coverage also indicated that sequence homology alone may not be sufficient for constructing complete protein interaction networks. New techniques that computationally exploit other biological knowledge must be developed to transcend conventional homology-based methods. In the rest of this section, we describe several non-homology approaches that utilize various genomic contexts of genes and proteins for predicting genome-wide networks of protein interactions.

2.2. Fusion events

The gene fusion or so-called "Rosetta Stone" method is based on the assumption that the functional interaction of proteins might lead to the fusion of the corresponding genes in some species. Due to selective pressure, many genes in complete genomes become fused through the course of evolution. Through fusion, the entropy of dissociation between the two protein products is reduced. For example, fusion of two genes may decrease the regulatory load in the cell, or allow metabolic channeling of the substrates. The fusion of two genes may thus provide evidence that the corresponding protein products of these genes share a functional association, if not a physical interaction5,6. Some well-known examples include the fusion of the tryptophan synthetase α and β subunits from bacteria to fungi7, and that of the TrpC and TrpF genes in E. coli and H. influenzae8. In fact, gene fusion events have been observed frequently in evolution5.
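To make the fusion-event idea concrete, the following is a minimal sketch (not the original Rosetta Stone implementation) of how candidate interactions could be inferred from precomputed homology hits; the data structures and example hits are hypothetical.

```python
# Sketch: infer candidate interactions from gene-fusion ("Rosetta Stone") evidence.
# Input: homology hits of query proteins (genome 1) against composite proteins
# (reference genomes), given as (composite_id, start, end) alignment regions.
# Two query proteins are linked if they hit the same composite protein on
# essentially non-overlapping regions, suggesting a fused gene elsewhere.

from itertools import combinations

def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def rosetta_stone_links(hits, max_overlap=20):
    """hits: dict protein -> list of (composite_id, start, end)."""
    links = set()
    for p1, p2 in combinations(hits, 2):
        for comp1, s1, e1 in hits[p1]:
            for comp2, s2, e2 in hits[p2]:
                if comp1 == comp2 and overlap((s1, e1), (s2, e2)) <= max_overlap:
                    links.add((p1, p2, comp1))
    return links

# Hypothetical example: proteins A and B both hit composite protein C_xyz,
# on its N- and C-terminal halves respectively -> predicted functional link.
hits = {
    "A": [("C_xyz", 1, 180)],
    "B": [("C_xyz", 210, 400)],
    "D": [("C_other", 5, 90)],
}
print(rosetta_stone_links(hits))   # {('A', 'B', 'C_xyz')}
```

In practice the hits would come from sequence searches against the reference genomes, and further filtering (e.g. on alignment significance) would be applied before calling a link.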
Fig. 1. Interaction between proteins A and B is predicted by a fusion event across two genomes.
The gene fusion method for inferring protein interactions was first proposed by Marcotte et al. (1999)5, who referred to it as the "Rosetta Stone" method (in reference to the Rosetta Stone, which allowed Champollion to make sense of hieroglyphs, i.e. "word fusions", by comparing them to Greek and Demotic, languages that use "unitary" unfused words). Figure 1 depicts the schematics of interaction prediction by fusion events. The method was thereafter successfully extended for large-scale analysis6,9.
For example, Tsoka and Ouzounis (2000)10 applied the gene fusion event method to predict the functional network of E. coli by analyzing its genome against a set of 22 genomes from diverse archaeal, bacterial and eukaryotic species. They found that metabolic enzymes tend to participate in gene fusion events. In fact, 76% of the detected pairs of enzymes involved in gene fusion events were also found to be subunits of enzymatic complexes, showing the remarkable ability of the fusion event method to detect actual physical interactions for metabolic enzymes. Given that protein interactions usually involve only small regions of the interacting molecules, it may be more effective to detect the fusion of functional subregions in interacting proteins to infer interactions. Protein domains—such as the DNA-binding domain and trans-activation domain in the yeast two-hybrid method—are evolutionarily conserved structural or functional subunits adequate for defining intermolecular interactions. As such, a similar approach based on protein domains was also proposed by Marcotte et al.5, detecting fused composite proteins in a reference genome with protein domains that correspond to individual full-length component proteins in other genomes. Using the proteins in the SWISS-PROT database11 annotated with domain information from ProDom12, Marcotte and co-workers detected ~7,842 so-called Rosetta Stone domain fusion links in yeast and ~750 high-confidence ones in E. coli, verifying that the domain fusion phenomenon is indeed widely observed and therefore suitable as a basis for predicting protein interactions. The results by other researchers6 further verified that the method can be applied to reconstruct protein interaction networks in eukaryotic genomes. However, the occurrence of shared domains in distinct proteins is debatable in prokaryotic organisms13, which may limit the use of the fusion method for constructing protein-protein interaction networks in the prokaryotes.
2.3. Co-localization
Another useful evolutionary genomic context that has been exploited for inferring networks of protein-protein interactions is the chromosomal co-localization of genes, as in the so-called gene neighborhood approach. The gene neighborhood method predicts the functional interaction of genes and proteins on the basis of the conservation of gene clusters across multiple genomes, as shown in Figure 2.
Examples of protein networks that are closely related to genomic co-localization can be found in the bacterial and archaeal genomes, where genes are known to be organized into regions called operons that code for functionally related proteins14. Dandekar et al. (1998)15 analyzed nine bacterial and archaeal genomes and found about 300 gene pairs with conserved gene neighborhoods. Of these detected gene pairs, 75% were also previously known to be physically interacting, illustrating the strong evolutionary correlation between genome order and physical interaction of the encoded proteins. Using this approach, Overbeek et al. (1999)16 successfully detected missing members of metabolic pathways in a number of prokaryotic species.
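A minimal sketch of the gene neighborhood idea is given below; it assumes gene orders are available as ordered lists of ortholog-family identifiers per genome (the identifiers and the conservation cutoff are hypothetical).

```python
# Sketch: predict functional links from conserved gene neighborhood.
# Each genome is an ordered list of ortholog-family IDs along the chromosome.
# A pair of families is linked if they occur within `window` genes of each
# other in at least `min_genomes` genomes.

from itertools import combinations
from collections import Counter

def neighbor_pairs(genome, window=1):
    pairs = set()
    for i, g in enumerate(genome):
        for h in genome[i + 1:i + 1 + window]:
            if g != h:
                pairs.add(tuple(sorted((g, h))))
    return pairs

def conserved_neighbors(genomes, window=1, min_genomes=3):
    counts = Counter()
    for genome in genomes:
        counts.update(neighbor_pairs(genome, window))
    return {pair for pair, n in counts.items() if n >= min_genomes}

# Hypothetical gene orders for four genomes.
genomes = [
    ["trpA", "trpB", "hisC", "recA", "gyrB"],
    ["gyrB", "trpA", "trpB", "ftsZ"],
    ["trpB", "trpA", "recA", "hisC"],
    ["ftsZ", "hisC", "trpA", "trpB"],
]
print(conserved_neighbors(genomes))   # {('trpA', 'trpB')}
```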
Fig. 2. Interaction between proteins A and B is predicted by conserved gene neighborhood across multiple genomes.
Thus far, the gene neighborhood method has worked well with bacteria because of the conserved gene organization in microbial genomes. Although some operon-like cluster structures have also been observed17 in the higher species, the correlation between genome order and biological function is generally less pronounced, as the co-regulation of genes is not often imposed at the genome structure level. The gene neighborhood method would therefore be inadequate for reconstructing protein interaction networks for the higher eukaryotic species. In contrast, the domain fusion method described previously is more applicable to higher eukaryotic genomes than to the lower prokaryotic organisms. This example reminds us that even computational methods for reconstructing networks of protein interactions may be hampered by method-specific limitations, just as is the case with experimental approaches.
2.4. Co-evolution

Another form of evolutionary context that can be used for inferring networks of interacting proteins is the co-evolution of interacting proteins, or the co-occurrence of orthologous genes in complete genomes. The underlying biological assumption here is that proteins that function together in a pathway or structural complex will have matching or similar "phylogenetic profiles", exhibiting a similar pattern of presence and absence across different genomes. Pellegrini et al. (1999)18 employed a binary occurrence pattern vector to encode the phylogenetic profile of each protein, using a '1' and a '0' to record respectively the presence and absence of the gene across a set of reference genomes. If two proteins have identical or highly similar phylogenetic profiles against the reference genomes, we can infer that they have co-evolved and are thus likely to be functionally linked or even interacting; see Figure 3.
Fig. 3. Interaction between proteins A and B is predicted by their phylogenetic profile similarity. (The original figure tabulates binary presence/absence profiles of proteins A, B and C across the reference species: proteins A and B share the same profile, while protein C does not.)
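The profile comparison itself is straightforward; the following is a minimal sketch with hypothetical profiles, scoring similarity as the fraction of reference genomes on which two profiles agree.

```python
# Sketch: phylogenetic profile comparison.
# Each profile is a binary tuple: presence (1) / absence (0) of an ortholog
# in each reference genome, in a fixed genome order.

def profile_similarity(p, q):
    """Fraction of reference genomes on which two profiles agree."""
    assert len(p) == len(q)
    return sum(a == b for a, b in zip(p, q)) / len(p)

# Hypothetical profiles over six reference genomes.
profiles = {
    "A": (1, 0, 1, 1, 0, 1),
    "B": (1, 0, 1, 1, 0, 1),
    "C": (0, 1, 1, 0, 1, 0),
}

threshold = 0.9   # similarity cutoff is an assumption for illustration
names = sorted(profiles)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        s = profile_similarity(profiles[x], profiles[y])
        if s >= threshold:
            print(f"{x} and {y}: similarity {s:.2f} -> predicted functional link")
```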
A limitation of the phylogenetic profile method is that it requires all the reference genomes to be completely sequenced to avoid false negative information. It is also unable to detect interactions between essential proteins, whose phylogenetic profiles are by definition indistinguishable across genomes. A more refined co-evolution-based computational method for inferring networks of protein interactions, called the "mirrortree" method19,20—or, more generically, the phylogenetic tree similarity method—can overcome these limitations. Instead of defining a protein's evolutionary context as simply the presence or absence of orthologs in different genomes, the phylogenetic tree similarity method defines similarity in evolutionary histories in terms of correlated residue changes between two proteins across different genomes. The reasoning is as follows: if a residue change incurred
in one protein disrupts its interaction with its partner, some compensatory residue changes must also occur in its interacting partner in order to sustain the interaction, or they will be selected against and eliminated. As a result, a pair of interacting proteins is expected to go through a similar series of changes in the course of evolution, whereas the residue changes for non-interacting proteins would be uncorrelated. This means that the phylogenetic trees for interacting proteins would be very similar due to their similar evolutionary histories. Instead of comparing the actual phylogenetic trees of two proteins, we can use "distance matrices" to approximate the proteins' evolutionary similarity. For each protein, we first search for its orthologs across different genomes to form as many orthologous pairs from coincident species as possible. We then pick n species that contain orthologs for both proteins and construct, for each protein, a distance matrix, which is an n × n matrix containing the pairwise distances between the n orthologous sequences chosen. The pairwise distance between two sequences is measured by their sequence alignment score, which could simply be the number of mismatches in the alignment13. (Note that the phylogenetic profiling method described earlier can actually be considered a simplification of the phylogenetic tree similarity method: in that case, the "distance matrix" for each protein is simply a binary vector indicating the presence or absence of the protein's ortholog in a particular species.) The proteins are predicted to be interacting if their distance matrices are found to have a high correlation coefficient value; see Figure 4 for an illustrated example. Goh and co-workers (2000)19 showed that the mirrortree method can be used to predict the interactions of chemokines with their receptors. Pazos and Valencia (2001)20 then showed that the method can be used to construct networks of protein interactions by successfully applying it to genome-wide prediction of protein-protein interactions in Escherichia coli. The co-evolution approach was later further exploited by Ramani and Marcotte (2003)21 to successfully pinpoint a family of ligands to its specific receptors, demonstrating the promising utility of the phylogenetic tree similarity method as a computational means for indicating physical interactions between proteins.
Fig. 4. Interaction between proteins A and B is predicted by their phylogenetic tree similarity. For each protein, we derive its distance matrix containing the pairwise distances between its orthologous sequences in the species R, S, T, X, Y, and Z. Proteins A and B are shown to have identical distance matrices here for illustration purposes.
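A minimal sketch of the distance-matrix comparison underlying the mirrortree approach is shown below; it assumes the pairwise distance matrices have already been computed for the same set of species, and simply correlates their upper-triangular entries (the matrices and threshold are hypothetical).

```python
# Sketch: "mirrortree"-style comparison of two distance matrices.
# Each matrix holds pairwise evolutionary distances between a protein's
# orthologs in the same n species; a high linear correlation between the
# two matrices suggests similar evolutionary histories.

import numpy as np

def upper_triangle(m):
    m = np.asarray(m, dtype=float)
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def tree_similarity(dist_a, dist_b):
    """Pearson correlation of the off-diagonal distance entries."""
    return float(np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1])

# Hypothetical 4-species distance matrices for proteins A and B.
dist_a = [[0, 2, 5, 9],
          [2, 0, 4, 8],
          [5, 4, 0, 6],
          [9, 8, 6, 0]]
dist_b = [[0, 3, 6, 10],
          [3, 0, 5, 9],
          [6, 5, 0, 7],
          [10, 9, 7, 0]]

r = tree_similarity(dist_a, dist_b)
print(f"correlation = {r:.3f}")      # close to 1 -> predicted interaction
if r > 0.8:                          # the cutoff is an assumption
    print("A and B predicted to interact")
```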
2.5. Literature mining

The bioinformatic approaches described so far have been based on genome sequences. There is another data source that can be mined for constructing protein interaction networks: the scientific literature. Although genetic and molecular sequence data are now routinely deposited by biologists in online computer-readable structured databases (e.g. GenBank22, Swiss-Prot11), molecular interaction information—especially for those interactions that have been studied in detail with traditional experimental methods (and not by error-prone high-throughput methods)—is still primarily reported in scientific journals in free-text formats. In fact, interaction data from traditional small-scale experiments are generally more reliable, because their biological relevance is often thoroughly investigated by the researchers and the published results are oftentimes based on repeated observations by multiple research groups. The scientific literature is therefore an important data warehouse that can be mined for assembling high-quality, well-annotated networks of protein interactions. Unlike the research literature in other disciplines, the biological literature is centrally organized, thanks to the commendable efforts of the National Library of Medicine (NLM), which maintains the MEDLINE database (www.ncbi.nlm.nih.gov/entrez).
Fig. 5. Constructing networks of protein-protein interactions through literature mining.
To date, more than 14 million abstracts of biological articles published since the 1950s are stored in the MEDLINE database and are publicly accessible via NLM's PubMed web interface. This central repository can be exploited to construct networks of annotated protein-protein interactions using literature mining; see Figure 5.
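As a concrete illustration of how such abstracts can be retrieved programmatically today, the following sketch uses Biopython's Bio.Entrez interface to NCBI's E-utilities; the query term, e-mail address and result limit are placeholders, and a real mining pipeline would process the returned text further.

```python
# Sketch: retrieve MEDLINE abstracts mentioning a query term via NCBI E-utilities.
from Bio import Entrez

Entrez.email = "your.name@example.org"   # NCBI asks for a contact address

# Search PubMed for abstracts matching a (hypothetical) query.
search = Entrez.esearch(db="pubmed",
                        term="interleukin-1 AND apoptosis",
                        retmax=5)
result = Entrez.read(search)
search.close()

pmids = result["IdList"]
print("PubMed IDs:", pmids)

# Fetch the corresponding abstracts as plain text for downstream text mining.
fetch = Entrez.efetch(db="pubmed", id=",".join(pmids),
                      rettype="abstract", retmode="text")
abstracts = fetch.read()
fetch.close()
print(abstracts[:500])
```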
Fig. 6. A complex interaction extracted by literature mining.
The various detection methods that we have described so far merely detect whether one protein interacts with another. The resulting interaction networks are therefore largely unannotated. In literature mining, an extracted interaction can usually be annotated with additional contextual information that can also be extracted from the text. Such contextual annotation can be very useful for scientists in their interrogation of the networks for understanding the underlying biological mechanisms. In addition to the
simple binary interactions detected by the previous methods, much more complex protein interactions can often be extracted from the reported literature; see the interaction in Figure 6 for an example. The interaction was extracted from three different sentences from Eizirik et al.23:

• "in the Nitric oxide independent pathway, converging signaling from the interleukin-1, IFN-gamma, and TNF-alpha receptors towards MAPK activation, combined with IL-1 mediated activation of caspase-1 and other genes still to be determined, leads to effector caspase activation and apoptosis."
• "IL-1, alone, or in combination with IFN-gamma and TNF-alpha leads to islet cell dysfunction and death."
• "Exposure of fluorescence activated cell sorting (FACS)-purified rat and mouse beta cells to interleukin-1beta (IL-1beta), in combination with IFN-gamma and/or TNF-alpha, leads to cell death by necrosis and predominantly by apoptosis."

While the sentences illustrate the richness of annotational information that can be extracted from the literature, they also reflect some of the computational challenges faced by a literature mining system, as shown in the linguistic complexity and diversity of the sentences describing the same interaction. As literature mining remains a computationally challenging task24, many of the current key protein and interaction databases, such as Swiss-Prot11, the Biomolecular Interaction Database (BIND)25, the Database of Interacting Proteins (DIP)26, and the Kyoto Encyclopedia of Genes and Genomes (KEGG)27, still require laborious hand-curation by expert biologists. Such a manual approach is clearly inadequate for keeping up with the sheer volume of research literature being generated. As a result, mining the biological literature for such information has become one of the central problems in post-genome bioinformatics, resulting in a growing body of work that addresses biological text mining28,29,30,31,32,33. As a recent example, Jenssen et al. (2001)34 used a predefined list of gene names and executed a Boolean search using PubMed to retrieve all the MEDLINE abstracts mentioning these bio-molecules. They then built a network with the genes as nodes and edges connecting genes that are mentioned in the same abstract; the edges were weighted by the number of co-occurrences. This resulted in a gene/protein interaction network containing about 140,000 interactions connecting 7,512 human genes, arguably the largest protein network predicted from literature mining so far.
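A bare-bones sketch of the co-occurrence idea behind such literature networks is given below; the gene list and abstracts are hypothetical, and a real system would add gene-name normalization and statistical weighting of the edges.

```python
# Sketch: build a gene co-occurrence network from abstracts.
# Nodes are genes; an edge's weight counts how many abstracts mention both genes.

from itertools import combinations
from collections import Counter

genes = ["IL-1", "IFN-gamma", "TNF-alpha", "caspase-1", "MAPK"]

abstracts = [
    "IL-1, alone, or in combination with IFN-gamma and TNF-alpha leads to islet cell death.",
    "IL-1 mediated activation of caspase-1 leads to effector caspase activation.",
    "Converging signaling from IL-1, IFN-gamma and TNF-alpha receptors towards MAPK activation.",
]

edges = Counter()
for text in abstracts:
    mentioned = [g for g in genes if g.lower() in text.lower()]
    for g1, g2 in combinations(sorted(set(mentioned)), 2):
        edges[(g1, g2)] += 1

for (g1, g2), weight in edges.most_common():
    print(f"{g1} -- {g2}  (co-occurrences: {weight})")
```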
DATABASE      URL                        REFERENCE
DIP           dip.doe-mbi.ucla.edu       Salwinski et al. (2004)39
BIND          www.blueprint.org/bind     Bader et al. (2003)25
MIPS          mips.gsf.de/proj/ppi       Mewes et al. (2004)40
PIMRider      pim.hybrigenics.fr         Rain et al. (2001)41
STRING        string.embl.de             von Mering et al. (2003)42
MINT          mint.bio.uniroma2.it       Zanzoni et al. (2002)43
BRITE         www.genome.ad.jp/brite     Kanehisa et al. (2002)27
Predictome    predictome.bu.edu          Mellor et al. (2002)44

Fig. 7. Example protein-protein interaction databases.
Other than Jenssen et al.'s co-occurrence approach, there is also an increased use of sophisticated computational linguistics approaches (e.g. using categorial grammars35) to improve the accuracy of protein interaction networks constructed from the literature, but the current state of the art is still inadequate for replacing manual expert curation. Moreover, current literature mining approaches tend to focus on extracting interaction relationships that are already reported in the literature and do not attempt to discover novel interactions. In principle, one could follow Swanson's methodology36,37,38 and use transitive relations as clues for yet-unknown relationships, but such predictive approaches have not yet been systematically explored by the literature mining community.

3. From Interactions to Networks

The primary building blocks of biological networks are the individual protein-protein interactions detected. With the advent of such large-scale detection methods as those described in this chapter, an increasing number of online interaction databases are now available for systematic interrogation (Figure 7). Theoretically, we could simply use the protein-protein interactions published in journals and the online interaction databases to construct the biological interaction network of our interest. We can build interaction graphs with the protein partners as the vertices and the interactions between them
as edges. Unfortunately, caveat emptor applies here: the interaction data cannot be used "as is" because the current data have been found to be highly error-prone2—both in terms of false negatives and false positives. As such, the construction of protein interaction networks is not merely a straightforward act of connecting all the dots (i.e., proteins) together with the detected interactions as links. The interaction data must be post-processed with close scrutiny before they can be turned into useful biological interaction networks. In this section, we describe various strategies for addressing these issues.

3.1. False negatives

False negative interactions are biological interactions that are missed by a detection method. False negatives could be due to experimental limitations such as incorrect protein folding induced by the artificial environment, or the lack of specific post-translational modifications required by interactions. Inherent limitations of experimental methods, such as the weakness of the popular yeast two-hybrid (Y2H) method in detecting interactions between cytoplasmic or membrane proteins, result in a high degree of false negatives in the experimental interaction data. For computational detection methods, false negatives could be introduced by a priori biases corresponding to the various biological hypotheses underlying the prediction algorithms. For example, the gene neighborhood method would incur a high degree of false negatives in reconstructing protein interaction networks for the higher eukaryotic species, where the correlation between genome order and biological function is less pronounced. In a recent study, von Mering et al. (2002)2 reported that of the 80,000 interactions between yeast proteins detected by various high-throughput methods (experimental and computational), only 2,400—a surprisingly small number—were found to be supported by more than one method, reflecting the generally high degree of false negatives in individual detection methods. For a specific example, consider the well-characterized septin complex formed by the interactions between the seven proteins CDC3, CDC10, CDC11, CDC12, SHS1, GIN4, and SPR28. Figure 8 shows that none of the high-throughput detection methods studied by von Mering and colleagues was able to detect all members of the protein complex. However, by combining the interactions detected by the different methods, it became possible to detect six out of the seven protein components of the complex. This shows that there are strong complementarities between the different
methods despite their individual biases in interaction coverage, suggesting that such complementarities amongst multiple detection methods could be exploited to tackle the problem of false negatives.
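The following sketch illustrates one simple way to exploit such complementarities: pooling interaction sets from several detection methods and recording, for each candidate interaction, how many methods support it. The interaction lists are hypothetical; only the septin gene names are taken from the example above.

```python
# Sketch: combine interaction evidence from multiple detection methods.
# Pooling the union increases coverage (fewer false negatives); the support
# count can later be used to rank or filter candidates.

from collections import Counter

def norm(pair):
    return tuple(sorted(pair))

evidence = {
    "Y2H":     [("CDC3", "CDC10"), ("CDC10", "CDC11"), ("CDC3", "YFR032C")],
    "TAP":     [("CDC3", "CDC10"), ("CDC11", "CDC12"), ("CDC12", "SHS1")],
    "HMS-PCI": [("CDC10", "CDC11"), ("CDC11", "CDC12"), ("GIN4", "CDC3")],
}

support = Counter()
for method, interactions in evidence.items():
    for pair in set(map(norm, interactions)):
        support[pair] += 1

for pair, n in support.most_common():
    print(f"{pair[0]} -- {pair[1]}: supported by {n} method(s)")
```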
Fig. 8. Protein-protein interactions in the septin complex. The shaded molecules are the actual protein components of the complex. The gray links show the interacting protein components based on Y2H-detected interactions, while the dotted and solid boxes show the complex components detected using two other detection methods (namely, TAP purification and HMS-PCI purification). None of the three detection methods was able to identify all the members of the septin complex on its own.
3.2. False positives

However, Figure 8 also shows that numerous non-septin protein components were mistakenly detected by the various methods, suggesting the presence of false positives. In fact, several recent surveys have revealed unacceptably high false positive rates in current high-throughput experimental protein interaction detection methods—even the reliability of the popular high-throughput yeast two-hybrid assay can be as low as 50%1,2,3. The inclusion of interaction data from multiple complementary detection methods as a strategy to tackle false negatives would also introduce more false positives into the data for constructing the protein interaction networks. As such, it is important to assess the interaction data to eliminate false positive links in protein interaction networks constructed from high-throughput data. One obvious strategy is to combine the results from multiple independent detection methods to derive highly reliable data, as suggested in von Mering et al.2. However, this approach is of limited applicability because of the low overlap2,45 between the different detection methods. Moreover, it will only worsen the problem of false negatives. Another, more viable approach is to model the expected characteristics of
true protein interaction networks, and then devise mathematical measures to assess the reliability of the candidate interactions. For example, Saito et al. have recently developed a series of computational measures called interaction generalities (IG)46,47 for assessing the reliability of protein-protein interactions.

Interaction Generality 1 (IG1). The IG1 measure46 is based on the idea that interacting proteins that appear to have many interacting partners with no further interactions are likely to be false positives. IG1 is defined as the number of proteins that directly interact with the target protein pair, subtracted by the number of those proteins interacting with more than one protein. This is a reasonable model for yeast two-hybrid data, as some proteins in yeast two-hybrid assays do have a tendency to turn on the positive signals of the assay by themselves. As a result, these proteins will be shown to interact with a large number of random proteins that do not interact with one another, as they are functionally unrelated. They can be detected by their high IG1 values.

Interaction Generality 2 (IG2). However, IG1 is only a local measure—it does not consider the topological properties of the protein interaction network beyond the candidate protein pair. As such, its coverage of the different types of experimental data errors is limited. Saito et al. then developed the IG2 measure47, which attempts to incorporate topological properties of interactions beyond the candidate interacting pairs. By considering the five possible topological relationships of a third protein C with a candidate interacting pair (A, B), IG2 is computed as a weighted sum of the five topological components with respect to C; the weights are assigned a priori by first performing a principal component analysis on the entire interaction network. As expected, the researchers showed that IG2 performed better than IG1.

An extension of such work is to model protein interaction networks based on global topological properties. Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have recently developed bioinformatic techniques based on sophisticated graphical or topological models to help us better understand or predict the behavior of protein-protein interaction networks (see Barabasi and Oltvai (2004)48 for a comprehensive review). One recent example was the use of the so-called "small-world" properties for protein interaction network construction by Goldberg and Roth (2003)49. The small-world network is a graph model proposed by Watts and Strogatz (1998)50 to capture the properties of most real networks, which are
typically highly clustered, like regular lattices, and yet have small characteristic path lengths, like random graphs. By ascertaining how well each detected protein-protein interaction fits the pattern of a small-world network, Goldberg and Roth were able to eliminate erroneous interactions (false positives) as well as infer missing ones (false negatives) using the small-world graphical model. Their promising results showed that network approaches at the systems biology level can be useful in alleviating the uncertain nature of experimentally and computationally derived protein interaction networks.
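To make the topology-based reliability idea more tangible, here is a minimal sketch of the IG1 measure following the verbal description given in Section 3.2 above; it is one reading of that description, not Saito et al.'s reference implementation, and the interaction data are hypothetical.

```python
# Sketch: IG1-style reliability score for a candidate interaction (A, B).
# Following the description above: count the proteins that directly interact
# with the target pair, then subtract those that interact with more than one
# protein, leaving the "dead-end" neighbours that suggest a spurious bait/prey.

from collections import defaultdict

def build_adjacency(interactions):
    adj = defaultdict(set)
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def ig1(adj, a, b):
    neighbours = (adj[a] | adj[b]) - {a, b}
    promiscuous = {p for p in neighbours if len(adj[p]) > 1}
    return len(neighbours) - len(promiscuous)

# Hypothetical Y2H data: A looks "sticky", pulling in many dead-end partners.
interactions = [("A", "B"), ("A", "P1"), ("A", "P2"), ("A", "P3"),
                ("B", "Q1"), ("Q1", "Q2")]
adj = build_adjacency(interactions)
print("IG1(A, B) =", ig1(adj, "A", "B"))   # high value -> less reliable
```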
4. Conclusion

The reconstruction of the networks of protein interactions marks the exciting beginning of an inroad towards understanding the fundamental biological processes in our cells. We have made rapid progress—our efforts in the genome era have unraveled for us the "parts list" of our biological selves. From the genomic sequences, we now know which genes and proteins exist in our cells. The bioinformatics efforts for post-genome functional proteomics described in this chapter will help connect the newly discovered genes and proteins in our parts list into an increasingly complete "circuit diagram", leading us closer to finally unraveling the biocomputing principles and mechanisms underlying the biological machineries operating in us. The next step is to use the unraveled biological circuit diagrams to study the dynamic properties of biocomputing systems in the cell by in silico simulation, and researchers have already begun to make some headway toward this end51,52,53,54,55. As we progress deeper into the realm of systems biology, the difference between the relative relevance of experimental and computational approaches will diminish as the scale and complexity of biological investigations escalate—a phenomenon we have already glimpsed in the current chapter, where both experimental and bioinformatic methods were employed for constructing networks of protein-protein interactions. The ultimate goal is to use the mapped interaction networks to predict the effect of the products of specific genes and genetic networks—as well as other biochemical molecules—on cell and tissue function, which will in turn lead us toward the discovery of new and better drugs for many diseases. As biology becomes increasingly in silico, bioinformatics is poised to play a key role in our tireless quest towards attaining a better quality of life for humankind.
References
1. P. Legrain, J. Wojcik, and J. M. Gauthier. Protein-protein interaction maps: a lead towards cellular functions. Trends Genet, 17(6):346-52, 2001.
2. C. von Mering, R. Krause, B. Snel, et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417(6887):399-403, 2002.
3. E. Sprinzak, S. Sattath, and H. Margalit. How reliable are experimental protein-protein interaction data? J Mol Biol, 327(5):919-23, 2003.
4. L. R. Matthews, P. Vaglio, J. Reboul, et al. Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res, 11(12):2120-6, 2001.
5. E. M. Marcotte, M. Pellegrini, H. L. Ng, et al. Detecting protein function and protein-protein interactions from genome sequences. Science, 285(5428):751-3, 1999.
6. A. J. Enright, I. Iliopoulos, N. C. Kyrpides, and C. A. Ouzounis. Protein interaction maps for complete genomes based on gene fusion events. Nature, 402(6757):86-90, 1999.
7. D. M. Burns, V. Horn, J. Paluh, and C. Yanofsky. Evolution of the tryptophan synthetase of fungi. Analysis of experimentally fused Escherichia coli tryptophan synthetase alpha and beta chains. J Biol Chem, 265(4):2060-9, 1990.
8. C. M. Ross, J. B. Kaplan, M. E. Winkler, and B. P. Nichols. An evolutionary comparison of Acinetobacter calcoaceticus trpF with trpF genes of several organisms. Mol Biol Evol, 7(1):74-81, 1990.
9. E. M. Marcotte, M. Pellegrini, M. J. Thompson, et al. A combined algorithm for genome-wide prediction of protein function. Nature, 402(6757):83-6, 1999.
10. S. Tsoka and C. A. Ouzounis. Prediction of protein interactions: metabolic enzymes are frequently involved in gene fusion. Nat Genet, 26(2):141-2, 2000.
11. B. Boeckmann, A. Bairoch, R. Apweiler, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res, 31(1):365-70, 2003.
12. F. Servant, C. Bru, S. Carrere, et al. ProDom: automated clustering of homologous domains. Brief Bioinform, 3(3):246-51, 2002.
13. A. Valencia and F. Pazos. Computational methods for the prediction of protein interactions. Curr Opin Struct Biol, 12(3):368-73, 2002.
14. T. Blumenthal. Gene clusters and polycistronic transcription in eukaryotes. Bioessays, 20(6):480-7, 1998.
15. T. Dandekar, B. Snel, M. Huynen, and P. Bork. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci, 23(9):324-8, 1998.
16. R. Overbeek, M. Fonstein, M. D'Souza, et al. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA, 96(6):2896-901, 1999.
17. Q. Wu and T. Maniatis. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell, 97(6):779-90, 1999.
18. M. Pellegrini, E. M. Marcotte, M. J. Thompson, et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA, 96(8):4285-8, 1999.
19. C. S. Goh, A. A. Bogan, M. Joachimiak, et al. Co-evolution of proteins with their interaction partners. J Mol Biol, 299(2):283-93, 2000.
20. F. Pazos and A. Valencia. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng, 14(9):609-14, 2001.
21. A. K. Ramani and E. M. Marcotte. Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol, 327(1):273-84, 2003.
22. D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, et al. GenBank. Nucleic Acids Res, 30(1):17-20, 2002.
23. D. L. Eizirik and T. Mandrup-Poulsen. A choice of death - the signal-transduction of immune-mediated beta-cell apoptosis. Diabetologia, 44(12):2115-33, 2001.
24. L. Hirschman, J. C. Park, J. Tsujii, et al. Accomplishments and challenges in literature data mining for biology. Bioinformatics, 18(12):1553-61, 2002.
25. G. D. Bader, D. Betel, and C. W. Hogue. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res, 31(1):248-50, 2003.
26. I. Xenarios, L. Salwinski, X. J. Duan, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res, 30(1):303-5, 2002.
27. M. Kanehisa, S. Goto, S. Kawashima, and A. Nakaya. The KEGG databases at GenomeNet. Nucleic Acids Res, 30(1):42-6, 2002.
28. C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia. Automatic extraction of biological information from scientific text: protein-protein interactions. Proc Int Conf Intell Syst Mol Biol, pages 60-7, 1999.
29. E. M. Marcotte, I. Xenarios, and D. Eisenberg. Mining literature for protein-protein interactions. Bioinformatics, 17(4):359-63, 2001.
30. S. K. Ng and M. Wong. Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Inform Ser Workshop Genome Inform, 10:104-112, 1999.
31. T. Ono, H. Hishigaki, A. Tanigami, and T. Takagi. Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics, 17(2):155-61, 2001.
32. T. C. Rindflesch, L. Tanabe, J. N. Weinstein, and L. Hunter. EDGAR: extraction of drugs, genes and relations from the biomedical literature. Pac Symp Biocomput, pages 517-28, 2000.
33. J. Thomas, D. Milward, C. Ouzounis, et al. Automatic extraction of protein interactions from scientific abstracts. Pac Symp Biocomput, pages 541-52, 2000.
34. T. K. Jenssen, A. Laegreid, J. Komorowski, and E. Hovig. A literature network of human genes for high-throughput analysis of gene expression. Nat Genet, 28(1):21-8, 2001.
35. J. C. Park, H. S. Kim, and J. J. Kim. Bidirectional incremental parsing for automatic pathway identification with combinatory categorial grammar. Pac Symp Biocomput, pages 396-407, 2001.
36. D. R. Swanson. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med, 30(1):7-18, 1986.
37. D. R. Swanson. Migraine and magnesium: eleven neglected connections. Perspect Biol Med, 31(4):526-57, 1988.
38. D. R. Swanson. Somatomedin C and arginine: implicit connections between mutually isolated literatures. Perspect Biol Med, 33(2):157-86, 1990.
39. L. Salwinski, C. S. Miller, A. J. Smith, et al. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res, 32 Database issue:D449-51, 2004.
40. H. W. Mewes, C. Amid, R. Arnold, et al. MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res, 32 Database issue:D41-4, 2004.
41. J. C. Rain, L. Selig, H. De Reuse, et al. The protein-protein interaction map of Helicobacter pylori. Nature, 409(6817):211-5, 2001.
42. C. von Mering, M. Huynen, D. Jaeggi, et al. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res, 31(1):258-61, 2003.
43. A. Zanzoni, L. Montecchi-Palazzi, M. Quondam, et al. MINT: a Molecular INTeraction database. FEBS Lett, 513(1):135-40, 2002.
44. J. C. Mellor, I. Yanai, K. H. Clodfelter, et al. Predictome: a database of putative functional links between proteins. Nucleic Acids Res, 30(1):306-9, 2002.
45. T. R. Hazbun and S. Fields. Networking proteins in yeast. Proc Natl Acad Sci USA, 98(8):4277-8, 2001.
46. R. Saito, H. Suzuki, and Y. Hayashizaki. Interaction generality, a measurement to assess the reliability of a protein-protein interaction. Nucleic Acids Res, 30(5):1163-8, 2002.
47. R. Saito, H. Suzuki, and Y. Hayashizaki. Construction of reliable protein-protein interaction networks with a new interaction generality measure. Bioinformatics, 19(6):756-63, 2003.
48. A. L. Barabasi and Z. N. Oltvai. Network biology: understanding the cell's functional organization. Nat Rev Genet, 5(2):101-13, 2004.
49. D. S. Goldberg and F. P. Roth. Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci USA, 100(8):4372-6, 2003.
50. D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-2, 1998.
51. K. Takahashi, K. Kaizu, B. Hu, and M. Tomita. A multi-algorithm, multi-timescale method for cell simulation. Bioinformatics, 20(4):538-46, 2004.
52. P. Dhar, T. C. Meng, S. Somani, et al. Cellware—a multi-algorithmic software for computational systems biology. Bioinformatics, 20(8):1319-21, 2004.
53. B. M. Slepchenko, J. C. Schaff, I. Macara, and L. M. Loew. Quantitative cell biology with the Virtual Cell. Trends Cell Biol, 13(11):570-6, 2003.
54. C. G. Moles, P. Mendes, and J. R. Banga. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res, 13(11):2467-74, 2003.
55. P. Mendes. Biochemistry by numbers: simulation of biochemical pathways with GEPASI 3. Trends Biochem Sci, 22(9):361-3, 1997.
CHAPTER 8 COMPUTATIONAL MODELLING OF GENE REGULATORY NETWORKS
Nikola K. Kasabov, Zeke S. H. Chan, Vishal Jain
Knowledge Engineering and Discovery Research Institute (KEDRI), Auckland University of Technology, New Zealand
{nkasabov, shchan, vishal.jain}@aut.ac.nz

Igor Sidorov and Dimiter S. Dimitrov
National Cancer Institute, National Institutes of Health, USA
{sidorovi, dimitrov}@ncifcrf.gov

Major cell functions depend on interactions between DNA, RNA and proteins. Directly or indirectly, such molecules in a cell interact with one another in either a positive or a repressive manner, and this interplay gives rise to genetic regulatory networks (GRN), which in turn control gene regulation. It is therefore hard to obtain accurate computational models through which the final state of a cell can be predicted; such models would help the scientific community to better understand the functioning of an organism. Here we summarize the biological behaviour of actual regulatory systems and make an attempt at GRN discovery for a large number of genes from multiple time-series gene expression observations taken over small and irregular time intervals. The method integrates a genetic algorithm (GA) to select a small number of genes and a Kalman filter to derive the GRN of these genes. After GRNs of smaller numbers of genes are obtained, these GRNs may be integrated in order to create the GRN of a larger group of genes of interest.

1. Introduction

A gene regulatory network comprises the systems controlling the fundamental mechanisms that govern biological systems. A single gene interacts with many other genes in the cell, inhibiting or promoting, directly or indirectly, the expression of some of them at the same time. Gene interaction may control whether and how vigorously a gene will produce RNA with the
help of a group of important proteins known as transcription factors. When these active transcription factors associate with the target gene sequence (DNA bases), they can function to specifically suppress or activate synthesis of the corresponding RNA. Each RNA transcript then functions as the template for synthesis of a specific protein. Thus the gene, transcription factors and other proteins may interact in a manner that is very important for the determination of cell function. Gene regulatory networks govern which genes are expressed in a cell at any given time, how much product is made from each one, and the cell's responses to diverse environmental cues and intracellular signals. Much less is known about the functioning of the regulatory systems of which the individual genes and interactions form a part12,15,23,26,31. Transcription factors provide a feedback pathway by which genes can regulate one another's expression, as mRNA and then as protein8. The discovery of gene regulatory networks (GRN) from time series of gene expression observations can be used to: (1) identify important genes in relation to a disease or a biological function; (2) gain an understanding of the dynamic interactions between genes; and (3) predict gene expression values at future time points.

Example 1. Let us assume the system includes only two genes, a1 and a2, coding for individual mRNAs, m1 and m2, from which proteins b1 and b2 are translated. A monomer of protein b1 forms the dimer d1. Protein b2 is a specific factor responsible for degrading mRNA m1; protein b1 inhibits the transcription of gene a2 and simultaneously activates transcription of its own gene a1; while the dimer d1 of protein b1 inhibits transcription from its own gene a1. Normally, gene a1 is inactive. Its primary activation depends on the outside signal Os. Gene a2 normally synthesizes mRNA with a certain nonzero activity. Both mRNA and proteins have limited life spans. Direct and indirect feedbacks are typically important. More realistic networks often feature multiple tiers of regulation, with first-tier gene products regulating the expression of another group of genes, and so on.

2. A Novel Approach

DNA gene expression microarrays allow biologists to study genome-wide patterns of gene expression in any given cell type, at any given time, and under any given set of conditions. In order to draw meaningful inferences from gene expression data, it is important that each gene be surveyed under a variety of conditions, preferably in the form of expression time series in response to perturbations. Such datasets may be analysed using a range of methods with increasing depth of inference.
Fig. 1. Formal description of a gene regulatory network, illustrating the biological processes to be considered in computational modelling of the gene network model.
Beginning with cluster analysis and determination of mutual information content, one can capture control processes shared among genes. A variety of clustering algorithms have been used to group together genes with similar temporal expression patterns7,11,14,29. A major problem is to infer an accurate model for the interactions between important genes in a cell. The main approaches that deal with the modelling of gene regulatory networks involve differential equations6,22, stochastic models24, evolving connectionist systems20,21, Boolean networks27,33, generalized logical equations32, threshold models30, Petri nets18, Bayesian networks16, and directed and undirected graphs. We have made a novel attempt to model the behaviour of gene regulatory networks using an integrated approach of a Kalman filter9,28 and a genetic algorithm2,17,19. The GA is used to select a small number of genes, and the Kalman filter method is used to derive the GRN of these genes. After GRNs of smaller numbers of genes are obtained, these GRNs may be integrated in order to create the GRN of a larger group of genes of interest. We applied the proposed method to extracts of the U937 human leukemic plus and minus series. Each series contains the time-series expression of 32 pre-selected candidate genes that have been found potentially relevant, as well as the expression of telomerase (the crucial gene). Both the plus series and the minus series contain four samples, recorded at the 0th, 6th, 24th and 48th hour. Discovering a GRN from these two series is challenging in two
aspects: first, both series are sampled at irregular time intervals; and second, the number of samples is scarce (only 4 samples). A third potential problem is that the search space grows exponentially in size as more candidate genes are identified in the future. Several GRNs of the 3 genes most related to telomerase are discovered, analysed and integrated. The results and their interpretation confirm the validity and the applicability of the proposed method. The integrated method can be easily generalized to extract GRNs from other time-series gene expression data.

3. Modelling Application with an Integrated Approach of First-Order Differential Equations, State-Space Representation and Kalman Filter

3.1. Discrete-Time Approximation of First-Order Differential Equations
Our GRN is modelled with the discrete-time approximation of first-order differential equations, given by:

    x_{t+1} = F x_t + e_t                                        (1)
where x_t = (x_1, x_2, ..., x_n)' is the gene expression at the t-th time interval, n is the number of genes modelled, e_t is a noise component with covariance Σ = cov(e_t), and F = (f_ij), i = 1, ..., n, j = 1, ..., n, is the transition matrix relating x_t to x_{t+1}. It is related to the continuous first-order differential equations dx/dt = Φx + e by F = τΦ + I and e_t = τe, where τ is the time interval (note that the subscript notation (t + k) is the common abbreviation for (t + kτ))13. We work here with a discrete approximation instead of a continuous model for ease of modelling and of processing the irregular time-course data (with the Kalman filter). Besides being a tool widely used for modelling biological processes, first-order differential equations offer two advantages. First, gene relations can be elucidated from the transition matrix F by choosing a threshold value ζ (1 > ζ > 0): if |f_ij| is larger than the threshold value ζ, x_{t,j} is assumed to have a significant influence on x_{t+1,i}; a positive value of f_ij indicates a positive influence, and vice versa. Second, the equations can be easily manipulated with the KF to handle irregularly sampled data, which allows parameter estimation, likelihood evaluation, and model simulation and prediction.
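The following sketch (with made-up numbers) shows how such a discrete-time model behaves and how gene relations could be read off the transition matrix F with a threshold ζ; it is an illustration of the equations above, not the authors' code.

```python
# Sketch: simulate x_{t+1} = F x_t + e_t and read off gene relations from F.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-gene transition matrix F (rows: target gene i, cols: source gene j).
F = np.array([[0.9,  0.0,  0.3,  0.0],
              [-0.4, 0.8,  0.0,  0.0],
              [0.0,  0.0,  0.7,  0.2],
              [0.0,  0.5,  0.0,  0.6]])
sigma = 0.05          # small noise level (covariance fixed to a small value)
zeta = 0.25           # threshold for calling an influence significant

# Simulate a short trajectory from an arbitrary initial expression vector.
x = np.array([1.0, 0.5, -0.2, 0.8])
for t in range(4):
    e = rng.normal(0.0, sigma, size=4)
    x = F @ x + e
    print(f"x_{t+1} =", np.round(x, 3))

# Threshold the transition matrix to extract putative regulatory links.
for i in range(4):
    for j in range(4):
        if i != j and abs(F[i, j]) > zeta:
            sign = "activates" if F[i, j] > 0 else "inhibits"
            print(f"gene {j} {sign} gene {i} (f_ij = {F[i, j]:+.2f})")
```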
The main drawback of using differential equations is that they require the estimation of n² parameters for the transition matrix F and n(n+1)/2 parameters for the noise covariance Σ. To minimize the number of model parameters, we estimate only F and fix Σ to a small value. Since both series contain only 4 samples, we avoid over-parameterization by setting n to 4, which is the maximum value of n before the number of parameters exceeds the number of training data (it matches the number of model parameters, the size of F being n² = 16, to the number of training data, n × 4 samples = 16). Since in our case study one of the n genes must be telomerase, we can search for a subset of K = 3 other genes to form a GRN. To handle irregularly sampled data, we employ the state-space methodology and the KF. We treat the true trajectories as a set of unobserved or hidden variables called the state variables, and then apply the KF to compute their optimal estimates based on the observed data. The state variables, which are regular and complete, can then be used to perform model functions like prediction and parameter estimation, instead of the observed data, which are irregular or incomplete. This approach is superior to interpolation methods, as it prevents false modelling by trusting a fixed set of interpolated points that may be erroneous.

3.2. State-Space Representation

To apply the state-space methodology, a model must be expressed in the following format, called the discrete-time state-space representation:
    x_{t+1} = Φ x_t + w_t                                        (2)
    y_t = A x_t + v_t                                            (3)
    cov(w_t) = Q,   cov(v_t) = R                                 (4)
where x_t is the system state, y_t is the observed data, Φ is the state transition matrix that relates x_t to x_{t+1}, A is the linear connection matrix that relates x_t to y_t, and w_t and v_t are uncorrelated white noise sequences whose covariance matrices are Q and R, respectively. The first equation, called the state equation, describes the dynamics of the state variables. The second equation, called the observation equation, relates the states to the observations.
To represent the discrete-time model in the state-space format, we simply substitute the discrete-time equation (1) into the state equation (2) by setting Φ = F, w_t = e_t and Q = Σ, and form a direct mapping between states and observations by setting A = I. The state transition matrix Φ (the functional equivalent of F) is the parameter of interest, as it relates the future response of the system to the present state and governs the dynamics of the entire system. The covariance matrices Q and R are of secondary interest and are fixed to small values to reduce the number of model parameters.

3.3. Kalman Filter

The KF9,28 is a set of recursive equations capable of computing optimal estimates (in the least-squares sense) of the past, present and future states of the state-space model based on the observed data. Here we use it to estimate gene expression trajectories given irregularly sampled data. To specify the operation of the Kalman filter, we define the conditional mean value of the state, x_t^s, and its covariance, P_{t,u}^s, as:
    x_t^s = E(x_t | y_1, y_2, ..., y_s)                                    (5)
    P_{t,u}^s = E[(x_t − x_t^s)(x_u − x_u^s)' | y_1, ..., y_s]             (6)
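The forward recursions referred to in the next paragraph follow the standard linear-Gaussian Kalman filter; a minimal, generic sketch is given below (textbook predict/update steps, not the authors' implementation; the model matrices and observations are hypothetical, and missing observations, i.e. irregular sampling, are handled by skipping the update step).

```python
# Sketch: generic Kalman filter forward recursion for x_{t+1} = Phi x_t + w_t,
# y_t = A x_t + v_t, with cov(w_t) = Q and cov(v_t) = R.  Observations given
# as None are treated as missing: only the predict step runs at those times.
import numpy as np

def kalman_forward(Phi, A, Q, R, x0, P0, ys):
    x, P = x0, P0
    estimates = []
    for y in ys:
        # Predict.
        x = Phi @ x
        P = Phi @ P @ Phi.T + Q
        # Update (only if an observation is available at this time step).
        if y is not None:
            S = A @ P @ A.T + R                    # innovation covariance
            K = P @ A.T @ np.linalg.inv(S)         # Kalman gain
            x = x + K @ (y - A @ x)
            P = P - K @ A @ P
        estimates.append(x.copy())
    return estimates

# Hypothetical 2-gene example with a missing sample at the second time step.
Phi = np.array([[0.9, 0.2], [-0.3, 0.8]])
A = np.eye(2)
Q = 0.01 * np.eye(2)
R = 0.05 * np.eye(2)
ys = [np.array([1.0, 0.4]), None, np.array([0.7, 0.1])]
for t, est in enumerate(kalman_forward(Phi, A, Q, R, np.zeros(2), np.eye(2), ys), 1):
    print(f"x_{t} estimate:", np.round(est, 3))
```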
For prediction, we use the KF forward recursions to compute the state estimates for s < t. For likelihood evaluation and parameter estimation, we use the KF backward recursions to compute the so-called smoothed estimates based on the entire data, i.e. s = T, where T ≥ t is the index of the last observation; these are in turn used to compute the required statistics.

3.4. Using a GA for the Selection of a Gene Subset for a GRN

The task is to search for the genes that form the most probable GRN models, using the model likelihood computed by the KF as an objective function. Given N, the number of candidates, and K, the size of the subset, the number of different gene combinations is N!/(K!(N−K)!). In our case study, N = 32 is small enough for an exhaustive search. However, as more candidates are identified in the future, the search space grows exponentially
in size, and exhaustive search will soon become infeasible. For this reason a method based on a GA is proposed. The strength of the GA is two-fold: (1) unlike most classical gradient methods or greedy algorithms that search along a single hill-climbing path, a GA searches with multiple points and generates new points by applying genetic operators that are stochastic in nature; these properties allow the search to escape local optima in a multi-modal environment, making the GA useful for optimizing high-dimensional functions and noisy functions whose search space contains many local optima; and (2) a GA is more effective than a random search method, as it focuses its search on the promising regions of the problem space.

3.5. GA Design for Gene Subset Selection

In the GA-based method for gene subset selection proposed here, each solution is coded as a binary string of N bits. A 1 in the ith bit position denotes that the ith gene is selected, and a 0 otherwise. Each solution must have exactly K 1s, and a repair operator is included to add or delete 1s when this is violated. The genetic operators used for crossover, mutation and selection are, respectively, the standard crossover, the binary mutation and the (μ, λ) selection operators. Since there are two series (the plus and the minus series of time-course gene expression observations) in our case study, a new fitness function is designed to incorporate the model likelihood in both series. For each solution, the rankings of its model likelihood in the plus series and in the minus series are obtained and then summed to obtain a joint fitness ranking. This favors convergence towards solutions that are consistently good in both the plus and the minus series. The approach is applicable to multiple time-series data.

3.6. Procedure of the GA-Based Method for Gene Subset Selection

(1) Population initialization. Create a population of μ random individuals (genes from the initial gene set, e.g. the 32 candidates) as the first-generation parents.
(2) Reproduction. The goal of reproduction is to create λ offspring from μ parents. This process involves three steps: crossover, mutation and repair.
    2.1 Crossover. The crossover operator transfers parental traits to the offspring. We use the uniform crossover that samples the value of each bit position from the first parent at the crossover probability pc and from
the second parent otherwise. In general, the performance of a GA is not sensitive to the crossover probability, and it is set to a large value in the range [0.5, 0.9]3,4. Here we set it to 0.7.
2.2 Mutation. The mutation operator induces diversity in the population by injecting new genetic material into the offspring. For each bit position of the offspring, mutation inverts the value with a small mutation probability p_m. The performance of a GA is very sensitive to the mutation probability, and it usually takes a very small value to avoid disrupting convergence. Here we use p_m = 1/N, which has been shown to be both the lower-bound value and the optimal value for many test functions1,3,4,25, providing an average of one mutation in every offspring.
2.3 Repair. The function of the repair operator is to ensure that each offspring solution has exactly K "1"s to represent the indices of the K selected genes in the subset. If the number of "1"s is greater than K, invert a "1" chosen at random, and vice versa. Repeat the process until the number of "1"s matches the subset size K.
(3) Fitness Evaluation. Here the lambda offspring individuals (solutions) are evaluated for their fitness. For each offspring solution, we obtain the model likelihood in both the plus and the minus series and compute their rankings (the lower the rank, the higher the likelihood) within the population. Next, we sum the rankings and use the negated sum as the fitness estimate, so that the lower the joint ranking, the higher the fitness.
(4) Selection. The selection operator determines which offspring or parents become the next-generation parents based on their fitness. We use the (mu, lambda) scheme that selects the fittest mu of the lambda offspring to be the next-generation parents. It is worth comparing this scheme to another popular selection scheme, (mu + lambda)2, which selects the fittest mu of the joint pool of mu parents and lambda offspring to be the next-generation parents; because the best-fitness individuals found are always maintained in the population, convergence is faster. We use the (mu, lambda) scheme because it offers a slower but more diversified search that is less likely to be trapped in local optima.
(5) Test for termination. Stop the procedure if the maximum number of generations is reached. Otherwise go back to step (2), the reproduction phase.

Upon completion, the GA returns the highest likelihood GRNs found in both the plus and the minus series of gene expression observations.
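The sketch below illustrates one (mu, lambda) generation with the operators described above: uniform crossover with p_c = 0.7, bit-flip mutation with p_m = 1/N, and the repair step that restores exactly K ones. It is a minimal illustration, not the authors' code; in particular, the fitness function is a placeholder for the KF-based joint likelihood ranking, and all names are assumed for the example.

```python
import random

def repair(bits, K):
    """Force the chromosome to contain exactly K ones (K selected genes)."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    zeros = [i for i, b in enumerate(bits) if b == 0]
    while len(ones) > K:                      # too many genes: drop one at random
        i = ones.pop(random.randrange(len(ones)))
        bits[i] = 0
        zeros.append(i)
    while len(ones) < K:                      # too few genes: add one at random
        i = zeros.pop(random.randrange(len(zeros)))
        bits[i] = 1
        ones.append(i)
    return bits

def make_offspring(p1, p2, K, pc=0.7, pm=None):
    """Uniform crossover, bit-flip mutation and repair for one offspring."""
    N = len(p1)
    pm = pm if pm is not None else 1.0 / N    # average of one mutation per offspring
    child = [p1[i] if random.random() < pc else p2[i] for i in range(N)]
    child = [1 - b if random.random() < pm else b for b in child]
    return repair(child, K)

def next_generation(parents, K, lam, fitness):
    """(mu, lambda) selection: keep the best mu of lam offspring as new parents."""
    mu = len(parents)
    offspring = [make_offspring(*random.sample(parents, 2), K) for _ in range(lam)]
    offspring.sort(key=fitness, reverse=True)  # higher fitness first
    return offspring[:mu]
```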
The proposed method includes running the GA-based procedure over many iterations (e.g. 50), thus obtaining different GRNs that include possibly different genes. Then we summarize the significance of the genes based on their frequency of occurrence in these GRNs and, if necessary, we put together all these GRNs, thus creating a global GRN over the whole gene set.

4. Experiments and Results

The integrated GA-KF method introduced above is applied to identify genes that regulate telomerase in a GRN from a set of 32 pre-selected genes. Since the search space is small (only C(32, 3) = 4960 combinations), we apply exhaustive search as well as the GA for validation and comparative analysis. The experimental settings are as follows. Expression values of each gene in the plus and minus series are jointly normalized to the interval [-1, 1]. The purpose of the joint normalization is to preserve the information on the difference in mean between the two series. For each subset of n genes defined by the GA, we apply the KF for parameter estimation and likelihood evaluation of the GRN model. Each GRN is trained for at least 50 epochs (which is usually sufficient) until the likelihood value increases by less than 0.1. During training, the model is tested for stability by computing the eigenvalues of (Phi - I)5,13. If the real part of any eigenvalue is positive, the model is unstable and is abandoned. For the experiments reported in this paper, a relatively low resource setting is used. The parent and offspring population sizes (mu, lambda) are set to (20, 40) and the maximum number of generations is set to 50. These values are empirically found to yield consistent results over different runs. We run the GA 20 times from different initial populations to obtain the cumulated results. The results are interpreted from the list of the 50 most probable GRNs found in each series (we can lower this number to narrow down the shortlist of significant genes). The frequencies with which each gene appears in the highest likelihood GRNs in the plus and in the minus series are recorded. Next, a joint frequency is calculated by summing the two frequencies. The genes that have a high joint frequency are considered to be significant in both the minus and the plus series. For exhaustive search, we simply run through all combinations of 3 genes plus the telomerase gene, then evolve through the KF a GRN for each combination and record the likelihood of each model in both the plus and minus series. A scoring system similar to the GA's fitness function is employed. We obtain a joint ranking by summing the model likelihood rankings in the
plus series and the minus series, and then count the frequency of the genes that belong to the best 50 GRNs in the joint ranking. Table 1 shows the top ten highest scoring genes obtained by GA and by exhaustive search.

Table 1. The top ten highest scoring genes found by GA and by exhaustive search, with their frequencies of occurrence (for the GA) and GenBank accession numbers.
Rank | GA: gene index | GA: (freq. in Minus GRNs, freq. in Plus GRNs) | GA: GenBank accession | Exhaustive search: gene index | Exhaustive search: GenBank accession
1  | 27 | (179, 185) | X59871        | 20 | M98833
2  | 21 | (261, 0)   | U15655        | 27 | X59871
3  | 12 | (146, 48)  | J04101        | 32 | X79067
4  | 32 | (64, 118)  | X79067        | 12 | J04101
5  | 20 | (0, 159)   | M98833        | 6  | AL021154
6  | 22 | (118, 24)  | U25435        | 29 | X66867
7  | 11 | (0, 126)   | HG3523-HT4899 | 5  | D50692
8  | 5  | (111, 0)   | D50692        | 22 | U25435
9  | 18 | (0, 105)   | D89667        | 10 | HG3521-HT3715
10 | 6  | (75, 0)    | AL021154      | 13 | J04102
The results obtained by GA and by exhaustive search are strikingly similar. In both lists, seven out of the top ten genes are common (genes 27, 12, 32, 20, 22, 5, 6) and four out of the top five genes are the same (genes 27, 12, 32 and 20). The similarity of the results supports the applicability of the GA-based method to this search problem, in particular when the search space is too large for an exhaustive search. An outstanding gene identified is gene 27, TCF-1. The biological implications of TCF-1 and other high scoring genes are currently under investigation. The GRN dynamics can also be visualized with a network diagram using the influence information extracted from the state transition matrix. As an example, we examine one of the discovered GRNs of genes (33, 8, 27, 21) for both the plus and minus series, shown in Figure 2 and Figure 3 respectively. The network diagram shows only the components of Phi whose absolute values are above the threshold value of 0.3. For the plus series, the network diagram in Figure 2(a) shows that gene 27 has the most significant role, regulating all other genes (note that gene 27 has all its arrows out-going). The network simulation, shown in Figure 2(b), fits the true observations well and the predicted values appear stable, suggesting that the model is accurate and robust. For the minus series, the
network diagram in Figure 3(a) shows a different network from that of the plus series. The role of gene 27 is not as prominent. The relationships between genes are no longer causal but interdependent, with genes 27, 33 and 21 simultaneously affecting each other. The difference between the plus and minus models is expected. Again, the network simulation result shown in Figure 3(b) shows that the model fits the data well and the prediction appears reasonable.
Fig. 2. The identified best GRN of gene 33 (telomerase) and genes 8, 27 and 21 for the plus series: (a) The network diagram (b) The network simulation and gene expression prediction over future time. Solid markers represent observations.
Fig. 3. The identified best GRN of gene 33 (telomerase) and genes 8, 27 and 21 for the minus series: (a) The network diagram (b) The network simulation and gene expression prediction over future time. Solid markers represent observations.
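The thresholding of the state transition matrix described above can be made concrete with a small sketch. It assumes (as an illustrative convention, not stated in the text) that Phi[i, j] weights the influence of gene j at time t on gene i at time t+1; all names and the example matrix are placeholders.

```python
import numpy as np

def grn_edges(Phi, gene_names, threshold=0.3):
    """List regulatory edges implied by the state transition matrix Phi.

    Only entries with |Phi[i, j]| above the threshold are kept; the sign is
    reported as activation (+) or inhibition (-). Self-influences (diagonal
    entries) are skipped here purely for readability of the output."""
    edges = []
    n = Phi.shape[0]
    for i in range(n):
        for j in range(n):
            w = Phi[i, j]
            if i != j and abs(w) > threshold:
                edges.append((gene_names[j], gene_names[i],
                              '+' if w > 0 else '-', round(float(w), 2)))
    return edges

# Illustrative 3-gene example:
# Phi = np.array([[0.10, 0.50, 0.00],
#                 [-0.40, 0.20, 0.10],
#                 [0.00, 0.35, 0.10]])
# print(grn_edges(Phi, ['g1', 'g2', 'g3']))
```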
4.1. Building a Global GRN of the Whole Gene Set Out of the GRNs of Smaller Numbers of Genes (Putting the Pieces of the Puzzle Together)

After many GRNs of smaller numbers of genes are discovered, each involving different genes (with a different frequency of occurrence), these GRNs can
be put together to create a GRN of the whole gene set. Figure 4 illustrates this on the top five (fittest) GRNs from our experiment.
Fig. 4. The five highest likelihood GRN models found by GA in the plus series are put together.
5. Conclusions

Gene regulatory networks (GRNs) are central to understanding and manipulating cells and are critical for many biological tasks. Here we propose a novel method that integrates the Kalman Filter and a Genetic Algorithm for the discovery of GRNs from gene expression observations comprising several time series (two in this case study), each with a small number of observations. As a case study we have applied the method to the discovery of a GRN of genes that regulate telomerase in two sub-clones of the human leukemic cell line U937. The time series contain 12,625 genes, each sampled 4 times at irregular time intervals, but only 32 genes of interest are dealt with in our experiment. The method is designed to deal effectively with irregular and scarce data collected from a large number of variables (genes). GRNs are modelled as discrete-time approximations of first-order differential equations, and the Kalman Filter is applied to estimate the true gene trajectories from the irregular observations and to evaluate the likelihood of the GRN models. After several runs of the GA-based method, the genes that occur with the highest frequency in the optimal GRNs are considered significant. The obtained GRNs may be put together to form a global GRN of the whole gene set. This approach reduces the size of the GRN to be modelled from the number of candidate genes to the size of the smaller subsets, thus reducing the dependence on the amount of data. The biological implications of the identified networks are complex and currently under investigation.
References
1. T. Baeck, Optimal Mutation Rates in Genetic Search. The Fifth International Conference on Genetic Algorithms, San Mateo, CA, Morgan Kaufmann Publishers, (1993).
2. T. Baeck, Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, and genetic algorithms. New York, Oxford University Press, (1995).
3. T. Baeck, et al., Evolutionary Computation II. Advanced algorithms and operators. Bristol, Institute of Physics Publishing, (2000).
4. T. Baeck, et al., Evolutionary Computation I. Basic algorithms and operators. Bristol, Institute of Physics Publishing, (2000).
5. J. S. Bay, Fundamentals of Linear State Space Systems. WCB/McGraw-Hill, (1999).
6. O. E. Belova, et al., Computer system for investigation and integrated description of molecular-genetic system regulation of interferon induction and action. CABIOS 11, 213-218 (1995).
7. A. Ben-Dor, R. Shamir and Z. Yakhini, Clustering gene expression patterns. J. Comp. Biol. 6(3-4), 281-297 (1999).
8. J. M. Bower and H. Bolouri, Computational Modelling of Genetic and Biochemical Networks. London, The MIT Press, (2001).
9. R. G. Brown, Introduction to Random Signal Analysis and Kalman Filtering. John Wiley and Sons, (1983).
10. M. J. Brownstein, J. M. Trent, and M. S. Boguski, Functional genomics. In M. Patterson and M. Handel, eds., Trends Guide to Bioinformatics, 27-29 (1998).
11. R. J. Cho, et al., A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell 2, 65-73 (1998).
12. J. Collado-Vides, A transformational-grammar approach to study the regulation of gene expression. J. Theor. Biol. 136, 403-425 (1989).
13. R. Dorf and R. H. Bishop, Modern Control Systems. Prentice Hall, (1998).
14. M. B. Eisen, P. T. Spellman, P. O. Brown and D. Botstein, Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868 (1998).
15. S. Fields, Y. Kohara and D. J. Lockhart, Functional genomics. Proc. Natl. Acad. Sci. USA 96, 8825-8826 (1999).
16. N. Friedman, M. Linial, I. Nachman and D. Pe'er, Using Bayesian networks to analyze expression data. J. Comp. Biol. 7, 601-620 (2000).
17. D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA, Addison-Wesley, (1989).
18. R. Hofestadt, et al., Interactive modelling and simulation of biochemical networks. Comput. Biol. Med. 25, 321-334 (1995).
19. J. H. Holland, Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI, (1975).
20. N. Kasabov and D. Dimitrov, A method for gene regulatory network modelling with the use of evolving connectionist systems. ICONIP'2002 - International Conference on Neuro-Information Processing, Singapore, IEEE Press, (2002).
21. N. Kasabov, et al., Evolutionary computation for parameter optimisation of on-line evolving connectionist systems for prediction of time series with changing dynamics. Int. Joint Conf. on Neural Networks IJCNN'2003, USA, (2003).
22. V. A. Likhoshvai, et al., A generalized chemical kinetic method for simulating complex biological systems. A computer model of lambda phage ontogenesis. Computational Technol. 5(2), 87-89 (2000).
23. W. F. Loomis and P. W. Sternberg, Genetic networks. Science 269, 649 (1995).
24. H. H. McAdams and A. Arkin, Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA 94, 814-819 (1997).
25. H. Muhlenbein, How genetic algorithms really work: I. Mutation and hillclimbing. Parallel Problem Solving from Nature 2, B. Manderick, ed. Amsterdam, Elsevier, (1992).
26. B. O. Palsson, What lies beyond bioinformatics? Nat. Biotechnology 15, 3-4 (1997).
27. L. Sanchez, J. van Helden and D. Thieffry, Establishment of the dorso-ventral pattern during embryonic development of Drosophila melanogaster. A logical analysis. J. Theor. Biol. 189, 377-389 (1997).
28. R. H. Shumway, Applied Statistical Time Series Analysis. Englewood Cliffs, New Jersey, Prentice Hall, (1998).
29. P. Spellman, et al., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273-3297 (1998).
30. R. N. Tchuraev, A new method for the analysis of the dynamics of the molecular genetic control systems. I. Description of the method of generalized threshold models. J. Theor. Biol. 151, 71-87 (1991).
31. D. Thieffry, From global expression data to gene networks. BioEssays 21(11), 895-899 (1999).
32. D. Thieffry, Dynamical behaviour of biological regulatory networks-II. Immunity control in bacteriophage lambda. Bull. Math. Biol. 57, 277-297 (1995).
33. M. Wahde, et al., Assessing the significance of consistently mis-regulated genes in cancer associated gene expression matrices. Bioinformatics 18(3), 389-394 (2002).
CHAPTER 9 OVERVIEW OF TEXT-MINING IN LIFE-SCIENCES
Kanagasabai Rajaraman, Li Zuo, Vidhu Choudhary, Zhuo Zhang, Vladimir B Bajic
Knowledge Extraction Lab, Institute for Infocomm Research, Singapore
{kanagasa, lizuo, vidhu, zzhang, bajicv}@i2r.a-star.edu.sg

Hong Pan
Genome Institute of Singapore, Singapore
[email protected]

Tiow-Suan Sim
Department of Microbiology, National University of Singapore, Singapore
[email protected]

Sanjay Swarup
Department of Biological Sciences, National University of Singapore, Singapore
[email protected]

One of the emerging technologies in life-sciences is text-mining. This technology enables automated extraction of pieces of relevant biological knowledge from a large volume of scientific documents. We present a short overview of text-mining and give hints about how it can efficiently support biologists and medical scientists in inferring the function of biological entities and save them a lot of time, paving the way for new hypotheses to be formulated and validated by more focused and detailed follow-up research.

1. Introduction

Text-mining of biomedical literature has received increasing attention in the past several years1,7,8,10,12-15,18,29. It is seen as an interesting and powerful supporting technology to complement research in life sciences. A number of text-mining systems, which tackle problems of genomics,
proteomics, or relations of genes and proteins with diseases, etc., have been reported2-6,9,11,14,16,23,27,30-33. Several reasons have contributed to this increased interest in text-mining: a) the huge volume of scientific documents available over the internet to an average user; b) the inability of an average user to keep track of all relevant documents in a specific domain of interest; c) the inability of humans to keep track of associations usually contained in, or implied by, scientific texts; these associations could be either explicitly stated, such as 'interaction of A and B', or they need not be explicitly spelled out in a single sentence; d) the inability of humans to simultaneously deal with a large volume of terms and their cross-referencing; e) the necessity to search a number of different documents (and frequently different resources) to extract a set of relevant information; f) the inability of a single user to acquire the required information in a relatively short (acceptable) time.

Just for illustration, the PubMed repository currently contains 14 million indexed documents34. It is common that searches of PubMed return several thousand documents for a single query. Studying these large document sets and organizing their content meaningfully is not an easy task for a single user. If the analyses have to be done multiple times with different selections of documents, then such a task is usually not feasible.

In spite of these practical problems, one can ask what new information text-mining can provide for life-scientists. Secondly, is it possible, by using text-mining, to infer a new biological function of the examined biological entities? To answer the first question, we make an analogy with bioinformatics. Bioinformatics produces answers which are by nature putative. These answers are hypotheses which require additional experimental verification. Text-mining systems are no different. They analyze the existing knowledge and suggest associations which, in all cases, require either experimental validation or confirmation from the existing literature. The answer to the second question is that the associations between biological entities need not necessarily be spelled out explicitly, and automated systems, which can analyze and classify entities based on statistical significance or some other set of rules, may reveal connections which suggest particular functions of the examined entities.
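To illustrate the scale of the problem, the number of abstracts matching a query can be checked programmatically. The sketch below uses the NCBI E-utilities esearch service as one possible way to do this; the query term is only an example, and the exact response fields are assumed to follow the documented JSON output format.

```python
import json
import urllib.parse
import urllib.request

def pubmed_count(query):
    """Return the number of PubMed records matching a query via NCBI E-utilities."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": query,
        "retmax": 0,        # only the total count is needed, not the record IDs
        "retmode": "json",
    })
    with urllib.request.urlopen(f"{base}?{params}") as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])

# Example (an illustrative query on transcription factor interactions):
# print(pubmed_count("transcription factor AND interaction"))
```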
In this chapter, we present a short overview of text-mining, describe some of the general features that these systems should provide to end users in life-sciences, and comment on several systems developed for life-sciences applications.

2. Overview of Text-Mining

Text-mining, also known as Knowledge Discovery from Text, refers to the process of extracting interesting and non-trivial patterns or pieces of knowledge from non-structured text documents10. Text-mining technologies can be differentiated based on the level of analysis performed on the text. The most basic of them are the document organization technologies that perform mining at the document/abstract level. Examples are text categorization and text clustering. Text categorization refers to the classification of texts into predefined categories, whereas text clustering is the automatic grouping of documents when no pre-defined categories are available. At the next level are the Knowledge Extraction tools. These tools perform tasks such as entity recognition, key term extraction, relationship extraction and event extraction. At the most advanced level are content-mining technologies that perform 'text understanding'. Text understanding is a challenging problem and research in this field is still in its formative stages. Natural Language Processing19 (NLP) is a key technology used at the low level by many present-day text-mining tools. Many 'document organization' and 'knowledge extraction' tools use shallow parsing techniques, while text understanding often requires full parsing coupled with many other NLP techniques. Some technologies, such as text-visualization, enable knowledge discovery by involving humans in the discovery loop, and they are used at all three levels. It should be mentioned that often a text-mining system may feature several tasks that span two or more levels.

3. Scope and Nature of Text-Mining in the Life-Sciences Domain

By automated knowledge extraction we understand the automatic extraction of names of entities, such as genes, proteins, metabolites, enzymes, pathways, diseases, drugs, etc., which appear in biomedical literature, as well as the relationships between these entities. A basic relation between two entities is characterized by the co-occurrence of their names in the same document, or in a specific segment of the document. However, the actual relation between these entities is not easy to characterize unambiguously by a computer program. It is thus customary to leave it to users to assess
the actual nature of such relations based on the associated documents. To the best of our knowledge, very few text-mining systems exist which can accurately extract such relations, and this problem remains one of the key challenges to be solved.

3.1. Characteristics of Text-Mining Systems

There are several basic features that text-mining systems should provide. These systems should:
a) be easy to use;
b) be interactive;
c) allow multiple methods of data submission;
d) allow the user to select broad areas of classification for the analyses;
e) provide suitable interactive summary reports;
f) show association maps in suitable graphical and interactive formats;
g) preferably have built-in intelligence to filter out irrelevant documents;
h) preferably be able to extract a large volume of useful information.

While, in principle, any free-text document could be analyzed, we will assume that documents are abstracts of scientific articles, such as those contained in the PubMed34 repository. Then, generally speaking, there are three levels at which text analysis can be conducted: the 'Abstract level', the 'Sentence level', and the 'Relation level'. At the 'Abstract level', systems analyze the whole abstract aiming to determine whether it contains relations between the biological entities of interest or not; at the 'Sentence level', systems assess whether the analyzed abstract contains sentences which explicitly claim relations between the entities or not; here, the individual sentences are analyzed as a whole; finally, at the 'Relation level', systems attempt to extract the specific entities and the relations they are subjected to from the sentences which are assessed to contain such relations. A sketch of a simple 'Abstract level' co-occurrence analysis is given below.
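The following sketch counts how often pairs of entity names from a user-supplied dictionary appear in the same abstract, as a simple illustration of 'Abstract level' co-occurrence analysis. It is not a description of any particular system; the entity lists and documents in the usage example are placeholders, and real systems would normalize terms through curated synonym lists and ontologies rather than plain substring matching.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_counts(abstracts, entity_names):
    """Count, for each pair of entity names, the abstracts mentioning both.

    abstracts: list of abstract strings; entity_names: list of names to match.
    Matching here is plain case-insensitive substring search, which is only a
    rough stand-in for dictionary- and ontology-based entity recognition."""
    pair_counts = Counter()
    for text in abstracts:
        lower = text.lower()
        present = [name for name in entity_names if name.lower() in lower]
        for a, b in combinations(sorted(set(present)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Example with placeholder data:
# docs = ["TCF-1 regulates telomerase expression in T cells ...",
#         "Telomerase activity and TP53 status in leukemia ..."]
# print(cooccurrence_counts(docs, ["TCF-1", "telomerase", "TP53"]))
```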
3.2. Systems Aimed at Life-sciences Applications

Biological literature often contains terms having many variations, which may affect the accuracy of text-mining. To alleviate this problem, many text-mining systems aimed at the biomedical field make use of controlled vocabularies and ontologies20,28, such as MeSH (http://www.nlm.nih.gov/mesh/), UMLS (http://www.nlm.nih.gov/research/umls/) and the gene ontology13 (GO). Such systems predominantly use PubMed abstracts as the data source. A few systems attempt to analyze the full collection (e.g. PubGene16), but many make use of only a sub-collection of documents. This sub-collection is usually generated by querying PubMed. Since a PubMed query can sometimes return a huge number of documents, a few systems perform document organization as a first step before further analysis, e.g. Bio MetaCluster (http://vivisimo.com/projects/BioMed) and the Medical Knowledge Explorer5 (MeKE). However, analysis of only the abstracts of scientific documents, such as those from PubMed collections, cannot provide complete information. The most critical issue is the coverage that text-mining systems achieve by relying only on abstracts of scientific reports. Thus, a number of systems attempt to deal with full-text documents when they are available, such as Textpresso21. In addition to all the problems faced in the analysis of abstracts, the analysis of full-text documents relies on the public availability of these documents in electronic form, which is a serious problem.

A number of text-mining systems for the biomedical domain have been proposed for knowledge extraction. Key-term extraction is performed by a1) XplorMed25,27 for focused PubMed search, b1) AbXtract2 for automatic annotation of functional characteristics in protein families, and c1) the High-density Array Pattern Interpreter20 (HAPI), GoMiner35, the Expression Analysis Systematic Explorer14 (EASE), and TXTGate11 to facilitate biological interpretation of commonalities in gene groups. Extraction of term associations is done by a2) PubGene16 for gene pathway mapping, b2) Dragon TF Association Miner23 for mining transcription factor interaction networks, c2) Dragon Metabolome Explorer (http://research.i2r.a-star.edu.sg/DRAG0N/ME/) for exploration of metabolic subsystems in plants and other species and their associations with GO categories, d2) Dragon Explorer of Bacterial Genomes (DEBG) for bacterial cross-species comparison, e2) Dragon Disease Explorer (http://research.i2r.a-star.edu.sg/DDE/) for exploration of associations of diseases, GO and eVOC17 categories and transcription factors, f2) PubMatrix3 for analyzing combinatorial datasets as found with multiple experimental systems, and g2) MedGene15 for mining disease-gene associations. MedMiner33 performs both key-term extraction and association mining to help discover biological functions of genes. Sometimes the association between two biological entities may
be implicit. Such associations are mined by a3) Arrowsmith30 and Manjal (http://geordi.info-science.uiowa.edu/Manjal.html), for making new discoveries by searching links between two literature sets within Medline, and b3) Genes2Diseases26 for disease gene discovery. In contrast to term associations, the actual relations between the terms are extracted by a4) SUISEKI4 and the system of Ono et al.22 for discovering protein-protein interactions, and b4) Dragon TF Relation Extractor24 (DTFRE) for discovery of actual transcription factor relations. Many of these systems at best use only shallow parsing techniques. Deeper text analysis systems, though they seem promising, are still in their infancy due to technical limitations. However, developments in advanced NLP19 will help speed up progress in the future.

4. Conclusions

Text-mining is a useful technology which can support research in life-sciences and make it easier to infer functions of examined entities. The strength of this approach is its comprehensiveness, speed and ability to present sometimes unexpected associations of categories and terms based on the analysis of large sets of documents, which is not feasible for a single user. However, this is also a weakness, since very few text-mining systems have built-in intelligence to automatically determine the relevance of extracted pieces of information from the document context. The accuracy of such intelligent blocks is currently not sufficiently high, which requires that users analyze carefully the results obtained. However, it is reasonable to expect that developments in advanced NLP will make a crucial contribution to this field in the future.

References
1. M. A. Andrade and P. Bork. Automated extraction of information in molecular biology. FEBS Lett, 476(1-2):12-17, Jun 2000.
2. M. A. Andrade and A. Valencia. Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families. Bioinformatics, 14(7):600-607, 1998.
3. K. G. Becker, et al. PubMatrix: a tool for multiplex literature mining. BMC Bioinformatics, 4(1):61, Dec 2003.
4. C. Blaschke and A. Valencia. The frame-based module of the Suiseki information extraction system. IEEE Intelligent Systems, 17:14-20, 2002.
5. J.-H. Chiang and H.-C. Yu. MeKE: discovering the functions of gene products from biomedical literature via sentence alignment. Bioinformatics, 19(11):1417-1422, Jul 2003.
6. J.-H. Chiang, H.-C. Yu, and H.-J. Hsu. GIS: a biomedical text-mining system for gene information discovery. Bioinformatics, 20(1):120-121, Jan 2004.
7. B. de Bruijn and J. Martin. Getting to the (c)ore of knowledge: mining biomedical literature. Int J Med Inform, 67(1-3):7-18, Dec 2002.
8. S. Dickman. Tough mining: the challenges of searching the scientific literature. PLoS Biol, 1(2):E48, Nov 2003.
9. I. Donaldson, et al. PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics, 4(1):11, Mar 2003.
10. R. Feldman. Tutorial notes: Mining unstructured data. In ACM Conference on Information and Knowledge Management (CIKM '01), Atlanta, Georgia, 2001.
11. P. Glenisson, et al. TXTGate: profiling gene groups with text-based information. Genome Biol, 5(6):R43, 2004.
12. L. Grivell. Mining the bibliome: searching for a needle in a haystack? New computing tools are needed to effectively scan the growing amount of scientific literature for useful information. EMBO Rep, 3(3):200-203, Mar 2002.
13. M. A. Harris, et al., Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res, 32:D258-61, 2004.
14. D. A. Hosack, et al. Identifying biological themes within lists of genes with EASE. Genome Biol, 4(10):R70, 2003.
15. Y. H. Hu, et al. Analysis of genomic and proteomic data using advanced literature mining. J Proteome Res, 2(4):405-412, Jul 2003.
16. T. K. Jenssen, A. Laegreid, J. Komorowski, and E. Hovig. A literature network of human genes for high-throughput analysis of gene expression. Nat Genet, 28(1):21-28, May 2001.
17. J. Kelso, et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res, 13(6A):1222-1230, Jun 2003.
18. R. Mack and M. Hehenberger. Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discov Today, 7(11 Suppl):S89-98, Jun 2002.
19. C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, 1999.
20. D. R. Masys, et al. Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics, 17(4):319-326, Apr 2001.
21. H. M. Muller, E. E. Kenny, and P. W. Sternberg. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol, 2(11):E309, Sep 2004.
22. T. Ono, H. Hishigaki, A. Tanigami, and T. Takagi. Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics, 17(2):155-161, Feb 2001.
23. H. Pan, et al. Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining. Nucleic Acids Res, 32(Web Server issue):230-234, Jul 2004.
24. H. Pan, et al. Extracting information for meaningful function inference through text-mining. In Discovering Biomolecular Mechanisms with Computational Biology (F. Eisenhaber, Ed.), Landes Bioscience, accepted, 2004.
25. C. Perez-Iratxeta, P. Bork, and M. A. Andrade. XplorMed: a tool for exploring MEDLINE abstracts. Trends Biochem Sci, 26(9):573-575, Sep 2001.
26. C. Perez-Iratxeta, P. Bork, and M. A. Andrade. Association of genes to genetically inherited diseases using data mining. Nat Genet, 31(3):316-319, Jul 2002.
27. C. Perez-Iratxeta, A. J. Perez, P. Bork, and M. A. Andrade. Update on XplorMed: a web server for exploring scientific literature. Nucleic Acids Res, 31(13):3866-3868, Jul 2003.
28. S. Schulze-Kremer. Ontologies for molecular biology and bioinformatics. In Silico Biol, 2(3):179-193, 2002.
29. H. Shatkay and R. Feldman. Mining the biomedical literature in the genomic era: an overview. J Comput Biol, 10(6):821-855, 2003.
30. N. R. Smalheiser and D. R. Swanson. Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses. Comput Methods Programs Biomed, 57(3):149-153, Nov 1998.
31. P. Srinivasan. MeSHmap: a text mining tool for MEDLINE. Proc AMIA Symp, pages 642-646, 2001.
32. D. R. Swanson and N. R. Smalheiser. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence, 91:183-203, 1997.
33. L. Tanabe, et al. MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. Biotechniques, 27(6):1210-1214, Dec 1999.
34. D. L. Wheeler, et al. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res, 32 Database issue:D35-40, Jan 2004.
35. B. R. Zeeberg, et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol, 4(4):R28, 2003.
CHAPTER 10 INTEGRATED PROGNOSTIC PROFILES: COMBINING CLINICAL AND GENE EXPRESSION INFORMATION THROUGH EVOLVING CONNECTIONIST APPROACH
Nikola K. Kasabov
Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, New Zealand and Pacific Edge Biotechnology Ltd, Dunedin, New Zealand
[email protected]

Liang Goh
Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, New Zealand
[email protected]

Mike Sullivan
Christchurch Hospital, Christchurch, New Zealand; Christchurch School of Medicine, University of Otago, New Zealand; and Pacific Edge Biotechnology Ltd, Dunedin, New Zealand
[email protected]

In the field of bio-computing, the problem of selecting significant gene markers for disease profiling and generating models from the markers has the potential to improve treatment of disease and health management. This chapter presents a novel approach to the classification of cancer using an evolving connectionist-based system that can merge highly complex gene expression data and discrete clinical features into single prognostic profiles, resulting in better prognostic accuracy. The approach is applied to two well-known gene expression data sets: the analysis and classification of diffuse large B cell lymphoma by Shipp et al. 200214, and the clinical outcome analysis of breast cancer by Van Veer et al. 200215. The classification accuracy of the combined strategy, where microarray gene expression data is integrated with clinical data, is
higher by up to 15% when compared with using clinical or gene expression data separately. The method also suggests that there are different genetic profiles for different clusters of clinical information, supporting observations that the broad spectrum of clinical diagnosis and prognosis can be further refined with genetic information.
1. Introduction

Genomics data allow for genetic profiling of diseases, cancer being one of the most widely investigated. Marker genes, such as BRCA1 and BRCA2, have been used in microarray analysis to distinguish cancerous from normal tissues, fatal from survived cases, and good from bad prognosis4,15. Clinical data, sometimes published together with genomic data, has been used as supplementary information to the class output information, but is often not treated as part of the input data used to derive a classification model14,15. Clinical data itself has, however, traditionally been used as a first step towards diagnosis and prognosis of a disease; if integrated with genomic data, it can provide information about unique characteristics of the patient, thus paving the way for personalized treatment and medicine. It is a foreseeable goal (given the explosion of genomic data and the archive of clinical data that has been collected in medical institutes and hospitals) that integration of these data is inevitable in the future, as it has the potential to provide comprehensive and more precise information about the disease with respect to each individual. The key challenge then is the development of information methods to integrate the data and to discover knowledge about the relationship between genotype and phenotype. In pharmacogenomics, research is being conducted on the application of human genetic data to drug development so as to personalize drugs for individuals. The idea is to identify, with a simple and cheap genetic test, the people who can benefit from a drug and to exclude those individuals who might benefit more from another drug. Different classes of drugs work differently for people, due to their different genetic makeup. Some recent figures (see for example: http://millennium-debate.org/ind8decO34.htm) show that drug efficiency in oncology is 25% and that the majority of drugs work in only 25-60% of people. Evidence shows that a substantial portion of the variability in drug response is genetically determined, with age, sex, nutrition, and environmental exposure being important contributory factors10. The need to focus on effective treatment for individuals characterized by their distinct genetic and clinical profiles is the next challenge in health care management.
The reliance on gene expression profiles alone will inevitably lead to inaccuracy in clinical prediction. Likewise, basing clinical diagnosis and prognosis on clinical data alone may be too coarse and not cost effective. We believe that an integrated approach combining different types of genetic and medical data can refine diagnosis and prognosis of diseases and treatment for individuals. Some methodologies and preliminary results on integrating gene data and clinical data in a single prognostic or classification medical decision support system have already been developed and published1,2,5,6,11. This paper extends some ideas and preliminary publications1,2,6 on gene and clinical data integration, and demonstrates that integrated data not only improve model performance, but also facilitate the discovery of hidden relationships between genotype and phenotype information. The resulting integrated profiles can be used for drug targeting and personalized treatment design. The evolving connectionist-based method consists of the following steps: gene expression (genotype) and clinical (phenotype) data integration, common feature set selection, model generation and validation, and extraction of integrated profiles of gene expression and clinical data. The evolving connectionist-based approach is applicable more generally to the discovery of relationships between a broad spectrum of complex genotype and phenotype information.

The clinical course of any given malignancy will not depend on its gene expression signature alone. Other independent disease- and patient-related factors, such as the tumor genotype, its location and size, and the age, sex and genotype of the patient, are almost certainly involved in predicting the response to therapy or the ultimate disease outcome. In order to influence patient management, any clinical decision support system must have a high level of confidence and accuracy in the classification of the disease process and in the classification of patients.

2. Methods

The method for genotype and phenotype information integration, modeling and relationship discovery proposed here consists of the following steps:
(1) Genotype and phenotype data integration;
(2) Common feature set selection;
(3) Model generation and validation;
(4) Integrated profiles extraction.
2.1. Data Sets

The cancer gene expression data sets used in the experiments here are public domain data. The first set is the lymphoma data set from Shipp et al.14 for the problem of predicting survival outcome. There are 7129 gene expression variables with 56 samples. The clinical data is represented by the IPI (International Prognostic Index), which is an aggregate of several phenotype measurements. The second set is the breast cancer prognosis data15 for the classification of good and poor prognosis. The breast cancer set has 78 samples (vectors): 44 samples with good prognosis (remaining free of disease after the initial diagnosis for an interval of at least 5 years) and 34 with poor prognosis. There were about 25,000 gene variables and 7 clinical variables in the data set.

2.2. Data Integration

Data integration may be performed at different levels: (1) integration of data from different sources prior to model creation (early integration)5; (2) integration of results obtained from separate models, each working on one source of data1,2,7,11. The second approach takes the route of an ensemble of experts, which has been extensively studied. As we are interested in exploring and discovering relationships between different sources of data, in particular gene expression and clinical data, the first approach of data integration prior to modeling was adopted here.

2.3. Common Feature Set Selection

Feature selection is the process of choosing the most appropriate features when creating a computational model of a process8,12,13. Most feature selection methods are applied across a single data set (e.g. gene expression data). In Van Veer's approach, for example15, the correlation of genes with the disease outcome was evaluated and the genes were then ranked. With an increment of 5 genes for each classification model, a set of 70 genes was identified and 83% classification accuracy was achieved. In Shipp's approach, the signal-to-noise ratio (SNR) method was used to identify 30 genes14, with 75% accuracy achieved. The set of gene variables was then used for building a classification model not related to any clinical variables, despite their availability.
When both gene expression (genotype) and clinical (phenotype) information are available for the same classification problem, there are two main approaches to feature selection and modeling, as indicated previously:
(1) Gene data and clinical data are integrated together; combined feature vectors are extracted from the integrated data through generation and testing of a classification model.
(2) Gene variables (features) are selected from the gene expression data and clinical variables are selected from the clinical data separately; two classification models are developed based on the separate sets of features and the results from the two classification models are combined1,2,3,7,11.
The former approach is used in this paper and in the experiments shown later on the lymphoma and the breast cancer data sets. Feature selection, model generation and validation have been integrated into a novel feature selection method called 'Integrated Feature Selection'.

2.4. Algorithm of Integrated Feature Selection

Here a method for integrated feature selection is proposed based on the idea of first grouping (binning) co-regulated or correlated genes using Pearson correlation. A significant gene is then selected from each bin, where the genes are weighted according to some weighting metric; in this instance, the weighting metric is a signal-to-noise-ratio ranking algorithm14. The significance of each gene is assessed based on its contribution to the classification rate. If it improves the classification rate, the gene is added to the list; otherwise, it is discarded and the next gene is considered. The method greatly reduces the dimension of the gene variables while maintaining a list of bins with correlated genes for further functional genomics analysis. It allows for 'gene-swapping' within the bins to select the most significant gene with biological relevance. The integrated feature selection method can be applied to gene data only or to integrated gene and clinical data, and the most important variables from the two variable sets (genotype and phenotype) will form a common feature vector for generating and testing a model and for rule extraction later on. The algorithm is as follows:

H1. Set a threshold for the binning correlation, e.g. P_threshold = 0.6. For each iteration i = 1, ..., n (where n is the number of variables in the data set) calculate the correlation coefficient r_{i,j} of x_i with respect to x_j, j = i, ..., n.
H2. Select those variables for which r_{i,j} is greater than P_threshold, where x_j belongs to S_i, and S_i is the set that contains the correlated variables for variable x_i at iteration i.
H3. Select the first-ranked variable to represent this group (bin) of correlated variables. This removes redundant variables that are similar to each other and chooses only one variable to represent the set S_i. The variables that are in a set of correlated variables but are not selected to represent this set will not be used in the next step of binning, thus significantly reducing the size of the overall correlation matrix and producing a reduced data set D for further processing.
H4. Perform SNR on D and rank the variables according to their SNR value to obtain a list of ranked variables A. Set a threshold for variable cut-off, V_threshold, and a classification threshold C_threshold, e.g. V_threshold = 100, C_threshold = 1 (100% classification rate). These set the limits at which the validation process will stop, based on the maximum number of SNR-ranked variables that are investigated, or on a target classification accuracy (e.g. 100%). An increment step is defined to increase the number of selected variables for the creation and validation of a classification model, e.g. V_step = 1.
H5. Create a classification model, starting with the highest-ranked SNR variables in A, i.e. V_k in A, V_k = {L, rank(SNR(D), k)}, k = 1, ..., m, where m is the number of variables in D, and L is the list of successful variables, which is empty at the beginning. Validate the classification model with the leave-one-out method. Add the variable to L if it improves the classification rate. Continue the process while the average classification rate is less than C_threshold and the number of variables is less than V_threshold.
H6. If the defined average classification rate is achieved (e.g. 100%) or the number-of-variables threshold is reached, stop the process. The set of variables in L will be the set that has the highest classification rate amongst the iterations.
Using the above approach of reducing the data by removing highly correlated variables, we are able to obtain a reasonable matrix of correlation coefficients for the remaining variables. Selecting the set of variables incrementally based on the classification rate makes it possible to select a minimum set of variables that can classify the entire data set, since every variable in the list L is selected only if it increases the classification rate.
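The steps H1-H6 can be summarized in the sketch below. It is illustrative only: the leave-one-out accuracy routine (loo_accuracy) and the parameter values are placeholders for those used by the authors, and the binning loop is a straightforward approximation of H1-H3 rather than the exact implementation.

```python
import numpy as np

def snr(x, y):
    """Signal-to-noise ratio of one variable for a two-class problem (y in {0, 1})."""
    a, b = x[y == 0], x[y == 1]
    return abs(a.mean() - b.mean()) / (a.std() + b.std() + 1e-12)

def integrated_feature_selection(X, y, loo_accuracy, p_thr=0.6, v_thr=100, c_thr=1.0):
    """Correlation binning + SNR ranking + incremental wrapper selection (H1-H6)."""
    n = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    # H1-H3: bin correlated variables and keep one representative per bin.
    remaining, reps = list(range(n)), []
    while remaining:
        i = remaining.pop(0)
        bin_members = [i] + [j for j in remaining if corr[i, j] > p_thr]
        best = max(bin_members, key=lambda j: snr(X[:, j], y))   # bin representative
        reps.append(best)
        remaining = [j for j in remaining if j not in bin_members]
    # H4: rank the representatives by SNR.
    ranked = sorted(reps, key=lambda j: snr(X[:, j], y), reverse=True)
    # H5-H6: add a variable only if it improves leave-one-out accuracy.
    selected, best_acc = [], 0.0
    for j in ranked[:v_thr]:
        acc = loo_accuracy(X[:, selected + [j]], y)
        if acc > best_acc:
            selected, best_acc = selected + [j], acc
        if best_acc >= c_thr:
            break
    return selected, best_acc
```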
2.5. Experimental Design
Integrated feature selection was performed on the integrated data sets of gene expression and clinical data. A minimal set of genes and clinical variables was extracted and used to build the evolving connectionist model. Here we have used the Evolving Classifying Function (ECF)8 for class modeling. ECF is a simplified version of the Evolving Fuzzy Neural Network (EFuNN)9. It allows for supervised training from data and rule (profile) extraction. A profile represents a cluster of data samples that belong to the same class. It also contains information on the radius of each cluster. An example of a profile is shown below.

Lymphoma Profile 6:
if Clinical Variable IPI is (low: 0.95) and
   Gene Variable SM15 is (low: 0.60) and
   Gene Variable P190-B is (low: 0.64) and
   Gene Variable CES2 is (high: 0.95) and
   Gene Variable SELE is (high: 0.60) and
   Gene Variable P120E4F is (high: 0.54)
then Class is Cured
Radius = 0.354936, 14 samples in cluster

The integrated profiles or rules were extracted from the model and analyzed.

3. Results

The integrated profiles extracted from the trained ECF classification models on the entire integrated gene and clinical data set, for the feature sets selected above on the two case study data sets, are visualized in Figures 1 and 2. Light color indicates a low normalized value of the variable (gene or clinical) and dark indicates a high normalized value. There are two charts in each figure, showing the two classes: the top chart for class 1 (i.e. lymphoma, cured) and the bottom chart for class 2 (i.e. lymphoma, fatal). For breast cancer (Figure 2), the top chart shows good prognosis (class 1) while the bottom chart shows poor prognosis (class 2). The numbers on the right of the charts show the number of samples associated with each cluster, represented as a rule node (a profile or rule) in ECF.
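For inspection, a profile such as the one above can be held in a simple record. The sketch below is only one possible representation, not the ECF implementation itself; the distance rule and the mapping of the low/high membership values onto a normalized scale in the usage comment are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """One extracted rule node: variable centres, predicted class, radius, support."""
    centres: dict      # variable name -> centre value in the normalized space
    klass: str         # e.g. "Cured" or "Fatal"
    radius: float
    n_samples: int

    def covers(self, sample):
        """True if the sample (same variable names) lies within the radius.

        Distance here is a length-normalized Euclidean distance over the
        profile's variables; this is an illustrative choice, not necessarily
        the one used by the ECF model."""
        d = sum((sample[v] - c) ** 2 for v, c in self.centres.items()) ** 0.5
        return d / max(len(self.centres), 1) ** 0.5 <= self.radius

# Illustrative values loosely following Profile 6 (low/high mapped onto a -1/+1
# scale and weighted by the stated membership degree, purely for the example):
# p6 = Profile({"IPI": -0.95, "SM15": -0.60, "P190-B": -0.64,
#               "CES2": 0.95, "SELE": 0.60, "P120E4F": 0.54},
#              klass="Cured", radius=0.355, n_samples=14)
```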
Fig. 1. Integrated profiles or rules composed of 5 genes and the clinical variable IPI for lymphoma using the ECF classification model. Values of gene expression and clinical data range from low normalized values (light) to high normalized values (dark). There are 11 rules for cured (a) and 10 for fatal (b). The # of samples indicates the number of samples represented by each profile. The highlighted profiles 6 and 18 are the profiles with the largest number of samples in each class; together these 2 profiles represent 35% of the sample population. Some of the profiles contain a single sample; these are outliers.
3.1. Classification Accuracy Test and Profile Verification

For the lymphoma data, the ECF classifier was optimized along with the feature selection procedure on the integrated gene and IPI data sets. The features and the ECF parameters are optimized based on the classification accuracy estimated through the leave-one-out test method. The accuracy of the evolved ECF model is 85.7%. For the final rule set extraction, the ECF model is trained on the whole data set and then profiles are extracted as shown in Figure 1. The accuracy is significantly better than the accuracy of a classification method using the clinical variable only (74.1% with the LDA method) or using gene expression data only (75%); see Table 1. For the breast cancer data, two separate feature sets were selected, from the gene expression data (9 genes) and from the clinical data (3 clinical variables). All these 12 variables are then used to form a common input vector to an ECF classification model trained and tested on the whole integrated data set with the use of the leave-one-out method. The accuracy is evaluated as 88.4% (Table 1). After that, the ECF is trained on the whole data set and profiles are extracted as shown in Figure 2.
Fig. 2. Integrated profiles or rules for breast cancer with 9 genes and 3 clinical variables (grade, age, ERP). The top chart (a) shows the profiles for good prognosis and the bottom chart (b) shows the profiles for poor prognosis. The # of samples indicates the number of samples represented by each profile. Profiles 2 and 16 contain the largest number of samples for each prognosis (class). In profiles 2 and 3 (class 1) and 15, 16 and 18 (class 2), two genes are differentially expressed: SEC10L and AHI1. Together these profiles represent 59% of the sample population, along with the three clinical variables.
Results in Table 1 show that adding genotype information not only helps discover relationships with phenotype but also improves the accuracy of the prognosis.

Table 1. The classification accuracy of the 6 best classification models achieved with clinical variables and/or gene expression data.

Classification accuracy using | Clinical variables only | Gene expression only | Clinical and gene variables integrated
Lymphoma      | 74.1% (LDA model) | 75% (Shipp et al.)      | 85.7% (ECF model)
Breast cancer | 73.1% (ECF model) | 83% (Van Veer et al.)   | 88.4% (ECF model)
4. Discussion

4.1. Discovering Genotype-Phenotype Relationships Through Integrated Profiles
For lymphoma, Figure 1 shows that IPI is usually low for patients who survived the disease and high for patients who succumbed to it. 4 out of 30 samples for the cured class (profiles 4, 7 and 8) and 6 out of 26 fatal samples (profiles 12, 14-16 and 21) show contrary results with IPI. This demonstrates that, though IPI is generally a good marker for prognosis, it is not a perfect indicator by itself. An integrated profile with gene expression provides a more comprehensive view into the sub-types of genomic profiles existing within one broad spectrum of a clinical index. With a more individualistic approach to the management of disease, those individuals whose phenotype does not follow the norm can still have a comprehensive prognosis through their genotype, and receive proper medical treatment. The majority of the samples are found in profiles 6 and 18. Could this be the norm for the general population, for which a drug could be developed to target this group? The advantage of integrating genotype and phenotype data is that it enables a more comprehensive understanding of the disease and its many manifestations in individuals. Rather than following a uniform treatment, personalized treatment can be tailored to suit each individual's genetic profile. The sample information in the profiles indicates the distribution of the patient population over the various sub-types of genetic profiles, providing information for pharmaceutical companies to target drugs, or more focused research, at a specific group (or profile). Another interesting feature in ECF8 is the cluster radius information contained in the profiles. This shows how correlated the samples are within the profile. The profiles for breast cancer show some interesting information (see Figure 2). Age, normally associated with a higher risk of disease as one becomes older, shows the contrary here: individuals succumbed more easily to breast cancer when they were younger (profiles 14-16 and 19). This constitutes 70.6% of the patients with poor prognosis. Two genes, SEC10L1 and KIAA1750, have consistently high expression for poor prognosis. SEC10L1 is a component of the exocyst complex involved in the docking of exocytic vesicles with fusion sites on the plasma membrane. KIAA1750 is involved in nucleosome assembly. In the good prognosis chart, profiles 2 and 3 capture the majority of the samples. They differ in 3 variables: KIAA1750, ZNF222 and the grade of cancer. Of these 3 variables, grade is a clinical or phenotype index.
These two profiles show that within one spectrum of a clinical index there are variations in the genetic profiles of the patients, which makes it important to have a wider perspective on prognosis based on integrated information. This is also made evident by the profiles for poor prognosis 15 and 18. Not all patients with high ERP have poor prognosis. In these profiles, 14 samples actually have low ERP but distinctly different genetic profiles in the gene expression of UDP, FLJ23033 and TP53BP1. If one were to rely solely on the clinical aspect, such an individual might not have been diagnosed properly, and in that case the outcome might have been fatal. The genes identified here may not be significant markers on their own, but they contribute to a more conclusive decision for an individual. The number of profiles extracted for both case studies shows that there are many sub-types of genetic profile under the broad spectrum of clinical indices, with some profiles containing only one sample, which we consider as outliers. This suggests that certain individuals whose integrated profile does not fit into a majority profile may require more specialized treatment.

5. Conclusion

This paper introduces a method for genotype-phenotype data integration, model creation and relationship discovery through profile extraction from a classification model based on a novel integrated feature selection method. The evolving connectionist classifier (ECF) is designed as a classification technique based on clustering. However, its inherent features of rule extraction, with all the information associated with a profile (number of data samples, radius of the cluster, number of samples in this cluster, etc.), and of the evolving learning algorithm make it a comprehensive technique for the analysis of bio-data collected from different sources, especially when new data are generated continuously. In this paper, the evolving learning feature of ECF is not demonstrated, but models created in the future would need to further evolve and learn from new incoming data.

6. Acknowledgement

The research presented in this paper is partially funded by the New Zealand Foundation for Research, Science and Technology under the grants NERF/AUTX02-01 (2002-2005) and NERF/AUX006 (2000-2002). Any commercial application of the presented techniques and methods requires a licence agreement with the Knowledge Engineering and Discovery
Research Institute (KEDRI), www.kedri.info, and with the Pacific Edge Biotechnology Ltd (PEBL), www.penblnz.com. The authors thank Dougal Greer from KEDRI, Auckland University of Technology, and Dr Matthias Futschik (Humboldt University in Berlin) for their help in the development of the visualization and rule extraction codes and for data preparation.
References
1. M. E. Futschik, A. Reeve and N. K. Kasabov, Modular Decision System and Information Integration for Improved Disease Outcome Prediction. In European Conference on Computational Biology, Paris, France, (2003).
2. M. E. Futschik, A. Reeve and N. K. Kasabov, Prediction of clinical behaviour and treatment for cancers. Applied Bioinformatics 2, 52-58 (2003).
3. M. E. Futschik, A. Reeve and N. Kasabov, Evolving Connectionist Systems for knowledge discovery from gene expression data of cancer tissue. Artificial Intelligence in Medicine, (2003).
4. I. Hedenfalk, et al., Gene-Expression Profiles in Hereditary Breast Cancer. N. Engl. J. Med. 344(8), 539-548 (2001).
5. N. Kasabov, et al., USA Provisional patent application (2003), PCT application 1001 US1.002. A method and system for integrating microarray gene expression data and clinical information, (2002).
6. N. Kasabov, et al., New Zealand PCT 480030. Medical Applications of Adaptive Learning Systems, PCT/NZ03/00045, (2002).
7. N. Kasabov, et al., U.S. Patent Application Serial No: 60/403,756. Medical decision support systems utilizing gene expression and clinical information and methods for use, (2002).
8. N. K. Kasabov, Evolving Connectionist Systems. Methods and Applications in Bioinformatics, Brain Study and Intelligent Machines. Springer Verlag, (2002).
9. N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning. IEEE Transactions on Systems, Man and Cybernetics Part B: Cybernetics 31(6), 902-918 (2001).
10. R. Molidor, et al., New trends in bioinformatics: from genome sequence to personalized medicine. Experimental Gerontology 38(10), 1031-1036 (2003).
11. J. R. Nevins, et al., Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Human Molecular Genetics 12(2), 153-157 (2003).
12. S. Ramaswamy, et al., Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America 98(26), 15149 (2001).
13. A. Rosenwald, et al., The Use of Molecular Profiling to Predict Survival after Chemotherapy for Diffuse Large-B-Cell Lymphoma. N. Engl. J. Med. 346(25), 1937-1947 (2002).
14. M. A. Shipp, et al., Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine 8(1), 68-74 (2002).
15. L. J. Veer, et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530 (2002).
CHAPTER 11
DATABASES ON GENE REGULATION
O. Kel-Margoulis, V. Matys, C. Choi, M. Krull, I. Reuter, N. Voss, A. Kel
BIOBASE GmbH, Germany
{oke,vma,cch,mkl,ire,nvo,ake}@biobase.de
E. Wingender
BIOBASE GmbH, Germany and Dept. of Bioinformatics, UKG, University of Goettingen, Germany
[email protected]
A. Potapov, I. Liebich
Dept. of Bioinformatics, UKG, University of Goettingen, Germany
{anatolij.potapov, ines.liebich}@med.uni-goettingen.de
"Nur die F-iille fiihrt zur Klarheit." (Only abundance leads to clarity) - F. Schiller
1. Introduction
The variety in the living world is very high. Finding the commonalities and principles underlying this variety can therefore be particularly hard, and requires the collection of a great amount of data. Remarkably, many of the great ideas of the past (e.g. Darwin's theory of evolution by natural selection, Humboldt's plant geography, William Smith's stratigraphy with leitfossils, to name only a few) were based on the meticulous collection and cataloguing of a huge amount of data. Although the purely descriptive science exemplified by the above has in most areas of biology been replaced by a more experimental approach, data collection and classification is nowadays as important as it used to be in the days of Darwin. Today the
collection of the data in a computer-readable form is often a prerequisite for the development and application of bioinformatic tools, which can make use of these data for analysis. The structure of a database should always be governed by the accommodated data, and therefore - on a simplistic (reductionist) level - the tables and their relations in a database can be regarded as a model of the described natural objects and the relationships between them. Besides a suitable database structure, the quality and consistency of the stored data themselves are of great importance. Living organisms have to transmit information to maintain their cellular and physiological processes and to be able to interact with their environment. Since errors in the transmission of signals can lead to fatal diseases, molecular surveillance mechanisms exist in every cell. Perhaps the most prominent examples are the checkpoints that control the error-free progression of the cell cycle. Populating databases with the existing data on molecular biological processes has to adopt the same careful handling of information, by introducing quality management methods, to be useful in understanding the mechanisms of life. These methods include clear annotation guidelines for the curators as well as programs that check the consistency of the data against the database model and specifications. The use of specialized input tools equipped with controlled vocabularies and automated routine annotation steps helps to prevent errors. Presently, a great number of databases on different aspects of molecular biology are available. Every year more than 100 biological databases are presented in the Nucleic Acids Research database issue. In this chapter we describe databases devoted to the regulation of gene expression. First we give a brief overview of databases which deal with genes and proteins (as information carriers) in general. In the following two sections we then focus on those databases which deal with information transmission and processing in the cell, i.e. databases on the regulation of gene transcription and databases on protein interactions and signaling networks. We consider in detail TRANSFAC®, the commonly accepted database on many aspects of transcription regulation, the accompanying databases TRANSCompel® and
SMARt DB™, devoted to composite elements and scaffold/matrix attached regions respectively, and TRANSPATH®, which presents information on signal transduction pathways. These curated databases, along with their specific features, provide links to a number of general databases and information resources, and are equipped with search, browsing and visualization tools, as well as documentation pages and guided tours. Public versions of these
databases, accompanied by sequence analysis tools (for non-profit organizations), are available at www.gene-regulation.com. SMARt DB is accessible at http://smartdb.bioinf.med.uni-goettingen.de/. Professional versions are available at www.biobase.de. In the concluding section of the chapter, the application of the described databases and tools to the analysis of gene expression data is shown.
2. Brief Overview of Common Databases Presenting General Information on Genes and Proteins
Several databases assign unique names to biological entities such as genes and proteins and characterize their primary sequences and structural features (Table 1). Functional classification of genes and their products can be done according to an ontology-based terminology17. Commonly accepted gene/protein identification is extremely important for the scientific community. For the three most important mammalian model organisms, human, mouse and rat, specific database projects exist: Genew56, MGI9 and RGD54, respectively, which assign generally accepted identifiers, thereby coordinating the nomenclature for orthologous genes. The NCBI database Entrez Gene is presently considered the central resource for gene nomenclature and links to other databases. Nucleotide sequences of different origins are collected in the international collaboration DDBJ/EMBL/GenBank33. The three databases synchronize their records on a daily basis. As this general depository contains redundancy, RefSeq has been established to provide non-redundant reference sequences for biological molecules44. Organization, annotation and analysis of the complete genomes of a variety of species are the aims of the database project Ensembl8. It provides the opportunity to browse the genomes of currently 13 different organisms and to find identified and predicted genes. UniProt unifies protein sequence and functional information from Swiss-Prot, TrEMBL and PIR (Protein Information Resource) in one central resource2. The database consists of two parts, manually curated records and computationally annotated entries. The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and other macromolecules6. The PDB records are the standard format used for visualization of macromolecular structures. InterPro is a database on protein families, domains and functional sites that has been
created out of the integration of several other databases40.

Table 1. List of Public Databases Presenting General Information on Genes and Proteins

Genew (Human Gene Nomenclature Database) | http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl | nomenclature of human genes
MGI (Mouse Genome Informatics database) | http://www.informatics.jax.org/ | nomenclature of mouse genes, gene expression, genomic sequences
RGD (Rat Genome Database) | http://rgd.mcw.edu/ | nomenclature of rat genes, ontologies, sequences
(Entrez) Gene | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene | summarized gene information
DDBJ/EMBL/GenBank | http://www.ddbj.nig.ac.jp/, http://www.ebi.ac.uk/embl/, http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html | nucleotide sequences (DNA, RNA)
Ensembl | http://www.ensembl.org/ | multiple species genome browser
RefSeq (Reference Sequence collection) | http://www.ncbi.nlm.nih.gov/RefSeq/ | reference sequences for genomic DNA, transcripts (RNA), proteins
UniProt | http://www.uniprot.org | protein information and sequences
PDB (RCSB Protein Data Bank) | http://www.rcsb.org/pdb/ | macromolecular 3-D structure data
InterPro | http://www.ebi.ac.uk/interpro/ | protein families and functional domains
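As an illustration of how such general resources can be queried programmatically, the sketch below retrieves a RefSeq nucleotide record through the publicly documented NCBI E-utilities efetch interface. This is only a minimal example under stated assumptions: the accession used is a placeholder and the snippet is not taken from any of the database projects described in this chapter.

```python
# Minimal sketch: fetch a RefSeq record via NCBI E-utilities (efetch).
# The accession "NM_000546" (human TP53 mRNA) is used only as an example.
from urllib.request import urlopen
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_refseq(accession: str) -> str:
    """Return the FASTA text of a RefSeq nucleotide record."""
    params = urlencode({
        "db": "nucleotide",   # Entrez database to query
        "id": accession,      # RefSeq accession
        "rettype": "fasta",   # plain FASTA output
        "retmode": "text",
    })
    with urlopen(f"{EUTILS}?{params}") as response:
        return response.read().decode()

if __name__ == "__main__":
    print(fetch_refseq("NM_000546")[:200])
```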
3. Specialized Databases on Transcription Regulation
A fundamental question for all life processes, from differentiation and development to the reaction to external signals, is how this ordered spatio-temporal pattern of information decoding and realization
within an organism is accomplished and controlled. How are the switches which are responsible for gene regulation encoded, and how are they recognized by transcription factors? To answer such questions, huge amounts of data have been accumulated in the literature. To make these data accessible to analysis with bioinformatic tools, and to provide a basis for predictions on the regulation of new (uninvestigated) genes, they are collected in a structured way in a number of databases. To mention the most relevant ones: EPD, the Eukaryotic Promoter Database50; ooTFD, the Object-Oriented Transcription Factor database15; TRRD, the Transcription Regulatory Region Database31; PlantCARE35; PLACE, Plant cis-acting regulatory DNA elements22; SCPD, the Saccharomyces cerevisiae promoter database61; and RegulonDB47 and PRODORIC41 for transcription regulation in prokaryotes. In the following, we describe in more detail TRANSFAC® and the accompanying databases TRANSCompel® and SMARt DB.
3.1. TRANSFAC®
Historically, TRANSFAC® is the first data collection in the field57. Currently, it presents the largest archive of transcription factors, their binding sites and a unique library of positional weight matrices30,58,38 (Figure 1). The core of TRANSFAC® is formed by three tables: for transcription factors (FACTOR table, 5711 entries), the genes regulated by them (GENE table, 7500 entries in release 8.3, September 2004) and the transcription factor binding sites (TFBS; SITE table, 14406 entries in total) by which the factors act upon their target genes. The primary basis for the admission of a site is its experimentally proven interaction with a protein. The borders of a site sequence depend on the method used for detection of the site. What are the features within a site which are recognized by a factor? To approach this question, the commonalities of the binding sequences for one and the same factor are investigated: the collected sites for a factor are aligned and a nucleotide distribution matrix (NDM) is derived. These matrices are stored in a separate table (MATRIX table, 735 entries). Some of them are based on in vitro-selected synthetic sequences. Due to the large set of sites, these latter matrices usually have a good statistical basis, and through the random collection of the sites they can be assumed to reflect the binding preference of the factors quite well, at least under in vitro conditions (although that does not necessarily imply that the sequences with the highest affinity for a factor are the most frequent ones in the genome).
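The derivation of a nucleotide distribution matrix from aligned binding sites can be illustrated with a short sketch. This is not TRANSFAC code; it merely counts, for each alignment position, how often each base occurs and converts the counts into frequencies, which is the basic idea behind the MATRIX table entries described above. The example sites are invented for illustration.

```python
# Sketch: build a nucleotide distribution (position frequency) matrix
# from a set of aligned binding-site sequences of equal length.
# The site sequences below are invented for illustration only.
from collections import Counter

def nucleotide_distribution_matrix(aligned_sites):
    """Return a list of dicts, one per position, mapping base -> frequency."""
    length = len(aligned_sites[0])
    assert all(len(s) == length for s in aligned_sites), "sites must be aligned"
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos].upper() for site in aligned_sites)
        total = sum(counts.values())
        matrix.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return matrix

sites = ["TGACTCA", "TGAGTCA", "TGACTCT", "TGAGTAA"]   # AP-1-like toy sites
for pos, column in enumerate(nucleotide_distribution_matrix(sites), start=1):
    print(pos, {b: round(f, 2) for b, f in column.items()})
```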
Fig. 1. TRANSFAC database structure: main tables and their relations.
Several programs, such as MatInspector45 or Match™25 (which is incorporated into TRANSFAC®), utilize the matrix library of TRANSFAC® to predict binding sites in DNA sequences. The expression patterns of transcription factors that have been included in TRANSFAC®, and the tissue-
specific profiles, which we define in Match™ as sets of matrices for factor groups involved in tissue- or condition-specific gene expression, allow users to generate testable hypotheses about where and when the genes under investigation are expressed. The GENE table (Figure 1) is shared by all databases of the TRANSFAC system, which additionally comprises PathoDB, SMARt DB, TRANSPRO and TRANSPATH58. It therefore serves as a source of links to an increasing number of internal and external databases. In TRANSFAC® itself, a gene entry connects all TFBS which are involved in the regulation of this gene, hierarchically organized from single sites through composite elements up to whole promoters, enhancers and locus control regions. In all of these regulatory regions different signals may converge and be integrated, resulting in the turning on or off of the expression of the gene. For those genes which themselves encode transcription factors, direct links to the encoded factors in TRANSFAC® are additionally given, which allows the retrieval and visualization of gene regulatory networks with a specifically adapted version of the PathwayBuilder™ tool (see 4.3). These gene regulatory networks in TRANSFAC® are embedded in the overall network of signal transduction featured by the TRANSPATH® database.
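How such a matrix library can be used to predict sites, in the spirit of MatInspector or Match, can be sketched as follows. This is not the Match algorithm itself (which uses core and matrix similarity scores with optimized cut-offs); it only slides a frequency matrix along a sequence and reports windows whose simple log-odds score exceeds an arbitrary threshold. The matrix, sequence and threshold are all illustrative assumptions.

```python
# Sketch: scan a DNA sequence with a position frequency matrix and report
# candidate binding sites. Simplified scoring; not the actual Match algorithm.
import math

BACKGROUND = 0.25          # uniform background base frequency (assumption)
PSEUDOCOUNT = 0.01         # avoids log(0) for unobserved bases

def score_window(window, matrix):
    """Sum of log-odds of observed base vs. background at each position."""
    return sum(
        math.log((matrix[i].get(base, 0.0) + PSEUDOCOUNT) / BACKGROUND)
        for i, base in enumerate(window)
    )

def scan(sequence, matrix, threshold):
    """Yield (position, window, score) for windows scoring above threshold."""
    width = len(matrix)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        s = score_window(window, matrix)
        if s >= threshold:
            yield i, window, s

# Toy matrix for a TGACTCA-like motif (frequencies per position).
matrix = [{"T": 0.9, "C": 0.1}, {"G": 1.0}, {"A": 1.0}, {"C": 0.5, "G": 0.5},
          {"T": 1.0}, {"C": 0.8, "A": 0.2}, {"A": 0.9, "T": 0.1}]
sequence = "CCTGACTCAGGATTTGAGTCATT"
for pos, site, s in scan(sequence, matrix, threshold=5.0):
    print(pos, site, round(s, 2))
```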
3.2. TRANSCompel® - A Database on Composite Regulatory Elements
The TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to closely situated target sites, which provide specific features of gene regulation in a particular cellular context. We define a composite element as a combination of TFBS which, as such and through protein-protein interactions between the TFs involved, provides a regulatory feature of its own27,28. Experimentally defined composite regulatory elements contain two or three closely situated binding sites for distinct transcription factors, and represent minimal functional units providing combinatorial transcriptional regulation. TRANSCompel® comprises more than 400 examples of composite elements (release 8.3, September 2004). Both specific factor-DNA and factor-factor interactions contribute to the function of composite elements, which is supported by resolved crystal structures of several ternary complexes. There are two main types of composite elements (CEs): synergistic and antagonistic ones. In synergistic CEs, simultaneous interaction of two factors with closely situated target sites results in a synergistic transcriptional activation. Within an antagonistic CE, two factors interfere with each other. Interacting factors may differ in their functional properties. According to the specific transcriptional regulation they provide (inducible, tissue-restricted, ubiquitous or constitutive transcriptional activation), CEs can be functionally classified28. Interestingly, not all conceivable combinations are equally found, and by far the largest group is that where two inducible factors cooperate to integrate the effects of two signal transduction pathways. Information about the structure of known composite elements and the specific gene regulation achieved through such composite elements has proven to be extremely useful for the prediction of promoter and gene functions.
To enable this, the program CATCH™ searches for potential composite elements in DNA sequences28, and a more specific tool identifies potential NFAT-AP1 composite elements24.
3.3. SMARt DB - A Database on Scaffold/Matrix Attached Regions
One aim of the SMAR transaction database, SMARt DB, is to enhance the understanding of gene regulation with respect to functional domains36. Another major goal is to obtain systematic insight into SMAR fine structure
and to aid the prediction of these elements from mere sequence data37. Multiple lines of evidence suggest that eukaryotic chromatin is organized in the form of functionally independent loop domains19. This is mainly achieved by scaffold or matrix attached regions (SMARs) of the genome, which are tightly bound to the proteinaceous scaffold structure of the nucleus18. SMARs have also been assigned a function in gene expression, where they are regarded as a distinct class of cis-acting elements affecting transcription regulation51. Despite their important role, SMARs are only vaguely defined on the sequence level13,37. Numerous nuclear proteins/factors that contribute to the nuclear architecture by interacting with the base of the chromatin loops have been characterized. For 80 of them, structural and functional features are collected in SMARt DB (release 2.2, September 2004). The core structure of the database includes three interlinked tables: SMAR, SMARbinder and Gene36. The SMAR table gives information on more than 450 individual experimentally proven DNA elements (September 2004). The sequences therein are assigned to more than 200 genes of eukaryotic species ranging from yeast to human. Genes are collected in the Gene table, and the SMARbinder table contains information on SMAR binding proteins.
4. Databases on Biomolecular Interactions and Signaling Networks
4.1. Regulatory Networks: General Properties and Peculiarities
To understand cell biology at the system level, one needs to examine the structure and dynamics of cellular function, rather than the characteristics of isolated molecular parts of a cell29. Intracellular processes are governed by numerous highly interconnected interactions and chemical reactions between various types of molecules, such as proteins, DNA, RNA and metabolites, and can be viewed as a complex network. Being a kind of abstraction, such a representation provides a general framework for describing biological objects in a formal and universal language. This makes it possible to represent and analyze complex intracellular systems at different levels of complexity5,39. Intracellular networks are commonly sub-divided into the partially overlapping metabolic, protein-protein interaction and signal transduction networks59. Signal transduction networks focus on cascades of reactions in-
volving numerous proteins that transmit signals from the cell surface to specific internal targets to change their activities. It is characteristic of signaling cascades that each key component is activated by the previous step in order to activate the key molecule of the subsequent reaction. Networks are commonly modeled by means of graphs, where vertices and edges represent components and the reactions between them, respectively. Networks and their graphs can be either directed or undirected. Protein interaction networks (and their graphs), for instance, are undirected59, whereas metabolic, transcriptional and signal transduction networks are directed. To characterize the hierarchy of regulatory network organization, the architecture of the network can be formally represented by means of constructions of increasing complexity: steps (reactions and the components involved), paths (linear combinations of subsequent steps), pathways (combinations of paths starting at the same entry but having multiple exits, or vice versa), and networks of pathways43. Studying the characteristics of the overall signal transduction network in eukaryotic (mainly vertebrate) cells, we could show that it has scale-free properties (Potapov et al., in preparation), i.e. the distribution of the connectivity of its nodes follows a power law5. This has been shown before to be true for several other cellular networks as well23,46,55, though not for all of them16.
4.2. Variety of Databases on Protein Interactions and Signaling Networks
The interplay between protein interaction and signal transduction networks is far from obvious. Protein interactions shape the processes of signal transduction (ST). However, this does not mean that all features of protein interaction networks can automatically be extrapolated to ST networks. First, signal transduction networks are directed. Second, protein interaction networks are expected to be much larger than ST networks. Third, ST networks also comprise non-proteinaceous components. The databases in this field can be categorized accordingly (Table 2). ST databases model the transduction of signals from extracellular ligands via cell surface receptors into the cytoplasm and further to several targets, mainly into the nucleus, where affected transcription factors regulate their target genes32,34,52. One of the earliest signaling databases was CSNDB52. It models and visualizes signaling pathways and was the first to come up with the idea of pathway
classification based on a cell signaling ontology53.

Table 2. List of Databases on Protein Interactions and Signaling Networks

Protein interaction databases:
BIND (Biomolecular Interaction Network Database) | http://bind.ca | biomolecular interactions, complexes, pathways
DIP (Database of Interacting Proteins) | http://dip.doe-mbi.ucla.edu | protein-protein interactions
HPRD (Human Protein Reference Database) | http://hprd.org/ | human proteins, protein interactions
IntAct | http://www.ebi.ac.uk/intact | protein interactions
MINT (Molecular INTeractions database) | http://mint.bio.uniroma2.it/mint/ | protein interactions
Proteome Bioknowledge Library | http://proteome.incyte.com/ | protein information, protein interactions

Signal transduction databases:
aMAZE | http://www.amaze.ulb.ac.be | biomolecular interactions, metabolic and signaling pathways
CSNDB | http://athos.is.s.u-tokyo.ac.jp/ace/ | signaling pathways
Reactome (Genome knowledgebase) | http://www.genomeknowledge.org/ | metabolic and signaling pathways
TRANSPATH | http://www.biobase.de/pages/products/transpath.html | signaling pathways, network analysis
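As a link back to the scale-free property discussed in Section 4.1, the following sketch shows how the node connectivity distribution of an undirected interaction network, such as one exported from the databases listed in Table 2, could be computed. The edge list is invented; a real analysis would read interactions from one of the resources above, for example in PSI-MI format.

```python
# Sketch: compute the node degree distribution of an undirected
# protein-interaction network given as a list of interacting pairs.
# The edge list is invented; a power-law-like tail in P(k) would be
# consistent with the scale-free behaviour discussed in Section 4.1.
from collections import Counter

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("D", "E"), ("E", "F"), ("A", "E")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# P(k): fraction of nodes having exactly k interaction partners.
n_nodes = len(degree)
distribution = Counter(degree.values())
for k in sorted(distribution):
    print(f"k={k}  P(k)={distribution[k] / n_nodes:.2f}")
```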
The largest curated and publicly available interaction database is BIND4
with more than 95,000 interactions (August 2004). BIND and MINT60 use text-mining methods to preselect relevant articles for subsequent manual annotation. Databases that rely purely on text-mining tools and automatic annotation lack accuracy and precision, with error rates that restrict their use for analysis12. As high-throughput data carry the risk of having rather high false-positive rates, it is necessary to assess the reliability of mass data. Thus, the DIP team has developed methods for quality assessment, e.g. by comparison with a core data set48. Several of the databases, for instance BIND, MINT and IntAct20, provide interfaces where researchers can directly submit their data in a suitable format to speed up data acquisition. Two initiatives have been launched that aim to set standards and exchangeable formats in order to avoid redundant work and to make a future unification of the databases possible: the molecular interaction (MI) format from the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)21 and the BioPAX initiative for creating a pathway exchange format (http://www.biopax.org/).
4.3. TRANSPATH® - A Database on Signal Transduction Pathways
The core of the TRANSPATH® database11,32,49 includes four main tables: Molecule, Gene, Reaction and the recently added Pathway table (Figure 2). Here, the underlying data model is that of a bipartite directed graph with the node classes "molecule" and "reaction".
Fig. 2. Schematic structure of the TRANSPATH® database. Main tables are Molecule, Reaction, Gene and Pathway.
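The bipartite data model, with "molecule" and "reaction" as the two node classes and directed edges between them, can be mimicked in a few lines of code. This is only a sketch of the idea described in the text, not the actual TRANSPATH schema; the class names and the toy reaction are invented.

```python
# Sketch: a bipartite directed graph with molecule and reaction nodes.
# Edges run molecule -> reaction (inputs) and reaction -> molecule (outputs),
# so molecules are never connected to molecules directly.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Molecule:
    name: str

@dataclass
class Reaction:
    name: str
    inputs: list = field(default_factory=list)    # Molecule nodes feeding in
    outputs: list = field(default_factory=list)   # Molecule nodes produced

class SignalingGraph:
    def __init__(self):
        self.reactions = []

    def add_reaction(self, name, inputs, outputs):
        r = Reaction(name, list(inputs), list(outputs))
        self.reactions.append(r)
        return r

    def downstream(self, molecule):
        """Molecules reachable in one reaction step from the given molecule."""
        return {m for r in self.reactions if molecule in r.inputs for m in r.outputs}

# Toy example: a kinase phosphorylating a substrate (names are invented).
g = SignalingGraph()
kinase, substrate, p_substrate = Molecule("KinaseX"), Molecule("Y"), Molecule("Y-P")
g.add_reaction("phosphorylation_of_Y", [kinase, substrate], [p_substrate])
print(g.downstream(substrate))   # {Molecule(name='Y-P')}
```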
Molecules are represented by polypeptides, non-phosphorylated or phosphorylated on particular amino acids, protein complexes, as well as by
lipids or small molecules, among others. A molecule hierarchy is described in terms of "family", "group", "ortholog", "basic", and "isoform". All together, there are more than 20,100 molecules presented (release 5.3, September 2004). More details can be found at http://www.biobase.de. Regulatory reactions are commonly described in the literature by means of semantic and/or mechanistic representations. A semantic representation displays causal relationships between selected molecules and indicates the direction of signal flow between components. It avoids specific details, which often are not yet known, and gives a broad overview of regulatory events. In contrast to that, the mechanistic representation is complete and depicts the underlying biochemical mechanism. That means that semantic and mechanistic descriptions relate to different levels of abstraction. In TRANSPATH®, the reaction hierarchy is combined with the molecule hierarchy. There are five kinds of reaction descriptions in the database: semantic, indirect, pathway step, decomposition, and molecular evidence. "Semantic reactions" show the connectivity within a network. "Indirect reactions" are those where a signal donor exerts an effect on a distant molecule and where the in-between steps are unknown. "Pathway step" reactions depict the biochemical details of signal transduction cascades. "Decompositions" explain the mechanism and identify the acting molecules of reactions that occur within complexes. They are always linked to a pathway step. "Molecular evidence" reflects experimental conditions as they are published in primary literature and has a quality value assigned11. Reaction hierarchy is shown in Figure 3.
Fig. 3. Reaction hierarchy is introduced to provide different levels of abstraction. Examples of reactions at different levels are shown. The notation "-Jak2(m)->" points to a reaction that is catalyzed by murine Jak2.
The Pathway table contains two types of entries: pathways and chains. Pathways reflect canonical pathways for specific signaling molecules and
are made up of one or more chains. Chains are sets of reactions that have been experimentally proven to occur sequentially. Chains that are linked to pathways consist of reactions of the type 'pathway step' and, in some cases, 'indirect' reactions. Chains may have bifurcations and even loops. The hierarchical complexity of reactions thus increases in the order 'pathway step → chain → pathway'. The incorporated tools PathwayBuilder™ and ArrayAnalyzer™ allow visualization and analysis of the signaling network. PathwayBuilder™ constructs all potential pathways based on the collected individual pairwise reactions. Connection and integration with the TRANSFAC® database make it possible to outline the whole pathway between extracellular signal molecules and the genes that respond to these triggers. The application of ArrayAnalyzer™ to gene expression data analysis is described in more detail in the last section of the chapter.
5. Application of the Databases for Causal Interpretation of Gene Expression Data
New high-throughput technologies generate mass data on gene expression. Computer-aided interpretation of these data frequently makes use of gene annotation, in particular the Gene Ontology17. This approach generally allows only conclusions to be drawn about the effects of the observed gene induction phenomena (downstream modeling), but does not provide any direct and stringent clue about their causes. However, the reliable identification of key molecules as, e.g., potential drug targets requires at least easily testable hypotheses about the causes of the observed gene inductions (upstream modeling). Making use of the resources described here, upstream modeling is done in two principal steps (Figure 4): (1) a thorough sequence analysis of the promoters and other regulatory regions of the genes that are up- and down-regulated under a certain condition; at this step we find putative transcription factor binding sites (TFBS) and their characteristic combinations, and make suggestions about the transcription factors involved; (2) reconstruction of the signaling pathways that may regulate the activity of the suggested transcription factors, and identification of the key nodes in the signal transduction network that controls the observed set of genes.
5.1. Analysis of Promoters
Fig. 4. Schema of downstream and upstream modeling of gene expression data.
The specificity of gene transcription is mainly defined by combinations of TFs acting on their target promoters and enhancers by forming dynamic function-specific complexes, "enhanceosomes". At the level of DNA, this is conferred by specific combinations of TF binding sites located in close proximity to each other. We refer to such structures as composite modules (CMs). Genes that are co-expressed in a microarray experiment are assumed to share similar features in their promoters. Therefore we search for common CMs in the promoters of co-expressed genes. For predicting TF binding sites in nucleotide sequences we have developed
Match™25. The algorithm of this program is similar to the matrix-based method described earlier45. TFBS predicted by Match™ are then used to identify particular combinations that may constitute a CM, which may be associated with a certain regulatory feature. This analysis is done by "CMFinder", a tool that is based on a genetic algorithm26. We apply the genetic algorithm in order to find optimal matrix combinations and to tune the parameters of the scoring function so that we can discriminate promoters of co-expressed genes from background sequences, e.g. promoters of genes that did not change their expression in the same experiment. If the score computed for a certain TFBS combination in a promoter X exceeds a defined threshold, the promoter X is considered to contain the corresponding CM and is therefore suggested to mediate a specific regulation of the gene.
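The idea of a composite module, binding sites for distinct factors occurring close to each other in the same promoter, can be illustrated with a small sketch. This is not CMFinder (which optimizes matrix combinations and scoring parameters with a genetic algorithm); it merely takes predicted site positions for two factors and reports pairs lying within a fixed distance window. All names, positions and the window size are illustrative assumptions.

```python
# Sketch: report pairs of predicted binding sites for two different factors
# that lie within a maximal distance of each other (a composite-module-like
# pattern). Site positions are invented; in practice they would come from a
# matrix-based scan such as the one sketched in Section 3.1.
def composite_pairs(sites_a, sites_b, max_distance=50):
    """Return (pos_a, pos_b) pairs closer than max_distance base pairs."""
    pairs = []
    for a in sites_a:
        for b in sites_b:
            if abs(a - b) <= max_distance:
                pairs.append((a, b))
    return pairs

# Predicted site start positions (relative to transcription start) for two
# hypothetical factors in one promoter.
nfat_sites = [-420, -180, -95]
ap1_sites = [-400, -90]
print(composite_pairs(nfat_sites, ap1_sites, max_distance=30))
# -> [(-420, -400), (-95, -90)]
```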
5.2. Identification of Key Nodes in Signaling Networks
To understand the mechanisms of gene expression, microarray data should be analyzed in the context of the complex regulatory networks of a cell. Through such networks, a few regulators can control the expression of large sets of genes. As an integral part of TRANSPATH®, ArrayAnalyzer™ allows fast searches for key upstream regulators in the signal transduction network. The network is defined as a weighted graph G = (V, E, C), where V = genes ∪ molecules ∪ reactions are the vertices, E are the edges, and C : E → R⁺ ∪ {0} is the cost function that assigns a non-negative value to each edge.
Array Analyzer starts its upstream search from a list of molecule names (subset Vx of V). For instance, a list of genes or a list of TFs derived TM
from previous experiments can be used as input for ArrayAnalyzer Upstream of the target molecules Vx, the program searches for such key nodes, or key molecules, in the signal transduction network from which a signal can reach the maximal number of target molecules in a minimal number of steps. Regulation through key nodes can explain, at least partially, coordinated regulation of a number of genes in microarray experiments. The algorithm of ArrayAnalyzer uses compressed shortestdistance matrices calculated by Floyd's algorithm for data collected in TRANSPATH®. The identification of key nodes is done by assigning a score to every node depending on a given maximal distance (search radius). The found potential key nodes are ranked according to the calculated score Si. The detailed path from regulator i down to the genes Vx (subnetwork Gi—(Vi, Et)) is computed using a modified Dijkstra algorithm. TM
ArrayAnalyzer is extremely fast and can perform the search over large radii. The worst-case computational complexity is O(|V| 2 ). As it has been described above, TRANSPATH® contains information about pathways and chains of consecutive reactions known to take place in certain cellular conditions. Any pathway P is denned by a graph Gv = (Vp, Ep, C) which is a sub-graph of the graph G. We use the information about chains and pathways to improve the acTM
curacy of key node predictions by ArrayAnalyzer , especially to diminish the false positive error. While searching, the priority is given to the potential paths that utilize annotated chains of reactions. Still, predictions beyond the known paths are allowed but with smaller priority. Information about chains and pathways can be smoothly incorporated into initial graph G by introducing additional edges and modifying the cost function C, which
724
O. Kel-Margoulis et al.
results in a new graph G', but without modifying the main algorithm. The cost function is adjustable by the user through a "rigidity parameter" to select the proper balance between rigidity and sensitivity during the search.
References
1. H. Agrawal, Extreme self-organization in networks constructed from gene expression data. Phys. Rev. Lett. 89, 268702 (2002).
2. R. Apweiler, et al., UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 32, D115-D119 (2004).
3. A. Arai, et al., CrkL is recruited through its SH2 domain to the erythropoietin receptor and plays a role in Lyn-mediated receptor signaling. J. Biol. Chem. 276, 33282-33290 (2001).
4. G. D. Bader, D. Betel and C. W. V. Hogue, BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31, 248-250 (2003).
5. A. L. Barabasi and Z. N. Oltvai, Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5, 101-113 (2004).
6. H. M. Berman, P. E. Bourne and J. Westbrook, The Protein Data Bank: A case study in management of community data. Curr. Proteomics 1, 49-57 (2004).
7. A. Bhan, D. J. Galas and T. G. Dewey, A duplication growth model of gene expression networks. Bioinformatics 18, 1486-1493 (2002).
8. E. Birney, et al., Ensembl 2004. Nucleic Acids Res. 32, D468-D470 (2004).
9. C. J. Bult, et al., The Mouse Genome Database (MGD): integrating biology with the genome. Nucleic Acids Res. 32, D476-D481 (2004).
10. S. L. Carter, et al., Gene expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 15, (2004).
11. C. Choi, et al., TRANSPATH®: a high quality database focused on signal transduction. Comp. Funct. Genom. 5, 163-168 (2004).
12. I. Donaldson, et al., PreBIND and Textomy: mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics 4, 11 (2003).
13. S. Ganapathy and G. Singh, Statistical Mining of S/MARt Database. Proc. Atlantic Symp. Comp. Biol. Genome Information System Technol., 235-239 (2001).
14. S. M. Gasser and U. K. Laemmli, A glimpse at chromosomal order. Trends Genet. 3, 16-22 (1987).
15. D. Ghosh, Object-oriented transcription factors database (ooTFD). Nucleic Acids Res. 28, 308-310 (2000).
16. N. Guelzim, S. Bottani, P. Bourgine and F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 31, 60-63 (2002).
17. M. A. Harris, et al., The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258-D261 (2004).
18. C. M. Hart and U. K. Laemmli, Facilitation of chromatin dynamics by SARs. Curr. Opin. Genet. Dev. 8, 519-525 (1998).
19. H. H. Q. Heng, et al., Chromatin loops are selectively anchored using scaffold/matrix-attachment regions. J. Cell Sci. 117, 999-1008 (2004).
20. H. Hermjakob, et al., IntAct: an open source molecular interaction database. Nucleic Acids Res. 32, D452-D455 (2004).
21. H. Hermjakob, et al., The HUPO PSI's molecular interaction format - a community standard for the representation of protein interaction data. Nat. Biotechnol. 22, 177-183 (2004).
22. K. Higo, Y. Ugawa, M. Iwamoto and T. Korenaga, Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res. 27, 297-300 (1999).
23. H. Jeong, et al., The large-scale organization of metabolic networks. Nature 407, 651-654 (2000).
24. A. Kel, O. Kel-Margoulis, V. Babenko and E. Wingender, Recognition of NFATp/AP-1 composite elements within genes induced upon the activation of immune cells. J. Mol. Biol. 288, 353-376 (1999).
25. A. E. Kel, et al., MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 31, 3576-3579 (2003).
26. A. Kel, et al., A novel computational approach for the prediction of networked transcription factors of Ah-receptor regulated genes. Mol. Pharmacol., in press (2004).
27. O. V. Kel, et al., A compilation of composite regulatory elements affecting gene transcription in vertebrates. Nucleic Acids Res. 23, 4097-4103 (1995).
28. Kel-Margoulis, et al., TRANSCompel®: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res. 30, 332-334 (2002).
29. H. Kitano, Systems biology: a brief overview. Science 295, 1662-1664 (2002).
30. R. Knüppel, et al., TRANSFAC retrieval program: a network model database of eukaryotic transcription regulating sequences and proteins. J. Comput. Biol. 1, 191-198 (1994).
31. N. A. Kolchanov, et al., Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res. 30, 312-317 (2002).
32. M. Krull, et al., TRANSPATH®: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 31, 97-100 (2003).
33. T. Kulikova, et al., The EMBL Nucleotide Sequence Database. Nucleic Acids Res. 32, D27-D30 (2004).
34. C. Lemer, et al., The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Res. 32, D443-D448 (2004).
35. M. Lescot, et al., PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 30, 325-327 (2002).
36. I. Liebich, J. Bode, M. Frisch and E. Wingender, SMARt DB - A database on scaffold/matrix attached regions. Nucleic Acids Res. 30, 372-374 (2002).
37. I. Liebich, et al., Evaluation of sequence motifs found in scaffold/matrix attached regions (SMARs). Nucleic Acids Res. 30, 3433-3442 (2002).
38. V. Matys, et al., TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31, 374-378 (2003).
39. R. Milo, et al., Network Motifs: Simple Building Blocks of Complex Networks. Science 298, 824-827 (2002).
40. N. J. Mulder, et al., The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315-318 (2003).
41. R. Münch, et al., PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res. 31, 266-269 (2003).
42. E. A. Nigg, Mitotic kinases as regulators of cell division and its checkpoints. Nat. Rev. Mol. Cell Biol. 2, 21-32 (2001).
43. A. Potapov and E. Wingender, Modeling the architecture of regulatory networks. Proc. German Conf. Bioinformatics, 6-10 (2001).
44. K. D. Pruitt, T. Tatusova and D. R. Maglott, NCBI Reference Sequence Project: update and current status. Nucleic Acids Res. 31, 34-37 (2003).
45. K. Quandt, et al., MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878-4884 (1995).
46. E. Ravasz, et al., Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555 (2002).
47. H. Salgado, et al., RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res. 32, D303-D306 (2004).
48. L. Salwinski, et al., The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 32, D449-D451 (2004).
49. F. Schacherer, et al., The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics 17, 1053-1057 (2001).
50. C. D. Schmid, et al., The Eukaryotic Promoter Database EPD: the impact of in silico primer extension. Nucleic Acids Res. 32, D82-D85 (2004).
51. D. Schübeler, C. Mielke, K. Maass and J. Bode, Scaffold/matrix-attached regions act upon transcription in a context-dependent manner. Biochemistry 35, 11160-11169 (1996).
52. T. Takai-Igarashi and T. Kaminuma, A pathway finding system for the cell signaling networks database. In Silico Biol. 1, 129-146 (1999).
53. T. Takai-Igarashi and R. Mizoguchi, Cell signaling networks ontology. In Silico Biol. 4, 81-87 (2004).
54. S. Twigger, Rat Genome Database: mapping disease onto the genome. Nucleic Acids Res. 30, 125-128 (2002).
55. A. Wagner and D. A. Fell, The small world inside large metabolic networks. Proc. R. Soc. Lond. B. Biol. Sci. 268, 1803-1810 (2001).
56. H. M. Wain, et al., Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res. 32, D255-D257 (2004).
57. E. Wingender, Compilation of transcription regulating proteins. Nucleic Acids Res. 16, 1879-1902 (1988).
58. E. Wingender, et al., TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316-319 (2000).
59. Y. Xia, et al., Analyzing cellular biochemistry in terms of molecular networks. Annu. Rev. Biochem. 73, 1051-1087 (2004).
60. A. Zanzoni, et al., MINT: a Molecular INTeraction database. FEBS Lett. 513, 135-140 (2002).
61. J. Zhu and M. Q. Zhang, SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15, 607-611 (1999).
CHAPTER 12
ON THE SEARCH OF BETTER VALIDATION AND STATISTICAL METHODS IN MICROARRAY DATA ANALYSIS
He Yang
Bioinformatics Institute, Singapore
[email protected]
The amount of data collected from microarray experiments is massive and overwhelming. Careful analysis is required to make sense of these data. Biocomputing applied in microarray gene expression profiling mainly involves a series of analysis steps with large sets of data.
1. Introduction
The DNA array technology is emerging as a powerful tool for large-scale gene expression analysis. In comparison to traditional single-gene expression profiling methods such as RT-PCR and Northern blotting, this high-throughput technique can reveal the expression levels of a large number of genes, if not the whole genome of an organism, simultaneously, hence enabling biologists to glimpse a whole picture of the genome. Using a comparative approach, this high-throughput technique opens a new frontier in gene discovery. By comparing the gene expression profiles of a diseased (or drug-treated) tissue with those of a normal (or untreated) tissue, or the gene expression profiles of one phenotype (or mutant) with those of another phenotype (or wild type), differentially expressed genes can be extracted. Such information is important in relating the disease, the drug effect or the phenotype to the genotype. As a result, the detection of all differentially expressed genes in one shot using microarray technology can turn traditional blind (random) or empirical (guess-wise) gene discovery towards more guided gene discovery (Figure 1). Extraction of differentially expressed genes from microarray data is, however, not so straightforward. Several data analysis steps must be undertaken before we can extract differentially expressed genes reliably from microarray data.
Fig. 1. Microarray technology used in comparative genomics for gene discovery
2. Microarray Analysis Steps
The microarray data analysis steps include preprocessing, normalization and identification of differentially expressed genes. Other analysis steps such as clustering, classification and networking can be further applied to find co-expression patterns, to extract marker genes, and to estimate regulatory interactions. In this chapter, discussions will be focused on the analysis steps up to the identification of differentially expressed genes.
3. Preprocessing
The detected intensity (acquired from image analysis) of a gene spot i can be expressed as the sum of specific (x_{sp,i}) and nonspecific (x_{ns,i}) binding plus background noise (x_{b,i}):

y_i = x_{sp,i} + x_{ns,i} + x_{b,i}    (1)
The preprocessing step involves removal of nonspecific binding and background intensities. The background correction is currently done using the intensity of the area surrounding the spot. For nonspecific binding corrections, specially designed spots are often used. The measured intensity of such a spot j is assumed to derive from the nonspecific binding and background intensity only:

y_j = x_{ns,j} + x_{b,j}    (2)
In spotted cDNA microarray platforms, foreign DNA sequences without any, or with minimal, homology to the DNA sequences of the organism under study are used to ensure that no specific binding is reflected in the nonspecific binding spots10. However, these negative control spots are not designed for individual gene sequences but rather give a rough estimate of the overall (average) nonspecific binding strength. In the Affymetrix setup, a mismatch spot is designed for each gene probe. The nonspecific binding is estimated using the mismatch probe, which has a mutation in the middle of the specific oligonucleotide sequence. Since nonspecific binding may come from other sequences with other types of mutations rather than a mutation in the middle, the nonspecific binding estimated this way may again be just a rough estimate. A more critical point concerning Affymetrix mismatch spots is probably that the intensity of a mismatch spot may be contributed mainly by specific binding, as the targeted sequence is highly complementary to the probe sequence on the mismatch spot. The current approach for the removal of nonspecific binding and background noise from measured intensities is based on the assumptions that x_{ns,i} = x_{ns,j}, x_{b,i} = x'_{b,i} and x_{b,j} = x'_{b,j}, and that Equation 2 holds its validity. Subtraction of Equation 2 from Equation 1 results in:

x_{sp,i} = y_i - x'_{b,i} - (y_j - x'_{b,j})    (3)
As discussed earlier, nonspecific binding may not be accurately described by Equation 2. This inaccuracy may not be significant when the specific binding is much higher than the nonspecific binding. However, for a spot with weaker gene expression, subtraction of nonspecific and background noise can lead to quite incorrect results, e.g. negative specific binding intensities. Several empirical methods have been proposed to cope with the failure of Equation 3 when x_{sp,i} < 0. These empirical methods can be summarized as follows: 1) no nonspecific/background corrections, i.e. x_{sp,i} = y_i; or 2) assignment of a small value for specific binding, e.g. x_{sp,i} = 0.5; or 3) shrinkage used when building the ratio of specific binding intensities of two different arrays or channels. A method which should not be recommended at all is elimination of those spots for which Equation 3 delivers a negative or small positive value. Unfortunately, many image-processing software tools flag those spots with A for absence or no call, and users are advised to eliminate them due to low hybridization quality. A statistically sounder method is to use two times the standard deviation δ_{ns+b}
of the nonspecific binding/background intensities to replace the small specific binding intensity, i.e. x_{sp,i} = 2δ_{ns+b} if x_{sp,i} < 2δ_{ns+b}10.
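A direct implementation of the correction of Equations 1-3, together with the 2δ floor for weakly expressed spots, could look like the sketch below. Variable names mirror the notation in the text; the intensity values are invented and the routine is only meant to make the formulas concrete, not to reproduce any particular software package.

```python
# Sketch: estimate specific binding intensity of a spot (Equation 3) and
# apply the 2*delta floor when the estimate falls below it. All numbers are
# invented; y is the measured spot intensity, y_neg the negative-control
# (nonspecific binding) spot, and xb/xb_neg their local background estimates.
import statistics

def specific_intensity(y, xb, y_neg, xb_neg, delta_ns_b):
    """x_sp = (y - xb) - (y_neg - xb_neg), floored at 2*delta_ns_b."""
    x_sp = (y - xb) - (y_neg - xb_neg)
    floor = 2 * delta_ns_b
    return x_sp if x_sp >= floor else floor

# Spread of nonspecific-binding/background intensities across control spots.
control_intensities = [210.0, 190.0, 205.0, 220.0, 185.0]
delta = statistics.stdev(control_intensities)

print(specific_intensity(y=5000, xb=150, y_neg=200, xb_neg=140, delta_ns_b=delta))
print(specific_intensity(y=200, xb=150, y_neg=200, xb_neg=140, delta_ns_b=delta))  # weak spot: negative estimate -> floored
```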
All these methods assume the correctness of Equation 3 and try to correct it when a negative specific intensity occurs. However, issues pertaining to a better estimation of nonspecific binding and background correction, both experimentally and computationally, have so far been overlooked.
4. Normalization
Besides the nonspecific binding and background noise, several inherent measurement errors are contained in the microarray intensities themselves. Due to variations in labeling, hybridization, spotting or surface characteristics, the detected expression intensities resulting from the same amount of mRNA can differ from experiment to experiment. Variations or measurement errors can be classified into two types: system variations (or systematic errors) and random variations (or random errors). System variations are supposed to be accounted for by normalization. To handle random errors, appropriate statistical methods should be selected both in normalization and in the identification of differentially expressed genes. Global normalization using the mean or median of expression intensities and normalization based on housekeeping genes are the most widely used methods for both one-channel and two-channel arrays. Other methods include an iterative method using a ratio probability density based on housekeeping genes3 and a method using only genes with constant expression levels1. Due to different labeling and detection efficiencies for various fluorescent dyes, two-channel arrays require nonlinear normalization protocols rather than a constant normalization factor4,11,13,14. The most widely accepted method, Lowess, provides a normalization factor by robust local regression of the expression log ratio against the expression intensity. Lowess normalization with global14, print-tip14 or rank-invariant11 genes has proven to be a useful nonlinear normalization method. Utilizing self-hybridization results, Chua et al.4 showed that cross-correlation of the distribution of log ratios in an intensity interval with self-hybridization results could lead to a more robust normalization method than Lowess. With the various normalization methods mentioned above, it is often a question of which normalization method to use. Experimental validation methods will address this question later in the chapter. Some theoretical criteria that govern a good normalization method will now be presented. Normalization of array data involves two steps: 1) selection of the genes to be used
for normalization, and 2) application of a mathematical operator or metric to calculate the normalization factor using the data of only the selected genes. Genes selected for normalization should theoretically be restricted to non-differentially expressed genes. Differentially expressed genes intrinsically possess three unknown variations (system, random and biological variations), whereas non-differentially expressed genes have only two (system and random variations). As the aim of normalization is to predict systematic variations, it is naturally easier to estimate this quantity using non-differentially expressed genes, as only two variables are confounded together instead of three. This leads us to the next point: the normalization operator/metric should be effective in coping with random errors, as estimates of system variations from non-differentially expressed genes contain a random component. For the same reason, the number of genes used for normalization should also be statistically representative (large). Normalization based on a small subset of all genes, such as traditional housekeeping genes, may therefore not be a good method. In terms of better normalization operators/metrics (median, local regression, or cross-correlation), Chua et al.4 showed that cross-correlation can be more robust than Lowess, particularly when a relatively large number of genes are differentially expressed.
5. Identification of Differentially Expressed Genes
After preprocessing and normalization, we can finally perform the identification of differentially expressed genes (gene identification for short). Assuming that normalization has completely removed the system variations, gene identification methods should then search for significant fold changes beyond random errors. Methods for gene identification available in the literature can be broadly categorized into those that are used for a single two-channel array or a pair of one-channel arrays (single array) and those that utilize a set of replicates (replicate arrays). Methods used for single-array gene identification include minimal fold change and percentile cutoffs. Selection of differentially expressed genes based on setting an intensity threshold and a minimal fold change (typically 2-3) is the traditional, frequently used method to discard genes with low expression levels and insignificant fold differences. In the percentile methods, only the top-ranked percentile of genes is declared differentially expressed. These methods are associated with some problems due to the lack of intensity dependence. At higher expression levels, noise is relatively small compared to the signal intensity. A smaller fold change at very high expression
levels might therefore be significant. On the other hand, at moderate expression levels 2- or 3-fold changes may not be significant due to relatively higher noise-to-signal ratios. Yang et al.13 presented an improved, intensity-dependent percentile version that uses self-hybridization (or identical replicates) to obtain the correct ranking cutoff. There are also other statistical methods for single-array gene identification. Using a ratio probability density, upper and lower boundaries (masks) for differentially expressed genes can be derived at a given confidence level3. Such masks are again independent of intensity. Recognizing this problem, Newton et al.8 used a Gamma-Gamma-Bernoulli model to develop the ratio probability density and presented intensity-dependent masks for the identification of differentially expressed genes at various posterior odds. Such masks are however wider at both low and high expression intensities. For gene identification using replicate arrays, there are a number of statistical methods: the t-statistic, ANOVA7, modified t-statistics (SAM, samroc)2,12, the B statistic6, hierarchical modeling11 and the Wilcoxon rank sum statistic5. Several authors showed that the modified statistic performs better than the t-statistic due to more effective elimination of false positives by the fudge constant introduced in the modified t-statistic12. Recently, it has been shown that another modified version of the t-statistic (called the samroc statistic) performs better than or as well as the modified t-statistic (SAM)2. This result was obtained based on a set of spike-in experiments with a very small number (14) of spiked genes. All these methods are however associated with some problems. First, as explained earlier, gene identification should look for significant fold changes beyond random errors (or variations), and the signal-to-noise ratio is intensity-dependent. However, none of these methods considers such dependence. Second, most of these methods do not utilize self-hybridization results, which can serve as a rough evaluation of array quality. Despite differences in array quality, the same statistical parameters are currently used across different arrays. Recently, Chua et al.4 introduced an intensity-dependent modified t-statistic, which improves the modified t-statistic by integrating the intensity or log-ratio standard deviation (intensity window-wise) into the fudge constant.
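To make the distinction between an ordinary t-statistic and the "fudge constant" modification concrete, here is a small sketch of a SAM-like score for one gene across replicated log-ratios. It is not the published SAM or samroc implementation; in particular, the fudge constant here is simply a user-supplied number, whereas SAM estimates it from the data.

```python
# Sketch: ordinary one-sample t-statistic vs. a SAM-like modified statistic
# for a gene's replicated log-ratios. The fudge constant s0 damps scores of
# genes whose standard error is very small. Data and s0 are invented.
import math
import statistics

def t_statistic(log_ratios):
    m = statistics.mean(log_ratios)
    se = statistics.stdev(log_ratios) / math.sqrt(len(log_ratios))
    return m / se

def modified_t_statistic(log_ratios, s0=0.2):
    m = statistics.mean(log_ratios)
    se = statistics.stdev(log_ratios) / math.sqrt(len(log_ratios))
    return m / (se + s0)   # the added s0 is the "fudge constant"

gene = [0.15, 0.18, 0.16, 0.17]              # small fold change, tiny variance
print(round(t_statistic(gene), 2))           # looks highly significant
print(round(modified_t_statistic(gene), 2))  # damped by the fudge constant
```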
6. Validation Strategies
As discussed, there is a variety of analysis methods at each step, and more methods will be proposed, which makes microarray data analysis rather confusing. Add to this the fact that there are numerous analysis software
tools freely downloadable or commercially available, and the suspicion that microarray data analysis results are highly subjective keeps growing. To ensure high quality in data analysis, experimental validation should be performed to assess the different data analysis methods. The validation strategy includes three steps. The first step involves the creation of a set of benchmark data. Recently, several experimental methods have been proposed for the creation of validation data sets4,13. In those validation experiments, differentially expressed genes can be experimentally either evaluated or manipulated. Thus, a set of truly differentially expressed genes is available and can serve as a reference. In the second validation step, microarray data analysis is performed with different combinations of preprocessing, normalization and gene identification methods. The final step involves comparison of the predicted differentially expressed genes generated by all possible combinations of preprocessing, normalization and gene identification methods with the truly differentially expressed genes. The prediction errors indicate the performance of the different analysis methods (Figure 2).
Fig. 2. Validation strategy for microarray data analysis
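As an illustration of the final comparison step in Figure 2, the short sketch below compares a predicted list of differentially expressed genes against a benchmark list and reports simple error counts; the gene identifiers and the two hypothetical analysis pipelines are invented for the example.

def prediction_errors(predicted, truth, all_genes):
    """Count errors of a gene-identification result against a benchmark.

    predicted, truth: sets of gene IDs called differentially expressed.
    all_genes: every gene measured on the array.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(set(all_genes) - predicted - truth)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "FDR": fp / max(tp + fp, 1),
            "TDR": tp / max(tp + fp, 1)}

# Hypothetical benchmark and two competing analysis pipelines.
all_genes = [f"g{i}" for i in range(100)]
truth = {"g1", "g2", "g3", "g4", "g5"}
pipeline_a = {"g1", "g2", "g3", "g9"}          # e.g. Lowess + modified t
pipeline_b = {"g1", "g2", "g6", "g7", "g8"}    # e.g. global median + 2-fold cutoff
print("A:", prediction_errors(pipeline_a, truth, all_genes))
print("B:", prediction_errors(pipeline_b, truth, all_genes))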
Not only different experimental validation approaches and data analysis methods but also different definitions of prediction errors will produce different validation results. Using normalization and gene identification errors, Yang et al.13 validated different normalization and gene identification methods. Statistical validation methods are more powerful and accurate in describing prediction errors. A popular approach for estimation of predic-
tion errors is based on a receiver operating characteristic (ROC) curve. To obtain an ROC curve, either false discovery rate (FDR) versus true discovery rate (TDR) or false positives (FP) versus false negatives (FN) can be used. Differential expression can be either significant up-regulation or significant down-regulation. One point that none of the statistical methods has highlighted is that identification of an up-regulated gene as a down-regulated one is a totally wrong prediction. Such a wrong discovery (WD) is, however, currently classified as a true discovery or true positive (TP). Using different validation data and approaches, Yang et al.13 showed that wrong discoveries do occur, particularly for genes with low expression intensities, such as transcription factors.
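A minimal sketch of how such an ROC-style evaluation can also account for wrong discoveries: each gene carries a signed truth label (+1 up-regulated, -1 down-regulated, 0 unchanged), and a gene called in the wrong direction is counted as a WD instead of a true positive. The thresholds and the simulated statistics are illustrative assumptions, not part of any published protocol.

import numpy as np

def roc_with_wd(scores, truth, thresholds):
    """scores: signed statistics (positive = predicted up-regulated).
    truth: +1 (truly up), -1 (truly down), 0 (unchanged)."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=int)
    points = []
    for thr in thresholds:
        called_up = scores >= thr
        called_down = scores <= -thr
        # True discoveries: called in the correct direction.
        tp = np.sum((called_up & (truth == 1)) | (called_down & (truth == -1)))
        # Wrong discoveries: called in the opposite direction.
        wd = np.sum((called_up & (truth == -1)) | (called_down & (truth == 1)))
        # False positives: called for genes that are actually unchanged.
        fp = np.sum((called_up | called_down) & (truth == 0))
        called = tp + wd + fp
        points.append((thr, tp / max(called, 1), (fp + wd) / max(called, 1), int(wd)))
    return points  # (threshold, TDR, FDR counting WD as false, WD count)

rng = np.random.default_rng(1)
truth = np.concatenate([np.ones(50), -np.ones(50), np.zeros(900)]).astype(int)
scores = 2.0 * truth + rng.normal(0.0, 1.2, truth.size)   # noisy test statistics
for thr, tdr, fdr, wd in roc_with_wd(scores, truth, [0.5, 1.0, 2.0, 3.0]):
    print(f"thr={thr:.1f}  TDR={tdr:.2f}  FDR={fdr:.2f}  WD={wd}")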
7. Experimental Validation Methods 7.1. Self-hybridization or Identical Replicates
As mentioned previously, normalization is used to minimize systematic errors, and gene identification involves looking for biological variations beyond random variations. The correct characterization of random variations is indeed essential to the successful identification of differential expression. Self-hybridization experiments or identical replicates provide a unique opportunity for doing so, since after normalization all variations in fold changes should be traceable back to random variations. However, utilization of self-hybridization experiments or identical replicates is currently lacking. For example, a 2-fold change is used as the threshold for selection of differential expression across various microarray platforms, regardless of differences in array quality or in the magnitude of random variations. As discussed earlier, whether the 2-fold change is rightly chosen can be easily checked by self-hybridization experiments or identical replicates. If a large fraction of genes in self-hybridization experiments or identical replicates have fold changes greater than or close to two, application of a 2-fold change as the differential expression cutoff is obviously not very reasonable. On the other hand, if almost none of the genes show a fold change larger than two in such experiments, the 2-fold change can be a rightful threshold. It should be noted that fine differentiation of different gene identification methods cannot be achieved by using self-hybridization data. 7.2. Quantitative RT-PCR RT-PCR is often used for validation of microarray data. When using quantitative RT-PCR for validation of microarray data analysis methods, the
number of measurements should be large. In addition, among the measured genes, the fractions of differentially expressed and non-differentially expressed genes should be roughly balanced. Otherwise, either TDR or FDR is not representative. Due to sample size limitations, only a limited number of RT-PCR measurements may be performed on one sample, but a large number of samples will eventually yield a large number of RT-PCR measurements. Following this idea, Yang et al.13 pooled the microarray results and RT-PCR measurements of all samples together to carry out validation. They presented a validation procedure with 10 genes over 15 different pairs of one-channel arrays13. Using replicates of quantitative RT-PCR measurements, the significance of the mean of log ratios for each gene in each sample pair with respect to a certain fold change (e.g. 1.5) can be determined. Once the test of significance is passed, the gene is declared differentially expressed. It should be noted that the same gene can be declared differentially expressed in one sample pair but not in another. In this way, 148 references were created (2 samples were contaminated). On the other hand, different gene identification methods delivered different sets of predicted differentially expressed genes. By defining a total identification error, the different gene identification methods were assessed13. It should be pointed out that, unlike an ROC curve, such a point-wise comparison may not reflect the best performance of each analysis method and may thus lead to biased (unfair) results. What we can learn from this comparison, though, is that the notion of wrong discovery should be introduced. 7.3. Mutant versus Wild Type A more robust method for evaluation of wrong discovery can come from the gene-expression analysis of a mutant versus the wild type. A mutant can only be qualified as a validation mutant when a large set of genes is deleted or inserted. Yang et al.13 demonstrated this concept by using a mutant C. acetobutylicum strain called M5. This mutant strain is isogenic to the wild type (WT) but lacks a plasmid that contains 178 genes. All these 178 deleted genes are expected to be classified as down-regulated (if expressed in WT samples) or non-differentially expressed (if not expressed in WT samples), when M5 samples are co-hybridized with WT. Since a deleted gene can be either differentially expressed (down-regulated) or non-differentially expressed, a count of truly differentially expressed genes cannot be established. Thus, a prediction error cannot be derived from this validation approach. However, this method can be a useful tool for WD evaluation, since
identification of a deleted gene as significantly up-regulated is a WD count. 7.4. Gene Spike-in Experiments The spike-in approach was originally designed to produce control spots on arrays which can be used for assessment of hybridization performance. Spiked transcripts are supposed to have no matches with any genes of the samples under study. Independent of the mRNA samples and their qualities, those spiked spots should therefore show up with consistent intensity levels. Recently, Affymetrix performed two sets of spike-in experiments for two different chips to validate the models for constructing the expression intensity of a gene from its 11-20 representative probes. For Affymetrix U95 chips, 14 spike-in experiments with 3 replicates at 14 different spike-in concentrations (starting from null concentration) were performed. The number of spiked genes is only 14. Unlike quantitative RT-PCR validation, different experimental results cannot be pooled together to obtain a higher number of truly differentially expressed genes than 14. Thus, when using those data sets for validation of different analysis methods, the prediction power may be limited. Recently, Chua et al.4 developed a set of spike-in experiments based on the self-hybridization approach. A large number of genes (~200) were selected for spiking. After in-vitro transcription from their respective clones, half of the transcripts were spiked into sample R, which was labeled later with the Red dye, and the rest into sample G, which was labeled later with the Green dye. The spiking was carried out at different concentrations from 0 to 3 pmol. In such spike-in experiments, the mRNA levels of 200 genes were manipulated. As a result, when comparing the red channel with the green channel, the 100 genes spiked into sample R should be differentially up-regulated, while the other 100 genes spiked into sample G should be differentially down-regulated. Taking these genes as truly differentially expressed genes, statistical validation of different analysis methods can be performed. Using all possible combinations of normalization and gene-identification methods, Chua et al.4 found that several combinations perform similarly well. Normalization methods such as global Lowess and intensity-dependent cross-correlation were equally good. The combination of either global Lowess or intensity-dependent cross-correlation with the intensity-dependent modified t-statistic demonstrated the best results. The normalization methods under validation were global median, global Lowess, rank-invariant
Lowess, print-tip Lowess and intensity-dependent cross-correlation, while fold change cut-off, the percentile method, t-statistic, modified t-statistic, and intensity-dependent modified t-statistic were used as methods for identification of differentially expressed genes. In order to apply single-array gene identification with array replicates, Chua et al.4 established a protocol based on consistent prediction of single-array gene identification methods over all replicates. Except for the spiking part, spike-in experiments are identical to self-hybridization ones. A drawback of this validation method lies, as for self-hybridization, in its insensitivity in differentiating between analysis methods if the number of spiked genes is small. With 200 spiked genes, this method may be reasonable for validation of different gene identification methods, while more spiked genes are required for the method to distinguish intensity-dependent normalization methods. 7.5. Other Validation Experiments
Situations in which a large number of genes are differentially expressed have been suggested to be more frequent than presently assumed by researchers9. In such circumstances, normalization using all the genes can be misleading, as a large fraction of the genes are differentially expressed. In order to test the robustness of different normalization methods (e.g. global Lowess and intensity-dependent cross-correlation), special experiments can be designed, e.g. heat-shock or deficient media. Chua et al.4 performed array experiments with co-hybridization of mRNA extracted from mouse cells grown in a glutamine-deficient medium with that from mouse cells grown in a glutamine-rich medium. Global Lowess normalization produces a normalization factor (curve) that deviates from the center of the main population as more genes become dispersed, while normalization based on intensity-dependent cross-correlation delivers a normalization factor that remains at the center of the main population. The difference in normalized fold changes between global Lowess and intensity-dependent cross-correlation normalization can be as large as 1.38. 8. Summary With benchmark data, we are able to validate different microarray data analysis methods. This enables us to answer the question of which normalization and gene identification methods are more accurate. As different experimental validation methods have their own advantages and drawbacks,
probably not a single validation method but a combination of different validation approaches should be used for selecting the best microarray data analysis method(s) at various analysis steps. References 1. T. Beissbarth, et al., Processing and quality control of DNA array hybridization data. Bioinformatics 16, 1014-1022 (2000). 2. P. Broberg, Statistical methods for ranking differentially expressed genes. Genome Biology 4, R41 (2003). 3. Y. Chen, E. R. Dougherty and M. L. Bittner, Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2, 364-374 (1997). 4. S. W. Chua, et al., Development of microarray benchmark data leads to discovery of novel microarray data analysis methods, submitted. (2004). 5. B. Efron and R. Tibshirani, Empirical Bayes methods and false discovery rates for microarray. Genetic Epidemiol. 23, 70-86 (2002). 6. I. Lonnstedt and T. P. Speed, Replicated microarray data. Stat. Sinica 12, 31-46 (2002). 7. M. K. Kerr, M. Martin and G. A. Churchill, Analysis of variance for gene expression microarray data. J. Comput. Biol. 7, 819-837 (2000). 8. M. A. Newton, et al., On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8, 37-52 (2001). 9. J. Peppel, et al., Monitoring global messenger RNA changes in externally controlled microarray experiments. EMBO Reports 4, 387-393 (2003). 10. C. Tomas, et al., DNA array-based transcriptional analysis of asporogenous, nonsolventogenic Clostridium acetobutylicum strains SKO1 and M5. J. Bacteriol. 185, 4539-4547 (2003). 11. G. C. Tseng, et al., Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 29, 2549-2557 (2001). 12. V. G. Tusher, R. Tibshirani and G. Chu, Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116-5121 (2001). 13. H. Yang, et al., A segmental nearest neighbor normalization and gene identification method gives superior results for DNA-array analysis. Proc. Natl. Acad. Sci. USA 100, 1122-1127 (2003). 14. H. Yang, et al., Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30, e15 (2002).
CHAPTER 13 INFORMATION EXTRACTION FROM DYNAMIC BIOLOGICAL WEB SOURCES
Roshni Mohapatra and Kanagasabai Rajaraman Institute for Infocomm Research, Singapore [email protected], and [email protected] Extracting useful information from the World Wide Web is increasingly becoming a key task in present-day bioinformatics systems. Wrappers, tools commonly used to address this problem, are tedious to create and maintain since each web site often uses a custom layout and this layout may change unpredictably. To tackle this situation, several systems have been proposed to automatically regenerate the wrappers when the target web sources change their layout. In this chapter, we review some of the state-of-the-art systems and discuss their strengths and weaknesses.
1. Introduction Today a number of biological information sources are available for public access on the World Wide Web (See, for example, the Google directory on Bioinformatics Online Services1). Extracting useful information from these web sources is increasingly becoming a key step in presentday bioinformatics systems that perform advanced inference tasks. Several web sources can be downloaded as structured databases, e.g. MeSH (http://www.nlm.nih.gov/mesh/) and Gene Ontology (http://www.geneontology.org/), and some are accessible as web services, e.g. XEMBL (http://www.ebi.ac.uk/xembl/). Extracting information from these sources is usually straight forward since they are nicely structured. There are many other web sources in the form of HTML-based web sites and query interfaces that can primarily be accessed interactively over the web, e.g. BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), Dragon Genome Explorer (http://research.i2r.a-star.edu.sg/DRAG0N/). Automated extraction from these web interfaces is not trivial. The reason is that the web, which is characterized by diverse authoring styles and content variations, 741
does not have a rigid and static structure like relational databases. A web page often contains a lot of extraneous information that makes the extraction task non-trivial. To ease this problem, some web sites offer content in a lightweight XML format called RSS (http://purl.org/rss) (short for "Really Simple Syndication"). Being a structured format, RSS simplifies the extraction problem. However, this is an evolving standard and, at present, RSS is adopted by only a few sites. A more popular approach for web information extraction is to use wrappers. Wrappers are custom routines employed to extract information from a semi-structured web source8. A wrapper can parse a web page and capture the semi-structured data into a structured format. For example, consider an imaginary web site listing TF-TF associations, shown in Figure 1. To extract the TF entities, we can propose a wrapper, say TFWrapper, using the delimiters {(B), {/B), (/), (//)}, where the first two define the left and right delimiters of the first TF and the last two define the corresponding delimiters for second TF. This wrapper can be used to extract the contents of the page in Figure 1, and of any other page, where the same delimiters define the content.
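A minimal sketch of such a delimiter-based wrapper is given below; the HTML snippet and the concrete delimiters (<B>, </B>, <I>, </I>) are assumptions standing in for the imaginary site of Figure 1 and are not meant to reproduce the exact notation used above.

import re

def lr_wrapper(page, delimiters):
    """Extract (TF1, TF2) tuples with a left-right delimiter wrapper.

    delimiters = (l1, r1, l2, r2): left/right delimiters of the first
    and the second attribute, in the spirit of TFWrapper above.
    """
    l1, r1, l2, r2 = (re.escape(d) for d in delimiters)
    pattern = re.compile(l1 + r"(.*?)" + r1 + r".*?" + l2 + r"(.*?)" + r2,
                         re.DOTALL)
    return pattern.findall(page)

# Hypothetical page listing TF-TF associations.
page = """
<LI><B>COUP-TF</B> interacts with <I>Sp1</I>
<LI><B>Puralpha</B> interacts with <I>YB-1</I>
<LI><B>Ssn6</B> interacts with <I>Tup1</I>
"""
tf_wrapper = ("<B>", "</B>", "<I>", "</I>")
print(lr_wrapper(page, tf_wrapper))
# -> [('COUP-TF', 'Sp1'), ('Puralpha', 'YB-1'), ('Ssn6', 'Tup1')]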
Fig. 1. Original Page
Fig. 2. Changed Page
Traditionally, wrappers are created by hand-coding, but this is a tedious process. Wrapper induction is a technique proposed for constructing wrappers semi-automatically or automatically, using example pages9. For example, in Figure 1, the wrapper TFWrapper could be induced automatically from examples of (TF 1, TF 2) tuples. Several systems such as RAPIER3, WHISK16, WIEN9, SoftMealy6, and STALKER15 have been proposed for wrapper induction. A number of commercial systems are also available, e.g. Fetch Technologies and VelocityScape (http://www.fetch.com and http://www.velocityscape.com). To induce wrappers, all these systems use either content-based features or landmark-based features. Content-based approaches3,16 use content/linguistic features like capitalization, presence of numeric characters, etc., whereas landmark-based approaches9,6,15 use delimiter-based extraction rules that rely on formatting features to delineate the structure of data. Thus, all these systems are sensitive to changes in the web page format. For example, suppose the site in Figure 1 changes to a new layout as in Figure 2. Now TFWrapper will extract the tuples as (Sp1, Puralpha), (YB-1, Ssn6), (Tup1, Nrg1) rather than (COUP-TF, Sp1), (Puralpha, YB-1), (Ssn6, Tup1), (Nrg1, Ssn6), and hence has become incorrect. Note that such layout changes may happen unpredictably, and when they do happen, new examples are required for learning the correct wrapper. If many web sites are being wrapped, this could require a lot of human effort. To tackle this situation, systems have been proposed to automatically regenerate the wrappers when the target web sources change their layout. In this chapter, we review some of these systems. 2. Information Extraction from Dynamic Web Sources A dynamic web source is one whose content, layout or both may change over time. Typically the content changes will be more frequent than the layout changes. For example, Kushmerick investigated 27 actual sites over a period of 6 months, and found that 44% of the sites changed their layout at least once during that period10. However, the layout may change drastically over a short time frame compared to content changes. Wrappers are hence susceptible to "break" on dynamic web sources, and it is important that wrappers are updated periodically to maintain continued extraction. This is called Wrapper
Maintenance11'12.
Wrapper maintenance consists of two key steps: Wrapper Verification and Wrapper Reinduction. Wrapper verification is the task of monitoring
the validity of data returned by the wrapper. If the site changes, the wrapper may extract nothing at all, or data that is not correct. The verification system will detect the data inconsistency and notify the operator or automatically launch a wrapper repair process. Wrapper reinduction refers to the task of repairing the extraction rules so that the new wrapper works on the changed page. Reinduction is a tougher problem because it requires new examples to reconstruct the wrapper, which may be expensive. A generic wrapper maintenance system is shown in Figure 3. Given a web source, a wrapper is first designed and used to extract data from the web source. The wrapper verification system monitors this wrapper continually. If the wrapper is found invalid, it triggers the wrapper reinduction system to regenerate the wrapper. This is accomplished through a new set of examples from the modified web source. The new wrapper will then be used to extract the data. The wrapper verification system will now monitor the new wrapper for validity, and the process repeats. In this way the wrapper is updated periodically so that data is always extracted correctly.
Fig. 3. A Wrapper Maintenance System.
Several systems for wrapper maintenance have been proposed in the literature. We survey some of the important ones in the next section.
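Before surveying individual systems, the following sketch puts the loop of Figure 3 into code; the verification test (checking that at least one previously known tuple reappears) and the one-example induction step (taking fixed-width character windows around the example values) are deliberately simplistic stand-ins, chosen by us for illustration, for the components reviewed in the next section.

import re

def extract(page, delims):
    """Extract (a, b) tuples with left-right delimiters (l1, r1, l2, r2)."""
    l1, r1, l2, r2 = (re.escape(d) for d in delims)
    return re.findall(l1 + r"(.*?)" + r1 + r".*?" + l2 + r"(.*?)" + r2,
                      page, re.DOTALL)

def induce_lr(page, example, width=4):
    """Guess delimiters from one (a, b) tuple found on the page.
    The fixed character window is an arbitrary simplification."""
    a, b = example
    i, j = page.index(a), page.index(b)
    return (page[max(0, i - width):i], page[i + len(a):i + len(a) + width],
            page[max(0, j - width):j], page[j + len(b):j + len(b) + width])

def verify(tuples, known_tuples):
    """Crude verification: at least one previously seen tuple reappears."""
    return any(t in tuples for t in known_tuples)

def maintain(page, delims, known_tuples):
    """One round of the Figure 3 loop: extract, verify, re-induce if broken."""
    tuples = extract(page, delims)
    if not verify(tuples, known_tuples):
        retained = next((t for t in known_tuples
                         if t[0] in page and t[1] in page), None)
        if retained is not None:                 # wrapper reinduction
            delims = induce_lr(page, retained)
            tuples = extract(page, delims)
    return delims, tuples

In a real system the verification and reinduction components would be far more robust, but the division of labour is the same: verification decides whether the wrapper still works, and reinduction rebuilds it from whatever examples can still be located.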
3. Survey of Wrapper Maintenance Systems 3.1. Wrapper Verification Methods 3.1.1. RAPTURE Kushmerick10 proposed a method for wrapper verification that uses a statistical approach. The method relies on obtaining heuristics for the new page and comparing them against the heuristic data for pre-verified pages to check whether the page is correct. First, it estimates the tuple-count distribution parameters for the pre-verified pages, i.e. the mean tuple number and its standard deviation, as well as the feature value distribution parameters for each attribute in the pre-verified pages. For any new page, the same tuple-count and feature value distributions are computed. These values are compared against the values for the pre-verified pages. Based on this, an overall verification probability is computed. This probability is compared against a fixed threshold to determine whether the wrapper is correct or incorrect. RAPTURE uses very simple numeric features to compute the probabilistic similarity measure between a wrapper's expected and observed output. In experiments on numerous actual Internet sources, RAPTURE has been observed to perform substantially better than standard regression testing approaches. For the most part, this method uses a black-box approach for measuring overall page metrics, and hence it can be applied in any wrapper generation system for verification. However, this approach is ineffective if the statistical features are insufficient to capture the variations. 3.1.2. Forward-Backward Scanning Algorithm The forward-backward scanning approach was suggested by Chidlovskii4 under the assumption of "small change". The assumption is that pages rarely undergo a massive or sweeping change; more often than not there is a slight local change or concept shift. This method tackles verification with classifiers built using content features of the extracted information. The approach extends conventional forward wrappers with backward wrappers to create a multi-pass wrapper verification approach. In contrast to forward wrappers, backward wrappers scan files from the end to the beginning. The backward wrapper is similar in structure to the forward wrapper, and can run into errors when the format changes. However, because of the backward scanning, it will fail at positions different from where the forward wrapper would fail.
This typically works in the case of errors generated by typos or missing close tags in HTML pages, and helps to fine-tune the answers further. The forward-backward scanning is unique and has been found to be a robust approach to wrapper verification, especially for missing attributes and tags. Tested on 18 websites, this method has a reported accuracy of 95.3% when using the forward-backward wrappers with the context classifier. However, this method is ineffective if no good content features can be found. 3.2. Wrapper Reinduction Methods
3.2.1. ROADRUNNER ROADRUNNER5 is a method that uses unsupervised learning to generate the wrappers. Pages from the same website are supplied, and a page comparison algorithm is used to generate wrappers based on similarities and mismatches. The algorithm performs a detailed analysis of the HTML tag structure of the pages to generate a wrapper that minimizes mismatches. The system employs wrappers based on a class of regular expressions called Union-Free Regular Expressions. The extraction process compares the tag structure of the sample pages and generates regular expressions that handle the structural mismatches found between them. In this way, the algorithm discovers structures such as tuples, lists and variations. An approach similar to ROADRUNNER was used by Arasu et al.2. They propose automatically inducing the underlying template from a set of template-generated sample pages with the same structure taken from data-intensive web sites, and extracting the values encoded in them. However, this does not handle multiple values listed on one page. Since this method needs no labeled examples to learn the wrappers, it has an obvious strength: it provides an alternative way to deal with the wrapper maintenance problem, especially in cases where no labeled examples are available. On the other hand, since ROADRUNNER searches a larger wrapper space, the algorithm is potentially inefficient. Also, the unsupervised learning method gives little control to the user. The user might want to make some refinements and only extract a specific subset of the available tuples. In such cases, some amount of user input is clearly necessary to extract the correct set of tuples. Another problem of this approach is the need for many examples to learn the wrapper accurately13.
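As a rough illustration of the page-comparison idea (a drastic simplification of ROADRUNNER's union-free regular expressions), the sketch below aligns two token sequences and turns the positions where they disagree into wildcard slots; the sample rows and gene names are hypothetical.

def infer_template(page_a, page_b):
    """Turn token positions where two equal-length pages disagree into slots."""
    tokens_a, tokens_b = page_a.split(), page_b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("this sketch assumes pages of equal token length")
    return [ta if ta == tb else None for ta, tb in zip(tokens_a, tokens_b)]

def extract_with_template(template, page):
    """Return the tokens of `page` that fall into the wildcard slots."""
    return [tok for slot, tok in zip(template, page.split()) if slot is None]

a = "<TR> <TD> BRCA1 </TD> <TD> chr17 </TD> </TR>"
b = "<TR> <TD> TP53 </TD> <TD> chr17 </TD> </TR>"
template = infer_template(a, b)
print(template)
print(extract_with_template(template, "<TR> <TD> MYC </TD> <TD> chr8 </TD> </TR>"))

With only two sample pages the chromosome field happens to agree (chr17 in both) and is wrongly frozen into the template, so chr8 is not extracted from the third page; this echoes the remark above that such methods may need many examples to learn the wrapper accurately.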
3.2.2. DataProg Knoblock et al.7 developed a method called DataPro for repairing wrappers in the case of small mark-up changes; it detects the most frequent patterns in the labeled strings, and these patterns are searched for in a page when the wrapper is broken. Lerman et al.12 extended this content-centric approach for verification and re-induction in their DataProg system. The system takes a set of labeled example pages and attempts to induce content-based rules so that examples from new pages can be located. Wrappers are verified by comparing the patterns of the returned data with the learned statistical distributions. When a significant difference is found, an operator can be notified or the wrapper repair process can be automatically launched. Using this, they locate the examples on the new page, which are passed to a wrapper induction algorithm to re-induce the wrapper. This approach is similar to the approaches used by content-centric wrapper tools3,16. The class of wrappers described by DataProg is very expressive, since they can handle missing and rearranged attributes. This approach applies machine learning techniques to learn a specific statistical distribution of the patterns for each field, as opposed to the generic approach used by Kushmerick10. For many cases like news, scientific publications or even author names, this approach does not work well, since there are no fixed content-based rules (alphanumeric, capitalized, etc.) which can be identified to separate them from other content on the page. 3.2.3. Schema-guided Wrapper Maintenance (SG-WRAM) SG-WRAM is a method that utilizes data features, such as syntactic features and annotations, for reinduction. The approach is based on the assumption that some features of the desired information in the previous document remain the same, e.g. syntactic features (data types), hyperlink features (whether or not a hyperlink is present) and annotation features (any string that occurs before the data field) will be retained. It is also assumed that the underlying schemas will still be the same and are preserved in the changed HTML document. These features help the system to identify the locations of the content in the modified pages through tag structure analysis. Internally, the system computes a mapping from each of the above fields to the HTML tree and generates the extraction rule. For simple changes in pages, this method depends on the syntactic and annotation features, but in case the web site has undergone a structural change, the method uses the schema to locate structural groups and use
them to extract data. This method is limited by the assumption that data on the same topic will always be grouped together according to the user-defined schema, and will be retained even when the page is changed. If the data schema or the syntactic and tag structure changes, then this method is not effective. 3.2.4. ReInduce Algorithm ReInduce is a recent wrapper reinduction algorithm proposed by Mohapatra et al.14. This algorithm is based on the observation that many dynamic web sites are incrementally updated, i.e. the content changes are small over a sufficiently small time frame. For such web sites, it is possible to find a small time interval such that some of the tuples are retained even if the layout gets modified. This idea is used to locate new examples to reinduce the wrapper. However, the examples located on the new page may be very few, and so the wrapper induction algorithm should be able to learn from a small number of examples. Mohapatra et al. have proposed an induction algorithm called Induce that has this property. This algorithm employs the Left-Right (LR)9 wrapper representation. An adapted version of RAPTURE10 called Verify has been used for verification. The reinduction algorithm works as follows. Once a layout change has been detected, ReInduce attempts to locate examples in the new page and, with the located examples, Induce is invoked to learn a new wrapper. Thus, the algorithm needs to be provided with examples only once, and thereafter the wrapper will be automatically reinduced whenever the layout changes. In experiments, ReInduce has been observed to achieve close-to-perfect performance, often using very few examples. However, this algorithm is limited by the LR representation. 4. Conclusion Automated extraction of information from dynamic biological web sources is an important, though difficult, problem. To tackle it, the key problem to be solved is Wrapper Maintenance, which consists of two steps: Wrapper Verification and Wrapper Reinduction. Wrapper verification can be treated as a black box and can be applied in any wrapper maintenance system. In this sense, it can be handled independently of wrapper reinduction. In contrast, the latter is a tougher problem. We described several state-of-the-art approaches to wrapper maintenance that have been shown to handle many real web sites. However, they exhibit trade-offs in terms of
page layout complexity, content regularity and computational complexity. Hence, much remains to be done to tackle fully automated extraction from many complex, loosely structured web sources. Research along this direction can complement the efforts towards universal adoption of XML and Web Services, for example, as a means for conversion of non-XML based web sources to XML.
References 1. Google Directory: Bioinformatics Online Services. http://directory.google.com/Top/Science/Biology/Bioinformatics/Online_Services/. 2. A. Arasu and H. Garcia-Molina, Extracting structured data from web pages. In Proceedings of the 2003 A CM SIGMOD international conference on Management of data, San Diego, California. ACM Press, 337-348 (2003). 3. M. E. Califf and R. J. Mooney, Relational learning of pattern-match rules for information extraction. In Working Notes of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, Menlo Park, CA. AAAI Press, 6-11 (1998). 4. B. Chidlovskii, Automatic repairing of web wrappers. In Proceeding of the third international workshop on Web information and data management, Atlanta, Georgia, USA. ACM Press, 24-30 (2001). 5. V. Crescenzi, G. Mecca and P. Merialdo, Roadrunner: Towards automatic data extraction from large web sites. In Proceedings of International Conference on Very Large Data Bases (VLDB 01), 109-118 (2001). 6. C. N. Hsu and M. T. Dung, Generating finite-state transducers for semistructured data extraction from the web. Information Systems 23(8), 521538 (1998). 7. C. A. Knoblock, et al., Accurately and reliably extracting data from the web: a machine learning approach. Intelligent exploration of the web, 275-287 (2003). 8. S. Kuhlins and R. Tredwell, Toolkits for generating wrappers - a survey of software toolkits for automated data extraction from web sites. (2003). 9. N. Kushmerick, (2000). Wrapper induction: Efficiency and expressiveness. Artificial Intelligence 118, 15-68 (2000). 10. N. Kushmerick, Wrapper verification. World Wide Web 3(2), 79-94 (2000). 11. N. Kushmerick, and B. Thomas, Adaptive information extraction: A core technology for information agents. Intelligent Information Agents R&D in Europe: An AgentLink perspective, (2002). 12. K. Lerman, S. Minton and C. Knoblock, Wrapper maintenance: A machine learning approach. Journal of Artificial Intelligence Research 18, 149-181 (2003). 13. X. Meng, D. Hu and C. Li, Schema-guided wrapper maintenance for webdata extraction. In Proceedings of International on Web Information and Data Management(WIDM 03), 1-8 (2001).
14. R. Mohapatra, K. Rajaraman and S. Y. Sung, Efficient wrapper reinduction from dynamic web sources. To appear in Proceedings of International Conference on Web Intelligence 2004, (2004). 15. I. Muslea, S. Minton and C. Knoblock, STALKER: Learning extraction rules for semistructured text. In Proceedings of AAAI-98 Workshop on AI and Information Integration, Technical Report WS-98-01, Menlo Park, California. AAAI Press, (1998). 16. S. Soderland, Learning information extraction rules for semi-structured and free text. Machine Learning 34(1-3), 233-272 (1999).
CHAPTER 14 COMPUTER AIDED DESIGN OF SIGNALING NETWORKS
Hao Zhu and Pawan K. Dhar Systems Biology Group, Bioinformatics Institute, Singapore [email protected]
1. Introduction
The limitations of purely theoretical and experimental research have made computation the third and indispensable approach in both the physical sciences and biology. A key role of computational systems biology is computer-aided deciphering and design of signaling pathways in cells. The aim of CAD biology is to analyze, comprehend and control cellular processes and reprogram them for specific biomedical applications. Computational molecular biology, which is sequence-data oriented and algorithm centered, has undergone substantial evolution over the last decade, in parallel with the emergence and development of genomics. The key factors that have ensured the success of computation in genome biology are (1) sufficient and clean data; (2) clear objectives and strategies; (3) effectiveness of the reductionistic approach; and (4) validated results. These features, however, do not exist in post-genome biology, in which modeling is the key strategy to decipher the function of genes. The wealth of methods and tools developed in engineering needs to be customized for biological-modeling applications, as the issues and challenges are vastly different from the ones we come across in the physical sciences10,20. This chapter briefly introduces some of these issues and describes our efforts in this direction. 2. Signaling Pathways: A Prickly Proposition The first feature of signaling pathways is their emergent property. In the traditional sense, a pathway describes a set of stereotyped biochemical
reactions occurring in a group of genes, proteins and other molecules. The exact structure of a pathway is context-dependent and even concept-dependent. Stereotyped reactions indicate that the order of reactions is neither random nor hardwired. In intercellular signaling pathways, in addition to a small group of core players, i.e. the receptors and ligands, there are transducers that transform signals from one form into another, transporters that carry signal molecules from one place to another and mediators that modify these molecules. These players, upstream, downstream or cross-stream of the core elements, together define the pathway-centric context of the signaling process. With the identification of more genes, pathway repositories seem to be on an expansion spree. The dynamic contribution of non-core partners brings out differential behavior of a pathway in varying molecular contexts, making the analysis and modeling of these events challenging. In fact, some of the temporally and spatially assembled pathways may be astonishingly complex11. Signaling occurs in networks rather than in pathways, and these networks exhibit emergent behavior such as the spatiotemporal evolution of structure in cells as cells take different developmental directions. If signaling processes were represented by a hardwired group of molecules described mathematically by ordinary differential equations (ODEs), e.g. dx_i/dt = Σ_j a_ij x_j, no single group of ODEs could tell us how the system evolved from state 1 to state 2. More generalized forms, like the superset of equations for the system at different stages, do not help either, because we cannot be sure when and which molecules interact with which, and both the number of molecules and the number of transient states for a real system may be very large. The second feature is the context dependency of signaling semantics. The function of a pathway is based upon its structure and components, yet its roles are context dependent. A pathway, like the Notch or Wnt pathway, can take diverse, even reverse roles in organogenesis and morphogenesis2,3. Eye development in the Drosophila fly indicates that the outcome is not the result of the signal itself, but of the combined impact of the cellular and molecular environment that interprets the signal and bestows specific semantics on it. To some extent the receptor-ligand binding appears with only trivial differences in the signal transformation. The functional divergence may arise from the temporal and spatial binding patterns of a limited set of molecules in the system5. Signaling pathways are repeatedly used in different combinations. The development of organs, even as simple as the compound eye of the fly, needs the correct orchestration of nearly all major pathways18. This indicates that it is not the number of pathways and sig-
Fig. 1. The temporal change of pathway topology. Dashed arrows are context dependent reactions.
nals, but the interactions among them that determine the complexity and characters of the developed organs. A small group of conserved pathways not only enables the development of different organs in an organism, but is also used for deciding body plans12,14. Thus, signaling and mis-signaling are actually two sides of a coin, and malignancy may be considered an abnormal phenotypic outcome of mis-signaling15. The structural and functional pleiotropy of pathways calls for effective simulation and identification of the protocols and principles that molecules follow but that cannot be addressed by traditional experiments alone13. Thirdly, signaling, which determines the fate of cells and implements the function of cells, is in essence different from metabolism, which supports the living of cells. Metabolic pathways are deterministic systems with little change in cells of different origins, types and states, and have a rich modeling history19. In contrast to this, signaling pathways are emergent processes and follow yet-to-be-understood rules. A brief comparison between metabolic and signaling pathways, as shown in Table 1, may be informative. Finally, the much-less-than-expected number of genes makes it hard to visualize the origin of developmental processes, anomalies and diseases. The complex behavior of the system can be understood by studying the collective systemic behavior of the objects, instead of studying gene and protein components in isolation. Thus, linking the local gene-mRNA-protein context with the global impact of their presence, influence and exit from the system forms a great challenge of systems biology6. We list two domains where signaling modeling may provide crucial help.
Table 1. A brief comparison between signaling pathways and metabolic pathways.

                               Metabolic Pathways                    Signaling Pathways
Cell fate specification        No                                    Yes
Specificity of gene function   High                                  Low
Pathway / Network              Fixed network, time and tissue        Emergent network, time and tissue
                               independent                           dependent
Reactions                      Transform materials                   Transform signals
Protein function               Anabolism and metabolism              Cleavage and phosphorylation
Dominant description           Ordinary differential equations       ?
Follows rules of               Chemistry and biochemistry            ?
Optimum mathematical tools     Flux balance analysis                 ?
The first is signaling for morphogenesis. How slightly different pathways lead to distinctively dissimilar morphogenesis is of great interest. Capturing the full scenario of the dynamical process of molecular interaction in developing cells may be beyond the realm of experimentation. Considering that the cell is hardware that runs the genetic software encoded in the genome, pertinent questions that can be addressed using the computational systems biology approach are: (1) How many alternative genetic programs exist to produce a given phenotype, including those malicious ones that give rise to tumorigenesis? (2) How do cells manage running and checking programs in the presence or absence of a control or master program piece? (3) Is there a clear functional boundary between viable and unviable programs and reprogramming? (4) How are reprogrammed programs inherited by offspring cells? (5) How many determinant points are there in running a genetic program, and where are they located? Answers to these questions may help us fundamentally understand the origin and progression of cancer and other developmentally related diseases, which are long-standing problems in biology and medicine. 3. Challenges of Signaling Modeling Studying a system as a whole or as a group of components needs different strategies, methods and tools. Metabolic activities can be reduced to biochemistry and thus studied analytically, but signaling activities apparently cannot. The question therefore is: can they be described by some sort of informatics and studied synthetically using computer experiments? To investigate signaling networks systematically and formally, we face a few challenges, as discussed next.
Formalizing signaling systems. Different methods have been proposed to describe signaling (Table 2) and to address various aspects of signaling pathways and networks7. For a molecular system exhibiting certain properties, an underlying question is: are these properties generic, or do they hold only for specific modeling methods and under specific simulation conditions? Identifying and measuring complexity. The concept of complexity is strictly defined and precisely analyzed in mathematics and physics, yet it lacks an operational definition in biology. The popular measure of complexity for dynamical systems, i.e. computational complexity (for example, the complexity of a sequence can be inferred from which finite state machine can produce it), is unsuitable for biological systems1. Recently, the complexity of networks has attracted researchers in different fields4,16. The evolution and emergent properties of molecular networks have strong implications for their roles in signaling. A tough task is to harmonize different attributes (emergent behaviors, stochastic traits and re-programmability) to make a comprehensive and fundamental understanding attainable.

Table 2. Proposed methods for the formal description of signaling pathways/networks.

Methods                  Features
Differential equation    Fine-grained description, fixed topology, quantitative and continuous computation, heavy computation, quantitative properties of small and medium systems, deterministic and predictable.
Boolean network          Coarse-grained description, evolvable topology, qualitative and discrete computation, light computation, global properties of large systems, deterministic and predictable.
Rule                     Highly coarse-grained description, evolvable topology, highly qualitative and discrete computation, light computation, global properties of large systems.
Automata theory          Qualitative and discrete computation, natural parallelism, internally programmed autonomy.
Stochastic systems       Very fine-grained description, fixed topology, continuous computation, very heavy computation, random properties of small and medium systems, stochastic and unpredictable.
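To make one of the coarse-grained formalisms in Table 2 concrete, here is a minimal synchronous Boolean network sketch; the three-node topology and update rules are invented for illustration and do not correspond to any particular pathway.

def step(state, rules):
    """Synchronous update: every node reads the previous global state."""
    return {node: rule(state) for node, rule in rules.items()}

# Toy three-node network: receptor R activates kinase K, which activates
# transcription factor T; T in turn represses R (negative feedback).
rules = {
    "R": lambda s: not s["T"],
    "K": lambda s: s["R"],
    "T": lambda s: s["K"],
}

state = {"R": True, "K": False, "T": False}
for t in range(8):
    print(t, state)
    state = step(state, rules)

The negative feedback from T to R produces a sustained oscillation of the global state, the kind of qualitative, topology-level behavior this class of models is suited to explore.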
Time cost of simulation. When modeling large molecular systems with differential equations, the major contributor to time consumption may not be the number of equations, but the size of the time step and the number of iterations. The smallest Δt demanded by the fastest reactions dictates the performance of the system if no runtime adaptive time-step method is adopted. When stochastic descriptions are used to precisely
depict molecular activities, because of intrinsic and extrinsic noise17 in the system, the time expense soars. Non-cell-autonomous signaling favors multicellular modeling, making the burden heavier. We are currently working on these issues, and our preliminary results indicate that clever hybrid algorithms running in parallel may significantly offset the temporal cost of simulations. Validation and verification. Building models of well understood signaling systems and then using the validated models to study unknown systems is one approach to understanding the signaling process. Another way is to model systems with distinct phenotypes, which can be used to validate the model. In view of these challenges, a modeling platform should possess the following capabilities. Multiple representation and algorithm support. A platform supporting flexible and different formalisms and algorithms would be critical. This includes (1) modeling support: abstracting different cellular processes requires different methods and data about the target systems; (2) simulation support: efficient simulation needs various facilitating algorithms, an important kind being parameter estimation algorithms; (3) analysis support: a fruitful analysis may need a variety of qualitative and quantitative technologies and methods. Flexible and extensible software architecture. The architecture of modeling platforms should be flexible and extensible to adopt new modeling methods for a variety of data. Support for database access and parallel computation. Since a large number of gene/protein databases have been built, convenient and transparent access to them is required. 4. The Goals and Features of Cellware Cellware is a multi-algorithmic modeling platform developed to address the diverse requirements of modeling cellular processes8,9. It supports hybrid, large-scale modeling with different methods, such as stochastic and deterministic ones, providing a seamless description of cellular processes. Cellware consists of: (1) a two-dimensional graphic model editor, (2) a biological specification subsystem, (3) an I/O subsystem and (4) a simulation subsystem. Its features include: (1) multi-platform support, (2) a graphical user interface (see Figure 2) and (3) an extensive algorithms library. Simulation with Cellware may be performed in the batch mode (results are saved in a file)
Table 3. Some of the computational tools for modeling and simulation.

                        Gepasi                          E-Cell                              Cytoscape
Target systems          Metabolic network               Gene regulatory network, signal     Gene regulatory network,
                                                        transduction, metabolic network     expression profiles
Simulation algorithms   Deterministic ODE solvers       Deterministic ODE solvers,          -
                                                        stochastic algorithms
Analysis algorithms     Parameter estimation,
                        optimization, parameter sweep
Visualization           Simple flexible GUI with        Flexible interface with both        Rich user interface with
                        text based input and            scripting and GUI                   pathway layout
                        efficient plotters
Software architecture   Stand-alone                     Stand-alone, Cluster                Web based
Modeling language       SBML support                    Proprietary EML                     SBML support
External database       No                              No                                  BIND, GO, TRANSFAC
  bindings
or in the interactive mode (results plotted in a new window). To enhance the productivity of modeling, a grid version has been developed to use grid computing to more effectively solve parameter estimation and to speed up simulation. 5. Concluding Remarks Computer Aided Design (CAD) of intra-cellular and extra-cellular networks is turning out to be a key armor of researchers, who have hitherto heavily relied on the wet-bench approach for a long time. The CAD approach is precise, simple and intuitive and makes use of standard biochemical and physical principles. In the context of signaling networks, the validated in silico models can be subjected to a variety of thought experiments using standard computational resources. The impact of cell part plug-ins and plug-outs can be easily assayed in computational models that normally take several years of experimental iterations. The in silico approach can be extended beyond the routine biochemical analysis and even used to check phenotypic outcome of the rewiring process in the cells. Using CAD biology it is feasible to observe topological evolution of networks and understand their contextual relationships which can be re-designed for various scientific and industrial uses. The CAD signaling approach takes
Fig. 2. Cellware: an integrated modeling and simulation environment.
us a step closer towards the recently launched field of Synthetic Biology (http://www.syntheticbiology.org). Synthetic biology aims at creating a library of un-natural genes and proteins. To take synthetic biology from the drawing board to the wet lab, we predict that CAD biology will emerge as the key strategy. In addition, CAD biology can also test optimal experimental conditions (e.g. growth media, culture conditions, etc.) that can support the formation of new cell parts and devices. Another key application of CAD biology is its use in understanding cellular processes during embryo development. The questions we are currently addressing include: how can signaling pathways with slightly different topologies control diverse developmental processes in different organisms? Given a set of key molecular components, how is the diversity and specificity of biochemical processes maintained with statistical accuracy? How do the formation of complexes, their biological lifetime within the system and their controlled destruction affect various processes both within the cell and among cells? Multi-cellular modeling incorporating all the fundamental elements of single-cell modeling may be a solution to this and the next wave of biological modeling.
References 1. C. Adami, What is complexity. BioEssays 24, 1085-1094 (2002). 2. S. Artavanis-Tsakonas, M. D. Rand and R. J. Lake, Notch signaling: cell fate control and signal integration in development. Science 284, 770-776 (1999). 3. E. B. Baker, Notch signaling in the nervous system. Pieces still missing from the puzzle. BioEssays 22, 264-273 (2000). 4. U. S. Bhalla and R. Iyengar, Emergent properties of networks of biological signaling pathways. Science 283, 381-387 (1999). 5. C. A. Brennan and K. Moses, Determination of Drosophila photoreceptors: timing is everything. Cell. Mol. Life Sci. 57, 195-214 (2000). 6. D. Davidson and R. Baldock, Bioinformatics beyond sequence: Mapping gene function in the embryo. Nat. Rev. Genet. 2, 409-417 (2001). 7. H. de Jong, Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. 9, 67-103 (2002). 8. P. Dhar, et al., Cellware: a multi-algorithmic software for computational systems biology. Bioinformatics 20, 1319-1321 (2004). 9. P. Dhar, et al., Grid Cellware: The first Grid-enabled tool for modeling and simulating cellular processes. Bioinformatics. In the press. (2004). 10. T. Ideker, T. Galitski and L. Hood, A new approach to decoding life: Systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343-372 (2001). 11. K. W. Kohn, Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol. Biol. Cell 10, 2703-2734 (1999). 12. N. Niwa, Y. Horomi and M. Okabe, A conserved developmental program for sensory organ formation in D.melanogaster. Nat. Gent. 36, 293-297 (2004). 13. P. Nurse, Reductionism: The ends of understanding. Nature 387, 657-657 (1997). 14. A. Pires-daSilva and R. J. Sommer, The evolution of signalling pathways in animal development. Nat. Rev. Genet. 4, 39-49 (2003). 15. D. Radisky, C. Hagios and M. Bissell, Tumor are unique organs defined by abnormal signaling and context. Sem. Can. Biol. 11, 87-95 (2001). 16. S. Strogatz, Exploring complex networks. Nature 410, 268-276 (2001). 17. P. S. Swain, M. B. Elowitz and E. D. Siggia, Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci. USA 99, 12795-12800 (2002). 18. J. E. Tresman, and U. Heberlein, Eye development in Drosophila: Formation of the eye field and control of differentiation. Curr. Topics in Dev. Biol. 39, 119-159 (1998). 19. E. O. Voit, Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. Cambridge University Press, (2002). 20. H. Zhu, S. Huang, and P. Dhar, The next step in systems biology: simulating the temporospatial dynamics of molecular network. Bioessays 26, 68-72 (2004).
CHAPTER 15 ANALYSIS OF DNA SEQUENCES: HUNTING FOR GENES
Artemis G. Hatzigeorgiou Center for Bioinformatics, Department of Genetics, Medical School Department of Information Science, School of Engineering, University of Pennsylvania [email protected] Molly S. Megraw Center for Bioinformatics, Department of Genetics, Medical School University of Pennsylvania [email protected]
1. Introduction The rate of innovation in molecular biology is breathtaking. The crudest measure of progress, the size of nucleic acid databases, has an exponential growth rate. To make a parallel with information technology, both the size of genomic databases and the power of computers have been doubling at about the same rate. Consequently, a new area of expertise is being created, combining the biological and information sciences. Finding relevant facts and hypotheses in huge databases is becoming essential to biology. Apart from the obvious data management applications of bioinformatics, computational biology research is divided into two main schools: • the analysis and interpretation of data and • the development of new algorithms and statistics1. In other words, bioinformatics incorporates into a broad interdisciplinary field both conceptual and practical tools for the understanding, generation, processing, and propagation of biological information2. There are several excellent books for further reading on both practical3 and theoretical4,5,6
aspects of computational biology. This chapter gives an example of the mathematical structure analysis of coding regions on genomic and complementary DNA (cDNA) - a special case of nucleic acid sequences. It starts with the biological background of DNA and leads on to a detailed algorithmic analysis of its structure. At the end of the chapter some results are given and compared with the prediction results of other programs. 2. DNA and Genes 2.1. DNA In humans, as in other higher organisms, a DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder whose sides, made of sugar and phosphate molecules, are connected by rungs of nitrogen-containing chemicals called bases. Each strand is a linear arrangement of repeating similar units called nucleotides, which are each composed of one sugar, one phosphate, and a nitrogenous base (Figure 1). Four different bases are present in DNA: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). The two DNA strands are held together by weak bonds between the bases on each strand, forming base pairs (bp). Strict base-pairing rules are adhered to: • adenine will pair only with thymine (an A-T pair) and • cytosine with guanine (a C-G pair). Genome size is usually stated as the total number of base pairs; the human genome contains roughly 3 billion bp. The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence. In other words, a DNA sequence can be described as a very long word over a four-letter alphabet A = {A, C, G, T}, a letter for every base. 2.2. Coding Genes Each DNA molecule contains many genes, the basic physical and functional units of heredity. The number of human genes is still an unclear issue. It is estimated to be between 40,000 and 120,000. The exact answer is expected to be found by 2003, when the whole human genome will have been sequenced and analyzed7. Human genes vary widely in length, often extending over thousands of bases, but less than 5% of the genome is known to include the protein
Fig. 1. The four nitrogenous bases of DNA are arranged along the sugar-phosphate backbone in a particular order (the DNA sequence), encoding all genetic instructions for an organism. Adenine (A) pairs with Thymine (T), while Cytosine (C) pairs with Guanine (G). The two DNA strands are held together by weak bonds between the bases.
For the information within a gene to be expressed, a complementary RNA strand is produced from the DNA (a process called transcription). This strand is a molecule called messenger ribonucleic acid (mRNA), similar to a single-stranded DNA. It is also made up of four nucleotides: Adenine (A), Cytosine (C), Guanine (G) and Uracil (U), which replaces Thymine (T). Before the mRNA serves as a template for protein synthesis, the introns are removed by the splicing machinery. Then the mRNA arrives at the ribosome, a protein-synthesizing machine, which reads the instructions from the mRNA to build a protein8.
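The base-pairing rules of Section 2.1 and the transcription step just described can be made concrete with a small sketch. The following Python fragment is illustrative only (the example sequence and function names are not from this chapter, and strand orientation is handled in a deliberately simplified way); it pairs a complementary DNA strand and an mRNA against a toy template strand.

```python
# Minimal illustration of DNA base pairing and transcription.
# Assumes the input is the template (antisense) strand written 3'->5',
# so the mRNA can be read off left to right; a real implementation would
# track strand orientation explicitly.

PAIRING_DNA = {"A": "T", "T": "A", "C": "G", "G": "C"}   # A-T and C-G rule
PAIRING_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}   # in RNA, A pairs with U

def complement(dna: str) -> str:
    """Return the complementary DNA strand (A<->T, C<->G)."""
    return "".join(PAIRING_DNA[base] for base in dna)

def transcribe(template: str) -> str:
    """Pair an mRNA against a DNA template strand (U appears instead of T)."""
    return "".join(PAIRING_RNA[base] for base in template)

if __name__ == "__main__":
    template = "TACATGGTC"              # toy template strand
    print(complement(template))         # coding DNA strand: ATGTACCAG
    print(transcribe(template))         # mRNA:              AUGUACCAG
```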
2.3. The Genetic Code and Proteins

Proteins are large, complex molecules made up of long chains of subunits called amino acids. There are twenty different kinds of amino acids. Each amino acid is encoded through triplets of nucleotides called codons (the genetic code). Out of the 4 nucleotides, 64 (4³) different codons can be built. 61 of them encode the 20 amino acids. Three codons (TAA, TAG and TGA) cause protein translation to cease. These are known as stop codons. Since there are far more codons than amino acids, the code is degenerate: many codons that differ only in the third position base code for the same amino acid. A translation example of nucleotides (codons) to amino acids is:
ATG  TAC  TGC  GGC  ...
Met  Tyr  Cys  Gly  ...
A shift of one letter in reading the same nucleic acid sequence results in a very different amino acid sequence:
A  TGT  ACT  GCG  GC...
   Cys  Thr  Ala  ...
The phase of codon reading is called the reading frame. There are three different frames in one direction and three more reading frames on the complementary strand.

2.4. Structure of Coding Genes

After the copying mechanism (transcription), a gene region of the DNA contains parts with coding information (exons) and parts with noncoding information (introns). The introns are removed by the splicing machinery. Proteins bind near the intron-exon borders (acceptor sites) and the exon-intron borders (donor sites). The remaining parts consist only of exons and are called messenger RNA (mRNA). The only exons which contain non-coding regions are the first and the last exon. The coding part starts with a specific codon (AUG), which is also called the start codon. The coding region stops with one of the stop codons. Figure 2 shows the structure of a prototypical gene.
Fig. 2. The structure of a gene. The promoter region contains several small characteristic sequences that attract different elements of the transcription machinery and initiate the copying of the gene under distinct environmental conditions. The parts of the transcribed sequence that are finally translated into a protein sequence are called exons. Some parts of the transcribed sequence are cut out before translation. The machinery cutting out these intron sequences is controlled by the occurrence of sequence motifs at the beginning of an intron, called the donor site, and at the end of an intron, called the acceptor site.
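The codon translation and reading-frame shift illustrated in Section 2.3 can also be expressed as a short program. The Python sketch below is illustrative only: it uses a small subset of the genetic code (a real tool would carry the full 64-codon table), translates a sequence in a chosen frame, and stops at the first stop codon, as a coding region does.

```python
# Translating codons in different reading frames.
# Only a small subset of the standard genetic code is included here to keep
# the sketch short; unknown codons are reported as "???".
CODON_TABLE = {
    "ATG": "Met", "TAC": "Tyr", "TGC": "Cys", "GGC": "Gly",
    "TGT": "Cys", "ACT": "Thr", "GCG": "Ala",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(dna: str, frame: int = 0) -> list:
    """Read codons starting at position `frame` (0, 1 or 2) and translate them.

    Translation stops at the first stop codon, mirroring how the coding
    region of a gene ends with TAA, TAG or TGA.
    """
    amino_acids = []
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE.get(codon, "???")
        if aa == "Stop":
            break
        amino_acids.append(aa)
    return amino_acids

if __name__ == "__main__":
    seq = "ATGTACTGCGGC"
    print(translate(seq, frame=0))   # ['Met', 'Tyr', 'Cys', 'Gly']
    print(translate(seq, frame=1))   # ['Cys', 'Thr', 'Ala']  (shifted by one base)
```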
2.5. Complementary DNA (cDNA)

In the laboratory, the mRNA molecule can be isolated and used as a template to synthesize a complementary DNA (cDNA) strand, which can then be used to locate the corresponding genes on a chromosome map. A cDNA starts with an untranslated region (UTR), followed by a coding region, and ends again with another UTR. The beginning of the cDNA is also called the 5' end and the end of the cDNA the 3' end. The coding region always begins with a start codon and ends with a stop codon. A start codon is a nucleotide triplet (ATG) that translates to the amino acid methionine. The same triplet (ATG) can also occur in the middle of a coding frame. In this case it just encodes methionine and does not act as a start codon. A start codon leads the coding frame, which is always part of an open reading frame (ORF). An ORF is a frame that translates into amino acids without the interruption of stop codons. In other words, an open reading frame is a frame between two stop codons, always read in steps of three nucleotides. The first nucleotide of the start codon defines the translation initiation site (TIS).

2.6. Non-Coding Genes

Some DNA sequences in the genome are transcribed into RNA molecules which are never translated into protein, but still have functional roles in
the body. These RNA molecules can be referred to as functional non-coding RNAs because they do not code for proteins as messenger RNAs do, yet they still have important biological roles. The discovery of functional non-coding RNAs has revealed that protein production can be regulated by the transcription of sequences which were not formerly known to have any direct relationship to gene regulation. One class of functional non-coding RNAs which is of particular and immediate interest to the biological, medical, and computational science communities is a class of tiny regulatory RNAs called microRNAs (miRNAs). The first miRNA was uncovered through genetic screens in worms as the product of a gene which plays a specific role in embryonic development. A miRNA is 21-22 nucleotides in length, and pairs to sites within the 3' untranslated region (UTR) of an mRNA produced by a gene. Such a gene is called the target of the miRNA, because the pairing of the miRNA with the gene's mRNA causes protein translation to be repressed. Around 600 miRNAs have now been identified from nematodes, flies, plants, fish, mice and humans. Each miRNA is derived from the stem of a hairpin-like ~75 nt precursor (pre-miRNA), which is present as a single copy in the genome. There are only a few cases known where both strands of the stem-loop produce a miRNA. The precursors of some miRNAs are present in two or more copies on the same or on different chromosomes. A few miRNAs can be derived from two or more slightly different precursors. The miRNA precursors are found in intergenic areas and also within introns of known genes.

3. Genomes

The genome sequence of an organism is an information resource unlike any other that biologists have previously had access to. But the value of a genome is only as good as its annotation, which bridges the gap from the sequence to the biology of the organism. More than 300 genomes have been sequenced in the last decade, most of them from prokaryotes. Now that most of the human genome has been sequenced, several new eukaryotic genome projects have been started for a variety of organisms, including animals, plants, fungi, and numerous pathogenic protozoa. In the ideal case, genome analysis should be provided automatically after a significant amount of sequence assembly for a new genome is available. This is not presently possible, however. Although a large number of gene prediction programs exist, all include organism-specific parameters that must be determined
from training examples. The usual process is to find some genes experimentally, and then to use these data to design a new gene prediction algorithm, or to retrain an existing one. This procedure is time-consuming, and several semi-automated programs have therefore been developed to perform gene prediction in bacteria. However, such programs are not available for the more complex genomes of eukaryotes.

3.1. Computational Analysis of the Genome: Coding Gene Prediction

Coding gene prediction can be based on (a) homology with nucleotide and amino acid databases, (b) ab initio recognition of genes, or (c) combinations of these two methods. In prokaryotes and some simple eukaryotes, genes normally have single continuous open reading frames, and adjacent genes are separated by short intergenic regions. By contrast, the genes in most eukaryotes can be very complex and can have multiple exons, as well as introns that may be tens of kilobases in length, noncoding 5' and 3' exons, and products that are alternatively spliced9. Local regularities called signals or motifs are important indicators of the structure of a gene; methods for detecting them are called signal sensors. Signals in genomic DNA can be contrasted with extended variable-length regions such as exons and introns. These regions can be recognized with a variety of methods, mostly based on statistical information, that are collectively called content sensors10. Most of the recent eukaryotic gene identification methods combine both signal and content sensors. In addition, some methods combine ab initio predictions with homology information derived by aligning genomic DNA against related protein sequences. Other developments augment sensor information with EST information11. Signal and content sensors have been successfully implemented as trainable discriminant functions such as artificial neural networks (ANNs) and support-vector machines (SVMs). Probabilistic sequence models such as hidden Markov models (HMMs) have also been used. For accurate gene finding, however, sensors must be supplemented by structure models that encode the grammar of genes, for instance that an internal exon must start with a splice acceptor and end with a splice donor. Structure models are naturally represented as HMMs10,12. Some successful gene finders use a single HMM for sensing and structure13. However, while HMMs are a good representation for gene structure, they have serious limitations as signal and content sensors. In contrast with ANNs or SVMs, HMMs cannot readily represent complex statistical dependencies among
multiple sequence features. Therefore, some gene finders combine HMMs for structure modeling with separately trained sensors implemented as discriminant functions whose output controls the transition and observation probabilities of the HMM14. This method is computationally convenient because it decouples the training of the sensors from the training of the structure HMM, but there is evidence from research on sequence models for text that it does not correctly trade off the votes of the sensor discriminant functions at different points in the sequence15. Instead, it is possible to use the discriminant functions to provide generalized transition and observation scores for an HMM-like structure model, and to train all of the functions and the structure model together to maximize the likelihood of correct analysis15. Experiments with these improved methods on text suggest that the integrated training procedure is accurate and computationally feasible even for large training sets. The most direct way to characterize genomic coding regions and provide reliable information for structural annotation of genes in genomic sequences is still to use cDNA and EST libraries. Given their important contribution to rapid gene discovery, EST sequencing projects are commonly run simultaneously with genome sequencing projects. Unlike high-quality finished genome sequences, which are double-stranded and multiple-pass, EST sequences are mostly single-strand and single-pass sequences, and are therefore more likely to contain sequencing errors than sequenced genomic DNA.
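To give a flavour of how an HMM-based structure model assigns labels to a sequence, the following toy Python sketch decodes a two-state (exon/intron) HMM with the Viterbi algorithm. It is not any particular published gene finder: the states, transition and emission probabilities are invented, whereas the systems cited above estimate such parameters from training data and use far richer state structures (splice sites, reading frames, UTRs).

```python
import math

# Toy two-state HMM (exon / intron) decoded with the Viterbi algorithm.
# All probabilities below are invented for illustration only.
STATES = ["exon", "intron"]
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon":   {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
# Toy emissions: exons slightly GC-rich, introns slightly AT-rich.
EMIT = {
    "exon":   {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intron": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}

def viterbi(seq):
    """Return the most probable exon/intron state path for seq under the toy HMM."""
    # v[state] = best log-probability of any path ending in `state`
    v = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []                                      # back-pointers per position
    for base in seq[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: v[p] + math.log(TRANS[p][s]))
            new_v[s] = (v[best_prev] + math.log(TRANS[best_prev][s])
                        + math.log(EMIT[s][base]))
            ptr[s] = best_prev
        v = new_v
        back.append(ptr)
    # Trace back the best path from the best final state.
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

if __name__ == "__main__":
    seq = "GCGCGCATATATAT"
    print(list(zip(seq, viterbi(seq))))
```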
3.2. Computational Analysis of the Genome: Non-coding Gene Prediction

In order to understand where miRNAs may be located in the genome, it is important to understand the structure and nature of the miRNA precursors from which they will ultimately be excised. As mentioned in the introductory material, miRNA precursors form fold-back structures called hairpin loops or stem loops, as their shape looks like the shape of a hairpin; they have a double-stranded stem section with a loop (of unpaired bases) at the end. This precursor structure must have some degree of thermodynamic stability in order to maintain its shape. However, in miRNA precursors it has been observed that the bases in the stem part of the hairpin structure do not all have to be perfectly paired in canonical Watson-Crick base pairs (G with C, A with U); short segments in which the two sides of the stem match imperfectly cause the structure to bulge out at these locations. The overall stability of a structure is usually characterized by the free energy
associated with it. On a very intuitive level, one can think of such a value as representing how likely it would be for the structure to fall apart in a thermodynamic environment, given that its own component molecules are jiggling around and also being jostled by other molecules. The structural and thermodynamic properties characteristic of miRNA precursors are crucial to their recognition as unique elements in the genome. These structures cannot have so many loops and bulges, or such a large loop structure, that their fundamental stability is challenged; yet they likely cannot be so stable that protein complexes have trouble excising the mature miRNA from their stems. This creates a computational challenge: to develop algorithms which incorporate these features in a biologically meaningful way.

4. Closing Remarks

The problem of gene identification is one of the main tasks of bioinformatics. There has been a great deal of progress in gene identification methods in the last few years. The older coding region identification methods have given way to methods that can suggest the overall structure of genes. After the large sequencing projects, such as the Human Genome Project, are completed, the much larger task of analyzing these sequences begins, and computational prediction systems play a major role. Algorithms using sophisticated ANNs, fuzzy logic, integrated methods and hybrid systems are in great demand. Moreover, as we enter the post-genomic era, it is becoming clear that interesting aspects of biology will go far beyond assembling and finding genes, and even beyond predicting the function of endless DNA sequences. Bioinformatics and data generation continue to develop hand in hand to enable us to understand the complexities of cells. It will be exciting to watch the cooperation between bioinformatics and biology in the coming years.

References
1. M. Boguski, Bioinformatics - A New Area. Trends Guide to Bioinformatics, Trends Supplement, 1-2 (1998).
2. S. J. Spengler, Bioinformatics in the Information Age. Science 287, 1221-1222 (2000).
3. A. Baxevanis and B. F. F. Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. John Wiley and Sons, (1998).
4. M. S. Waterman, Introduction to Computational Biology: Sequences, Maps and Genomes. Chapman Hall, (1995).
5. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, (1998).
6. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach. MIT Press, (1998).
7. E. Pennisi, And the Gene Number Is ... ? Science 288, 1146-1147 (2000).
8. D. Casey, Primer on Molecular Genetics. Technical report, Human Genome Management Information System, Oak Ridge National Laboratory, (1992).
9. S. Lewis, M. Ashburner and M. G. Reese, Annotating eukaryote genomes. Curr. Opin. Struct. Biol. 10(3), 349-354 (2000).
10. D. Haussler, Computational Genefinding. Trends in Biochemical Sciences, Supplementary Guide to Bioinformatics, 12-15 (1998).
11. M. G. Reese, Genie - Gene Finding in Drosophila melanogaster. Genome Res. 10(4), 529-538 (2000).
12. A. Krogh, Gene Finding: Putting the Parts Together. Academic Press, (1998).
13. C. Burge and S. Karlin, Prediction of Complete Gene Structures in Human Genomic DNA. J. Mol. Biol. 268, 78-94 (1997).
14. D. Kulp, D. Haussler, M. Reese, and F. Eeckman, A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA. In Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, (1996).
15. J. Lafferty, A. McCallum, and F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML-01, ICML, (2001).
CHAPTER 16

BIOLOGICAL DATABASES AND WEB SERVICES: METRICS FOR QUALITY
Tin Wee Tan, Khar Heng Choo, Joo Chuan Tong
Department of Biochemistry, National University of Singapore
{bchtantw, bchckh, bchtjc}@nus.edu.sg
Martti T. Tammi
Department of Biological Sciences and Department of Biochemistry, National University of Singapore
martti@nus.edu.sg

Vladimir B. Bajic
Institute for Infocomm Research, Singapore
bajicv@i2r.a-star.edu.sg
1. Introduction

Biological databases (BDs) and web services have been proliferating during the past decade. In 1997, it was estimated that there were more than 400 web-accessible BDs1,2. By 2004, the combined figure for Internet-accessible BDs and related web services had grown to more than 1000, and we expect it to double every two years. These Internet-accessible BDs and bioinformatic tools are continuously being introduced. As they proliferate, many have become dormant; others are made obsolete as more advanced ones emerge. Yet no formal or informal methods of directly tracking their existence or directly measuring their number of hits, quality, impact, reliability and efficiency are available. Meanwhile, a new wave of development in database and web service integration is starting to emerge, linking up such Internet-accessible resources into integrated, streamlined systems. Examples include BioKleisli3, one of
the earliest database integration technologies, which powers the products of GeneticXchange, Inc.; the Sequence Retrieval System (SRS)4, one of the most widely used bioinformatics indexing systems, from LION bioscience AG; and BioMart (http://www.ebi.ac.uk/biomart/), a simple distributed data integration system with powerful query capabilities from the European Bioinformatics Institute (EBI). In addition, workflow integration systems such as VIBE (http://www.incogen.com/VIBE), Biopipe5, KOOPlatform and bioinformatics Grid computing6,7 also critically rely on stable, fast and quality-assured access to BDs and computational resources, many of which are third-party web services and databases that are not subjected to any objective metrics for measuring their response times, etc. This chapter looks at available avenues to track all these computational and informational resources through registry services, and the challenge of quantitatively assessing the quality, impact, reliability and scientific worthiness of such online resources.

2. Growing Need for Quality Control

There is an urgent need to set up a framework to ensure quality control of BD resources. For example, quality data sets are crucial for the efficient use of search-retrieval systems such as Entrez8 and SRS that facilitate access to a host of BDs; for distributed annotation systems (DAS, http://biodas.org) that integrate genome annotation information from multiple servers before presenting it to the user in a single view; for the realization of workflows that increase analysis throughput by automating operations from ordering of microarray gene expression data to sequence gathering to retrieval of sequence annotation; and, more recently, for the construction of the Biomolecular Interaction Network Database (BIND)9, which documents molecular interactions. This database not only relies on manually curated data, but also aggregates data into SeqHound from other databases. Many freely available public databases collaborate to synchronize their release of data: e.g. GenBank, EMBL and DDBJ10 work together to provide nucleotide sequences to the global community, and RCSB, MSD-EBI, and PDBj11 team up to form the Worldwide Protein Data Bank (wwPDB) for macromolecular structural data. While these efforts attempt to ensure consistency and standardization of data entries, on the flip side, any resultant errors will be propagated across all these resources. Manually curated databases such as Swiss-Prot12 attempt to minimize errors and data redundancies in the data set. However, these resources are still prone to errors. Furthermore,
many smaller 'boutique' databases with limited maintenance often contain useful and unique data for specific scientific communities, but due to the lack of adequate quality assurance information, many such databases are either unused or ignored. It is not unusual that scientists interested in these databases will download them and perform their own independent quality control. With universal quality assurance metrics in place, much of this unnecessary redundant work may eventually be eliminated. Another problem is the ability to track BDs and rank them in terms of quality and other relevant features. It is hence desirable to have a 'registry' to track the data sources, analogous to how yellow pages function. One exemplary project is BioMOBY13. BioMOBY is a powerful retrieval and discovery tool. It supplies cross-references to other relevant data and applications. Nevertheless, to consummate the goals of this pioneering project and many others to come, we feel that a ranking system to rate all these databases and tools, both proprietary and publicly available ones, is essential. One of the key problems is the assessment of quality of content and quality of service. Here we highlight some efforts to shed light on this problem.

3. Metrics for Quality Analysis

It is inherently difficult to define metrics that measure the quality of Internet-accessible online databases, services or tools objectively and quantitatively. However, there are some metrics that can be used to provide a reasonable approximation to the content and services provided by BDs. The following classifies such potential metrics according to several modalities.

3.1. Content

Due to the heterogeneity of BDs, it is difficult to objectively define a universal measure of content quality in detail. Similarly, it is difficult to objectively define the quality of published journal articles. However, there are a number of metrics which can be used to give an estimate of the quality of the data curation in a database.

Impact
Probably one of the most important measures of data quality is the number of citations referring to the BD. However, BDs developed for a narrow, yet important, field do not gather as many citations in absolute terms as BDs that cover more general fields utilized by large
scientific communities. Thus, the impact measure should be classified accordingly, by an appropriate division into scientific fields, and normalized to yield a more objective measure of importance. Theoretically, it is possible to classify databases into categories using different criteria, such as the type of bio-molecules stored, or by taxonomy. It would then become possible to define metrics to measure the scope, completeness, and possibly the usefulness of databases. Unfortunately, it is difficult to find a universal classification scheme which is applicable in all cases. It is possible to utilize many different classification schemes, but in reality such a multi-scheme approach would be too complex to be practical. We can also simply measure impact by the connectivity links from other resources to a particular BD. A common measure for ranking, evaluating, categorizing, and comparing scientific journals is the JCR impact factor14. It is a measure of the frequency with which an average article in a journal has been cited in a particular year or period. The annual JCR impact factor is a ratio between citations and recent citable items published, as shown in Table 1. A similar scheme can be utilized for assessing BDs.

Table 1. A measure for computing impact factor by JCR
Calculation for Journal Impact Factor:
A = total cites in 1997
B = 1997 cites to articles published in 1995-96 (this is a subset of A)
C = number of articles published in 1995-96
D = B/C = 1997 impact factor
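The Table 1 calculation, combined with the field normalization suggested above, reduces to two ratios. The small Python sketch below is illustrative only; the function names and all numbers in it are invented.

```python
# Impact-factor style calculation following Table 1, plus the field
# normalization suggested in the text. All numbers are invented.

def impact_factor(cites_to_recent_items: int, recent_citable_items: int) -> float:
    """D = B / C in the notation of Table 1."""
    return cites_to_recent_items / recent_citable_items

def normalized_impact(raw_impact: float, field_average_impact: float) -> float:
    """Express a resource's impact relative to the average in its field."""
    return raw_impact / field_average_impact

if __name__ == "__main__":
    raw = impact_factor(cites_to_recent_items=420, recent_citable_items=120)
    print(round(raw, 2))                           # 3.5
    print(round(normalized_impact(raw, 1.75), 2))  # 2.0, i.e. twice the field average
```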
Documentation
The existence and completeness of database documentation can be used as part of a measure of content quality. Such documentation could take the form of a peer-reviewed journal article describing the BD or documentation provided on-line.

3.2. Availability

Another useful metric is the availability of the servers or BDs. A BD will be of little use if it is unavailable whenever a user needs it. Data on server uptime and availability can be gathered by requesting the service provider in question to supply them. Externally, this can be validated simply by making use of responses to Internet 'pings' by the server, or
the presence of a database or web service response to a test set of queries. For properly managed services, a 99.999% uptime15 for '24h x 7 days per week' is frequently the industry standard, with any downtime typically due to upgrades or scheduled maintenance. It is hence paramount that the providers of BDs take into consideration their network bandwidth and server capability. Some BDs are mirrored to several sites to mitigate the overloading of a single server or to maintain their availability in the event that one of the servers fails.
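As a simple illustration of such external checking, the following Python sketch times a single HTTP request to a placeholder endpoint (the URL is not a real BD address); in practice one would probe the actual database or web service with a test set of queries and log the results over time to estimate uptime.

```python
import time
import urllib.request
import urllib.error

# Probing availability and response time of a web-accessible resource.
# The URL below is a placeholder; repeated probes, logged over time, would be
# needed to estimate uptime in the sense discussed in the text.

def probe(url: str, timeout: float = 10.0):
    """Return (is_up, response_time_in_seconds) for a single request."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1024)                 # read a little to confirm service
            return response.status == 200, time.time() - start
    except (urllib.error.URLError, OSError):
        return False, time.time() - start

if __name__ == "__main__":
    is_up, seconds = probe("http://example.org/")   # placeholder endpoint
    print(f"up={is_up} response_time={seconds:.2f}s")
```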
3.3. Combining Different Metrics

While various strategies have been developed to rank database quality, such as a normalized weighted sum of metrics or a simplified, reduced weighted sum of metrics, many issues concerning the ranking protocol still need to be addressed. Questions arise about its feasibility. Does it set standards for raising the quality of the data sets used? Is it accurate enough to be fair and useful for different data sets? How can the ranking/index sustain itself? What will it take to have it adopted routinely? Some possible metrics that can be used for quality measurement are listed in Table 2. These metrics can be assigned a score and weighted differently, where certain metrics have precedence over others. The sum of the weighted scores for the metrics will give the user an idea of BD quality, and this can be adjusted and presented differently based on the requirements or expectations of the user.

Table 2. Some possible metrics for measuring quality of resources and data
1. Number of entries or records relative to the estimated scope of the field: a larger number of entries provides a larger amount of useful information.
2. Number of hits and downloads: the more a site is visited, the more popular the BD is.
3. Server uptime: the proportion of the time the server is available to the user.
4. Database/web server response time: how fast the server can respond to users' queries.
5. Number of staff allocated for maintenance and curation: how many staff are dedicated to curating the data set, testing the tool/service, or maintaining the resource.
6. Frequency of maintenance and curation: self-explanatory.
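As described above, one straightforward way to combine such metrics is a weight-normalized sum of per-metric scores. The Python sketch below is illustrative only; the metric names, scores and weights are invented, and the scores are assumed to be pre-normalized to the range 0-1.

```python
# Combining per-metric scores (e.g. for the metrics in Table 2) into a single
# weighted quality score. Scores are assumed to lie in [0, 1]; weights reflect
# user priorities. Both are invented for illustration.

def weighted_quality_score(scores: dict, weights: dict) -> float:
    """Weighted sum of metric scores, normalized by the total weight used."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

if __name__ == "__main__":
    scores = {                      # hypothetical 0..1 scores for one database
        "coverage": 0.8,            # entries relative to the scope of the field
        "popularity": 0.6,          # hits and downloads
        "uptime": 0.99,             # proportion of time the server is available
        "response_time": 0.7,       # higher is better (faster responses)
        "curation_effort": 0.5,     # staff and frequency of curation
    }
    weights = {                     # user-adjustable priorities
        "coverage": 3, "popularity": 1, "uptime": 2,
        "response_time": 1, "curation_effort": 2,
    }
    print(round(weighted_quality_score(scores, weights), 3))
```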
4. Discussion and Conclusion

Quantitative metrics are important to facilitate the tracking and quantitative assessment of the quality, impact, reliability and scientific worthiness of online computational and informational resources for bioinformatics. Some important issues that must be addressed include what types of data are available for quality measurement, how much data must be available for analysis, and what level of accuracy must be attained for the ranking to be of any use at all. The quality of quantitative metrics is directly influenced by the choice of the adopted metric, and may also be influenced by the number of hits and usage of the database (after filtering out web robots), the number of journal citations of the accompanying papers, conformance to format standards, the number of mirror sites, as well as the quality of those mirror sites. A meta-database, or registry, or catalogue to track the vast number of BDs and web services, such as INFOBIOGEN's DBCAT16, will prove useful as it provides one-stop access to all BDs in the world. With the addition of a ranking system, users will have at least a certain level of confidence and expectation when using the resources listed. We hope that in the near future such a centralized resource will be developed and made available to the broad life science community.

References
1. M. Kanehisa. Linking databases and organisms: GenomeNet resources in Japan. Trends Biochem Sci. 22: 442-444, 1997.
2. C. Discala, et al. DBcat: a catalog of biological databases. Nucleic Acids Res. 27: 10-11, 1999.
3. S. B. Davidson, C. Overton, V. Tannen, and L. Wong. BioKleisli: A Digital Library for Biomedical Researchers. Int. J. on Digital Libraries, 1997.
4. E. M. Zdobnov, R. Lopez, R. Apweiler, and T. Etzold. The EBI SRS server - new features. Bioinformatics 18: 1149-1150, 2002.
5. S. Hoon, et al. Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res. 13: 1904-1915, 2003.
6. T. W. Tan, et al. BioWorldWideWorkflow: An APBioNet-APBioGrid Experimental Testbed. The 2nd Pacific Rim Application and Grid Middleware Assembly Workshop, 2002.
7. H. Nakamura. Biogrid: Integration of Biological Data and Computing Grid. The 3rd Pacific Rim Application and Grid Middleware Assembly Workshop, Fukuoka, 2003.
8. J. McEntyre. Linking Up With Entrez. Trends Genet. 14: 39-40, 1998.
9. G. D. Bader and C. W. Hogue. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31: 248-250, 2003.
10. D. S. Roos. Bioinformatics - Trying to Swim in a Sea of Data. Science 291: 1260-1261, 2001.
11. H. M. Berman, K. Henrick, and H. Nakamura. Announcing the worldwide Protein Data Bank. Nature Structural Biology 10: 980, 2003.
12. B. Boeckmann, et al. The Swiss-Prot protein knowledgebase and its supplement TrEMBL. Nucleic Acids Res. 31: 365-370, 2003.
13. M. D. Wilkinson and M. Links. BioMOBY: an open-source biological web services proposal. Briefings in Bioinformatics 3: 331-341, 2002.
14. SCI Journal Citation Reports: a bibliometric analysis of science journals in the ISI database. Philadelphia: Institute for Scientific Information, Inc., 1993.
15. W. P. Turner IV, P.E., and K. G. Brill. Industry standard tier classifications define site infrastructure performance. White paper, The Uptime Institute.
16. C. Discala, X. Benigni, E. Barillot, and G. Vaysseix. DBcat: a catalog of 500 biological databases. Nucleic Acids Res. 28: 8-9, 2000.