Modelling Biomedical Signals

Editors
Giuseppe Nardulli, Sebastiano Stramaglia
Center of Innovative Technologies for Signal Detection and Processing
University of Bari, Italy

Bari, Italy
19-21 September 2001

World Scientific
New Jersey • London • Singapore • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
MODELLING BIOMEDICAL SIGNALS Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4843-1
Printed in Singapore by Mainland Press
Preface

In the last few years, concepts and methodologies initially developed in theoretical physics have found wide applicability in a number of very different areas. This book, a result of cross-disciplinary interaction among physicists, biologists and physicians, covers several topics where methods and approaches rooted in physics are successfully applied to analyze and to model biomedical data.

The volume contains the papers presented at the International Workshop "Modelling Biomedical Signals", held at the Physics Department of the University of Bari, Italy, on September 19-21, 2001. The workshop was held under the auspices of the Center of Innovative Technologies for Signal Detection and Processing of the University of Bari (TIRES Centre); the Organizing Committee of the Workshop comprised L. Angelini, R. Bellotti, A. Federici, R. Giuliani, G. Gonnella, G. Nardulli and S. Stramaglia. The workshop opened on September 19th, 2001 with two colloquia given by Profs. N. Accornero (University of Rome, La Sapienza), on Neural Networks and Neurosciences, and E. Marinari (University of Rome, La Sapienza), on Physics and Biology. Around 70 scientists attended the workshop, coming from different fields and disciplines. The broad spectrum of competences gathered at the workshop favored an intense and fruitful exchange of scientific information and ideas. The topics discussed in the workshop include: decision support systems in medical science; several analyses of physiological rhythms and synchronization phenomena; biological neural networks; theoretical aspects of artificial neural networks and their role in neural sciences and in the analysis of EEG and Magnetic Resonance Imaging; gene expression patterns; the immune system; protein folding and protein crystallography.

For the organization of the workshop and the publication of the present volume we acknowledge financial support from the Italian Ministry of University and Scientific Research (MURST) under the project (PRIN) "Theoretical Physics of Fundamental Interactions", from the TIRES Centre, the Physics Department of the University of Bari and from the Section of Bari of the Istituto Nazionale di Fisica Nucleare (INFN). We also thank the Secretary of the Workshop, Mrs. Fausta Cannillo, and Mrs. Rosa Bitetti for their help in organizing the event.
Giuseppe Nardulli Sebastiano Stramaglia University of Bari
CONTENTS

Preface   v

ANALYSIS AND MODELS OF BIOMEDICAL DATA BY THEORETICAL PHYSICS METHODS

The Cluster Variation Method for Approximate Reasoning in Medical Diagnosis
H. J. Kappen*   3

Analysis of EEG in Epilepsy
K. Lehnertz, R. G. Andrzejak, T. Kreuz, F. Mormann, C. Rieke, P. David and C. E. Elger   17

Stochastic Approaches to Modeling of Physiological Rhythms
Plamen Ch. Ivanov and Chung-Chuan Lo   28

Chaotic Parameters in Time Series of ECG, Respiratory Movements and Arterial Pressure
E. Conte and A. Federici   51

Computer Analysis of Acoustic Respiratory Signals
A. Vena, G. M. Insolera, R. Giuliani, T. Fiore and G. Perchiazzi   60

The Immune System: B Cell Binding to Multivalent Antigen
Gyan Bhanot   67

Stochastic Models of Immune System Aging
L. Mariani, G. Turchetti and F. Luciani   80

NEURAL NETWORKS AND NEUROSCIENCES

Artificial Neural Networks in Neuroscience
N. Accornero and M. Capozza   93

Biological Neural Networks: Modeling and Measurements
R. Stoop and S. Lecchini   107

Selectivity Property of a Class of Energy Based Learning Rules in Presence of Noisy Signals
A. Bazzani, D. Remondini, N. Intrator and G. Castellani   123

Pathophysiology of Schizophrenia: fMRI and Working Memory
G. Blasi and A. Bertolino   132

ANN for Electrophysiological Analysis of Neurological Disease
R. Bellotti, F. de Carlo, M. de Tommaso, O. Difruscolo, R. Massafra, V. Sciruicchio and S. Stramaglia   144

Detection of Multiple Sclerosis Lesions in MRIs with Neural Networks
P. Blonda, G. Satalino, A. D'Addabbo, G. Pasquariello, A. Baraldi and R. de Blasi   157

Monitoring Respiratory Mechanics Using Artificial Neural Networks
G. Perchiazzi, G. Hedenstierna, A. Vena, L. Ruggiero, R. Giuliani and T. Fiore   165

GENOMICS AND MOLECULAR BIOLOGY

Cluster Analysis of DNA-Chip Data
E. Domany   175

Clustering mtDNA Sequences for Human Evolution Studies
C. Marangi, L. Angelini, M. Mannarelli, M. Pellicoro, S. Stramaglia, M. Attimonelli, M. de Robertis, L. Nitti, G. Pesole, C. Saccone and M. Tommaseo   196

Finding Regulatory Sites from Statistical Analysis of Nucleotide Frequencies in the Upstream Region of Eukaryotic Genes
M. Caselle, P. Provero, F. di Cunto and M. Pellegrino   209

Regulation of Early Growth Response-1 Gene Expression and Signaling Mechanisms in Neuronal Cells: Physiological Stimulation and Stress
G. Cibelli   221

Geometrical Aspects of Protein Folding
C. Micheletti   234

The Physics of Motor Proteins
G. Lattanzi and A. Maritan   251

Phasing Proteins: Experimental Loss of Information and its Recovery
C. Giacovazzo, F. Capitelli, C. Giannini, C. Cuocci and M. Ianigro   264

List of Participants   279

Author Index   281

* Italicized name indicates the author who presented the paper.
ANALYSIS AND MODELS OF BIOMEDICAL DATA BY THEORETICAL PHYSICS METHODS
THE CLUSTER VARIATION METHOD FOR APPROXIMATE REASONING IN MEDICAL DIAGNOSIS

H. J. KAPPEN
Laboratory of Biophysics, University of Nijmegen
E-mail: bert@mbfys.kun.nl
In this paper, we discuss the rule based and probabilistic approaches to computer aided medical diagnosis. We conclude that the probabilistic approach is superior to the rule based approach, but due to its intractability, it requires approximations for large scale applications. Subsequently, we review the Cluster Variation Method and derive a message passing scheme that is efficient for large directed and undirected graphical models. When the method converges, it gives close to optimal results.
1 Introduction
Medical diagnosis is the process by which a doctor searches for the cause (disease) that best explains the symptoms of a patient. The search process is sequential, in the sense that patient symptoms suggest some initial tests to be performed. Based on the outcome of these tests, a tentative hypothesis is formulated about the possible cause(s). Based on this hypothesis, subsequent tests are ordered to confirm or reject this hypothesis. The process may proceed in several iterations until the patient is finally diagnosed with sufficient certainty and the cause of the symptoms is established. A significant part of the diagnostic process is standardized in the form of protocols. These are sets of rules that prescribe which tests to perform and in which order, based on the patient symptoms and previous test results. These rules form a decision tree, whose nodes are intermediate stages in the diagnostic process and whose branches point to additional testing, depending on the current test results. The protocols are defined in each country by a committee of medical experts. The use of computer programs to aid in the diagnostic process has been a long term goal of research in artificial intelligence. Arguably, it is the most typical application of artificial intelligence. The different systems that have been developed so far use a variety of modeling approaches, which can be roughly divided into two categories: rule-based approaches with or without uncertainty and probabilistic methods. The rule-based systems can be viewed as computer implementations of the protocols, as described above. They consist of a large data base of rules of the form: A → B, meaning that "if condition A is true, then perform action B"
or "if condition A is true, then condition B is also true". The rules may be deterministic, in which case they are always true, or 'fuzzy' in which case they are true to a (numerically specified) degree. Examples of such programs are Meditel 1 , Quick Medical Reference (QMR) 2 , DXplain 3 , and Iliad 4 . In Berner et al. 5 a detailed study was reported that assesses the performance of these systems. A panel of medical experts collected 110 patient cases, and concensus was reached on the correct diagnosis for each of these patients. For each disease, there typically exists a highly specific test that will unambiguously identify the disease. Therefore, based on such complete data, diagnosis is easy. A more challenging task was defined by removing this defining test from each of the patient cases. The patient cases were presented to the above 4 systems. Each system generated its own ordered list of most likely diseases. In only 10-20 % of the cases, the correct diagnosis appeared on the top of these lists and in approximately 50 % of the cases the correct diagnosis appeared in the top 20 list. Many diagnoses that appeared in the top 20 list were considered irrelevant by the experts. It was concluded that these systems are not suitable for use in clinical practice. There are two reasons for the poor performance of the rule based systems. One is that the rules that need to be implemented are very complex in the sense that the precondition A above is a conjunction of many factors. If each of these factors can be true or false, there is a combinatoric explosion of conditions that need to be described. It is difficult, if not impossible, to correctly describe all these conditions. The second reason is that evidence is often not deterministic (true or false) but rather probabilistic (likely or unlikely). The above systems provide no principled approach for the combination of such uncertain sources of information. A very different approach is to use probability theory. In this case, one does not model the decision tree directly, but instead models the relations between diseases and symptoms in one large probability model. As a (too) simplified example, consider a medical domain with a number of diseases d = ( d i , . . . ,d„) and a number of symptoms or findings / = (/i, • • • , / r o ) One estimates the probability of each of the diseases p(di) as well as the probability of each of the findings given a disease, p(fj\di). If diseases are independent, and if findings are conditionally independent given the disease, the joint probability model is given by: P(d,f)=P(d)p(f\d)=Upwiipifjidi) i
(i)
j
It is now possible to compute the probability of a disease dj, given some
findings, by using Bayes' rule:

$$p(d_i|f_t) = \frac{p(f_t|d_i)\, p(d_i)}{p(f_t)} \qquad (2)$$
where $f_t$ is the list of findings that have been measured up to diagnostic iteration t. Computing this for different $d_i$ gives the list of most probable diseases given the current findings $f_t$ and provides the tentative diagnosis of the patient. Furthermore, one can compute which additional test is expected to be most informative about any one of the diagnoses, say $d_i$, by computing the expected entropy

$$I_{ij} = \sum_{f_j} p(f_j|f_t)\, S(d_i|f_j, f_t), \qquad S(d_i|f_j, f_t) = -\sum_{d_i} p(d_i|f_j, f_t) \log p(d_i|f_j, f_t)$$

for each test j that has not been measured so far. The test j that minimizes $I_{ij}$ is the most informative test, since, averaged over its possible outcomes, it gives the distribution over $d_i$ with the lowest entropy. Thus, one sees that whereas the rule based systems model the diagnostic process directly, the probabilistic approach models the relations between diseases and findings. The diagnostic decision (which test to measure next) is then computed from this model. The advantage of this latter approach is that the model is much more transparent about the medical knowledge, which facilitates maintenance (changing probability tables, adding diseases or findings), as well as evaluation by external experts.

One of the main drawbacks of the probabilistic approach is that it is intractable for large systems. The computation of marginal probabilities requires summation over all other variables. For instance, in Eq. 2, $p(f_t) = \sum_{d,f} p(d,f)$, and the sum over d, f contains exponentially many terms. Therefore, probabilistic models for medical diagnosis have been restricted to very small domains 6,7 or, when covering a large domain, built at the expense of the level of detail at which the disease areas are modeled 8. In order to make the probabilistic approach feasible for large applications one therefore needs to make approximations. One can use Monte Carlo sampling, but one finds that accurate results require very many iterations. An alternative is to use analytical approximations such as, for instance, mean field theory 9,10. This approach works well for probability distributions that resemble spin systems (so-called Boltzmann Machines) but, as we will see, it performs poorly for directed probability distributions of the form Eq. 1.
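To ground Eqs. 1-2, here is a minimal brute-force sketch (my own illustration with made-up toy numbers, not any of the systems discussed above) of naive-Bayes diagnosis: it computes the disease posteriors given observed findings and, following the entropy criterion above, the expected posterior entropy of each unmeasured test:

```python
import itertools
import numpy as np

# Toy instance of Eq. 1 (made-up numbers): two independent binary diseases,
# three binary findings; finding j depends on one disease, parent[j].
p_d = [0.01, 0.05]                     # p(d_i = 1)
parent = [0, 0, 1]                     # disease each finding reports on
p_f = [(0.02, 0.9), (0.05, 0.6), (0.1, 0.7)]   # p(f_j=1 | d_parent = 0 or 1)

states = list(itertools.product([0, 1], repeat=5))  # (d1, d2, f1, f2, f3)

def joint(s):
    """Eq. 1: p(d, f) = prod_i p(d_i) prod_j p(f_j | d_parent(j))."""
    d, f = s[:2], s[2:]
    pr = np.prod([p_d[i] if d[i] else 1 - p_d[i] for i in range(2)])
    for j in range(3):
        q = p_f[j][d[parent[j]]]
        pr *= q if f[j] else 1 - q
    return pr

p = {s: joint(s) for s in states}

def cond(i, obs):
    """Bayes' rule, Eq. 2: returns (p(d_i=1 | obs), p(obs)) by enumeration."""
    match = [s for s in states if all(s[2 + j] == v for j, v in obs.items())]
    z = sum(p[s] for s in match)                   # p(f_t), brute force
    return sum(p[s] for s in match if s[i]) / z, z

obs = {0: 1}                                       # f_1 observed positive
print([cond(i, obs)[0] for i in range(2)])         # disease posteriors

def H(q):                                          # binary entropy
    return 0.0 if q in (0.0, 1.0) else -q * np.log(q) - (1 - q) * np.log(1 - q)

z_obs = cond(0, obs)[1]
for j in (1, 2):                                   # unmeasured tests
    I = sum((cond(0, {**obs, j: v})[1] / z_obs)    # p(f_j = v | f_t)
            * H(cond(0, {**obs, j: v})[0]) for v in (0, 1))
    print("test", j, "expected entropy", I)        # measure the smallest
```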
2 The Cluster Variation Method
A very recent development is the application of the Cluster Variation Method (CVM) to probabilistic inference. CVM is a method that was developed in the physics community to approximately compute the properties of the Ising model 11. The CVM approximates the probability distribution by a number of (overlapping) marginal distributions (clusters). The quality of the approximation is determined by the size and number of the clusters. When the clusters consist of only two variables, the method is known as the Bethe approximation. Recently, the method was introduced by Yedidia et al. 12 into the machine learning community, showing that in the Bethe approximation, the CVM solution coincides with the fixed points of the belief propagation (BP) algorithm. Belief propagation is a message passing scheme, which is known to yield exact inference in tree structured graphical models 13. However, BP can also give impressive results for graphs that are not trees 14. Let $x = (x_1, \ldots, x_n)$ be a set of variables, where each $x_i$ can take a finite number of values. Consider a probability distribution on x of the form
$$p_H(x) = \frac{1}{Z} e^{-H(x)}, \qquad Z = \sum_x e^{-H(x)}$$

It is well known that $p_H$ can be obtained as the minimum of the free energy, which is a functional over probability distributions of the following form:

$$F_H(p) = \langle H \rangle + \langle \log p \rangle, \qquad (3)$$

where the expectation value is taken with respect to the distribution p, i.e. $\langle H \rangle = \sum_x p(x) H(x)$. When one minimizes $F_H(p)$ with respect to p under the constraint of normalization $\sum_x p(x) = 1$, one obtains $p_H$.^a Computing marginals of $p_H$ such as $p_H(x_i)$ or $p_H(x_i, x_j)$ involves sums over all states, which is intractable for large n. Therefore, one needs tractable approximations to $p_H$. The cluster variation method replaces the probability distribution $p_H(x)$ by a large number of (possibly overlapping) probability distributions, each describing the interaction between a small number of variables. Due to the one-to-one correspondence between a probability distribution and the minima of a free energy we can define approximate probability distributions by constructing approximate free energies and computing their minimum (or minima!). This is achieved by approximating Eq. 3 in terms of the cluster probabilities. The solution is obtained by minimizing this approximate free energy subject to normalization and consistency constraints.

^a Minimizing the free energy can also be viewed as maximizing the entropy with an additional constraint on $\langle H \rangle$.
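As a quick numerical illustration of this variational characterization (my own check, not part of the paper), one can verify on a small state space that $F_H(p)$ is minimized exactly at $p_H \propto e^{-H}$, where it equals $-\log Z$:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=8)                   # arbitrary energies on 8 states

def F(p):                                # F_H(p) = <H> + <log p>, Eq. 3
    return np.sum(p * (H + np.log(p)))

pH = np.exp(-H); pH /= pH.sum()          # the Boltzmann distribution p_H
print(F(pH), -np.log(np.exp(-H).sum()))  # F(p_H) equals -log Z

# Any other normalized distribution gives a strictly larger free energy:
for _ in range(5):
    q = rng.random(8); q /= q.sum()
    assert F(q) > F(pH)
```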
Define clusters as subsets of distinct variables: $x_\alpha = (x_{i_1}, \ldots, x_{i_k})$, with $1 \le i_j \le n$. Define a set of clusters P that contains the interactions in H, and write H as a sum of these interactions:

$$H(x) = \sum_{\alpha \in P} H_\alpha(x_\alpha)$$

For instance, for Boltzmann-Gibbs distributions, $H(x) = \sum_{i>j} w_{ij} x_i x_j + \sum_i \theta_i x_i$ and P consists of all pairs and all singletons: $P = \{\alpha \mid \alpha = (ij), i > j \text{ or } \alpha = (i)\}$. For directed graphical models with evidence, such as Eq. 2, P is the set of clusters formed by each node i and its parent set $\pi_i$: $P = \{\alpha \mid \alpha = (i, \pi_i), i = 1, \ldots, n\}$; x is the set of non-evidence variables (d in this case) and $Z = p(f_t)$. We now define a set of clusters B that will determine our approximation in the cluster variation method. B should at least contain the interactions in p(x) in the following way: $\forall \alpha \in P \Rightarrow \exists \alpha' \in B, \alpha \subseteq \alpha'$. In addition, we demand that no two clusters in B contain each other: $\alpha, \alpha' \in B \Rightarrow \alpha \not\subset \alpha', \alpha' \not\subset \alpha$. Clearly, the minimal choice for B is to choose clusters from P itself. The maximal choice for B is the set of cliques obtained when constructing the junction tree 15. In this case, the clusters in B form a tree structure and the CVM method is exact. In general, one can choose any set of clusters B that satisfies the above definition. Since the proposed method scales exponentially in the size of the clusters in B, the smaller the clusters in B, the faster the approximation. For a simple directed graphical model an intermediate choice of clusters is illustrated in Fig. 1. Define a set of clusters M that consists of all intersections of clusters of B: $M = \{\beta \mid \beta = \cap_k \alpha_k, \alpha_k \in B\}$, and define $U = B \cup M$. Once U is given, we define numbers $a_\beta$ recursively by the Moebius formula

$$1 = \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha, \qquad \forall \beta \in U$$

In particular, this shows that $a_\alpha = 1$ for $\alpha \in B$. The Moebius formula allows us to rewrite interactions on potentials in P in terms of interactions on clusters in U:

$$H(x) = \sum_{\beta \in P} H_\beta(x_\beta) = \sum_{\beta \in P} \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha H_\beta(x_\beta) = \sum_{\alpha \in U} a_\alpha H_\alpha(x_\alpha),$$
Figure 1. Directed graphical model consisting of 5 variables. Interactions are defined on clusters in P = {(1), (1,2), (2,3), (1,4), (3,4,5)}. The clusters in B are depicted by the dashed lines (B = {(1,2,3), (2,3,5), (1,4,5), (3,4,5)}). The set M = {(1), (2,3), (3), (5), (3,5)}.
where we have defined $H_\alpha$ as the sum of all interactions in $\beta \in P$ that are contained in cluster $\alpha \in U$:

$$H_\alpha(x_\alpha) = \sum_{\beta \in P, \beta \subseteq \alpha} H_\beta(x_\beta)$$

Since interactions may appear in multiple clusters, the constants $a_\alpha$ ensure that double counting is compensated for.^b Thus, we can express $\langle H \rangle$ in Eq. 3 explicitly in terms of the cluster probabilities $p_\alpha$ as

$$\langle H \rangle = \sum_{\alpha \in U} a_\alpha \langle H_\alpha \rangle = \sum_{\alpha \in U} a_\alpha \sum_{x_\alpha} H_\alpha(x_\alpha)\, p_\alpha(x_\alpha) \qquad (4)$$

^b In the case of the Boltzmann distribution, $H_{(ij)} = w_{ij} x_i x_j + \theta_i x_i + \theta_j x_j$, $H_{(i)} = \theta_i x_i$, and $a_{(ij)} = 1$ and $a_{(i)} = 2 - n$.
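To make the Moebius construction above concrete, here is a minimal Python sketch (my own illustration; the cluster sets are those of the Fig. 1 example) that computes the numbers $a_\alpha$ by processing clusters from largest to smallest, using the rearrangement $a_\beta = 1 - \sum_{\alpha \in U, \alpha \supset \beta} a_\alpha$:

```python
# Moebius numbers a_alpha for U = B union M (clusters as frozensets).
# The recursion is a direct rearrangement of 1 = sum_{alpha >= beta} a_alpha.
B = [frozenset(s) for s in [(1, 2, 3), (2, 3, 5), (1, 4, 5), (3, 4, 5)]]
M = [frozenset(s) for s in [(1,), (2, 3), (3,), (5,), (3, 5)]]
U = B + M

a = {}
for beta in sorted(U, key=len, reverse=True):  # largest clusters first
    supersets = [g for g in U if g > beta]     # strict supersets of beta
    a[beta] = 1 - sum(a[g] for g in supersets)

for beta in U:
    print(sorted(beta), a[beta])
# Clusters in B get a_alpha = 1, as the text notes; e.g. the pair (3,5),
# contained in the two B-clusters (2,3,5) and (3,4,5), gets a = 1 - 2 = -1.
```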
Whereas $\langle H \rangle$ can be written exactly in terms of $p_\alpha$, this is not the case for the entropy term in Eq. 3. The approach is to decompose the entropy of a cluster $\alpha$ in terms of 'connected entropies' in the following way:^c

$$S_\alpha = -\sum_{x_\alpha} p_\alpha(x_\alpha) \log p_\alpha(x_\alpha) = \sum_{\beta \subseteq \alpha} S^c_\beta \qquad (5)$$

Such a decomposition can be made for any cluster. In particular it can be made for the 'cluster' consisting of all variables, so that we obtain

$$S = -\sum_x p(x) \log p(x) = \sum_\beta S^c_\beta \qquad (6)$$

where $\beta$ runs over all subsets of variables.^d The cluster variation method approximates the total entropy by restricting this sum to only clusters in U and re-expressing $S^c_\beta$ in terms of $S_\alpha$, using the Moebius formula and the definition Eq. 5:

$$S \approx \sum_{\beta \in U} S^c_\beta = \sum_{\beta \in U} \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha S^c_\beta = \sum_{\alpha \in U} a_\alpha S_\alpha \qquad (7)$$
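As a sanity check on Eq. 7, the following sketch (my own illustration, not code from the paper) computes the exact entropy of a small, weakly coupled Boltzmann distribution and its Bethe-type approximation $\sum_\alpha a_\alpha S_\alpha$ with B the set of all pairs, using exact marginals as in the Fig. 2 comparison; per footnote b, $a_{(ij)} = 1$ and $a_{(i)} = 2 - n$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 6
w = np.triu(rng.normal(0, 0.3 / np.sqrt(n), (n, n)), 1)  # weak couplings
theta = rng.normal(0, 0.1, n)

states = np.array(list(itertools.product([0, 1], repeat=n)))  # all 2^n states
H = np.einsum('si,ij,sj->s', states, w, states) + states @ theta
p = np.exp(-H); p /= p.sum()                                  # exact joint

def entropy(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))

S_exact = entropy(p)

# Exact single and pair marginals, then Eq. 7 with the Bethe choice of B:
# S ~ sum_pairs S_ij + (2 - n) * sum_i S_i.
S_single = sum(entropy(np.bincount(states[:, i], weights=p)) for i in range(n))
S_pairs = 0.0
for i, j in itertools.combinations(range(n), 2):
    pij = np.zeros((2, 2))
    np.add.at(pij, (states[:, i], states[:, j]), p)           # pair marginal
    S_pairs += entropy(pij.ravel())

print(S_exact, S_pairs + (2 - n) * S_single)  # close for weak couplings
```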
Since $S_\alpha$ is a function of $p_\alpha$ (Eq. 5) we have expressed the entropy in terms of cluster probabilities $p_\alpha$. The quality of this approximation is illustrated in Fig. 2. Note that both the Bethe and Kikuchi approximations strongly deteriorate around J = 1, which is where the spin-glass phase starts. For J < 1, the Kikuchi approximation is superior to the Bethe approximation. Note, however, that this figure only illustrates the quality of the truncations in Eq. 7 assuming that the exact marginals are known. It does not say anything about the accuracy of the approximate marginals using the approximate free energy. Substituting Eqs. 4 and 7 into the free energy Eq. 3 we obtain the approximate free energy of the Cluster Variation method. This free energy must be minimized subject to the normalization constraints $\sum_{x_\alpha} p_\alpha(x_\alpha) = 1$ and the consistency constraints

$$p_\alpha(x_\beta) = p_\beta(x_\beta), \qquad \beta \in M, \alpha \in B, \beta \subset \alpha. \qquad (8)$$

Note that we have excluded constraints between clusters in M. This is sufficient because when $\beta, \beta' \in M$, $\beta \subset \beta'$ and $\beta' \subset \alpha \in B$: $p_\alpha(x_{\beta'}) = p_{\beta'}(x_{\beta'})$

^c This decomposition is similar to writing a correlation in terms of means and covariances. For instance, when $\alpha = (i)$, $S_{(i)} = S^c_{(i)}$ is the usual mean field entropy, and $S_{(ij)} = S^c_{(i)} + S^c_{(j)} + S^c_{(ij)}$ defines the two node correction.
^d On n variables this sum contains $2^n$ terms.
Figure 2. Exact and approximate entropies for the fully connected Boltzmann-Gibbs distribution on n = 10 variables with random couplings (SK model) as a function of mean coupling strength. Couplings $w_{ij}$ are chosen from a Gaussian distribution with mean zero and standard deviation $J/\sqrt{n}$. External fields $\theta_i$ are chosen from a Gaussian distribution with mean zero and standard deviation 0.1. The exact entropy is computed from Eq. 6. The Bethe and Kikuchi approximations are computed using the approximate entropy expression Eq. 7 with exact marginals and by choosing B as the set of all pairs and all triplets, respectively.
and $p_\alpha(x_\beta) = p_\beta(x_\beta)$ implies $p_{\beta'}(x_\beta) = p_\beta(x_\beta)$. In the following, $\alpha$ and $\beta$ will be from B and M respectively, unless otherwise stated.^e Adding Lagrange multipliers for the constraints we obtain the Cluster Variation free energy:

$$F_{cvm}(\{p_\alpha(x_\alpha)\}, \{\lambda_\alpha\}, \{\lambda_{\alpha\beta}(x_\beta)\}) = \sum_\alpha a_\alpha \sum_{x_\alpha} p_\alpha(x_\alpha)\left(H_\alpha(x_\alpha) + \log p_\alpha(x_\alpha)\right)$$
$$- \sum_\alpha \lambda_\alpha \left(\sum_{x_\alpha} p_\alpha(x_\alpha) - 1\right) - \sum_{\alpha\beta} \sum_{x_\beta} \lambda_{\alpha\beta}(x_\beta)\left(p_\alpha(x_\beta) - p_\beta(x_\beta)\right) \qquad (9)$$

^e In fact, additional constraints can be removed when clusters in M contain subclusters in M. See Kappen and Wiegerinck 16.
3 Iterating Lagrange multipliers
Since the Moebius numbers can have arbitrary sign, Eq. 9 consists of a sum of convex and concave terms, and is therefore a non-convex optimization problem. One can separate $F_{cvm}$ into a convex and a concave term and derive an iteration procedure in $p_\alpha$ and the Lagrange multipliers that is guaranteed to converge 17. The resulting algorithm is a 'double loop' iteration procedure. Alternatively, by setting $\partial F_{cvm}/\partial p_\gamma$, $\gamma \in U$, equal to zero, one can express the cluster probabilities in terms of the Lagrange multipliers:
$$p_\alpha(x_\alpha) = \frac{1}{Z_\alpha} \exp\left(-H_\alpha(x_\alpha) + \sum_{\beta \subset \alpha} \lambda_{\alpha\beta}(x_\beta)\right) \qquad (10)$$

$$p_\beta(x_\beta) = \frac{1}{Z_\beta} \exp\left(-H_\beta(x_\beta) - \frac{1}{a_\beta} \sum_{\alpha \supset \beta} \lambda_{\alpha\beta}(x_\beta)\right) \qquad (11)$$
The remaining task is to solve for the Lagrange multipliers such that all constraints (Eq. 8) are satisfied. There are two ways to do this. One is to define an auxiliary cost function that is zero when all constraints are satisfied and positive otherwise, and to minimize this cost function with respect to the Lagrange multipliers. This method is discussed in Kappen and Wiegerinck 16. Alternatively, one can substitute Eqs. 10-11 into the constraint Eqs. 8 and obtain a system of coupled non-linear equations. In Yedidia et al. 12 a message passing algorithm was proposed to find a solution to this problem. Here, we will present an alternative method that solves directly in terms of the Lagrange multipliers. Consider the constraints Eq. 8 for some fixed cluster $\beta$ and all clusters $\alpha \supset \beta$, and define $B_\beta = \{\alpha \in B \mid \alpha \supset \beta\}$. We wish to solve for all constraints $\alpha \supset \beta$, with $\alpha \in B_\beta$, by adjusting $\lambda_{\alpha\beta}, \alpha \in B_\beta$. This is a sub-problem with $|B_\beta||x_\beta|$ equations and an equal number of unknowns, where $|B_\beta|$ is the number of elements of $B_\beta$ and $|x_\beta|$ is the number of values that $x_\beta$ can take. The probability distribution $p_\beta$ (Eq. 11) depends only on these Lagrange multipliers, up to normalization. $p_\alpha$ (Eq. 10) depends also on other Lagrange multipliers. However, we consider only its dependence on $\lambda_{\alpha\beta}, \alpha \in B_\beta$, and consider all other Lagrange multipliers as fixed. Thus,

$$p_\alpha(x_\alpha) = \exp(\lambda_{\alpha\beta}(x_\beta))\, \bar{p}_\alpha(x_\alpha), \qquad \alpha \in B_\beta \qquad (12)$$
with $\bar{p}_\alpha$ independent of $\lambda_{\alpha\beta}, \alpha \in B_\beta$. Substituting Eqs. 11 and 12 into Eq. 8, we obtain a set of linear equations for $\lambda_{\alpha\beta}(x_\beta)$ which we can solve in closed form:

$$\lambda_{\alpha\beta}(x_\beta) = -\frac{a_\beta}{a_\beta + |B_\beta|}\, H_\beta(x_\beta) - \sum_{\alpha' \in B_\beta} A_{\alpha\alpha'} \log \bar{p}_{\alpha'}(x_\beta)$$

with

$$A_{\alpha\alpha'} = \delta_{\alpha\alpha'} - \frac{1}{a_\beta + |B_\beta|}$$

We update the probabilities with the new values of the Lagrange multipliers using Eqs. 11 and 12. We repeat the above procedure for all $\beta \in M$ until convergence.
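Here is a minimal runnable sketch of this update (my own illustration, under the assumption that the closed form is as reconstructed above) for the smallest nontrivial case: a three-variable chain with $B = \{(1,2),(2,3)\}$ and $M = \{(2)\}$, for which the Moebius formula gives $a_{(2)} = 1 - a_{(1,2)} - a_{(2,3)} = -1$, so $a_\beta + |B_\beta| = 1$. Since the chain is a tree, a single sweep reproduces the exact marginals:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
H12 = rng.normal(size=(2, 2))          # interaction H_(1,2)(x1, x2)
H23 = rng.normal(size=(2, 2))          # interaction H_(2,3)(x2, x3)

# B = {(1,2), (2,3)}, beta = (2), a_beta = -1, |B_beta| = 2, so
# A = delta - 1/(a_beta + |B_beta|) = delta - 1.
a_beta, Bb = -1.0, 2
A = np.eye(Bb) - 1.0 / (a_beta + Bb)

# p-bar: cluster factors with the lambda_{alpha,beta} term removed (Eq. 12);
# here simply exp(-H_alpha), marginalized onto x2. H_beta = 0 in this toy.
pbar12, pbar23 = np.exp(-H12), np.exp(-H23)
logm = np.log(np.stack([pbar12.sum(axis=0),    # log pbar_12(x2)
                        pbar23.sum(axis=1)]))  # log pbar_23(x2)

lam = -A @ logm                        # lam[k, x2] = lambda_{alpha_k, 2}(x2)

# Eqs. 10 and 12: reinstate the multipliers and normalize.
p12 = pbar12 * np.exp(lam[0])[None, :]; p12 /= p12.sum()
p23 = pbar23 * np.exp(lam[1])[:, None]; p23 /= p23.sum()

# Compare with exact marginals from brute-force enumeration.
joint = np.zeros((2, 2, 2))
for x1, x2, x3 in itertools.product([0, 1], repeat=3):
    joint[x1, x2, x3] = np.exp(-(H12[x1, x2] + H23[x2, x3]))
joint /= joint.sum()
print(np.allclose(p12, joint.sum(axis=2)), np.allclose(p23, joint.sum(axis=0)))
```

On graphs with loops the same sweep is simply repeated over all $\beta \in M$ until the constraints stop changing, as the text describes.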
4 Numerical results
We show the performance of the Lagrange multiplier iteration method (LMI) on several 'real world' directed graphical models. For undirected models, see Kappen and Wiegerinck 16. First, we consider the well-known chest clinic problem, introduced by Lauritzen and Spiegelhalter 15. The graphical model is given in figure 3a. The model describes the relations between three diagnoses (Tuberculosis (T), Lung Cancer (L) and Bronchitis (B), middle layer), clinical observations and symptoms (Positive X-ray (X) and Dyspnoea (D) (= shortness of breath), lower layer) and prior conditions (recent visit to Asia (A) and whether the patient smokes (S)). In figure 3b, we plot the exact single node marginals against the approximate marginals for this problem. For LMI, the clusters in B are defined according to the conditional probability tables, i.e. when a node has k parents, a cluster of size k + 1 on this node and its parents is included in the set B. Convergence was reached in 6 iterations. The maximal error on the marginals is 0.0033. For comparison, we computed the mean field and TAP approximations, as previously introduced by Kappen and Wiegerinck 10. Although TAP is significantly better than MF, it is far worse than the CVM method. This is not surprising, since both the MF and TAP approximations are based on single node approximations, whereas the CVM method uses potentials up to size 3. Secondly, we consider a graphical model that was developed in a project together with the department of internal medicine of the Utrecht Academic Hospital. In this project, called Promedas, we aim to model a large part of internal medicine 18. The network that we consider was one of the first modules that we built; it models in detail some specific anemias and consists of 91 variables. The network was developed using our graphical tool BayesBuilder 19, which is shown with part of the network in figure 4. The clusters in B are defined according to the conditional probability tables. Convergence was reached in 5 iterations. The maximal absolute error on the marginals is 0.0008. The mean field and TAP methods perform very poorly on this problem. Finally, we tested the cluster variation method on randomly generated
[Figure 3 appears here: (a) the chest clinic graphical model; (b) approximate inference: scatter plot of approximate vs. exact single node marginals for MF (x), TAP (o) and CVM (+).]
Figure 3. a) The Chest Clinic model describes the relations between diagnoses, findings and prior conditions for a small medical domain. An arrow a → b indicates that the probability of b depends on the values of a. b) Inference of single node marginals using the MF, TAP and LMI methods, compared with the exact results.
directed graphical models. Each node is randomly connected to k parents. The entries of the probability tables are randomly generated between zero and one. Due to the large number of loops in the graph, the exact method requires exponential time in the so-called tree width, which can be seen from Table 1 to scale approximately linearly with the network size. Therefore exact computation is only feasible for small graphs (up to size n = 40 in this case). For the CVM, clusters in B are defined according to the conditional probability tables. Therefore, the maximal cluster size is k + 1. On these more challenging cases, LMI does not converge. The results shown are obtained with the auxiliary cost function that was briefly mentioned in section 3 and fully described in Kappen and Wiegerinck 16. Minimization was done using conjugate gradient descent. The results are shown in Table 1.

5 Conclusion
In this paper, we have described two approaches to computer aided medical diagnosis. The rule based approach directly models the diagnostic decision tree. We have shown that this approach fails to pass the test of clinical
Figure 4. BayesBuilder graphical software environment, showing part of the Anemia network. The network consists of 91 variables and models some specific anemias.
n    Iter   |C|   Potential error   Margin error   Constraint error
10   16     8     0.068             0.068          5.8e-3
20   30     12    0.068             0.216          6.2e-3
30   44     16    0.079             0.222          4.5e-3
40   48     21    0.073             0.218          4.2e-3
50   51     26    -                 -              3.2e-3
Table 1. Comparison of the CVM method for large directed graphical models. Each node is connected to k = 5 parents. |C| is the tree width of the triangulated graph required for the exact computation. Iter is the number of conjugate gradient descent iterations of the CVM method. Potential error and margin error are the maximum absolute error in any of the cluster probabilities and single variable marginals computed with CVM, respectively. Constraint error is the maximum absolute error in any of the constraints Eq. 8 after termination of CVM.
relevance, and we have given several reasons that could account for this failure. The alternative approach uses a probabilistic model to describe the relations between diagnoses and findings. This approach has the great advantage that it provides a principled approach for the combination of different sources
of uncertainty. The price that we have to pay for this luxury is that probabilistic inference is intractable for large systems. As a generic approximation method, we have introduced the Cluster Variation method and presented a novel iteration scheme, called Lagrange Multiplier Iteration. When it converges, it provides very good results and is very fast. However, it is not guaranteed to converge in general. In those more complex cases one must resort to more expensive methods, such as CCCP 17 or using an auxiliary cost function 16.

Acknowledgments

This research was supported in part by the Dutch Technology Foundation (STW). I would like to thank Taylan Cemgil for providing his Matlab graphical models toolkit, and Wim Wiegerinck and Sebino Stramaglia (Bari, Italy) for useful discussions.

References

1. Meditel, Devon, Pa. Meditel: Computer assisted diagnosis, 1991.
2. CAMDAT, Pittsburgh. QMR (Quick Medical Reference), 1992.
3. Massachusetts General Hospital, Boston. DXplain, 1992.
4. Applied Informatics, Salt Lake City. ILIAD, 1992.
5. E.S. Berner, G.D. Webster, A.A. Shugerman, J.R. Jackson, J. Algina, A.L. Baker, E.V. Ball, C.G. Cobbs, V.W. Dennis, E.P. Frenkel, L.D. Hudson, E.L. Mancall, C.E. Racley, and O.D. Taunton. Performance of four computer-based diagnostic systems. N. Engl. J. Med., 330(25):1792-6, 1994.
6. D.E. Heckerman, E.J. Horvitz, and B.N. Nathwani. Towards normative expert systems: part I, the Pathfinder project. Methods of Information in Medicine, 31:90-105, 1992.
7. D.E. Heckerman and B.N. Nathwani. Towards normative expert systems: part II, probability-based representations for efficient knowledge acquisition and inference. Methods of Information in Medicine, 31:106-116, 1992.
8. M.A. Shwe, B. Middleton, D.E. Heckerman, M. Henrion, E.J. Horvitz, H.P. Lehman, and G.F. Cooper. Probabilistic diagnosis using a reformulation of the Internist-1/QMR knowledge base. Methods of Information in Medicine, 30:241-55, 1991.
9. H.J. Kappen and F.B. Rodriguez. Efficient learning in Boltzmann Machines using linear response theory. Neural Computation, 10:1137-1156, 1998.
10. H.J. Kappen and W.A.J.J. Wiegerinck. Second order approximations for probability models. In Todd Leen, Tom Dietterich, Rich Caruana, and Virginia de Sa, editors, Advances in Neural Information Processing Systems 13, pages 238-244. MIT Press, 2001.
11. R. Kikuchi. Physical Review, 81:988, 1951.
12. J.S. Yedidia, W.T. Freeman, and Y. Weiss. Generalized belief propagation. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13 (Proceedings of the 2000 Conference), 2001. In press.
13. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, California, 1988.
14. Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation for approximate inference: an empirical study. In Proceedings of Uncertainty in AI, pages 467-475, 1999.
15. S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society B, 50:154-227, 1988.
16. H.J. Kappen and W. Wiegerinck. A novel iteration scheme for the cluster variation method. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, 2002. In press.
17. A.L. Yuille and A. Rangarajan. The convex-concave principle. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, 2002. In press.
18. W. Wiegerinck, H.J. Kappen, E.W.M.T. ter Braak, W.J.P.P. ter Burg, M.J. Nijman, Y.L. O, and J.P. Neijt. Approximate inference for medical diagnosis. Pattern Recognition Letters, 20:1231-1239, 1999.
19. B. Kappen, W. Wiegerinck, and M. Nijman. BayesBuilder. In W. Buntine, B. Fischer, and J. Schumann, editors, Software Support for Bayesian Analysis. RIACS, NASA Ames Research Center, 2000.
ANALYSIS OF EEG IN EPILEPSY

K. LEHNERTZ 1,*, R. G. ANDRZEJAK 1,2, T. KREUZ 1,2, F. MORMANN 1,3, C. RIEKE 1,3, P. DAVID 3, C. E. ELGER 1

1 Department of Epileptology, University of Bonn
2 John von Neumann Institute for Computing, Forschungszentrum Jülich
3 Institute for Radiation and Nuclear Physics, University of Bonn
Germany

*E-mail: [email protected]
We present potential applications of nonlinear time series analysis techniques to electroencephalographic recordings (EEG) derived from epilepsy patients. Apart from diagnostically oriented topics including localization of epileptic foci in different anatomical locations during the seizure-free interval we discuss possibilities for seizure anticipation which is one of the most challenging aspects in epileptology.
1 Introduction
The disease epilepsy is characterized by a recurrent and sudden malfunction of the brain that is termed a seizure. Epileptic seizures reflect the clinical signs of an excessive and hypersynchronous activity of neurons in the cerebral cortex. Depending on the extent of involvement of other brain areas during the course of the seizure, epilepsies can be divided into two main classes. Generalized seizures involve almost the entire brain, while focal (or partial) seizures originate from a circumscribed region of the brain (the epileptic focus) and remain restricted to this region. Epileptic seizures may be accompanied by an impairment or loss of consciousness, psychic, autonomic or sensory symptoms, or motor phenomena.

Knowledge about basic mechanisms leading to seizures is mainly derived from animal experiments. Although there is a considerable bulk of literature on the topic, the underlying electrophysiological and neurobiochemical mechanisms are not yet fully explored. Moreover, it remains to be proven whether findings from animal experiments are fully transferable to human epilepsies. Recordings of the membrane potential of neurons under epileptic conditions indicate an enormous change, which by far exceeds the physiological changes occurring with neuronal excitation. This phenomenon is termed paroxysmal depolarization shift (PDS 1,2,3) and represents a shift of the resting membrane potential that is accompanied by an increase of intracellular calcium and a massive burst of action potentials (500-800 per second). PDS originating from a larger cortical region are associated with steep field potentials (known as spikes) recorded in the scalp EEG. Focal seizures are assumed to be initiated by abnormally discharging neurons (so-called bursters 4,5,6) that recruit and entrain neighboring neurons into a "critical mass". This build-up might be mediated by an increasing synchronization of neuronal activity that is accompanied by a loss of inhibition, or by facilitating processes that permit seizure emergence by lowering a threshold.

The fact that seizures appear to be unpredictable is one of the most disabling aspects of epilepsy. If it were possible to anticipate seizures, this would dramatically change therapeutic possibilities 7. Approximately 0.6-0.8% of the world population suffer from epilepsy. In about half of these patients, focal seizures originate from functional and/or morphological lesions of the brain. Antiepileptic drugs insufficiently control or even fail to manage epilepsy in 30-50% of the cases. It can be assumed that 10-15% of these cases would profit from epilepsy surgery. Successful surgical treatment of focal epilepsies requires exact localization of the epileptic focus and its delineation from functionally relevant areas. For this purpose, different presurgical evaluation methodologies are currently in use 8. Neurological and neuropsychological examinations are complemented by neuroimaging techniques that try to identify potential morphological correlates. Currently, however, the gold standard for an exact localization of the epileptic focus is to record the patient's spontaneous habitual seizures using electroencephalography. Depending on the individual occurrence of seizures, this task requires long-lasting and continuous recording of the EEG. In case of ambiguous scalp EEG findings, invasive recordings of the electrocorticogram (ECoG) or the stereo-EEG (SEEG) via implanted depth electrodes are indicated. This procedure, however, comprises a certain risk for the patient and is time-consuming and expensive. Thus, reliable EEG analysis techniques are required to localize and to demarcate the epileptic focus even during the seizure-free interval 9.

2 EEG analysis
In recent years, technical advances such as digital video-EEG monitoring systems as well as increased computational power have led to highly sophisticated clinical epilepsy monitoring that allows huge amounts of data to be processed in real time. In addition, chronically implanted intracranial electrodes allow continuous recording of brain electrical activity from the surface of the brain and/or within specific brain structures at a high signal-to-noise ratio and at a high spatial resolution. Due to its high temporal resolution and its close relationship to physiological and pathological functions of the brain, electroencephalography is regarded as indispensable for clinical practice despite the rapid development of imaging techniques like magnetic resonance tomography or positron emission tomography.

Usually EEG analysis methods are applied to long-lasting multi-channel recordings in a moving-window fashion. The time length of a window is chosen in such a way that it represents a reasonable tradeoff between approximate stationarity and a sufficient number of data points. Depending on the complexity of the analysis technique applied, computation times vary between a few milliseconds and some tenths of seconds. Thus, most applications can be performed in real-time using standard personal computers. However, analyses cannot be applied in a strict mathematical sense because the necessary theoretical conditions cannot be met in practice - a common problem that applies to any analysis of short (and noisy) data segments or nonstationary data.

Linear EEG analysis methods 10 can be divided into two main concept-based categories. Nonparametric methods comprise analysis techniques such as evaluation of amplitude, interval or period distributions, estimation of auto- and crosscorrelation functions, as well as analyses in the frequency domain like power spectral estimation and cross-spectral functions. Parametric methods include, among others, AR (autoregressive) and ARMA (autoregressive moving average) models 11, inverse AR-filtering and segmentation analysis. These main branches are accompanied by pattern recognition methods involving either a mixture of the techniques mentioned before or, more recently, the wavelet transform 12,13,14. Despite the limitations mentioned above, classical EEG analysis has significantly contributed to and still advances the understanding of physiological and pathophysiological mechanisms of the brain.

Nonlinear time series analysis techniques 15,16,17 have been developed to analyze and characterize apparently irregular behavior - a distinctive feature of the EEG. Techniques mainly involve estimates of an effective correlation dimension, entropy related measures, Lyapunov exponents, measures for determinism, similarity, interdependencies, recurrence quantification, as well as tests for nonlinearity. During the last decade a variety of these analysis techniques have been repeatedly applied to EEG recordings during physiological and pathological conditions and were shown to offer new information about complex brain dynamics 18,19,20,21. Today it is commonly accepted that the existence of a deterministic or even chaotic structure underlying neuronal dynamics is difficult if not impossible to prove. Nevertheless, nonlinear approaches to the analysis of the system brain have generated new clinical measures as well as new ways of interpreting brain electrical function, particularly with regard to epileptic brain states. Indeed, recent results provide converging evidence that nonlinear EEG analysis allows a reliable characterization of different states of brain function and dysfunction, provided that limitations of the respective analysis techniques are taken into consideration and, thus, results are interpreted with care (e.g., only relative measures with respect to recording time and recording site are assumed reliable). In the following, we will concentrate on nonlinear EEG analysis techniques and illustrate potential applications of these techniques in the field of epileptology.
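The moving-window scheme described above is straightforward to implement. The sketch below is my own generic illustration, with an arbitrary placeholder measure (signal variance) standing in for the specific linear or nonlinear measures discussed in this paper; it slides a window over a multi-channel recording and collects one value per channel and window:

```python
import numpy as np

def moving_window_analysis(eeg, fs, win_sec=30.0, step_sec=10.0,
                           measure=np.var):
    """Apply `measure` to successive windows of a (channels x samples) array.

    win_sec trades off approximate stationarity (short windows) against a
    sufficient number of data points (long windows), as discussed above.
    """
    win, step = int(win_sec * fs), int(step_sec * fs)
    n_ch, n = eeg.shape
    out = []
    for start in range(0, n - win + 1, step):
        segment = eeg[:, start:start + win]
        out.append([measure(segment[c]) for c in range(n_ch)])
    return np.asarray(out)             # (windows x channels) profile

# Example: 10 minutes of surrogate 8-channel "EEG" sampled at 256 Hz.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 256 * 600))
profile = moving_window_analysis(eeg, fs=256)
print(profile.shape)
```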
3 Nonlinear EEG analysis in epilepsy
In early publications 22,23 evidence for low-dimensional chaos in EEG recordings of epileptic seizures was claimed. However, accumulating knowledge about influencing factors as well as improvement of analysis techniques rendered these findings questionable 24,25,26,27. There is, however, now converging evidence that relative estimates of nonlinear measures improve understanding of the complex spatio-temporal dynamics of the epileptogenic process in different brain regions and promise to be of high relevance for diagnostics 28,29,30,31.

3.1 Outline of analysis techniques
In the course of our work attempting to characterize the epileptogenic process we investigated the applicability of already established measures and developed new ones. Results presented below were obtained by extracting these measures from long-lasting ECoG and SEEG recordings from subgroups of a collective of about 300 patients with epileptogenic foci located in different anatomical regions of the brain. Apart from linear measures such as statistical moments, power spectral estimates, or auto- and cross-correlation functions, several univariate and bivariate nonlinear measures are currently in use. Since details of the different analysis techniques have been published elsewhere, we here only provide a short description of the measures along with the respective references.

Univariate measures: Based on the well known fact that neurons involved in the epileptic process exhibit high frequency discharges that are scarcely modulated by physiological brain activity 5, we hypothesized that this neuronal behavior should be accompanied by an intermittent loss of complexity or an increase of nonlinear deterministic structure in the corresponding electrographic signal even during the seizure-free interval 33. To characterize complexity, we use an estimate of an effective correlation dimension 32, D2eff, and the derived measure neuronal complexity loss L* 33,34. These measures are accompanied by estimates of the largest Lyapunov exponent λ1 35,36,37, by entropy measures 38, by the nonlinear prediction error 39,40, and by different complexity measures 41 derived from the theory of symbolic dynamics 42. Detection and characterization of nonlinear deterministic structures in the EEG is achieved by combining tests for determinism 43 and for nonlinearity 44, resulting in a measure we have termed the fraction of nonlinear determinism ξ 45,46,47.

Bivariate measures: As already mentioned, pathological neuronal synchronization is considered to play a crucial role in epileptogenesis. Therefore, univariate measures are supplemented by bivariate measures that aim to detect and characterize synchronization in time series of brain electrical activity. The nonlinear interdependence S 48,49 characterizes statistical relationships between two time series. In contrast to commonly used measures like cross-correlation, coherence and mutual information, S is non-symmetric and provides information about the direction of interdependence. It is closely related to other attempts to detect generalized synchronization 50. Following the approach of understanding phase synchronization 51 in a statistical sense 52, we developed a straightforward measure for phase synchronization employing the circular variance 53 of a phase distribution. We have termed this measure the mean phase coherence R 54,55.
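As an illustration of the bivariate approach, the following sketch (my own, not the authors' implementation) estimates the mean phase coherence of two signals from instantaneous phases obtained via the Hilbert transform, using the standard definition $R = |\langle e^{i(\phi_1 - \phi_2)} \rangle|$, i.e. one minus the circular variance of the phase differences:

```python
import numpy as np
from scipy.signal import hilbert

def mean_phase_coherence(x, y):
    """R = |< exp(i(phi_x - phi_y)) >|: 1 = perfect phase locking, 0 = none."""
    phi_x = np.angle(hilbert(x))       # instantaneous phase of x
    phi_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

# Toy check: two noisy oscillators sharing a common 10 Hz rhythm.
fs = 256
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
common = np.sin(2 * np.pi * 10 * t)
x = common + 0.5 * rng.standard_normal(t.size)
y = common + 0.5 * rng.standard_normal(t.size)
print(mean_phase_coherence(x, y))                             # close to 1
print(mean_phase_coherence(x, rng.standard_normal(t.size)))   # much smaller
```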
3.2 Localizing the epileptic focus
Several lines of evidence originating from studies of human epileptic brain tissue as well as from animal models of chronic seizure disorders indicate that the epileptic brain is different from the normal one, even between seizures - during the so-called interictal state. In order to evaluate the efficacy of different analysis techniques to characterize the spatio-temporal dynamics of the epileptogenic process, and thus to localize the epileptic focus during the interictal state, we applied them to long-lasting interictal ECoG/SEEG recordings covering different states of normal behavior and vigilance as well as different extents of epileptiform activity. We retrospectively analyzed data of patients with mesial temporal lobe epilepsy (MTLE) and/or neocortical lesional epilepsy (NLE) undergoing presurgical evaluation. We included data of patients for whom surgery led to complete post-operative seizure control as well as of patients who did not benefit from surgery.

Nonlinear EEG analysis techniques allow a reliable localization of epileptic foci in different cerebral regions in more than 80% of the cases. This holds true regardless of whether or not obvious epileptiform activity is present in the recordings. Results obtained from our univariate measures indicate that the dynamics of the epileptic focus during the seizure-free interval can indeed be characterized by an intermittent loss of complexity or an increased nonlinear deterministic structure in an otherwise stochastic environment. Bivariate measures indicate that the epileptic focus is characterized by a pathologically increased level of interdependence or synchronization. Both univariate and bivariate measures thus share the ability to detect dynamical changes related to the epileptic process. It can be concluded that our EEG analysis techniques approach the problem of characterizing the epileptogenic process from different points of view, and they indicate the potential relevance of nonlinear EEG analysis for improving the understanding of intermittent dysfunctioning of the dynamical system brain between seizures. Moreover, our results also stress the relevance of nonlinear EEG analyses in clinical practice, since they provide potentially useful diagnostic information and thus may contribute to an improvement of the presurgical evaluation 9,56,57,29.

3.3 Anticipating seizures
In EEG analysis the search for hidden information predictive of an impending seizure has a long history. As early as 1975, researchers considered analysis techniques such as pattern recognition procedures applied to spectral data 58 or autoregressive modeling of EEG data 59 for predicting epileptic seizures. Findings indicated that EEG changes characteristic of pre-seizure states may be detectable, at most, a few seconds before the actual seizure onset. None of these techniques have been implemented clinically. Apart from applying signal analysis techniques, the relevance of steep, high amplitude epileptiform potentials (spikes, the hallmark of the epileptic brain) was investigated in a number of clinical studies 60,61,62. While some authors reported a decrease or even total cessation of spikes before seizures, reexamination did not confirm this phenomenon in a larger sample.

Although there are numerous studies exploring basic neuronal mechanisms that are likely to be associated with seizures, to date, no definite information is available as to the generation of seizures in humans. In this context, the term "critical mass" might be misleading in the sense that it could merely imply an increasing number of neurons that are entrained into an abnormal discharging process. This mass phenomenon would have been easily accessible to conventional EEG analyses which, however, failed to detect it. Recent research in seizure anticipation has shown that evident markers in the EEG representing the transition from asynchronous to synchronous states of the epileptic brain (the pre-seizure state) can be detected on time scales ranging from several minutes up to hours. These studies indicate that the seizure-initiating process should be regarded as an unfolding of an increasing number of critical, possibly nonlinear dynamical interferences between neurons within the focal area as well as with neurons surrounding this area. Indeed, there is converging evidence from different laboratories that nonlinear analysis is capable of characterizing this collective behavior of neurons from the gross electrical activity and hence allows one to define a critical transition state, at least in a high percentage of cases 63,64,65,67,34,68,69,70,55,71,28.

4 Future perspectives
Results obtained so far are promising and emphasize the high value of nonlinear EEG analysis techniques both for clinical practice and basic science. Up to now, however, findings have mainly been obtained from retrospective studies in well-elaborated cases and using invasive recording techniques. Thus, on the one hand, evaluation of more complicated cases as well as prospective studies on a larger population of patients are necessary. The possibility of defining a critical transition state can be regarded as the most prominent contribution of nonlinear EEG analysis to advancing knowledge about seizure generation in humans. This possibility has recently been expanded by studies indicating accessibility of critical pre-seizure changes from non-invasive EEG recordings 65,71. Nonetheless, in order to achieve an unequivocal definition of a pre-seizure state from either invasive or non-invasive recordings, a variety of influencing factors have to be evaluated. Most studies carried out so far have concentrated on EEG recordings just prior to seizures. Other studies 33,48,66,55,47,28, however, have shown that there are phases of dynamical changes even during the seizure-free interval, pointing to abnormalities that are not followed by a seizure. Moreover, pathologically or physiologically induced dynamical interactions within the brain are not yet fully understood. Among others, these include different sleep stages, different cognitive states, as well as daily activities that clearly vary from patient to patient. In order to evaluate the specificity of possible seizure anticipation techniques, analyses of long-lasting multi-channel EEG recordings covering different pathological and physiological states are therefore mandatory 67,34,28. Along with these studies, EEG analysis techniques have to be further improved. New techniques are needed that allow a better characterization of non-stationarity and high-dimensionality in brain dynamics, techniques disentangling even subtle dynamical interactions between pathological disturbances and surrounding brain tissue, as well as refined artifact detection and elimination. Since the methods currently available allow a differentiated characterization of the epileptogenic process, the combined use of these techniques along with appropriate classification schemes 72,73,74 can be regarded as a promising venture.

Once given an improved sensitivity and specificity of EEG analysis techniques for both focus localization and seizure anticipation, broader clinical applications on a larger population of patients, either at home or in a clinical setting, can be envisaged. As a future perspective, one might also take into consideration implantable seizure anticipation and prevention devices similar to those already in use with Parkinsonian patients 75,76. Although optimization of the algorithms underlying the computation of specific nonlinear measures 77,69 already allows continuous tracking of the temporal behavior of nonlinear measures in real time, these applications still require the use of powerful computer systems, depending on the number of recording channels necessary to allow unequivocal characterization of the epileptogenic process. Thus, further optimization and development of a miniaturized analyzing system are definitely necessary. Taking into account the technologies currently available, realization of such systems can be expected within the next few years.
Acknowledgments We gratefully acknowledge discussions with and contributions by Jochen Arnhold, Wieland Burr, Guillen Fernandez, Peter Grassberger, Thomas Grunwald, Peter Hanggi, Christoph Helmstaedter, Martin Kurthen, Hans-Rudi Moser, Thomas Schreiber, Bruno Weber, Jochen Wegner, Guido Widman and Heinz-Gregor Wieser. This work was supported by the Deutsche Forschungsgemeinschaft. References 1. 2. 3. 4. 5. 6. 7. 8. 9.
E. S. Goldensohn and D. P. Purpura, Science 139, 840 (1963). H. Matsumoto and C. Ajmone-Marsan, Exp. Neurol. 9, 286 (1964). H. Matsumoto and C. Ajmone-Marsan, Exp. Neurol. 9, 305 (1964). R. D. Traub and R. K. Wong, Science 216, 745 (1982). A. R. Wyler and A. A. Ward, in Epilepsy, a window to brain mechanisms, eds. J. S. Lockard and A. A. Ward (Raven Press, New York, 1992). E. R. G. Sanabria, H. Su and Y. Yaari, J. Physiol. 532, 205 (2001). C. E. Elger, Curr. Opin. Neurol. 14, 185 (2001). J. Engel Jr. and T. A. Pedley, Epilepsy: a comprehensive text-book (Philadelphia, Lippincott-Raven, 1997). C. E. Elger, K. Lehnertz and G. Widman in Epilepsy: Problem solving in clinical practice, eds. D. Schmidt and S. C. Schacter (Martin Dunitz Publishers, London, 1999).
25
10. F. H. Lopes da Silva in Electroencephalography, eds. E. Niedermeyer and F. H. Lopes da Silva (Williams & Wilkins, Baltimore, 1993). 11. P J. Franaszczuk and G. K. Bergey, Biol. Cybern. 8 1 , 3 (1999). 12. S. J. SchiSetal, Electroencephalogr. clin. Neurophysiol. 91,442(1994). 13. R. R. Coifman and M. V. Wickerhauser, Electroencephalogr. clin. Neurophysiol. (Suppl.) 45, 57 (1996). 14. A. Effern et al, Physica D 140, 257 (2000). 15. H. G. Schuster, Deterministic chaos: an introduction (VCH Verlag, Basel, Cambridge, New York, 1989). 16. E. Ott, Chaos in dynamical systems (Cambridge University Press, Cambridge, UK, 1993). 17. H. Kantz and T. Schreiber, Nonlinear time series analysis (Cambridge University Press, Cambridge, UK, 1997). 18. E. Ba§ar, Chaos in Brain Function (Springer, Berlin, 1990). 19. D. Duke and W. Pritchard, Measuring chaos in the human brain (World Scientific, Singapore, 1991). 20. B. H. Jansen and M. E. Brandt, Nonlinear dynamical analysis of the EEG (World Scientific, Singapore, 1993). 21. K. Lehnertz , J. Arnhold, P. Grassberger and C. E. Elger, Chaos in brain? (World Scientific, Singapore, 2000). 22. A. Babloyantz and A. Destexhe, Proc. Natl. Acad. Sci. USA 83, 3513 (1986). 23. G. W. Frank et al, Physica D 46, 427 (1990). 24. J. Theiler, Phys. Lett. A 196, 335 (1995). 25. J. Theiler and P. E. Rapp, Electroencephalogr. clin. Neurophysiol. 98, 213 (1996). 26. T. Schreiber, in Chaos in brain?, eds. K. Lehnertz , J. Arnhold, P. Grassberger and C. E. Elger, (World Scientific, Singapore, 2000). 27. F. H. Lopes da Silva et al., in Chaos in brain?, eds. K. Lehnertz , J. Arnhold, P. Grassberger and C. E. Elger, (World Scientific, Singapore, 2000). 28. B. Litt et al, Neuron 30, 51 (2001). 29. K. Lehnertz et al, J. Clin. Neurophysiol. 18, 209 (2001). 30. M. Le Van Quyen et al, J. Clin. Neurophysiol. 18, 191 (2001). 31. R. Savit et al, J. Clin. Neurophysiol. 18, 246 (2001). 32. P. Grassberger, T. Schreiber and C. Schaffrath, Int. J. Bifurcation Chaos 1, 521 (1991). 33. K. Lehnertz and C. E. Elger, Electroencephalogr. clin. Neurophysiol. 95, 108 (1995). 34. K. Lehnertz and C. E. Elger, Phys. Rev. Lett. 80, 5019 (1998).
35. M. T. Rosenstein, J. J. Collins and C. J. de Luca, Physica D 65, 117 (1994).
36. H. Kantz, Phys. Lett. A 185, 77 (1994).
37. J. Wegner, Diploma thesis, University of Bonn (1998).
38. R. Quian Quiroga et al., Phys. Rev. E 62, 8380 (2000).
39. A. S. Weigend and N. A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past (Addison-Wesley, Reading, 1993).
40. R. G. Andrzejak et al., Phys. Rev. E 64, 061907 (2001).
41. T. Kreuz, Diploma thesis, University of Bonn (2000).
42. B. L. Hao, Elementary Symbolic Dynamics and Chaos in Dissipative Systems (World Scientific, Singapore, 1989).
43. D. T. Kaplan and L. Glass, Phys. Rev. Lett. 68, 427 (1992).
44. T. Schreiber and A. Schmitz, Phys. Rev. Lett. 77, 635 (1996).
45. R. G. Andrzejak, Diploma thesis, University of Bonn (1997).
46. R. G. Andrzejak et al., in Chaos in brain?, eds. K. Lehnertz, J. Arnhold, P. Grassberger and C. E. Elger (World Scientific, Singapore, 2000).
47. R. G. Andrzejak et al., Epilepsy Res. 44, 129 (2001).
48. J. Arnhold et al., Physica D 134, 419 (1999).
49. J. Arnhold, Publication Series of the John von Neumann Institute for Computing, Forschungszentrum Jülich, Vol. 4 (2000).
50. N. F. Rulkov et al., Phys. Rev. E 51, 980 (1995).
51. M. G. Rosenblum et al., Phys. Rev. Lett. 76, 1804 (1996).
52. P. Tass et al., Phys. Rev. Lett. 81, 3291 (1998).
53. K. V. Mardia, Probability and mathematical statistics: Statistics of directional data (Academy Press, London, 1972).
54. F. Mormann, Diploma thesis, University of Bonn (1998).
55. F. Mormann et al., Physica D 144, 358 (2000).
56. C. E. Elger et al., in Neocortical epilepsies, eds. P. D. Williamson, A. M. Siegel, D. W. Roberts, V. M. Thadani and M. S. Gazzaniga (Lippincott, Williams & Wilkins, Philadelphia, 2000).
57. C. E. Elger et al., Epilepsia 41 (Suppl. 3), S34 (2000).
58. S. S. Viglione and G. O. Walsh, Electroencephalogr. clin. Neurophysiol. 39, 435 (1975).
59. Z. Rogowski, I. Gath and E. Bental, Biol. Cybern. 42, 9 (1981).
60. J. Gotman et al., Epilepsia 23, 432 (1982).
61. H. H. Lange et al., Electroencephalogr. clin. Neurophysiol. 56, 543 (1983).
62. A. Katz et al., Electroencephalogr. clin. Neurophysiol. 79, 153 (1991).
63. L. D. Iasemidis et al., Brain Topogr. 2, 187 (1990).
64. C. E. Elger and K. Lehnertz, in Epileptic Seizures and Syndromes, ed. P. Wolf (J. Libbey & Co, London, 1994).
65. L. D. Iasemidis et al., in Spatiotemporal Models in Biological and Artificial Systems, eds. F. H. Lopes da Silva, J. C. Principe and L. B. Almeida (IOS Press, Amsterdam, 1997).
66. M. Le Van Quyen et al., Physica D 127, 250 (1999).
67. C. E. Elger and K. Lehnertz, Eur. J. Neurosci. 10, 786 (1998).
68. J. Martinerie et al., Nat. Med. 4, 1173 (1998).
69. M. Le Van Quyen et al., Neuroreport 10, 2149 (1999).
70. H. R. Moser et al., Physica D 130, 291 (1999).
71. M. Le Van Quyen et al., Lancet 357, 183 (2001).
72. Y. Salant, I. Gath and O. Henriksen, Med. Biol. Eng. Comput. 36, 549 (1998).
73. R. Tetzlaff et al., IEEE Proc. Eur. Conf. Circuit Theory Design, 573 (1999).
74. A. Petrosian et al., Neurocomputing 30, 201 (2000).
75. A. L. Benabid et al., Lancet 337, 403 (1991).
76. P. Tass, Biol. Cybern. 85, 343 (2001).
77. G. Widman et al., Physica D 121, 65 (1998).
STOCHASTIC APPROACHES TO MODELING OF PHYSIOLOGICAL RHYTHMS
PLAMEN CH. IVANOV
Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215; Cardiovascular Division, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
E-mail: [email protected]
CHUNG-CHUAN LO
Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
E-mail:
[email protected]
The scientific question we address is how physiological rhythms spontaneously self-regulate. It is fairly widely believed nowadays that deterministic mechanisms, including perhaps chaos, offer a promising avenue to pursue in answering this question. Complementary to these deterministic foundations, we propose an approach which treats physiological rhythms as fundamentally governed by several random processes, each of which biases the rhythm in different ways. We call this approach stochastic feedback, since it leads naturally to feedback mechanisms that are based on randomness. To illustrate our approach, we treat in some detail the regulation of heart rhythms and sleep-wake transitions during sleep — two classic "unsolved" problems in physiology. We present coherent, physiologically based models and show that a generic process based on the concepts of biased random walk and stochastic feedback can account for a combination of independent scaling characteristics observed in data.
1 Modeling scaling features in heartbeat dynamics

1.1 Introduction
The fundamental principle of homeostasis asserts that physiological systems seek to maintain a constant output after perturbation 1,2,3,4. Recent evidence, however, indicates that healthy systems even at rest display highly irregular dynamics 5,6,7,8,9,10. Here, we address the question of how to reconcile homeostatic control and complex variability. We propose a general approach based on the concept of "stochastic feedback" and illustrate this approach by considering the neuroautonomic regulation of the heart rate. Our results suggest that in healthy systems the control mechanisms operate to drive the system away from extreme values while not allowing it to settle down to a constant (homeostatic) output. The model generates complex dynamics and
successfully accounts for key characteristics of the cardiac variability not fully explained by traditional models: (i) a 1/f power spectrum, (ii) a stable scaling form for the distribution of the variations in the beat-to-beat intervals and (iii) Fourier phase correlations 11,12,13,14,15,16,17. Furthermore, the reported scaling properties arise over a broad zone of parameter values rather than at a sharply-defined "critical" point.

1.2 Random walks and feedback mechanisms
The concept of dynamic equilibrium or "homeostasis" 1,2,3 led to the proposal that physiological variables, such as the cardiac interbeat interval $\tau(n)$, where $n$ is the beat number, maintain an approximately constant value in spite of continual perturbations. Thus one can write in general

$$\tau(n) = \tau_0 + \eta, \qquad (1)$$

where $\tau_0$ is the "preferred level" for the interbeat interval and $\eta$ is white noise with strength $\sigma$, defined as the standard deviation of $\eta$. We first re-state this problem in the language of random walks. The time evolution of an uncorrelated and unbiased random walk is expressed by the equation $\tau(n+1) - \tau(n) = \eta$. At every step the walker has equal probability to move "up" or "down." The deviation from the initial level increases as $n^{1/2}$ 18, so an uncorrelated and unbiased random walk does not preserve homeostasis (Fig. 1a). To maintain a constant level, there must be a bias in the random walk 19,

$$\tau(n+1) - \tau(n) = I(n), \qquad (2)$$

with

$$I(n) = \begin{cases} w\,(1+\eta), & \text{if } \tau(n) < \tau_0, \\ -w\,(1+\eta), & \text{if } \tau(n) > \tau_0. \end{cases} \qquad (3)$$
The weight $w$ is the strength of the feedback input biasing the walker to return to its preferred level $\tau_0$. When away from the attraction level $\tau_0$, the walker has a higher probability of moving towards the attraction level. This behavior represents Cannon's idea of homeostasis (dynamical equilibrium), where a system maintains constancy even when perturbed by external stimuli. Note that Eqs. (2) and (3) generate dynamics similar to Eq. (1) but through a nonlinear feedback mechanism. The dynamics generated by these rules correspond to a system with time-independent feedback. As expected in this case, for short time scales (high frequencies), the power spectrum scales as $1/f^2$ (Brownian noise) with a crossover to white noise at longer time scales due to the attraction to level $\tau_0$ (Fig. 1b).
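As an aside for readers who wish to experiment, the single-input feedback of Eqs. (2) and (3) is straightforward to simulate. The following minimal Python sketch is our own illustration, not the authors' code: Gaussian noise stands in for $\eta$, and the values of $w$, $\tau_0$ and $\sigma$ are illustrative.

```python
import numpy as np

def single_feedback_walk(n_steps=2**16, tau0=0.6, w=0.01, sigma=0.5, seed=0):
    """Biased random walk of Eqs. (2)-(3): one input pulls tau toward tau0."""
    rng = np.random.default_rng(seed)
    tau = np.empty(n_steps)
    tau[0] = tau0
    for n in range(n_steps - 1):
        eta = sigma * rng.standard_normal()       # noise of strength sigma
        sign = 1.0 if tau[n] < tau0 else -1.0     # bias toward the preferred level
        tau[n + 1] = tau[n] + sign * w * (1.0 + eta)
    return tau

# The power spectrum of the walk scales as ~1/f^2 at high frequencies and
# crosses over to white noise at low frequencies (attraction to tau0, Fig. 1b).
walk = single_feedback_walk()
power = np.abs(np.fft.rfft(walk - walk.mean()))**2
```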
Figure 1. Schematic representation of the dynamics of the model. (a) Evolution of a random walk starting from initial position $\tau_0$. The deviation of the walk from level $\tau_0$ increases as $n^{1/2}$, where $n$ is the number of steps. The power spectrum of the random walk scales as $1/f^2$ (Brownian noise). The distribution $P(A)$ of the amplitudes $A$ of the variations in the interbeat intervals follows a Rayleigh distribution. Here the amplitudes are obtained by: (i) wavelet transform of the random walk, which filters out trends and extracts the variations at a time scale $a$; (ii) calculation of the amplitudes of the variations via Hilbert transform. (b) Random walk with a bias toward $\tau_0$. (c) Random walk with two stochastic feedback controls. In contrast to (b), the levels of attraction $\tau_0$ and $\tau_1$ change values in time. Each level persists for a time interval $T_i$ drawn from a distribution with an average value $T_{lock}$. Each time the level changes, its new value is drawn from a uniform distribution. Perturbed by changing external stimuli, the system nevertheless remains within the bounds defined by $\Delta\tau$ even after many steps. We find that such a dynamical mechanism based on a single characteristic time scale $T_{lock}$ generates a 1/f power spectrum over several decades. Moreover, $P(A)$ decays exponentially, which we attribute to nonlinear Fourier phase interactions in the walk.
Note the shift of the crossover to longer time scales (lower frequencies) when stronger noise is present. For weak noise the walker never leaves the close vicinity of the attraction level, while for stronger noise, larger drifts can occur leading to
longer trends and longer time scales. However, in both cases, $P(A)$ follows the Rayleigh distribution because the wavelet transform filters out the drifts and trends in the random walk (Fig. 1b). For intermediate values of the noise there is a deviation from the Rayleigh distribution and the appearance of an exponential tail. We find that Eqs. (2) and (3) do not reproduce the statistical properties of the empirical data (Fig. 1b). We therefore generalize them to include several inputs $I_k$ ($k = 0, 1, \cdots, m$), with different preferred levels $\tau_k$, which compete in biasing the walker:

$$\tau(n+1) - \tau(n) = \sum_{k=0}^{m} I_k(n), \qquad (4)$$

where

$$I_k(n) = \begin{cases} w_k\,(1+\eta), & \text{if } \tau(n) < \tau_k, \\ -w_k\,(1+\eta), & \text{if } \tau(n) > \tau_k. \end{cases} \qquad (5)$$
From a biological or physiological point of view, it is clear that the preferred levels $\tau_k$ of the inputs $I_k$ cannot remain constant in time, for otherwise the system would not be able to respond to varying external stimuli. We assume that each preferred interval $\tau_k$ is a random function of time, with values correlated over a time scale $T_{lock}$. We next coarse grain the system and choose $\tau_k(n)$ to be a random step-like function constrained to have values within a certain interval and with the length of the steps drawn from a distribution with an average value $T_{lock}$ (Fig. 1c). This model yields several interesting features, including a 1/f power spectrum, scaling of the distribution of variations, and correlations in the Fourier phases.

1.3 Neuroautonomic regulation of heartbeat dynamics
To illustrate the approach for the specific example of neuroautonomic control of cardiac dynamics, we first note that the healthy heart rate is determined by three major inputs: (i) the sinoatrial (SA) node; (ii) the parasympathetic (PS); and (iii) the sympathetic (SS) branches of the autonomic nervous system. (i) The SA node or pacemaker is responsible for the initiation of each heart beat 20; in the absence of other external stimuli, it is able to maintain a constant interbeat interval 2. Experiments in which PS and SS inputs are blocked reveal that the interbeat intervals are very regular and average only 0.6 s 20. The input from the SA node, $I_{SA}$, thus biases the interbeat interval $\tau$ toward its intrinsic level $\tau_{SA}$ (see Fig. 1b).
Figure 2. Stochastic feedback regulation of the cardiac rhythm. We compare the predictions of the model with the healthy heart rate. Sequences of interbeat intervals $\tau$ from (a) a healthy individual and (b) from simulation exhibit an apparent visual similarity. (c) Power spectra of the interbeat intervals $\tau(n)$ from the data and the model. To first approximation, these power spectra can be described by the relation $S(f) \sim 1/f^{\beta}$ with $\beta \approx 1$. The presence of patches in both heart and model signals leads to observable crossovers embedded in this 1/f behavior at different time scales. We calculated the local exponent $\beta$ from the power spectrum of 24 h records ($\approx 10^5$ beats) for 20 healthy subjects and found that the local value of $\beta$ shows a persistent drift, so no true scaling exists. (This is not surprising, due to the nonstationarity of the signals.) (d) Power spectra of the increments in $\tau(n)$. The model and the data both scale as power laws with exponents close to one. Since the non-stationarity is reduced, crossovers are no longer present. We also calculated the local exponent for the power spectrum of the increments for the same group of 20 healthy subjects as in the top curve, and found that the local exponent fluctuates around an average value close to one, so true scaling does exist.
(ii) The PS fibers conduct impulses that slow the heart rate. Suppression of SS stimuli, while under PS regulation, can result in the increase of the interbeat interval to as much as 1.5 s 20,21. The activity of the PS system changes with external stimuli. We model these features of the PS input, $I_{PS}$, by the following conditions: (1) a preferred interval, $\tau_{PS}(n)$, randomly chosen from a uniform distribution with an average value larger than $\tau_{SA}$, and (2) a correlation time, $T_{PS}$, during which $\tau_{PS}$ does not change, where $T_{PS}$ is drawn
from a distribution with an average value $T_{lock}$. (iii) The SS fibers conduct impulses that speed up the heart beat. Abolition of parasympathetic influences when the sympathetic system remains active can decrease the interbeat intervals to less than 0.3 s 20. There are several centers of sympathetic activity highly sensitive to environmental influences 21. We represent each of the $N$ sympathetic inputs by $I_{SS}^j$ ($j = 1, \cdots, N$). We attribute to $I_{SS}^j$ the following characteristics: (1) a preferred interbeat interval $\tau_{SS}^j(n)$ randomly chosen from a uniform distribution with an average value smaller than $\tau_{SA}$, and (2) a correlation time $T_j$ in which $\tau_{SS}^j(n)$ does not change; $T_j$ is drawn from a distribution with an average value $T_{lock}$ which is the same for all $N$ inputs (and the same as for the PS system), so $T_{lock}$ is the characteristic time scale of both the PS and SS inputs. The characteristics for the PS and SS inputs correspond to a random walk with stochastic feedback control (Fig. 1c). Thus, for the present example of cardiac neuroautonomic control, we have $N + 2$ inputs and Eq. (4) becomes:

$$\tau(n+1) - \tau(n) = I_{SA}(n) + I_{PS}\left(n, \tau_{PS}(n)\right) + \sum_{j=1}^{N} I_{SS}^j\left(n, \tau_{SS}^j(n)\right), \qquad (6)$$
where the structure of each input is identical to the one in Eq. (5). Equation (6) cannot fully reflect the complexity of the human cardiac system. However, it provides a general framework that can easily be extended to include other physiological systems (such as breathing, baroreflex control, different locking times for the inputs of the SS and PS systems 5,22, etc.). We find that Eq. (6) captures the essential ingredients responsible for a number of important statistical and scaling properties of the healthy heart rate. Next we generate a realization of the model with parameters $N = 7$ and $w_{SA} = w_{SS} = w_{PS}/3 = 0.01$ sec (Fig. 2b). We choose $T_j$ randomly from an exponential distribution with average $T_{lock} = 1000$ beats. (We find that a different form of the distribution for $T_j$ does not change the results.) The noise $\eta$ is drawn from a symmetrical exponential distribution with zero average and standard deviation $\sigma = 0.5$. We define the preferred values of the interbeat intervals for the different inputs according to the following rules: (1) $\tau_{SA} = 0.6$ sec, (2) the $\tau_{PS}$ values are randomly selected from a uniform distribution in the interval [0.9, 1.5] sec, and (3) the $\tau_{SS}^j$'s are randomly selected from a uniform distribution in the interval [0.2, 1.0] sec. The actual values of the preferred interbeat intervals of the different inputs and the ratio between their weights are physiologically justified and are of no significance for the dynamics — they just set the range for the fluctuations of $\tau$, chosen to correspond to the empirical data.
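The full dynamics of Eq. (6) can be simulated along the same lines. The sketch below is our own illustration using the parameter values quoted above ($N = 7$, $w_{SA} = w_{SS} = w_{PS}/3 = 0.01$ s, $T_{lock} = 1000$ beats, $\sigma = 0.5$, $\tau_{SA} = 0.6$ s, and the uniform level ranges [0.9, 1.5] s and [0.2, 1.0] s); the Laplace parameterization of the symmetric exponential noise and all function names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T_LOCK, SIGMA = 7, 1000, 0.5          # number of SS inputs, beats, noise std
W_SA = W_SS = 0.01                       # input weights (sec); w_PS = 3 w_SA
W_PS = 3 * W_SA
TAU_SA = 0.6                             # intrinsic SA-node interval (sec)

def make_level(lo, hi, n_steps):
    """Step-like preferred level: values ~ U[lo, hi], persistence ~ Exp(T_LOCK)."""
    level = np.empty(n_steps)
    i = 0
    while i < n_steps:
        span = max(1, int(rng.exponential(T_LOCK)))
        level[i:i + span] = rng.uniform(lo, hi)
        i += span
    return level

def bias(tau_n, pref, w):
    """One feedback input of Eq. (5); eta is symmetric exponential with std SIGMA."""
    eta = rng.laplace(0.0, SIGMA / np.sqrt(2))
    return w * (1.0 + eta) * np.sign(pref - tau_n)

n_steps = 100_000
tau = np.empty(n_steps)
tau[0] = TAU_SA
tau_ps = make_level(0.9, 1.5, n_steps)                      # PS preferred levels
tau_ss = [make_level(0.2, 1.0, n_steps) for _ in range(N)]  # N SS preferred levels
for n in range(n_steps - 1):
    step = bias(tau[n], TAU_SA, W_SA) + bias(tau[n], tau_ps[n], W_PS)
    step += sum(bias(tau[n], s[n], W_SS) for s in tau_ss)
    tau[n + 1] = tau[n] + step
```

The power spectra of tau and of its increments can then be inspected as in Figs. 2c and 2d.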
Figure 3. (top) Effect of the correlation time $T_{lock}$ on the scaling of the power spectrum of $\tau(n)$ for a signal comprising $10^6$ beats. (bottom) Schematic diagram illustrating the origin of the different scaling regimes in the power spectrum of $\tau(n)$.
1.4 Experimental findings and results of simulations
To qualitatively test the model, we first compare the time series generated by the stochastic feedback model and the healthy heart 23 and find that both signals display complex variability and patchiness (Fig. 2a,b). To quantitatively test the model, we compare the statistical properties of heart data with the predictions of the model: (a) We first test for long-range power-law correlations in the interbeat intervals, which exist for healthy heart dynamics 24 . These correlations can be uncovered by calculating power spectra, and we see (Fig. 2) that the model simulations correctly reproduce the power-law correlations observed in data over several decades. In particular, we note that the non-stationarity of both the data and model signals leads to the existence of several distinct scaling
regimes in the power spectrum of $\tau(n)$ (Figs. 2c and 3). We find that with increasing $T_{lock}$, the power spectrum does not follow a single power law but actually crosses over from a behavior of the type $1/f^2$ at very small time scales (or high frequencies), to a behavior of the type $1/f^0$ for intermediate time scales, followed by a new regime with $1/f^2$ for larger time scales (Fig. 3). At very large time scales, another regime appears with a flat power spectrum. In the language of random walkers, $\tau$ is determined by the competition of different neuroautonomic inputs. For very short time scales, the noise will dominate, leading to a simple random walk behavior and $1/f^2$ scaling (regime A in Fig. 3(bottom)). For time scales longer than $T_A$, the deterministic attraction towards the "average preferred level" of all inputs will dominate, leading to a flat power spectrum (regime B in Fig. 3(bottom), see also Fig. 1b). However, after a time $T_B$ (of the order of $T_{lock}/N$), the preferred level of one of the inputs will have changed, leading to the random drift of the average preferred level and the consequent drift of the walker towards it. So, at these time scales, the system can again be described as a simple random walker and we expect a power spectrum of the type $1/f^2$ (regime C in Fig. 3(bottom)). Finally, for time scales larger than $T_C$, the walker will start to feel the presence of the bounds on the fluctuations of the preferred levels of the inputs. Thus, the power spectrum will again become flat (regime D). Since the crossovers are not sharp in the data or in the numerical simulations, they can easily be misinterpreted as a single power law scaling with an exponent $\beta \approx 1$. By reducing the strength of the noise, we decrease the size of regime A and extend regime B into higher frequencies. In the limit $\sigma \to 0$, the power spectrum of $\tau(n)$, which would coincide with the power spectrum of the "average preferred level", would have only regimes B, C and D. The stochastic feedback mechanism thus enables us to explain the formation of regions (patches) in the time series with different characteristics. (b) By studying the power spectrum of the increments we are able to circumvent the effects of the non-stationarity. Our results show that true scaling behavior is indeed observed for the power spectrum of the increments, both for the data and for the model (Fig. 2). (c) We calculate the probability density $P(A)$ of the amplitudes $A$ of the variations of interbeat intervals through the wavelet transform. It has been shown that the analysis of sequences of interbeat intervals with the wavelet transform 25 can reveal important scaling properties 26 for the distributions of the variations in complex nonstationary signals. In agreement with the results of Ref. 27, we find that the distribution $P(A)$ of the amplitudes $A$ of interbeat interval variations for the model decays exponentially — as is observed for healthy heart dynamics (Fig. 4). We hypothesize that this decay arises
Figure 4. Analysis of the amplitudes $A$ of variations in $\tau(n)$. We apply to the signal generated by the model the wavelet transform with fixed scale $a$, then use the Hilbert transform to calculate the amplitude $A$. The top left panel shows the normalized histogram $P(A)$ for the data (6 h daytime) and for the model (with the same parameter values as in Fig. 2), and for wavelet scale $a = 8$ beats, i.e., $\approx 40$ s. (Derivatives of the Gaussian are used as a wavelet function.) We test the generated signal for nonlinearity and Fourier phase correlations, creating a surrogate signal by randomizing the Fourier phases of the generated signal but preserving the power spectrum (thus leaving the results of Fig. 2 unchanged). The histogram of the amplitudes of variations for the surrogate signal follows the Rayleigh distribution, as expected theoretically (see inset). Thus the observed distribution, which is universal for healthy cardiac dynamics and reproduced by the model, reflects the Fourier phase interactions. The top right panel shows a similar plot for data collected during sleep and for the model with a reduced number $N$ of sympathetic inputs. We note that the distribution is broader for the amplitudes of heartbeat interval variations during sleep compared to wake activity, indicating, counterintuitively, a higher probability for large variations, with large values deviating from the exponential tail 28. Our model reproduces this behavior when the number of sympathetic inputs is reduced in accordance with the physiological observations of decreased sympathetic tone during sleep 20. The bottom panel tests the stability of the analysis for the model at different time scales $a$. The distribution is stable over a wide range of time scales, identical to the range observed for heart data 27. The stability of the distributions indicates statistical self-similarity in the variations at different time scales.
from nonlinear Fourier phase interactions and is related to the underlying nonlinear dynamics. To test this hypothesis, we perform a parallel analysis on a surrogate time series obtained by preserving the power spectrum but randomizing the Fourier phases of a signal generated by the model (Fig. 4); $P(A)$ now follows the Rayleigh distribution $P(A) \sim A e^{-A^2}$, since there are no Fourier phase correlations 29. (d) For the distribution displayed in Fig. 4, we test the stability of the scaling form at different time scales; we find that $P(A)$ for the model displays a scaling form stable over a range of time scales identical to the range for the data (Fig. 4) 27. Such time scale invariance indicates statistical self-similarity 30. A notable feature of the present model is that in addition to the power spectra, it accounts for the form and scaling properties of $P(A)$, which are independent of the power spectra 31. No similar tests for nonlinear dynamics have been reported for other models 12,13,14. Further work is needed to account for the recently reported long-range correlations in the magnitude of interbeat interval increments 32, the multifractal spectrum of heart-rate fluctuations 33 and the power-law distribution of segments in heart rate recordings with different local mean values 34. The model has a number of parameters, whose values may vary from one individual to another, so we next study the sensitivity of our results to variations in these parameters. We find that the model is robust to parameter changes. The value of $T_{lock}$ and the strength of the noise $\sigma$ are crucial to generate dynamics with scaling properties similar to those found for empirical data. We find that the model reproduces key features of the healthy heart dynamics for a wide range of time scales ($500 < T_{lock} < 2000$) and noise strengths ($0.4 < \sigma < 0.6$). The model is consistent with the existence of an extended "zone" in parameter space where scaling behavior holds, and our picture is supported by the variability in the parameters for healthy individuals for which similar scaling properties are observed.
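The surrogate test of point (d) can be sketched in a few lines: preserve the power spectrum, randomize the Fourier phases, and compare the amplitude distributions before and after. In this illustration of ours, a Gaussian-derivative kernel at a single scale stands in for the wavelet transform of Ref. 25; the helper names are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def phase_randomized(x, rng):
    """Surrogate with the same power spectrum but random Fourier phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, X.size)
    phases[0] = 0.0                       # keep the zero-frequency term real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def variation_amplitudes(x, scale=8):
    """Gaussian-derivative wavelet at one scale, then Hilbert amplitudes."""
    t = np.arange(-4 * scale, 4 * scale + 1)
    kernel = -np.gradient(np.exp(-t**2 / (2.0 * scale**2)))
    detail = np.convolve(x, kernel, mode="same")
    return np.abs(hilbert(detail))

rng = np.random.default_rng(2)
A_model = variation_amplitudes(tau)       # tau from the model sketch above
A_surrogate = variation_amplitudes(phase_randomized(tau, rng))
# The histogram of A_model decays exponentially, while A_surrogate should
# follow the Rayleigh form P(A) ~ A exp(-A^2) once phase correlations are gone.
```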
1.5 Conclusions
Scaling behavior for physical systems is generally obtained for fixed values of the parameters, corresponding to a critical point or phase transition 35. Such fixed values seem unlikely in biological systems exhibiting power-law scaling. Moreover, such critical point behavior would imply perfect identity among individuals; our results are more consistent with the robust nature of healthy systems, which appear to be able to maintain their complex dynamics over a wide range of parameter values, accounting for the adaptability of healthy
systems. The model we review here, and the data which it fits, support a revised view of homeostasis that takes into account the fact that healthy systems under basal conditions, while being continuously driven away from extreme values, do not settle down to a constant output. Rather, a more realistic picture may involve nonlinear stochastic feedback mechanisms driving the system.
2 Modeling dynamics of sleep-wake transitions

2.1 Introduction
In this Section we investigate the dynamics of the awakening during the night for healthy subjects and find that the wake and the sleep periods exhibit completely different behavior: the durations of wake periods are characterized by a scale-free power-law distribution, while the durations of sleep periods have an exponential distribution with a characteristic time scale. We find that the characteristic time scale of sleep periods changes throughout the night. In contrast, there is no measurable variation in the power-law behavior for the durations of wake periods. We develop a stochastic model, based on a biased random walk approach, which agrees with the data and suggests that the difference in the dynamics of sleep and wake states arises from the constraints on the number of microstates in the sleep-wake system. In clinical sleep centers, the "total sleep time" and the "total wake time" during the night are used to evaluate sleep efficacy and to diagnose sleep disorders. However, the total wake time during a longer period of nocturnal sleep is actually comprised of many short wake intervals (Fig. 5). This fact suggests that the "total wake time" during sleep is not sufficient to characterize the complex sleep-wake transitions and that it is important to ask how periods of the wake state are distributed during the course of the night. Although recent studies have focused on sleep control at the neuronal level 36,37,38,39, very little is known about the dynamical mechanisms responsible for the time structure or even the statistics of the abrupt sleep-wake transitions during the night. Furthermore, different scaling behavior between sleep and wake activity and between different sleep stages has been observed 40,41. Hence, investigating the statistical properties of the wake and sleep states throughout the night may provide not only a more informative measure but also insight into the mechanisms of the sleep-wake transition.
Figure 5. The textbook picture 43 of sleep-stage transitions describes a quasi-cyclical process, with a period of ≈ 90 min, where the wake stage is followed by light sleep and then by deep sleep, with transition back to light sleep, and then to rapid-eye-movement (REM) sleep — or perhaps to the wake stage. Sleep-wake transitions during nocturnal sleep. (a) Representative example of sleep-stage transitions from a healthy subject. Data were recorded in a sleep laboratory according to the Rechtschaffen and Kales criteria 52: two channels of electroencephalography (EEG), two channels of electrooculography (EOG) and one channel of submental electromyography (EMG) were recorded. Signals were digitized at 100 Hz and 12 bit resolution, and visually scored by sleep experts in segments of 30 seconds for sleep stages: wakefulness, rapid-eye-movement (REM) sleep and non-REM sleep stages 1, 2, 3 and 4. (b) Magnification of the shaded region in (a). (c) In order to study sleep-wake transitions, we reduce five stages into a single sleep state by grouping rapid-eye-movement (REM) sleep and sleep stages 1 to 4 into a single sleep state.
2.2 Empirical analysis
We analyze 39 full-night sleep records collected from 20 healthy subjects (11 females and 9 males, ages 23-57, with average sleep duration 7.0 hours). We first study the distribution of durations of the sleep and of the wake states during the night (Fig. 5). We calculate the cumulative distribution of
durations, defined as

$$P(t) = \int_t^{\infty} p(t')\,dt', \qquad (7)$$

where $p(t)$ is the probability density function of durations between $t$ and $t + dt$. We analyze $P(t)$ of the wake state, and we find that the data follow a power-law distribution,

$$P(t) \sim t^{-\alpha}. \qquad (8)$$
We calculate the exponent $\alpha$ for each of the 20 subjects, and find an average exponent $\alpha = 1.3$ with a standard deviation $\sigma = 0.4$. It is important to verify that the data from individual records correspond to the same probability distribution. To this end, we apply the Kolmogorov-Smirnov test to the data from individual records. We find that we cannot reject the null hypothesis that $p(t)$ of the wake state of each subject is drawn from the same distribution, suggesting that one can pool all data together to improve statistics without changing the distribution (Fig. 6a). Pooling the data from all 39 records, we find that $P(t)$ of the wake state is consistent with a power-law distribution with an exponent $\alpha = 1.3 \pm 0.1$ (Fig. 7a). In order to verify that the distribution of durations of the wake state is better described by a power law rather than by an exponential or a stretched exponential functional form, we fit these curves to the distributions from pooled data. Using the Levenberg-Marquardt method, we find that both the exponential and the stretched exponential form lead to a worse fit. The $\chi^2$ errors of the power-law fit, exponential fit and stretched exponential fit are $3 \times 10^{-5}$, $1.6 \times 10^{-3}$ and $3.5 \times 10^{-3}$, respectively. We also check the results by plotting (i) $\log P(t)$ versus $t$ and (ii) $\log(|\log P(t)|)$ versus $\log t$ [a], and find in both cases that the data are clearly more curved than when we plot $\log P(t)$ versus $\log t$, indicating that a power law provides the best description of the data [b]. We perform a similar analysis for the sleep state and find, in contrast to the result for the wake state, that the data in the large time region ($t > 5$ min) exhibit exponential behavior

$$P(t) \sim e^{-t/\tau}. \qquad (9)$$
"For the stretched exponential y = aexp(—bxc), where a, b and c are constants, the log(| log2/|) versus logx plot is not a straight line unless a = 1. Since we don't know what the corresponding value of a is in our data, we can not rescale y so that a — I. The solution is to shift x for a certain value to make y = 1 when x — 0, in which case a = 1. In our data, P(t) = 1 when t = 0.5, so we shift t by —0.5 before plotting log(| logP(t)|) versus logt. b According Eq. 7, if P(t) is a power-law function, so is p(t). We also separately check the functional form of p(t) for the data with same procedure and find that the power law provides the best description of the data.
Figure 6. Cumulative probability distribution $P(t)$ of sleep and wake durations of individual and pooled data. Double-logarithmic plot of $P(t)$ of wake durations (a) and semi-logarithmic plot of $P(t)$ of sleep durations (b) for pooled data and for data from one typical subject. $P(t)$ for three typical subjects is shown in the insets. Note that due to the limited number of sleep-wake periods for each subject, it is difficult to determine the functional form for individual subjects. We perform the K-S test and compare the probability density $p(t)$ for all individual data sets and pooled data for both wake and sleep periods. For both sleep and wake, less than 10% of the individual data fall below the 0.05 significance level of disproof of the null hypothesis, so $p(t)$ for each individual subject is very likely drawn from the same distribution. The K-S statistics significantly improve if we use recordings only from the second night. Therefore, pooling all data improves the statistics while preserving the form of $p(t)$.
We calculate the time constants $\tau$ for the 20 subjects, and find an average $\tau = 20$ min with $\sigma = 5$ min. Using the Kolmogorov-Smirnov test, we find that we cannot reject the null hypothesis that $p(t)$ of the sleep state of each subject of our 39 data sets is drawn from the same distribution (Fig. 6b). We further find that $P(t)$ of the sleep state for the pooled data is consistent with an exponential distribution with a characteristic time $\tau = 22 \pm 1$ min (Fig. 7b). In order to verify that $P(t)$ of the sleep state is better described by an exponential functional form rather than by a stretched exponential functional form, we fit these curves to the $P(t)$ from pooled data. Using the Levenberg-Marquardt method, we find that the stretched exponential form leads to a worse fit. The $\chi^2$ errors of the exponential fit and stretched exponential fit are $8 \times 10^{-5}$ and $2.7 \times 10^{-2}$, respectively. We also check the results by plotting $\log(|\log P(t)|)$ versus $\log t$ [a] and find that the data are clearly more curved than when we plot $\log P(t)$ versus $t$, indicating that an exponential form provides the best description of the data.
Figure 7. Cumulative distribution of durations $P(t)$ of sleep and wake states from data. (a) Double-logarithmic plot of $P(t)$ from the pooled data. For the wake state, the distribution closely follows a straight line with a slope $\alpha = 1.3 \pm 0.1$, indicating power-law behavior of the form of Eq. (8). (b) Semi-logarithmic plot of $P(t)$. For the sleep state, the distribution follows a straight line with a slope $1/\tau$ where $\tau = 22 \pm 1$ min, indicating exponential behavior of the form of Eq. (9). It has been reported that the individual sleep stages have exponential distributions of durations 53,54,55. Hence we expect an exponential distribution of durations for the sleep state.
Sleep is not a "homogeneous process" throughout the course of the night 42,43, so we ask if there is any change of $\alpha$ and $\tau$ during the night. We study sleep and wake durations for the first two hours, middle two hours, and the last two hours of nocturnal sleep using the pooled data from all 39 records (Fig. 8). Our results suggest that $\alpha$ does not change for these three portions of the night, while $\tau$ decreases from $27 \pm 1$ min in the first two hours to $22 \pm 1$ min in the middle two hours, and then to $18 \pm 1$ min in the last two hours. The decrease in $\tau$ implies that the number of wake periods increases as the night proceeds, and we indeed find that the average number of wake periods for the last two hours is 1.4 times larger than for the first two hours.
2.3 Model
We next investigate mechanisms that may be able to generate the different behavior observed for sleep and wake. Although several quantitative models, such as the two-process model 44 and the thermoregulatory model 45, have been developed to describe human sleep regulation, detailed modeling of the frequent short awakenings during nocturnal sleep has not been addressed 46. To model the sleep-wake transitions, we make three assumptions (Fig. 9) 47:
Figure 8. $P(t)$ of sleep and wake states in the first two hours, middle two hours and last two hours of sleep. (a) $P(t)$ of wake states; the power-law exponent $\alpha$ does not change in a measurable way. (b) $P(t)$ of sleep states; the characteristic time $\tau$ decreases in the course of the night.
Assumption 1 defines the key variable $x(t)$ for sleep-wake dynamics. Although we consider a two-state system, the brain as a neural system is unlikely to have only two discrete states. Hence, we assume that both the wake and sleep "macro" states comprise a large number of "microstates", which we map onto a continuous variable $x(t)$ defined in such a way that positive values correspond to the wake state while negative values correspond to the sleep state. We further assume that there is a finite region $-\Delta < x < 0$ for the sleep state. Assumption 2 concerns the dynamics of the variable $x(t)$. Recent studies 37,39 suggest that a small population of sleep-active neurons in a localized region of the brain distributes inhibitory inputs to wake-promoting neuronal populations, which in turn interact through a feedback on the sleep-active neurons. Because of these complex interactions, the global state of the system may present a "noisy" behavior. Accordingly, we assume that $x(t)$ evolves by a random-walk type of dynamics due to the competition between the sleep-active and wake-promoting neurons. Assumption 3 concerns a bias towards sleep. We assume that if $x(t)$ moves into the wake state, then there will be a "restoring force" pulling it towards the sleep state. This assumption corresponds to the common experience that in wake periods during nocturnal sleep, one usually has a strong tendency to quickly fall asleep again. Moreover, the longer one stays awake, the more difficult it may be to fall back asleep, so we assume that the restoring force becomes weaker as one moves away from the transition point $x = 0$. We
model these observations by assuming that the random walker moves in a logarithmic potential $V(x) = b \ln x$, yielding a force $f(x) = -dV(x)/dx = -b/x$, where the bias $b$ quantifies the strength of the force. Assumptions 1-3 can be written compactly as:

$$x(t+1) - x(t) = \begin{cases} \epsilon(t), & \text{if } -\Delta < x(t) < 0 \quad \text{(sleep)}, \\ \epsilon(t) - b/x(t), & \text{if } x(t) > 0 \quad \text{(wake)}, \end{cases} \qquad (10)$$
where $\epsilon(t)$ is an uncorrelated Gaussian-distributed random variable with zero mean and unit standard deviation. In our model, the bias $b$ and the threshold $\Delta$ may change during the course of the night due to physiological variations such as the circadian cycle 44-46. In our model, the distribution of durations of the wake state is identical to the distribution of return times of a random walk in a logarithmic potential. For large times, this distribution is of a power-law form 48,49,50,51. Hence, for large times, the cumulative distribution of return times is also a power law, Eq. (8), and the exponent is predicted to be

$$\alpha = \tfrac{1}{2} + b. \qquad (11)$$
From Eq. (11) it follows that the cumulative distribution of return times for a random walk without bias ($b = 0$) decreases as a power law with an exponent $\alpha = 1/2$. Note that introducing a restoring force of the form $f(x) = -b/x^{\gamma}$ with $\gamma \neq 1$ yields stretched exponential distributions 51, so $\gamma = 1$ is the only case yielding a power-law distribution. Similarly, the distribution of durations of the sleep state is identical to the distribution of return times of a random walk in a space with a reflecting boundary. Hence $P(t)$ has an exponential distribution, Eq. (9), in the large time region, with the characteristic time $\tau$ predicted to be

$$\tau \sim \Delta^2. \qquad (12)$$
Equations (11) and (12) indicate that the values of $\alpha$ and $\tau$ in the data can be reproduced in our model by "tuning" the threshold $\Delta$ and the bias $b$ (Fig. 10). The decrease of the characteristic duration of the sleep state as the night proceeds is consistent with the possibility that $\Delta$ decreases (Fig. 9). Our calculations suggest that $\Delta$ decreases from $7.9 \pm 0.2$ in the first hours of sleep, to $6.6 \pm 0.2$ in the middle hours, and then to $5.5 \pm 0.2$ for the final hours of sleep. Accordingly, the number of wake periods of the model increases by a factor of 1.3 from the first two hours to the last two hours, consistent with the data. However, the apparent consistency of the power-law exponent for the wake state suggests that the bias $b$ may remain approximately constant during the night. Our best estimate is $b = 0.8 \pm 0.1$.
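Equation (10) is easy to simulate directly. The sketch below is our own illustration: one step corresponds to 30 s (the scoring resolution, cf. Fig. 10), $b = 0.8$ and $\Delta = 6.6$ are the estimates quoted above for the middle of the night, and the small constant d_small regularizing the force near $x = 0$ is set to an arbitrary illustrative value.

```python
import numpy as np

def sleep_wake(n_steps=500_000, b=0.8, delta=6.6, d_small=1.0, seed=3):
    """Random walk of Eq. (10): reflecting wall at -delta on the sleep side,
    logarithmic-potential force -b/(x + d_small) on the wake side (x > 0)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = -delta / 2.0
    for t in range(n_steps - 1):
        step = rng.standard_normal()
        if x[t] > 0.0:                    # wake: restoring force toward sleep
            step -= b / (x[t] + d_small)
        x[t + 1] = x[t] + step
        if x[t + 1] < -delta:             # reflecting boundary of the sleep region
            x[t + 1] = -2.0 * delta - x[t + 1]
    return x

def state_durations(x, wake=True):
    """Lengths of consecutive runs with x > 0 (wake) or x <= 0 (sleep)."""
    state = (x > 0) if wake else (x <= 0)
    edges = np.flatnonzero(np.diff(state.astype(int)))
    runs = np.diff(edges)
    return runs[::2] if state[edges[0] + 1] else runs[1::2]
```

The cumulative distribution of the wake runs should then decay with exponent $\alpha \approx 1/2 + b \approx 1.3$, Eq. (11), while the sleep runs decay exponentially with $\tau \sim \Delta^2$, Eq. (12).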
Figure 9. Schematic representation of the dynamics of the model. The model can be viewed as a random walk in a potential well illustrated in (a), where the bottom flat region between $-\Delta < x < 0$ corresponds to the area without field, and the region $x > 0$ corresponds to the area with logarithmic potential. (b) The state $x(t)$ of the sleep-wake system evolves as a random walk with the convention that $x > 0$ corresponds to the wake state and $-\Delta < x < 0$ corresponds to the sleep state, where $\Delta$ gradually changes with time to account for the decrease of the characteristic duration of the sleep state with progression of the night. In the wake state there is a "restoring force," $f(x) = -b/x$, "pulling" the system towards the sleep state. The lower panel in (b) illustrates sleep-wake transitions from the model. (c) Comparison of typical data and of a typical output of the model. The visual similarity between the two records is confirmed by quantitative analysis (Fig. 10).
To further test the validity of our assumptions, we examine the correlation between the durations of consecutive states. Consider the sequence of sleep and wake durations $\{S_1\,W_1\,S_2\,W_2 \ldots S_n\,W_n\}$, where $S_n$ indicates the duration of the $n$-th sleep period and $W_n$ indicates the duration of the $n$-th wake period (Fig. 9b). Our model predicts that there are no autocorrelations in the series $S_n$ and $W_n$, as well as no cross-correlations between the series $S_n$ and $W_n$, the reason being that the uncorrelated random walk carries no information about previous steps. The experimental data confirm these predictions,
Figure 10. Comparison of $P(t)$ for data and model (two runs with the same parameters). (a) $P(t)$ of the wake state. (b) $P(t)$ of the sleep state. Note that the choice of $\Delta$ depends on the choice of the time unit of the step in the model. We choose the time unit to be 30 seconds, which corresponds to the time resolution of the data. To avoid big jumps in $x(t)$ due to the singularity of the force when $x(t)$ approaches $x = 0$, we introduce a small constant $\delta$ in the definition of the restoring force, $f(x) = -b/(x + \delta)$. We find that the value of $\delta$ does not change $\alpha$ or $\tau$.
within statistical uncertainties.
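This check is a one-liner per coefficient; the sketch below (our own, reusing state_durations() and the trace x from the model sketch above) computes lag-one auto- and cross-correlation coefficients, which should vanish within roughly $1/\sqrt{m}$ for $m$ periods.

```python
import numpy as np

def lag_correlations(S, W):
    """Auto- and cross-correlations of the duration sequences {S_n}, {W_n}."""
    m = min(S.size, W.size)
    S, W = S[:m].astype(float), W[:m].astype(float)
    return {
        "S autocorr (lag 1)": np.corrcoef(S[:-1], S[1:])[0, 1],
        "W autocorr (lag 1)": np.corrcoef(W[:-1], W[1:])[0, 1],
        "S-W cross (same n)": np.corrcoef(S, W)[0, 1],
    }

# x = sleep_wake()
# print(lag_correlations(state_durations(x, wake=False),
#                        state_durations(x, wake=True)))
```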
2.4 Conclusions
Our findings of a power-law distribution for wake periods and an exponential distribution for sleep periods are intriguing because the same sleep-control mechanisms give rise to two completely different types of dynamics — one without a characteristic scale and the other with one. Our model suggests that the difference in the dynamics of the sleep and wake states (e.g. power law versus exponential) arises from the distinct number of microstates that can be explored by the sleep-wake system in these two states. During the sleep state, the system is confined to the region $-\Delta < x < 0$. The parameter $\Delta$ imposes a scale which causes an exponential distribution of durations. In contrast, for the wake state the system can explore the entire half-plane $x > 0$. The lack of constraints leads to a scale-free power-law distribution of durations. In addition, the $1/x$ restoring force in the wake state does not change the functional form of the distribution, but its magnitude determines the power-law exponent of the distribution (see Eq. (11)). Although in our model the sleep-wake system can explore the entire half-plane $x > 0$ during wake periods, the "real" biological system is unlikely to generate very large values (i.e., extremely long wake durations). There must be
a constraint or boundary in the wake state at a certain value of $x$. If such a constraint or boundary exists, we will find a cut-off with an exponential tail in the distribution of durations of the wake state. More data are needed to test this hypothesis. Our additional finding of a stable power-law behavior for wake periods for all portions of the night implies that the mechanism generating the restoring force in the wake state is not affected in a measurable way by the mechanism controlling the changes in the durations of the sleep state. We hypothesize that even though the power-law behavior does not change in the course of the night for healthy individuals, it may change under pharmacological influences or under different conditions, such as stress or depression. Thus, our results may also be useful for testing these effects on the statistical properties of the wake state and the sleep state.
3 Summary
We show that a stochastic approach based on general phenomenological considerations can successfully account for a variety of scaling and statistical features in complex physiological processes where interaction between many elements is typical. We propose a "common framework" to describe diverse physiological mechanisms such as heart rate control and sleep-wake regulation. In particular, in the context of cardiac dynamics we find that the generic process of a random walk biased by attracting fields, which are often functions of time, can generate the long-range power-law correlations, and the form and stability of the probability distribution observed in heartbeat data. A process based on the same concept, in the context of sleep-wake dynamics, generates complex behavior which accounts both for the scale-free power-law distribution of the wake periods, and for the scale-dependent exponential distribution of the sleep periods. Further studies are needed to establish the extent to which such approaches can be used to elucidate mechanisms of physiologic control.
Acknowledgments
We are grateful to many individuals, including L.A.N. Amaral, A.L. Goldberger, S. Havlin, T. Penzel, J.-H. Peter, H.E. Stanley for major contributions to the results reviewed here, which represent a collaborative research effort. We also thank A. Arneodo, Y. Ashkenazy, A. Bunde, I. Grosse, H. Herzel, J.W. Kantelhardt, J. Kurths, C.-K. Peng, M.G. Rosenblum, and B.J. West for valuable discussions. This work was supported by NIH/National Center
for Research Resources (P41 RR13622), NSF, NASA, and The G. Harold and Leila Y. Mathers Charitable Foundation.
References
1. C. Bernard, Les Phenomenes de la Vie (Paris, 1878).
2. B. van der Pol and J. van der Mark, Phil. Mag. 6, 763 (1928).
3. W. B. Cannon, Physiol. Rev. 9, 399 (1929).
4. B. W. Hyndman, Kybernetik 15, 227 (1974).
5. S. Akselrod et al., Science 213, 220 (1981).
6. M. Kobayashi and T. Musha, IEEE Trans. BME 29, 456 (1982).
7. M. F. Shlesinger, Ann. NY Acad. Sci. 504, 214 (1987); M. F. Shlesinger and B. J. West, in Random Fluctuations and Pattern Growth: Experiments and Models (Kluwer Academic Publishers, Boston, 1988).
8. M. Malik and A. J. Camm, eds., Heart Rate Variability (Futura, Armonk NY, 1995).
9. J. Kurths et al., Chaos 5, 88 (1995).
10. G. Sugihara et al., Proc. Natl. Acad. Sci. USA 93, 2608 (1996).
11. R. deBoer et al., Am. J. Physiol. 253, H680 (1987).
12. M. Mackey and L. Glass, Science 197, 287 (1977); L. Glass and M. Mackey, From Clocks to Chaos: The Rhythms of Life (Princeton Univ. Press, Princeton, 1981); L. Glass et al., Math. Biosci. 90, 111 (1988); L. Glass and C. P. Malta, J. Theor. Biol. 145, 217 (1990); L. Glass, P. Hunter, A. McCulloch, eds., Theory of Heart (Springer Verlag, New York, 1991).
13. M. G. Rosenblum and J. Kurths, Physica A 215, 439 (1995).
14. H. Seidel and H. Herzel, in Modelling the Dynamics of Biological Systems, eds. E. Mosekilde and O. G. Mouritsen (Springer-Verlag, Berlin, 1995).
15. J. P. Zbilut et al., Biological Cybernetics 75, 277 (1996).
16. J. K. Kanters et al., J. Cardiovasc. Electrophysiology 5, 591 (1994).
17. G. LePape et al., J. Theor. Biol. 184, 123 (1997).
18. E. W. Montroll and M. F. Shlesinger, in Nonequilibrium Phenomena II: From Stochastics to Hydrodynamics, eds. L. J. Lebowitz and E. W. Montroll (North-Holland, Amsterdam, 1984), pp. 1-121.
19. N. Wax, ed., Selected Papers on Noise and Stochastic Processes (Dover Publications Inc., New York, 1954); G. H. Weiss, Aspects and Applications of the Random Walk (Elsevier Science B.V., North-Holland, New York, 1994).
20. R. M. Berne and M. N. Levy, Cardiovascular Physiology, 6th ed. (C.V. Mosby Company, St. Louis, 1996).
21. M. N. Levy, Circ. Res. 29, 437 (1971).
22. G. Jokkel et al., J. Auton. Nerv. Syst. 51, 85 (1995).
23. MIT-BIH Polysomnographic Database CD-ROM, second edition (MIT-BIH Database Distribution, Cambridge, 1992).
24. C.-K. Peng et al., Phys. Rev. Lett. 70, 1343 (1993); J. M. Hausdorff and C.-K. Peng, Phys. Rev. E 54, 2154 (1996).
25. A. Grossmann and J. Morlet, Mathematics and Physics: Lectures on Recent Results (World Scientific, Singapore, 1985); I. Daubechies, Comm. Pure and Appl. Math. 41, 909 (1988).
26. J. F. Muzy et al., Int. J. Bifurc. Chaos 4, 245 (1994); A. Arneodo et al., Physica D 96, 291 (1996).
27. P. Ch. Ivanov et al., Nature 383, 323 (1996).
28. P. Ch. Ivanov et al., Physica A 249, 587 (1998).
29. R. L. Stratonovich, Topics in the Theory of Random Noise (Gordon and Breach, New York, 1981).
30. J. B. Bassingthwaighte, L. S. Liebovitch, B. J. West, Fractal Physiology (Oxford Univ. Press, New York, 1994).
31. P. Ch. Ivanov et al., Europhys. Lett. 43, 363 (1998).
32. Y. Ashkenazy et al., Phys. Rev. Lett. 86, 1900 (2001).
33. P. Ch. Ivanov et al., Nature 399, 461 (1999).
34. P. Bernaola-Galvan et al., Phys. Rev. Lett. 87, 168105 (2001).
35. H. E. Stanley, Introduction to Phase Transitions and Critical Phenomena (Oxford University Press, London, 1971).
36. M. Chicurel, Nature 407, 554 (2000).
37. D. McGinty and R. Szymusiak, Nature Med. 6, 510 (2000).
38. J. H. Benington, Sleep 23, 959 (2000).
39. T. Gallopin et al., Nature 404, 992 (2000).
40. P. Ch. Ivanov et al., Europhys. Lett. 48, 594 (1999).
41. A. Bunde et al., Phys. Rev. Lett. 85, 3736 (2000).
42. J. Born et al., Nature 397, 29 (1999).
43. M. A. Carskadon and W. C. Dement, Principles and Practice of Sleep Medicine (WB Saunders Co, Philadelphia, 2000), pp. 15-25.
44. A. A. Borbely and P. Achermann, J. Biol. Rhythm. 14, 557 (1999).
45. M. Nakao et al., J. Biol. Rhythm. 14, 547 (1999).
46. D.-J. Dijk and R. E. Kronauer, J. Biol. Rhythm. 14, 569 (1999).
47. C.-C. Lo et al., pre-print: cond-mat/0112280; Europhys. Lett. (2002), in press.
48. S. Zapperi et al., Phys. Rev. B 58, 6353 (1998).
49. S. Havlin et al., J. Phys. A 18, 1043 (1985).
50. D. Ben-Avraham and S. Havlin, Diffusion and Reactions in Fractals and Disordered Systems (Cambridge Univ. Press, Cambridge, 2000).
51. J. A. Bray, Phys. Rev. E 62, 103 (2000).
52. A. Rechtschaffen and A. Kales, A Manual of Standardized Terminology, Techniques, and Scoring System for Sleep Stages of Human Subjects (BIS/BRI, Univ. of California, Los Angeles, 1968).
53. R. Williams et al., Electroen. Clin. Neuro. 17, 376 (1964).
54. V. Brezinova, Electroen. Clin. Neuro. 39, 273 (1975).
55. B. Kemp and H. A. C. Kamphuisen, J. Biol. Rhythm. 9, 405 (1986).
CHAOTIC PARAMETERS IN TIME SERIES OF ECG, RESPIRATORY MOVEMENTS AND ARTERIAL PRESSURE
E. CONTE, A. FEDERICI
Department of Pharmacology and Human Physiology, University of Bari, P.zza G. Cesare, 70100 Bari, Italy; Center of Innovative Technologies for Signal Detection and Processing, Bari, Italy. E-mail: fisio2@fisiol.uniba.it
Correlation dimension, Lyapunov exponents and Kolmogorov entropy were calculated by analysis of the ECG, respiratory movements and arterial pressure of normal subjects in spontaneous and forced conditions of respiration. We considered the cardiovascular system arranged as a model of five oscillators having variable coupling strengths, and we found that such a system exhibits chaotic activity, as do its components. In particular, we obtained that respiration resolves itself into a nonlinear input into heart dynamics, thus explaining that it is a source of chaotic nonlinearity in heart rate variability.
1. Introduction
A recent, relevant paradigm is that, due to the complexity of biological matter, chaos theory should provide a reasonable description of living systems. Chaotic behaviour should be dominant, and non-chaotic states should correspond more to pathological than to normal states. Fundamental results and theoretical reasons sustain the relevant role of chaos theory in explaining the mechanisms of living matter. This is so because many physiological systems may be represented by the action of coupled biological oscillators. It has been shown 4 that, under suitable conditions, such stimulated and coupled oscillators generate chaotic activity. We maintain that, in different physiological conditions, a stronger or weaker coupling among such oscillators takes place, determining a modification in the control parameters of the system, with enhancement or reduction of the chaotic behaviour of an oscillator with respect to the others to which it is coupled. Such a dynamical modification will be resolved and observed through a corresponding modification of the values of the chaotic parameters (i.e. Lyapunov exponents) usually employed in the analysis of experimental time series. Recent studies of the cardiovascular system emphasize the oscillatory nature of the processes happening within this system. The circulatory system is represented by the heart, and the systemic and pulmonary vessels. To regulate vessel resistance, myogenic activity operates to contract the vessels in response to a variation of intra-
vascular pressure. This generates a rhythmic activity related to periodicity in signals of blood pressure 5 and of blood flow 7,9. The neural system also realizes the activity of the autonomic nervous system, which is superimposed on the rhythmic activity of pacemaker cells. Rhythmic regulation of vessel resistance is also realized by the activity of metabolic substances in the blood. In conclusion, the dynamics of blood flow in its passage through the cardiovascular system is governed by five oscillators: the heart, the lungs, and the myogenic, neural and metabolic activities. We may consider this system to be a spatially distributed physical system constituted by five oscillators. Each oscillator exhibits autonomous oscillations, but positive and negative feedback loops take place so that the continuous regulation of blood circulation is realized through the coherent activity of such mutually coupled oscillators. This is the model that we employ in the present study. We have all the elements to expect such a system to be a nonlinear and complex system. So we arrive at the central aim of the present work. We intend to ascertain the following points: using the methods of nonlinear analysis, we intend to establish whether the cardiovascular system exhibits chaotic activity, as do its components; we aim to ascertain also whether the model of five oscillators is supported by our analysis, and, in particular, whether we may arrive at the final conclusion that respiration resolves itself into a nonlinear input into the heart dynamics of the cardiovascular oscillator. The importance of giving a definitive and rigorous answer to this last problem is well known. Let us specify in more detail the nature of our objective. Analyzing data regarding ECG time series, several authors 2,6 obtained results indicating that the normal sinus rhythm in the ECG must be ascribed to actual low-dimensional chaos. By the same kind of analysis, evidence was also obtained for inherent nonlinear dynamics and chaotic determinism in time series of consecutive R-R intervals. The physiological origins of such chaotic nonlinearity are unknown. The purpose of our study was to establish whether a nonlinear input from spontaneous respiration to the heart exists and whether it may be considered one of the sources of the chaotic nonlinearity in heart rate variability.
2. Methods
We measured signals of ECG, respiratory movements and arterial pressure in six normal nonsmoking subjects in normal (NR) and forced (FR) conditions of respiration, respectively. The condition FR was obtained by asking the subjects to perform inspiratory acts with a 5 s periodicity, at a given signal. The signal for expiration was given 2 s after every inspiration. The measured ECG signals were
sampled at 500 Hz for 300 s. Signal-versus-time tracings for respiration, ECG, Doppler, and R-R intervals are given in Fig. 1 for subject #13-07. Peak-to-peak values were considered for the time series. Programs for noise reduction were utilized in order to use noise-reduced time series data only. In order to follow the variability in time of the collected data, the obtained time series were re-sampled in five intervals (subseries), each interval containing 30,000 points. All the data were analyzed by the methods of nonlinear prediction and of surrogate data. The correlation dimension, the Lyapunov spectrum and the Kolmogorov entropy were estimated after determination of the time delay T by auto-correlation and mutual information. The embedding dimension in phase space was established by the method of False Nearest Neighbors (FNN) (for chaotic analysis see, for example, refs. 3, 8).
3. The Results
The main results of the chaotic analysis are reported in Table 1. For cardiac oscillations, time delays ranged from 14 to 60 msec in both cases of subjects in NR and FR. The embedding dimension in phase space was found to be d = 4, thus establishing that we need four degrees of freedom to correctly describe heart dynamics. The correlation dimension, D2, established by saturation in a D2-d plot, proved to be a very stable value during the selected intervals of experimentation. It assumed values ranging from 3.609 ± 0.257 to 3.714 ± 0.246 in the case of the five intervals for normal subjects in NR, and D2 values ranging from 3.735 ± 0.228 to 3.761 ± 0.232 in the case of subjects in FR. On the basis of such results, we concluded that normal cardiac oscillations, as well as cardiac oscillations of subjects under FR, follow deterministic dynamics of chaotic nature. We then estimated the Lyapunov exponents: λ1 and λ2 were positive; λ3 and λ4 assumed negative values; the sum of all the calculated exponents was negative, as required for dissipative systems. We concluded that cardiac oscillations of normal subjects under NR and FR represent hyper-chaotic dynamics. The positive exponents λ1 and λ2 in Table 1 represent the rates of divergence of the attractor in the directions of maximum expansion. These are the directions in which the cardiac oscillating system realizes chaoticity. The negative values λ3 and λ4 in Table 1 represent the rates of convergence of the attractor in the contracting directions. The emerging picture is that cardiac oscillations, as measured by ECG in normal subjects, are representative of a large ability of the heart to continuously cope with rapid changes, corresponding to the high values of its chaoticity. Looking in Table 1 at the calculated values of λ1 and λ2 along the five different time intervals that we analysed, we deduce that such values remained substantially stable in different intervals. Thus, we may conclude that, due to the constant action of the oscillators defined in our model, in
the NR and FR conditions, heart chaotic dynamics remains substantially stable in time. The same tendency was confirmed by examining the results obtained for the Kolmogorov entropy, K (see Table 1), which characterizes the overall chaoticity of the system. Thus, we arrive at the first conclusion of the present paper: heart dynamics exhibits chaoticity, and this remains substantially stable in time for normal subjects in NR and FR. However, we also have to answer the question whether respiration resolves itself into a non-linear input from the respiratory system into the heart dynamics of the cardiac oscillator. In this regard we must remember that, as previously explained in the introduction, the estimation of Lyapunov exponents, and in particular of positive Lyapunov exponents, must be considered, in the chaotic analysis of physiological systems, a sign confirming the presence of a physiological mechanism of control acting through a modification of the control parameters of the considered system via a stronger or weaker coupling between the oscillators assumed to act in the system. According to this thesis, an existing non-linear input from respiration into the heart dynamics of the cardiovascular system should actually be realised through a modification of the control parameters of the system via a modification of the coupling strength between the two considered oscillators, and it would be evidenced by clear modifications of the positive and negative Lyapunov exponents between the two cases of NR and FR. In fact, for the λ1 values we obtained, in five normal subjects, an increase in FR with respect to NR varying from 6% to about 36%. For λ2, the increase was more substantial, varying from about 12% to about 61%. The corresponding negative values, λ3 and λ4, were also increased in FR with respect to NR. Only in one subject was a decrease obtained in the values of the Lyapunov exponents in passing from NR to FR; in this case too, appreciable percentage differences were observed. In conclusion, the substantial difference in Lyapunov exponent values between NR and FR is a result of the present work. Increasing values of the positive Lyapunov exponents reveal an increasing degree of chaoticity; decreasing values reveal, instead, decreasing chaoticity. The increased and (in only one case) decreased values of the positive Lyapunov exponents that we found in FR with respect to NR indicate that in the first condition we had increasing (in only one case decreasing) chaoticity, and this establishes that respiration acts as a non-linear input on the cardiovascular oscillator. According to our model, such a non-linear input from respiration to the cardiovascular oscillator resolves itself into a greater (or smaller) coupling strength between the two considered oscillators. Obviously, in order to confirm this conclusion, we need to show that respiration too is characterised by chaotic dynamics. We therefore performed chaos analysis of the respiratory movement time series data obtained from the previously considered normal
subjects, and the analysis was executed following the same methodological criteria as before. Time delays varied from 4 to 76 msec, and the embedding dimension in phase space resulted to be d = 3. As previously said, such a dimension reflects the number of degrees of freedom necessary for the description of the respiratory system. We deduced from d = 3 that it is necessary to consider the action of three possible oscillators determining the behaviour of this system. The mean value of the correlation dimension, D2, was 2.740 ± 0.390 in the case of NR in the first interval of investigation. A rather stable mean value, ranging from D2 = 2.579 ± 0.340 to D2 = 2.665 ± 0.346, was also obtained in the four remaining intervals. We concluded that the respiratory system of the examined normal subjects exhibits chaotic determinism. As expected, during FR we obtained a reduction of the mean values of the correlation dimension with respect to NR. The mean value was D2 = 2.414 ± 0.417 in the first interval and varied between D2 = 2.339 ± 0.314 and D2 = 2.389 ± 0.383 in the remaining four intervals, a percentage decrease of about 10-12% with respect to NR. Thus, we found a reduction of the chaotic dynamics of respiration during FR with respect to the NR physiological condition. A clear discrimination between these two conditions was also obtained by calculation of the dominant Lyapunov exponent, λD. We obtained a mean value λD = 0.028 ± 0.023 in the case of NR and λD = 0.009 ± 0.004 in the case of FR in the first interval of experimentation, a percentage decrease in FR of about 68%. Evident discrimination was also obtained in the other four intervals: in the second interval λD = 0.029 ± 0.020 for NR and λD = 0.012 ± 0.004 for FR (a decrease of about 59%); in the third interval λD = 0.030 ± 0.022 for NR against λD = 0.008 ± 0.003 for FR (about 73%); in the fourth interval λD = 0.026 ± 0.022 for NR against λD = 0.009 ± 0.004 for FR (about 65%); and in the fifth interval λD = 0.022 ± 0.020 for NR and λD = 0.011 ± 0.008 for FR (about 50%). In conclusion, we found great stability of the dominant Lyapunov exponents calculated along the intervals of experimentation, in both the NR and FR cases, together with a systematic percentage decrease of the values in FR with respect to NR. These results indicated that the respiratory system exhibits chaotic dynamics. Without any doubt, chaoticity was strongly reduced during FR with respect to NR. This result clearly supports our thesis, based on the model of five oscillators: during forced respiration we have a reduction of the chaoticity of the respiratory system with respect to spontaneous respiration, and to this reduction there corresponds an increase of the chaoticity of the cardiac oscillations, as a consequence of a greater non-linear input from the respiratory system to heart dynamics. In other terms, a stronger coupling between the two oscillators is realized
and it is resolved in an enhancement of cardiac chaoticity correlated to a simultaneous reduction of chaoticity of the respiratory oscillatory system. The final aim was to test for the possible chaotic dynamics of blood pressure. We analyzed arterial pressure time series data following the same previous methodology. Time delay resulted to be about 2 msec. Embedding dimension in phase space resulted to be d = 5. We retain that this is the best our result to confirm the correctness of the model of the cardiovascular system based on five oscillators. Blood pressure signal reflects the action of the five oscillators that we considered, and, in fact, our calculated embedding dimension resulted to be just d = 5. Calculation of correlation dimension, D2, again gave rather stable values along the five intervals of experimentation and resulted to vary between 3,661 and 3,924 in the case of NR, and between 3,433 and 3,910 in the case of FR. Thus, we may conclude that blood pressure signal indicates to have the behaviour of a very chaotic deterministic system, as confirmed also from Kolmogorov entropy values. The calculation of Lyapunov exponents are given in Table 1, and they confirm that blood pressure is a deterministic hyper-chaotic dissipative system. The values of the exponents Xu X2, X4, a n d A.5 resulted to be very stable along the five intervals of experimentation with an evident their similarity also in the two conditions of experimentation. Considering the model of five oscillators, we may say that we have constant non linear inputs acting from the oscillators that determine such constant level of chaoticity. X3 showed instead a very great variability along the five considered intervals as well as in comparison of NR respect to FR. The variability of A.3 happened with three characteristic times that resulted to be respectively about 3-4 seconds, about 10 seconds and finally about 20-30 seconds corresponding to 0,3-0,4 Hz, to 0,1-0,2 Hz and to 0,04-0,06 Hz. We concluded that the first frequency should be due to the action of the respiratory oscillator, and the two other remaining frequencies should correspond to the action of the myogenic and neural (baroreceptors) oscillators.
Table 1. CHAOS ANALYSIS OF R-R OSCILLATIONS: LYAPUNOV SPECTRUM AND KOLMOGOROV ENTROPY
(five intervals of data; m.v. = mean value, s.d. = standard deviation; entries marked "-" could not be recovered from the source)

Interval        λ1 NR    λ1 FR    λ2 NR    λ2 FR    λ3 NR    λ3 FR    λ4 NR    λ4 FR    K NR     K FR
1   m.v.        0.271    0.343    0.089    0.119   -0.135   -0.133   -0.548   -0.639    0.360    0.462
    s.d.        0.063    0.075    0.043    0.036      -        -        -        -        -        -
2   m.v.        0.278    0.349    0.092    0.138   -0.124   -0.119   -0.537   -0.606    0.370    0.488
    s.d.        0.086    0.127    0.059    0.071      -        -        -        -        -        -
3   m.v.        0.262    0.346    0.091    0.110   -0.121   -0.140   -0.534   -0.578    0.353    0.456
    s.d.        0.092    0.148    0.060    0.054      -        -        -        -        -        -
4   m.v.        0.276    0.307    0.089    0.094   -0.114   -0.113   -0.522   -0.565    0.365    0.401
    s.d.        0.096    0.018    0.064    0.029    0.021    0.011    0.118    0.078    0.160    0.046
5   m.v.        0.279    0.289    0.090    0.095   -0.121   -0.125   -0.526   -0.572    0.369    0.384
    s.d.        0.093    0.074    0.062    0.065      -        -        -        -        -        -
Table 2. CHAOS ANALYSIS OF BLOOD PRESSURE SIGNAL: LYAPUNOV SPECTRUM AND KOLMOGOROV ENTROPY
(five intervals of data, NR condition; m.v. = mean value, s.d. = standard deviation)

Interval        λ1       λ2       λ3       λ4       λ5       K
1   m.v.        0.557    0.250    0.032   -0.213   -0.705    0.838
    s.d.        0.036    0.019    0.023    0.043    0.033    0.033
2   m.v.        0.561    0.251    0.012   -0.215   -0.687    0.824
    s.d.        0.025    0.004    0.006    0.030    0.025    0.015
3   m.v.        0.577    0.252    0.040   -0.232   -0.695    0.868
    s.d.        0.027    0.011    0.013    0.015    0.008    0.051
4   m.v.        0.553    0.259    0.018   -0.220   -0.704    0.829
    s.d.        0.016    0.006    0.009    0.009    0.018    0.018
5   m.v.        0.570    0.246    0.012   -0.249   -0.706    0.827
    s.d.        0.011    0.002    0.006    0.005    0.006    0.018
Fig. 1. Signal vs. time tracings for respiration, ECG, Doppler, and R-R intervals (subject #13-07, normal respiration).
Acknowledgements
The authors wish to thank Ms Anna Maria Papagni for her technical assistance.

References
1. Akselrod S, Gordon D, Ubel FA, Shannon DS, Borger AC, Cohen RJ. Science 1981; 213: 220-225.
2. Babloyantz A, Destexhe A. Is the normal heart a periodic oscillator? Biological Cybernetics 1988; 58: 203-211.
3. Badii R, Politi A. Dimensions and Entropies in Chaotic Systems. Springer, Berlin, 1986.
4. Guevara MR, Glass L, Shrier A. Phase-locking, period-doubling bifurcations, and irregular dynamics in periodically stimulated cardiac cells. Science 1981; 214: 1350-1353.
5. Kitney RI, Fulton T, McDonald AH, Linkens DA. J. Biomed. Eng. 1985; 7: 217-225.
6. Kitney RI, Rompelman O. The Study of Heart Rate Variability. Oxford University Press, 1980.
7. Madwed JB, Albrecht P, Mark RG, Cohen RJ. Low-frequency oscillations in arterial pressure and heart rate: a simple computer model. Am. J. Physiol. 1989; 256: H1573-H1579.
8. Schreiber T. Interdisciplinary application of nonlinear time series methods. Physics Reports 1999; 308: 1-64.
9. Streen MD. Nature 1975; 254: 56-58.
COMPUTER ANALYSIS OF ACOUSTIC RESPIRATORY SIGNALS
A. VENA, G.M. INSOLERA, R. GIULIANI(*), T. FIORE
Department of Emergency and Transplantation, Bari University, Policlinico Hospital, Piazza Giulio Cesare 11, 70124 Bari, Italy.
(*) Center of Innovative Technologies for Signal Detection and Processing, Bari, Italy.
E-mail: antonvena@yahoo.com
G. PERCHIAZZI
Department of Clinical Physiology, Uppsala University Hospital, S-75185 Uppsala,
Sweden
Evaluation of breath sounds is a basic step of the physical examination of a patient. Auscultation of the respiratory system gives direct information about the structure and function of lung tissue that cannot be obtained with any other equally simple and non-invasive method. Recently, the application of computer technology and of new mathematical techniques has supplied alternative methodologies for respiratory sound analysis. We present a new computerized approach to analyzing respiratory sounds.
1 Introduction
Acoustic respiratory signals have been the subject of considerable research over recent years; however, their origin is still not completely understood. It is now generally accepted that, during respiration, turbulent motion of a compressible fluid in the larger airways with rough walls (trachea and bronchi) generates acoustic energy [5]. This energy is transmitted through the airways and the lung parenchyma to the chest wall, which represents a non-stationary system [1,4]. Pulmonary diseases induce anatomical and functional alterations in the respiratory system; changes in the quality of lung sounds (loudness, length and frequency) are often directly correlated to pathological changes in the lung. The traditional method of auscultation is based on a stethoscope and the human auditory system; however, due to the poor response of the human auditory system to lung sounds (low frequency and low signal-to-noise ratio) and the subjective character of the technique, it is common to find different clinical descriptions of the same respiratory sounds. Lung-sound nomenclature has long been unclear: until recent decades, the names used derived from the originals given by Laennec [10] and translated into English by Forbes [2]. In 1985, the International Lung Sounds Association (I.L.S.A.) composed an international standard classification of lung sounds that includes fine and coarse crackles, wheezes and rhonchi; each of these terms can be described acoustically [13].
The application of computer technology and recent advancements in signal processing have provided new insights into acoustic mechanisms and supplied new measurements of clinical importance from respiratory sounds. The aim of this study is to develop a system for the acquisition and processing of respiratory acoustic signals: this would provide an effective, non-invasive and objective support for the diagnosis and monitoring of respiratory disorders.
2 Respiratory Sounds
Lung sounds in general are classified into three major categories: "normal" (vesicular, bronchial and bronchovesicular breath sounds), "abnormal" and "adventitious" lung sounds. Vesicular breath sounds consist of a quiet and soft inspiratory phase followed by a short, almost silent expiratory phase. They are low pitched and normally heard over most lung fields of a healthy subject. These sounds are not generated by gas flow moving through the alveoli (vesicles) but are the result of attenuation of breath sounds produced in the larger bronchi. Bronchial breath sounds are normally heard over the trachea and reflect turbulent airflow in the main-stem bronchi. They are loud and high-pitched, and the expiratory phase is generally longer than the inspiratory phase, with a typical pause between the phases. Bronchial sounds heard over the thorax suggest lung consolidation and pulmonary disease. Bronchovesicular breath sounds are normally heard on both sides of the sternum in the first and second intercostal spaces. They should be quieter than the bronchial breath sounds, and increased intensity of these sounds is often associated with increased ventilation. Abnormal lung sounds include the decrease or absence of normal lung sounds, or their presence in areas where they are normally not heard (e.g. bronchial breath sounds in peripheral areas where only vesicular sounds should be heard). This is characteristic of parenchymal consolidation (pneumonia), which transmits sound from the lung bronchi much more efficiently than the air-filled alveoli of the normal lung. The term "adventitious" (adventitious lung sounds) refers to extra or additional sounds that are heard over normal lung sounds, and their presence always indicates a pulmonary disease. These sounds are classified into discontinuous (crackles) or continuous (wheezes) adventitious sounds. Crackles are discontinuous, intermittent and nonmusical noises that may be classified as "fine" (high pitched, low amplitude, and short in duration) or "coarse" (low pitched, higher in amplitude, and long in duration). Crackles are generated by fluid in the small airways or by the sudden opening of closed airways. Their presence is often associated with inflammation or infection of the small bronchi, bronchioles and alveoli, with pulmonary fibrosis, with heart failure and with many other cardiorespiratory disorders. Wheezes are continuous (their duration is much longer than that of crackles), lower-pitched and musical breath sounds, which are superimposed on the normal lung sounds. They originate from air moving through small airways narrowed by constriction or
swelling of the airway wall, or by partial airway obstruction. They are often heard (during expiration, or during both inspiration and expiration) in patients with asthma or other obstructive diseases. Other respiratory sounds are: rhonchi (continuous sounds that indicate partial obstruction by thick mucus in the bronchial lumen, oedema, spasm or a local lesion of the bronchial wall); stridor (a high-pitched harsh sound heard during inspiration and caused by obstruction of the upper airway); snoring (acoustic signals produced by a constriction in the upper airway, usually during sleep); and pleural rubs (low-pitched sounds that occur when inflamed pleural surfaces rub together during respiration).
3 Review of literature
Many studies have focused on the acoustic properties of normal lung sounds in healthy subjects [6,7] and their changes with airflow [8]. In 1996 Pasterkamp et al., using the Fast Fourier Transform (FFT), analysed and described the lung sound spectra of normal infants, children and adults [14]. At the end of the 1980s, normal and pathological lung sounds were displayed and studied in the time and frequency domains [15]. Various works investigated the characteristics of crackles due to asthma, chronic obstructive pulmonary disease (COPD), heart failure, pulmonary fibrosis and pneumonia [16,12,9]. In 1992 Pasterkamp and Sanchez indicated the significance of tracheal sound analysis in upper airway obstructions [17]. Malmberg et al. analysed changes in the frequency spectra of breath sounds during a histamine challenge test in adult asthmatic subjects [11]. In recent years, the application of the Wavelet Transform has demonstrated the possibility of properly processing non-stationary signals (such as crackles); by comparing the ability of Fourier and wavelet based techniques to resolve both discrete and continuous sounds, many studies concluded that the wavelet-based methods have the potential to effectively process and display both continuous and discrete lung sounds [3].
4 Signal acquisition and processing methods
Lung sounds transmitted through the respiratory system can be acquired by equipment able to convert the acoustic energy into an electrical signal. The subsequent processing phase, using specific mathematical transformations, returns a sequence of data that allows the features of each signal to be studied. In this study, respiratory sounds were picked up over the chest wall of normal and abnormal subjects by an electronic stethoscope (Electromag Stethoscope ES120, Japan).
The sensor was placed over the bronchial regions of the anterior chest (second intercostal space on the mid-clavicular line), the vesicular regions of the posterior chest (apex and base of the lung fields, bilaterally) and the trachea at the lower part of the neck, 1-2 cm to the right of the midline. Sounds were amplified, low-pass filtered and recorded in digital format (Sony Minidisc MZ-37, Japan) using a sampling rate of 44.1 kHz and 16 bit quantization. The signal was transferred to a computer (Intel Pentium 500 MHz, Intel Corp., Santa Clara, CA, USA) and then analyzed by a specific Fourier Transform based spectral analysis software (CoolEdit Pro 1.0, Syntrillium Software Corp., Phoenix, USA). Because of the clinical necessity of correlating the acoustic phenomena to the phases of human respiration, a method of analysis dedicated to the time/frequency plane was applied: the STFT (Short Time Fourier Transform). It provided "spectrograms" related to different respiratory acoustic patterns that were analyzed offline according to the intensity and frequency changes in the time domain. The spectrogram shows, in a three-dimensional coordinate system, the acoustic energy of a signal versus time and frequency. We studied normal breath sounds (vesicular and tracheal) from healthy subjects without pulmonary diseases and adventitious lung sounds (crackles and wheezes) from spontaneously breathing patients with pneumonia and COPD (chronic obstructive pulmonary disease). The signals were examined for artifacts (generally emanating from defective contact between the sensor and the chest wall, or from background noise) and contaminated segments were excluded from further analysis.
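The spectral analysis itself was done with the commercial package named above. Purely as an illustrative sketch of the processing chain just described (band-limiting followed by an STFT spectrogram), the fragment below uses open-source Python tools; the file name, filter order and window length are our assumptions, not the authors' settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import butter, sosfilt, spectrogram

fs, sound = wavfile.read("lung_sound.wav")   # hypothetical mono recording, 44.1 kHz
sound = sound.astype(float)

# Band-limit the signal: below ~75-100 Hz heart and muscle artefacts dominate,
# and the lung-sound content of interest here extends to roughly 2 kHz.
sos = butter(4, [75, 2000], btype="bandpass", fs=fs, output="sos")
filtered = sosfilt(sos, sound)

# Short Time Fourier Transform: a ~23 ms window localizes short events such as
# crackles (< 200 ms) while keeping reasonable frequency resolution.
f, t, Sxx = spectrogram(filtered, fs=fs, nperseg=1024, noverlap=768)

# Acoustic energy versus time and frequency (the "spectrogram" of the text).
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))
plt.ylim(0, 2000)
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.show()
```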
5 Results
Normal breath sounds (vesicular and tracheal) showed typical spectra with a frequency content extending up to 700 Hz (vesicular sounds) and 1600 Hz (tracheal sounds). Generally, at frequencies below 75-100 Hz there are artefacts from heart and muscle sounds. Inspiratory amplitude was higher than expiratory amplitude for vesicular sounds and lower than expiratory amplitude for tracheal sounds (fig. 1, fig. 2).
Fig. 1 Vesicular sound
Fig. 2 Tracheal sound
Discontinuous adventitious sounds (crackles) appeared as non-stationary explosive end-inspiratory noise with a frequency content extending beyond 1000 Hz; their duration was less than 200 msec (fig. 3). Continuous adventitious sounds (wheezes) appeared as expiratory spectral densities, harmonically related, at 300 Hz, 600 Hz and 1200 Hz; their duration was longer than 200 msec (fig. 4). A simple decision rule along these lines is sketched below.
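The reported ranges suggest a classification rule based on event duration and spectral content. The toy function below is our illustration of such a rule, not the authors' classification procedure; detecting the events and extracting their duration, dominant frequency and harmonic structure is assumed to be done beforehand.

```python
def classify_adventitious(duration_ms, peak_freq_hz, harmonic=False):
    """Toy rule using the ranges reported in the text (not the authors' algorithm).

    duration_ms  -- duration of the detected sound event
    peak_freq_hz -- dominant spectral frequency of the event
    harmonic     -- True if harmonically related spectral peaks were found
    """
    if duration_ms < 200 and peak_freq_hz > 1000:
        return "crackle (discontinuous)"
    if duration_ms > 200 and harmonic:
        return "wheeze (continuous)"
    return "unclassified"

print(classify_adventitious(15, 1400))         # -> crackle (discontinuous)
print(classify_adventitious(400, 600, True))   # -> wheeze (continuous)
```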
Fig. 3 Crackles

Fig. 4 Wheezes

6 Conclusions
In this study, significant changes in the averaged frequency spectra of breath sounds were demonstrated in passing from healthy to sick lungs. Moreover, this processing method was able to classify abnormal patterns into different pathology-related subgroups. Implementation of this technology on a breath-to-breath basis will provide a useful tool for continuous bedside monitoring by a computerized auscultation device which can record, process and display the respiratory sound signals with sophisticated visualization techniques. Future perspectives for respiratory sound research include the building of miniaturized systems for non-invasive and real-time monitoring; the application of multi-microphone analysis to evaluate the regional distribution of ventilation; respiratory sound databases; remote diagnosis systems; and automatic recognition systems for acoustic respiratory patterns based on artificial neural networks.
7 References
1. Cohen A. Signal Processing Methods for Upper Airways and Pulmonary Dysfunction Diagnosis. IEEE Engineering in Medicine and Biology Magazine, (1990).
2. Forbes J. A Treatise of the Diseases of the Chest, 1st ed. Underwood, London, (1821).
3. Forren J.F., Gibian G. Analysis of Lung Sounds Using Wavelet Decomposition, (1999).
4. Fredberg J.J. Acoustic determination of respiratory system properties. Ann. Biomed. Eng. 9 (1981) pp. 463-473.
5. Gavriely N. Breath Sounds Methodology. Boca Raton, FL: CRC Press, Inc., (1995).
6. Gavriely N., Nissan M., Rubin A.E., Cugell D.W. Spectral characteristics of chest wall breath sounds in normal subjects. Thorax 50 (1995) pp. 1292-1300.
7. Gavriely N., Herzberg M. Parametric representation of normal breath sounds. J. Appl. Physiol. 73(5) (1992) pp. 1776-1784.
8. Gavriely N., Cugell D.W. Airflow effects on amplitude and spectral content of normal breath sounds. J. Appl. Physiol. 80(1) (1996) pp. 5-13.
9. Kaisla T., Sovijarvi A., Piirila P., Rajala H.M., Haltsonen S., Rosqvist T. Validated Methods for Automatic Detection of Lung Sound Crackles. Medical and Biological Engineering and Computing 29 (1991) pp. 517-521.
10. Laennec R.T.H. De l'auscultation mediate ou traite du diagnostic des maladies des poumons et du coeur, fonde principalement sur ce nouveau moyen d'exploration. Brosson et Chaude, Paris, (1819).
11. Malmberg L.P., Sovijarvi A.R.A., Paajanen E., Piirila P., Haahtela T., Katila T. Changes in Frequency Spectra of Breath Sounds During Histamine Challenge Test in Adult Asthmatics and Healthy Control Subjects. Chest 105 (1994) pp. 122-132.
12. Munakata M., Ukita H., Doi I., Ohtsuka Y., Masaki Y., Homma Y., Kawakami Y. Spectral and Waveform Characteristics of Fine and Coarse Crackles. Thorax 46 (1991) pp. 651-657.
13. Pasterkamp H., Kraman S.S., Wodicka G.R. Respiratory Sounds. Advances Beyond the Stethoscope. Am J Respir Crit Care Med 156 (1997) pp. 974-987.
14. Pasterkamp H., Powell R.E., Sanchez I. Lung Sound Spectra at Standardized Air Flow in Normal Infants, Children, and Adults. Am J Respir Crit Care Med 154 (1996) pp. 424-430.
15. Pasterkamp H., Carson C., Daien D., Oh Y. Digital Respirasonography. New Images of Lung Sounds. Chest (1989) pp. 1505-1512.
16. Piirila P., Sovijarvi A., Kaisla T., Rajala H.M., Katila T. Crackles in Patients with Fibrosing Alveolitis, Bronchiectasis, COPD, and Heart Failure. Chest 99(5) (1991) pp. 1076-1083.
17. Pasterkamp H., Sanchez I. Tracheal Sounds in Upper Airway Obstruction. Chest 102 (1992) pp. 963-965.
THE IMMUNE SYSTEM: B CELL BINDING TO MULTIVALENT ANTIGEN
Gyan Bhanot
IBM Research, Yorktown Heights, NY 10598, USA. E-mail: [email protected]
This is a description of work done in collaboration with Yoram Louzoun and Martin Weigert at Princeton University. Experiments in the late 1980s by Dintzis et al. revealed puzzling aspects of the activation of B-Cells as a function of the valence (number of binding sites) and concentration of presented antigen. Through computer modeling, we are able to explain these puzzles if we make an additional (novel) hypothesis about the rate of endocytosis of B-Cell receptors. The first puzzling result we can explain is why there is no activation for low valence (less than 10-20). The second is why activation is limited to a small, narrow range of antigen concentration. We performed a computer experiment to model the B-Cell surface with embedded receptors diffusing in the surface lipid layer. We presented these surface receptors with antigen of varying concentration and valence. Using experimentally reasonable values for the binding and unbinding probabilities of the binding sites on the antigens, we simulated the dynamics of the binding process. Using the single hypothesis that the rate of endocytosis of bound receptors is significantly higher than that of unbound receptors, and that this rate varies inversely as the square of the mass of the bound, connected receptor complex, we are able to reproduce all the qualitative features of the Dintzis experiment and resolve both puzzles mentioned above. We were also able to generate some testable predictions on how chimeric B-Cells might be non-immunogenic.
1 Introduction
This paper is a description of work done in collaboration with Yoram Louzoun and Martin Weigert at Princeton University [1]. I begin with a brief introduction to the human immune system and the role of B and T Cells in it [2]. Next, I describe the B-Cell receptor/antibody and how errors in the coding for the light chains of these receptors can result in chimeric B-Cells with different light chains on the same receptor or different types of receptors on the same cell. After this, I describe the Dintzis experiments [3,4,5] and the efforts to explain these experimental results using the concept of an Immunon [6,7]. There is also analytic work by Perelson [8] using rate equations to model the binding and activation process. This is followed by a description of our computer modeling experiment, its results and conclusions [1].
2 Brief Description of the Human Immune System
The human immune system [2], on encountering a pathogen, has two distinct but related responses. There is an immediate response, called the Innate Response, and there is also a slower, dynamic response, called the Adaptive Response. The Innate Response, created over aeons by the slow evolutionary process, is the first line of defense against bacterial infections, chemicals and parasites. It comes into effect immediately and acts mostly by phagocytosis (engulfment). The Adaptive Response evolves even within an individual; it is slower in its action (with a latency of 4-7 days) but is much more versatile. This Adaptive Response is created by a complex process involving cells called lymphocytes. A single microliter of fluid in the body contains about 2500 lymphocytes. All cellular components of the immune system arise in the bone marrow from hematopoietic stem cells, which differentiate to produce the other, more specialized cells of the immune system. Lymphocytes derive from a lymphoid progenitor cell and differentiate into two cell types called the B-Cell and the T-Cell. These are distinguished by their site of differentiation: the B-Cells in the bone marrow and the T-Cells in the thymus. B and T Cells both have receptors on their surface that can bind to antigen (pieces of chemical, peptides, etc.). An important difference between B and T Cell receptors is that B-Cell receptors are bivalent (have two binding areas) while T-Cell receptors are monovalent (with a single binding area). In the bone marrow, B-Cells are presented with self antigen, e.g. pieces of the body's own molecules. Those B-Cells that react to such self antigen are killed. Those that do not are released into the blood and lymphatic systems. T-Cells, on the other hand, are presented with self antigen in the thymus and are likewise killed if they react to it. Cells of the body present on their surface pieces of protein from inside the cell in special structures called the MHC (Major Histocompatibility Complex) molecules. MHC molecules are distinct between individuals, and each individual carries several different alleles of MHC molecules. T-Cells are selected in the thymus to bind to some MHC of self but not to any self peptides that are presented on these MHC molecules. Thus, only T-Cells that might bind to foreign peptides presented on self MHC molecules are released from the thymus. There are two types of T-Cells, distinguished by their surface proteins. They are called CD8 T-Cells (also called killer T-Cells) and CD4 T-Cells (also called helper T-Cells). When a virus infects a cell, it uses the cell's DNA/RNA machinery to replicate itself. However, while this is going on, the cell will present on its surface pieces of viral protein on MHC molecules. CD8 T-Cells in the surrounding medium are programmed to bind strongly to such MHC molecules presenting
non-self peptides. After they bind to the MHC molecule, they send a signal to the cell to commit suicide (apoptose) and then unbind from the infected cell. Also, once activated in this way, the CD8 T-Cell will replicate aggressively and seek out other infected cells to send them the suicide signal. The CD4 T-Cells, on the other hand, recognize viral peptides on B-Cells and macrophages (specialized cells which phagocytose or engulf pathogens, digest them and present their peptide pieces on MHC molecules). The role of the CD4 T-Cell, when it binds in this way, is to signal the B-Cell and macrophages to activate and proliferate. B-Cells that are non-reactive to self antigens in the bone marrow are released into the blood and secondary lymphoid tissue. They have a lifetime of about three days unless they successfully enter lymphoid follicles, germinal centers or the spleen and get activated by binding to antigen presented to them there. Those that have the correct antibody receptors to bind strongly to viral peptide (antigen) will become activated and will start to divide, thereby producing multiple copies of themselves with their specific high affinity receptors. This process is called 'clonal selection', as the clone which is fittest (binds most strongly to presented antigen) is selected to multiply. The B-Cells that bind to antigen will also endocytose their own receptors with bound antigen and present it on their surface on MHC-II molecules for an activation signal from CD4 T-Cells. Once a clone is selected, the B-Cells also mutate and proliferate to produce variations of receptors to achieve an even better binding specificity to the presented antigen. B-Cells whose mutation results in improved binding will receive a stronger activation signal from the CD4 T-Cells and will out-compete the rest. This process is called 'affinity maturation'. Once B-Cells with the optimum binding specificity are produced, they are released from the germinal centers. Some of these differentiate into plasma cells, which release large numbers of antibodies (receptors) with high binding affinity for the antigen. These antibodies mark the virus for elimination by macrophages. Some B-Cells go into a latent phase (become memory B-Cells) from which they may be activated if the infection recurs. It is clear from the above discussion that there are two competing pressures in play when antigen binds to B-Cells. One pressure is to maximize the number of surface bound receptors, until a critical threshold is reached at which the B-Cell is activated and will proliferate. The other pressure is to endocytose the receptor-antigen complex, followed by presentation of the antigen peptide on MHC-II molecules, binding to CD4 T-Cells and an activation signal from that binding. To function optimally, the immune system must carefully balance these two processes of binding and endocytosis.
Unbound receptors on the surface of B-Cells are endocytosed at the rate of about one receptor every half hour. However, the binding and activation of B-Cells happens on a time scale of a few seconds to a minute (for references to many of the details of the numerical values used in this paper, see the references in [1]). If endocytosis is to compete with activation, as it must for the process described above to work, then bound receptors must be endocytosed much more frequently than once every half hour. Since there is no data available on the exact rate of endocytosis for bound receptors, we made the assumption in our simulation that the probability of endocytosis of a single B-Cell receptor bound to antigen is of the same order of magnitude as the probability of binding of antigen to the receptor. There is a strong probability that multiple receptors are linked by bound antigen before they are endocytosed. We make the reasonable assumption that the probability of endocytosis of the receptor-antigen cluster is inversely proportional to the square of the mass of the cluster. Let us now discuss, in a very simplified way, the structure of the B-Cell receptor/antibody. The B-Cell receptor is a Y shaped molecule consisting of three equal sized segments, connected by disulfide bonds. The antigen binding sites are at the tips of the arms of the Y. These binding sites are made up of two strands (heavy and light), each composed of two regions, one of which is constant and the other highly variable, called the constant and variable regions respectively. The process that forms the antibody first creates a single combination of the heavy and light chain (H,L) sections and then combines two such (H,L) sections by disulfide bonds to create the Y shaped antibody. In diploid species, such as humans, whose DNA strands come from different individuals, there are four ways to make the (H,L) combinations using genes from either of the parent DNA strands. Thus if the parent types make H1, L1, and H2, L2 respectively, in principle it would be possible to make four combinations: (H1,L1), (H2,L2), (H1,L2) and (H2,L1). The classical dogma in immunology is allelic exclusion, which asserts that, in a given B-Cell, when two strands of (H,L) fuse to form a receptor, only the same (H,L) combination is always selected. This ensures that for a given B-Cell, all the receptors are identical. However, sometimes this process does not work and B-Cells are found with both types of light chains in receptors on the same cell [9]. It turns out that there are two distinct types of light chains, called κ and λ. Normally in humans the ratio of B-Cells with κ or λ chains is 2:1, with each cell presenting either a κκ or a λλ light chain combination. However, as mentioned above, sometimes allelic exclusion does not work perfectly and B-Cells present κλ receptors, or the same cell presents receptors of mixed type, a combination of some which are κκ, some which are λλ and some which are κλ.
A given antigen will bind either to the λ or the κ chain, or to neither, but not to both. Thus a κλ B-Cell receptor is effectively monovalent. Furthermore, a B-Cell with mixed κκ and λλ receptors would effectively have fewer receptors available for a given antigen. It is possible to experimentally enhance the probability of such genetic errors and study the immunogenicity of the resulting B-Cells. This has been done in mice. The surprising result from such experiments is that chimeric B-Cells are non-immunogenic [9]. We shall attempt to explain how this may come about as a result of our assumption about endocytosis.
3 The Dintzis Experimental Results and the Immunon Theory
Dintzis et al. [3,4,5] did an in-vivo (mouse) experiment using five different fluoresceinated polymers as antigen (Ag). The results of the experiment were startling. It was found that, to be immunogenic, the Ag mass had to be in the range of 10^5 - 10^6 Daltons (1 Dalton = 1 Atomic Mass Unit) and have a valence (number of effective binding sites) greater than 10-20. Antigen with mass or valence outside this range elicited no immune response at any concentration. Within this range of mass and valence, the response was limited to a finite range of antigen concentration. A model based on the concept of an Immunon was proposed to explain the results [6,7]. The hypothesis was that the B-Cell response is quantized, i.e. to trigger an immune response, it is necessary that a minimum number of receptors be connected in a cluster cross-linked by binding to antigen. This linked cluster of receptors was called an Immunon, and the model came to be called the 'Immunon Model'. However, a problem immediately presents itself: why are low valence antigens non-immunogenic? Why can one not form large clusters of receptors using small valence antigen? The Immunon model had no answer for this question. Subsequently, Perelson et al. [8] developed mathematical models (rate equations) to study the antigen-receptor binding process. Assuming that the B-Cell response is quantized, they were able to show that at low concentration, because of antigen depletion (too many receptors, too little antigen), an Immunon would not form. However, the rate equations made the flaws in the Immunon model apparent. They were not able to explain why large valence antigens were necessary for an immune response, nor why even such antigen was tolerogenic (non-immunogenic) at high concentration.
4 Modeling the B-Cell Receptor Binding to Antigen: Our Computer Experiment
The activation of a B-cell is the result of local surface processes leading to a cascade of events that result in the release of antibody and/or presentation of antigen. The local surface processes are binding, endocytosis and receptor diffusion. Each of these is governed by its own time and length scales, some of which are experimentally known. To model B-cell surface dynamics properly, the size of the modeled surface must be significantly larger than the largest dynamic length scale we wish to model, and the time steps used must be smaller than the smallest dynamic time scale. Further, the size of the smallest length scale on the modeled surface must be smaller than the smallest length scale in the dynamics. The size of a B-cell receptor is 3 nm, and this is the smallest surface feature we will model. The size of a typical antigen in our simulation is 5-40 nm. The diffusion rate of receptors is of the order of D = 1-5 x 10^-10 cm^2/s, and the time scale for activation of a cell is of the order of a few seconds to a few tens of seconds (τ ~ 100 s). Hence the linear size of the surface necessary in our modeling is L > sqrt(Dτ) ~ 1-2 μm. This is the maximum distance that a receptor will diffuse in a time of about 100 s. We choose a single lattice spacing to represent a receptor. The linear size of our surface was chosen to be 1000 lattice units, which represents a physical length of approximately 3-4 μm. The affinity of receptor-hapten binding is 10^5 M^-1 for a monovalent receptor. The affinity of a bivalent receptor depends on the valence of the antigen and on the distribution of haptens on the antigen. The weight of a single hapten is a few hundred Daltons. Hence, the ratio of the on-rate to the off-rate of a single receptor-hapten pair is ~ 100-1000. We choose an on-rate of 0.2 and an off-rate of 0.001 in dimensionless units. Our unit of time was set to 0.05 millisecond. This was done by choosing D in dimensionless units to be 0.1, which means that the effective diffusion rate is 0.1 x (3 nm)^2 / (0.05 ms) ~ 2.0 x 10^-10 cm^2/s. The affinity of chimeric B-Cell receptors was set lower because they bind to DNA with a lower affinity. For them, we used an on-rate of 0.1 and an off-rate of 0.01. The cell surface was chosen to have periodic boundary conditions, as this simplifies the geometry of the modeling considerably. The size of our cell surface is equivalent to 20% of a real non-activated B-cell. A B-cell typically has 50000 receptors on its surface. Hence, we modeled with 10000 receptors initially placed on random sites of the lattice. In each time step, every receptor was updated by moving it to a neighboring site (if the site was empty) with probability D = 0.1. Receptors that are bound to antigen were not allowed to
move. At every time step, receptors which have free binding sites can bind to other haptens on the antigen or to any other antigen already present on the surface. They can also unbind from haptens to which they are bound. Once an antigen unbinds from all receptors, it is released within 5 time steps on average. Once every 20 time steps, the receptors were presented with new antigen at a constant rate which was a measure of the total antigen concentration. We varied this concentration rate in our modeling. The normal rate of endocytosis of unbound receptors is once every 1/2 hour. If this is the rate of endocytosis for bound receptors also, it will be too small to play a role in antigen presentation. Thus we must assume that a bound receptor has a higher probability of being endocytosed compared to an unbound receptor. A receptor can bind to two haptens, and every antigen can bind to multiple receptors. This cross-linking leads to the creation of large complexes. We assume that the probability to endocytose a receptor-antigen complex is inversely proportional to the square of its mass. The mass of the B cell receptor is much higher than the mass of the antigens, so, when computing the mass of the complex, we can ignore the mass of the antigen. We thus set the endocytosis rate only as a function of the number of bound receptors. The rate of endocytosis for the entire complex was chosen to be inversely proportional to the square of the number of receptors in the complex. More specifically, we set the probability to endocytose an aggregate of receptors to be 0.0005 divided by the square of the number of receptors in the aggregate. For chimeric B-cells we reduced the numerator in this probability by a factor of 100.
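The full simulation tracks receptor positions and diffusion on the lattice. As a rough, non-spatial caricature of the update rules just described (our simplifying sketch, not the authors' code), the Python fragment below evolves complexes identified only by their receptor count, using the stated dimensionless rates (on-rate 0.2, off-rate 0.001, endocytosis probability 0.0005/N^2); antigen valence, receptor bivalence and the diffusion step are not represented explicitly, and the antigen arrival count is an assumed concentration parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensionless rates from the text (one time step = 0.05 ms); the diffusion
# step (D = 0.1) has no analogue in this well-mixed caricature.
P_ON, P_OFF, K_ENDO = 0.2, 0.001, 0.0005

free_receptors = 10_000     # receptors initially on the surface
complexes = []              # receptor count of each receptor-antigen complex
endocytosed = 0

for t in range(20_000):     # about 1 s of simulated time
    # Fresh antigen is presented once every 20 steps; each presented antigen
    # may nucleate a new complex by binding one free receptor.
    if t % 20 == 0:
        for _ in range(50):                          # concentration parameter (assumed)
            if free_receptors and rng.random() < P_ON:
                free_receptors -= 1
                complexes.append(1)
    survivors = []
    for n in complexes:
        if free_receptors and rng.random() < P_ON:   # cross-link one more receptor
            free_receptors -= 1
            n += 1
        if n > 1 and rng.random() < P_OFF * n:       # one receptor-hapten bond releases
            free_receptors += 1
            n -= 1
        if rng.random() < K_ENDO / n**2:             # endocytose the whole complex
            endocytosed += n
        else:
            survivors.append(n)
    complexes = survivors

print("bound receptors:", sum(complexes), "endocytosed:", endocytosed)
```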
5 Results
The results of our computer study are shown in figure (1), where the solid line shows the number of bound receptors after 10 seconds of simulation as a function of antigen valence. The data are average values over several simulations with different initial positions for receptors and random number seeds. The dashed line shows the number of endocytosed receptors. One observes a clear threshold below which the number of bound surface receptors stays close to zero followed by a region where the number of bound receptors increases and flattens out. This establishes that we can explain the threshold in antigen valence in the Dintzis experiment. The reason for the threshold is easy to understand qualitatively. Once an antigen binds to a receptor, the probability that its other haptens bind to the other arm of the same receptor or to one of the other receptors present in the vicinity is an exponentially increasing function of the number of haptens. Also, once an antigen is multiply bound in a complex, the probability of all
the haptens unbinding is an exponentially decreasing function of the number of bound haptens. Given that receptors once bound may be endocytosed, low valence antigen bound once will most likely be endocytosed or will unbind before it can bind more than once (i.e. before it has a chance to form an aggregate and lower its probability of endocytosis). As the valence increases, the unbinding probability decreases and the multiple binding probability increases until it overcomes the endocytosis rate. Finally, for high valence, one reaches a steady state between the number of receptors being bound and the number endocytosed in a given unit of time. In figures (2) and (3), we show the number of bound receptors (solid line) and endocytosed receptors (dashed line) as a function of the antigen concentration for two different values of valence. Figure (2) has data for high valence (20) and figure (3) for low valence (5). It is clear that for high valence, there is a threshold in concentration below which there are no bound receptors (no immune response), followed by a range of concentration where the number of bound receptors increases, followed by a region where it decreases again. The threshold at low concentration is easy to understand. It is caused by antigen depletion (all antigen that binds is quickly endocytosed). The depletion at high concentration comes about because of too much endocytosis, which depletes the pool of available receptors. For low valence (figure (3)), there is no range of concentrations where any surface receptors are present. The reason is that the valence is too low to form aggregates and lower the rate of endocytosis, and is also too low to prevent unbinding events from happening fast enough. Thus all bound receptors get quickly endocytosed. The high rate of endocytosis at high concentration probably leads to tolerance, as the cell will not survive such a high number of holes in its surface. Figures (1), (2) and (3) are the major results of our modeling. They clearly show that the single, simple assumption of an increased rate of endocytosis for bound receptors, together with reasonable assumptions about the way this rate depends upon the mass of the aggregated receptors, is able to explain both the low concentration threshold for immune response and the high concentration threshold for tolerogenic behavior in the Dintzis experiment. It can also explain the dependence of activation on valence, with a valence dependent threshold (or, alternately, a mass dependent threshold) for activation. Now consider the case of chimeric B-Cells. It turns out that these cells bind DNA with low affinity but are not activated [9]. DNA has a high valence and is in high concentration when chimeric B-Cells are exposed to it. To model the interaction of these B-Cells, we therefore used a valence of 20 and lowered the binding rate. We considered two cases:
Case 1: The B-cell has κκ and λλ receptors in equal proportion. This effectively halves the number of receptors, since antigen will bind either to the κκ receptor or the λλ receptor but not to both. Figure (4) shows the results of our modeling for this case. Note that the total number of bound receptors is very low. This is due to the low affinity. However, the endocytosis rate is high, since receptors once bound will be endocytosed before they can bind again and lower their probability of endocytosis. Thus in this case we would expect tolerogenic behavior because of the low number of bound receptors. Case 2: The κ and λ chains are on the same receptor. This means that the receptor is effectively monovalent, since antigen that binds to one of the light chains will not, in general, bind to the other. In a normal bivalent receptor, the existence of two binding sites creates an entropy effect whereby it becomes likely that if one of the sites binds, the other binds as well. The single binding site on the κλ receptors means that antigen binds and unbinds, much as in the case of the T-Cell. Thus, although the number of bound receptors at any given time reaches a steady state, the endocytosis rate is low, since receptors do not stay bound long enough to be endocytosed. Figure (5) shows the results of the modeling, which are in agreement with this qualitative picture. The non-immunogenicity of κλ cells would come about because of the low rate of endocytosis and consequent lack of T-Cell help. Our modeling thus shows that for chimeric receptors, non-immunogenicity would arise from subtle dynamical effects which alter the rates of binding and endocytosis so that either activation or T-Cell help would be compromised. These predictions could be tested experimentally.
Figure 1: Dependence of the number of bound receptors and number of endocytosed receptors on antigen valence for medium levels of concentration after 10 seconds.
Figure 2: Dependence of the number of bound receptors and number of endocytosed receptors on antigen concentration for high valence antigens after 10 seconds of simulation.
Figure 3: Dependence of the number of bound receptors and number of endocytosed receptors on antigen concentration for low valence antigens after 10 seconds of simulation.
Figure 4: The number of bound and endocytosed receptors for a cell with 50% κκ and 50% λλ receptors. These cells would be non-immunogenic because of low levels of activation from the low binding.
Figure 5: The number of bound and endocytosed receptors for a cell with only κλ receptors. These cells would be non-immunogenic because of low levels of endocytosis and consequent lack of T-Cell help.
References
1. Y. Louzoun, M. Weigert and G. Bhanot, "A New Paradigm for B Cell Activation and Tolerance", Princeton University, Molecular Biology Preprint, June 2001.
2. C. A. Janeway, P. Travers, M. Walport and J. D. Capra, "Immunobiology - The Immune System in Health and Disease", Elsevier Science London and Garland Publishing New York, 1999.
3. R. Z. Dintzis, M. Okajima, M. H. Middleton, G. Greene, H. M. Dintzis, "The Immunogenicity of Soluble Haptenated Polymers is Determined by Molecular Mass and Hapten Valence", J. Immunol. 143:4, Aug. 15, 1989.
4. J. W. Reim, D. E. Symer, D. C. Watson, R. Z. Dintzis, H. M. Dintzis, "Low Molecular Weight Antigen Arrays Delete High Affinity Memory B Cells Without Affecting Specific T-Cell Help", Mol. Immunol. 33:17-18, Dec. 1996.
5. R. Z. Dintzis, M. H. Middleton and H. M. Dintzis, "Studies on the Immunogenicity and Tolerogenicity of T-independent Antigens", J. Immunol. 131, 1983.
6. B. Vogelstein, R. Z. Dintzis, H. M. Dintzis, "Specific Cellular Stimulation in the Primary Immune Response: a Quantized Model", PNAS 79:2, Jan. 1982.
7. H. M. Dintzis, R. Z. Dintzis and B. Vogelstein, "Molecular Determinants of Immunogenicity, the Immunon Model of Immune Response", PNAS 73, 1976.
8. B. Sulzer, A. S. Perelson, "Equilibrium Binding of Multivalent Ligands to Cells: Effects of Cell and Receptor Density", Math. Biosci. 135:2, July 1996; ibid., "Immunons Revisited: Binding of Multivalent Antigens to B Cells", Mol. Immunol. 34:1, Jan. 1997.
9. Y. Li, H. Li and M. Weigert, "Autoreactive B Cells in the Marginal Zone that Express Dual Receptors", Princeton University Molecular Biology Preprint, June 2001.
STOCHASTIC MODELS OF IMMUNE SYSTEM AGING
L. MARIANI, G. TURCHETTI
Department of Physics, Via Irnerio 46, 40126 Bologna, Italy
Centro Interdipartimentale L. Galvani, Universita di Bologna, Bologna, Italy
E-mail: [email protected], [email protected]
F. LUCIANI
Max Planck Institute for Complex Systems, Noethnitzer Str. 38, Dresden, Germany
E-mail: [email protected]
The Immune System (IS) is devoted to the recognition and neutralization of antigens, and is subject to a continuous remodeling with age (immunosenescence). The model we propose refers to a specific component of the IS, the cytotoxic T lymphocytes, and takes into account the conversion from virgin (ANE) to memory and effector (AE) phenotypes, the injection of virgin cells by the thymus, and the shrinkage of the overall compartment. The average antigenic load as well as the average genetic properties fix the parameters of the model. The stochastic variations of the antigenic load induce random fluctuations in both compartments, in agreement with the experimental data. The results on the concentrations are compatible with a previous simplified model, and the survival curves are in good agreement with independent demographic data. The rate of mortality, unlike the Gompertz law, is zero initially and asymptotically, with an intermediate maximum, and makes it possible to explain the occurrence of very long living persons (centenarians).
1 Biological Complexity
The Immune System (IS) preserves the integrity of the organism, which is continuously challenged by internal and external agents (antigens). The large variety of antigens, ranging from mutated cells and parasites to viruses, bacteria and fungi, requires a rapid and efficient antagonistic response of the organism. At the top of the phylogenetic tree, evolution has developed a specific (clonotypic) immunity which cooperates with the ancestral innate immunity to control the antigenic insults [1]. The innate system has an arsenal of dendritic cells and macrophages with a limited number of receptors capable of recognizing and neutralizing classes of antigens. With the appearance of vertebrates, the increase of complexity stimulated the development of a system based on two new types of cells, B and T lymphocytes, with three distinct tasks: to recognize the antigens, to destroy them, and to keep track of their structure through a learning process. This kind of immunological memory is the key to a more efficient response to any subsequent antigenic insult caused by an antigen that the organism has already experienced (this is the basis of vaccination). The specific response is based on a variety of memory cells which are activated by specific
molecules of the antigen, presented by the APC (antigen presenting cells) to their receptors. There are two main T cell compartments: the virgin cells, which are produced (with the B lymphocytes) in the bone marrow but mature all their surface receptors in the thymus, and the memory cells, which are activated by the antigenic experience and preserve the information. The memory cells specific to a given antigen form a clone which expands with time, subject to the constraint that the total number of cells remains almost constant, with a small decrease with age (shrinkage of the IS). The virgin cells instead, after reaching a maximum in the early stage of life, decrease continuously, since the IS is not able to compensate their continuous depletion due to various biological mechanisms (conversion into memory cells, progressive inhibition of the thymic production, peripheral clonal competition) [2,3]. Systems with self-organizing hardware, cognitive and memory properties and self-replicating capabilities are by definition complex. The immune and nervous systems exhibit these features at the highest degree of organization and can be taken as prototypes of complex systems. Indeed, the specific (clonotypic) immune system has a hierarchically organized hardware, capable of receiving, processing and storing signals (from its own cytokine and immunoglobulin network and from the environment), and of creating a memory, self-replicating via the DNA encoding, which allows a long term evolutionary memory. For this reason, mathematical modeling has been particularly intensive for the IS. Since this system exhibits a large number of space-time scales, modeling is focused either on specific microscopic phenomena with short time scales or on large scale aspects with long time scales, ranging from a few weeks (acute antigenic response) to the entire lifespan.
T lymphocytes
We will focus our attention on the dynamics of the T cells populations on a long time scale disregarding the detailed microscopic behavior which is certainly very relevant on short time scales. The virgin T lymphocytes developed by the thymus have a large number of receptors TCR, built recombining genie sequences. This large set of variants (up to 1016 in humans), known as T cell repertoire, allows the recognition by steric contact of the antigen fragments presented by the APC (Antigen Presenting Cells), which degrade the proteins coming from the englobed antigens via proteolytic activities and show the peptides resulted from the cleavage to the surface molecules MHC I 4 ' 5 !. Other stimuli, such as the cytokines t 1 !, determine the differentiation into effector and memory cells and their proliferation (clone expansion). The memory cells, unlikely the effector ones, are long lived and show a sophisticated
82
| T helper ]
ANE (Virgin)
/*™\
AE (memory+effector)
[ T Cytotoxic]
ANE CVirgin)
AE (memory+effector)
Figure 1: Markers of Virgin and Memory plus Effector T lymphocytes. Schematic organization of main T cell pool: CD4+ (Helper) and CD8+ (cytotoxic) lymphocytes.
cognitive property allowing a more efficient reaction against a new insult of an experienced antigen. The T lymphocytes are split into two groups: the cytotoxic and helper T cells. The former attack and destroy cells infected by intra-cellular antigens, such as virus and kind of bacteria, the latter contribute to extracellular antigenic response, not described here. They are labeled by CD8 + (cytotoxic) and CD4 + (helper) according to the surface markers used to identify them. Each group is further split into virgin, effector and memory cells, whose role has been outlined, and are identified by some others surface markers, see figure 1. We are interested in the dynamics of two populations, the Antigen Not Experience (ANE) virgin T cells, and Antigen Experienced (AE) effector and memory T cells, which are identified by the CD95~ and CD95 + surface markers respectively. 3
Modeling immunosenescence
In this note we propose a mathematical model to describe the time variation of the ANE and AE T cells compartments due to the antigenic load and to the remodeling of the system itself. The antigenic load has sparse peaks of high intensity (acute insults) and a permanent low intensity profile with rapid random variations (chronic antigenic stress). In a previous work I 6 ' a simple model for the time evolution AE and ANE T cells concentrations was proposed on the basis of Franceschi's theory of immunosenescence, which sees the entire IS undergoing a very deep reshaping during the life span. The exchange between the compartments were considered due to antigen stimulated conversion and to reconversion due to secondary stimulation. The average antigenic load contributed to define these conversion rates jointly with a genetic average. The deterministic part of the model described the decrease of the ANE CD8 + T
83
cells concentration in agreement with experimental data, while the stochastic forcing, describing the chronic stress I 7 ' 8 !, allowed to obtain individual histories. The spread about the mean trajectory was also compatible with the data on T cells concentration and allowed to obtain survival curves in good agreement with independent demographic data, starting from the hypothesis that the depletion of the ANE T cells compartments is a mortality marker. The present model is intended to introduce some improvements by taking into account the remodeling of the immune system with age and is formulated on the ANE and AE populations rather than on the concentrations. Moreover the antigenic load is introduced on both the ANE and AE variation rates with an adjustable mixing angle. The complete model, that will be described in detail in the next section, has several parameters, but the relevant point is that if we neglect the remodeling, and compute the concentration, the results are very similar to the original ones; moreover the data on the AE T cells are fairly well reproduced t 3 '. The introduction of the remodeling effects shows that a further improvement still occurs especially for the early stage were the simplified model was not adequate. The last part is dedicated to the survival curves obtained from the model. A very important difference is found with respect to the classical Gomperz t 9 l survival law: the rate of mortality vanishes initially and asymptotically whereas it increases exponentially in the Gomperz law. This results, which explains the presence of very long lived individuals (centenarians), supports the biological hypothesis (depletion of ANE T cells compartment) of Franceschi's theory. 4
Mathematical model and results
The mathematical model is defined by dV _ -aV dt ~ dM _ dt
aV
1+
+ fi e~xt
-/3M + /3M 7
M+
. +
—
+ ecos2(6>) f(i)
2//,N/-/^ 2
(*)^)
(1)
where V denotes the number of ANE (virgin) CD8 + cells and M the number of AE (effector +memory) CD8+ cells, M+ = M if M > 0 and M+ = 0 if M < 0. The parameter a gives the conversion rate of virgin cells due to primary antigenic insult whereas {3 is the reactivation rate of memory cells due to secondary antigenic stimulus, which has an inhibitory effect on the virgin cells. In the primary production of AE cells we have taken the conversion and reconversion terms proportional to (1 + ^M+)~l, in order to take into
§ o
84
500
0
Memory
o
c
8 „o
o
° (ft)
°
___
•
,^-V--" o o ^-^"O 0
Jfcg^~t- --I -« -200
-100 0
Time (years)
120
0
Time (years)
120
Figure 2: Comparison of the model with experimental data for the virgin (ANE) and memory plus effector (AE) CD8+ T cells for the following parameters a = 0.025, /? = 0.01, e = 15, 0 = 35°, 7 = 0.004, A = 0.05, ^ = 15 and V(0) = 50. The curves are (V(t)) + fcoy (t) and (M{i)) + kaM{t) with k = - 2 , 0 , 2 .
account the shrinkage of the T cells compartment. The term fie~xt describes the production by thymus which is assumed to decay exponentially. Finally e£(£), where
«(*)> = o
(at)at')) = s(t-t')
(2)
is the contribution of stochastic fluctuations to the conversion rates. The mixing angle 0 gives the weight of this term on the ANE and AE compartments. The results are compared with experimetal data in figure 2. 4-1
Simplified model
It was previously studied in the deterministic version and is obtained from (1) by setting 7 = y, = e = 0. Since M + V is constant the same equations were satisfied by the concentrations v = V/(V + M) and m = M/(V + M) and the stochastic equation for v was t 10 ' 6 ! dv = -{a-fi)v-P ~dl
+ eZ{t)
(3)
where e ~ e/(V r (0)+M(0)), and was solved with initial condition v(0) = 1. The deterministic model obtained from (1) setting e = 0 can be solved analytically and we consider two simple cases: /? — 7 = 0 describing the effect of thymus and /3 = /j, — 0 describing the effect of shrinkage.
85 e=o
9=0
9=45
1000
9=45
500
'" 8 \
' 0
?o
X
f
F
1
^N * ^* >.•" °" " o^9— . . . ^~Time (years)
. . - • - • "
°
,..'' & ,—
s
•—r
^
/u.
S
•f°
t
°
Time (years)
M
° ^-stS^s.
r-.'J*-.
•?nn
120
Virgin
o
^iv
120
-mn °
Time (years)
12
°
°
/ ^ " o o° *
..ta-..
- f
*S- * B
8«
Time (years)
12
°
Figure 3: Comparison with CD95 data ' 2 1 of virgin (ANE) and memory plus effector (AE) populations for the model without shrinkage and thymus for two different mixing angles. The parameters are a - 0.02, /3 = 0.005, £ = 10, V(0) = 400 and 6 ~ 0 (left figures) and 9 = 45° (right figures). The curves are (V{t)) + kav(t) and (M(t)) + kaM(t) with k = -2,0,2.
4-2
Analytic solutions without noise
The deterministic solution of the model without thymus and shrinkage 7 = fi = 0, for initial conditions V(0) and M(0) = 0, reads (V(t)) = V0
(M(t)) = V0a
/3-a
e(0-a)t
_ J
0-a
(4)
and the T cell population is conserved M(t) + V(t) = V(0). The deterministic solution with no shrinkage 7 = 0 can be analytically obtained t 11 !. Since 0 < P < a choosing for simplicity ,5 = 0 one has (V(t)) =
V(0)e-at+v
e-at
_
e-Xt
A—a
(M(t)) = V(0)-(V(t))
+
?-{l-e-»)
(5) The solution with shrinkage and no thymic term n = 0, choosing for simplicity /? = 0, reads (V(t))=V(0)e
-at
(M(t)) = 7 - 1 ( [1 + 2 7 V(0)(1 - e-at)
Y1/2
- 1 ) (6)
The graph of (V(t)} (5) exhibits a peak at t = (log A — loga)/(A — a) if 1/(0) = 0. The peak disappears when the thymus contribution vanishes. Conversely (M(t)) is monotonic increasing but the thymus enhances its value. The shrinkage reduces considerably the increase of (M(t)) whereas it does not affect (V(t)). The stochastic term generates a family of immunological histories. Their spread is measured by the variance. In figure 3 we show the effect of the mixing angle for the same set of parameters chosen for the simplified model.
86 Thymus
Shinkage
Shinkage
500
Virgin
Virgin
§
Thymus
fa
X
inn
-100
Time (years)
120
Time (years)
120
^-^ ••.,
°
Time (years)
" " - —
120
Time (years)
12
°
Figure 4: Comparison with CD95 data I 2 J of virgin (ANE) and memory plus effector (AE) populations for the model with a = 0.025, /9 = 0.01, e = 15, 6 = 35°. On the left side we consider the contribution of thymus with A = 0.05, /x = 15, 7 = 0 and V(0) = 50 . On the right side the contribution of shrinkage is shown for 7 = 0.004, /i = 0 and V(0) = 400. The curves are (V(t)) + kav(t) and (M(t)) •+- kaM{t) with k - - 2 , 0 , 2 .
When 6 grows, the rms spread av of V decreses, whereas the rms spread aM of M increases. The separate effect of thymus and shrinkage is shown in figure 4, for the same parameters as figure 2. 5
Survival curves
The simplified model with noise given by equation (3), corresponds to the Ornstein-Uhlembeck process. The probability density, satisfies the FokkerPlanck equation and has an expilict solution (v)(t)
+ (1 - t;oo)e- t / T
p(v,t) =
1 ^2Trai(t)
exp ~'r [
(v-(v)(t)[ 2a2 (t)
(7) where T = (a - 0)~\ Voo = -0T and a2(t) = \ e2 T (1 - e~2tlT). In figure 5 we compare the results of the model with demographic data. Assuming that the depletion v = 0 of the virgin T cells compartment marks the end of life it is possible to compute from (7) the survival probability up to age t r+00
S(t)
2 du
Jx(t)
x(t) =
a(t)
(8)
Neglecting the thymus and shrinkage effects the concentrations obtained from equation (1) are close to values of the simplified model if d = 0. Indeed when 7 = /x = 0 we have M(t) + V{t) = V(0) + M(0) + ew(t) where w(t) denotes the Wiener process. Setting v = V/{V + M) and V = V0 + eVx and M = M0 +
+ e2 (V0 (w2) - (Vi w))
((v - (v))2) = e2 ((V, - v0 wf)
(9)
87 o
500 o°
"°o\
Sm
*
rvival Pr obab 4 0
• \ \ °
%
«
Mortality Rate 0.1
J\
\
/"" "X / l
V V
i
V
-100 )
Time (years)
150
\
0
50 100 Time (years)
150
0
y 50 100 Time (years)
150
Figure 5: Virgin cells population (V(t)) ± 2av (i) with vx = -0.5, T = 67 and e = 0.016 (left). Corresponding survival curve (center) and mortality rate (right). These values correspond to a - 0.022, 0 = 0.0075, e = 6.5 7 = fi = 0 and V(0) = 400 up to order e2, where e = e/{V(0) + M(0)) and v0 = V0/(Vo + M 0 ). When /3 = 0 asymptotic variance is the same as for the simplified model. The survival probability S(t) obtained from the simplified model has been compared with demographic data. We notice that S(t) is a three parameters function modeling death process as a threshold for a biomarker, which evolves following a deterministic in randomly varying environment I 12 - 13 !. In our case the biomarker is the virgin CD8 + T cells concentration v and the threshold is t>* — 0. The lower integration end point can be written as x(t) = C
exp^y^ Vl - e- 2 «/ T 1
H(f
1/2
(v* - Uoo)
(10)
where tt is the death age {v(tr)} = u* namely e~'*/ T = (u* — Voo)/(l — ^oo)- In figure 5 we fit the demographic data of human males using (8) and the values of the parameters are close to the ones obtained from the fit of CD8 + T cells concentrations.
6
Comparison with Gomperz law
Making the simplifying assumption t* = T we obtain a survival probability depending on two parameters C, T
88
just as the Gomperz law, which is defined by ^
= -RSG
f
= £
->
50(t)=exp(-C0(e^0-l))
(12)
Our mortality rate is not monotonically increasing, as for Gomperz law, but decreases after reaching a maximum, see figure 5. It is better suited to describe the survival of human populations in agreement with demographic data. This property, due to the randomly varying antigenic load on the organism, explains the occurrence of very long lived persons (centenarians). We notice that x(t) oc - i - 1 / 2 as t ->• 0 and x(oo) = C so that -C2/2
Jim S(t) = 1
e
lim S(t) =
—7=r
(13)
We notice that 5(+oo) > 0 means nonzero probability of indefinite survival. However 5(+oo) ~ 1 0 - 3 for C = 3 and it is below 1 0 - 6 for C = 5 our law imposes to fix a reasonable lower bound on C. We further notice that
«*H
f
C
t=T
T
'
V2TT(1 -
e- 2 )
(14)
The meaning of the parameters is obvious: T is the age at which the survival probability is exactly 50% and the slope of the curve there is proportional to C. We can say that C measures the flatness of the graph of S(t). For the mortality rate R = —S/S we have the following asymptotic behavior
7
Conclusions
We consider the long time behavior of the CD8 + virgin T cells and CD8 + antigen experienced T cells compartments and the remodeling of the IS system. The stochastic variations of the antigenic load determine a spread in the time evolution of the cells number, in agreement with experiments. The results are compatible with a previous simplified model for the virgin T cells concentrations and provides survival curves compatible with demographic data. The effect of thymus and remodeling improves the description of the early stage for the virgin T cells and late stage of antigen experienced T cells. 8
Acknowledgments
We would like to thank Prof. Franceschi for useful discussions on the immune system and ageing.
89
9
References 1. A Lanzavecchia, F.Sallustio Dynamics of T lymphocytes Responses: Intermediates,Effector and Memory Science 290, 92 (2000) 2. F. Fagnoni, R. Vescovini, G. Passeri, G. Bologna, M. Pedrazzoni, G. Lavagetto, A. Casti, C. Franceschi, M. Passeri & P. Sansoni, Shortage of circulating naive CD8 T cells provides new insights on immunodeficiency in aging. Blood 95, 2860 (2000) 3. F. Luciani, S. Valensin, R. Vescovini, P. Sansoni, F. Fagnoni, C. Franceschi, M. Bonafe, G. Turchetti, Immunosenescence: The Stress Theory AStochastical Model for CD8+ T cell Dynamics in Human Immunosenescence: implications for survival and longevity, J. Theor. Biol. 213,(2001) 4. A. Lanzavecchia, F. Sallusto, Antigen decoding by T lymphocytes: from synapse to fate determination, Nature Immunology 2, 487 (2001) 5. G.Pawelec et al, T Cells and Aging, Frontiers in Bioscience 3, 59 (1998) 6. F. Luciani, G. Turchetti, C. Franceschi, S. Valensin, A Mathematical Model for the Immunosenescence, Biology Forum 94, 305 (2001). 7. C. Franceschi, S. Valensin, M. Bonafe, G. Paolisso, A. I. Yashin, D. Monti, G. De Benedictis, The network and the remodeling theories of aging: historical background and new perspectives., Exp. Gerontol. 35, 879 (2000) 8. C. Franceschi, M. Bonafe, S. Valensin. Human immunosenescence: the prevailing of innate immunity, the failing ofclonotipic immunity, and the Riling of immunological space. Vaccine. 18,1717(2000) 9. B. Gompertz On the nature of the function expressive of the law of human mortality, and on the new mode of determining the values of life contingencies. Philos. Trans. R. Soc. London 115, 513 (1825) 10. F. Luciani Modelli hsico-matematici per la memoria immunologica e l'immunosenescenza , Master thesis, Univ. Bologna (2000) 11. L. Mariani Modelli stocastici deU'immunologia: Risposta adattativa, Memoria e Longevita' Master thesis, Univ. Bologna (2001) 12. L.A. Gavrilov, N. S. Gavrilova The Biology of Life Span: a quantitative approach ( Harwood Academic Publisher,London, 1991) 13. L. Piantanelli, G. Rossolini, A. Basso, A. Piantanelli, M. Malavolta, A. Zaia, Use of mathematical models of survivorship in the study of biomarker of aging: the role of Heterogeneity, Mechanism of Ageing and Development 122, 1461 (2001)
This page is intentionally left blank
NEURAL NETWORKS AND NEUROSCIENCES
This page is intentionally left blank
93
ARTIFICIAL NEURAL NETWORKS IN NEUROSCIENCE
N. ACCORNERO, M. CAPOZZA Dipartimento di Scienze Neurologiche, Universita di Roma LA SAPIENZA
We present a review of the architectures and training algorithms of Artificial Neural Networks and their role in Neurosciences.
1.
Introduction: Artificial Neural Networks
The way an organism possessing a nervous system behaves depends on how the network of neurons making up that system functions collectively. Singly, these neurons spatially and temporally summate the electrochemical signals produced by other cells. Together they generate highly complex and efficient behaviors for the organism as a whole. These operational abilities are defined as "emergent" because they result from interactions between computationally simple elements. In other words, the whole is more complex than the sum of its parts. Our understanding of these characteristics in biological systems comes largely from studies conducted with artificial neural networks early in the 1980s [1]. Yet the biological basis of synaptic modulation and plasticity were perceived by intuition 40 years earlier by Hebb, and the scheme for a simple artificial neuronal network, the perceptron, was originally proposed by Rosenblatt [2] and discussed by Minsky [3] in the 1960s. An artificial neural network, an operative model simulated electronically (hardware) or mathematically (software) on a digital processor, consists of simple processing elements (artificial neurons, nodes, units) that perform algorithms (stepwise linear and sigmoid functions) on the sum or product of a series of numeric values coming from the various input channels (connections, synapses). The processing elements distribute the results of the output connections multiplying them by the single connection "weights" received from the other interconnected processors. The final complex computational result therefore depends on how the processing units function, on the connection weights, and on how the units are interconnected (the network architecture). To perform a given task (training or learning), an artificial net is equipped with an automated algorithm that progressively changes at least one of these individual
94 computational elements (plasticity) almost exactly as happens in a biological neuronal network.
CONNECTIONS PLASTICITY
F(l)
Z^-—^F(')*P2 F(I)*P3 (l)*P4
SIMULATION
HARDWARE
SOFTWARE
Figure 1 : Comparison between a biological neuron and an artifical neuron, and a hardware-software simulation. The functions of the processing elements (units) and the network architecture are often pre-determined: the automated algorithms alter only the connection weights during training. Other training methods entail altering the architecture, or less frequently, the function of each processing unit. The architecture of a neural network may keep to a pre-determined scheme (for example with the processing elements, artificial neurons, grouped into layers, with a single input layer, several internal layers, and a single output layer). Otherwise it starts from completely chance connections that are adjusted during the training process.
95
A R T I F I C I A L NEURAL NETWORKS TOPOLOGY NETWORK
CLUSTER
LAYERED
Figure 2 Variable architecture of artificial neural networks. Network training may simply involve increasing the differences between the various network responses to the various input stimuli (unsupervised learning) so that the network automatically identifies "categories" of input [4, 5, 6]. Another training method guides the network towards a specific task (making a diagnosis or classifying a set of patterns). Networks designed for pattern classification are trained by trial and error. Training by trial and error can be done in two ways. In the first, an external supervisor measures the output error then changes the connection weights between the units in a way that minimizes the error of the network (supervised learning) [7, 8]. The second training method involves selective mechanisms similar to those underlying the natural selection of biological species — a process that makes random changes in a population of similar individuals, then eliminates those individuals having the highest error and reproduces and interbreeds those with the lowest error. Reiterating the training examples leads to genetic learning of the species [9].
96
0> J
SUPERVISED
I "
BEITA RlJt.ES
M
ERROR 8ACIC-PMOPASATION
P '.
UNSUPERVISED
TOWETlTlWe
UEARftHMG
GENETIC AiGORITHRSS l -HI I l in I i Hi
UtaHFUHlBlH CROSSOVER
Figure 3. Training modalities
The choice of training method depends on the aim proposed. If the network is intended to detect recurrent signal patterns in a "noisy" environment, then excellent results can be obtained with a system trained through unsupervised learning. If one aims to train a diagnostic net on known knowledge, or to train a manipulator robot on precise trajectories, then one should choose a multilayered network trained through unsupervised learning. If the net is designed for use as a model, that is to simulate biologic nervous system functions, then the ideal solution is probably genetic learning. Adding the genetic method to either of the other two methods will improve the overall results. In summary, biological and artificial neural networks are pattern transformers. An input stimulation-pattern produces an output pattern-response, especially suited to a given aim. To give an example from biology: a pattern of sensory stimuli, such as heat localized on the extremity of a limb, results in a sequence of limb movements that serve to remove the limb from the source of heat. A typical example of an artificial network is a system that transforms a pattern of pathologic symptoms into a medical diagnosis.
97
Input and output variables can be encoded as a vectorial series in which the value of a single vector represents the strength of a given variable. The power of vectorial coding becomes clear if we imagine how some biological sensory systems code the reality of nature. The four basic receptors located on the tongue (bitter-sweet-saltacid) allow an amazing array of taste sensations. If each receptor had only ten discrimination levels - and they certainly have more - we could distinguish as many as 10.000 different flavors. On this basis, each flavor corresponds to a point in a four-dimensional space identified by the 4 coordinates of the basic tastes. Similar vectorial coding could make up the input of an artifical neural network designed to identify certain categories of fruit. One or more internal (hidden) layers would transform this coding first into numerous arbitrary hidden (internal) codes, and ultimately into output codes that classify or recognize the information presented.
APPLE
BANANA
BITTER
SWEET
ACID 1
SALTY
CHERRY
« • INPUT VECTORIAL CODING
GRAPE
OUTPUT POSITIONAL CODING
Figure 4. A forward-layered neural network This network designed to recognize or diagnose can be generalized to a wide range of practical applications, from geological surveying to medicine and economic evaluation. This type of network architecture, termed "forward", because its connections all converge towards the output of the system, is able to classify any "atemporal" or "static" event. Yet reality is changeable, input data can change
98
rapidly, and the way in which these data follow one another can provide information that is essential for recognizing the phenomenon sought or for predicting how the system will behave in the future. Enabling a network to detect structure in a time series, in other words to encode "time", means also inserting "recurrent" connections (carrying output back to input). These connections relay back to the input unit the values computed by units in the next layer thus providing information on the pattern of preceding events. Changing of the connection weights during training is therefore also a function of the chain of events. The nervous system is rich in recurrent connections. Indeed, it is precisely these connections that are responsible for perceiving "time". If a "forward" network allows the input pattern to be placed in a single point of the multidimensional vector space, a recurrent network will evaluate this point's trajectory in time.
DYNAMIC RECOGNITION T I M E CODING
PREDICTION
#
Figure 5. Recurrent layered neural network
Advances in neural networks, the unceasing progress in computer science, and the intense research in this field have brought about more profound changes in technology than are obvious at first glance. Paradoxically, because these systems have been developed thanks to digital electronics and by applying strict
99 mathematical rules, the way artificial neural networks function runs counter to classic computational, cognitive-based theories. A distinguishing feature of neural networks is that knowledge is distributed throughout the network itself rather than being physically localized or explicitly written into the program. A network has no central processing unit (CPU), no memory modules or pointing device. Instead of being coded in symbol (algorithmic-mathematical) form, a neural network's computational ability resides in its structure (architecture, connection weights and operational units). To take an example from the field of mechanics, a network resembles a series of gears that calculates the differential between the rotation of the two input axles and relays the result to the output axle. The differential is executed by the structure of the system with no symbolic coding. The basic features of an artificial neural network can be summarized as follows: 1) Complex performance emerging from simple local functions. 2) Training by examples 3) Distributed memory, and fault tolerance. 4) Performance plasticity. Connectionist systems still have limited spread in the various fields of human applications, partly because their true potential remains to be discovered, and partly because the first connectionist systems were proposed as alternatives to traditional artificial-intelligence techniques (expert systems) for tasks involving diagnosis or classification. This prospect met with mistrust or bewilderment, due paradoxically to their inherent adaptiveness and plasticity, insofar as misuse of these qualities can lead to catastrophic results. Disbelievers also objected that the internal logic of artificial neural networks remains largely unknown: networks do not indicate how they arrive at a conclusion. These weaknesses tend to disconcert researchers with a determinist background (engineers, mathematicians and physicists) and drive away researchers in medical and biological fields, whose knowledge of mathematics and computer science is rarely sufficient to deal with the problems connectionist systems pose. 2.
Application of artificial neural networks to neuroscience
Neural networks now have innumerable uses in neuroscience. Rather than listing the many studies here, we consider it more useful to give readers some reference points for general guidance. For this purpose, though many categories overlap or remain borderline, we have distinguished three: applications, research, and modelling. Applications essentially use neural networks for reproducing, in an automated, faster or more economic manner, or in all three ways, tasks typically undertaken by human experts. The category of applications includes diagnostic neural networks (trained to
100 analyze a series of symptoms and generate a differential diagnosis, and even a refined one, and networks designed for pattern recognition (currently applied in clinical medicine for diagnostic electrocardiography, electromyography, electroencephalography, evoked potentials, and neuroophfhalmology) [10] Other networks are designed to segment images (for example, to identify anatomical objects present in radiologic images, and highlight them in appropriate colors or reconstruct them in three dimensions). In general, these networks have received supervised training. Of especial interest to neurologists are diagnostic networks trained to make a typically neurologic diagnosis such as the site of lesion. One of these nets visualizes the areas of probable lesion on a three-dimensional model of the brainstem, thus supplying a topographic, rather than a semantic diagnosis [11].
OPTONET FLOWCHART
autoMAtio visual Finlri ana lijsor output
I*
lncoMpi«t» eonsroous i-isht
Figure 6: Automated diagnosis of the visual field.
101
EMG -NET A NEURAL NETWORK FOR THE DIAGNOSIS OF SURFACE MUSCLES ACTIVITY
"wi m i P ' ' lfcO(.H."<»l PREAMPLIFIERS
TIMr SFPIFS
SPECTROGRAMS
FORWARD N.N.
Figure 7: Automated electromyographic diagnosis.
BRAINSTEM-NET
DATA
REPORT FORM
NEURAL NERWORK
3D MODEL
CLINICAL
NEUR0PHYSI0L0G1CAL
5268 VOXELS
_ l Figure 8: 3d-images computed by a forward-layered neural network
102
Neural networks of the diagnostic type have many practical uses (for example, automated recognition of EEG epileptic abnormalities quickens the tedious examination of a 24-48 hour dynamic EEG recording). They have the advantage of being able to process even noisy data, and can smoothly degrade the network's response to excessive signal deterioration, whereas traditional expert systems function well up to a given signal-to-noise ratio but then stop responding. Physicians often seem reluctant to make use of neural networks. The reasons why they do so are many and merit an analysis in depth that is outside the scope of this chapter. To put it in brief, we suspect that physicians fear that a diagnosis provided automatically by a machine will diminish their authority. An extremely interesting field is the use of a neural network as a research tool in areas where more traditional research tools seem to have exhausted their potential. These cases essentially call for unsupervised networks, able to discover correlations, regularity, and hidden patterns that other methods fail to disclose. In this context, neural networks function as powerful multivariate and non-linear statistical tools. Research conducted by Roberts and Tarassenko [12] shows, for example, that the traditional division of sleep into five stages according to visual inspection of the EEG is an arbitrary classification. Using a neural network they identified seven EEG attractors during sleep (and one during wakefulness). Each sleep stage arises from a characteristic trajectory between some of these attractors: and all these dynamic events originate from the competitive interactions between three processes. Last, the research field of greatest interest to the neuroscientist is also the area where the cormectionist method of analysis finds widest agreement, namely the use of neural networks in nervous system modeling. In this field, neural networks are strictly speaking used not as tools but as models- though simplified ones - of biologic nervous system functioning. Because these models are simulated on computers, they can study the properties of systems in a dynamic, quantitative manner, starting from the single local phenomena emerging from collective interactions: memory, imagination, language, and maybe in the future, from consciousness. In effect, artificial neural network simulation offers the first - and at present the only chance — of bringing the study of higher nervous system functions back from psychology and philosophy into the realms of the natural quantitative sciences. No longer will research be limited to observing and formulating hypotheses on nervous system function: it will also be able to test function experimentally, though for the time being in a limited way. Sensory systems behave similarly to biological systems (artificial retina). Artificial neuronal systems for motor control display an ability for unsupervised learning on the control of a physical artificial arm or one simulated on a computer [13, 14].
103
LEARNING BY DOING
Figure 9: network
Model of motor control implemented with an unsupervised neural
They will also control a double-inverted pendulum model of standing posture that appears able to learn the upright posture spontaneously and to compensate for perturbations in balance due to environmental conditions. Studies of this type have helped us to understand some of the essential mechanisms underlying the development of central nervous system sensorimotor control. This knowledge has been put to various practical uses including neurorehabilitation and attempts to construct sensory and motorized prostheses. A connectionist (neural network) approach allows one to investigate not only the functioning, but also the birth and evolution of simple nervous systems [15, 16]. The unavoidable, spontaneous affinity between connectionism and the computational branch of evolutionary biology that studies the formation and transformation of elementary organisms simulated by genetic algorithms has lead to extraordinary results in the simulation of "artificial organogenesis" (the eye) and of "artificial life". The term artificial life refers to a computer simulation and study of dynamic ecosystems. These systems are described as artificial in the sense that they are
104
originally designed by humans and are immaterial, but they evolve and reproduce in an autonomous and often unpredictable manner. The lack of a material constitution limits the ability of stimulation to describing the physicochemical properties of organic and inorganic materials in mathematical terms. To date, most studies focus on macroscopic behaviors including reproduction, movement strategies, and energy exchange with the simulated environment (the search for food) or the appearance of cooperative behaviors (including storms and schools offish). The search for criteria that will distinguish between living and non-living organisms is as old as man. It also seems ever more arduous as investigational techniques become increasingly refined. Currently the borderline between the two, if the question is legitimate, lies between crystalline mineral structures and self-replicating biological structures (DNA-RNA). At behavioral level, a useful definition is that proposed by Monod, indicating three essential features of the living world: teleonomy (the presence of a structural plan), autonomous morphogenesis and reproductive invariance. But again, these three characteristics are wholly interdependent. Most probably, the only law that distinguishes organic from inorganic matter is their tendency to saturate the environment with copies of themselves, thus modifying the environment itself, whenever possible to their own advantage. Because many species found in nature compete with one another and cooperate towards the same aim, local situations of dynamic equilibrium are reached, termed ecosystems. The scale of observation is obviously an important variable since the living world seems to be organized into ecosystems within other ecosystems, rather like Chinese boxes. From this viewpoint, expecting to simulate fully a biologic ecosystem, however small, within the isolation and immaterial setting of a mathematical process, may seem absurd and misleading. Yet if the aim is not to simulate the ecosystem exactly but to understand only some of the rules governing the biological world then the method is right and can be enlightening. The opportunities for investigation in this field stem from at least three determinant coincidences: 1) the development and spread of sufficiently powerful digital computers; 2) progress in studies on connectionism; and 3) an improved understanding of genetic biological mechanisms that allowed basic laws to be simulated with mathematical formulas termed "genetic algorithms".
105
ARTIFICIAL LIFE COMPUTER SIMULATION OF EVOLUTION AND BEHAVIOUR OF SIMPLE ORGANISMS IN AN ENVIRONMENT GENETIC ALGORITHMS AND THE RULES OF "NATURAL SELECTION "ARE EMBEDDED IN THE SIMULATED SYSTEM . STARTING CONDITIONS ARE CASUAL . SPECIES EVOLVE BECAUSE BEST-FITTING ORGANISMS .THAT CAN UTILIZE THE RESOURCES OF THE ENVIRONMENT TO REPRODUCE EFFICIENTLY, ARE SELECTED
PARENTS
11D01D10010
010011110101001011
01101 11101010100.10
011011110101010000
NATURAL SELE( i l 1
3« %
:POS
GENOME
ER
i,'ur~Ti'~ •]
or>~'-,rFiuc-.
Fig 10: Model of artificial life simulated using genetic algorithms on populations (hundreds to thousands) of unsupervised neural networks that compete in an artifical environment.
3.
Conclusions
After a fatiguing course, with trials and tribulations lasting more than 50 years, connectionism has at last achieved the recognition it deserves. It is now beginning to show its enormous potential for managing and understanding complex phenomena that the deterministic means hitherto available could not approach. The coming years should therefore witness a major breakthrough in scientific investigation. We recommend readers who wish to approach this technology, as well as reading explanatory texts (1,8), to begin experimenting using one of the inexpensive commercial software programs suitable for a personal computer of modest performance. These programs enable even inexpert users to set up and train a neural network suitable for multiple applications. Innumerable Internet sites are dedicated to neural networks: in response to keywords including "neural networks" any research engine will supply hundreds, or often
106
thousands of links. Many of these sites offer neural nets as "freeware" (software that can be downloaded free and used without restrictions) and "shareware" (software that can be downloaded free-of-charge for personal evaluation and ultimately eliminated or bought).
References [I] D. Parisi: Intervista sulle reti neurali, Bologna, II Mulino, 1989. [2] F. Rosenblatt: Principles of neurodynamics. Perceptrons and the theory of brain mechanisms, Washington, D.C., Spartan Books, 1962. [3] M. Minsky, S. Papert: Perceptrons, Cambridge, Mass., MIT Press, 1969. [4] G. A. Carpenter, S. Grossberg: Neural dynamics of category learning and recognition: attention, memory consolidation, and amnesia, in J. Davis, R. Newburgh, and E. Wegman (Eds.): Brain Structures, Learning, and Memory, AAAS Symposium Series, 1986. [5] S. Grossberg (Ed): Neural networks and natural intelligence, Cambridge, Mass., MIT Press, 1988. [6] T. Kohonen: Self-organization and associative memory, 3 r ed., Berlin/' Heidelberg/New York, Springer-Verlag, 1989. [7] J.J. Hopfield, D.W. Tank: Neural computation of decision in optimization problems, Biol. Cybern. 52, 141-152, 1985. [8] D.E. Rumelhart , J.L. McClelland: Parallel distributed processing, Cambridge, Mass., MIT Press, 1986. [9] D.E. Goldberg: Genetic algorithms in search, optimization, and machine learning, Reading, Mass., Addison-Wesley, 1989. [10] N. Accornero, M. Capozza: OPTONET: neural network for visual field diagnosis. Med. & Biol. Eng. & Comput, 1995, 33, 223-226. [II] G. Cruccu, M. Capozza, S. Bastianello, M. Mostarda, A. Romaniello, N. Accornero: 3D Brainstem Mapping of "anatomical" and "functional" lesions, 9th European Congress of Clinical Neurophysiology, Bologna, Monduzzi Editore, 1998. [12] S. Roberts, L. Tarassenko: New method of automated sleep quantification, Med. & Biol. Eng. & Comput, 1992, 30, 509-517. [13] B.W. Mel: Connectionist Robot Motion Planning, San Diego, CA, Academic Press Inc., 1990. [14] M. Capozza. N. Accornero: Rete neurale non supervisionata per il controllo di un arto simulato, VI Congr. Nazionale Tecnologie Informatiche e Telematiche Applicate alle Neuroscienze (ANINs), Milano, 1997. [15] G.M. Edelman: Neural darwinism, New York, Basic Books, 1989. [16] N. Accornero, M. Capozza: Vita artificiale: connessionismo e meccanismi evolutivi genetici. Riv. Neurobiologia, 1994, 40 (5/6), 445-449.
107 BIOLOGICAL NEURAL NETWORKS: MODELING MEASUREMENTS
AND
RUEDI STOOP AND STEFANO LECCHINI Institut
fur Neuroinformatik,
ETHZ/UNIZH, CH-8057 Zurich [email protected]
Winterthurerstr.
190,
When interaction among regularly firing neurons is simulated (using measured cortical response profiles as experimental input), besides complex network dominated behavior, embedded periodicity is observed. This is the starting point for our theoretical analysis of the potential of neocortical neuronal networks for synchronized firing. We start from the model that complex behavior, as observed in natural neural firing, is generated from such periodic behavior, lumped together in time. We address the question of whether, during periods of quasistatic activity, different local centers of such behavior could synchronize, as required, e.g., by binding theory. It is shown that for achieving this, methods of self-organization are insufficient - additional structure is needed. As a candidate for this task, thalamic input into layer IV is proposed, which, due to the layer's recurrent architecture, may trigger macroscopically synchronized bursting among intrinsically non-bursting neurons, leading in this way to a robust synchronization paradigm. This collective behavior in layer IV is hyperchaotic; its characteristic statistical descriptors agree well with the characterizations obtained from in vivo time series measurements of cortical response to visual stimuli. When we evaluate a novel, biological relevant measure of complexity, we find indications that the natural system has a tendency of tuning itself to the regions of highest complexity.
1
Introduction
One emergent question in biology and in the computational sciences, is how the brain processes information. The answer to this question is important for at least two reasons. Firstly, if a spike event is taken to give the basic unit of the clock time, the brain is incredibly efficient, at cycle times that are of the order of milliseconds and thus way below those of artificial information processing devices. Secondly, cortical data are full of noise. However, our perception of noise has recently changed, from that of a kind of hindrance to something potentially useful, as noise has been shown to be able to synchronize ensembles, to lead to phase transitions, and to have many more unexpected effects. Moreover, insight has been gained that noise is being used to a large extent by biological systems. Striking examples are Brownian motors [1] or stochastic resonance [2]. From the technological point of view, insight into nature and potential of noise is important, as it is the primary obstacle to making electronic compounds ever smaller, which would imply not only the optimization of the occupied space, but also minimizing signal processing time
108
and energy spent. As a consequence, the expectation that there may be useful, still undiscovered, computational principles within the cortex, is not entirely of speculative nature. These principles, when combined with the speed of modern computers, could lead to a jump in the computational power of artificial computation, from the hard- and software points of view. We have recently shown [3] that to distinguish noisy firing from firing in terms of patterns, fractal dimension log-log plots are useful, where pattern firing is documented by parallel steplike behavior of the embedded correlation curves, whereas noisy firing yields convex, non-converging curves. In Fig. 1, we show two cases, obtained from a set of in vivo measurements of cat neocortical neurons (VI). When looking for principles able to convert noise into patterns, the principle of locking offers itself as a solution. Upon small-scale uncorrelated noisy input, a neuron is put on a limit-cycle solution. Such neurons, when coupled more strongly, will engage in locked firing. One possibility then would be to explain in vivo complex firing behavior as the switching between different locked states. This mechanism would not only provide an ideal A/D-converter device, it moreover would lead to an encoding of stimuli optimal under different information-theoretic aspects [4]. In the discussion of the computational properties of the human brain, an important issue of current interest is the feature-binding problem, which relates to the cortical task of associating one single object with its different features [5-6]. As a solution to this problem, synchronization among neuron firing has been proposed - in opposition to the concept of so-called grandmother cells. The purpose of this contribution is to investigate, under which conditions a neocortical network is able to evolve towards self-organized complexity, documented in the emergence of spatio-temporal structures, e.g., synchronization.
2
Results
Of primary importance therefore would be, to have solid, generally accepted, experimental proofs of synchronization. This, however, at the moment seems not to be available. We will find that also our, more theoretical, approach, requires details of neuronal connectivity and efficacy that are far from being resolved, not only from the physiological, but also from the mathematical point of view (e.g., the question from what degree of connectivity on a network can be modeled as all-to-all connected). Our modeling assumption will be that for layers I-III and V-VI, a hierarchy of interactions is relevant, from weak to strong, from topologically far to near. This implies that a hierarchy of couplings should be considered and suggests, that for the interaction, mod-
109
a)
b)
0
o
-20 -20
log e
0
-20
log
loge
-0.5
-4
|og
e
0
-3.5
-7 -4
e
-1
Figure 1. In vivo measurements (cat, visual cortex VI). log-log correlation plots and interspike probability distributions of a) a noisy, and b) a pattern-firing neuron.
eling as point processes should be allowed. It suggests furthermore, that for layers with low connectivity, a coupled map lattice model (CML), where the interactions should essentially be next-neighbor order, is reasonable, although for finding coherent, synchronized behavior, all-to-all or mean-field coupling would be more promising. Layer IV finally is, due to its distinct connectivity, modeled as an all-to-all coupled network. As the result of our investigations, it emerges that the origin of synchronized behavior (if existent in nature) is unlikely in layers I-III and V-VI. We find: 1) The qualitative behavior in the CML-model of layers I-II and V-VI, is independent from the local lattice map; 2) In the absence of noise, the CML-model is unable to synchronize in a selforganized manner. Synchronization, however, could emerge in an input-driven way; 3) Collective, roughly synchronized, behavior can be generated in a model of
110
layer IV, which also is able to generate responses that are very close to in vivo measured neuronal firing; 4) The highest complexity of the CML-model is found where the global behavior of the network is close to the border between order and disorder. It is possible that the objective of this property is to facilitate the inheritance of rhythms of firing to other layers than IV. 3
Methods
Model of computational layers For the computational layers I-III/V-VI, we use a CML approach, where, in a first step, we approximate the layer by weakly coupled ensembles of, in itself, more strongly coupled regularly firing pyramidal neurons. More precisely, for the CML-site maps, we use the profiles of in-vitro measured pyramidal neurons, when they are perturbed at otherwise quasistatic dynamical conditions, by excitatory and inhibitory pulses. In quasistatic conditions, the individual neurons are driven to regular firing by uncorrelated small-scale noise, but, due to stronger coupling with nearest neighbors, engage in locked states [7], which may be expressing computational results [4]. It has been shown that recurrent connections on these computational circuits can act as controllers of the periodicity of the locking. In other words, they modify the computational results returned by the circuit [8]. For this case, we find that self-organized synchronization, as needed to support the binding by synchronization hypothesis, is virtually impossible. The natural next step is then to add second order perturbations among the sites. However, even for this refined case, we end up with absence of self-organized synchronization. The CML-site maps are derived from binary interaction, although the extension to n-ary interaction, or even to interaction among synchronized ensembles, is straightforward. The description of binary interaction is by means of maps of the circle / : i+i =
(modulo
1).
Here, ft is the ratio of the self-oscillation frequency over the perturbation frequency, and T(i)/T0 measures the lengthening / shortening of the unperturbed interspike interval due to the perturbation, as a function of the phase at which the perturbation arrives. Note that both quantities can be measured in experiments; this is how we base our derivation upon experimental data. The formula describes what happens to a noise-driven neuron (where the noise follows a central-limit theorem and therefore drives the neuron into regular firing), when it is perturbed by a similar, strongly coupled "neighboring" neuron
(1)
111
(note, however, that in the mathematical literature, this type of interaction is often referred to as weak interaction [9]). From experiments of increased perturbation strengths, we found that its effect can be parameterized as g(,K) := T{uK)ITQ = (T(hK0)/T0 - \)K + 1,
(2)
where KQ is a normalization, chosen such that at K = 1, 75 percent of the maximal experimentally applicable perturbation strength is obtained. The perturbation response experiments are performed for excitatory, as well as for inhibitory, perturbations. The first experimental finding is that chaotic response may be attained from pair interaction, but only if the interaction is of inhibitory nature [10]. This is essentially a consequence of the greater efficacy of inhibitory synapses, a fact that is well-known in physiology. Note also that our biology-motivated normalization differs from the usual mathematical one, which attributes the value K = 1 to the critical value of the map, i.e., when the map / ceases to be invertible. The second finding is that, as is predicted by the theory of interacting limit cycles, locking into periodic states is abundant, and that the measure of quasiperiodic firing relations between the neurons quickly vanishes as a function of the perturbation strength K. A last finding is that when going from the static to the quasistatic case, lockings into a hierarchy of periodicities is observed, exactly of the type that is predicted by the associated Farey-tree. In fact, our results can be interpreted as the first experimental proof of the limit cycle nature of regularly firing cortical neurons. Consequently, as a first approximation, the activity within the centers of stronger interacting neurons is described by locking on Arnold tongues (see, e.g. [7]). Beyond the bi- or n-ary strong interaction, there also is a weaker exchange of activity, which can be modeled as diffusive interaction, among the more strongly coupled centers. In this way, we arrive at a coupled map lattice model, that we base on measured binary interaction profiles at physiological conditions (including all kinds of variability, e.g., interaction types, coupling strengths) i,j(tn+i) •= (1 - k2kij)fKn((f>ij{tn))
+ —kij^2(j)k,i(tn),
(3)
nn
where is the phase of the phase-return map, at the indexed site, and nn again denotes the cardinality of the set of all nearest-neighbors of site i, j . &2 describes the overall coupling among the site maps. This global coupling strength is locally modified by realizations kij, taken from some distribution, which may or may not have a first moment (in the first case, k2 can be normalized to be the global average over the local coupling strengths). In Eq. 3, the first term reflects the degree of self-determination of the phase at site
112
{i,j}, the second term reflects the influence by nearest-neighboring (i.e., the ones producing the strongest interactions) centers. Absence of self-organized synchronization Synchronized behavior, as we understand it, should be observable as the emergence of non-local structures within the firing behavior of the neurons in the network. In the case of initially independent behavior, we may expect that due to the coupling, a simpler macroscopic behavior will be attained, which could be taken as the expression of the corresponding perceptional state. Extended simulations, however, yield the result that, for biologically reasonable parameters, the response of the network is essentially unsynchronized, despite the coupling. Extrapolations from simpler models, for which exact results are available [11], provide us with the explanation why. Generically, from weakly coupled regular systems, regular behavior can be expected. If only two systems are coupled, generally a shorter period than the maximum of the involved periodicities emerges. If, however, more partners are involved, a competition sets in, and high periodicities most often are the result. Typically, synchronized chaotic behavior results from the coupling of chaotic and regular systems, where the chaotic contribution is strong enough. Otherwise, the response will be regular. When chaotic systems are coupled, however, synchronized chaotic behavior as well as macroscopically synchronized regular behavior, may be the result (e.g., [11]). For the last option, we need to focus on the evolution of cyclic eigenstates, as they show how this collective behavior might emerge. We performed simulations using 2-d networks, with diffusive coupling between 20 x 20 to 100 x 100 local maps of excitatory / inhibitory interaction. In agreement with the above expectations, we found no signs of macroscopic, self-organized synchronization, using physiologically motivated variability on the parameters (inhibitory / excitatory site maps with individual "excitability" K, locally varying diffusive coupling strength, etc.). To better understand the results of our simulations, we compared with an idealized model, that should be a better candidate for collective synchronization. This model is a diffusively coupled model where at all sites we have identical tent maps. It may be argued against this comparison that, whereas the maps derived from the experiments are non-hyperbolic, this model is hyperbolic, which is a non-generic situation. Through simulations, however, it is found, that the corresponding model with non-hyperbolic (e.g., parabola) site maps, shares the main properties of the tent-map model, i.e., the phenomenology is much stronger determined by the coupling than by the chosen site maps. The advantage of the model of coupled tent maps is, that it can be solved analytically. In our case the largest network
113
Lyapunov exponent [12] is of relevance, which can be calculated by using the thermodynamic formalism approach as follows. First, it must be realized that the coupled map lattice can be mapped onto a matrix representation of the form: (
(1 - k2)a
0
> 0
(1 - k2)a
'-a 0 0 %a
\ 0
M =
(4) 0
V
IfaQlfa
0
(l-k2)a\J
where a is the (absolute) slope of the local tent maps and k2 is the diffusive coupling strength. The thermodynamic formalism formally proceeds by raising the (matrix) entries to the (inverse) temperature /3, and then focusing, as the dominating effect, on the largest eigenvalue as a function of the inverse temperature. For large network sizes, the latter converges towards fi(p,k) = (\(l-k2)a\)P
+ (ak2)0.
(5)
This expression explicitly shows that the network behavior is determined by two sources: by the coupling (fc2) and by the site instability (a). Using this expression of the largest eigenvalue, we obtain the free energy of our model as F(/3) = log((| a(l — k2) Q'3 + (ak2)@). From the free energy, the largest network Lyapunov exponent is derived as a function of the diffusive coupling strength k2 and the slope of the local maps a, according to the formula X=
-^F(/3,k2)\0=1,
(6)
- ' 1 - k2 log(| a(l - k2) |) + ak2 log(ak2) a | 1 — k2 I +ak2
(7)
which yields the final result
K{a,k2) —
Fig. 2a shows a contour plot of A n (a, k2), for identically coupled identical tent maps, over a range of {a, A;2}-values. In Fig. 2b, a cut through this contour plot is shown, at parameters that correspond to those used in the numerical simulations of the biologically motivated, variable coupled map lattice, displayed in Fig. 2c. The qualitative equivalence of the two approaches is easily seen. Numerical simulations of coupled parabola show furthermore that the displayed characteristic behavior is preserved even in the presence of non-hyperbolicities. As a function of the slope a of the local tent map (which corresponds to the local excitability K) and of the coupling strength k2, contour lines indicate the instability of the network patterns. As can be seen, due to the coupling, even for locally chaotic maps (a > 1), stable network patterns may evolve (often
in the form of statistical cycling (see [11]). Upon further increasing the local instability, chaotic network behavior of turbulent characteristics emerges.
Figure 2. a) Network Lyapunov exponent λₙ, describing the stability of patterns of a network of coupled tent maps, as a function of the absolute site map slope a and coupling k₂. Contour lines of distance 0.25 are drawn; the heavy line, indicating the location of zero Lyapunov exponents (λₙ = 0), separates stable networks (to the left) from unstable networks (to the right). b) Cut through the contour plot of a), at a = 1.25. c) Maximal site Lyapunov exponent λ_max of a network of locked inhibitory site maps, as a function of the coupling k₂. For the network, the local excitability is K = 0.5 for all sites, and Ω is from the interval [0.8, 0.85]. The behavior of this network closely follows the behavior predicted by the tent-map model. For some simulations, the peak at k₂ = 1 is not so obvious; these cases are better modeled by a slightly modified behavior [4].
The stable patterns, however, do not correspond to emergent macroscopic behavior comparable to synchronization. Therefore, in order to estimate the potential for synchronization, we need to concentrate on the parameter region where macroscopic patterns evolve, that is, on the statistical cycling regime. However, the parameter space that corresponds to this behavior is very small, even for the tent-map model. When we compare the model situation with our simulations of biologically motivated variable networks, we again observe that the overall picture provided by the tent-map model of identical maps still applies. To show this in a qualitative manner, we compare the contour plot
of the tent-map model with the numerically calculated Lyapunov exponent of the biological network, which shows the identical qualitative behavior. Based on our insight into the tent-map model behavior, we conclude that, in the biologically motivated network, a notable degree of global synchronization would require a large subset of all binary connections to be in the chaotic interaction regime. This possibility, however, only exists for the inhibitory connections (excitatory connections are unable to reach this state [10]). Moreover, the part of the phase space on which the chaotic maps dwell is rather small (although of nonzero measure, see [11]). It is then reasonable to expect that, for the network including biological variability, statistical cycling is of vanishing measure and therefore cannot provide a means of synchronizing neuron firing on a macroscopic scale. To phrase it more formally: this implies that, by methods of self-organization, the network cannot achieve states of macroscopic synchronization. In addition, we investigated whether Hebbian [13] learning rules acting on the weak connections between centers of stronger coupling could be a remedy for this lack of coherent behavior. Even with this additional mechanism, the model does not show macroscopic synchronization. The observation that the tent and the biological response site maps yield qualitatively identical properties has some additional bearing. In simple systems of nearest-neighbor coupled map lattices, it is found that the order parameter corresponding to the average phase displays a phase transition at high enough coupling strength, as the system is essentially equivalent to an Ising model at a finite temperature. This is qualitatively similar to our model, where a first-order phase transition is observed at the coupling k₂ = 1, for all values of the local instability.
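For concreteness, the following is a minimal sketch of a diffusively coupled tent-map lattice of the kind analyzed above. It assumes the standard coupled-map-lattice update x_i(t+1) = (1−k₂) f(x_i(t)) + (k₂/2)[f(x_{i−1}(t)) + f(x_{i+1}(t))] in one dimension with periodic boundaries; the text's simulations use 2-d lattices with site-dependent parameters, so this is only a caricature.

```python
# 1-d diffusively coupled tent-map lattice with periodic boundaries.
import numpy as np

def tent(x, a):
    """Tent map of absolute slope a; maps [0,1] into itself for a <= 2."""
    return a * np.minimum(x, 1.0 - x)

def step(x, a, k2):
    fx = tent(x, a)
    # diffusive coupling to the two nearest neighbours
    return (1.0 - k2) * fx + 0.5 * k2 * (np.roll(fx, 1) + np.roll(fx, -1))

rng = np.random.default_rng(0)
x = rng.random(400)                  # random initial condition
for _ in range(5000):
    x = step(x, a=1.25, k2=0.3)      # parameters near the cut of Fig. 2b
# Inspecting x over time reveals pattern formation or turbulence,
# depending on where (a, k2) lies relative to the lambda_n = 0 line.
```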
Synchronization via thalamic input

Layer IV's task is believed to be centered more on amplification and coordination than on computation. Accurately modeling layer IV is more difficult since, because of the smaller size of its typical neurons, it is difficult to measure in vitro response profiles. Our model of the "amplifier" layer IV is based on biophysically detailed, variable model neurons that are connected in an all-to-all fashion. This ansatz is partially motivated by the facts that layer IV is more densely connected than the other layers and that natural, measured responses can easily be reproduced in this setting. If synchronization - understood as an emergent, not a feed-forward property - is needed for computational and cognitive tasks, the question remains how this property is generated. In our simulations of biophysically detailed models of layer IV cortical architecture [3,14], we discovered a strong tendency of this layer to respond to stimulation
with coarse-grained synchronization. This synchronization is based on intrinsically non-bursting neurons that develop the bursting property as a consequence of the recurrent network architecture and the feed-forward thalamic input. Detailed numerical simulations yield the result that, in the major part of the accessible parameter space, collective bursting emerges. That is, all individual neurons are collectivized, in the sense that, in spite of their individual characteristics, they all give rise to dynamics with very similar, on a coarse-grained scale synchronized, characteristics (see Fig. 3). In fact, using methods of noise cleaning (noise, in this sense, being the small variations due to the individual neuron characteristics), we find that the collective behavior can be represented by a four-dimensional model, having a strong positive (λ₁ ≈ 0.5), a small positive, a zero and a very strong negative Lyapunov exponent. More explicitly, it is found that the basic behavior of the involved neuron types is identical and hyperchaotic [15]. The validity of the latter characterization has been checked by comparing the Lyapunov dimensions (d_KY ≈ 3.5) with the correlation dimensions (d ≈ 3.5). Moreover, different statistical tests have been performed to assess that noise cleaning did not modify the statistical behavior of the system in an inappropriate way. As a function of the feed-forward input current, we observed an astonishing ability of the layer IV network to generate well-defined firing patterns.
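The Lyapunov (Kaplan-Yorke) dimension quoted above is obtained from the Lyapunov spectrum by the standard interpolation formula d_KY = j + (λ₁ + … + λ_j)/|λ_{j+1}|, where j is the largest index for which the partial sum is still non-negative. A minimal sketch follows; the example spectrum mimics the four-dimensional collective model described in the text (strongly positive, weakly positive, zero, strongly negative exponent) with illustrative values.

```python
def kaplan_yorke(lyap):
    """Kaplan-Yorke dimension from a Lyapunov spectrum sorted in
    decreasing order."""
    s, j = 0.0, 0
    while j < len(lyap) and s + lyap[j] >= 0.0:
        s += lyap[j]
        j += 1
    if j == 0:
        return 0.0
    if j == len(lyap):
        return float(j)
    return j + s / abs(lyap[j])

print(kaplan_yorke([0.5, 0.05, 0.0, -1.1]))   # 3.5, cf. d_KY ~ 3.5 above
```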
Figure 3. Coarse-grained synchronized activity of layer IV dynamics, where 80 excitatory and 20 inhibitory individual neurons are coupled in an all-to-all fashion. The cross-correlogram (time lag in ms) between two excitatory neurons indicates strong synchronization.
When we compared the responses of the layer IV model with data from the in vivo anesthetized cat (17 time series from 4 neurons of unspecified type from unspecified layers), we found corresponding behavior. Not only do the Lyapunov exponents (generally hyperchaotic, λ_max ≈ 0.8, the second exponent slightly above zero) correspond well to the simulated ones from the model (λ_max ≈ 0.5, second exponent also slightly above zero); the measured dimensions were also in the range predicted by the model of layer IV, and specific characteristic patterns found in vivo could be reproduced by our simulation model with ease. Of particular interest are the step-wise structures found in the log-log plots used for the evaluation of the dimensions [16] (see Fig. 1). However, as the majority of the measured in vivo neurons could not be attributed to layer IV, the natural hypothesis is that the other layers inherit these characteristics from the latter.
4 Complexity of network response and in vivo data
To investigate in more detail the relation between layer IV and the remaining layers, we calculated, for the coupled map lattice model, a recently proposed, biologically relevant complexity measure, C_S(1,0) [17]. To evaluate this quantity, we first calculated from the largest eigenvalue the free energy of the network, and from this quantity, C_S(1,0). We find that the highest complexities are situated in the area beyond the line separating negative and positive Lyapunov exponents (see Fig. 4, and compare with Fig. 2). Moreover, the area of highest complexity roughly coincides with the area where the model Lyapunov exponents agree with the in vivo measured exponents. This suggests that the natural system has the tendency of being tuned to a state of high complexity. These coincidences of modeling and experimental aspects lead us to believe that the ability of the network to fire in well-separated characteristic time scales or in whole patterns is not accidental, but serves to evoke corresponding responses by means of resonant cortical circuits. However, as has been mentioned above, not every neuron shares this property. In our recent studies of in vivo anesthetized cat data, we found, in evoked or spontaneous firing experiments, essentially three different classes of behavior. The neurons of the first class show no patterns in their firing at all. The neurons of the second class are able to pick up stimulation patterns and convert them into well-defined firing patterns. Neurons of the third class respond with smeared patterns that seem not to be compatible with the stimulation paradigms (for the first two classes see Fig. 1). With regard to the interspike distributions, a long-tail behavior of the interspike distribution is characteristic for the first class. For the second class, a clean separation of the distribution into two regimes, dominated by individual interspike intervals and by compound patterns, respectively, is found.
Figure 4. Region in the parameter space (site map slope a and coupling k₂) where the biologically relevant complexity measure C_S(1,0) is maximal. The contour lines increase in steps of 0.1, starting from the line at the right upper corner with C_S(1,0) = 0.1. From the comparison of Lyapunov exponents, it is suggestive that the measured biological neurons are tuned to working points of maximal complexity.
The characteristics of the last class indicate a mixture of the properties of the two other classes. In all cases, the behavior at long interspike interval times is governed, in the log-log plot, by a linear part, i.e., it is long-tailed.
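The log-log diagnostic used for this classification can be made concrete: a long-tail ISI distribution shows up as an approximately linear segment at large intervals. A minimal sketch, where the spike times, binning and tail cut are illustrative assumptions:

```python
# Slope of the ISI histogram tail in log-log coordinates; an
# approximately linear tail indicates long-tail (power-law-like) behavior.
import numpy as np

def isi_tail_slope(spike_times, tail_quantile=0.5, bins=40):
    isi = np.diff(np.sort(spike_times))          # interspike intervals
    hist, edges = np.histogram(isi, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = (hist > 0) & (centers >= np.quantile(isi, tail_quantile))
    slope, _ = np.polyfit(np.log(centers[keep]), np.log(hist[keep]), 1)
    return slope
```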
5 Relevance of cortical chaos
Chaotic firing emerges from the proposed model, as well as from the in vivo data that we compare it with, with nearly identical Lyapunov exponents and fractal dimensions. The agreement between the Kaplan-Yorke and correlation dimensions [12] corroborates the consistency of the results obtained. The question then arises: with what functional, possibly computational, relevance could this phenomenon be associated? Cortical chaos essentially reflects the ability of the system to express its internal states (e.g., the result of a computation) by choosing among different interspike intervals (ISI) or, more generally, among distinct patterns of firing. This mechanism can be viewed in a broader context. Chaotic dynamics is generated through the interplay of distinct unstable periodic orbits: the system follows a particular orbit until, due to the instability of the orbit, the orbit is lost and the system follows another orbit, and so on. It is then natural to exploit this wealth of structures hidden within chaos, especially for technical applications. The task that needs to be solved to
do so is the so-called targeting, and chaos control, problem: the chaotic dynamics first needs to be directed onto a desired orbit, on which it then needs to be stabilized, until another choice of orbit is submitted. From an information-theoretic point of view, information content can be associated with the different periodic orbits. This view is related to beliefs that information is essentially contained in the patterns of neuronal firing. If well-resolved interspike intervals can be extracted from the spike trains of a neuron, the interspike lengths can directly be mapped onto symbols. A suitable transition matrix then specifies the allowed, and the forbidden, successions of interspike intervals; i.e., this transition matrix provides an approximation to the grammar of the natural system. In the case of collective bursting, it may be more useful to associate information content with firing patterns consisting of characteristic intermittent successions of spikes. In a broader context, the two approaches can be interpreted as realizations of a statistical mechanics description by means of different types of ensembles [18-19]. In the case of artificial systems or technical applications, strategies on how to use chaos to transmit messages, and more generally information, are well developed. One basic principle used is that small perturbations applied to a chaotic trajectory are sufficient to make the system follow a desired symbol sequence containing the message [20]. This control strategy is based upon the property of chaotic systems known as "sensitive dependence on initial conditions". Another approach, which is currently the focus of applications in areas of telecommunication, is the addition of hard limiters to the system's evolution [21-22]. This very robust control mechanism can, due to its simplicity, even be applied to systems running at gigahertz frequencies. It has been shown [23] that optimal hard limiter control leads to convergence onto periodic orbits in less than exponential time. In spite of these insights into the nature of chaos control, it is unclear, however, which kind of control measures should be associated with cortical chaos. In the collective bursting case of layer IV, one possible biophysical mechanism would be a small excitatory post-synaptic current. When the membrane of an excitatory neuron is perturbed at the end of a collective burst with an excitatory pulse, the cell may fire additional spikes. Alternatively, at this stage inhibitory input may prevent the appearance of spikes and terminate bursts abruptly. In a similar way, the firing of inhibitory neurons can also be controlled. Another possibility is the use of local recurrent loops to establish delay-feedback control [24]. In fact, such control loops could be one explanation for the abundantly occurring recurrent connections among neurons. The relevant parameters in this approach are the time delay of the re-fed signal and the synaptic efficacy, where especially the latter seems biologically realistic.
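As an illustration of the hard-limiter idea of refs. [21-22], the following sketch clips a chaotic map from above; the logistic map and the limiter level h are illustrative choices, not the systems studied in the text.

```python
# Hard-limiter control: clipping the state at height h stabilizes a
# periodic orbit of an otherwise chaotic map.
def limited_logistic(x0, h, r=4.0, n=60):
    x, orbit = x0, []
    for _ in range(n):
        x = min(r * x * (1.0 - x), h)   # the hard limiter acts here
        orbit.append(round(x, 6))
    return orbit

print(limited_logistic(0.3, h=0.85)[-6:])
# -> the trajectory collapses onto the period-2 orbit {0.85, 0.51}
```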
In addition to the encoding of information, one also needs read-out mechanisms able to decode the signal at the receiver's side. Thinking in terms of the encoding strategies outlined above, this would amount to the implementation of spike-pattern detection mechanisms. Besides simple straightforward implementations based on decay times, more sophisticated approaches, such as the recently discovered activity-dependent synapses [25-27], seem natural candidates for this task. Also the interactions of synapses with varying degrees of short-term depression and facilitation could provide the selectivity for certain spike patterns. Small populations of neurons which, due to variable axonal and synaptic potential propagation delays, achieve supra-threshold summation only for particular input spike sequences are yet another possible mechanism.
6 Conclusion
We find that the origin of synchronized firing in the cortex - if its existence can be proven experimentally - would most likely be in layer IV, and be heavily based on recurrent connections and simultaneous thalamic feed-forward input. We expect that firing in patterns, in this layer, is able to trigger specific resonant circuits in other layers, where the actual computation is then done (which we propose to be based on the symbol set of an infinity of locked states [4]). It is a fact that the majority of ISI measurements from in vivo cat visual cortex neurons, and simple statistical models of neuron interaction, show emergent long-tail behavior; but this might also be the result of the interaction between different areas of the brain, or a consequence of the input structure, or a mixture of all of them. Long-tail interspike interval distributions are in full contrast to the common assumption of Poissonian behavior, which originates from the assumption of random spike coding. We propose that, in the measurements that are compatible with the Poissonian spike train assumption, layer IV explicitly shuts down the long-range interactions via inhibitory connections or by pumping energy into new temporal scales that no longer sustain the ongoing activity. Long-tailed ISI distributions could, however, also be of relevance from the point of view of the tendency of the system to tune itself to a state of maximal complexity C_S(1,0). At such a state, due to the structure of the measure, long-range and slowly decaying correlations can be expected to dominate the dynamical behavior. Recently, it has been reported that noise can synchronize even coupled map lattices with variable lattice maps [28]. It will be worthwhile to investigate whether this property also holds for our site maps (which we are willing to believe), and whether this can finally be attributed to a kind of mean-field activity generated by the noise. To what extent the assumptions made are the
relevant ones will have to be explored in future work, from the biological as well as the mathematical side.
References
1. Chaos and Noise in Biology and Medicine, eds. M. Barbi and S. Chillemi (World Scientific, Singapore, 1998).
2. J.K. Douglass, L. Wilkens, E. Pantazelou, and F. Moss, Nature 365, 337-340 (1993).
3. R. Stoop, D. Blank, A. Kern, J.-J. v.d. Vyver, M. Christen, S. Lecchini, and C. Wagner, Cog. Brain Res., in press (2001).
4. R. Stoop, L.A. Bunimovich, and W.-H. Steeb, Biol. Cybern. 83, 481-489 (2000).
5. C. von der Malsburg, in Models of Neural Networks II, eds. E. Domany, J. van Hemmen, and K. Schulten, 95-119 (Springer, Berlin, 1994).
6. W. Singer, in Large-Scale Neuronal Theories of the Brain, eds. C. Koch and J. Davis, 201-237 (Bradford Books, Cambridge MA, 1994).
7. R. Stoop, K. Schindler, and L.A. Bunimovich, Acta Biotheoretica 48, 149-171 (2000).
8. R. Stoop, in Nonlinear Dynamics of Electronic Systems, eds. G. Setti, R. Rovatti, G. Mazzini, 278-282 (World Scientific, Singapore, 2000).
9. F.C. Hoppensteadt and E.M. Izhikevich, Weakly Connected Neural Networks (Springer, New York, 1997).
10. R. Stoop, K. Schindler, and L.A. Bunimovich, Nonlinearity 13, 1515-1529 (2000).
11. J. Losson and M. Mackey, Phys. Rev. E 50, 843-856 (1994).
12. R. Stoop and P.F. Meier, J. Opt. Soc. Am. B 5, 1037-1045 (1988); J. Peinke, J. Parisi, O.E. Roessler, and R. Stoop, Encounter with Chaos (Springer, Berlin, 1992).
13. D. Hebb, The Organization of Behavior (Wiley and Sons, New York, 1949).
14. D. Blank, PhD thesis (Swiss Federal Institute of Technology ETHZ, 2001).
15. O.E. Roessler, Phys. Lett. A 71, 155-159 (1979).
16. A. Celletti and A. Villa, Biol. Cybern. 74, 387-393 (1996).
17. R. Stoop and N. Stoop, submitted (2001).
18. R. Stoop, J. Parisi, and H. Brauchli, Z. Naturforsch. a 46, 642-646 (1991).
19. C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An Introduction (Cambridge University Press, Cambridge, 1993).
20. S. Hayes, C. Grebogi, E. Ott, and A. Mark, Phys. Rev. Lett. 73, 1781-1784 (1994).
21. N. Corron, S. Pethel, and B. Hopper, Phys. Rev. Lett. 84, 3835-3838 (2000).
22. C. Wagner and R. Stoop, Phys. Rev. E 63, 017201 (2000).
23. C. Wagner and R. Stoop, J. Stat. Phys. 106, 97-107 (2002).
24. K. Pyragas, Phys. Lett. A 170, 421-428 (1992).
25. L. Abbott, J. Varela, K. Sen, and S.B. Nelson, Science 275, 220-224 (1997).
26. M.V. Tsodyks and H. Markram, Proc. Natl. Acad. Sci. USA 94, 719-723 (1997).
27. A.M. Thomson, J. Physiol. 502, 131-147 (1997).
28. C. Zhou, J. Kurths, and B. Hu, Phys. Rev. Lett. 87, 098101 (2001).
SELECTIVITY PROPERTY OF A CLASS OF ENERGY BASED LEARNING RULES IN PRESENCE OF NOISY SIGNALS

A. BAZZANI, D. REMONDINI
Dep. of Physics and Centro Interdipartimentale Galvani, Univ. of Bologna, v. Irnerio 46, 40126 Bologna, and INFN sezione di Bologna, ITALY
E-mail: [email protected]

N. INTRATOR
Inst. for Brain and Neural Systems, Brown Univ., Providence, RI 02912, USA

G. CASTELLANI
DIMORFIPA and Centro Interdipartimentale Galvani, Univ. of Bologna, v. Tolara di Sopra 50, 40064 Ozzano dell'Emilia, ITALY
We consider the selectivity property of a class of energy based learning rules with respect to the presence of clusters in the input distribution. These rules are a generalization of the BCM learning rule and use the moments of the input distribution of order ≥ 2. The analytical results show that selective solutions are possible for noisy input data up to a certain signal to noise ratio, and that the introduction of a bias in the input signal could improve the selectivity. We illustrate this effect with some numerical simulations in a simple case.
1 Introduction to the BCM neuron model
The BCM neuron¹ has been introduced to analyze the plasticity properties of a biological neuron. In particular, the model takes into account the LTP (Long Term Potentiation)⁷ and LTD (Long Term Depression)⁴ phenomena observed in the visual neuron response under modification of the experience. In its simplest formulation, the BCM model assumes that the neuron response c to an external stimulus d ∈ Rⁿ is linear: c = m · d, where m are the synaptic weights. The change of the weights m is described by the equation

\dot m = \phi(c, \theta)\, d    (1)

where θ is an internal threshold of the neuron and the typical shape of the function φ is given in figure 1. At each time, the external signal d can be considered the realization of a random variable with a given distribution. If the threshold θ is fixed, the equilibrium position c = θ is unstable and only the LTD behavior is described by eq. (1). To overcome this difficulty, one introduces the hypothesis that the threshold θ depends on the external environment^a, according to

\theta = \langle c^2 \rangle^{q-1}    (2)

where ⟨ ⟩ denotes the average with respect to the input distribution (the exponent q − 1, with q defined in eq. (6) below, reduces to 1 in the standard BCM case).

^a The external environment is assumed to be stationary.
Figure 1. Typical shape of the function φ(c, θ) that defines the BCM neuron.
Using the definition (2), we get a stochastic differential equation whose integration scheme should be specified. Let Δt be the integration step; we have

m(t + \Delta t) = m(t) + \phi(m(t) \cdot d, \theta)\, d\, \Delta t    (3)

If the realizations of d are independent, in the limit Δt → 0 one can prove that m(t) satisfies the averaged equation⁶

\dot m = \langle \phi(m(t) \cdot d, \theta)\, d \rangle    (4)

which substitutes the initial equation (1). A simple class of possible functions φ is given by

\phi(c, \theta) = c\,(c^{p-2} - \theta)    (5)

and the average equation (4) reads ṁ = −∂E/∂m, where we have introduced the energy function

E = -\frac{\langle c^p \rangle}{p} + \frac{\langle c^2 \rangle^{q}}{2q}    (6)
The case p = 3 and q = 2 is commonly referred to as the BCM case. Due to the presence of high-order moments (p ≥ 3), the energy function (6) provides a complex non-supervised analysis of the data, performing an exploratory projection pursuit that seeks multi-modality in the data distribution⁵. The existence of stable equilibria is related to the existence of local minima of the energy (6). A simple calculation shows that the condition p < 2q is necessary for the existence of stable non-trivial fixed points of equation (4), which exhibit LTP behavior. We are interested in stable solutions that select the different clusters eventually present in the data distribution. An ideal selective solution should give a high output c when the input d belongs to a single fixed cluster, and c = 0 if d belongs to the other clusters. A general approach to the study of the stability properties of the equilibrium points gives the following Proposition²: the stable equilibrium points m* of eq. (4) are related to the local maxima y* of the homogeneous function

f(y) = \frac{T y^p}{p}, \qquad y \in R^n    (7)

(here T y^p denotes the complete contraction of T with p copies of y) constrained on the unit sphere y · C y = 1, where T is the symmetric p-tensor associated to the p-th moments of the input distribution,

T_{i_1, \ldots, i_p} = \langle d_{i_1} \cdots d_{i_p} \rangle, \qquad i_1, \ldots, i_p = 1, \ldots, n    (8)

and C is the metric defined by the second moment matrix C_{ij} = ⟨d_i d_j⟩; the correspondence is explicitly given by m* = p f(y*) y*.
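As a complement to the analytical treatment, the following is a minimal numerical sketch of the stochastic update (3) for the BCM case p = 3, q = 2, assuming φ(c, θ) = c(c − θ) and a running-average estimate of the threshold θ = ⟨c²⟩; the two-cluster input and all rates are illustrative.

```python
# Stochastic integration of the BCM rule, eq. (3), with p = 3, q = 2.
import numpy as np

rng = np.random.default_rng(1)
v = np.array([[1.0, 0.0], [0.0, 1.0]])   # two input vectors (clusters)
m = rng.normal(scale=0.1, size=2)        # initial synaptic weights
theta, dt, mem = 0.0, 0.01, 0.99         # threshold, step, averaging memory

for _ in range(20000):
    d = v[rng.integers(2)]               # each cluster with probability 1/2
    c = m @ d                            # linear response c = m . d
    theta = mem * theta + (1 - mem) * c**2   # running estimate of <c^2>
    m += dt * c * (c - theta) * d        # eq. (3) with phi = c(c - theta)

print(m @ v[0], m @ v[1])   # a selective solution responds to one cluster only
```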
2 Selective equilibria for the BCM neuron
According to a previous result³, the BCM neuron has a selectivity property with respect to n linearly independent vectors v_j ∈ Rⁿ. Let the input signal d be a discrete random variable that takes values among the vectors v_j with equal probability 1/n, and let o_j = y · v_j; a standard calculation shows that the function (7) can be written in the form

f(o) = \frac{1}{p\,n} \sum_{j=1}^{n} o_j^{\,p}    (9)
constrained on the unit sphere Σ_j o_j² = n. Let us suppose o₁ > 0^b; then, by differentiating eq. (9), we get the system

\frac{\partial f}{\partial o_j} - \frac{o_j}{o_1} \frac{\partial f}{\partial o_1} = 0, \qquad j = 2, \ldots, n    (10)

Then the critical points are computed from the equations

o_j \left( o_j^{\,p-2} - o_1^{\,p-2} \right) = 0, \qquad j = 2, \ldots, n    (11)
It is easy to check the existence of a local maximum o_j = 0, j = 2,…,n, o₁ = √n, so that the BCM neuron is able to select only the first vector v₁ among all the possible input vectors. According to the relation o_j = y · v_j, the vectors y defined by the equilibrium solutions are directed as the dual basis of the input vectors v_j. We call this property the selectivity property of the BCM neuron³. We observe that the values of the selective solutions o are independent of the lengths and the orientations of the input vectors v; this is not true for the corresponding outputs c = m · v of the neuron, whose values depend on the choice of the signals v. However, the numerical simulations show that the basin of attraction of the stable selective solutions has a strong dependence on the norms of the vectors v; this effect makes it very difficult to distinguish signals of different magnitude, and we assume that a procedure could be applied in order to normalize the input signals. The situation becomes more complicated when we consider the effect of an additive noise on the selectivity property of a BCM neuron. Let us define the random variable

d = \sum_j v_j \xi_j + \eta    (12)
where ξ is a random vector that selects with equal probability one of the vectors v_j, and η is a Gaussian random vector with ⟨η_j⟩ = 0 and ⟨η_i η_j⟩ = δ_ij σ²/n, i, j = 1,…,n^c. For the sake of simplicity we consider the case p = 3, q = 2, but the calculations can be generalized in a straightforward way. It is convenient to introduce the matrix V whose rows are the vectors v, and the positive definite symmetric matrix S = V Vᵀ, where Vᵀ is the transposed matrix. The cubic function (7) can be written in the form
f(o) = \frac{1}{3n} \sum_j o_j^{\,3} + \frac{\sigma^2}{n^2} \left( o \cdot S^{-1} o \right) \sum_k o_k    (13)

^b Due to the parity property of the function (7), we can restrict our analysis to the subspace o_j > 0, j = 1,…,n.
^c This choice allows us to keep the signal to noise ratio constant while varying the dimensionality n.
and it is constrained on the ellipsoid

\sum_{j=1}^{n} o_j^{\,2} + \sigma^2 \left( o \cdot S^{-1} o \right) = n    (14)
After some algebraic manipulations we get the equation

o_i + \sigma^2 (S^{-1} o)_i = \frac{\sigma^2}{2 o_1^{\,2}} \left[ o_i^{\,2} + 2\sigma^2 (S^{-1} o)_i \sum_j o_j + n - \sum_j o_j^{\,2} \right]    (15)

According to our ansatz, the r.h.s. of eq. (15) is of order O(σ²)/o₁, so that we can estimate

o_i \simeq \sigma^2 \left( \frac{O(1)}{o_1} + a\, O(1)\, o_i \right), \qquad i = 2, \ldots, n    (16)
where we have defined a = max_{l=2,…,n} |S⁻¹_{1l}|, to take into account the leading term of (S⁻¹o)_i, and O(σ²) to denote a term of order σ². Eq. (16) shows that we can have a selective solution only if o₁ satisfies

\frac{o_1^{\,4}}{n} - (n-1) \left[ \frac{\sigma^4}{o_1^{\,2}} + a\, O(1)\, o_1^{\,2} \right] + O(n) = 0    (17)
where we have estimated σ²(o · S⁻¹o) ≃ O(n). If a ∼ 1/n and σ⁴ < O(n), equation (17) has a solution o₁ = O(√n), and eq. (16) provides o_i = O(σ²/√n), in agreement with our initial ansatz. Moreover, there is a critical value σ_c at which a bifurcation occurs in the solution space and the selective solution becomes complex². The basin of attraction of each stable solution cannot be analytically estimated by the previous methods, but one has some indications from the numerical results. A big difference among the measures of the various basins could destroy the selectivity property, since some selective solutions would have a very low probability of attracting the initial condition of the synaptic weights. As we have observed, there is a strong dependence of the basin measure on the norm of the input vectors; to avoid this effect, we assume that ‖d‖ ≃ 1, applying a normalization procedure when necessary. Therefore we consider essentially the selectivity property of a BCM neuron with respect to
the different directions of the unperturbed signals v_j. The role of the noise amplitude σ is to reduce the "attraction force" of the stable solutions and eventually to change their stability through a bifurcation phenomenon when it exceeds a critical value.
3 Effect of a bias in BCM neuron
The quantity a (see eq. (17)) plays a crucial role in the selectivity property of the BCM neuron. It has an easy geometrical interpretation in the space of the input signals d (cf. eq. (12)). According to the definition of the matrix S, a = max_{l=2,…,n} |S⁻¹_{1l}| is directly related to the projection of the vector v₁ on the hyper-plane defined by the vectors v₂,…,vₙ. Indeed, a is equal to 0 when v₁ is orthogonal to the other vectors v_l, l = 2,…,n. The equations (15) and (17) indicate that, for a fixed noise level σ, the selectivity of the BCM neuron for the vector v₁ is maximal when v₁ is orthogonal to the other unperturbed vectors v₂,…,vₙ. This remark suggests that, for a BCM neuron whose input distribution has the form (12), the selectivity property for a given unperturbed vector v₁ is optimized if one introduces a strategy to satisfy the condition v₁ · v_l = 0 for l = 2,…,n. A common procedure to optimize the performance of a neural network is to introduce a bias b in the input signals. This is equivalent to translating the input distribution (cf. eq. (12))
d = \sum_j v_j \xi_j - b + \eta = \sum_j (v_j - b)\, \xi_j + \eta    (18)
where b ∈ Rⁿ and we have used the property Σ_j ξ_j = 1. Then we choose b in order to satisfy the conditions

(v_1 - b) \cdot (v_l - b) = 0, \qquad l = 2, \ldots, n    (19)
A simple solution to (19) is given by a linear combination of the vectors v₂,…,vₙ: b = Σ_{k=2}^n β_k v_k, where the coefficients β_k satisfy the equation

\left( v_1 - \sum_{k=2}^{n} \beta_k v_k \right) \cdot v_l = 0, \qquad l = 2, \ldots, n    (20)
The solution exists since v₂,…,vₙ are linearly independent; it is straightforward to verify that v₁ − b is also orthogonal to the vector b. If one introduces the bias b, the matrix S is diagonalized into two blocks (the first block being the element S₁₁) and a = 0; therefore we expect an increase of the selectivity for the input signal v₁. We observe that different biases are necessary to increase
the selectivity with respect to the different input signals v_j, and an exhaustive analysis of the input space would require a neural network of n neurons with inhibitory connections. The introduction of the bias b changes the norms of the input vectors, so that it is necessary to apply a normalization procedure that could decrease the signal to noise ratio; this may destroy the advantages of the bias. At the moment an efficient procedure which automatically computes the bias is not available; this problem is at present under consideration.
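Although an automatic procedure is not yet available, for known unperturbed vectors the bias of eq. (20) reduces to a small linear system. A minimal sketch, with illustrative vectors:

```python
# Bias b = sum_k beta_k v_k solving (v_1 - b) . v_l = 0, l = 2,...,n (eq. 20).
import numpy as np

def bias(v):
    """v: (n, n) array whose rows are the linearly independent v_1,...,v_n."""
    v1, W = v[0], v[1:]                  # W has rows v_2,...,v_n
    G = W @ W.T                          # Gram matrix of v_2,...,v_n
    beta = np.linalg.solve(G, W @ v1)    # eq. (20) as a linear system
    return beta @ W

v = np.array([[1.0, 0.2, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.4, 1.0]])
b = bias(v)
print((v[0] - b) @ v[1], (v[0] - b) @ v[2])   # both ~ 0, as required
```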
4 Numerical simulations
In order to show the selectivity property of a BCM neuron and the effect of a bias, we have considered the planar case; the input distribution has been defined by eq. (12), where the vectors v₁, v₂ lie on the unit circle at different angles α. We have normalized each input vector d on the unit circle, so that the effect of noise enters only in the phase; the noise level σ has been varied in the interval [0,1]. We study two cases for the energy function (6): p = 3 and q = 2, which corresponds to the BCM neuron, and p = 4 and q = 3, which simulates a kurtosis-like learning rule for the neuron. The initial synaptic weights are chosen in a neighborhood of the origin near the vertical axis. To quantify the separability we introduce the quantity

\Delta = \frac{|m \cdot v_2| - |m \cdot v_1|}{\sqrt{2}\, \sigma\, \|m\|}    (21)
that measures the distance between the projections of the signals v₁ and v₂ along the direction of the stable solution m*. When Δ > 1 we can detect the presence of two clusters in the input distribution with high probability. In fig. 2 we report Δ as a function of the noise level σ in the cases of a separation angle α = 90° and α = 10°; we have used a statistics of 10⁶ input vectors. We observe that, in the case of the BCM neuron and α = 90°, the selectivity decreases suddenly at a certain value of the noise level. This effect is due to the presence of a critical noise level σ_c (see eq. (17)) at which the selective solutions bifurcate in the complex plane. In the case of the kurtosis-like neuron, the presence of a critical value σ_c is not detected in figure 2; this is a peculiar property of the kurtosis energy and of the relation between the second and the fourth moments of a Gaussian distribution. This is illustrated in fig. 3, where we have plotted the neuron outputs o = y · v₁ (black curve) and o = y · v₂ (red curve) in the case α = 90°: the presence of a bifurcation for the BCM neuron (left plot) is clear. However, in the case α = 10°, the selectivity of the kurtosis-like neuron is lost very soon (fig. 2, right); this is the effect of the appearance of a stable non-selective solution that attracts the neuron.
Figure 2. Separability property for a BCM (black circles) and kurtosis-like (red squares) neuron; the left plot refers to a separation angle α = 90° between the input signals, whereas the right plot refers to a separation angle α = 10°; we have used a statistics of 10⁶ input vectors, and the threshold Δ = 1 is also plotted.
Figure 3. Normalized neuron outputs o = y · v_j, j = 1, 2, for the selective solution in the BCM case (left plot) and in the kurtosis-like case (right plot); the separation angle between the input vectors is α = 90°.
We have checked the effect of a bias b that selects the second input v₂ in the case of a separation angle α = 10°. The results are plotted in figure 4, which shows that the introduction of a bias increases the selectivity both for the BCM and the kurtosis-like neuron; both neurons lose their selectivity at a noise level σ ≃ 0.6, but the BCM neuron performs a better separation of the input clusters.
Figure 4. Comparison of the separability property without (circles) and with (squares) a bias, for a separation angle α = 10° between the input vectors: the left plot refers to the BCM neuron, whereas the right plot refers to the kurtosis-like neuron.
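To make the criterion of eq. (21) concrete, the following is a minimal sketch of the separability measure for a trained weight vector m; the inputs and the noise level are illustrative.

```python
# Separability Delta of eq. (21): Delta > 1 signals detectable clusters.
import numpy as np

def separability(m, v1, v2, sigma):
    num = abs(m @ v2) - abs(m @ v1)
    return num / (np.sqrt(2.0) * sigma * np.linalg.norm(m))
```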
5 Conclusions
The analytical study of the selectivity property of neurons whose learning rules depend on the moments of the input distribution of order ≥ 3 suggests that a better performance could be achieved by using a bias in the input data. A numerical simulation on a simple example shows that this prediction is correct also for noisy input data. However, further studies are necessary to understand the effect of a low statistics in the input data, since the bias could decrease the signal to noise ratio. Moreover, an algorithmic procedure to compute the bias is not yet available.

References
1. E.L. Bienenstock, L.N. Cooper and P.W. Munro, Journal of Neuroscience 2, 32 (1982).
2. A. Bazzani, D. Remondini, N. Intrator and G. Castellani, submitted to Neural Computation (2001).
3. G. Castellani, N. Intrator, H. Shouval and L.N. Cooper, Network: Comput. Neural Syst. 10, 111 (1999).
4. S.M. Dudek and M.F. Bear, Proc. Natl. Acad. Sci. 89, 4363 (1992).
5. J.H. Friedman, J. Am. Stat. Ass. 82, 249 (1987).
6. N. Intrator and L.N. Cooper, Neural Networks 5, 3 (1992).
7. A. Kirkwood, H.-K. Lee and M.F. Bear, Nature 375, 328 (1995).
Pathophysiology of Schizophrenia: fMRI and Working Memory
GIUSEPPE BLASI AND ALESSANDRO BERTOLINO
Università degli Studi di Bari, Dipartimento di Scienze Neurologiche e Psichiatriche, P.zza Giulio Cesare, 11 - 70124 - Bari, Italy
E-mail: [email protected]
Functional Magnetic Resonance Imaging (fMRI) is an imaging technique with high spatial and temporal resolution that provides in vivo information about the functionality of discrete neuronal groups during their activity, utilizing the magnetic properties of oxy- and deoxy-hemoglobin. fMRI permits the study of the normal and pathological brain during performance of various neuropsychological functions. Several research groups have investigated prefrontal cognitive abilities (including working memory) in schizophrenia using functional imaging. Despite some contradictions, a large part of these studies has reported a relative decrease of prefrontal cortex activity during working memory, defined as hypofrontality. However, hypofrontality is still one of the most debated aspects of the pathophysiology of schizophrenia, because the results can be influenced by pharmacotherapy, performance and chronicity. The first fMRI studies in patients with schizophrenia seemed to confirm hypofrontality. However, more recent studies during a range of working memory loads showed that patients are hypofrontal at some segments of this range, while they are hyperfrontal at others. These studies seem to suggest that the alterations of prefrontal functionality are not only due to reduction of neuronal activity, but are probably the result of complex interactions among various neuronal systems.
Functional Magnetic Resonance Imaging (fMRI)
Like its functional brain imaging forebears, single photon emission tomography (SPECT) and positron emission tomography (PET), fMRI seeks to satisfy a long-term desire in psychiatry and psychology to define the neurophysiological (or functional) underpinnings of the so-called 'functional' illnesses. For much of the last century, attempts to define the 'lesions' causing these illnesses, such as schizophrenia, major depression and bipolar disorder, have been elusive, leading to their heuristic differentiation from 'organic' illnesses, like stroke and epilepsy, with more readily identifiable pathogenesis. fMRI offers several advantages in comparison to functional nuclear medicine techniques, including low invasiveness, no radioactivity, widespread availability
and virtually unlimited study repetitions [49]. These characteristics, plus the relative ease of creating individual brain maps, offer the unique potential to address a number of long-standing issues in psychiatry and psychology, including the distinction between state and trait characteristics, confounding effects of medication and reliability [80]. Finally, the implementation of 'real-time' fMRI will allow investigators to tailor examinations individually while a subject is still in the scanner, promising true interactive studies or 'physiological interviews' [26]. The physical basis of fMRI is the blood oxygenation level dependent (BOLD) effect, which is due to the oxygenation-dependent magnetic susceptibility of hemoglobin. Deoxyhemoglobin is paramagnetic, causing slightly attenuated signal intensity in MRI image voxels containing deoxygenated blood. During neuronal firing, localized increases in blood flow and oxygenation, and consequently reduced deoxyhemoglobin, cause the MRI signal to increase. It is therefore assumed that these localized increases in BOLD contrast reflect increases in neuronal activity. The BOLD mechanism has been further clarified by more recent experiments. By using imaging spectroscopy, which allows selective measurement of both deoxyhemoglobin and oxyhemoglobin, Malonek and Grinvald [52] demonstrated that hemoglobin-oxygenation changes in response to neuronal activation are biphasic: an early (<3 s), localized increase in deoxyhemoglobin (often referred to as the 'initial dip') is followed by a delayed decrease in deoxyhemoglobin and a concomitant increase in oxyhemoglobin. Malonek et al. showed that the initial increase in deoxyhemoglobin is caused by an increase in the cerebral metabolic rate of oxygenation without a matching cerebral blood flow response. The later increase in cerebral blood flow causes the subsequent decrease in deoxyhemoglobin and the concomitant increase in oxyhemoglobin [51].
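In practice, fMRI analyses exploit the BOLD effect by modeling the measured signal as the stimulation time course convolved with a hemodynamic response function. A minimal sketch follows, assuming the conventional double-gamma response shape used by common analysis packages; the parameter values are illustrative, not taken from the text.

```python
# Predicted BOLD time course: stimulus block convolved with a
# double-gamma hemodynamic response function (peak ~5 s, undershoot ~15 s).
import numpy as np
from scipy.stats import gamma

t = np.arange(0, 30, 0.5)                          # seconds
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)    # canonical-like shape
hrf /= hrf.sum()

stimulus = np.zeros(200)                           # 100 s at 0.5 s steps
stimulus[20:40] = 1.0                              # one 10 s activation block
bold = np.convolve(stimulus, hrf)[:len(stimulus)]  # predicted BOLD signal
```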
Working memory
Working memory is a construct that describes the ability to transiently store and manipulate information online, to be used for cognition or for behavioral guidance [2,40]. A key aspect of working memory is its capacity limitation, usually reflected in cognitive testing as decreasing performance in response to increasing working memory load [31,45,56,65]. Numerous functional neuroimaging studies have used the spatial location and temporal characteristics of the 'activation' response during working memory to localize this cognitive phenomenon to regionally distinct components within a larger distributed network [8,16,17,18,19,44,54]. For example, activation in dorsolateral prefrontal cortex (DLPFC) appears to be related to the active maintenance of information over a delay [17,19] and/or the manipulation of this information [68]. In contrast, activation in areas like the anterior cingulate is more the result of increased effort or task complexity [3,12,58]. Parametric working memory tasks, most notably the popular 'n-back' task [32],
are ideally suited to examine issues of dynamic range, since working memory load can be increased during the same experiment. The 'no back' control task simply requires the identification of the number currently seen. The working memory conditions require the encoding of currently seen numbers and the concurrent recall of numbers previously seen and retained over a delay: as memory load increases, the task requires the recollection of one stimulus ('one back'), two stimuli ('two back') or three stimuli ('three back') previously seen, while encoding additional incoming stimuli. Callicott et al. have identified characteristics of working memory capacity using this parametric 'n-back' working memory task, involving increasing cognitive load and ultimately decreasing task performance, during fMRI in healthy subjects. Loci within dorsolateral prefrontal cortex (DLPFC) evinced exclusively an 'inverted-U' shaped neurophysiological response from lowest to highest load, consistent with a capacity-constrained response. Regions outside of DLPFC, in contrast, were more heterogeneous in response and often showed early plateau or continuously increasing responses, which did not reflect capacity constraints. However, sporadic loci, including the premotor cortex, thalamus and superior parietal lobule, also demonstrated putative capacity-constrained responses, perhaps arising as an upstream effect of DLPFC limitations or as a part of a broader network-wide capacity limitation. These results demonstrate that regionally specific nodes within the working memory network are capacity-constrained in the physiological domain [10].
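The structure of the parametric 'n-back' paradigm described above is easy to state programmatically. A minimal sketch, with illustrative digits and sequence length:

```python
# Stimuli and correct responses for the parametric n-back task:
# n = 0 -> report the current digit; n >= 1 -> the digit seen n steps back.
import random

def nback_trials(n, length, digits=(1, 2, 3, 4), seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(digits) for _ in range(length)]
    correct = [seq[i - n] if i >= n else None for i in range(length)]
    return seq, correct

for n in (0, 1, 2, 3):          # increasing working memory load
    print(n, *nback_trials(n, 8))
```

Each increment of n adds one item that must be held and updated concurrently, which is what produces the load-dependent (and ultimately capacity-limited) activation response discussed below.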
Schizophrenia, fMRI and Working Memory
Based on multiple clinical, neuropathological and functional neuroimaging studies, it is clear that schizophrenia is a brain disorder arising from subtle neuronal deficits (for lack of more specific terminology) [73]. These deficits likely arise in a few key regions, such as the dorsolateral prefrontal cortex and the hippocampal formation, and result in widespread, multifaceted and devastating clinical consequences [77]. These neuronal deficits are clearly heritable, although in a complex fashion, from multiple genes interacting in an epistatic fashion with each other and the environment [47,62]. It is reasonable to assume that these neuronal deficits, clearly resulting in quantifiable behavioral abnormalities in schizophrenic patients, will produce predictable, quantifiable aberrations in neurophysiology that can be 'mapped' using fMRI. Attempts to map the physiological signature of putative prefrontal cortex (PFC) neuronal pathology in schizophrenia have been numerous, but the results have been inconsistent and controversial. Of the various functional neuroimaging findings in schizophrenia, reduced function of PFC, so-called 'hypofrontality', has been both the most prominent and the most controversial [1,28,43,78]. According to its proponents, hypofrontality is a marker of PFC dysfunction in schizophrenia that most reliably
arises during demanding cognitive tasks that tax PFC function [13,79]. A corollary of this explanation is that cortical activation is relatively 'normal' during cognitive tasks that are less taxing on PFC [4,74]. On the other hand, critics have raised a number of objections regarding the relationship between hypofrontality and schizophrenia, invoking issues of experimental design and related inconsistencies. For example, an alternative interpretation of hypofrontality is that it arises as an epiphenomenon of patient behavior, specifically task performance, which is typically abnormal in patients with schizophrenia [28,42,61]. Thus, while many studies of PFC function in schizophrenia have reported reduced PFC activation when patients perform poorly [11,13,25,27,69,76,77], others have observed normal [21,28,55], reduced [20,81] and even increased PFC activation [53,69] when patients' performance is near normal. Regardless of these uncertainties, most authors agree that the physiological responses of the schizophrenic brain are abnormal when cognitive challenges are beyond these patients' behavioural capacity [75]. The interpretation of hypofrontality in the context of capacity limitations is further complicated by recent studies in healthy subjects. For example, Goldberg et al. found that healthy subjects performing a dual task paradigm became relatively hypofrontal when pushed beyond their capacity to maintain accuracy [34]. In the above cited study, Callicott et al. found evidence of an inverted-U shaped PFC response to parametrically increasing working memory difficulty in healthy subjects, who became relatively hypofrontal as they were pushed beyond their working memory capacity [10]. In addition, diminished PFC activity coincident with diminished behavioural capacity has been found in single-unit recording studies in non-human primates during working memory tasks [29,30] and in electrophysiological studies in humans attempting complex motor tasks [33]. Thus, under certain circumstances, hypofrontality can be a normal physiological response to excessive load. Collectively, these data make it difficult to resolve whether hypofrontality as a 'finding' in schizophrenic patients is a direct (i.e. disease-dependent) manifestation of PFC pathology, or whether hypofrontality simply reflects diminished behavioural capacity, as might occur for any subject pushed beyond capacity (i.e. disease-independent). To complicate matters further, there is also evidence that the 'healthy' PFC response to reduced working memory capacity could be over-activation of PFC (i.e. relative hyperfrontality). Rypma and D'Esposito recently demonstrated that healthy controls who have longer reaction times during a working memory task respond by increasing activation in dorsal but not in ventral PFC [63]. They interpreted these results as a reflection of reduced efficiency of working memory information manipulation within dorsal PFC. Further, they interpreted the failure of reaction time to correlate with fMRI activation in ventral PFC as a reflection of the putative link between ventral PFC and working memory maintenance functions [22,57,66,67,72]. Thus, it is conceivable that under certain circumstances schizophrenic patients might evidence over-activation, especially in dorsal PFC, given their poor performance.
Working memory deficits are well documented in many studies of patients with schizophrenia [14,23,24,35,36,37,46,59,70,80]. While working memory is thought to be capacity-limited in all subjects [45,56], schizophrenic patients appear to have additional capacity limitations, presumed to arise from dorsal PFC dysfunction [38,39,41]. Callicott et al. [9], using fMRI, mapped the response to varying working memory difficulty in patients with schizophrenia and healthy comparison subjects, using the above cited parametric version of the 'n-back' task. Consistent with earlier neuropsychological studies, they found that patients with schizophrenia have limited working memory capacity compared to healthy subjects. Although patients activated the same distributed working memory network, the response of patients with schizophrenia to increasing working memory difficulty was abnormal in dorsal PFC. The salient characteristic of PFC dysfunction in schizophrenia in this paradigm was not that the PFC was relatively 'up' or 'down' in terms of activation when compared to healthy subjects; rather, it was an inefficient dynamic modulation of dorsal PFC neuronal activity. While several regions within a larger cortical network also showed abnormal dynamic responses to varying working memory difficulty, the fMRI response in dorsal PFC (areas 9-10, 46) met additional criteria for a disease-dependent signature of PFC neuronal pathology. In fact, at higher memory difficulties (1-back and 2-back), wherein patients showed diminished working memory capacity, dorsal PFC was consistently hyperresponsive. Furthermore, there was a functional distinction between the responses of ventral and dorsal PFC, even though both were abnormal to some extent. In contrast to dorsal PFC, ventral PFC (BA47) was hypo-responsive to varying memory difficulty [9]. While hypofrontality as a finding generates continued debate, there is less debate that PFC neuronal pathology exists in schizophrenia and that this pathology may be more prominent in dorsal PFC (areas 9, 46). Similarities between some of the clinical symptoms of schizophrenia - particularly between the negative or deficit symptoms in schizophrenics and those of patients with frontal lobe lesions - have long implicated PFC in schizophrenia [48,60]. Even though the heterogeneity of clinical symptomatology implicates multiple brain regions, evidence that schizophrenia fundamentally involves dorsal PFC neuronal pathology continues to accumulate from many directions [50,64]. For example, proton magnetic resonance spectroscopy studies have repeatedly found reduced concentrations of the intraneuronal chemical N-acetylaspartate (NAA) in PFC [7,5,6,15,71]. Furthermore, those studies that have examined sub-regions within PFC have found NAA reductions in dorsal but not ventral PFC [5,6,7]. In addition, Callicott et al. demonstrated that dorsal but not ventral PFC NAA reductions specifically predicted the extent of negative symptoms in schizophrenic patients [9]. These and other data provide a strong basis for the assumption that specific neurocognitive abnormalities in schizophrenia (particularly working memory) result from physiological dysfunctions of PFC neurons.
A note of caution is in order when attempting to attribute to one primary node of dysfunction (here, dorsal PFC) the behavioral or physiological abnormalities arising during the use of a task that evokes a wide cortical network. It remains uncertain whether these problems arise from an inherent neuronal abnormality primarily in dorsal PFC, or as a result of abnormal feed-forward or feedback input to PFC from neuronal pathology in other brain areas. Because working memory relies on an integrated network, it is likely that there are significant interactions between PFC and other nodes within this network, including the parietal cortex, the anterior cingulate and the hippocampal area. One could argue that these non-PFC regions may play an important modulatory role in working memory, either directly or indirectly via their reciprocal connectivity with PFC. Thus, it is conceivable that the limited working memory capacity in schizophrenic patients arises as a result of neural pathology in these non-PFC regions, producing abnormal hyperresponsiveness to varying working memory difficulty.
Figure 1: fMRI of patients with schizophrenia during the 1-back task.
In conclusion, patients with schizophrenia seem to have a combination of reduced cortical physiological efficiency and behavioral capacity. Dorsal PFC
neuronal responses - putatively linked to more executive working memory functions like information manipulation - may be relatively more impaired in schizophrenia than ventral PFC regions associated with maintenance of working memory content. A non-behavioral, biological measure of PFC neuronal pathology also reveals that these patients have specific reductions in dorsal PFC NAA measures that specifically predict functional abnormalities in dorsal PFC. Thus, we infer that dorsal PFC neuronal pathology is a plausible cause of cortex-wide abnormal physiological responses in working memory.
References
1. Andreasen N.C., Rezai K., Alliger R., Swayze V.W., Flaum M., Kirchner P., Cohen G., O'Leary D., Hypofrontality in neuroleptic-naive patients and in patients with chronic schizophrenia: assessment with Xenon 133 single-photon emission computed tomography and the tower of London, Arch. Gen. Psychiat. 49 (1992) pp. 943-958.
2. Baddeley A., Working memory (Clarendon Press, Oxford, 1986).
3. Barch D.M., Braver T.S., Nystrom L.E., Forman S.D., Noll D.C. and Cohen J.D., Dissociating working memory from task difficulty in human prefrontal cortex, Neuropsychologia 35 (1997) pp. 1373-1380.
4. Berman K.F., Illowsky B.P., Weinberger D.R., Physiological dysfunction of dorsolateral prefrontal cortex in schizophrenia. IV. Further evidence for regional and behavioral specificity, Arch. Gen. Psychiat. 45 (1988) pp. 616-622.
5. Bertolino A., Callicott J.H., Elman I., Mattay V.S., Tedeschi G., Frank J.A., Breier A. and Weinberger D.R., Regionally specific neuronal pathology in untreated patients with schizophrenia: a proton magnetic resonance spectroscopic imaging study, Biol. Psychiat. 43 (1998) pp. 641-648.
6. Bertolino A., Callicott J.H., Nawroz S., Mattay V.S., Duyn J.H., Tedeschi G., Frank J.A. and Weinberger D.R., Reproducibility of proton magnetic resonance spectroscopic imaging in patients with schizophrenia, Neuropsychopharmacology 18 (1998) pp. 1-9.
7. Bertolino A., Nawroz S., Mattay V.S., Barnett A.S., Duyn J.F., Moonen C.T., Frank J.A., Tedeschi G. and Weinberger D.R., Regionally specific pattern of neurochemical pathology in schizophrenia as assessed by multislice proton magnetic resonance spectroscopic imaging, Am. J. Psychiat. 153 (1996) pp. 1554-1563.
8. Braver T.S., Cohen J.D., Nystrom L.E., Jonides J., Smith E.C. and Noll D.C., A parametric study of prefrontal cortex involvement in human working memory, Neuroimage 5 (1997) pp. 49-62.
9. Callicott J.H., Bertolino A., Mattay V.S., Langheim F.J.P., Duyn J., Coppola R., Goldberg T.E. and Weinberger D.R., Physiological dysfunction of the dorsolateral prefrontal cortex in schizophrenia revisited, Cerebral Cortex 10 (2000) pp. 1078-1092.
10. Callicott J.H., Mattay V.S., Bertolino A., Finn K., Coppola R., Frank J.A., Goldberg T.E. and Weinberger D.R., Physiological characteristics of capacity constraints in working memory as revealed by functional MRI, Cerebral Cortex 9 (1999) pp. 20-26.
11. Callicott J.H., Ramsey N.F., Tallent K., Bertolino A., Knable M.B., Coppola R., Goldberg T., van Gelderen P., Mattay V.S., Frank J.A., Moonen C.T. and Weinberger D.R., Functional magnetic resonance imaging brain mapping in psychiatry: methodological issues illustrated in a study of working memory in schizophrenia, Neuropsychopharmacology 18 (1998) pp. 186-196.
12. Carter C.S., Braver T.S., Barch D.M., Botvinick M.M., Noll D. and Cohen J.D., Anterior cingulate cortex, error detection, and the online monitoring of performance, Science 280 (1998) pp. 747-749.
13. Carter C.S., Perlstein W., Ganguli R., Brar J., Mintun M., Cohen J.D., Functional hypofrontality and working memory dysfunction in schizophrenia, Am. J. Psychiat. 155 (1998) pp. 1285-1287.
14. Carter C.S., Robertson L., Nordhal T., Chaderjian M., Kraft L. and O'Shora-Celaya L., Spatial working memory deficits and their relationship to negative symptoms in unmedicated schizophrenia patients, Biol. Psychiat. 40 (1996) pp. 1285-1287.
15. Cecil K.M., Lenkinski R.E., Gur R.E., Gur R.C., Proton magnetic resonance spectroscopy in the frontal and temporal lobes of neuroleptic naive patients with schizophrenia, Neuropsychopharmacology 20 (1999) pp. 131-140.
16. Cohen J.D., Forman S.D., Braver T.S., Casey B.J., Servan-Schreiber D. and Noll D.C., Activation of the prefrontal cortex in a nonspatial working memory task with functional MRI, Human Brain Map 1 (1994) pp. 293-304.
17. Cohen J.D., Perlstein W.M., Braver T.S., Nystrom L.E., Noll D.C., Jonides J. and Smith E.E., Temporal dynamics of brain activation during a working memory task, Nature 386 (1997) pp. 604-608.
18. Courtney S.M., Petit L., Maisog J.H., Ungerleider L.G. and Haxby J.V., An area specialized for spatial working memory in human frontal cortex, Science 279 (1998) pp. 1347-1351.
19. Courtney S.M., Ungerleider L.G., Keil K. and Haxby J.V., Transient and sustained activity: a distributed neural system for human working memory, Nature 386 (1997) pp. 608-611.
20. Curtis V.A., Bullmore E.T., Brammer M.J., Wright I.C., Williams S.C.R., Morris R.G., Sharma T., Murray R.M. and McGuire P.K., Attenuated frontal activation during a verbal fluency task in patients with schizophrenia, Am. J. Psychiat. 155 (1998) pp. 1056-1063.
21. Curtis V.A., Bullmore E.T., Morris R.G., Brammer M.J., Williams S.C.R., Simmons A., Sharma T., Murray R.M. and McGuire P.K., Attenuated frontal activation in schizophrenia may be task independent, Sch. Res. 37 (1999) pp. 35-44.
22. D'Esposito M., Aguirre G.K., Zarahn E., Ballard D., Shin R.K. and Lease J., Functional MRI studies of spatial and nonspatial working memory, Brain Res. Cogn. Brain Res. 7 (1998) pp. 1-13.
23. Fleming K., Goldberg T.E., Binks S., Randolph C., Gold J.M. and Weinberger D.R., Visuospatial working memory in patients with schizophrenia, Biol. Psychiat. 41 (1997) pp. 43-49.
24. Fleming K., Goldberg T.E., Gold J.M. and Weinberger D.R., Verbal working memory dysfunction in schizophrenia: use of a Brown-Peterson paradigm, Psychiat. Res. 56 (1995) pp. 155-161.
25. Fletcher P.C., McKenna P.J., Frith C.D., Grasby P.M., Friston K.J. and Dolan R.J., Brain activation in schizophrenia during a graded memory task studied with functional neuroimaging, Arch. Gen. Psychiat. 55 (1998) pp. 1001-1008.
26. Frank J.A., Ostuni J.L., Yang Y., Shiferaw Y., Patel A., Qin J., Mattay V.S., Lewis B.K., Levin R.L. and Duyn J.H., Technical solution for an interactive functional MR imaging examination: application to a physiologic interview and the study of cerebral physiology, Radiology 210 (1999) pp. 260-268.
27. Franzen G. and Ingvar D., Absence of activation in frontal structures during psychological testing of chronic schizophrenics, J. Neurol. Neurosurg. Psychiat. 38 (1975) pp. 1027-1032.
28. Frith C.D., Friston K.J., Herold S., Silbersweig D., Fletcher P., Cahill C., Dolan R.J., Frackowiak R.S., Liddle P.F., Regional brain activity in chronic schizophrenic patients during the performance of a verbal fluency task, Br. J. Psychiat. 167 (1995) pp. 343-349.
29. Funahashi S., Bruce C.J. and Goldman-Rakic P.S., Mnemonic coding of visual space in monkey's dorsolateral prefrontal cortex, J. Neurophysiol. 61 (1989) pp. 331-349.
30. Funahashi S., Bruce C.J. and Goldman-Rakic P.S., Neuronal activity related to saccadic eye movements in the monkey's dorsolateral prefrontal cortex, J. Neurophysiol. 65 (1991) pp. 1464-1483.
31. Fuster J.M., The prefrontal cortex (Raven Press, New York, 1980).
32. Gevins A.S., Bressler S., Cutillo B., Illes J., Miller J., Stern J. and Jex H., Effect of prolonged mental work on functional brain topography, Electroenceph. Clin. Neurophysiol. 76 (1990) pp. 339-350.
33. Gevins A.S., Morgan N.H., Bressler S.I., Cutillo B.A., White R.M., Illes J., Greer D.S., Doyle J.C. and Zeitlin G.M., Human neuroelectric patterns predict performance accuracy, Science 235 (1987) pp. 580-585.
34. Goldberg T.E., Berman K.F., Fleming K., Ostrem J., VanHorn J.D., Esposito G., Mattay V.S., Gold J.M. and Weinberger D.R., Uncoupling cognitive workload and prefrontal cortical physiology: a PET rCBF study, Neuroimage 7 (1998) pp. 296-303.
35. Goldberg T.E., Patterson K.J., Taqqu Y. and Wilder K., Capacity limitations in short-term memory in schizophrenia: tests of competing hypotheses, Psychol. Med. 28 (1998) pp. 665-673.
36. Goldberg T.E., Weinberger D.R., Berman K.F., Pliskin N.H. and Podd M.H., Further evidence for dementia of the prefrontal type in schizophrenia? A controlled study of teaching the Wisconsin Card Sorting Test, Arch. Gen. Psychiat. 44 (1987) pp. 1008-1014.
37. Goldberg T.E. and Weinberger D.R., Thought disorder, working memory and attention: interrelationships and the effects of neuroleptic medications, Int. Clin. Psychopharmacol. 10 (Suppl. 3) (1995) pp. 99-104.
38. Goldberg T.E. and Weinberger D.R., Probing prefrontal function in schizophrenia with neuropsychological paradigms, Schizophr. Bull. 14 (1988) pp. 179-183.
39. Goldman-Rakic P.S., Prefrontal cortical dysfunction in schizophrenia: the relevance of working memory. In Psychopathology and the brain, ed. by Carroll B.J. and Barnett J.E. (Raven Press, New York, 1991).
40. Goldman-Rakic P.S., Regional and cellular fractionation of working memory, PNAS 93 (1996) pp. 13473-13480.
41. Goldman-Rakic P.S., Working memory dysfunction in schizophrenia, J. Neuropsychiat. Clin. Neurosci. 6 (1994) pp. 348-357.
42. Gur R.C., Gur R.E., Hypofrontality in schizophrenia: RIP, Lancet 345 (1995) pp. 1383-1384.
43. Ingvar D. and Franzen G., Distribution of cerebral activity in chronic schizophrenia, Lancet 2 (1974) pp. 1484-1486.
44. Jonides J., Smith E.E., Koeppe R.A., Awh E., Minoshima S. and Mintun M.A., Spatial working memory in humans as revealed by PET, Nature 363 (1993) pp. 583-584.
45. Just M.A. and Carpenter P.A., A capacity theory of comprehension: individual differences in working memory, Psychol. Rev. 99 (1992) pp. 122-149.
46. Keefe R.S., Roitman S.E., Harvey P.D., Blum C.S., DuPre R.L., Prieto D.M., Davidson M. and Davis K.L., A pen-and-paper human analogue of a monkey prefrontal cortex activation task: spatial working memory in patients with schizophrenia, Schizophr. Res. 17 (1995) pp. 25-33.
47. Kidd K.K., Can we find genes for schizophrenia?, Am. J. Med. Genet. 74 (1997) pp. 104-111.
48. Kraepelin E., Dementia praecox and paraphrenia (E. & S. Livingstone, Edinburgh, 1919).
49. Levin J.M., Ross M.H. and Renshaw P.F., Clinical applications of functional MRI in neuropsychiatry, J. Neuropsychiatry Clin. Neurosci. 7 (1995) pp. 511-522.
50. Lewis D.A., Development of the prefrontal cortex during adolescence: insights into vulnerable neural circuits in schizophrenia, Neuropsychopharmacology 16 (1997) pp. 385-398.
142
51. Malonek D., Dirnagl U., Lindauer U., Yamada K., Kanno I. and Grinvald A., Vascular imprints of neuronal activity: relationships between the dynamics of cortical blood flow, oxygenation, and volume changes following sensory stimulation, PNAS 94 (1997) pp. 14826-14831. 52. Malonek D. and Grinvald A., Interactions between electrical activity and cortical microcirculation revealed by imaging spectroscopy: implications for functional brain mapping, Science 272 (1996) pp. 551-554. 53. Manoach D.S., Press D.Z., Thangaraj V., Searl M.M., Goff D.C., Halpern E., Saper C.B. and Warach S., Schizophrenic subjects activate dorsolateral prefrontal cortex during a working memory task as measured by MRI, Biol. Psychiat, 45 (1999) pp. 1128-1137. 54. McCarthy G., Blamire A.M. Puce A., Nobre A.C., Bloch G., Hyder F., Goldman-Rakic P.S. and Shulman R.G., Functional magnetic resonance imaging of human prefrontal cortex activation during a spatial working memory task, PNAS 91 (1994) pp. 8690-8694. 55. Mellers J.D.C., Adachi N., Takei N., Cluckie A., Toone B.K. and Lishman W.A., Pet study of verbal fluency in schizophrenia and epilepsy, Br. J. Psychiat. 173 (1998) pp. 69-74. 56. Miller G.A., The magical number seven, plus or minus two: some limits on our capacity for processing information, Psychol. Rev. 63 (1956) pp. 81-97. 57. Owen A.M., Evans A.C. and Petrides M., Evidence for a two-stage model of spatial working memory processing within the lateral frontal cortex: a positron emission tomography study, Cereb. Cortex, 6 (1996) pp. 31-38. 58. Pardo J.V., Pardo P.J., Janer K.W. and Raichle M.E., The anterior cingulate cortex mediates processing selection in the Stroop attentional conflict paradigm, PNAS 87 (1990) pp. 256-259. 59. Park S. and Holzman P.S., Schizophrenics show spatial working memory deficits, Arch. Gen. Psychiat, 49 (1992) pp. 975-982. 60. Piercy M., The effects of cerebral lesions on intellectual function: a rewiev of current research trends, Br. J. Psychiat, 110 (1964) pp. 310-352. 61. Price M., Friston K.J., Scanning patients with task they can perform, Hum.Brain Map., 8 (1999) pp. 102-108. 62. Risch N. and Merikangas K., The future of genetic studies of complex human diseases, Science 273 (1996) pp. 1516-1517. 63. Rypma B. and D'Esposito M., The role of prefrontal brain regions in components of working memory: effects of memory load and individual differences, PNAS, 96 (1999) pp. 6558-6563. 64. Selemon L.D. and Goldman-Rakic P.S., The reduced neuropil hypotesis: a circuit based model of schizophrenia, Biol. Psychiat, 45 (1999) pp. 17-25. 65. Shallice T., From neuropsychology to mental structure (Cambridge University Press, Cambridge, 1988). 66. Smith E.E. and Jonides J., Neuroimaging analyses of human working memory, PNAS, 95 (1998) pp. 12061-12068.
143
67. Smith E.E. and Jonides J., Storage and executive processes in the frontal lobes, Science, 283 (1999) pp. 1657-1661. 68. Smith E.E., Jonides J., Marshuetz C. and Koeppe R.A., Components of verbal working memory: evidence from neuroimaging, PNAS 95 (1998) pp. 876-882. 69. Stevens A.A., Goldman Rakic P.S., Gore J.C., Fulbright R.K. and Wexler B.E., Cortical dysfunction in schizophrenia during auditory word and tone working memory demonstrated by functional magnetic resonance imaging, Arch. Gen. Psychiat. 55 (1998) pp. 1097-1103. 70. Stone M., Gabrieli J.D., Stebbins G.T. and Sullivan E.V., Working strategic memory deficits in schizophrenia, Neuropsychology, 12 (1998) pp. 278-288. 71. Thomas M.A., Ke Y., Levitt J., Caplan R., Curran J., Asarnow R. and McCracken J., Preliminary study of frontal lobe 1H MR spectroscopy in childhood onset schizophrenia, J. Magn. Reson. Imag., 8 (1998) pp. 841-846. 72. Wagner A.D., Working memory contributions to human learning and remembering, Neuron, 22 (1999) pp. 19-22. 73. Weinberger D.R., Implications of normal brain development for the pathogenesis of schizophrenia, Arch. Gen. Psychiatry 44 (1987) pp. 660-669. 74. Weinberger D.R. and Berman K.F., Prefrontal function in schizophrenia: confounds and controversies, Phil. Trans. R. Soc. Med. 351 (1996) pp. 14951503. 75. Weinberger D.R. and Berman K.F., Speculation on the meaning of cerebral metabolic hypofrontality in schizophrenia, Schizophr. Bull., 14 (1988) pp. 157168. 76. Weinberger D.R., Berman K.F. and Illowsky B.P., Physiological dysfunction of dorsolateral prefrontal cortex in schizophrenia III. A new cohort and evidence for a monoaminergic mechanism, Arch. Gen. Psychiat. 45 (1988) 609-615. 77. Weinberger D.R., Berman K.F., Suddath R. and Torrey E.F. Evidence of dysfunction of a prefrontal-limbic network in schizophrenia: a magnetic resonance imaging and regional cerebral blood flow study of discordant monozygotic twins, Am J Psychiatry 149 (1992) pp. 890-897. 78. Weinberger D.R., Berman K.F. and Zee R.F., Physiologic dysfunction of dorsolateral prefrontal cortex in schizophrenia. I. Regional cerebral flood evidence, Arch. Gen. Psychiat. 43 (1986) pp. 114-124. 79. Weinberger D.R., Mattay V., Callicott J., Kotrla K., Santha A.,van Gelderen P., Duyn J., Moonen C. and Frank J., fMRI applications in schizophrenia research, Neuroimage 4 (1996) pp. 118-126. 80. Wexler B.E., Stevens A.A., Bowers A.A., Sernyak M.J. and Goldman-Rakic P.S., Word and tone working memory deficits in schizophrenia, Arch. Gen., Psychiat, 55 (1998) pp. 1093-1106. 81. Yurgelun-Todd D.A., Waternaux CM., Cohen B.M., Gruber S.A., English CD. and Renshaw P.F., Functional magnetic resonance imaging of schizophrenic patients and comparison subjects during word production, Am. J. Psychiat, 153 (1996) pp. 200-205.
144
ANN FOR ELECTROPHYSIOLOGICAL ANALYSIS OF NEUROLOGICAL DISEASE

R. BELLOTTI (1,2,4), F. DE CARLO (2,4), M. DE TOMMASO (1,3), O. DIFRUSCOLO (3), R. MASSAFRA (2), V. SCIRUICCHIO (3), S. STRAMAGLIA (1,2,4)

(1) Center of Innovative Technologies for Signal Detection and Processing, Bari
(2) Department of Physics, University of Bari
(3) Department of Neurological and Psychiatric Sciences, University of Bari
(4) I.N.F.N., Bari
E-mail: [email protected], [email protected]

The aim of this study was to develop a discriminant analysis based both on classical linear methods, such as Fisher's Linear Discriminant (FLD) and the Likelihood Ratio Method (LRM), and on a non-linear Artificial Neural Network (ANN) classifier, in order to distinguish between patients affected by Huntington's disease (HD) and normal subjects. R.O.C. curve analysis revealed the ANN to be the best classifier. Moreover, the network classified gene-carrier relatives as normal, thus suggesting the EEG to be a marker of the evolution of the HD.
1 Introduction
A study of the electroencephalogram (EEG) of patients affected by Huntington's disease [1] (HD), also known as chorea, and of their gene-carrier relatives was carried out in order to establish the best classifier to discriminate between healthy and non-healthy subjects. To this aim three classification systems were considered and their performances were compared: Fisher's Linear Discriminant [2] (FLD), the Likelihood Ratio Method [3] (LRM) and an Artificial Neural Network [4] (ANN). R.O.C. curve analysis [5] showed the ANN to have the best performance. Moreover, the gene-carrier relatives' data were submitted to the network in order to investigate the correlation between brain activity and the HD, thus revealing the EEG to be related to the phenotypic manifestation of the disease rather than to the genetic anomaly. The paper is organized as follows: in the next section (2) the statistics and the process of data extraction are presented. The three classifiers are described in sections (3), (4) and (5) and their performances are compared in section (6), where R.O.C. curve analysis is introduced. In section (7) we focus our attention on the analysis of the gene-carrier relatives' data, and the conclusions are drawn in section (8).
2 Data set
The data set considered here refers to 8 patients affected by the HD, 7 gene-carrier first-degree relatives and 7 controls. The EEG signal was sampled at 512 Hz in 2-second epochs on 19 electrodes positioned on the FP1, FP2, F7, F3, FZ, F4, F8, T3, C3, CZ, C4, T4, T5, P3, PZ, P4, T6, O1, O2 derivations, according to the 10-20 system. Artifact-free random samples were selected to form a data set constituted of 160 epochs from patients' recordings, 160 epochs from controls and 71 from gene-carrier relatives. These were Fast-Fourier transformed and the power of the brain rhythms α (8-12.5) Hz, β (13-30) Hz, ϑ (4-7.5) Hz and δ (0.5-3.5) Hz was considered. Due to the limited availability of the data, the cross-validation technique [6] was applied, which consists in considering all the possible 8 different partitions of the data into a training set of 140 elements and a test set of 20 elements: each partition is submitted to the classification systems and the corresponding outputs are summed for the case of signals-controls classification, while they are averaged for the case of the gene-carrier relatives' analysis.
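As an illustration of this feature-extraction step, the band powers can be computed with standard numerical tools. The sketch below is not the authors' code: the array shape, variable names and band handling are assumptions consistent with the description above (19 channels, 2-second epochs sampled at 512 Hz).

```python
import numpy as np

FS = 512                                     # sampling rate (Hz)
BANDS = {"alpha": (8.0, 12.5), "beta": (13.0, 30.0),
         "theta": (4.0, 7.5), "delta": (0.5, 3.5)}

def band_powers(epoch):
    """epoch: array (19 channels, 1024 samples); returns one 19-dim power vector per rhythm."""
    spectrum = np.abs(np.fft.rfft(epoch, axis=1)) ** 2       # power spectrum per channel
    freqs = np.fft.rfftfreq(epoch.shape[1], d=1.0 / FS)
    return {name: spectrum[:, (freqs >= lo) & (freqs <= hi)].sum(axis=1)
            for name, (lo, hi) in BANDS.items()}
```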
3 Fisher's Linear Discriminant
Fisher's linear discriminant [4] consists in maximizing the separation of the classes through the projection of the data from the original 19-dimensional input space onto a 1-dimensional space defined by the versor

$$ w = \frac{S_W^{-1}(m_s - m_c)}{\left\| S_W^{-1}(m_s - m_c) \right\|} \qquad (1) $$

where

$$ m_s = \frac{1}{N_s} \sum_{i \in C_s} x_i \,, \qquad m_c = \frac{1}{N_c} \sum_{i \in C_c} x_i \qquad (2) $$

are the mean vectors of the two classes (signals and controls) and

$$ S_W = \sum_{i \in C_c} (x_i - m_c)(x_i - m_c)^T + \sum_{i \in C_s} (x_i - m_s)(x_i - m_s)^T \qquad (3) $$

is the within-class covariance matrix. The new data variables satisfying the request of maximal separation are z = x · w, defined in the range [-1, +1] (due to the normalization of the original x data), which are linearly re-scaled into the interval [0, 1] in order to be directly compared with the output variables of the other two classification systems.
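Equations (1)-(3) translate directly into a few lines of linear algebra. The following sketch is an illustration, not the original code; `xs` and `xc` are assumed to be arrays holding one normalized 19-dimensional feature vector per row, for signals and controls respectively.

```python
import numpy as np

def fisher_versor(xs, xc):
    """Unit vector w of Eq. (1), from the class means (2) and within-class scatter (3)."""
    ms, mc = xs.mean(axis=0), xc.mean(axis=0)
    sw = (xc - mc).T @ (xc - mc) + (xs - ms).T @ (xs - ms)
    w = np.linalg.solve(sw, ms - mc)                 # S_W^{-1} (m_s - m_c)
    return w / np.linalg.norm(w)

def fld_output(x, w):
    """Projections z = x . w, rescaled from [-1, +1] to [0, 1] as in the text."""
    return (x @ w + 1.0) / 2.0
```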
4 Likelihood Ratio Method
The likelihood ratio method is largely used in many fields dealing with classification tasks [7]. Let us define the i-class conditional probability density function, P(x_k | i), as the probability density that the k-th feature is x_k given that it belongs to the i-th class (i = 1, 2). If we assume that the EEG recordings at the 19 different locations on the scalp of the patients are independent, the probability density that a patient generates the data vector x = (x_1, ..., x_19), given that he belongs to the i-th class, is

$$ P_i(x) = \prod_{k=1}^{19} P(x_k \mid i) \,. \qquad (4) $$

The likelihood ratio for the i-th class is then defined by

$$ L_i(x) = \frac{P_i(x)}{P_1(x) + P_2(x)} \,. \qquad (5) $$

Due to the high dimensionality of the feature space, estimation of the probability density (4) by the histogram method would not work (see, e.g., the discussion on the curse of dimensionality in Bishop [4], p. 51). It is then estimated by a non-parametric approach, the kernel-based method [8], in which the functional form is not specified a priori but relies on the data itself. In particular, it is given by the sum over the whole training set of normal multivariate distributions, each one centered on a training data point:

$$ P_i(x) = \frac{1}{N_i} \sum_{j \in C_i} \frac{1}{(2\pi h^2)^{d/2}} \exp\!\left( -\frac{\| x - x_j \|^2}{2h^2} \right) \qquad (6) $$

where d = 19 is the dimension of the data space and h is a free parameter which plays the role of a smoothing parameter of the whole distribution. It is worth remarking that the performance of the LRM classifier may change dramatically as h is varied (see section (6)).
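A minimal transcription of Eqs. (4)-(6) follows; it is a sketch under the assumptions stated in the text (d = 19, Gaussian kernels of width h), not the implementation used in the paper.

```python
import numpy as np

def parzen_density(x, train, h):
    """Kernel estimate of Eq. (6): mean of Gaussians centred on the training points."""
    d = train.shape[1]                                   # d = 19 in this study
    sq_dist = ((train - x) ** 2).sum(axis=1)             # ||x - x_j||^2 for every j
    norm = (2.0 * np.pi * h * h) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * h * h)).sum() / (len(train) * norm)

def likelihood_ratio(x, train_s, train_c, h):
    """L_s of Eq. (5): an output in [0, 1], comparable with the FLD and ANN outputs."""
    ps = parzen_density(x, train_s, h)
    pc = parzen_density(x, train_c, h)
    return ps / (ps + pc)
```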
5 Artificial Neural Network
Let us consider a two-layered feed-forward perceptron [9] whose general structure is represented in figure (1). The input layer has 19 neurons, according to the dimension of the feature space; the hidden layer has a number of neurons varying from 2 to 10 and the output layer has only one neuron which, in the training phase, is set to 1 when signals are submitted to the network and to 0 otherwise.
Figure 1: Two-layered feed-forward perceptron.
The output V_i of each neuron is a sigmoid transfer function of its input u_i = Σ_j w_ij V_j, where the sum is computed over the neurons of the previous layer:

$$ V_i = g(u_i) = \frac{1}{1 + e^{-\beta u_i}} \,. \qquad (7) $$

The weights are updated according to the gradient descent learning rule [9]:

$$ \Delta w_{ij}^{\,new} = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}^{\,old} \qquad (8) $$

where E is the error function

$$ E = \frac{1}{2} \sum_{\mu} \left[ \zeta^{\mu} - O^{\mu} \right]^2 \,, \qquad (9) $$

which is a measure of the distance between the network outputs O^μ and the target patterns ζ^μ = 1, 0 for signal and control data respectively. At each iteration the error function decreases until its minimum is attained.
The second term in (8), the so-called momentum term [10], represents a sort of inertia which is added in order to let the weights change in the average downhill direction, avoiding sudden oscillations of the w_ij around the minimum: this term allows the network to reach the solution more quickly. The network parameters we used are: learning rate η = 0.01, momentum parameter α = 0.1-0.3 and gain factor β = 1.
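For concreteness, a minimal version of the network and of the update rule (8) can be written as follows. This is an illustrative sketch only; the weight initialization, hidden-layer size and on-line update schedule are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(u, beta=1.0):
    """Sigmoid transfer function of Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-beta * u))

class TwoLayerPerceptron:
    def __init__(self, n_in=19, n_hidden=6):
        self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.dw1 = np.zeros_like(self.w1)
        self.dw2 = np.zeros_like(self.w2)

    def forward(self, x):
        self.v = g(self.w1 @ x)            # hidden-layer outputs
        return g(self.w2 @ self.v)         # single output neuron

    def train_step(self, x, target, eta=0.01, alpha=0.2):
        out = self.forward(x)
        d2 = (out - target) * out * (1.0 - out)            # dE/du at the output, from Eq. (9)
        d1 = d2 * self.w2 * self.v * (1.0 - self.v)        # deltas back-propagated to the hidden layer
        self.dw2 = -eta * d2 * self.v + alpha * self.dw2   # Eq. (8) with momentum
        self.dw1 = -eta * np.outer(d1, x) + alpha * self.dw1
        self.w2 += self.dw2
        self.w1 += self.dw1
```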
6 R.O.C. curves' analysis
Figures (2), (3) and (4) show typical output histograms in the α frequency region from the three classification systems. In particular, for the LRM analysis, the output histogram of the likelihood L_s, relative to the signal class, is considered both for controls (figure (3), up) and for signals (figure (3), down). Even by visual comparison it is clear that the FLD output gives the worst discrimination between the classes, due to the strong overlap of the distributions, while both the LRM and ANN histograms are more separated and peaked, which means a better classification. The subsequent step is to put an appropriate threshold on the output histograms, so that once a new data point (which is not known to be a signal or a control) is submitted to the classifier, a decision can be taken on its class depending on the side of the corresponding output with respect to the threshold itself. In order to have a quantitative measure of the performance of the algorithms we use R.O.C. curve analysis, which is a good technique to estimate the quality of a classification in the particular case of a binary hypothesis to be tested. Given a threshold value on the output histogram, the sensitivity ε and the specificity s are defined as

$$ \epsilon = \frac{n_{ss}}{n_{ss} + n_{sc}} \,, \qquad s = \frac{n_{cc}}{n_{cc} + n_{cs}} \qquad (10) $$

where n_ss and n_cc are the numbers of correctly classified signal and control data, respectively, and n_cs and n_sc the numbers of misclassifications. Sweeping the threshold parameter through the [0, 1] interval, the graphical representation of the sensitivity ε versus the specificity s gives the R.O.C. curve.
Figure 2: FLD α histogram: controls (up), signals (down).
Figure 3: L_s α histogram: controls (up), signals (down).
Figure 4: ANN α histogram: controls (up), signals (down).
In the case of a perfect classification the two terms n_sc and n_cs tend to zero, and therefore both the sensitivity and the specificity tend to 1, as does the area under the R.O.C. curve: this area is, therefore, an index of the goodness of the performance of the classification system and will be used to compare our three classifiers. R.O.C. curves are shown for FLD, LRM (for different values of the parameter h) and ANN in the different frequency regions α (figure (5)), β (figure (6)), ϑ (figure (7)) and δ (figure (8)). In figure (9) the R.O.C. curves relative to the ANN are drawn to compare the performances of the network in the different regions. By computing the areas a for the α frequencies, we find FLD to be the worst algorithm (a = 0.7954), ANN the best one (a = 0.9877), while LRM has an intermediate performance increasing as h decreases (a = 0.8163 for h = 0.5, a = 0.9314 for h = 0.1 and a = 0.945 for h = 0.05): this LRM behavior is verified also for the β, δ and ϑ frequencies. In the other three regions FLD overcomes LRM for h = 0.5, while the order of performance of ANN and LRM (h = 0.1, h = 0.05) is the same as in α. Concerning the ANN, its performance increases from δ (a = 0.9396) to ϑ (a = 0.9661) to β (a = 0.9864) to α (a = 0.9877). Therefore we are led to the conclusion that the ANN is, for each frequency, the best of the three classifiers, with a minimum performance for the δ rhythms.
Figure 5: α R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 6: β R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 7: ϑ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 8: δ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 9: ANN R.O.C. curves for δ (stars), ϑ (triangles), β (rhombi) and α (circles) regions.
7 Gene-carrier relatives' analysis
Gene-carrier first-degree relatives have not yet shown the disease. From their analysis we expect that, if they were classified as non-healthy, the EEG would be closely related to the genetic anomaly, meaning that the EEG analysis would be predictive of the appearance of the HD. Otherwise, if they were classified as healthy, the EEG would be related to the manifestation itself of the disease, and so the EEG signal would be a marker of the evolution of the HD. The gene-carrier relatives' data were submitted to the network and the corresponding outputs are shown in figures (10), (11), (12) and (13). As one can see, their distributions are peaked around zero in the α, β and ϑ regions, as the control ones are (see figure (4)). In the δ region, instead, the situation seems to be more confused, due to the spread of the distribution over the whole interval [0, 1]. However, the performance of the network in classifying signal and control data was shown to be the worst in this region with respect to the others (see figure (9)), thus implying the classification to be bad for the relatives too. Therefore we are led to the following medical conclusion: gene-carrier relatives' data are classified as healthy by the ANN, thus meaning the EEG to be a marker of the evolution of the disease.
Figure 10: ANN gene-carrier relatives' α histogram.

Figure 11: ANN gene-carrier relatives' β histogram.
Figure 12: ANN gene-carrier relatives' ϑ histogram.

Figure 13: ANN gene-carrier relatives' δ histogram.
8 Conclusions
A comparison between the statistical methods of FLD and LRM and the ANN approach to evaluate the classification performance on EEG data taken from patients affected by the HD was presented. R.O.C. curve analysis clearly showed the supremacy of the non-linear ANN approach over the classical linear methods (FLD and LRM). Moreover, the ANN classified gene-carrier relatives as controls, thus leading to the conclusion that the EEG is a marker of the phenotypic manifestation of the HD.

Acknowledgments

We thank Carmela Marangi (I.R.M.A.-C.N.R.) and Fabio Bovenga (Physics Department, University of Bari) for helpful discussions.

References

1. For general aspects see, e.g., S.E. Folstein, R.J. Leigh, I.M. Parhad, M.F. Folstein, Neurology 36, 1986, 1279-1283.
2. R.A. Fisher, Annals of Eugenics 7, 1936, 179-188. Reprinted in Contributions to Mathematical Statistics, John Wiley, New York (1950).
3. See, e.g., R. Bellotti, M. Castellano, C. De Marzo, N. Giglietto, G. Pasquariello, P. Spinelli, Computer Physics Communications 78, 1993, 17-22, and references therein.
4. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford (1995).
5. J.A. Swets, "Measuring the accuracy of diagnostic systems", Science 240, 1285-1293 (1988).
6. M. Stone, Journal of the Royal Statist. Society B 36 (1), 1974, 111-147. M. Stone, Math. Operat. Statist. Ser. Statistics 9 (1), 1978, 127-139. G. Wahba, S. Wold, Comm. in Statistics, Series A 4 (1), 1975, 1-17.
7. See, e.g., for applications dealing with electron-hadron discrimination, K.K. Tang, Astrophysics Journal 278, 1984, 881; A. Bungener et al., Nuclear Instruments Methods 214, 1983, 261.
8. M. Rosenblatt, Annals of Mathematical Statistics 27, 1956, 832-837. E. Parzen, Annals of Mathematical Statistics 33, 1962, 1065-1076.
9. J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
10. D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing - Vol. 1, MIT Press, Cambridge, MA (1986), p. 318.
DETECTION OF MULTIPLE SCLEROSIS LESIONS IN MRIs WITH NEURAL NETWORKS

P. BLONDA, G. SATALINO, A. D'ADDABBO, G. PASQUARIELLO
I.E.S.I. - C.N.R., Via Amendola 166/5 - 70126 Bari, Italy, [email protected]

A. BARALDI
I.S.A.O. - C.N.R., Via Gobetti 101, Bologna, Italy, [email protected]

R. DE BLASI
Cattedra e Servizio di Neuroradiologia, University of Bari, P.za G. Cesare - 70126 Bari, Italy
The objective of this paper is to assess the effectiveness of a two-stage learning classification system in the automatic detection of small lesions from Magnetic Resonance Images (MRIs) of a patient affected by multiple sclerosis. The first classification stage consists of an unsupervised neural network module for data clustering. The second classification stage consists of a supervised learning module employing a plurality vote mechanism to relate each unsupervised cluster to the supervised output class having the largest number of representatives inside the cluster. In this paper two different neural network algorithms, i.e. the Enhanced Linde-Buzo-Gray (ELBG) algorithm and the well-known Self-Organizing Map (SOM), have been employed as the clustering module in the first stage of the system. The results obtained with the two different clustering algorithms have been qualitatively and quantitatively compared in a set of classification experiments. In these experiments, ELBG is equivalent to SOM in terms of classification accuracy and superior to SOM with respect to the visual quality of the output map and robustness to changes in the order and composition of the data presentation sequence. The results confirm the usefulness of the neural classification system in the automatic detection of small lesions.
1 Introduction
The typical approach to automated recognition of tissue types includes multispectral analysis of Magnetic Resonance Images (MRIs) consisting of tissue-dependent parameters such as the Proton Density (PD), T2 (the spin-spin relaxation time) and T1 (the spin-lattice relaxation time). In recent years a novel three-dimensional T1-weighted gradient echo sequence, based on the turbo-flash technique and called Magnetization-Prepared RApid Gradient Echo (MP-RAGE), which can be generated from contiguous and very thin (1.3 - 3 mm) sections, has allowed visual detection of small lesions typically affected by partial volume effects and intersection gaps in T1-weighted Spin-Echo (SE) sequences [5], [6]. In this work, two per-pixel nearest multiple-prototype classifiers, based on a hybrid two-stage learning framework [4], are compared, both qualitatively and quantitatively, in the detection of small lesions from a data set consisting of PD-SE, T2-SE, T1-SE and MP-RAGE images of a volunteer affected by multiple sclerosis.
In this data set, supervised (labelled) image areas are manually selected by an expert neuroradiologist to provide the learning algorithms with training and testing data samples. In this classification framework, the first classification stage consists of a pixel-based data clustering algorithm. In the second classification stage, a supervised learning module employing a plurality vote mechanism relates each unsupervised cluster to the supervised output class having the largest number of representatives inside the cluster. Classification accuracy is assessed on a test set. The Enhanced Linde-Buzo-Gray (ELBG) clustering algorithm and the Self-Organizing Map (SOM) are employed, respectively, as the first classification stage providing unsupervised learning. ELBG is a novel quantization algorithm capable of providing a near-optimal solution to the Mean Square Error (MSE) minimisation problem [10]. Owing to their complementary functional features, the Fully self-Organizing Simplified Adaptive Resonance Theory (FOSART) clustering network may be adopted to initialize ELBG [1], [2]. On the one hand, FOSART is on-line learning, constructive (i.e., the number of processing elements is not fixed by the user on an a priori basis before processing the data; rather, it is set by the algorithm depending on the complexity of the clustering task according to an optimization framework) and cannot shift codewords through Voronoi regions. On the other hand, ELBG is non-constructive, batch learning and capable of moving codewords through Voronoi regions to reduce MSE. For comparison with ELBG, SOM is selected from the literature as a well-known and successful clustering network. SOM is on-line learning, soft-to-hard (fuzzy-to-crisp) competitive, non-constructive and capable of employing topological relationships between output nodes belonging to a 2-D output array [7]. The rest of this paper is organized as follows. A brief overview of SOM, FOSART and ELBG is provided in section 2. The data set, the classification method and the results are illustrated in section 3. Conclusions follow in section 4.
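The plurality vote mechanism of the second stage reduces to a co-occurrence count. The sketch below is only an illustration of that rule (variable names and the NumPy formulation are assumptions, not taken from the paper): `cluster_id` holds the first-stage cluster index of each training pixel and `label` its supervised class.

```python
import numpy as np

def plurality_vote(cluster_id, label, n_clusters, n_classes):
    """Relate each unsupervised cluster to the class with most members inside it."""
    counts = np.zeros((n_clusters, n_classes), dtype=int)
    np.add.at(counts, (cluster_id, label), 1)      # cluster/class co-occurrence table
    return counts.argmax(axis=1)                   # cluster index -> winning class

# A test pixel is then labelled with the class of the cluster of its nearest prototype.
```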
2 Clustering networks

2.1 SOM
SOM and FOSART are both soft-to-hard (fuzzy-to-crisp) competitive clustering networks, but, unlike FOSART, SOM employs inter-node distances in a fixed output lattice rather than inter-pattern distances in input space to compute learning rates. Noticeably, unlike FOSART, SOM deals with topological relationships (e.g., adjacency) among output nodes without explicitly dealing with inter-node (lateral) connections [2]. Despite its many successes in practical applications, SOM has some limitations [7]: termination is not based on optimising any model of the process or its data [1], [2]; the size of the output lattice, the learning rate and the size of the resonance neighbourhood must be varied empirically from one data set to another to achieve
useful results [3]; prototype parameter estimates may be severely affected by noise points and outliers.

2.2 FOSART
FOSART is a soft-to-hard (fuzzy-to-crisp) competitive, minimum-distance-to-means clustering algorithm capable of: i) generating processing units and lateral (intra-layer) connections on an example-driven basis, and ii) removing processing units and lateral connections on a mini-batch basis (i.e., based on statistics collected over subsets of the input sequence, to average information over the noise on the data). Potential advantages of FOSART are listed in the following [2]: a) owing to its soft-to-hard competitive learning strategy, FOSART is expected to be less prone to being trapped in local minima and less likely to generate dead units than hard competitive alternatives [3]; b) owing to its neuron removal strategy, it is robust against noise; c) feed-back interaction between attentional and orienting subsystems allows FOSART to self-adjust its network size depending on the complexity of the clustering task; d) the expressive power of networks that incorporate competition among lateral connections in a constructive framework, like FOSART and the Growing Neural Gas (GNG) [2], is superior to that of traditional constructive or non-constructive clustering systems (e.g., SOM) which employ no lateral connections explicitly [2]. As a consequence, FOSART features an application domain extended to: vector quantization; entropy maximization; and structure detection in input data to be mapped in a topologically correct way onto submaps of an output lattice, pursuing dimensionality reduction [1].
2.3 ELBG
ELBG is non-constructive, batch learning and capable of moving codewords through Voronoi regions to reduce MSE. In ELBG, templates eligible for being shifted and split are those whose "local" contribution to the MSE value is, respectively, below and above the mean distortion. Templates eligible for being shifted are selected sequentially, and those eligible for being split are selected stochastically (in a way similar to the roulette wheel selection in genetic algorithms). Each selected pair of templates is adjusted locally based on the traditional LBG (c-means) batch clustering algorithm [8]. In [10] ELBG is initialized either randomly or with the splitting-by-two technique proposed in [8]. In this work ELBG is initialised with the FOSART network.
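The stochastic part of this selection, "similar to the roulette wheel selection in genetic algorithms", can be sketched as follows; this is only an illustration of the selection rule (the surrounding shift-and-split logic of ELBG [10] is omitted), with `local_mse[i]` assumed to hold the local distortion contribution of codeword i.

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_split_pick(local_mse):
    """Pick a codeword to split with probability proportional to its local distortion."""
    eligible = local_mse > local_mse.mean()       # only above-average contributors
    p = np.where(eligible, local_mse, 0.0)
    p /= p.sum()
    return rng.choice(len(local_mse), p=p)
```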
3 Experimental results

3.1 Data set
The multispectral data set consists of PD-SE, T1-SE, T2-SE and T1 MP-RAGE sequences acquired by a Siemens Impact 1.0 Tesla MR scanner from a patient affected by multiple sclerosis and white matter lesions in the brain. The MP-RAGE slices from the foramen magnum to the upper convexity are acquired at 3 mm of thickness to stress the signal deriving from lesions actually present in the nervous system. A single slice was randomly selected from the subset of the MRI volume that showed the lesions. Figures 1a and 1b show, respectively, the T1-SE and T1 MP-RAGE images. The image size is 256 by 256 pixels. The input values, originally ranging between 0 and 4096, are scaled between 0 and 255 according to a linear transformation. Labelled image areas, manually selected by an expert neuroradiologist in the raw images, belong to the classes: White Matter (WM), Grey Matter (GM), Cerebral Spinal Fluid (CSF), Pathologic lesions (PT), Background (BC), and other (see Fig. 1c). Approximately 66% of the 7544 supervised (labelled) pixels are extracted randomly for training the classification system (Table 1). The remaining pixels are used for testing. Three different extractions are carried out to obtain three training and test data set pairs characterised by different orders of the selected patterns to be fed to the system. Thus, in each classification experiment, the classification accuracy is averaged over three training/testing procedures.
Figure 1 Input images: (a) T1 MP-RAGE; (b) T1-SE; and (c) labelled image areas.

Table 1

Class Labels      WM    GM   CSF    PT    BC  Other  Total
Training Points  1102   853   816   310   558   1314   4953
Test Points       573   441   435   169   290    683   2591

3.2 Classification training and testing
Two classification experiments are carried out to: i) compare the capabilities of ELBG and SOM in detecting small lesions due to multiple sclerosis in MR images,
and ii) assess the utility of T1 MP-RAGE vs T1-SE images. Let us identify a labelled (supervised) pixel as an input-output vector pair (X_i, Y_i), where X_i = (f_{i,1}, ..., f_{i,D}) ∈ R^D is an input data vector, D is the input space dimensionality, f_{i,k} ∈ R, i = 1, ..., M, k = 1, ..., D, is the feature component, M represents the number of input patterns, while Y_i = (y_{i,1}, ..., y_{i,L}), i = 1, ..., M, is the output labelling vector and L is the total number of classes. Classification results are averaged over three runs where a different selection of training and testing data sets is adopted. During learning, the unsupervised first stage of the TSH classifier employs a training set where data labels are ignored, while the supervised second classification stage is trained with labelled data, i.e., with (input, output) data pairs. Once the first classification stage reaches convergence, the second classification stage is trained to relate each cluster, detected by the first stage, to the supervised output class Y_i having the largest number of representatives inside the cluster. In the first set of classification experiments, the unsupervised first stage of the TSH classifier is implemented as an ELBG module initialised by a FOSART clustering network. In the first classification stage the number of input units is equal to the number D of input spectral bands considered, whereas the number of output nodes depends on the FOSART input parameter ρ. Increasing values of the input vigilance threshold ρ are employed until the testing Overall Classification Accuracy (OA) of the TSH classification system remains constant or starts decreasing. Table 2 shows the training and testing results of this first TSH classifier when the T1 MP-RAGE, T2-SE and PD-SE image bands are used as input. The number of output nodes detected by FOSART and the different ρ values employed by FOSART are reported in the first and second columns of Table 2, respectively. The number of epochs required to train the TSH system is shown in the third column of Table 2 as the sum of the FOSART and ELBG training epochs. The OA percentage values of FOSART and ELBG are reported in columns 4 and 5 for training data and in columns 6 and 7 for testing data, respectively. Figure 2(a) shows the classified image obtained by the first TSH system that employs 174 output clusters and the T1 MP-RAGE, T2-SE, PD-SE bands as input. A lower qualitative and quantitative performance is obtained when the traditional T1-SE image replaces the T1 MP-RAGE band.

Table 2 ELBG Average Results. Input data: T1_MPR; PD_SE; T2_SE
Vigilance Threshold
Total iterations
17 42 174 248
0.002 0.005 0.015 0.02
3+9 3+10 3+9 3+10
Training Data OA(%) FOSART ELBG preprocessing
74.2 80.1 84.9 87.1
78.9 84.4 87.9 88.0
Test Data OA(%) FOSART ELBG OASt. preprocessing Dev
75.3 81.2 86.5 87.7
80.1 85.7 88.2 88.9
0.2 0.6 0.2 0.6
162
In the second set of experiments, the unsupervised first stage of the second TSH classifier is implemented as a SOM where the number of input nodes is set equal to the input space dimensionality D = 3, while the number of output nodes is set equal to the number of nodes detected in the first experiment by FOSART, to make any classification comparison between the two experiments consistent. Table 3 shows the training and testing results obtained by this second TSH classification system when the T1 MP-RAGE, T2-SE and PD-SE image bands are used as input. These results are almost equivalent to those shown in Table 2. The SOM learning rate α is set to 0.02 in all simulations and the number of training epochs is set equal to the total number of epochs required by the first TSH system to train. This number of epochs is considered sufficient for SOM (in fact, the OA values of SOM do not change significantly when the number of training epochs is increased). Figure 2(b) shows the image obtained by the second TSH classifier with the T1 MP-RAGE, T2-SE, PD-SE input bands and 174 output clusters. In terms of performance stability with respect to changes in the order and composition of the presentation sequence, SOM features an OA standard deviation of 0.7% during testing.

Table 3 SOM Average Results. Input data: T1_MPR; PD_SE; T2_SE, α = 0.02
Out Neurons  Iterations  Training Data OA(%)  Test Data OA(%)  OA St. Dev
17           13          80.7                 81.3             0.7
42           12          83.9                 85.1             0.7
174          13          87.3                 88.1             0.7
248          13          87.4                 88.4             0.5
4 Results and conclusions
In multi-spectral MRI classification tasks, a TSH classification system performs better when a T1 MP-RAGE image, featuring high anatomical definition, replaces the traditional T1-SE band. Exploitation of the whole set of MR image bands does not significantly improve the TSH classification performance. In our experiments, summarized in Tables 2 and 3 where, respectively, ELBG and SOM are employed as the clustering stage of the TSH classification system, similar performances are obtained in terms of OA. However, the ELBG module performs better than SOM in terms of MSE minimization, especially when the number of training epochs is small, as shown in Figure 4. The absolute difference in MSE between ELBG and SOM decreases with the number of output clusters. In terms of performance stability, ELBG is more robust than SOM to changes in the order and composition of the presentation sequence.
Figure 4 MSE of SOM and ELBG with 42 and 174 clusters.
Besides quantitative evaluations, Figures 2(a) and 2(b), generated by two TSH classification systems employing 174 clusters, are qualitatively compared by an expert neuroradiologist, who considers Figure 2(a), generated by ELBG, more significant than Figure 2(b), produced by SOM. In this example, SOM detects more false positives than ELBG, i.e., SOM tends to overestimate the lesion class, to which many interface areas located between white and grey matter are assigned. Both ELBG and SOM are incapable of detecting a right frontal lesion, which is visible in SE sequences but has a normal grey matter appearance in MP-RAGE. Our experiments in multi-spectral MR image labelling seem to indicate that: a) the ELBG and SOM clustering networks employed in the TSH classification scheme are equivalent in terms of classification accuracy; b) ELBG is better than SOM in
minimizing MSE at small epoch numbers; c) ELBG is less sensitive to noise and/or false positives, and this allows a more correct identification of multiple sclerosis lesions; d) ELBG is more stable than SOM with respect to small changes in the order and composition of the presentation sequence. Future work will assess the utility of interslice and intersubject MR data in the detection of multiple sclerosis lesions by means of two-stage supervised learning classifiers where both classification stages employ labelled data pairs to train. In this type of classifier, the density of clusters (basis functions) is made independent of the input vector density but dependent on the complexity of the (input, output) mapping at hand, to avoid generation of mixed clusters of input vectors that are closely spaced in input space but belong to different classes.

References

1. A. Baraldi and P. Blonda, A survey on fuzzy neural networks for pattern recognition: Part I, IEEE Trans. Systems, Man and Cybernetics - Part B: Cybernetics 29 (1999) 778-785.
2. A. Baraldi and P. Blonda, A survey on fuzzy neural networks for pattern recognition: Part II, IEEE Trans. Systems, Man and Cybernetics - Part B: Cybernetics 29 (1999) 786-801.
3. J. C. Bezdek and N. R. Pal, Two soft relatives of learning vector quantization, Neural Networks 8 (1995) 729-743.
4. P. Blonda, V. la Forgia, G. Pasquariello, and G. Satalino, Feature extraction and pattern classification of remote sensing data by a modular neural system, Optical Engineering 35 (1996) 536-542.
5. M. Brant-Zawadzki, G. D. Gillan, and W. R. Nitz, MP RAGE: a three-dimensional, T1-weighted, gradient-echo sequence. Initial experience in the brain, Radiology 182 (1992) 769-775.
6. B. Johnston, M. S. Atkins, B. Mackiewich, and M. Anderson, Segmentation of multiple sclerosis lesions in intensity corrected multispectral MRI, IEEE Trans. on Medical Imaging 15 (1996) 154-169.
7. T. Kohonen, Self-Organizing Maps (Springer Verlag, Berlin, 1995).
8. Y. Linde, A. Buzo, and R. M. Gray, An algorithm for vector quantizer design, IEEE Trans. on Communications 28 (1980) 84-94.
9. T. Martinetz and K. Schulten, Topology representing networks, Neural Networks 7 (1994) 507-522.
10. M. Russo and G. Patane, The Enhanced-LBG algorithm, Neural Networks 14 (2001) 1219-1237.
MONITORING RESPIRATORY MECHANICS USING ARTIFICIAL NEURAL NETWORKS

G. PERCHIAZZI, G. HEDENSTIERNA
Department of Clinical Physiology, Uppsala University Hospital, S-75185 Uppsala, Sweden
e-mail: [email protected]

A. VENA, L. RUGGIERO, R. GIULIANI AND T. FIORE
Department of Emergency and Transplantation, Bari University, Policlinico Hospital, Piazza Giulio Cesare, 11, 70124 Bari, Italy
Application of mechanical ventilation requires reliable tools for extracting respiratory mechanics. Artificial neural networks (ANN) have been used by the authors to perform this specific task. In the reported experiments, ANN have shown a good performance on both simulated and real tracings of mechanical ventilation recordings. These results suggest that ANN may play a role in the development of future bed-side monitoring tools.
1 Introduction
In animal species, the major task of the respiratory system is to exchange gases between blood and atmosphere. In order to perform this particular task, the system works like a bellows: when the inspiratory muscles contract, the intra-thoracic volume increases and a negative pressure in the airways is generated. The difference in pressure between atmosphere and airways determines a gas flow towards the internal, gas-exchanging part of the lung (alveoli). Different pathologic conditions can affect this system. Situations that impair the capacity of exchanging gas ("lung failure") or the efficiency of the gas flow dynamics ("pump failure") may require mechanical ventilation. It consists in making the patient exchange gases by using an endotracheal tube connected to an external cyclic pump ("mechanical ventilator"). The respiratory system is composed of a conduction system (devoted mainly to conveying gas to the respiratory part of the lung) and a respiratory part (where the gas exchange effectively takes place). In relation to gas dynamics during artificial ventilation, the mechanical properties of the respiratory system that have medical importance are resistance (R_RS) and compliance (C_RS). The change of R_RS and C_RS from their normal values is an indicator of potential pathology. Different techniques have been proposed to monitor R_RS and C_RS during ongoing mechanical ventilation. Among them, the most used is the Interrupted Flow Technique (IFT). See figure 1. When the flow is constant, the interruption of its delivery will cause a fall in pressure that is related to the resistive properties of the
respiratory system. Maintaining a constant gas volume in the lungs (preventing the patient from expiring) for some seconds, the recorded pressure in the airways after a transient is related mainly to the elastic components of the lung. Although the described technique remains the gold standard for measuring respiratory mechanics in ventilated patients, new approaches are necessary. The weak point of IFT is the necessity of performing a maneuver on a ventilated patient, interrupting the sequence of ventilation and requiring an operator that pushes an inspiratory-hold button (end-Inspiratory Hold Maneuver, e-IHM).
Figure 1: Interrupted Flow Technique. The airway pressure (cmH2O) and inspiratory flow (l/s) tracings define C_RS = Tidal Volume / (P_PLAT - PEEP) and R_RS = (P_PEAK - P_DROP) / Inspiratory Flow.

The necessity is felt of a tool capable of constant monitoring of respiratory mechanics, during ongoing mechanical ventilation, with features of robustness and noise immunity. We tested an Artificial Neural Network (ANN) based technology for extracting respiratory system mechanics during ongoing mechanical ventilation.
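The two relations in Figure 1 are simple ratios and can be transcribed directly; the function below is a sketch with hypothetical variable names (pressures in cmH2O, volume in litres, flow in l/s), not code from the study.

```python
def interrupted_flow_mechanics(p_peak, p_drop, p_plat, peep,
                               tidal_volume, insp_flow):
    """Static compliance and resistance from an end-inspiratory hold (Figure 1)."""
    c_rs = tidal_volume / (p_plat - peep)       # compliance, l/cmH2O
    r_rs = (p_peak - p_drop) / insp_flow        # resistance, cmH2O/(l/s)
    return c_rs, r_rs
```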
2 Review of Literature
ANN-based technologies have been extensively used in different medical applications. A review of the literature was published in three issues of "The Lancet" [3,5,6], where it is possible to read that most of these applications regard computer-aided decision tools. Coming to signal analysis, it is possible to note that a large variety of tracings has been studied using ANNs: electrocardiograms [7], electromyograms [1], electroencephalograms [9], arterial pulse waveforms [2] and evoked potentials in anesthesia [8]. The application of ANNs to respiratory signal analysis shows that most of the efforts have been concentrated on pattern identification problems. Leon and Lorini [10] investigated the capability of ANNs to identify spontaneous and pressure support ventilation modes from respiratory signals. Wilks and English [21] used ANNs to classify the efficiency of respiratory patterns, in order to predict changes of the O2 saturation. Snowden et al. [20] fed an ANN with blood gas parameters and the ventilator settings that determined them, in order to obtain new ventilator settings. Bright et al. [4] have described the use of an ANN to identify upper airway obstruction from flow-volume loops. Leon et al. [11] developed an ANN-based system to detect esophageal intubation using airway flow and pressure signals. Rasanen and Leon [19], giving the respiratory tracings of healthy and oleic acid injured lungs of dogs, trained an ANN to assess the presence of lung damage and its extent.
Experiences of the authors
Aim of the experiments reported here was to test whether ANNs can assess respiratory system compliance and resistance starting from airway pressure and flow (PAW, F A W ) . TO train an ANN it is necessary to provide examples of the tracings to be faced during its use. In a preliminary phase, we used a model of the respiratory system developed on a computer via software, inspired to the studies of Otis [13,17] (See figure 2). The model provided curves that were obtained under different mechanical conditions. The ANN had to learn to associate the curves with the RRS and CRS that determined them. We implemented simulations of mechanical ventilation, varying for mechanical parameters and ventilatory support. These first experiments showed the applicability of the method [15,16] .Then we decided to evaluate the performances of the method in noise-affected conditions. In a joint project of the Department of Clinical Physiology of Uppsala University (Sweden) and the Department of Emergency and Transplantation of Bari University (Italy), we studied an animal model of acute lung injury (ALI). When moving to biological models, the first problem to face is to provide the large amount of examples to be used for ANN training.
168
Figure 2: The Otis model of the lung
Our idea was to use a well known effect of a substance, oleic acid (OA) when injected in a central vein of an animal. It modifies, by acting on the lung structures, the mechanical properties of the lung, creating a time related damage starting at its administration (see Neumann et al., [12] ). Ten pigs were ventilated in Volume Controlled - Constant Flow Mechanical Ventilation (VC-CFMV) and ALI was induced by multiple OA injections. We recorded PAW and FAW at different time intervals, in order to have different snapshots of respiratory mechanics while the damage was establishing. The ANN had to extract RRS and CRS from the recorded curves (that presented an e-IHM). During the training phase, the curves plus the expected RRS and CRs were given at the same time to the ANN. The expected RRS and CRS were obtained by applying on each curve the IFT performed manually by an expert. Then the trained ANN was tested: only the tracings were given and the yielded results were compared to the expected ones. The ANN was successfully trained. At this point we fed the ANN with tracings coming from a new group of four pigs. The aim was to observe its performance in a prospective way. Performance on the assessment of CRS remained very high, adjustment of ANN implementation was suggested for the assessment of RRS. The results were published in The Journal of Applied Physiology [14]. The described experiments demonstrated the applicability of the method by comparing the gold standard (the IFT) and ANN-based technologies on curves having an e-IHM. A further step was to train an ANN to extract CRS from breaths not having an e-IHM. Twentyfour pigs, ventilated in VC-CFMV, were studied. They underwent ALI induction by multiple OA injections. At different time intervals, recordings of
169
more than ten breaths were obtained during steady state (see figure 3). At the end of each series, an e-IHM was performed. This last breath was used to calculate C_RS according to the IFT. The breath preceding the one having the e-IHM (and not having any flow interruption) had to be given to the ANN (Dynamic Breath, DB). We gave to the ANN the Pressure/Volume loop of each DB and the C_RS calculated on the successive breath (this last having an e-IHM). The ANN had to associate the DB to the static C_RS obtained by the IFT on the successive breath. The results showed that ANNs were able to extract static C_RS without needing to stop inspiratory flow [18].
Figure 3: Experimental design for training ANNs without using e-IHM
Conclusions

These experiments show that it is possible to extract lung mechanics variables from respiratory tracings by applying ANN-based technologies. Future work has to be focused on taking advantage of the robustness and noise immunity of ANNs. In the field of mechanical ventilation a new clinical necessity is arising: the aim to control a ventilator in "closed-loop". It concerns the use of information coming on-line from the connected patient (such as mechanics variables and blood gas partial pressures) to titrate the ventilation strategy, breath by breath. The capability of interfacing complex variables and the claimed robustness of their performance suggest that ANNs will play a role in these future applications: both in information extraction and in signal integration.
References

1. Abel, E.W., P.C. Zacharia, A. Forster, and T.L. Farrow. Neural network analysis of the EMG interference pattern. Med Eng Phys 18 (1996) pp. 12-17.
2. Allen, J. and A. Murray. Comparison of three arterial pulse waveform classification techniques. J Med Eng Technol 20 (1996) pp. 109-114.
3. Baxt, W.G. Application of artificial neural networks to clinical medicine. Lancet 346 (1995) pp. 1135-1138.
4. Bright, P., M.R. Miller, J.A. Franklyn, and M.C. Sheppard. The use of a neural network to detect upper airway obstruction caused by goiter. Am J Respir Crit Care Med 157 (1998) pp. 1885-1891.
5. Cross, S.S., R.F. Harrison, and R.L. Kennedy. Introduction to neural networks. Lancet 346 (1995) pp. 1075-1079.
6. Dybowski, R. and V. Gant. Artificial neural networks in pathology and medical laboratories. Lancet 346 (1995) pp. 1203-1207.
7. Heden, B., H. Olin, R. Rittner, and L. Edenbrandt. Acute myocardial infarction detected in the 12-lead ECG by artificial neural networks. Circulation 96 (1997) pp. 1798-1802.
8. Huang, J.W., Y.Y. Lu, A. Nayak, and R.J. Roy. Depth of anesthesia estimation and control. IEEE Transactions on Biomedical Engineering 46 (1999) pp. 71-81.
9. Jando, G., R.M. Siegel, Z. Horvath, and G. Buzsaki. Pattern recognition of the electroencephalogram by artificial neural networks. Electroencephalogr Clin Neurophysiol 86 (1993) pp. 100-109.
10. Leon, M.A. and F.L. Lorini. Ventilation mode recognition using artificial neural networks. Comp Biomed Res 30 (1997) pp. 373-378.
11. Leon, M.A., J. Rasanen, and D. Mangar. Neural network-based detection of esophageal intubation. Anesth Analg 78 (1994) pp. 548-553.
12. Neumann, P., J.E. Berglund, E.F. Mondejar, A. Magnusson, and G. Hedenstierna. Dynamics of lung collapse and recruitment during prolonged breathing in porcine lung injury. J Appl Physiol 85 (1998) pp. 1533-1543.
13. Otis, A.B., C.B. McKerrow, R.A. Bartlett, J. Mead, M.B. McIlroy, N.J. Selverstone, and E.P. Radford. Mechanical factors in distribution of pulmonary ventilation. J Appl Physiol (1956) pp. 427-443.
14. Perchiazzi, G., M. Hogman, C. Rylander, R. Giuliani, T. Fiore, and G. Hedenstierna. Assessment of respiratory system mechanics by artificial neural networks: an exploratory study. J Appl Physiol 90 (2001) pp. 1817-1824.
15. Perchiazzi, G., L. Indelicato, N. D'Onghia, C. Coniglio, A.M. Fanelli, and R. Giuliani. Assessing respiratory mechanics of inhomogeneous lungs using artificial neural network: network design. Proceedings of APICE Congress (1998) pp. 209-212.
16. Perchiazzi, G., L. Indelicato, N. D'Onghia, E. De Feo, A.M. Fanelli, and R. Giuliani. Assessing respiratory mechanics of inhomogeneous lungs using artificial neural network: preliminary results. Proceedings of APICE Congress (1998) pp. 213-216.
17. Perchiazzi, G., S. Martino, G. Contino, M.E. Rosafio, F. Puntillo, V.M. Ranieri, and R. Giuliani. Alveolar overdistension during constant flow ventilation: study of a model. Acta Anaesthesiol Scand 40 (1996) p. A210.
18. Perchiazzi, G., L. Ruggiero, M. Hogman, R. Giuliani, T. Fiore, and G. Hedenstierna. Neural networks extract respiratory system compliance without needing to stop respiratory flow. Intensive Care Med 26 (2000) p. S294.
19. Rasanen, J. and M. Leon. Detection of lung injury with conventional and neural network-based analysis of continuous data. J Clin Monit 14 (1998) pp. 433-439.
20. Snowden, S., K.G. Brownlee, S.W. Smye, and P.R.F. Dear. An advisory system for artificial ventilation of the newborn utilizing a neural network. Med Inform 18 (1993) pp. 367-376.
21. Wilks, P.A.D. and M.J. English. A system for rapid identification of respiratory abnormalities using a neural network. Med Eng Phys 17 (1995) pp. 551-555.
GENOMICS AND MOLECULAR BIOLOGY
CLUSTER ANALYSIS OF DNA-CHIP DATA

EYTAN DOMANY

Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
E-mail: [email protected]
DNA chips are novel experimental tools that have revolutionized research in molecular biology and generated considerable excitement. A single chip allows simultaneous measurement of the level at which thousands of genes are expressed. A typical experiment uses a few tens of such chips, each focusing on one sample, such as material extracted from a particular tumor. Hence the results of such an experiment contain several hundred thousand numbers, which come in the form of a table of several thousand rows (one for each gene) and 50-100 columns (one for each sample). We developed a clustering methodology to mine such data. I provide here a very basic introduction to the subject, with no prior knowledge of any biology assumed. I will explain what genes are, what gene expression is and how it is measured by DNA chips. I will also explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments. I will present results obtained from analysis of data obtained from brain tumors and breast cancer.
1 Introduction

This talk and the accompanying paper have three parts, aimed at explaining the meaning of the title. The first part is a crash course in biology, starting from genes and transcription and ending with an explanation of what DNA chips are. The second part is an equally concise introduction to cluster analysis, leading to a recently introduced method, Coupled Two-Way Clustering (CTWC), that was designed for the analysis and mining of data obtained by DNA chips. The third section puts the two introductory parts together and demonstrates how CTWC is used to obtain insights from the analysis of gene expression data in several clinically relevant contexts, such as colon cancer and leukemia.

2 A Crash Course in Biology

2.1 Gene Expression
I present here a severely oversimplified description of many very complex processes. My aim is to introduce only those concepts that are absolutely essential for understanding the data that will be presented and analyzed. The interested reader is referred to two excellent textbooks 1,2.
Figure 1. Caricature of a eucaryotic cell: its nucleus contains DNA, whereas the ribosomes are in the cytoplasm.
Fig 1 depicts a schematic drawing of a eucaryotic cell, enclosed by its membrane. Embedded in the cell's cytoplasm is its nucleus, surrounded and protected by its own membrane. The nucleus contains DNA, a one-dimensional molecule, made of two complementary strands, coiled around each other as a double helix. Each strand consists of a backbone to which a linear sequence of bases is attached. There are four kinds of bases, denoted by C, G, A, T. The two strands contain complementary base sequences and are held together by hydrogen bonds that connect two matching pairs of bases: G-C and A-T. A gene is a segment of DNA which contains the formula for the chemical composition of one particular protein. Proteins are the working molecules of life; nearly every biological function is carried out by a protein. Topologically, a protein is also a chain, each link of which is one of 20 amino acids, connected head to tail by covalent peptide bonds. A gene is nothing but an alphabetic cookbook recipe, listing the order in which the amino acids are to be strung when the corresponding protein is synthesized. Genetic information is encoded in the DNA molecule, in the linear sequence in which the bases on the two strands are ordered; a triplet of three consecutive bases codes for one particular amino acid. The genome is the collection of all the chemical formulae for the proteins that an organism needs and produces. The genome of a simple organism such as yeast contains about 6400 genes; the human genome has between 40,000 and 60,000. An overwhelming majority (98%) of human DNA consists of noncoding regions (introns), i.e. strands that do not code for any particular protein. Here is an amazing fact: every cell of a multicellular organism contains its entire genome! That is, every cell has the entire set of recipes the organism may ever need; the nucleus of each of the reader's cells contains every piece of information needed to make a copy (clone) of him/her! Clearly, cells of a complex organism, taken from different organs, have entirely different functions and the proteins that perform these functions are very different; cells in our retina need photosensitive molecules, whereas our livers do not make much use of these.
Figure 2. Transcription involves synthesis of mRNA, a copy of the gene encoded on the DNA (left). The mRNA molecules leave the nucleus and serve as the template for protein synthesis by the ribosomes (right).
A gene is expressed in a cell when the protein it codes for is actually synthesized. There will be differences between the expression profiles of different cells, and even in a single cell there are variations of expression that are dictated by external and internal signals that reflect the state of the organism and the cell itself. Synthesis of proteins takes place at the ribosomes. These are enormous machines (made also of proteins) that read the chemical formulae written on the DNA and synthesize the protein according to the instructions. The ribosomes are in the cytoplasm, whereas the DNA is in the protected environment of the nucleus. This poses an immediate logistic problem - how does the information get transferred from the nucleus to the ribosome?
2.2 Transcription
The obvious solution of information transfer would be to rip out the piece of DNA that contains the gene that is to be expressed, and transport it to the cytoplasm. The engineering analogue of this strategy is the following. Imagine an architect who has a single copy of a design for a building, stored on the hard disk of his PC. Now he has to transfer the blueprint to the construction site, in a different city. He probably will not opt for tearing out his hard disk and mailing it to the site, risking it being irreversibly lost or corrupted. Rather, he will prepare several diskettes that contain copies of his design, and mail these in separate envelopes. This is precisely the strategy adopted by cells. When a gene receives a command to be expressed, the corresponding
double helix of DNA opens, and a precise copy of the information, as written on one of the strands, is prepared (see Fig 2). This "diskette" is a linear molecule called messenger RNA (mRNA), and the process of its production, subsequent reading by the ribosome and synthesis of the corresponding protein a is called transcription. In fact, when many molecules of a certain protein are needed, the cell produces many corresponding mRNAs, which are transferred through the nucleus' membrane to the cytoplasm and are "read" by several ribosomes. Thus the single master copy of the instructions, contained in the DNA, generates many copies of the protein (see Fig 2). This transcription strategy is prudent and safe, preserving the precious master copy; at the same time it also serves as a remarkable amplifier of the genetic information. A cell may need a large number of some proteins and a small number of others. That is, every gene may be expressed at a different level. The manner in which the instructions to start and stop transcription are given for a certain gene is governed by regulatory networks, which constitute one of the most intricate and fascinating subjects of current research. Transcription is regulated by special proteins, called transcription factors, which bind to specific locations on the DNA, upstream from the coding region. Their presence at the right site initiates or suppresses transcription. This leads us to the basic paradigm of gene expression analysis: the "biological state" of a cell (or tissue) and the ongoing biological processes are reflected by its expression profile: the expression levels of all the genes of the genome. These, in turn, are reflected in the concentrations of the corresponding mRNA molecules. This paradigm is by no means trivial or perfectly true. One may argue that the state of a cell at a given moment is defined by its chemical composition, i.e. the concentration of all the constituent proteins. There is no assurance that these concentrations are directly proportional to the concentrations of the related mRNA molecules. The rates of degradation of the different mRNAs, the efficiency of their transcription to proteins, the rate of degradation of the proteins - all these may vary. Nevertheless, this is our working assumption; specifically, we assume that for human cells the expression levels of all 40,000 genes completely specify the state of the particular tissue from which the cells were taken. The question we turn to answer is

a Actually the mRNA is "read" by one end of another molecule, transfer RNA; the amino acid that corresponds to the triplet of bases that has just been read is attached to the other end of the tRNA. This process, and the formation of the peptide bond between subsequent amino acids, takes place on the ribosome, which moves along the mRNA as it is read.
- how does one measure, for a given cell or tissue, the expression levels of thousands of genes?
2.3 DNA chips
A DNA chip is the instrument that measures simultaneously the concentration of thousands of different mRNA molecules. It is also referred to as a DNA microarray or macroarray, depending on the number of genes measured (see 3 for a recent review of the technology). DNA macroarrays, produced by Affymetrix 4, can measure simultaneously the expression levels of up to 12,000 genes; the less expensive spotted arrays 5 do the same for several thousand. Schematically, this is done by dividing a chip (a glass plate of about 1 cm across) into "pixels", each dedicated to one gene g. Billions of pieces of single-strand DNA taken from g are attached to the dedicated pixel. The mRNA molecules are extracted from cells taken from the tissue of interest (such as tumor tissue obtained by surgery) and their concentration is largely enhanced. Fluorescent markers are attached to these mRNA molecules. The solution of marked and enhanced mRNA molecules is placed on the chip, and the mRNA molecules, originally extracted from the tissue, diffuse over the dense forest of single-strand DNA placed on the chip. When such an mRNA encounters a part of the gene of which it is a perfect copy, it attaches to it - hybridizes - with a high affinity (considerably higher than with a bit of DNA of which it is not a perfect copy). When the mRNA solution is washed off, only those molecules that found their perfect match remain stuck to the chip. Now the chip is illuminated with a laser, and these stuck probes fluoresce; by measuring the light intensity emanating from each pixel, one obtains a measure of the number of probes that stuck, which, in turn, is proportional to the concentration of these mRNA in the investigated tissue. In this manner one obtains, from a chip on which Ng genes were placed, Ng numbers that represent the expression levels of these genes in that tissue. A typical experiment provides the expression profiles of several tens of samples (say Ns ≈ 100), over several thousand (Ng) genes. These results are summarized in an Ng x Ns expression table; each row corresponds to one particular gene and each column to a sample. Entry Ags of such an expression table stands for the expression level of gene g in sample s. For example, the experiment on colon cancer, first reported by Alon et al 6, contains Ng = 2000 genes whose expression levels passed some threshold, over Ns = 62 samples, 40 of which were taken from tumor and 22 from normal colon tissue. Such an expression table contains up to several hundred thousand numbers; the main issue addressed in this paper concerns the manner in which
such vast amounts of data are "mined", to extract from them biologically relevant meaning. Several obvious aims of the data analysis are the following:

1. Identify genes whose expression levels reflect biological processes of interest (such as development of cancer).

2. Group the tumors into classes that can be differentiated on the basis of their expression profiles, possibly in a way that can be interpreted in terms of clinical classification. If one can partition tumors, on the basis of their expression levels, into relevant classes (such as e.g. positive vs negative responders to a particular treatment), the classification obtained from expression analysis can be used as a diagnostic and therapeutic tool b.

3. Finally, the analysis can provide clues and guesses for the function of genes (proteins) of yet unknown role c.

This concludes the brief and very oversimplified review of the biology background that is essential to understand the aims of this research. In what follows I present a method designed for mining such expression data.

b For example one hopes to use the expression profile of a tumor to select the most effective therapy.
c The statement "the human genome has been solved" means that the sequences of 40,000 genes are known, from which the chemical formulae of 40,000 proteins can be obtained. Their biological function, however, remains largely unknown.
3 Cluster Analysis

3.1 Supervised versus unsupervised analysis
Say we have two groups of samples that have been labeled on the basis of some external (i.e. not contained in the expression table) information, such as clinical identification of tumor and normal samples, and our aim is to identify genes whose expression levels are significantly different for these two groups. Supervised analysis is the most suitable method for this kind of task. The simplest way is to treat the genes one at a time; for gene g we have Ns expression levels Ags, and we propose as a null hypothesis that these numbers were picked at random, from the same distribution, for all samples s. There are well established methods to test the validity of such a hypothesis and to calculate for each gene a statistic whose value indicates whether the null hypothesis should be accepted or rejected, as well as the probability Pg for error (i.e. for rejecting the null hypothesis on the basis of the data,
even though it is correct). An alternative supervised analysis uses a subset of the tissues of known clinical label to train a neural network to separate them into the known classes on the basis of their expression profiles. The generalization ability of the network is then estimated by classifying a test set of samples (whose correct labels are also known) that was not used in the training process. The main disadvantage of supervised methods is their being limited to hypothesis testing. If one has some prior knowledge which can lead to a hypothesis, supervised methods will help to accept or reject it. They will never reveal the unexpected and never lead to new hypotheses, or to new partitions of the data. For example, if the tumors break into two unanticipated classes on the basis of their expression profiles, a supervised method will not be able to discover this. Another shortcoming is the (rather common) possibility of misclassification of some samples. A supervised method will not discover, in general, samples that were mistakenly labeled and used in, say, the training set. The alternative is to use unsupervised methods of analysis. These aim at exploratory analysis of the data, introducing as little external knowledge or bias as possible, and "let the data speak". That is, we explore the structure of the data on the basis of correlations and similarities that are present in it. In the context of gene expression, such analysis has two obvious goals:

1. Find groups of genes that have correlated expression profiles. The members of such a group may take part in the same biological process.

2. Divide the tissues into groups with similar gene expression profiles. Tissues that belong to one group are expected to be in the same biological (e.g. clinical) state.

The method presented here to accomplish these aims is called clustering.
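As a concrete illustration of the gene-by-gene supervised test described above, the sketch below applies a two-sample t-test to every row of a simulated expression table; the table sizes mirror the colon cancer example, but the data, the choice of test statistic and the significance threshold are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Simulated Ng x Ns expression table A_gs and externally given labels
rng = np.random.default_rng(1)
A = rng.normal(size=(2000, 62))
labels = np.array([0] * 40 + [1] * 22)       # e.g. tumor vs normal

# One t statistic and one error probability P_g per gene
t, p = stats.ttest_ind(A[:, labels == 0], A[:, labels == 1], axis=1)

# Reject the null hypothesis for small P_g (Bonferroni-corrected here,
# to account for the Ng tests performed)
significant = np.where(p < 0.05 / len(A))[0]
print(len(significant), "genes separate the two groups")
```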
3.2 Clustering - statement of the problem
The aims of cluster analysis 7,8 can be stated as follows: given N data points x_i, i = 1, ..., N, embedded in a D-dimensional space (i.e. each point is represented by D components or coordinates), identify the underlying structure of the data. That is, partition the N points into M clusters, such that points that belong to the same cluster are "more similar" to each other than two points that belong to different clusters. In other words, one aims to determine whether the N points form a single "cloud", or two, or more; in respectable unsupervised methods the number of clusters, M, is also determined by the algorithm.
Figure 3. Left: Each zebra or giraffe is represented as a point on the neck length - coloration shape plane. The points form two clouds marked by the black ellipses. At higher resolution (controlled by the parameter T), we notice that the cloud of the giraffes is in fact composed of two slightly separated sub clouds. The corresponding dendrogram is presented on the right hand side.
The clustering problem, as stated above, is clearly ill posed. No definition was given for what is "more similar"; furthermore, as we will see, the manner in which data points are assigned to clusters depends on the resolution at which the data are viewed. The last concern is addressed by generating a dendrogram or tree of clusters, whose number and composition varies with the resolution that is used. To clarify these points I present a simple example for a process of "learning without a teacher", of which clustering constitutes a particular case. Imagine the following experiment: find a child who has never seen either a giraffe or a zebra, and expose him to a large number of pictures of these animals without saying a word of instruction. On each animal shown the child performs a series of D measurements, two of which are most certainly L, the length of the neck, and E, the eccentricity of the coloration (i.e. the ratio of the small dimension and the large). Each animal is represented, in the child's brain, as a point in a D-dimensional space. Fig. 3 depicts the projection of these points on the two-dimensional (L,E) subspace. Even though initially the child will see "animals" - i.e. assign all points to a single cloud - with time he will realize (as his resolution improves) that
in fact the data break into two clear clouds; one with small values of L and E, corresponding to the zebras, and the second - the giraffes - with large L and E ≈ 1. The child, not having been instructed, will not know the names of the two kinds of animals he was exposed to, but I have no doubt that he will realize that the pictures were taken of two different kinds of creatures. He has performed a clustering operation on the visual data he has been presented with. Let us pause and consider the data and the statements that were made. Are there indeed two clouds in Fig 3? As we already said, when the data are seen with low resolution, they appear to belong to a single cloud of animals. Improved resolution leads to two clouds - and closer inspection reveals that in fact the cloud of giraffes breaks into two sub-clouds, of points that have similar colorations but different neck lengths! Apparently there were mature fully developed giraffes with long necks, and a group of young giraffes with shorter necks. Finally, when resolution is improved to the level of discerning individual differences between animals, each one forms its own cluster. Thus the proper way of representing the structure of the data is in the form of a dendrogram, also shown in Fig 3. The vertical axis corresponds to a parameter T that represents the resolution at which the data are viewed. The horizontal axis is nominal - it presents a linear ordering of the individual data points (as identified by the final partition, in which each cluster consists of one individual point). The ordering is determined by the entire dendrogram - it can be thought of as a highly nonlinear mapping of the data from D to one dimension. In any clustering algorithm that we use, we should look for the two features mentioned here, of (a) yielding a dendrogram that starts with a single cluster of N points and ends with N single-point clusters, and (b) providing a one-dimensional ordering of the data.
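The zebra/giraffe picture can be reproduced in a few lines with a standard agglomerative routine. The following is only a toy sketch (synthetic data, average linkage - not the SPC algorithm introduced below) showing how cutting one dendrogram at different heights plays the role of the resolution parameter T:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy (L, E) data: zebras, adult giraffes and young giraffes
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([1.0, 0.2], 0.05, size=(20, 2)),   # zebras
               rng.normal([3.0, 1.0], 0.05, size=(15, 2)),   # adult giraffes
               rng.normal([2.2, 1.0], 0.05, size=(15, 2))])  # young giraffes

Z = linkage(X, method="average")   # the full dendrogram
# Coarse resolution: two clouds; finer resolution: three sub-clouds
print(fcluster(Z, t=2, criterion="maxclust"))
print(fcluster(Z, t=3, criterion="maxclust"))
```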
3.3 Clustering Algorithms
There are numerous clustering algorithms. Even though each aims at achieving a truly unsupervised and objective method, every one has built in, implicitly or explicitly, the bias of its inventor as to how a "cluster" should look - e.g. a tight, spherical cloud, or a continuous region of high relative density and arbitrary shape, etc. Average linkage 7, an agglomerative hierarchical algorithm that joins pairs of clusters on the basis of their proximity, is the most widely used for gene expression analysis 9. K-means 7,8 and Self-Organized Maps 10 are algorithms that identify centroids or representatives for a preset number of groups; data points are assigned to clusters on the basis of their distances from the centroids.
There are several physics-related clustering algorithms, e.g. Deterministic Annealing 11 and Coupled Maps 12. Deterministic Annealing uses the same cost function as K-means, but rather than minimizing it for a fixed number of clusters K, it performs a statistical mechanics type analysis, using a maximum entropy principle as its starting point. The resulting free energy is a complex function of the number of centroids and their locations, which are calculated by a minimization process. This minimization is done by lowering the temperature variable slowly and following minima that move and every now and then split (corresponding to a second order phase transition). Since it has been proved that in the generic case the free energy function exhibits first order transitions, the deterministic annealing procedure is likely to follow one of its local minima. We use another physics-motivated algorithm, which maps the clustering problem onto the statistical physics of granular ferromagnets 13.
3.4 Superparamagnetic Clustering (SPC) 14
The algorithm assigns a Potts spin S_i to each data point i. We use q = 20 components; the results depend very weakly on q. The distance matrix

$$ D_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \| \qquad (1) $$
is constructed. For each spin we identify a set of neighbors; a pair of neighbors interacts by a ferromagnetic coupling J_ij = f(D_ij), with f a decreasing function. We used a Gaussian decay, but since the interaction between non-neighbors is set to J = 0, the precise form of the function has little influence on the results. The energy of a spin configuration {S} is given by

$$ \mathcal{H}[\{S\}] = \sum_{\langle i,j \rangle} J_{ij}\left[1 - \delta(S_i,S_j)\right] \qquad (2) $$
The summation runs over pairs of neighbors. We perform a Monte Carlo simulation of this disordered Potts ferromagnet at a series of temperatures. At each temperature T we measure the spin-spin correlation for every pair of neighbors,

$$ G_{ij} = \left\langle \frac{\delta(S_i,S_j) - 1/q}{1 - 1/q} \right\rangle \qquad (3) $$
where the brackets ⟨·⟩ represent an equilibrium average of the ferromagnet (2), measured at T. If i and j belong to the same ordered "grain", we will have G_ij ≈ 1, whereas if the two spins are uncorrelated, G_ij ≈ 0. Hence we threshold the values of G_ij; if G_ij > 0.5 the data points i and j are connected
by an edge. The clusters obtained at temperature T are the connected components of the resulting graph. In fact, the simple thresholding is supplemented by a "directed growth" process, described elsewhere. At T = 0 the system is in its ground state, all S_i have the same value, and this procedure generates a single cluster of all N points. At T = ∞ we have N independent spins, all pairs of points are uncorrelated and the procedure yields N clusters, with a single point in each. Hence clearly T controls the resolution at which the data are viewed; as it increases, we generate a dendrogram of clusters of decreasing sizes. This algorithm has several attractive features: (i) the number of clusters is determined by the algorithm itself and not externally prescribed; (ii) stability against noise in the data; (iii) the ability to identify a dense set of points, that form a cloud of an irregular, non-spherical shape, as a cluster; (iv) generating a hierarchy (dendrogram) and providing a mechanism to identify in it robust, stable clusters. The physical basis for the last feature is that if a cluster is made of a dense set of points on a background of lower density, well separated from other dense regions, it will form (become an independent magnetized grain) at a low temperature T_1 and dissociate into subclusters at a high temperature T_2. The ratio of the temperatures at which a cluster "dies" and "is born", R = T_2/T_1, is a measure of its stability. SPC was used in a variety of contexts, ranging from computer vision 15 to speech recognition 14. Its first direct application to gene expression data was 16 for analysis of the temporal dependence of the expression levels in a synchronized yeast culture 17,9, identifying gene clusters whose variation reflects the cell cycle. d Subsequently, SPC was used 18 to identify primary targets of p53, a tumor suppressor that acts as a transcription factor of central importance in human cancer. Our ability to identify stable (and statistically significant) clusters is of central importance for our usage of SPC in our algorithm for gene expression analysis.

d We have also discovered in this analysis that the samples taken at even-indexed time intervals were placed in a freezer!
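For readers who want to experiment, here is a compact sketch of the SPC scheme at a single temperature, using a Swendsen-Wang style update for the Potts model (2) and the thresholding of G_ij described above. It omits the neighbor-construction details, the "directed growth" step and the temperature sweep of the full algorithm 14, and all parameter values and helper names are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def spc_single_T(X, T, q=20, k=10, sweeps=400, seed=0):
    """Toy SPC at one temperature: returns a cluster label per data point."""
    rng = np.random.default_rng(seed)
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]          # k nearest neighbors
    a = D[np.arange(N)[:, None], nn].mean()         # local length scale
    edges = np.array(sorted({(min(i, j), max(i, j))
                             for i in range(N) for j in nn[i]}))
    J = np.exp(-D[edges[:, 0], edges[:, 1]] ** 2 / (2 * a ** 2))
    S = rng.integers(q, size=N)                     # Potts spins
    G = np.zeros(len(edges))
    for sweep in range(sweeps):
        # freeze satisfied bonds with probability 1 - exp(-J/T)
        same = S[edges[:, 0]] == S[edges[:, 1]]
        frozen = same & (rng.random(len(edges)) < 1 - np.exp(-J / T))
        A = csr_matrix((np.ones(frozen.sum()),
                        (edges[frozen, 0], edges[frozen, 1])), shape=(N, N))
        ncomp, comp = connected_components(A, directed=False)
        S = rng.integers(q, size=ncomp)[comp]       # flip SW clusters jointly
        if sweep >= sweeps // 2:                    # measure after burn-in
            G += S[edges[:, 0]] == S[edges[:, 1]]
    G = (G / (sweeps - sweeps // 2) - 1 / q) / (1 - 1 / q)   # estimate of (3)
    keep = edges[G > 0.5]                           # threshold G_ij
    A = csr_matrix((np.ones(len(keep)), (keep[:, 0], keep[:, 1])),
                   shape=(N, N))
    return connected_components(A, directed=False)[1]
```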
4 Clustering Gene Expression Data

4.1 Two way clustering
The clustering methodology described above can be put to use for analysis of gene expression data in a fairly straightforward way, bearing in mind the questions and aims mentioned above. We clearly have two main, seemingly distinct aims: to identify groups of co-regulated genes which probably belong to the same machinery or network, and to identify molecular characteristics of different clinical states and discriminators between them. The obvious way to go about these two tasks is by Two Way Clustering. First view the N samples as the objects to be clustered; each is represented by a point in a G-dimensional "feature space", where G is the number of genes for which expression levels were measured (in fact one works only with a subset of the genes on a chip - those that pass some preset filters). This analysis yields a dendrogram of samples, with each cluster containing samples with sizeable pairwise similarities of their expression profiles measured over the entire set of genes. The second way of looking at the same data is by considering the genes as the objects to be clustered: G data points embedded in an N-dimensional feature space. This analysis groups together genes on the basis of their correlations over the full set of samples. In Fig. 4 we present the results of two-way clustering of data obtained for 36 brain tumors (see the next section for details). We show here the expression matrix, with the rows corresponding to the genes and columns to samples. The dendrograms that correspond to the two clustering operations described above are shown next to the matrix, whose rows and columns have been already permuted according to the linear order imposed by the two dendrograms. This is the type of analysis that has been widely used in the gene expression clustering literature. It represents a holistic approach to the problem: using every piece of reliable information to look at the entire grand picture. This approach does have, however, several obvious shortcomings; overcoming these was the motivation to develop a method which can be viewed as taking a more reductionist approach, while improving significantly the signal to noise ratio of the processed data.
4.2 Coupled Two Way Clustering - Motivation
The main motivation for introducing CTWC 19 was to increase the signal to noise ratio of the expression data. There are two different kinds of "noise" the method is designed to overcome. The first of these is a problem generated by the very advantage and most exciting aspect of DNA chips - the ability to view expression levels of a very large number of genes simultaneously. Say one is left, after initial filtering, with two thousand genes, and one wishes to study a particular aspect of the samples (e.g. differentiating between several kinds of cancer). Chances are
Figure 4. Two-way clustering of brain tumor data; the two dendrograms, of genes and samples, are shown next to the expression matrix.
that the genes which participate in the pathology of interest constitute only a small subset of the total 2000 - say we have 40 genes whose expression indeed distinguishes the samples on the basis of the process that is studied. Hence the desired "signal" resides in 2% of the total genes that are analysed; the remaining 98% behave in a way that is uncorrelated with these and introduce nothing but noise. The contribution of the relevant genes to the distance between a pair of samples will be overwhelmed by the random signal of the much larger irrelevant set. My favorite example for this situation is that of a football stadium, in which 99,000 spectators scream at random, while 1000 others are singing a coherent tune. These 1000 are, however, scattered all over the stadium - the chance that a listener, standing at the center of the field, will be able to identify the tune is very small. If only we could identify the singers, concentrate them into one stand and point a directional microphone at them - we could hear the signal! In the language of gene expression analysis, we would like to identify the relevant subset of 40 genes, and use only their expression levels to characterize the samples. In other words, to project the data points representing the samples from the 2000-dimensional space in which they are embedded, down to a 40-dimensional subspace, and to assess the structure of the data (e.g. - do they form two or more distinct groups?) on the basis of this projected representation. A similar effect may arise due to the subjects; a partition of
the genes which is much more relevant to our aims could have been obtained had we used only a subset of the samples. Both these examples have to do with reducing the size of the feature space. Sometimes it is important to use the reduced set of features to cluster only a subset of the objects. For example, when we have expression profiles from two kinds of leukemia patients, ALL and AML, with the ALL patients breaking further into two sub-families, T-ALL and B-ALL, the separation of the latter two subclouds of points may be masked by the interpolating presence of the AML group. In other words, a special set of genes will reveal an internal structure of the ALL cloud only when the AML cloud is removed. These two statements amount to a need to work with special submatrices of the full expression matrix. The number of such submatrices is, however, exponential in the size of the dataset, and the obvious question that arises is how one can select the "right" submatrices in an unsupervised and yet efficient way. The CTWC algorithm provides a heuristic answer to this question.
4.3 Coupled Two Way Clustering - Implementation
CTWC is an iterative process, whose starting point is the standard two way clustering mentioned above. Denote the set of all samples by S1 and that of all genes used by G1. The notation S1(G1) stands for the clustering operation of all samples, using all genes, and G1(S1) for clustering the genes using all samples. From both clustering operations we identify stable clusters of genes and samples, i.e. those for which the stability index R exceeds a critical value and whose size is not too small. Stable gene clusters are denoted as GI, with I=2,3,..., and stable sample clusters as SJ, with J=2,3,... In the next iteration we use every gene cluster GI (including I=1) as the feature set, to characterize and cluster every sample set SJ. These operations are denoted by SJ(GI) (we clearly leave out S1(G1)). In effect, we use every stable gene cluster as a possible "relevant gene set"; the submatrices defined by SJ and GI are the ones we study. Similarly, all the clustering operations of the form GI(SJ) are also carried out. In all clustering operations we check for the emergence of partitions into stable clusters, of genes and samples. If we obtain a new stable cluster, we add it to our list and record its members, as well as the clustering operation that gave rise to it. If a certain clustering operation did not give rise to new significant partitions, we move down the list of gene and sample clusters to the next pair. This heuristic identification of relevant gene sets and submatrices is nothing but an exhaustive search among the stable clusters that were generated. The number of these, emerging from G1(S1), is a few tens, whereas S1(G1)
usually generates only a few stable sample clusters. Hence the next stage typically involves less than a hundred clustering operations. These iterative steps stop when no new stable clusters beyond a preset minimal size are generated, which usually happens after the first or second level of the process. In a typical analysis we generate between 10 and 100 interesting partitions, which are searched for biologically or clinically interesting findings, on the basis of the genes that gave rise to the partition and on the basis of available clinical labels of the samples. It is important to note that these labels are used a posteriori, after the clustering has taken place, to interpret and evaluate the results. A pseudocode-like sketch of the iteration is given below.
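The following is only a schematic rendering of the iteration, not the authors' implementation 19. The helper stable_clusters(X) is hypothetical: it is assumed to wrap SPC and return the stable clusters of the rows of its argument (those with stability ratio R above threshold) as index arrays; the bookkeeping and stopping rule are simplified.

```python
import numpy as np

def ctwc(A, stable_clusters, min_size=5, max_levels=3):
    """A: Ng x Ns expression matrix. Returns stable gene and sample sets."""
    gene_sets = [np.arange(A.shape[0])]      # G1 = all genes
    sample_sets = [np.arange(A.shape[1])]    # S1 = all samples
    tried = set()
    for _ in range(max_levels):
        new_g, new_s = [], []
        for gi, G in enumerate(gene_sets):
            for si, S in enumerate(sample_sets):
                if (gi, si) in tried:
                    continue
                tried.add((gi, si))
                sub = A[np.ix_(G, S)]
                # SJ(GI): cluster the samples using gene set GI as features
                new_s += [S[c] for c in stable_clusters(sub.T)
                          if len(c) >= min_size]
                # GI(SJ): cluster the genes using sample set SJ as features
                new_g += [G[c] for c in stable_clusters(sub)
                          if len(c) >= min_size]
        if not new_g and not new_s:          # no new stable clusters: stop
            break
        gene_sets += new_g
        sample_sets += new_s
    return gene_sets, sample_sets
```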
5 Applications of CTWC for gene expression data analysis
So far CTWC has been applied primarily to the analysis of data from various kinds of cancer. In some cases we used publicly available data, with no prior contact with the groups that did the original acquisition and analysis. Our initial work on colon cancer 6 and leukemia 20 falls in this category. Subsequently we collaborated with a group at the University Hospital at Lausanne (CHUV) on glioblastoma - in this work we were involved from early in the data acquisition stage. Our current collaborations include work on colon cancer and breast cancer. In the latter case we worked with publicly available data, but its choice and the challenge to improve on existing analysis came from our collaborators. We are also involved in work on leukemia and on meiosis 21 in yeast; finally, the same method was applied successfully 22 to analyze data obtained from an "antigen chip", used to study the antibody repertoire of subjects that suffer from autoimmune diseases, such as diabetes. I will limit the discussion here to presenting a few select results obtained for glioblastoma 23 and for breast cancer 25.
5.1 CTWC analysis of brain tumors (gliomas)
Brain tumors are classified into three main groups. Low grade astrocytoma (A) are small-sized tumors at an early stage of development. Cancerous growth may recur after their removal, giving rise to secondary gliomas (SC). The third kind are primary (PR) glioblastoma (GBM); this classification is assigned when at the stage of initial diagnosis and discovery the tumor is already of a large size. A dataset S1 of 36 samples was obtained by a group from the University Hospital at Lausanne 23. 17 of these were from PR GBM, 4 from SC, 12 were from A and 3 from human glioma cell lines grown in culture. Expression profiles were obtained using Clontech Atlas 1.2 arrays of
Figure 5. The operation S1(G5), clustering all tumors on the basis of their expression profiles over the genes of cluster G5. A stable cluster, S11, emerges, containing all the non-primary tumors and only two of the primaries.
1176 genes. For each gene g the measured expression value for tumor sample s was divided by its value in a reference sample composed of a mixture of normal brain tissue. We filtered the genes by keeping only those for which the maximal value of this ratio (over the 36 samples) exceeded its minimal value by at least a factor of two. 358 genes passed this filter and constituted our full gene set G1, which was clustered using expression ratios over S1. The G1(S1) clustering operation (see Fig 4) yielded 15 stable gene clusters. The complementary operation S1(G1) did not yield any partition of the samples that could be given a clear clinical interpretation. One of the stable gene clusters, G5, contained 9 genes. When the expression levels of only these genes are used to characterize the tumors [in the operation denoted S1(G5)], a large and stable cluster, S11, of 21 tumors, emerged (see Fig 5). This cluster contained all the 12 astrocytoma and all 4 SC tumors. Three of the remaining 5 tumors of S11 were cell lines and two were registered as PR GBMs. Pathological diagnosis was redone for these two tumors; one was found to contain a significant oligoastrocytoma component, and much of the piece of the other, that was used for RNA extraction, was diagnosed as normal brain infiltrative zone. Hence the expression levels of G5 gave rise to a nearly perfect separation of PR from non-PR (A and SC) tumors. The genes of G5 were significantly upregulated in PR and downregulated in A and SC. These findings made good biological sense, since three of the genes in G5 (VEGF, VEGFR and PTN) are related to angiogenesis.
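The gene filter used at the start of this analysis (keep a gene if its maximal expression ratio exceeds its minimal one by at least a factor of two) is essentially a one-liner; here is a sketch with simulated ratios standing in for the real Clontech data:

```python
import numpy as np

# ratios: Ng x Ns matrix of expression values divided by the reference
rng = np.random.default_rng(3)
ratios = rng.lognormal(sigma=0.5, size=(1176, 36))   # placeholder data

keep = ratios.max(axis=1) >= 2 * ratios.min(axis=1)
G1 = np.where(keep)[0]      # indices of the retained gene set
print(len(G1), "genes pass the max/min >= 2 filter")
```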
Angiogenesis is the process of development of blood vessels, which are essential for growth of tumors beyond a certain critical size, bringing nutrition to and removing waste from the growing tissue. Since PR GBM are large tumors, upregulation of genes that are known to be involved in angiogenesis is a logical consequence. An important application of the method concerns investigation of the genes that belong to G5; in particular, one of the genes of G5, IGFBP2, was of considerable interest, with few existing clues to its function and role in cancer development. Our finding that its expression is strongly correlated with the angiogenesis-related genes came as a surprise that was worth detailed further study. The co-expression of genes from the IGFBP family with VEGF and VEGFR has been demonstrated in an independent experiment that tested this directly for cell lines under different conditions. This example demonstrates the power of CTWC: a subgroup of genes with correlated expression levels was found to be able to separate PR from non-PR GBM, whereas using all the genes introduced noise that wiped out this separation. In addition, by looking at the genes of this correlated set, we provided an indication for the role that a gene with previously unknown function may play in the evolution of tumors. For other findings of interest in this data set we refer the reader to the paper by Godard et al 23.
5.2 Breast Cancer Data
In a different study, on breast cancer, we used publicly available expression data of Perou et al 24. The choice of this particular data set was guided by D. Botstein, who informed us that these were of the highest quality and had been submitted to the most extensive effort of analysis, and who challenged us to demonstrate that our method can extract findings that eluded previous treatments. The results of this study are available 25; here I present only one particular new finding. The Stanford data contained expression profiles of 65 human samples (S1) and 19 cell lines. 40 tumors were paired, with samples taken before and after chemotherapy (with doxorubicin), to which 3 (out of 20) subjects responded positively. 1753 genes (G1) passed initial filtering; the clustering operation S1(G1), of all the samples using their expression profiles over all these genes, did not yield any clear meaningful partitions. Perou et al realized the same point that has motivated us to construct CTWC, namely that one has to prune the number of genes that are used in order to improve the signal to noise ratio. They ranked the genes according to a figure of merit they introduced, which measures the proximity of expressions of the two samples taken from the same patient before and after chemotherapy, versus the (expectedly larger) dissimilarity of samples from different patients.
Figure 6. The operation S1(G46), clustering all tumors on the basis of the proliferation-related genes of G46. We found a cluster (b) which contained all three samples from patients for whom chemotherapy was successful, taken before the treatment. Cluster (b) contained 10 out of the 20 "before" samples.
The 496 top scorers constituted their "intrinsic gene set", which was then used to cluster the samples. We did not use this intrinsic set but rather applied CTWC to the full sets of samples and genes. In the G1(S1) operation we found several stable gene clusters. One of these, G46, contained 33 genes, whose expression levels correlate well with the cells' proliferation rates. Only 2 out of these made it into the intrinsic set of Perou et al; hence they could not have found any result that we obtained on the basis of these genes. The operation S1(G46) identified three main clusters: (a) of samples with
low proliferation rates - these are 'normal breast-like'; (b) samples with intermediate, and (c) with high proliferation rates. Interestingly, the "before treatment" samples taken from all three tumors for which chemotherapy did succeed were in cluster (b), whereas the corresponding 'after treatment' samples were in (a), the 'normal breast-like' cluster. Therefore the genes of G46 can perhaps be used a posteriori, to indicate success of treatment on the basis of their expression measured after treatment and, more importantly, may have predictive power with respect to the probability of success of the doxorubicin therapy that was used. Intermediate expression of the G46 genes may serve as a marker for a relatively high success rate of the doxorubicin treatment (3/10 versus 3/20 for the entire set of "before treatment" samples). Clearly these statements are backed only by statistics based on small samples, but they do indicate possible clinical applications of the method, provided experiments on more samples strengthen the statistical reliability of these preliminary findings.
6 Summary
DNA chips provide a new, previously unavailable glimpse into the manner in which the expression levels of thousands of genes vary as a function of time, tissue type and clinical state. Coupled Two Way Clustering provides a powerful tool to mine large scale expression data by identifying groups of correlated (and possibly co-regulated) genes which, in turn, are used to divide the samples into biologically and clinically relevant groups. The basic "engine" used by CTWC is a clustering algorithm rooted in the methodology of and insight gained from Statistical Physics. The extracted information may enlarge our body of general basic knowledge and understanding, especially of gene regulatory networks and processes. In addition, it may provide clues about the function of genes and their role in various pathologies; one can also hope to develop powerful diagnostic and prognostic tools based on gene microarrays.

Acknowledgments

I have benefited from advice and assistance of my students G. Getz, I. Kela, E. Levine and many others. I am particularly grateful to the community of biologists who were extremely open minded, receptive and helpful at every stage of our entry to their fields: D. Givol provided our first new data, as well as invaluable advice and encouragement. The CHUV group, in particular Monika Hegi and Sophie Godard, shared their data and knowledge generously,
D. Notterman and U. Alon were instrumental in getting us started on their colon cancer experiment, D. Botstein guided us towards his best breast cancer data, and I. Cohen was a powerful driving force motivating us to apply our methods to the "antigen chips" which he invented. Our work has been supported by grants from the Germany-Israel Science Foundation (GIF), the Israel Science Foundation (ISF) and the Leir-Ridgefield Foundation.

References

1. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts and J. D. Watson, Molecular Biology of the Cell, 3rd edition (Garland Publishing, New York, 1994).
2. J. L. Gould and W. T. Keeton, Biological Science, 6th edition (W.W. Norton & Co., New York, London, 1996).
3. A. Schulze and J. Downward, Nature Cell. Biol. 3, 190 (2001).
4. See http://www.affymetrix.com for information.
5. See http://cmgm.stanford.edu/pbrown/mguide/index.html
6. U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, Proc. Natl. Acad. Sci. USA 96, 6745 (1999).
7. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data (Prentice Hall, Englewood Cliffs NJ, 1988).
8. O.R. Duda, P. E. Hart and D. G. Stork, Pattern Classification (John Wiley & Sons Inc., New York, 2001).
9. M. Eisen, P. Spellman, P. Brown, and D. Botstein, Proc. Natl. Acad. Sci. USA 95, 14863 (1998).
10. T. Kohonen, Self-Organizing Maps (Springer, Berlin, 2001).
11. K. Rose, E. Gurewitz and G. C. Fox, Phys. Rev. Lett. 65, 945 (1990).
12. L. Angelini, F. De Carlo, C. Marangi, M. Pellicoro and S. Stramaglia, Phys. Rev. Lett. 85, 554 (2000).
13. M. Blatt, S. Wiseman, and E. Domany, Phys. Rev. Lett. 76, 3251 (1996).
14. M. Blatt, S. Wiseman, and E. Domany, Neural Comp. 9, 1805 (1997).
15. E. Domany, M. Blatt, Y. Gdalyahu and D. Weinshall, Comp. Phys. Comm. 121, 5 (1999).
16. G. Getz, E. Levine, E. Domany, and M. Zhang, Physica A 279, 457 (2000).
17. P. T. Spellman et al, Mol. Biol. Cell 9, 3273 (1998).
18. K. Kannan, N. Amariglio, G. Rechavi, J. Jakob-Hirsch, I. Kela, N. Kaminski, G. Getz, E. Domany and D. Givol, Oncogene 20, 2225 (2001).
19. G. Getz, E. Levine and E. Domany, Proc. Natl. Acad. Sci. USA 97, 12079 (2000).
20. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, Science 286, 531 (1999).
21. M. Primig et al, Nature Genetics 26, 415 (2000).
22. F. Quintana, G. Getz, G. Hed, E. Domany and I. R. Cohen (submitted, 2002).
23. S. Godard, G. Getz, H. Kobayashi, M. Nozaki, A.-C. Diserens, M.-F. Hamou, R. Stupp, R. C. Janzer, P. Bucher, N. de Tribolet, E. Domany and M. E. Hegi, Human gliomas and contributes to their classification (submitted, 2002).
24. C.M. Perou et al, Nature 406, 747 (2000).
25. I. Kela, Unraveling Biological Information from Gene Expression Data Using Advanced Clustering Techniques (M.Sc. Thesis, Weizmann Institute of Science, 2001).
CLUSTERING mtDNA SEQUENCES FOR HUMAN EVOLUTION STUDIES

C. MARANGI

Dipartimento Interateneo di Fisica, Universita di Bari, 70126 Bari, Italy
Istituto per le Applicazioni del Calcolo "M. Picone", Sezione di Bari, CNR, 70126 Bari, Italy
E-mail: [email protected]

L. ANGELINI, M. MANNARELLI, M. PELLICORO, S. STRAMAGLIA

Dipartimento Interateneo di Fisica, Universita di Bari, 70126 Bari, Italy
Center of Innovative Technologies for Signal Detection and Processing, 70126 Bari, Italy

M. ATTIMONELLI

Dipartimento di Biochimica e di Biologia Molecolare, Universita di Bari, 70126 Bari, Italy

M. DE ROBERTIS

Dipartimento di Genetica ed Anatomia Patologica, Universita di Bari, 70126 Bari, Italy

L. NITTI

D.E.T.O., Universita di Bari, 70126 Bari, Italy
Center of Innovative Technologies for Signal Detection and Processing, 70126 Bari, Italy

G. PESOLE

Dipartimento di Fisiologia e Biochimica Generali, Universita di Milano, 20133 Milano, Italy

C. SACCONE

Dipartimento di Biochimica e di Biologia Molecolare, Universita di Bari, 70126 Bari, Italy

M. TOMMASEO

Dipartimento di Zoologia, Universita di Bari, 70126 Bari, Italy
A novel distance method for sequence classification and intra specie phylogeny reconstruction is proposed. The method incorporates biologically motivated definitions of DNA sequence distance in the recently proposed Chaotic Map Clustering (CMC) algorithm, which performs a hierarchical partition of data by exploiting the cooperative behavior of an inhomogeneous lattice of chaotic maps living in the space of data. Simulation results show that our method outperforms, on average, the simple and most widely used approach to intra specie phylogeny reconstruction based on the Neighbor Joining (NJ) algorithm. The method has been tested on real data too, by applying it to two distinct datasets of human mtDNA HVRI haplotypes of different geographical origins. A comparison with results from other well known methods, such as the Stochastic Stationary Markov method and the Reduced Median Network, has also been performed.
1 Introduction
The study of genetic diversity provides a powerful instrument to infer the historical patterns of human evolution by assessing relationships among populations on the basis of the nucleotide composition of specific DNA sequences [12]. Limiting ourselves to intra specie evolution, we assume that a molecular clock exists, so that DNA mutations appear at a more or less constant speed (on a large time scale) for all evolutionary lines. That results in a correlation between the mutation rate and the length of time intervals: the differences at the molecular level play the role of estimators of the divergence time among groups belonging to the same specie. The final goal is the reconstruction of a phylogenetic tree, i.e. of the evolutionary temporal lines through which human groups differentiate. In the debate about the appropriate genetic analysis for evolution studies, a prominent role has been achieved by analysis of mitochondrial DNA (mtDNA). Although the mtDNA contains only a small percentage of the total information of the human genome (0.0006%), it is known to represent an efficient marker of biological intra specific diversity. This haploid genome is not recombinant and is transmitted through maternal lines, i.e. it is inherited as a single block or haplotype. Moreover the mtDNA exists in a large number of copies in each cell and shows a higher mutation rate than nuclear genes, which appears to be a relevant feature for estimation of genetic distances and for ancient DNA studies [1]. In particular the HVRI and HVRII hypervariable regions of the human mtDNA D-loop have been extensively used to study human population history and to estimate the age of the MRCA (Most Recent Common Ancestor), a still controversial problem [8]. In order to reconstruct a phylogeny within a human population of a given geographical area, individuals belonging to different groups and sharing the same pattern of variant sites (haplotype) are clustered in extended macro classes (haplogroups) according to the measured genetic distance among different haplotypes. If the haplogroup discrimination is performed in a hierarchical way, results at different hierarchical levels can be identified as different branch levels of a phylogenetic tree. It is clear that the choice of a clustering methodology is crucial to obtain a classification hierarchy which is coherent with the anthropological observations. Moreover, since we are typically dealing with datasets in high dimensional spaces (genetic sequences may be as long as several thousands of nucleotide bases) we are looking for clustering algorithms with low computational complexity. In this paper we propose a novel approach to phylogeny reconstruction based on the recently proposed Chaotic Map Clustering algorithm (CMC) [2,3], which relies on the cooperative behaviour of an inhomogeneous lattice of coupled chaotic maps. In the original formulation, CMC is a clustering tool to process an input dataset of arbitrary nature. To tailor CMC to the specific application we define new
distance measures biologically motivated by the heterogeneous variation rates at different sequence sites. The paper is organized as follows. In section 2 we briefly describe the CMC algorithm together with a method for parameter estimation. In section 3 we report simulation results in order to compare CMC with the most widely applied algorithm for phylogeny reconstruction, namely the Neighbor Joining algorithm. In section 4 two alternative definitions of sequence distance are proposed to take into account the heterogeneous variability of sequence sites. In section 5 results of the application to haplogroup classification of datasets from the Pacific area are briefly described. Conclusions are drawn in section 6.
2 CMC algorithm
A new clustering algorithm has been recently proposed [2], which is based on the cooperative behaviour of an inhomogeneous lattice of coupled chaotic maps whose dynamics leads to the formation of clusters of synchronized maps sharing the same chaotic trajectory [11]. The cluster structure is biased by the architecture of the couplings among the maps, and a full hierarchy of clusters can be achieved using the mutual information of map pair states as similarity index. Chaotic Map Clustering (CMC) performs a non-parametric partition of the data without prior assumptions about the number of classes and the geometric distribution of clusters. In the following we briefly review the basics of the CMC algorithm. Let us consider a set of N points (representing here DNA sequences) in a D-dimensional space (with D equal to the number of variant sites in the sequence). We assign a real dynamical variable x_i ∈ [-1,1] to each point and define pair-interactions J_ij = exp(-d_ij^2/2a^2), where a is the local length scale and d_ij is a suitable measure of distance between points i and j in our D-dimensional space. The time evolution of the system is given by:
$$ x_i(t+1) = \frac{1}{C_i} \sum_{j \neq i} J_{ij}\, f\bigl(x_j(t)\bigr) $$

where C_i = Σ_{j≠i} J_ij and f(x) = 1 - 2x^2. Due to the choice of the function f, the equation represents the dynamical evolution of chaotic maps x_i coupled through the pair interactions J_ij. The lattice architecture is fully specified by fixing the value of a as the average distance of the k nearest-neighbor pairs of points in the whole system (our results are quite insensitive to the particular value of k). To save computational time, we consider only interactions between maps whose distance is less than 3a, setting all the other J_ij to zero.
Starting from a random initial configuration of x, the dynamical equations are iterated until the system attains its stationary regime, corresponding to a macroscopic attractor which is independent of the initial conditions. To study the correlation properties of the system, we consider the mutual information [19] between maps as follows. If the state of element i is x_i > 0 then it is assigned a value 1, otherwise it is assigned 0: this generates a sequence of bits, in a set time interval, which allows the calculation of the Boltzmann entropy H_i for the i-th map. In a similar way the joint entropy H_ij is calculated for each pair of maps, and finally the mutual information is defined as I_ij = H_i + H_j - H_ij. The mutual information is a good measure of correlation [18] and it is precision independent, due to the coarse graining of the dynamics. If maps i and j evolve independently then I_ij = 0; if the two maps are exactly synchronized then the mutual information achieves its maximum value, here equal to ln 2, due to our choice of the function f. The algorithm identifies clusters with the linked components of the graph obtained by drawing a link between all the pairs of maps whose mutual information exceeds a threshold θ. Since long-range correlation is present, all the scales in the dataset contribute to the mutual information pattern and the threshold θ controls the resolution at which data are clustered. Each hierarchical clustering level corresponds to a dataset partition with a finite stability region in the θ parameter. The most stable solution identifies the optimal partition of the given dataset. The computational cost of the CMC algorithm scales as N log(N) with the dataset size N. We note that since clustering is performed in a hierarchical way, the algorithm can provide an effective tool for phylogeny reconstruction. Results at different hierarchical levels can be identified as different branch levels of a phylogenetic tree, whereas the stable clustering solution represents the terminal branching of the tree. We limit ourselves to consider a valid phylogenetic tree whenever we are dealing with homologous sequences verifying stationary conditions on the stochastic evolutionary process, i.e. when one compares processes having the same type of dynamics on different lineages, as is the case for intra specie evolution. Let us spend a few words on the selection of the algorithm parameters, namely the number of interacting maps k and the resolution θ. We stress here that the algorithm is a deterministic one, since the dependence on the initial random configuration of the maps is wiped out by the peculiar dynamics of chaotic systems. That implies a dependence of the final results on the particular choice of the external parameters, even though, as already tested in several contexts of application [3], clustering solutions provided by the CMC algorithm display robustness against quite a rough tuning of k and θ.
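A minimal, self-contained sketch of the CMC dynamics and of the mutual-information graph described above is given below. The burn-in length, measurement window and threshold value are assumptions made for illustration, and the hierarchical scan over θ is left out.

```python
import numpy as np

def cmc(X, k=5, theta=0.3, t_warm=500, t_meas=1000, seed=0):
    """Toy Chaotic Map Clustering: returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    N = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    a = np.sort(d, axis=1)[:, 1:k + 1].mean()     # local length scale
    J = np.exp(-d ** 2 / (2 * a ** 2))
    J[d > 3 * a] = 0.0                            # drop long-range couplings
    np.fill_diagonal(J, 0.0)
    C = np.maximum(J.sum(axis=1), 1e-12)
    f = lambda x: 1 - 2 * x ** 2
    x = rng.uniform(-1, 1, N)
    for _ in range(t_warm):                       # reach the attractor
        x = (J @ f(x)) / C
    bits = np.empty((t_meas, N), dtype=bool)      # coarse-grained states
    for t in range(t_meas):
        x = (J @ f(x)) / C
        bits[t] = x > 0
    p1 = bits.mean(axis=0)
    H = -(p1 * np.log(p1 + 1e-12) + (1 - p1) * np.log(1 - p1 + 1e-12))
    I = np.zeros((N, N))
    for i in range(N):                            # I_ij = H_i + H_j - H_ij
        for j in range(i + 1, N):
            pj = np.array([np.mean(~bits[:, i] & ~bits[:, j]),
                           np.mean(~bits[:, i] & bits[:, j]),
                           np.mean(bits[:, i] & ~bits[:, j]),
                           np.mean(bits[:, i] & bits[:, j])])
            Hij = -np.sum(pj * np.log(pj + 1e-12))
            I[i, j] = I[j, i] = H[i] + H[j] - Hij
    # clusters = linked components of the graph with links where I > theta
    labels, cur = -np.ones(N, dtype=int), 0
    for s in range(N):
        if labels[s] >= 0:
            continue
        stack, labels[s] = [s], cur
        while stack:
            u = stack.pop()
            for v in np.where(I[u] > theta)[0]:
                if labels[v] < 0:
                    labels[v] = cur
                    stack.append(v)
        cur += 1
    return labels
```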
respect to noise. Hereafter we only describe the guidelines of the method, which can be viewed as an alternative to bootstrap for assessing the reliability of a cluster analysis. Further details and applications to different algorithms can be found in [10,3]. The method can be easily implemented as follows. A set of values V for the cluster parameters is used to perform a clustering on the N points of a given dataset and on a number of subsets of size rN (0 < r < 1). The parameter values for which the clustering solutions obtained on the subsets reproduce most faithfully the solution obtained on the full dataset are then selected as the most reliable ones.
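The clustering step itself is compact enough to sketch; the following Python code (hypothetical function names, natural logarithms so that perfect synchronization gives I = ln 2) binarizes the stationary trajectories, builds the pairwise mutual information matrix, and reads the clusters off the linked components of the thresholded graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def pairwise_mutual_information(traj):
    """traj: (T, N) array of N map trajectories over T time steps.
    Returns the N x N matrix I_ij = H_i + H_j - H_ij computed from the
    coarse-grained bits (x > 0 -> 1)."""
    b = (traj > 0).astype(int)
    T, N = b.shape

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    H = np.array([entropy(np.bincount(b[:, i], minlength=2) / T)
                  for i in range(N)])
    I = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            joint = np.bincount(2 * b[:, i] + b[:, j], minlength=4) / T
            I[i, j] = I[j, i] = H[i] + H[j] - entropy(joint)
    return I

def cmc_clusters(I, theta):
    """Clusters = linked components of the graph with a link where I_ij > theta."""
    _, labels = connected_components(csr_matrix(I > theta), directed=False)
    return labels
```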
3 Simulations
Simulations have been performed to compare the performance of CMC with a well known and widely used algorithm belonging to the same class of distance methods, namely Neighbor Joining [17]. Two arbitrary trees (tree 1 and tree 2), each connecting 64 taxonomic units, have been constructed and displayed in Fig. 1 using the web application for drawing phylogenetic trees, Phylodendron [6].

Figure 1. Trees constructed by connecting 64 arbitrary taxonomic units for simulation purposes.

For each tree, 200 random datasets of sequences of length 80 have been generated by Monte Carlo simulations
using the program Seq-Gen [14] with the simple Kimura two-parameter generation model. The variability has been assumed uniform throughout the sequence and low, with a transition/transversion ratio of 2, and an equal starting probability for the four nucleotide bases has been imposed. In order to determine the pairwise distances we used the simple Kimura two-parameter model, as for the purpose of the simulation there was no need to obtain accurate genetic distance estimates. The sequence distance calculation, as well as the NJ tree reconstruction, have been performed by the PHYLIP package programs DNADIST and NEIGHBOR respectively [5]. We applied a routine of the same package (TREEDIST) to compute the Symmetric Distance (SD) of Robinson and Foulds [15] between each reconstructed tree and the initial tree used for sequence generation. The Symmetric Distance between two trees is defined as the number of partitions (unrooted trees) or clades (rooted trees) that are on one tree and not on the other. For fully resolved, i.e. bifurcating, trees the Symmetric Distance must be an even number ranging from 0 to twice the number of internal branches, which for n
Figure 2. Left plot: the difference between the CMC symmetric distance from tree 1 and the corresponding measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances from tree 1 (white) is compared with the corresponding one for NJ (black). Overlap regions are displayed in grey.
units is 2n−6. Odd numbers can be obtained if the input trees have multifurcations. The results obtained by NJ have then been compared with the ones produced by CMC for the same pairwise distance matrix. Before analyzing the results we have to
stress that the absolute value of the symmetric distance does not admit a direct statistical interpretation and that only tree topologies are used in the computation, neglecting branch length information. In the left plots of Fig. 2 and Fig. 3 we report, for both initial trees, the difference, computed for each simulation, between the NJ and CMC symmetric distances from the tree. In the right plots of Fig. 2 and Fig. 3 the comparison between the SD histograms obtained by CMC and NJ is shown, for the first and the second tree respectively. The k parameter has been fixed in the range (3,10) in both cases. We note that, since NJ produces only bifurcating trees and the bin size is set to one, there are no counts in bins corresponding to odd values of SD. Even with the above-mentioned restrictions on the quantitative interpretation of the SD measure, we observe that, on average, CMC outperforms the NJ method.
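For concreteness, here is a minimal, self-contained sketch of the clade-based (rooted) variant of the Symmetric Distance; the actual computations above were done with PHYLIP's TREEDIST, so this toy version, with its nested-tuple tree representation, is only illustrative.

```python
def clades(tree, out=None):
    """Collect the leaf sets (clades) of all internal nodes of a rooted
    tree given as nested tuples, e.g. (('A', 'B'), ('C', ('D', 'E')))."""
    if out is None:
        out = set()
    if isinstance(tree, str):          # a leaf: no clade recorded
        return frozenset([tree]), out
    leaves = frozenset()
    for child in tree:
        sub, _ = clades(child, out)
        leaves |= sub
    out.add(leaves)
    return leaves, out

def symmetric_distance(t1, t2):
    """Robinson-Foulds Symmetric Distance: clades on one tree, not the other."""
    _, c1 = clades(t1)
    _, c2 = clades(t2)
    return len(c1 ^ c2)

# Two small trees that differ in one internal grouping:
t1 = (('A', 'B'), ('C', ('D', 'E')))
t2 = (('A', 'C'), ('B', ('D', 'E')))
print(symmetric_distance(t1, t2))      # -> 4
```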
Figure 3. Left plot: the difference between the CMC symmetric distance from tree 2 and the corresponding measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances from tree 2 (white) is compared with the corresponding one for NJ (black). Overlap regions are displayed in grey.
4 Distance Measures
Several human population studies based on the HVRI and HVRII regions have raised doubts about the reliability of classical evolutionary models such as Jukes-Cantor, Kimura and Maximum Likelihood [7], mainly because of the strong assumptions they make about a constant mutation rate at different sites. Here we propose a distance measure that incorporates the biological evidence of heterogeneous variation rates at different sites, which results from a recent theoretical analysis of site variability, supported by simulation and experimental data [13].
Let us recall that a DNA sequence can be represented by a one-dimensional string of letters taken from the four-symbol alphabet {A,C,G,T}, standing for the four nucleotide bases DNA is composed of. A simple genetic distance (p-distance) between two individuals can be defined as the number of nucleotide differences at the same sites of a selected DNA segment, divided by the sequence length. Alternatively, it can be estimated by modelling the variation probability for both transitions and transversions, as in the Kimura two-parameter model. The major drawback of such definitions is that they do not take into account the heterogeneous variation rate at different sites. An alternative distance measure can be introduced that incorporates a weight in terms of site variability, recently defined as a reliable measure of the different evolution rates at sites [13]. We define a distance between two sequences i and j of length S as follows:

$$d_{ij} = \sum_{s=1}^{S} \delta_s^{ij}\, v_s$$
where δ_s^{ij} is 1 if the ij pair exhibits a different nucleotide at site s, and 0 otherwise. The term v_s represents the variability of site s, defined [13] in terms of K_ij, an estimate of the overall genetic distance of the ij pair as determined by a given model of the stochastic evolutionary process. In the following we adopt the Stationary Markov Model [9,16]. The site variability is then normalized to the maximum value v_max it takes on the whole dataset. Site variability is thus incorporated in the CMC algorithm as a weight in the distance definition, providing a suitable measure of the different information content carried by each site. Note that the introduction of site variability as a weight on the distance implies a correlation among sites. A further distance definition can be introduced for applications in the context of haplogroup discrimination and population divergence time estimation, where the grouping of haplotypes is usually performed on the basis of shared patterns at relatively rapidly changing sites. Discrimination occurs at highly variant sites, which should then give the most relevant contribution to cluster identification. Based on that, a novel distance definition, with the same notation as above, can be weighted by an 'entropic' term:

$$d_{ij} = \sum_{s=1}^{S} \delta_s^{ij}\, E_s$$
where E_s is expressed as an entropy
$$E_s = -\sum_{l} p_l^s \log(p_l^s)$$

where the index l runs over the different nucleotides and p_l^s is the frequency of nucleotide l at site s, calculated with respect to the given dataset. An appealing feature of the 'entropic' distance is the absence of bias from any biological model of genetic distance, although, depending on the dataset, the information provided without any complementary assumption on the sequence-generating process could be insufficient to resolve sequence classification ambiguities. Of course the 'entropic' distance is strictly tied to the specific context of haplogroup discrimination. Depending on the dataset under investigation, the two distance measures can appear more or less correlated, although they cannot be considered equivalent. The main difference is the correlation among sites introduced by the site variability definition. Since it is questionable whether site variations along a sequence can really be considered independent, this could be regarded as an intriguing feature of a sequence classification based on the site variability concept.
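Both weighted distances are straightforward to compute on an aligned dataset, as the following sketch shows; the site-variability weights v_s (normalized to v_max) are assumed to be supplied by a substitution-model calculation as in [13], while the entropic weights E_s come directly from the data. Function names are illustrative.

```python
import numpy as np

def weighted_distances(seqs, weights):
    """Pairwise distances d_ij = sum_s delta_s^ij * w_s for a list of
    equal-length sequences over {A, C, G, T}; `weights` holds one weight
    per site (site variability v_s or entropy E_s)."""
    X = np.array([list(s) for s in seqs])
    N = X.shape[0]
    d = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            delta = X[i] != X[j]        # delta_s = 1 where nucleotides differ
            d[i, j] = d[j, i] = (delta * weights).sum()
    return d

def site_entropies(seqs):
    """Entropic weights E_s = -sum_l p_l^s log p_l^s from dataset frequencies."""
    X = np.array([list(s) for s in seqs])
    E = np.zeros(X.shape[1])
    for s in range(X.shape[1]):
        _, counts = np.unique(X[:, s], return_counts=True)
        p = counts / counts.sum()
        E[s] = -(p * np.log(p)).sum()
    return E
```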
Figure 4. Phenogram representing the evolution of the Pacific area dataset. The time scale is given in terms of the resolution parameter θ. The cluster size at each branching point is represented by a number and a circle of variable size. On the right, the final classification (Groups I-III) is reported for clusters of size > 4.
5 Applications to real sequence data
As an application of the CMC method, we report the analysis of a sample of 202 subjects from the Pacific area, known to yield 89 haplotypes, which has also been thoroughly studied from an anthropological point of view (results of the
anthropological analysis can be found in Tommaseo et al.). The dataset consisted of 89 sequences of 71 variant sites taken from the mtDNA hypervariable region HVRI. The CMC results are shown in Fig. 4 as a phenogram illustrating a temporal evolution in which the time scale is fixed in terms of the resolution parameter θ. New clusters originate at each branching point, their cardinality being indicated by a number and by a circle whose radius decreases with cluster size. The final classification has been obtained with a distance weighted by site variability and external parameters set at k = 10 and θ = 0.35. The optimal parameters have been selected by applying the resampling method described above to 100 subsets randomly extracted from the original dataset with r = 0.75. The optimal value of θ, corresponding to the terminal branching of the phenogram, represents the final classification to be compared with results from known techniques. Clustering results were found to be consistent with the anthropological data. The same sample has been investigated with two other methods widely used in DNA sequence classification: the Neighbor Joining method and the Reduced Median Network (RMN). The NJ method generates a tree starting from an estimate of the genetic distance matrix, here calculated with the Stationary Markov Model. As for the CMC results, the NJ tree revealed three main subdivisions. The largest group of sequences (49 haplotypes), identified as Group I, is clearly distinguished from the two other clusters, Group II and Group III (Table 1). The RMN method was used to gain deeper insight into the genetic relationships among haplotypes. It generates a network which harbours all most parsimonious trees. The resulting network (data not shown) is quite complex, as a consequence of the high number of haplotypes considered in the analysis, while its reticulated structure reflects the high rate of homoplasy in the evolution of the mtDNA HVRI region. The topological structure of the RMN also reveals three major haplotype clusters, reflecting the same "haplotype composition" shown by the NJ tree constructed on the distance matrix computed with the Stationary Markov Model (Table 1).
6 Conclusions
In this paper we propose a novel distance method for phylogeny reconstruction and sequence classification, based on the recently proposed CMC algorithm as well as on a biologically motivated definition of distance. The main advantage of the algorithm lies in its high effectiveness and low computational cost, which make it suitable for the analysis of large amounts of data in high dimensional spaces. Simulations on artificial datasets show that the CMC
algorithm outperforms, on average, the well known NJ method, in terms of the measured Symmetric Distance between the true tree and the reconstructed one.
Table 1. Comparison of the classification results obtained on the Pacific area dataset by the CMC method (V = site variability distance, E = entropic distance), by Neighbor Joining performed on the Stationary Markov Model (S) distance matrix, and by the Reduced Median Network (R).

Since we are dealing with a distance method of general applicability, any prior biological information has to be coded into an ad hoc distance definition in order to improve the reliability of sequence grouping. That is the rationale for the
introduction of the site variability and entropy terms in the distance measures, which account for the dependence of the classification on the different rates of variation occurring at sites. The performance obtained by applying both distance definitions to two population datasets has been compared with the classification obtained using the SMM and the Reduced Median Network [4]. We found that our method performs as well as the two known techniques, but at lower complexity and computational cost. Moreover, compared to RMN, the method has the main advantage of providing an easy reading and interpretation of the results regardless of the dataset size. Further investigations are currently being carried out on the use of the CMC method for phylogenetic inference and on the possibility of performing divergence time estimates by relating the internal node depths of CMC trees to the estimated number of substitutions along lineages.

Acknowledgements

This work has been partially supported by MURST-PRIN99 and by "program Biotecnologie, legge 95/95 (MURST 5%)", Italy.

References

1. Anderson, S., A. T. Bankier, B. G. Barrell, M. H. L. de Bruijn, A. R. Coulson, et al., 1981 Sequence and organization of the human mitochondrial genome. Nature 290:457-465.
2. Angelini, L., F. De Carlo, C. Marangi, M. Pellicoro and S. Stramaglia, 2000 Clustering data by inhomogeneous chaotic map lattices. Phys. Rev. Lett. 85(3):554-557.
3. Angelini, L., F. De Carlo, M. Mannarelli, C. Marangi, G. Nardulli, M. Pellicoro, G. Satalino and S. Stramaglia, 2001 Chaotic neural network clustering: an application to landmine detection by dynamic infrared imaging. Optical Engineering 40(12):2878-2884.
4. Bandelt, H. J., P. Forster, C. S. Bryan and M. B. Richards, 1995 Mitochondrial portraits of human populations using median networks. Genetics 141:743-753.
5. Felsenstein, J., 1993 PHYLIP (Phylogeny Inference Package), Department of Genetics, University of Washington, Seattle.
6. Gilbert, D. G., IUBio Archive for Biology Data and Software, USA.
7. Hasegawa, M. and T. Yano, 1984 Maximum likelihood method of phylogenetic inference from DNA sequence data. Bulletin of the Biometric Society of Japan 5:1-7.
8. Hasegawa, M. and S. Horai, 1991 Time of the deepest root for polymorphism in human mitochondrial DNA. J. Mol. Evol. 32(1):37-42.
9. Lanave, C., G. Preparata, C. Saccone and G. Serio, 1984 A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20:86-93.
10. Levine, E. and E. Domany, 2000 Resampling method for unsupervised estimation of cluster validity. Preprint arXiv:physics/0005046.
11. Manrubia, S. C. and A. S. Mikhailov, 1999 Mutual synchronization and clustering in randomly coupled chaotic dynamical networks. Phys. Rev. E 60:1579-1589.
12. Pagel, M., 1999 Inferring the historical patterns of biological evolution. Nature 401:877-884.
13. Pesole, G. and C. Saccone, 2001 A novel method to estimate substitution rate variation among sites in large datasets of homologous DNA sequences. Genetics 157(2):859-865.
14. Rambaut, A. and N. C. Grassly, 1997 Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Applic. Biosci. 13:235-238.
15. Robinson, D. F. and L. R. Foulds, 1981 Comparison of phylogenetic trees. Math. Biosci. 53:131-147.
16. Saccone, C., C. Lanave, G. Pesole and G. Preparata, 1990 Influence of base composition on quantitative estimates of gene evolution. Meth. Enzymol. 183:570-583.
17. Saitou, N. and M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
18. Sole, R. V., S. C. Manrubia, J. Bascompte, J. Delgado and B. Luque, 1996 Phase transitions and complex systems. Complexity 4:13-26.
19. Wiggins, S., 1990 Introduction to Applied Nonlinear Dynamical Systems and Chaos. Springer, Berlin.
FINDING REGULATORY SITES FROM STATISTICAL ANALYSIS OF NUCLEOTIDE FREQUENCIES IN THE UPSTREAM REGION OF EUKARYOTIC GENES
M. Caselle and P. Provero
Dipartimento di Fisica Teorica, Università di Torino, and INFN, sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy. E-mail: [email protected], [email protected]
Dipartimento di Scienze e Tecnologie Avanzate, Università del Piemonte Orientale, I-15100 Alessandria, Italy.

F. Di Cunto and M. Pellegrino
Dipartimento di Genetica, Biologia e Biochimica, Università di Torino, Via Santena 5 bis, I-10100 Torino, Italy. E-mail: [email protected]
We discuss two new approaches to extract relevant biological information on Transcription Factors (and in particular to identify their binding sequences) from the statistical distribution of oligonucleotides in the upstream region of genes. Both methods are based on the notion of a "regulatory network" responsible for the various expression patterns of the genes. In particular we concentrate on families of coregulated genes and look for the simultaneous presence, in the upstream regions of these genes, of the same set of transcription factor binding sites. We discuss two instances which well exemplify the features of the two methods: the coregulation of glycolysis in Drosophila melanogaster and the diauxic shift in Saccharomyces cerevisiae.
1 Introduction
As more and more complete genomic sequences are decoded, it is becoming of crucial importance to understand how gene expression is regulated. A central role in our present understanding of gene expression is played by the notion of "regulatory network". It is by now clear that a particular expression pattern in the cell is the result of an intricate network of interactions among genes and proteins which cooperate to enhance (or depress) the expression rate of the various genes. It is thus important to address the problem of gene expression at the level of the whole regulatory network and not at the level of the single gene 1,2,3,4,5. In particular, most of the available information about such interactions concerns the transcriptional regulation of protein coding genes. Even if this is not the only regulatory mechanism of gene expression in eukaryotes, it is certainly the most widespread one.
In the last few years, thanks to the impressive progress in DNA array technology, several results on these regulatory networks have been obtained. Various transcription factors (TF's in the following) have been identified and their binding motifs in the DNA chain (see below for a discussion) have been characterized. However it is clear that we are only at the very beginning of such a program and that much more work still has to be done in order to reach a satisfactory understanding of the regulatory network in eukaryotes (the situation is somewhat better for prokaryotes, whose regulatory network is much simpler). In this contribution we want to discuss a new method which allows one to reconstruct these interactions by comparing existing biological information with the statistical properties of the sequence data. This is a line of research which has been pursued in the last few years, with remarkable results, by several groups in the world; for a (unfortunately largely incomplete) list of references see 2,3,4,5,6,7,8,9. In particular, the biological input that we shall use is the fact that some genes, being involved in the same biological process, are likely to be "coregulated", i.e. they should show the same expression pattern. The simplest way for this to happen is that they are all regulated by the same set of TF's. If this is the case we should find in the upstream^a region of these genes the same TF binding sequences. This is a highly non trivial occurrence from a statistical point of view and could in principle be recognized by simple statistical analysis. As a matter of fact the situation is much more complex than this idealized picture suggests. TF's do not necessarily bind only to the upstream region. They often recognize more than one sequence (even if there is usually a "core" sequence which is highly conserved). Coregulation could be achieved by a complex interaction of several TF's rather than by following the simple pattern suggested above. Notwithstanding this, we think that it is worthwhile to explore this simplified picture of coregulation, for at least three reasons.

^a With this term we denote the portion of the DNA chain which is immediately before the starting point of the open reading frame (ORF). We shall characterize this region more precisely in sect. 3 below.

• Even if in this way we only find a subset of the TF's involved in the coregulation, this would all the same be an important piece of information: it would add a new link to the regulatory network that we are studying.

• Analyses based on this picture, being very simple, can be easily performed on any gene set, from the few genes involved in glycolysis (the first example that we shall discuss below) up to the whole genome (this will be the case of the second example that we shall discuss). This
feature is going to be more and more important as more and more DNA array experiments appear in the literature. As the quantity of available data increases, so does the need for analytical tools to analyze it.

• Such analyses could easily be improved to include some of the features outlined above, taking into account, say, the sequence variability or the synergic interaction of different TF's.

To this end we have developed two different (and complementary) approaches. The first one (which we shall discuss in detail in sect. 3 below) follows a more traditional line of reasoning: we start from a set of genes which are known to be coregulated (this is our "biological input") and then try to recognize the possible binding sites for the TF's. We call this approach the "direct search" for coregulating TF's. The second approach (which we shall briefly sketch in sect. 4 below and which is discussed in full detail in 10) is completely different and is particularly suitable for the study of genome-wide DNA array experiments. In this case the biological input is taken into account only at the end of the analysis. We start by organizing all the genes in sets on the basis of overrepresented common sequences and then filter them with the expression patterns of some DNA array experiment. We call this second approach the "inverse search" for coregulating TF's. It is clear that all the candidate gene interactions which we identify with our two methods have to be tested experimentally. However our results may help selecting among the huge number of possible candidates and could be used as a preliminary test to guide the experiments. This contribution is organized as follows. In sect. 2 we shall briefly introduce the reader to the main features of the regulatory network (this introduction will necessarily be very short; the interested reader can find a thorough discussion for instance in 11). We shall then devote sects. 3 and 4 to explaining our "direct" and "inverse" search methods respectively. Then we shall discuss two instances which well exemplify the two strategies. First, in sect. 5 we shall study the coregulation of glycolysis in Drosophila melanogaster. Second, in sect. 6 we shall discuss the diauxic shift in Saccharomyces cerevisiae. The last section will be devoted to some concluding remarks.
2 Transcription factors
As mentioned in the introduction, a major role in the regulatory network is played by the Transcription Factors, which may in general have a twofold action on gene transcription. They can activate it by recruiting the transcription machinery to the transcription starting site, by binding enhancer sequences in the upstream noncoding region, or by modifying chromatin structure; but they can also repress it by negatively interfering with the transcriptional control mechanisms. The main point is that in both cases TF's act by binding to specific, often short, DNA sequences in the upstream noncoding region. It is exactly this feature which allows TF's to perform specific regulatory functions. These binding sequences can be considered somehow as the fingerprints of the various TF's. The main goal of our statistical analysis will be the identification and characterization of such binding sites.

2.1 Classification
Even if TF's show a wide variability, it is possible to attempt a (very rough) classification. Let us see it in some more detail, since it will help in understanding the examples which we shall discuss in the following sections. There are four main classes of binding sites in eukaryotes.

• Promoters. These are localized in the region immediately upstream of the coding region (often within 200 bp of the transcription starting point). They can be of two types:
  - short sequences like the well known CCAAT-box, TATA-box and GC-box, which are not tissue specific and are recognized by ubiquitous TF's;
  - tissue specific sequences which are only recognized by tissue specific TF's.

• Response Elements. These appear only in those genes whose expression is controlled by an external factor (like hormones or growth factors). They are usually within 1 kb of the transcription starting point. Binding of a response element by the appropriate factor may induce a relevant enhancement in the expression of the corresponding gene.

• Enhancers. These are regulatory elements which, differently from the promoters, can act in both orientations and (to a large extent) at any distance from the transcription starting point (there are examples of enhancers located even 50-60 kb upstream). They enhance the expression of the corresponding gene.

• Silencers. Same as the enhancers, but their effect is to repress the expression of the gene.

2.2 Combinatorial regulation
The main feature of TF activity is its "combinatorial" nature. This means that:

• a single gene is usually regulated by many independent TF's which bind to sites that may be very far from each other in the upstream region;

• it often happens that several TF's must be simultaneously present in order to perform their regulatory function. This phenomenon is usually referred to as the "recruitment model for gene activation" (for a review see 1) and represents the common pattern of action of the TF's. It is so important that it has recently been adopted as a guiding principle for various computer based approaches to detect regulatory sites (see for instance 4);

• the regulatory activity of a particular TF is enhanced if it can bind to several (instead of only one) binding sites in the upstream region. This "overrepresentation" of a given binding sequence is also used in some algorithms which aim to identify TF's. It will also play a major role in our approach.
The "direct" search method
In this case the starting point is the selection of a set of genes which are known to be involved in the same biological process (see the example of sect. 5). Let us start by fixing a few notations.

• Let us denote with M the number of genes in the coregulated set and with g_i, i = 1,…,M the genes belonging to the set.

• Let us denote with L the number of base pairs (bp) of the upstream noncoding region on which we shall perform our analysis. It is important to define precisely what we mean by "upstream region". With this term we denote the noncoding portion of the DNA chain which is immediately before the transcription start site. This means that we do not consider as
part of this region the 5' UTR of the ORF of the gene in which we are interested. If we choose L large enough, it may happen that other ORFs are present in the upstream region. In this case we consider as upstream region only the noncoding part of the DNA chain up to the nearest ORF (even if it appears on the opposite strand). Thus L should be thought of as an upper cutoff. In most cases the length of the upstream region is much smaller and is gene dependent. We shall denote it in the following as L(g).

• In this upstream region we shall be interested in studying short sequences of nucleotides which we shall call words. Let n be the length of such a word. For each value of n we have N = 4^n possible words w_i, i = 1,…,N. The optimal choice of n (i.e. the one which optimizes the statistical significance of our analysis) is a function of L and M. We shall see some typical values in the example of sect. 5. In the following we shall have to deal with words of varying size; when needed, in order to avoid confusion, we shall call k-word a word made of k nucleotides.

Let us call U the collection of upstream regions of the M genes g_1,…,g_M. Our goal is to see whether the number of occurrences of a given word w_i in each of the upstream regions belonging to U shows a "statistically significant" deviation (to be better defined below) from what is expected on the basis of pure chance. To this end we perform two types of analyses.

First level of analysis

This first type of analysis is organized in three steps.

• Construction of the "reference samples". The first step is the construction of a set of p "reference samples" which we call R_i, i = 1,…,p. The R_i are nonoverlapping sequences of L_R nucleotides each, extracted from a noncoding portion of the DNA sequence in the same region of the genome to which the genes that we study belong, but "far" from any ORF. From these reference samples we then extract for each word the "background occurrence probability" that we shall use as input for the second step of our analysis. The rationale behind this approach is the idea that the coding and regulating parts of the genome are immersed in a large background sea of "silent" DNA, and that we may recognize that a portion of DNA has a biological function by looking at statistical deviations in the word occurrences with respect to the background.
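The counting machinery behind this first step is elementary; a sketch follows, with hypothetical helper names. The Poisson presence probability at the end anticipates how the background probabilities feed into the significance estimate discussed below.

```python
from collections import Counter
from math import exp

def word_counts(seq, n):
    """Occurrences of every n-word in a DNA string (overlapping windows)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def background_probabilities(reference_samples, n):
    """Per-position background occurrence probability of each n-word,
    pooled over the p reference samples of length L_R."""
    total, positions = Counter(), 0
    for ref in reference_samples:
        total.update(word_counts(ref, n))
        positions += len(ref) - n + 1
    return {w: c / positions for w, c in total.items()}

def presence_probability(q_w, L, n):
    """Poisson estimate of the probability that a word with per-position
    background probability q_w occurs at least once in an upstream region
    of length L."""
    return 1.0 - exp(-q_w * (L - n + 1))
```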
However it is clear that this is a rather crude description of the genome; in particular there are some obvious objections to this approach:

- There is no clear notion of what "far" means. As we mentioned in the introduction, one can sometimes find TF's which keep their regulatory function even if they bind to sites as far as ~50 kb from the ORF.

- It is possible that in the reference samples the nucleotide frequencies reflect some unknown biological function, thus inducing a bias in the results.

- It is not clear how one should deal with the long repeated sequences which very often appear in the genome of eukaryotes.

We shall discuss below how to overcome these objections.

• Background probabilities. For each word w we study the number of occurrences n(w, i) in the i-th sample. They follow a Poisson distribution from which we extract the background occurrence probability of the word. This method works only if p and L_R are large enough with respect to the number of possible words N (we shall see in the example below some typical values for p and L_R). However we have checked that our results are robust with respect to different choices of these background probabilities.

• Significant words. From these probabilities we can immediately construct, for each n-word, the expected number of occurrences in each of the upstream sequences of U, and from them the probabilities p(n, s) of finding at least one n-word simultaneously present in the upstream regions of s (out of the M) genes. By suitably tuning L, s and n we may reach very low probabilities. If, notwithstanding such a low probability, we indeed find an n-word which appears in the upstream region of s genes, then we consider this fact as a strong indication of its role as a binding sequence for a TF. We may use the probability p(n, s) as an estimate of the significance of such a candidate binding sequence.

As we have seen, the critical point of this analysis is the choice of the reference sample. We try to avoid the bias induced by this choice by crossing the above procedure with a second level of analysis.

Second level of analysis
The main change with respect to the previous analysis is that in this case we extract the reference probabilities for the n-words from an artificial reference sample constructed with a Markov chain algorithm based on the frequencies of k-words with k << n (usually k = 1, 2 or 3) extracted from the upstream regions themselves. The second and third steps of the previous analysis then follow unchanged. The rationale behind this second approach is that we want to see if in the upstream region there are some n-words (with n = 7 or 8, say) that occur much more often than what one would expect based on the frequency of the k-words in the same region. These two levels of analysis are both likely to give results that are biased according to the different choices of reference probabilities that define them. However, since these biases are likely to be very different from each other, it is reasonable to expect that by comparing the results of the two methods one can minimize the number of false positives found.
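One plausible way to generate such an artificial reference sample is sketched below: a k-th order Markov chain whose transition frequencies are estimated from the upstream regions themselves. Function name and defaults are illustrative, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

def markov_reference(upstream_seqs, k=2, length=100000, seed=0):
    """Artificial reference sequence from a k-th order Markov chain trained
    on the k-word -> next-nucleotide frequencies of the upstream regions."""
    rng = random.Random(seed)
    trans = defaultdict(Counter)
    for seq in upstream_seqs:
        for i in range(len(seq) - k):
            trans[seq[i:i + k]][seq[i + k]] += 1
    state = rng.choice(sorted(trans))
    out = list(state)
    while len(out) < length:
        counts = trans.get(state)
        if not counts:                  # dead-end k-word: restart the chain
            state = rng.choice(sorted(trans))
            out.extend(state)
            continue
        symbols = sorted(counts)
        nxt = rng.choices(symbols, weights=[counts[c] for c in symbols])[0]
        out.append(nxt)
        state = (state + nxt)[-k:]
    return "".join(out[:length])
```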
The "inverse" search method
A major drawback of the analysis discussed in the previous section is that it requires a precise knowledge of the function of the genes examined. As a matter of fact a large fraction of the genes of eukaryotes have no precisely known biological function and could not be studied with our direct method. Moreover, in these last years the richest source of biological information on gene expression has come from microarray experiments; thus it would be important to have a tool to study gene coregulation starting from the output of such experiments. These two observations suggested to us the inverse search method that we shall briefly discuss in this section. We shall outline here only the main ideas of the method; a detailed account can be found in 10. The method we propose has two main steps: first, the ORFs of a eukaryotic genome are grouped in (overlapping) sets based on words that are overrepresented in their upstream region with respect to their frequencies in a reference sample made of all the upstream regions of the whole genome. Each set is labelled by a word. Then, for each of these sets, the average expression in one or more microarray experiments is compared to the genome-wide average: if a statistically significant difference is found, the word that labels the set is a candidate regulatory site for the genes in the set, either enhancing or inhibiting their expression. An important feature is that the grouping of the genes into sets depends only on the upstream sequences and not on the microarray experiment considered: it needs to be done only once for each organism, and can then be used to analyse an arbitrary number of microarray experiments.
Table 1: Genes involved in glycolysis.

Gene   | Description                          | Locus    | Chromosome
Ald    | Aldolase                             | AE003755 | 3R
Eno    | Enolase                              | AE003585 | 2L
Gapdh1 | Glyceraldehyde 3-ph. dehydrogenase 1 | AE003839 | 2R
Gapdh2 | Glyceraldehyde 3-ph. dehydrogenase 2 | AE003500 | X
Hex    | Hexokinase                           | AE003756 | 3R
ImpL3  | L-lactate dehydrogenase              | AE003563 | 3L
Pfk    | 6-phosphofructokinase                | AE003755 | 2R
We refer to 10 for a detailed description of how the sets are constructed; we only stress here that this construction requires three external parameters which must be fixed by the user: the length L of the upstream region (see sect. 3 for a discussion of this parameter), the length n of the words that we use to group the sets, and a cutoff probability P which quantifies the notion of "overrepresentation" mentioned above.
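The second step, comparing each set's average expression with the genome-wide average, can be sketched under simple assumptions (a z-test of the set mean, with an illustrative cutoff); the grouping of genes into word-labelled sets is assumed to have been done already, as described above.

```python
import numpy as np

def score_word_sets(word_to_genes, expression, z_cut=3.0):
    """For each set of genes labelled by an overrepresented upstream word,
    compare the set's mean expression with the genome-wide mean.
    `expression` maps gene -> (log) expression value in one experiment;
    the sign of a significant z-score suggests enhancement (+) or
    inhibition (-) by the candidate regulatory site."""
    values = np.array(list(expression.values()))
    mu, sigma = values.mean(), values.std()
    hits = {}
    for word, genes in word_to_genes.items():
        x = np.array([expression[g] for g in genes if g in expression])
        if x.size < 2:
            continue
        z = (x.mean() - mu) / (sigma / np.sqrt(x.size))
        if abs(z) > z_cut:
            hits[word] = z
    return hits
```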
5 Example: glycolysis in Drosophila melanogaster
As an example of the analysis of sect. 3, we studied the 7 genes of Drosophila melanogaster involved in glycolysis. These genes are listed in Tab. 1. We performed our analysis with two choices of the parameters.

1] Promoter region. In this first test we decided to concentrate on the promoter region. Thus we chose L < 100. With this choice, and since M = 7, we are bound to study n-words with n = 3, 4, 5 in order to have a reasonable statistical significance. In particular we concentrate on n = 3. In the first level of analysis we chose L_R = 100 and p = 1000 (p is the number of reference samples). In the second level of analysis we chose k = 1 (k being the number of nucleotides of the k-words used to construct the Markov chain). We found (among a few other motifs which we do not discuss here for brevity) that a statistically relevant signal is reached by the sequence GAG. This result has a clear biological interpretation, since this is the binding site of a ubiquitous TF known as the GAGA factor, which belongs to the class of the so-called "zinc finger" TF's.^b We consider this finding a good validation test of the whole procedure.
^b The commonly assumed binding site for the GAGA factor is the sequence GAGAG; however it has recently been realized that the minimal binding sequence is actually the 3-word GAG 12.
Table 2: Probability p(n, 7) of finding an n-word in the upstream regions of all the 7 genes involved in glycolysis. In the first column the value of n; in the second the result obtained using the background probabilities. In the last two columns the result obtained with the Markov chains with k = 1 and k = 2 respectively.

n | p(n, 7) | p(n, 7), k = 1 | p(n, 7), k = 2
6 | 0.346   | 0.76           | 0.78
7 | 0.007   | 0.013          | 0.022
8 | 0.00025 | 0.000034       | 0.00011
2] Large scale analysis. In this second test we chose L = 5000. This allowed us to address n-words with n = 6, 7, 8. For the reference samples we used L_R = 5000 and p = 21. As a result of our analysis we obtained the probabilities p(n, s) of finding at least one n-word in the upstream regions of s out of the 7 genes that we are studying. As an example we list in Tab. 2 the values of p(n, s) for s = 7 and n = 6, 7, 8. For the Markov chain analysis we used k = 1, 2. In this case we found a 7-word which appeared in the upstream region of all seven genes: a fact that, looking at the probabilities listed in Tab. 2, certainly deserves more attention. The word is TTTAAAT. A survey of the literature shows that this is indeed one of the binding sequences of a TF known as "even-skipped", which is known to regulate segmentation (and also the development of certain neurons) in Drosophila. This TF has been widely studied due to its crucial role in the early stages of embryo development, but it had not been directly related up to now to the regulation of glycolysis.
6 Example: diauxic shift in S. cerevisiae
As an example of the analysis of sect. 4, we studied the so-called diauxic shift (i.e. the metabolic shift from fermentation to respiration) in S. cerevisiae; the pattern of gene expression during the shift was measured with DNA microarray techniques in Ref. 13. In the experiment, gene expression levels were measured for virtually all the genes of the organism at seven time-points while glucose in the medium was progressively depleted. As a result of our analysis we found 29 significant words, which can be grouped into 6 motifs (i.e. groups of similar words). Five of them correspond to known regulatory motifs (for a database of known and putative TF binding sites in S. cerevisiae see ref. 4). In particular,
three of them, STRE, MIG1 and UME6 (for the meaning of these abbreviations see again 4), were previously known to be involved in glucose-induced regulation processes, while for the two other known motifs, PAC and RRPE, this is a new result. We consider the fact of having found known regulatory motifs a strong validation of our method. Finally, we also found a new binding sequence, ATAAGGG, which we could not associate with any known regulatory motif.
7 Conclusions
We have proposed two new methods to extract biological information on the Transcription Factors (and more generally on the mutual interactions among genes) from the statistical distribution of oligonucleotides in the upstream region of the genes. Both are based on the notion of a "regulatory network" responsible for the various expression patterns of the genes, and aim to find common binding sites for TF's in families of coregulated genes.

• The methods can be applied both to selected sets of genes of known biological function (direct search method) and to genome-wide microarray experiments (inverse search method).

• They require a complete knowledge of the upstream oligonucleotide sequences, and thus they can be applied for the moment only to those organisms for which the complete genome has been sequenced.

• In the direct method, once the set of coregulated genes has been chosen, no further external input is needed. The significance criterion for our candidate binding sites only depends on the statistical distribution of oligonucleotides in the upstream region (or in nearby regions used as test samples).

• Both can be easily implemented and could be used as standard preliminary tests, to guide a more refined analysis.

Even if they already give interesting results, both our methods are far from being optimized. In particular there are three natural directions of improvement:

a] taking into account the variability of the binding sequences;

b] recognizing dyad-like binding sequences (see for instance 7), which are rather common in eukaryotes;
c] recognizing synergic interactions between TF's.

Work is in progress along these lines. Needless to say, the candidate binding sequences that we find with our method will have to be tested experimentally. However our method could help to greatly reduce the number of possible candidates and could be used as a guideline for experiments.

References

1. M. Ptashne and A. Gann, Nature 386 (1997) 569.
2. A. Wagner, Nucleic Acids Research 25 3594-3604 (1997).
3. S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho and G.M. Church, Nature Genetics 22 281-285 (1999).
4. Y. Pilpel, P. Sudarsanam and G.M. Church, Nature Genetics 29 153-159 (2001). Web supplement: http://genetics.med.harvard.edu/~tpilpel/MotComb.html
5. H.J. Bussemaker, H. Li and E.D. Siggia, Nature Genetics 27 167-171 (2001).
6. J. van Helden, B. Andre and J. Collado-Vides, J. Mol. Biol. 281 827-842 (1998).
7. J. van Helden, A. F. Rios and J. Collado-Vides, Nucleic Acids Research 28 1808-1818 (2000).
8. J. D. Hughes, P. W. Estep, S. Tavazoie and G. M. Church, J. Mol. Biol. 296 1205-1214 (2000).
9. R. Hu and B. Wang, Archive: http://xxx.sissa.it/abs/physics/0009002
10. M. Caselle, F. Di Cunto and P. Provero, "Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes." Submitted to BMC Bioinformatics.
11. B. Alberts et al., Molecular Biology of the Cell (Garland Publishing Inc., New York, 1994).
12. R.C. Wilkins and J.T. Lis, Nucleic Acids Research 26 2672-2678 (1998).
13. J.L. DeRisi, V.R. Iyer and P.O. Brown, Science 278 680-686 (1997).
REGULATION OF EARLY GROWTH RESPONSE-1 GENE EXPRESSION AND SIGNALING MECHANISMS IN NEURONAL CELLS: PHYSIOLOGICAL STIMULATION AND STRESS

GIUSEPPE CIBELLI
Department of Pharmacology and Human Physiology, University of Bari, P.le G. Cesare 11, 70124 Bari, and Chair of Human Physiology, University of Foggia Medical School, V.le L. Pinto, 71100 Foggia, Italy. E-mail: [email protected]

Extracellular signals trigger important adaptive responses that enable an organism to cope with a changing environment. The induction of immediate early genes is a key initial event in the response to diverse stimuli, as changes in inducible transcription factor expression lead to a complex array of transcriptional control signals for use in the coordination of late response gene expression. Early growth response-1 (Egr-1), first identified as an immediate early gene induced by mitogenic stimulation, and subsequently shown to be activated by diverse exogenous stimuli including growth factors, hormones and neurotransmitters, encodes a zinc finger transcription factor involved in the regulation of growth and differentiation. This article reviews recent findings about the expression, the signaling pathways, and the biological activity of Egr-1 in neuronal cells, following differentiation and apoptosis, as paradigms for nervous system functioning under physiological and stress-related conditions.
1 Introduction
Stimulated neurons process and transmit information either by short-term, cell surface-dependent events that immediately process and convey information about the stimulus, or by long-term, intracellular messenger system-mediated events inducing changes in gene expression. Immediate-early genes are the first downstream nuclear targets, activated by different second messenger signaling cascades, linking membrane events to the nucleus and thus altering the neurons' responses to subsequent stimuli. These genes are defined by rapid, often transient, transcriptional induction occurring in the absence of de novo protein synthesis. Immediate early genes encode many functionally different products, such as secreted proteins and cytoplasmic enzymes. In particular, a subclass of these genes encodes inducible transcription factors, proteins that control the expression of other genes. By now, the best characterized immediate-early gene-encoded transcription factors include AP-1, composed of members of the fos and jun families, and the early growth response (Egr) family of transcription factors. Here, we focus on Egr-1, the most extensively characterized member of the Egr gene family, first identified as an immediate-early gene involved in the control of cellular growth and differentiation and later confirmed to be a transcriptional regulatory protein. The potential role of
Egr-1 in neuronal differentiation and programmed cell death, as naturally occurring paradigms of plasticity, will be discussed.

2 The Egr-1 transcription factor
The Egr-1 transcription factor [45], also known as zif268 [11], Krox-24 [28], tis8 [31], or nerve growth factor induced (NGFI)-A [34], is a member of the early growth response family of transcription factors, which also includes Egr-2, Egr-3 and Egr-4 [3]. Human Egr-1 consists of 533 amino acids with a calculated molecular weight of 57 kDa [45], but because of extensive phosphorylation it runs at 75-80 kDa [8, 55]. Egr-1 contains, in its C-terminal portion, three zinc finger domains of the cysteine2-histidine2 subtype, suggesting that it is a DNA binding and regulatory protein. The expression of Egr-1 in cell culture systems can be induced by a range of stimuli, as extensively reviewed by Gashler and Sukhatme [17], including growth factors, phorbol esters, hypoxia, ionizing radiation, tissue damage and signals that result in neuronal excitation, such as membrane depolarization or brain seizures. Table 1 summarizes the stimuli which have been studied with respect to the expression of Egr-1 in the nervous system. The time course of Egr-1 expression is typical of inducible transcription factor mRNAs, and resembles that of c-fos [8, 45]. The expression of Egr-1 has been extensively studied in the mammalian brain. In the adult, low basal Egr-1 mRNA expression is detected in the rat cortex, amygdala, striatum, cerebellar cortex and hippocampus [11]. Egr-1 mRNA is expressed at low levels in the early postnatal rat cortex, midbrain, cerebellum, and brainstem. The Egr-1 message increases throughout postnatal development to adult levels, suggesting a role for Egr-1 in the postnatal maturation of the brain [56].

3 Second messenger systems and cis-acting elements
As a general mechanism, extracellular stimuli activate second messenger systems whose end-kinases, e.g. ERK, JNK, SAPK and PKA, translocate to the nucleus and phosphorylate transcription factors already bound to DNA, thus leading to the activation of the general transcriptional machinery to initiate mRNA synthesis. Multiple intracellular pathways contribute to the regulation of Egr-1 expression. Egr-1 is induced by the calcium ionophore A23187 in PC12 cells [14], by dibutyryl cAMP (dbcAMP) in intact cortical cell monolayers [52] and by L-type voltage-sensitive calcium channel agonists in cultured cortical neurons [36]. Phosphorylation of the cAMP response element (CRE)-binding protein (CREB) is required for Egr-1 transcriptional activation through the CRE in response to
interleukin-3 and the granulocyte macrophage colony-stimulating factor in myeloid leukemia cells [43]. Activation of other second messenger systems also leads to increased Egr-1 mRNA expression. Phorbol ester (TPA) stimulation of COS and PC12 cells transactivates a promoter-driven reporter construct either via the TPA-response element present in the same position relative to the serum response element (SRE), or via TPA stimulation of protein kinase C (PKC) and a subsequent effect at the SRE [42]. The PKC pathway appears fundamental in mediating Egr-1 induction in response to X-irradiation [19]. Induction of Egr-1 by growth factors and stress is mediated through different subgroups of MAP kinases, which may also differentially affect Egr-1 function on its target genes [30].
Table 1. Stimuli which induce Egr-1 expression in the nervous system.

Stimulus                | Cell type/brain region                      | Reference
NGF                     | PC12 cells                                  | [45]
Endothelin              | Astrocytes                                  | [22]
EGF, PDGF               | Glioma cells                                | [23]
Glutamate               | Cortico-striatal monolayer                  | [53]
NMDA                    | Cerebral cortex, hippocampus, hypothalamus  | [4]
Dopamine agonists       | Striatal neurons, caudate, putamen, cortex  | [44, 6]
Amphetamine             | Basal ganglia, n. accumbens, olfactory bulb | [35]
Cocaine                 | Basal ganglia, n. accumbens                 | [35]
Caffeine                | Striatum                                    | [47]
Morphine withdrawal     | Cerebral cortex, CA1-3, dentate gyrus       | [5]
VIP                     | Cortical neurons                            | [52]
CRF                     | LC neurons                                  | [12]
Ethanol withdrawal      | Cerebral cortex, cerebellum, brainstem      | [57]
Light pulse             | N. suprachiasmatic, visual cortex           | [16]
Restraint stress        | Cerebral cortex                             | [7]
Axotomy                 | Induced in denervated areas                 | [27]
Focal cerebral injury   | Cerebral cortex, hippocampus, basal ganglia | [21]
Focal ischemia          | Nerve and non-nerve cells                   | [1]
Electroconvulsive shock | Neocortex, hippocampus                      | [13]
LTP                     | Granule cells of ipsilateral dentate gyrus  | [13]
The architecture of the Egr-1 promoter has been described by several groups, which have cloned the murine [11], rat [10] and human [42] Egr-1 genes. The upstream region of the mouse Egr-1 gene contains five SREs. In addition, putative regulatory elements in the Egr-1 promoter include Sp1-, CRE- and AP-1-like elements, and two CCAAT sequences. The human Egr-1 gene promoter contains these sequences in conserved positions. The SREs are the dominant regulators of Egr-1 transcription [33]. The SREs mediate Egr-1 responses to TPA, growth factors and serum [15]. The SRE is a 22 bp segment that contains the inner core sequence CC(A/T)6GG, similar to the CArG box present in other inducible immediate early genes, which is the binding element for the serum response factor (SRF), a nuclear phosphoprotein present in most cell types [51]. As a homodimer, the SRF binds to
Elk-1, a member of the ternary complex factor family of Ets domain proteins, over the SREs [33]. The phosphorylation of Elk-1 in response to growth factors and other stimuli is responsible for the activation of transcription. Fig. 1 shows a schematic representation of the mechanism of Egr-1 transcription induced by the SREs.

Figure 1. Mechanism of Egr-1 transcription induced by the SREs (stimulus, end-kinase phosphorylation, nuclear translocation, Egr-1 transcriptional activation).
A high affinity Egr-1 binding, CG-rich DNA sequence, 5'-GCGGGGGCG-3', termed EBS, is also found in the Egr-1 promoter [48], thus allowing Egr-1 to positively regulate its own expression by binding with high affinity to the EBS in the promoter [8, 43].

4 Structure-function mapping
The structure of a complex formed between the three zinc fingers of Egr-1 and its cognate DNA binding site has been extensively analyzed [3]. Four distinct activation domains have been identified within the Egr-1 molecule, three of them localized in the N-terminal region [40]. Other investigators described an extensive activation domain spanning amino acids 3 to 281 [18]. Finally, a domain for transcriptional repression is contained between the activation domain and the DNA binding domain [18]. This repression domain functions as a binding site for two cellular inhibitors, NGFI-A binding proteins 1 and 2 (NAB1 and NAB2) [41, 46], which may negatively modulate transactivation by Egr-1 [49], thus conferring on the Egr-1 protein a bipartite function in alternatively activating or repressing
transcription. The structural features and the defined activity domains of the Egr-1 protein (the activation domains, the repressor domain and the DNA-binding domain, over the 533-residue protein) are depicted in fig. 2.

Figure 2. The modular structure of the zinc finger protein Egr-1.
5 Egr-1 and neuronal differentiation
Egr-1 induction has been correlated with the onset of differentiation in several cell types. In particular, monocytic differentiation of U-937 and HL-60 myeloid leukemia cells induces Egr-1 expression [26, 25]. Neuronal differentiation has been extensively investigated using PC12 cells as a model cell line. Nerve growth factor causes an initial mitogenic response in PC12 cells, followed by growth arrest and differentiation into sympathetic neuron-like cells with extended neurites, and induces sustained activation of extracellular signal-regulated protein kinases (ERK) [50]. In addition, NGF stimulation of PC12 cells induces expression of Egr-1 [34, 45]. We have recently reported that the neuropeptide corticotropin-releasing factor (CRF) induces neurite outgrowth in immortalized locus coeruleus-like CATH.a cells, suggesting a potential role for CRF as a neurotrophic factor for noradrenergic locus coeruleus neurons [12]. In addition, we used the CRF-induced CATH.a cell neurite outgrowth as a bioassay to study CRF signaling in these cells. Our results, which are summarized in fig. 3, indicate that cAMP-dependent protein kinase (PKA) inhibitors block CRF-induced differentiation entirely. Likewise, dbcAMP induces neurite outgrowth of CATH.a cells indistinguishable from that of CRF-treated cells. Moreover, we found that CRF induces the transcriptional activity of CREB. The inhibition of the MAP kinase pathway, in particular inhibition of ERK, also blocks CRF-induced neurite outgrowth. Furthermore, CRF stimulates the transcriptional activity of the transcription factor Elk-1. In PC12 cells, NGF activates ERK; the kinase translocates to the nucleus and phosphorylates transcription factors such as Elk-1 [54, 58]. Elk-1 and other activated transcription factors subsequently induce transcription of those genes whose products are required for the differentiation process. Inhibition of MAP kinase kinase (MEK) blocks the differentiation of PC12 cells by nerve growth factor [39]. We obtained very similar data with CRF-differentiated CATH.a cells using the MEK inhibitor PD98059. Neuronal differentiation of PC12 cells can be induced not only by NGF but also by an increase in the intracellular cAMP concentration. In
CATH.a cells, CRF and dbcAMP induced the differentiation of the cells. While NGF activates ERK as discussed, cAMP activates the cAMP-dependent protein kinase via binding to the regulatoy subunit of the holoenzyme. In many cell-types, cAMP antagonizes with the ERK pathway. In PC 12 cells, however, a positive crosstalk exists between the cAMP and the ERK signaling pathway. cAMP does not only activate the cAMP-dependent protein kinase in PC 12 cells. Interestingly, cAMP activates MAP kinase and Elk-1 in PC 12 cells through a pathway involving the small G-protein Rap-1 [54], which is, in turn, activated by a family of cAMP binding proteins termed cAMP-GEFs in a cAMP-dependent, but PKA-independent manner [24]. Our data suggest that both PKA-dependent and PKA-independent effects of cAMP could account for the attivation of ERK in CATH.a cells as well as in PC 12 cells, indicating that both the cAMP and the ERK signaling pathways are involved in signal transduction of CRF. By using the Egr-1 DNA binding domain as a selective antagonist of Egr1-mediated transcription, Levkovitz et ah reported that the expression of this Egr-1 inhibitor construct suppresses neurite outhgrowth elicited in PC 12 cells by NGF, but not by dbcAMP, indicating that Egr-1 expression is necessary, but not sufficient for eliciting neurite outgrowth [29]. Conversely, the neuron-specific activator of cyclindependent kinase 5, p35, has been identified as one of the targets of the NGFstimulated ERK pathway that are essential for neurite outgrowth in PC 12 cells [20]. The transcription factor Egr-1 is required for induction of p35, as the activation of ERK by NGF correlates with the observed expression patterns for Egr-1 mRNA and p35 mRNA and protein. To further define an essential signaling pathway, downstream of ERK, that leads to CRF-induced neuronal differentiation, we analyzed the effect of CRF on the transcriptional activity of the Egr-1 promoter. We showed that CRF activates the Egr-1 reporter strikingly, most likely via the upstream SREs. The fact that CRF very strongly activated the Egr-1 promoter suggests that the transcription factor Egr-1 is necessary for the CRF-initiated CATH.a differentiation process. CRF c A M P ^ H-89 RAP-1
Figure 3. Mechanism of action of CRF-induced neurite outgrowth of CATH.a cells via Egr-1 (schematically: CRF → cAMP → PKA and Rap-1 → MEK → ERK → CREB and Elk-1 → Egr-1 → Egr-1-responsive genes → neurite outgrowth; H-89 and PD98059 denote the PKA and MEK inhibitors used).
6 Egr-1 in neuronal programmed cell death
In recent years, several reports have described Egr-1 as a proapoptotic molecule [32]. Egr-1 biosynthesis was reported to be stimulated in melanoma cells treated with the apoptotic stimulus thapsigargin [37]. Both a p53-dependent and a p53-independent pathway have been proposed to explain the proapoptotic activity of Egr-1. In melanoma cells expressing a wild-type p53 protein, Egr-1 directly upregulated transcription of the p53 gene, followed by the synthesis of p53 mRNA and protein [38]. In contrast, transcriptional upregulation of the tumor necrosis factor α promoter was proposed as a mechanism by which Egr-1 may induce apoptosis in cells expressing a nonfunctional p53 protein [2]. In the nervous system, an enhanced expression of Egr-1 has been connected with neuronal apoptosis of cerebellar granule cells [9]. We have studied nitric oxide (NO)-induced changes in gene transcription in the human neuroblastoma cell line SH-SY5Y, which undergoes cell death upon treatment with NOC-18 as NO donor [Cibelli G., Policastro V., Rossler O. and Thiel G., Nitric oxide-induced programmed cell death in human neuroblastoma cells is accompanied by the synthesis of Egr-1, a zinc finger transcription factor, J. Neurosci. Res., in press]. Our results indicate that NO-induced signaling specifically elevates the transcriptional activation potential of the ternary complex factor Elk-1. The finding that Elk-1 is part of an NO-induced signaling cascade in neuronal cells prompted a search for Elk-1-regulated genes. Therefore, we measured Egr-1 promoter activities following administration of NOC-18 and detected an increase in Egr-1 promoter-controlled reporter gene transcription, indicating that the Egr-1 gene is a nuclear target for NO signaling in SH-SY5Y cells. Following the NO-induced signaling cascade in SH-SY5Y cells, we demonstrated that NO stimulates the biosynthesis of Egr-1. Furthermore, a striking increase in the transcriptional activation potential of Egr-1-responsive genes was measured, due to elevated concentrations of Egr-1. Taken together, these findings suggest that Egr-1 may be an integral part of the NO-triggered apoptosis signaling cascade in SH-SY5Y neuroblastoma cells. A model for the mechanism of action of Egr-1 following NO-induced apoptosis in SH-SY5Y cells is proposed in fig. 4.
Figure 4. Mechanism of action of Egr-1 following NO-induced apoptosis in SH-SY5Y cells (schematically: NO → ERK → Elk-1 → Egr-1 → Egr-1-responsive genes → apoptosis).
7 Acknowledgements
The author wishes to thank Prof. Gerald Thiel for scientific collaboration and helpful discussion, Dr. Beatrice Greco for critical reading of the manuscript and Prof. Carlo Di Benedetta for continuous support of this project.

References

1. Abe K., Kawagoe J., Sato S., Sahara M., Kogure K., Induction of the zinc finger gene after transient focal ischemia in rat cerebral cortex, Neurosci. Lett. 123 (1991) pp. 248-250.
2. Ahmed M. M., Sells S. F., Venkatasubbarao K., Fruitwala S. M., Muthukkumar S., Harp C., Mohiuddin M., Rangnekar V. M., Ionizing radiation-inducible apoptosis in the absence of p53 linked to transcription factor EGR-1, J. Biol. Chem. 272 (1997) pp. 33056-33061.
3. Beckmann A. M. and Wilce P. A., Egr transcription factors in the nervous system, Neurochem. Int. 31 (1997) pp. 477-510.
4. Beckmann A. M., Matsumoto I. and Wilce P. A., AP-1 and Egr DNA-binding activities are increased in rat brain during ethanol withdrawal, J. Neurochem. 69 (1997) pp. 306-314.
5. Beckmann A. M., Matsumoto I., Wilce P. A., Immediate early gene expression during morphine withdrawal, Neuropharmacology 34 (1995) pp. 1183-1189.
6. Bhat R. V., Worley P. F., Cole A. J., Baraban J. M., Activation of the zinc finger encoding gene krox-20 in adult rat brain: comparison with zif268, Brain Res. Mol. Brain Res. 13 (1992) pp. 263-266.
7. Bing G. Y., Filer D., Miller J. C., Stone E. A., Noradrenergic activation of immediate early genes in rat cerebral cortex, Brain Res. Mol. Brain Res. 11 (1991) pp. 43-46.
8. Cao X., Koski R. A., Gashler A., McKiernan M., Morris C. F., Gaffney R., Hay R. V., Sukhatme V. P., Identification and characterization of the Egr-1 gene product, a DNA-binding zinc finger protein induced by differentiation and growth signals, Mol. Cell. Biol. 10 (1990) pp. 1931-1939.
9. Catania M. V., Copani A., Calogero A., Ragonese G. I., Condorelli D. F., Nicoletti F., An enhanced expression of the immediate early gene, Egr-1, is associated with neuronal apoptosis in culture, Neuroscience 91 (1999) pp. 1529-1538.
10. Changelian P. S., Feng P., King T. C., Milbrandt J., Structure of the NGFI-A gene and detection of upstream sequences responsible for its transcriptional induction by nerve growth factor, Proc. Natl. Acad. Sci. USA 86 (1989) pp. 377-381.
11. Christy B. A., Lau L. F., Nathans D., A gene activated in mouse 3T3 cells by serum growth factors encodes a protein with zinc finger sequences, Proc. Natl. Acad. Sci. USA 85 (1988) pp. 7857-7861.
12. Cibelli G., Corsi P., Diana G., Vitiello F., Thiel G., Corticotropin-releasing factor triggers neurite outgrowth of a catecholaminergic immortalized neuron via cAMP and MAP kinase signalling pathways, Eur. J. Neurosci. 13 (2001) pp. 1339-1348.
13. Cole A. J., Saffen D. W., Baraban J. M., Worley P. F., Rapid increase of an immediate early gene messenger RNA in hippocampal neurons by synaptic NMDA receptor activation, Nature 340 (1989) pp. 474-476.
14. Day M. L., Fahrner T. J., Aykent S., Milbrandt J., The zinc finger protein NGFI-A exists in both nuclear and cytoplasmic forms in nerve growth factor-stimulated PC12 cells, J. Biol. Chem. 265 (1990) pp. 15253-15260.
15. DeFranco C., Damon D. H., Endoh M., Wagner J. A., Nerve growth factor induces transcription of NGFIA through complex regulatory elements that are also sensitive to serum and phorbol 12-myristate 13-acetate, Mol. Endocrinol. 7 (1993) pp. 365-379.
16. Ebling F. J., Maywood E. S., Staley K., Humby T., Hancock D. C., Waters C. M., Evan G. I. and Hastings M. H., The role of N-methyl-D-aspartate-type glutamatergic neurotransmission in the photic induction of immediate-early gene expression in the suprachiasmatic nuclei of the Syrian hamster, J. Neuroendocrinol. 3 (1991) pp. 641-652.
17. Gashler A. and Sukhatme V. P., Early growth response protein 1 (Egr-1): prototype of a zinc-finger family of transcription factors, Prog. Nucleic Acid Res. Mol. Biol. 50 (1995) pp. 191-224.
18. Gashler A. L., Swaminathan S., Sukhatme V. P., A novel repression module, an extensive activation domain, and a bipartite nuclear localization signal defined in the immediate-early transcription factor Egr-1, Mol. Cell. Biol. 13 (1993) pp. 4556-4571.
19. Hallahan D. E., Sukhatme V. P., Sherman M. L., Virudachalam S., Kufe D., Weichselbaum R. R., Protein kinase C mediates x-ray inducibility of nuclear signal transducers EGR1 and JUN, Proc. Natl. Acad. Sci. USA 88 (1991) pp. 2156-2160.
20. Harada T., Morooka T., Ogawa S., Nishida E., ERK induces p35, a neuron-specific activator of Cdk5, through induction of Egr1, Nat. Cell Biol. 3 (2001) pp. 453-459.
21. Honkaniemi J., Sagar S. M., Pyykonen I., Hicks K. J., Sharp F. R., Focal brain injury induces multiple immediate early genes encoding zinc finger transcription factors, Brain Res. Mol. Brain Res. 28 (1995) pp. 157-163.
22. Hu R. M., Levin E. R., Astrocyte growth is regulated by neuropeptides through Tis 8 and basic fibroblast growth factor, J. Clin. Invest. 93 (1994) pp. 1820-1827.
23. Kaufmann K., Thiel G., Epidermal growth factor and platelet-derived growth factor induce expression of Egr-1, a zinc finger transcription factor, in human malignant glioma cells, J. Neurol. Sci. 189 (2001) pp. 83-91.
24. Kawasaki H., Springett G. M., Mochizuki N., Toki S., Nakaya M., Matsuda M., Housman D. E., Graybiel A. M., A family of cAMP-binding proteins that directly activate Rap1, Science 282 (1998) pp. 2275-2279.
25. Kharbanda S., Nakamura T., Stone R., Hass R., Bernstein S., Datta R., Sukhatme V. P., Kufe D., Expression of the early growth response 1 and 2 zinc finger genes during induction of monocytic differentiation, J. Clin. Invest. 88 (1991) pp. 571-577.
26. Kharbanda S., Rubin E., Datta R., Hass R., Sukhatme V., Kufe D., Transcriptional regulation of the early growth response 1 gene in human myeloid leukemia cells by okadaic acid, Cell Growth Differ. 4 (1993) pp. 17-23.
27. Leah J. D., Herdegen T., Murashov A., Dragunow M., Bravo R., Expression of immediate early gene proteins following axotomy and inhibition of axonal transport in the rat central nervous system, Neuroscience 57 (1993) pp. 53-66.
28. Lemaire P., Revelant O., Bravo R., Charnay P., Two mouse genes encoding potential transcription factors with identical DNA-binding domains are activated by growth factors in cultured cells, Proc. Natl. Acad. Sci. USA 85 (1988) pp. 4691-4695.
29. Levkovitz Y., O'Donovan K. J., Baraban J. M., Blockade of NGF-induced neurite outgrowth by a dominant-negative inhibitor of the egr family of transcription regulatory factors, J. Neurosci. 21 (2001) pp. 45-52.
30. Lim C. P., Jain N., Cao X., Stress-induced immediate-early gene, egr-1, involves activation of p38/JNK1, Oncogene 16 (1998) pp. 2915-2926.
31. Lim R. W., Varnum B. C., Herschman H. R., Cloning of tetradecanoyl phorbol ester-induced primary response sequences and their expression in density-arrested Swiss 3T3 cells and a TPA nonproliferative variant, Oncogene 1 (1987) pp. 263-270.
32. Liu C., Rangnekar V. M., Adamson E., Mercola D., Suppression of growth and transformation and induction of apoptosis by EGR-1, Cancer Gene Ther. 5 (1998) pp. 3-28.
33. McMahon S. B., Monroe J. G., A ternary complex factor-dependent mechanism mediates induction of egr-1 through selective serum response elements following antigen receptor cross-linking in B lymphocytes, Mol. Cell. Biol. 15 (1995) pp. 1086-1093.
34. Milbrandt J., A nerve growth factor-induced gene encodes a possible transcriptional regulatory factor, Science 238 (1987) pp. 797-799.
35. Moratalla R., Robertson H. A., Graybiel A. M., Dynamic regulation of NGFI-A (zif268, egr1) gene expression in the striatum, J. Neurosci. 12 (1992) pp. 2609-2622.
36. Murphy T. H., Worley P. F., Baraban J. M., L-type voltage-sensitive calcium channels mediate synaptic activation of immediate early genes, Neuron 7 (1991) pp. 625-635.
37. Muthukkumar S., Nair P., Sells S. F., Maddiwar N. G., Jacob R. J., Rangnekar V. M., Role of EGR-1 in thapsigargin-inducible apoptosis in the melanoma cell line A375-C6, Mol. Cell. Biol. 15 (1995) pp. 6262-6272.
38. Nair P., Muthukkumar S., Sells S. F., Han S.-S., Sukhatme V. P., Rangnekar V. M., Early growth response-1-dependent apoptosis is mediated by p53, J. Biol. Chem. 272 (1997) pp. 20131-20138.
39. Pang L., Sawada T., Decker S. J. and Saltiel A. R., Inhibition of MAP kinase kinase blocks the differentiation of PC-12 cells induced by nerve growth factor, J. Biol. Chem. 270 (1995) pp. 13585-13588.
40. Russo M. W., Matheny C., Milbrandt J., Transcriptional activity of the zinc finger protein NGFI-A is influenced by its interaction with a cellular factor, Mol. Cell. Biol. 13 (1993) pp. 6858-6865.
41. Russo M. W., Sevetson B. R., Milbrandt J., Identification of NAB1, a repressor of NGFI-A- and Krox20-mediated transcription, Proc. Natl. Acad. Sci. USA 92 (1995) pp. 6873-6877.
42. Sakamoto K. M., Bardeleben C., Yates K. E., Raines M. A., Golde D. W., Gasson J. C., 5' upstream sequence and genomic structure of the human primary response gene, EGR-1/TIS8, Oncogene 6 (1991) pp. 867-871.
43. Sakamoto K. M., Fraser J. K., Lee H. J., Lehman E., Gasson J. C., Granulocyte-macrophage colony-stimulating factor and interleukin-3 signaling pathways converge on the CREB-binding site in the human egr-1 promoter, Mol. Cell. Biol. 14 (1994) pp. 5975-5985.
44. Simpson C. S., Morris B. J., Stimulation of zif/268 gene expression by basic fibroblast growth factor in primary rat striatal cultures, Neuropharmacology 34 (1995) pp. 515-520.
45. Sukhatme V. P., Cao X., Chang L. C., Tsai-Morris C., Stamenkovich D., Ferreira P. C. P., Cohen D. R., Edward S. A., Shows T. B., Curran T., Le Beau M. M., Adamson E. D., A zinc finger-encoding gene coregulated with c-fos during growth and differentiation, and after cellular depolarization, Cell 53 (1988) pp. 37-43.
46. Svaren J., Sevetson B. R., Apel E. D., Zimonjic D. B., Popescu N. C., Milbrandt J., NAB2, a corepressor of NGFI-A (Egr-1) and Krox20, is induced by proliferative and differentiative stimuli, Mol. Cell. Biol. 16 (1996) pp. 3545-3553.
47. Svenningsson P., Johansson B., Fredholm B. B., Caffeine-induced expression of c-fos mRNA and NGFI-A mRNA in caudate putamen and in nucleus accumbens are differentially affected by the N-methyl-D-aspartate receptor antagonist MK-801, Brain Res. Mol. Brain Res. 35 (1996) pp. 183-189.
48. Swirnoff A. H. and Milbrandt J., DNA-binding specificity of NGFI-A and related zinc finger transcription factors, Mol. Cell. Biol. 15 (1995) pp. 2275-2287.
49. Thiel G., Kaufmann K., Magin A., Lietz M., Bach K., Cramer M., The human transcriptional repressor protein NAB1: expression and biological activity, Biochim. Biophys. Acta 1493 (2000) pp. 289-301.
50. Traverse S., Gomez N., Paterson H., Marshall C., Cohen P., Sustained activation of the mitogen-activated protein (MAP) kinase cascade may be required for differentiation of PC12 cells. Comparison of the effects of nerve growth factor and epidermal growth factor, Biochem. J. 288 (1992) pp. 351-355.
51. Treisman R., Identification and purification of a polypeptide that binds to the c-fos serum response element, EMBO J. 6 (1987) pp. 2711-2717.
52. Vaccarino F. M., Hayward M. D., Le H. N., Hartigan D. J., Duman R. S., Nestler E. J., Induction of immediate early genes by cyclic AMP in primary cultures of neurons from rat cerebral cortex, Brain Res. Mol. Brain Res. 19 (1993) pp. 76-82.
53. Vaccarino F. M., Hayward M. D., Nestler E. J., Duman R. S. and Tallman J. F., Differential induction of immediate early genes by excitatory amino acid receptor types in primary cultures of cortical and striatal neurons, Mol. Brain Res. 12 (1992) pp. 233-241.
54. Vossler M. R., Yao H., York R. D., Pan M.-G., Rim C. S. and Stork P. J. S., cAMP activates MAP kinase and Elk-1 through a B-Raf- and Rap1-dependent pathway, Cell 89 (1997) pp. 73-82.
55. Waters C. M., Hancock D. C., Evan G. J., Identification and characterisation of the egr-1 gene product as an inducible, short-lived, nuclear phosphoprotein, Oncogene 5 (1990) pp. 669-674.
56. Watson M. A., Milbrandt J., Expression of the nerve growth factor-regulated NGFI-A and NGFI-B genes in the developing rat, Development 110 (1990) pp. 173-183.
57. Wilce P. A., Le F., Matsumoto I., Shanley B. C., Ethanol inhibits NMDA-receptor mediated regulation of immediate early gene expression, Alcohol Alcohol. Suppl. 2 (1993) pp. 359-363.
58. York R. D., Yao H., Dillon T., Ellig C. L., Eckert S. P., McCleskey E. W. and Stork P. J. S., Rap1 mediates sustained MAP kinase activation induced by nerve growth factor, Nature 392 (1998) pp. 622-628.
GEOMETRICAL ASPECTS OF PROTEIN FOLDING
CRISTIAN MICHELETTI
International School for Advanced Studies (S.I.S.S.A.) and INFM, Via Beirut 2-4, 34014 Trieste, Italy
E-mail: [email protected]
An increasing amount of experimental evidence supports the view that certain aspects of protein folding are influenced more by topological properties than by chemical details. Here we focus on two questions stimulated by these observations: (a) is it possible to exploit the information contained in the native shape of proteins to obtain clues about the main events of the folding process? (b) can one identify a general mechanism, based on geometrical considerations, that accounts for the ubiquitous presence of secondary motifs (such as helices) in proteins? We tackle both questions with concepts and tools that are particularly apt for revealing the role exerted by the native-state topology in the folding process. In particular, we show that the mere knowledge of the native shape of a viral enzyme, the HIV-1 protease, allows a reliable identification of the key sites that ought to be targeted by inhibiting drugs. Finally, concerning the wide presence of secondary motifs in proteins, we present a selection criterion, based on optimal packing requirements, that is able to single out protein-like helices among all possible three-dimensional structures. This may hint at a general criterion adopted by nature to promote viable protein motifs.
1 Introduction
Two of the properties that distinguish small globular proteins from random heteropolymers are the ubiquitous presence of recurrent geometrical motifs and the ability to fold rapidly and reversibly into the native state, i.e. the shape providing maximum biological activity. It is generally believed that these special properties are the result of evolutionary pressure to optimise the protein chemical composition. Recently, an increasing amount of evidence has accumulated showing that, besides the detailed chemistry, the geometrical shape of native states has also been especially selected to optimise the folding process [1]. Here we focus on two questions that arise spontaneously from these considerations. First, we try to characterize the main events of the folding process by using schematic topology-based models. It is found that there are a number of obligatory steps that heavily influence the whole folding process. The knowledge of such crucial stages is not only of theoretical interest but could be used to develop drugs tailored to target viral enzymes. In the next section we report a validation of this strategy for the HIV-1 protease. In the last section we focus on a more general problem, namely the ubiquitous presence of secondary motifs (such as helices and sheets) in natural
proteins [2]. The presence of secondary structures was first predicted by Pauling [3] with a reasoning involving the saturation of hydrogen bonds. It is interesting to note, however, that the number of hydrogen bonds is nearly the same whether a sequence is in an unfolded structure in the presence of a polar solvent or in its native state rich in secondary structure content [4]. More recently, a number of studies [4,5,6] have attempted to revisit the emergence of secondary motifs in terms of general geometric criteria (rather than invoking chemical affinities and propensities) but failed to observe a realistic secondary structure content. In our study we have considered a novel perspective where "thick" three-dimensional structures are selected in terms of their ability to be optimally packed, i.e. space-filling. It will be shown that this simple requirement is sufficient to select helical shapes with the same aspect ratio observed in natural proteins. It is a pleasure to acknowledge the collaboration with Jayanth Banavar, Paolo Carloni, Fabio Cecconi, Amos Maritan, Flavio Seno and Antonio Trovato, who have contributed to the results discussed here.
2 Topology-based study of the HIV-1 protease folding process
A major advancement in the characterization and understanding of the folding process was the discovery that small proteins under physiological conditions can fold reproducibly into their unique native state [7]. This result posed the folding problem, that is, the prediction of the native structure from the knowledge of the protein chemical composition. Despite the enormous progress made in the field, protein folding from first principles is still an unsolved problem. This is probably due to the fact that the detailed characterization of the folding process entails the study of non-equilibrium dynamics in a rugged free-energy landscape [8]. Ultimately, the advent of more powerful computers will certainly allow the simulation of the detailed folding dynamics of entire proteins. In striking contrast, several recent advancements have been possible thanks to the introduction of concepts and folding models of surprising simplicity, as recently pointed out by D. Baker [1]. At the heart of this line of investigation there are some recent theoretical and experimental studies showing that the topology of the native structure of a protein plays an important role in determining many of the attributes of the folding process [9,1,10,11,12,13,14] (see also the contribution of Lattanzi and Maritan, "The physics of motor proteins", in this volume). The various theoretical models that are able to capture the influence of the native geometry on the folding process are generally referred to as "topology-based". Their common starting point is the knowledge of the native
conformation, which is exploited in order to construct effective energy functions that admit the target native state as the one with lowest energy. The characterization of the folding process is then carried out in terms of the most probable routes that lead to the target structure starting from an arbitrary unfolded (disordered) polymer configuration. Here we focus on a model [9] that ascribes a favorable attractive energy to the native contacts, so as to bias the folding dynamics towards the known native state, Γ0. One of the simplest (or perhaps, the simplest) energy-scoring functions that accomplishes this is defined as follows:
E(\Gamma) = -\sum_{i<j} \Delta^{\Gamma_0}_{ij}\, \Delta^{\Gamma}_{ij}        (1)
where Γ0 is the known native state and Γ is a trial structure of the same length as Γ0. Δ^S is the contact matrix of structure S, whose element Δ_ij is 1 if residues i and j are in contact (i.e. their Cα separation is below the cutoff r = 6.5 Å) and 0 otherwise. This symmetric matrix encodes the topology of the protein. The energy-scoring function of Eq. (1) ensures that the state of lowest energy is attained by structures with the same contact map as Γ0. This, in principle, may lead to a degenerate ground state, since more than one structure can be compatible with a given contact matrix. In practice, however, unless one uses unreasonably small values of r, the degenerate structures are virtually identical. In fact, for r ≈ 6.5 Å the number of distinct contacts is about twice the protein length; this number of constraints nicely matches the number of degrees of freedom of the peptide (two dihedral angles for each non-terminal Cα), thus avoiding both under- and over-constraining the ground states. The introduction of this type of topology-based folding model can be traced back to the work of Go and Scheraga [9]. For a long time, the most interesting property of these systems was the presence of an all-or-none folding process, that is, the finite-size equivalent of first-order transitions in infinite systems. This is illustrated in the example of Fig. 1, where we have reported the energy and specific heat of the model applied to the target protein 1HJA; the code refers to the Protein Data Bank tag. The plotted data were obtained through stochastic (Monte Carlo) equilibrium (constant-temperature) samplings. It is interesting to note the presence of a peak, which can be identified with the folding transition of the model system. At the peak, about 50% of the native structure (measured as the fraction of formed native contacts [16,17]) is formed, consistent with analogous results on different proteins [15]. It is, however, possible to investigate the equilibrium properties of the system in finer detail, for example by examining the probability of individual native contacts to be formed at the various temperatures.
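As an illustration of the machinery behind Eq. (1), the sketch below builds binary contact matrices from Cα coordinates and scores a trial structure against the native one. This is our own minimal reading of the model, not the original code; array names and conventions (6.5 Å cutoff, exclusion of consecutive residues) are assumptions based on the text.

```python
import numpy as np

def contact_map(ca_coords, cutoff=6.5):
    """Binary contact matrix: entry (i, j) is 1 when the C-alpha
    separation of residues i and j (with |i - j| > 1) is below the
    cutoff (in Angstroms), and 0 otherwise."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    delta = (dist < cutoff).astype(int)
    idx = np.arange(len(ca_coords))
    delta[np.abs(idx[:, None] - idx[None, :]) <= 1] = 0  # drop diagonal and i, i+1 pairs
    return delta

def go_energy(trial_coords, native_delta, cutoff=6.5):
    """Eq. (1): every native contact also present in the trial
    structure lowers the energy by one unit."""
    trial_delta = contact_map(trial_coords, cutoff)
    return -0.5 * np.sum(native_delta * trial_delta)  # each i<j pair is counted twice
```

With the two maps in hand, the fraction of formed native contacts used below as a reaction coordinate is simply np.sum(native_delta * trial_delta) / np.sum(native_delta).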
Figure 1. Plots of the energy (top) and specific heat (bottom) as a function of temperature for protein 1HJA. The curves were obtained through histogram reweighting techniques.
Naturally, at high temperatures all contacts will be poorly formed, while at sufficiently low temperatures they will all be established. It is then tempting, and physically appealing, to draw an analogy between this progressive establishment of native structure and the one observed in a real folding process. However, in principle, the equilibrium properties of our model need not parallel the dynamical ones of the real system. Thus, it was a striking surprise when we established that, indeed, a qualitative and even quantitative connection between the two processes could be drawn [10]. In the past years other groups have used similar or alternative techniques to elucidate the role of the native-state topology in the folding process [11,12,13,14], confirming the picture outlined here. An initial validation of this strategy was carried out by considering two target proteins, chymotrypsin inhibitor 2 and barnase, which have been widely investigated in experiments. For each of them we generated several hundred structures having about 40% native content. It turned out that the most frequent contacts shared by the native conformation of 2ci2 with the others involved the helical residues 30-42 (see Fig. 2). Contacts involving such
residues were shared by 56% of the sampled structures. On the other hand, the rarest contacts pertained to interactions between the helix and the β-strands and between the β-strands themselves. A different behaviour (see Fig. 2) was found for barnase, where, again, for an overlap of ≈ 40%, we find many contacts pertaining to the nearly complete formation of helix 1 (residues 8-18), a partial formation of helix 2, and bonds between residues 26-29 and 29-32, as well as several non-local contacts bridging the β-strands, especially residues 51-55 and 72-75.
Figure 2. Ribbon plot (obtained with RASMOL) of 2ci2 (left) and barnase (right). The residues involved in the most frequent contacts of alternative structures that form ≈ 40% of the native interactions are highlighted in black. The majority of these coincide with contacts that are formed at the early stages of folding.
Both this picture and the one described for CI2 are fully consistent with the experimental results obtained by Fersht and co-workers in mutagenesis experiments [18,19]. In such experiments, the key role of an amino acid at a given site is probed by mutating it and measuring the changes in the folding and equilibrium characteristics. By measuring the change of the folding/unfolding equilibrium constant one can introduce a parameter, termed Φ-value, which is zero if the mutation is irrelevant to the folding kinetics, and 1 if the change in folding propensity mirrors the change in the relative stability of the folded and unfolded states (intermediate values are, of course, possible). Ideally, the sensitivity to a given site should be measured as a suitable susceptibility to a small perturbation of the same site (or its environment).
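For reference, the standard definition from mutational analysis (a textbook relation, not peculiar to this chapter) compares the mutation-induced change in the activation free energy of folding to the change in overall stability:

```latex
\Phi = \frac{\Delta\Delta G^{\ddagger}}{\Delta\Delta G_{\mathrm{N-U}}}
```

so that Φ ≈ 1 indicates a site that is as structured in the transition state as in the native state, while Φ ≈ 0 indicates a site that is still unstructured at the rate-limiting step.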
Unfortunately, this is not easily accomplished experimentally, since substitution by mutation can rarely be regarded as a perturbation. Notwithstanding this difficulty, from the analysis of the Φ-values obtained by Fersht a clear picture of the folding stages of CI2 and barnase emerges. In both cases, the crucial regions are the same as those identified through the analysis of contact formation probability reported above. This provides a sound a posteriori justification that it is possible to extract a wealth of information about the sites involved in crucial stages of the folding process. Despite the fact that such sites are determined from the analysis of their crucial topological role with respect to the native state, with no input of the actual protein composition, they correlate very well with the key sites determined experimentally. A striking example is provided in the following subsection, which focuses on an enzyme encoded by the HIV virus. In the following we shall show that, from the mere knowledge of the contact map of the enzyme, one can isolate a handful of important sites which correlate extremely well with the key mutating sites determined in clinical trials of anti-AIDS drugs.
2.1 Application to HIV-1 protease: drug resistance and folding pathways
To further corroborate the validity of the proposed model in capturing the most delicate folding steps, we consider an application to an important enzyme, the protease of the HIV-1 virus (PDB code 1aid), which plays an essential role in the spreading of the viral infection. Through extensive clinical trials [20], it has been established that there is a well-defined set of sites in the enzyme that are crucial for developing, through suitable mutations, resistance against drugs, and which play a crucial role in the folding process [21]. To identify the key folding sites we looked for contacts whose establishment further enhances the probability of other contacts to be formed. A possible criterion to identify such contacts is through their contribution to the overall specific heat. At a fixed temperature, T, the average energy of the system described by the Hamiltonian (1) can be written as:
\langle E(T)\rangle = -\sum_{ij} \Delta^{\Gamma_0}_{ij}\, p_{ij}(T)        (2)
where p_ij(T) is the equilibrium probability of residues i and j being in contact. Hence, the specific heat of the system is:
C_v(T) = \frac{\partial \langle E(T)\rangle}{\partial T} = -\sum_{ij} \Delta^{\Gamma_0}_{ij}\, \frac{\partial p_{ij}(T)}{\partial T}        (3)
Thus, the contribution of the various contacts to the specific heat will be proportional to how rapidly each contact forms as the temperature is lowered. The contacts relevant for the folding process will be those giving the largest contribution to C_v at (or above) the folding transition temperature. Armed with this insight, we can use this deterministic criterion to rank the contacts in order of importance. Our simulations on the protease of HIV-1 [21] are based on an energy-scoring function that is more complex than Eq. (1). As usual, amino acids are represented as effective centroids placed on the Cα atoms, while the peptide bond between two consecutive amino acids, i and i+1, at distance r_{i,i+1}, is described by the anharmonic potential adopted by Clementi et al. [22], with parameters a = 20, b = 2000. The interaction among non-consecutive residues is treated again in Go-like schemes [9], which reward the formation of native contacts with a decrease of the energy-scoring function. Each pair of non-consecutive amino acids, i and j, contributes to the energy-scoring function by an amount:
V_{ij} = V_0\, \Delta^0_{ij} \left[ \left(\frac{\bar r_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{\bar r_{ij}}{r_{ij}}\right)^{6} \right] + V_1 \left(1-\Delta^0_{ij}\right) \left(\frac{r_0}{r_{ij}}\right)^{12}        (4)
where r_0 = 6.8 Å, r̄_ij denotes the distance of amino acids i and j in the native structure, r_ij their current distance, and Δ^0 is the native contact matrix built with an interaction cutoff, r, equal to 6.5 Å. V_0 and V_1 are constants controlling the strength of the interactions (V_0 = 20, V_1 = 0.05 in our simulations). Constant-temperature molecular dynamics simulations were carried out, where the equations of motion are integrated by a velocity-Verlet algorithm combined with the standard Gaussian isokinetic scheme [23,21]. Unfolding processes can be studied within the same framework by warming up starting from the native conformation (heat denaturation). The free energy, the total specific heat, C_v, and the contributions of the individual contacts to C_v were obtained by combining data sampled at different equilibrium temperatures with multiple histogram techniques [24]. The thermodynamic quantities obtained through such deconvolution procedures did not depend, within the numerical accuracy, on whether unfolding or refolding paths were followed. The contacts that contribute most to the specific-heat peak are identified as the key ones belonging to the folding bottleneck, and the sites sharing them as those most likely to be sensitive to mutations. Furthermore, by following several individual folding trajectories (obtained by suddenly quenching unfolded conformations below the folding transition temperature, T_fold), we ascertained that all such
dynamical pathways encountered the same kinetic bottlenecks determined as above. For the β-sheets, the bottlenecks involve amino acids that are typically 3-4 residues away from the turns: specifically, residues 61, 62, 72, 74 for β3, residues 10, 11, 12, 21, 22, 23 for β1 and residues 44, 45, 46, 55, 56, 57 for β2. At the folding transition temperature, T_fold, the formation of contacts around residues 30 and 86 is observed. The largest contribution to the specific-heat peak comes from contacts 29-86 and 32-76, which are, consequently, identified as the most crucial for the folding/unfolding process; we denote this set as the "transition bottleneck" (TB). Such sites are physically located at the active site of HIV-1 PR, which is targeted by anti-AIDS drugs [25]. Hence, within the limitations of our simplified approach, we predict that changes in the detailed chemistry at the active site also ruin key steps of the folding process. To counteract the drug action, the virus has to perform some very delicate mutations at the key sites; within a random mutation scheme this requires many trials (occurring over several months). The time required to synthesize a mutated protein with native-like activity is even longer if the drug attack correlates with several bottlenecks simultaneously. This is certainly the case for several anti-AIDS drugs. Indeed, Table 1 summarizes the mutations for the FDA-approved drugs [20]. In Table 2, we list the sites taking part in the three most important contacts in each of the four bottlenecks TB, β1, β2 and β3. Remarkably, among the first 23 most crucial sites predicted by our method, there are 6 sites in common with the 16 distinct mutating sites of Table 1. The relevance of these matches can be assessed by calculating the probability of their occurrence by chance. By using simple combinatorial calculations, it is found that the probability of observing at least 6 matches with the key sites of Table 1 by picking 12 contacts at random among the native ones is approximately 1%. This result highlights the high statistical correlation between our prediction and the evidence accumulated from clinical trials. In conclusion, the strategy presented here, which is entirely based on the knowledge of the native structure of HIV-1 protease, allows one both to identify the bottlenecks of the folding process and to explain their highly significant match with known mutating residues [21]. This and similar approaches should be applicable to identify the kinetic bottlenecks of other viral enzymes of pharmaceutical interest. This could allow a fast development of novel inhibitors targeting the kinetic bottlenecks. This is expected to dramatically enhance the difficulty for the virus to express mutated proteins which still fold efficiently into the same native state with unaltered functionality.
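A minimal numerical sketch of this ranking criterion: given contact probabilities sampled on a grid of temperatures (e.g. from reweighted Monte Carlo histograms), Eq. (3) attributes to each native contact a specific-heat contribution proportional to dp_ij/dT, which can be estimated by finite differences. Function and variable names below are ours, not from the original simulations.

```python
import numpy as np

def rank_contacts(temperatures, p_contacts, t_fold):
    """temperatures: increasing array of shape (nT,).
    p_contacts: array of shape (nT, n_contacts) holding the equilibrium
    probability of each native contact at each temperature.
    Returns contact indices sorted by |dp/dT| evaluated at the
    temperature closest to t_fold, i.e. by their contribution to the
    specific heat of Eq. (3)."""
    dp_dT = np.gradient(p_contacts, temperatures, axis=0)
    k = int(np.argmin(np.abs(temperatures - t_fold)))
    contribution = np.abs(dp_dT[k])  # Delta_ij = 1 for every native contact
    return np.argsort(contribution)[::-1]
```

The top-ranked contacts returned by such a criterion are the candidates for the folding bottlenecks discussed above.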
Name            Point Mutations                                 Bottlenecks
RTN [26,27]     20, 33, 35, 36, 46, 54, 63, 71, 82, 84, 90      TB, β1, β2, β3
NLF [28]        30, 46, 63, 71, 77, 84                          TB, β2, β3
IND [29,30]     10, 32, 46, 63, 71, 82, 84                      TB, β1, β2, β3
SQV [29,30,31]  10, 46, 48, 63, 71, 82, 84, 90                  TB, β1, β2, β3
APR [32]        46, 63, 82, 84                                  TB, β2, β3

Table 1. Mutations in the protease associated with FDA-approved drug resistance [20]. Sites highlighted in boldface are those involved in the folding bottlenecks as predicted by our approach. βi refers to the bottleneck associated with the formation of the i-th β-sheet, whereas TB refers to the bottleneck occurring at the folding transition temperature T_fold (see next Table).
Bottleneck   Key sites
TB           22, 29, 32, 76, 84, 86
β1           10, 11, 13, 20, 21, 23
β2           44, 45, 46, 55, 56, 57
β3           61, 62, 63, 72, 74

Table 2. Key sites for the four bottlenecks. For each bottleneck, only the sites in the top three pairs of contacts have been reported.
3 Optimal shape of a compact polymeric chain
Optimal geometrical arrangements, such as the stacking of atoms, are of relevance in diverse disciplines. A classic problem is the determination of the optimal arrangement of spheres in three dimensions in order to achieve the highest packing fraction; only recently has it been proved [33,34] that the answer for infinite systems is a succession of tightly packed triangular layers, as conjectured by Kepler several centuries ago. This problem has had a profound impact in many areas, ranging from the crystallization and melting of atomic systems to the optimal packing of objects and the subdivision of space [33,34,35,36,37]. The close-packed hard-sphere problem is simply stated: given N hard spheres of radius R, how should we arrange them so that they fit in the box with the smallest possible side, L? Interestingly, the roles of R and L can be reversed in the following alternative, but equivalent, formulation: given a set of N points inside a box of side L, how should we arrange them so that the spheres centred on them have the (same) maximum radius, R? Also in this second case, as in the first one, the spheres are not allowed to self-intersect or
cross the box boundaries. Here we study an analogous problem, that of determining the optimal shapes of closely packed compact strings. This problem is a mathematical idealization of situations commonly encountered in biology, chemistry and physics, involving the optimal structure of folded polymeric chains. Biopolymers like proteins have three-dimensional structures which are rather compact. Furthermore, they are the result of evolution, and one may think that their shape satisfies some optimality criterion. This naturally leads one to consider a generalization of the packing problem of hard spheres to the case of flexible tubes with a uniform cross-section. The packing problem then consists in finding the tube configuration which can be enclosed in the minimum volume without violating any steric constraints. As for the "free spheres" case, this problem too admits a simple equivalent re-formulation that we found more apt for numerical implementation. More precisely, we sought the curve which is the axis, or centerline, of the thickest tube (the analog of the sphere centers in the hard-sphere packing problem) that can be confined in the pre-assigned volume [38]. The maximum thickness associated with a given centerline is elegantly defined in terms of concepts recently developed in the context of ideal knot shapes [39,40,41,42,43,44]. The thickness Δ denotes the maximum radius of a uniform tube with the string passing through its axis, beyond which the tube either ceases to be smooth, owing to tight local bends, or self-intersects. The presence of tight local bends is revealed by inspecting the local radius of curvature along the centerline. In our numerical attempt to solve the problem, the centerline was represented as a succession of equidistant beads. The local radius of curvature was then measured as the radius of the circumcircle going through three consecutive points. Remarkably, the same framework can be used to deal with the non-local restrictions to the maximum thickness occurring when two points, at a finite arclength separation, come into close approach. In this case one can consider the smallest radius of circles going through any non-local triplet of points. When both local and non-local effects are taken into account, one is naturally led to define the thickness of the chain by considering all triplets of particles and selecting the smallest among all the radii [42]. For smooth centerlines, an appreciable reduction of the complexity of the algorithm can be obtained by considering only triplets where at least two of the points are consecutive [42]. Besides these intrinsic limitations to the thickness, one also needs to consider the extrinsic ones due to the presence of a confining geometry. In fact, the close proximity of the centerline to the walls of the confining box may further limit the maximum thickness.
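The construction just described reduces to elementary geometry: the radius of the circle through three points. A sketch of the thickness computation follows (our own illustration; the production algorithm of [42] restricts the non-local triplets as explained above, while this brute-force version checks all of them):

```python
import numpy as np
from itertools import combinations

def circumradius(p1, p2, p3):
    """Radius of the circle through three points in 3D."""
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    area = 0.5 * np.linalg.norm(np.cross(p2 - p1, p3 - p1))
    if area == 0.0:
        return np.inf  # collinear triplet: no bend, infinite radius
    return a * b * c / (4.0 * area)

def thickness(centerline):
    """Maximum tube radius of a discretised centerline: the smallest
    circumradius over all triplets of beads (local and non-local)."""
    return min(circumradius(centerline[i], centerline[j], centerline[k])
               for i, j, k in combinations(range(len(centerline)), 3))
```

The all-triplets loop costs O(N^3); for smooth chains the restriction to triplets containing two consecutive beads, mentioned above, brings this down to O(N^2).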
As for the packing of free spheres, the present case too is sensitive to the details of the confining geometry when the system is finite. An example of the variety of shapes resulting from the choice of different confining geometries is given in Fig. 3.
Figure 3. Examples of optimal strings. The strings in the figure were obtained starting from a random conformation of a chain made up of N equally spaced points (the spacing between neighboring points is defined to be 1 unit) and successively distorting the chain with pivot, crankshaft and slithering moves. A stochastic optimization scheme (simulated annealing) is used to promote structures that have larger and larger thickness. Top row: optimal shapes obtained by constraining strings of 30 points with a radius of gyration less than R. a) R = 6.0, Δ = 6.42; b) R = 4.5, Δ = 3.82; c) R = 3.0, Δ = 1.93. Bottom row: optimal shapes obtained by confining a string of 30 points within a cube of side L. d) L = 22.0, Δ = 6.11; e) L = 9.5, Δ = 2.3; f) L = 8.1, Δ = 1.75.
In order to reveal the "true" bulk solution one needs to adopt suitable boundary conditions. The choice that we found most useful and robust was to replace the constraint on the overall chain density with one acting at a local level. In fact, we substituted the fixed box containing the whole chain with the requirement that any succession of n beads be contained in a smaller box of side l. The results were insensitive (unless the discretization of the chain was poor) to the choice of n and l, and even to replacing the box with a sphere, etc.
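The optimization loop itself can be as simple as the following annealing sketch, in the spirit of the procedure described in the caption of Fig. 3. It reuses the thickness function sketched above; perturb (a pivot/crankshaft/slithering move set) and satisfies_constraints (the local box or sphere test) are hypothetical helpers standing in for the details of the actual implementation.

```python
import numpy as np

def anneal(chain, perturb, satisfies_constraints,
           t_start=1.0, cooling=0.999, n_steps=100_000, seed=0):
    """Simulated annealing that promotes chain conformations of
    larger and larger thickness under the given constraints."""
    rng = np.random.default_rng(seed)
    best = chain.copy()
    t = t_start
    for _ in range(n_steps):
        trial = perturb(chain, rng)
        if satisfies_constraints(trial):
            gain = thickness(trial) - thickness(chain)
            # accept improvements always, worsenings with Boltzmann weight
            if gain > 0 or rng.random() < np.exp(gain / t):
                chain = trial
                if thickness(chain) > thickness(best):
                    best = chain.copy()
        t *= cooling
    return best
```

Caching the current thickness instead of recomputing it at every step would be an obvious optimization in practice.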
The solutions that emerged from the optimization procedure were perfectly helical strings, corresponding to discretised approximations of the continuous helix represented in Fig. 4b, confirming that this is the optimal arrangement. In all cases, the geometry of the chosen helix is such that there is an equality between the local radius of curvature (determined by the local bending of the curve) and the radius associated with a suitable triplet of non-consecutive points lying in two successive turns of the helix. In other words, among all possible shapes of linear helices, the one selected by the optimization procedure has the peculiarity that the local radius of curvature equals the distance between successive turns. Hence, if we inflate the centerline of this helix uniformly, we observe that the tube contacts itself near the helix axis exactly when successive turns touch. This feature is observed only for a special ratio c* = 2.512... of the pitch, p, to the radius, r, of the circle projected by the helix on a plane perpendicular to its axis. As this packing problem is considerably more complicated than the hard-sphere one, we have little hope of proving analytically that, among all possible three-dimensional chains, the helix of Fig. 4b is the optimally packed one. However, if we assume that the optimal shape is a linear helix, it is not too difficult to explain why the "magic" ratio p/r = c* is observed. In fact, when p/r > c*, the local radius of curvature, given by ρ = r(1 + p²/(2πr)²), is smaller than half the distance of closest approach of points on successive turns of the helix (see Fig. 4a). The latter is given by the first minimum, for t > 0, of (1/2)√(2 − 2cos(2πt) + p²t²) (in units where r = 1). Thus Δ = ρ in this case. On the other hand, if p/r < c*, the global radius of curvature is strictly lower than the local one, and the helix thickness is basically determined by the distance between two consecutive helix turns: Δ ≈ p/2 if p/r ≪ 1. It is then natural to introduce the ratio, f, of the non-local radius of curvature to the local one, so that f > 1 in the 'local' regime and f < 1 in the 'non-local' regime. In our computer-generated optimal strings, the value of f averaged over all sites in the chain differed from unity by less than a part in a thousand.
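The matching condition can be verified numerically. In units where r = 1, parameterize the helix as (cos 2πt, sin 2πt, pt); the following sketch (ours) finds the pitch at which the local radius of curvature equals half the distance of closest approach between successive turns:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def local_radius(p):
    """Local radius of curvature of the helix (cos 2πt, sin 2πt, pt), r = 1."""
    return 1.0 + (p / (2.0 * np.pi)) ** 2

def half_closest_approach(p):
    """Half the distance of closest approach between successive turns:
    the minimum of the inter-point distance near one full turn (t near 1),
    away from the trivial t -> 0 limit."""
    d = lambda t: 0.5 * np.sqrt(2.0 - 2.0 * np.cos(2.0 * np.pi * t) + p**2 * t**2)
    return minimize_scalar(d, bounds=(0.5, 1.5), method="bounded").fun

# c* is the pitch-to-radius ratio at which the two radii coincide
c_star = brentq(lambda p: local_radius(p) - half_closest_approach(p), 1.0, 4.0)
print(c_star)  # ~ 2.512
```

At the root, both quantities equal ≈ 1.16 (in units of r), reproducing the value c* = 2.512... quoted above.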
Figure 4. Maximally inflated helices with different pitch-to-radius ratios, c. (a) c = 3.77: the thickness is given by the local radius of curvature. (b) c = 2.512...: for this optimal value the local and non-local radii of curvature match. (c) c = 1.26: the maximum thickness is limited by non-local effects (close approach of points in successive turns). Note the optimal use of space in situation (b), while in cases (a) and (c) empty space is left between the turns or along the helix axis.
It is interesting to note that, in nature, there are many instances of the appearance of helices. It has been shown [10] that the emergence of such motifs in proteins (unlike in random heteropolymers which, in the melt, have structures conforming to Gaussian statistics) is the result of the evolutionary pressure exerted by nature in the selection of native-state structures that are able to house sequences of amino acids which fold reproducibly and rapidly [38] and are characterized by a high degree of thermodynamic stability [17]. Furthermore, because of the interaction of the amino acids with the solvent, globular proteins attain compact shapes in their folded states. It is then natural to measure the shape of these helices and assess whether they are optimal in the sense described here. The measurement of f in α-helices found in naturally occurring proteins yields an average value of 1.03 ± 0.01, hinting that, despite the complex atomic chemistry associated with the hydrogen bonds and the covalent bonds along the backbone, helices in proteins satisfy optimal packing constraints. An example is provided in Fig. 5, where we report the value of f for a particularly long α-helix encountered in a heavily investigated membrane protein, bacteriorhodopsin.
Figure 5. Top: local and non-local radii of curvature for the sites in the first helix of bacteriorhodopsin (PDB code 1c3w). Bottom: plot of the f values for the same sites.
This result implies that the backbone sites in protein helices have an associated free volume distributed more uniformly than in any other conformation with the same density. This is consistent with the observation [10] that secondary structures in natural proteins have a much larger configurational entropy than other compact conformations. This uniformity in the free-volume distribution seems to be an essential feature, because the requirement of maximum packing of backbone sites by itself does not lead to secondary structure formation [5,6]. Furthermore, the same result also holds for the helices appearing in the collagen native-state structure, which have a rather different geometry (in terms of local turn angles, residues per turn and pitch [45]) from average α-helices. In spite of these differences, we again obtained an average f = 1.01 ± 0.03, very close to the optimal situation.
4 Conclusions
In summary, we have shown that topology-based models can lead to a vivid picture of the folding process. In particular, they allow not only the overall
qualitative characterization of the rate-limiting steps of the folding process, but also the pinpointing of crucial sites that, for viral enzymes, should be targeted by effective drugs. We have carried out a successful validation of this strategy against data from clinical trials on the HIV-1 protease. We have then addressed the question of whether there exists a simple variational principle accounting for the emergence of secondary motifs in natural proteins. A possible selection mechanism has been identified in terms of optimal packing requirements. The numerical evidence presented here supports unambiguously the fact that, among all three-dimensional structures with uniform thickness, the ones that make the most economic use of space are helices with a well-defined geometry. Strikingly, the optimal aspect ratio is precisely the same as that observed in the helices of naturally occurring proteins. This provides a hint that, besides detailed chemical interactions, a more fundamental mechanism promoting the selection and use of secondary motifs in proteins is associated with simple geometric criteria [38,46].

Acknowledgments

Support from INFM, MURST Cofin 1999 and Cofin 2001 is acknowledged.

References

1. D. Baker, Nature 405, 39 (2000).
2. C. Chothia, Nature 357, 543 (1992).
3. Pauling L., Corey R. B. and Branson H. R., Proc. Nat. Acad. Sci. 37, 205 (1951).
4. Hunt N. G., Gregoret L. M. and Cohen F. E., J. Mol. Biol. 241, 214 (1994).
5. Yee D. P., Chan H. S., Havel T. F. and Dill K. A., J. Mol. Biol. 241, 557 (1994).
6. Socci N. D., Bialek W. S. and Onuchic J. N., Phys. Rev. E 49, 3440 (1994).
7. Anfinsen C., Science 181, 223 (1973).
8. P. G. Wolynes, J. N. Onuchic and D. Thirumalai, Science 267, 1619 (1995).
9. N. Go and H. A. Scheraga, Macromolecules 9, 535 (1976).
10. C. Micheletti, J. R. Banavar, A. Maritan and F. Seno, Phys. Rev. Lett. 82, 3372 (1999).
11. Galzitskaya O. V. and Finkelstein A. V., Proc. Natl. Acad. Sci. USA 96, 11299 (1999).
12. Munoz V., Henry E. R., Hofrichter J. and Eaton W. A., Proc. Natl. Acad. Sci. USA 95, 5872 (1998).
13. Alm E. and Baker D., Proc. Natl. Acad. Sci. USA 96, 11305 (1999).
14. Clementi C., Nymeyer H. and Onuchic J. N., J. Mol. Biol., in press (2000).
15. Lazaridis T. and Karplus M., Science 278, 1928 (1997).
16. Kolinski A. and Skolnick J., J. Chem. Phys. 97, 9412 (1992).
17. Sali A., Shakhnovich E. and Karplus M., Nature 369, 248 (1994).
18. Fersht A. R., Proc. Natl. Acad. Sci. USA 92, 10869 (1995).
19. Itzhaki L. S., Otzen D. E. and Fersht A. R., J. Mol. Biol. 254, 260 (1995).
20. Ala P. J. et al., Biochemistry 37, 15042-15049 (1998).
21. Cecconi F., Micheletti C., Carloni P. and Maritan A., Proteins: Struct. Funct. Genet. 43, 365-372 (2001).
22. Clementi C., Carloni P. and Maritan A., Proc. Natl. Acad. Sci. USA 96, 9616 (1999).
23. Evans D. J., Hoover W. G., Failor B. H., Moran B. and Ladd A. J. C., Phys. Rev. A 28, 1016 (1983).
24. Ferrenberg A. M. and Swendsen R. H., Phys. Rev. Lett. 63, 1195 (1989).
25. Brown A. J., Korber B. T. and Condra J. H., AIDS Res. Hum. Retroviruses 15, 247 (1999).
26. Molla A. et al., Nat. Med. 2, 760 (1996).
27. Markowitz M. et al., J. Virol. 69, 701 (1995).
28. Patick A. K. et al., Antimicrob. Agents Chemother. 40, 292 (1996).
29. Condra J. H. et al., Nature 374, 569 (1995).
30. Tisdale M. et al., Antimicrob. Agents Chemother. 39, 1704 (1995).
31. Jacobsen H. et al., J. Infect. Dis. 173, 1379 (1996).
32. Reddy P. and Ross J., Formulary 34, 567 (1999).
33. Sloane N. J. A., Nature 395, 435 (1998).
34. Mackenzie D., Science 285, 1339 (1999).
35. Woodcock L. V., Nature 385, 141 (1997).
36. Car R., Nature 385, 115 (1997).
37. Cipra B., Science 281, 1267 (1998).
38. A. Maritan, C. Micheletti, A. Trovato and J. R. Banavar, Nature 406, 287 (2000).
39. Buck G. and Orloff J., Topol. Appl. 61, 205 (1995).
40. Katritch V., Bednar J., Michoud D., Scharein R. G., Dubochet J. and Stasiak A., Nature 384, 142 (1996).
41. Katritch V., Olson W. K., Pieranski P., Dubochet J. and Stasiak A., Nature 388, 148 (1997).
42. Gonzalez O. and Maddocks J. H., Proc. Natl. Acad. Sci. USA 96, 4769 (1999).
43. Buck G., Nature 392, 238 (1998).
44. Cantarella J., Kusner R. B. and Sullivan J. M., Nature 392, 237 (1998).
45. Creighton T. E., Proteins: Structures and Molecular Properties, W. H. Freeman and Company, New York (1993), pp. 182-188.
46. A. Maritan, C. Micheletti and J. R. Banavar, Phys. Rev. Lett. 84, 3009 (2000).
THE PHYSICS OF MOTOR PROTEINS

G. LATTANZI
International School for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut 2-4, 34013 Trieste, Italy
E-mail: [email protected]

A. MARITAN
International School for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut 2-4, 34013 Trieste, Italy
The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34100 Trieste, Italy
Motor proteins are able to transform the chemical energy of ATP hydrolysis into useful mechanical work, which can be used for several purposes in living cells. The paper is concerned with problems raised by current experiments on motor proteins, focusing on the main question of conformational changes. A simple coarse-grained theoretical model is sketched and applied to the motor domain of the kinesin protein; regions of functional relevance are identified and compared with up-to-date information from experiments. The analysis also predicts the functional importance of regions not yet investigated by experiments.
1 Introduction to the biological problem
The increasing precision in the observation of single cells and their components can be compared to the approach of one of our cities by air [1]: at first we notice a complex network of urban arteries (streets, highways, railroad tracks). Then, we may have a direct look at the traffic in its diverse forms: trains, cars, trucks and buses traveling to their destinations. We do not know the reason for that traffic, but we know that it is essential to the welfare of the entire city. If we want to understand the rationale for every single movement, we need to be at ground level, and possibly drive a single element of the traffic flow. In the same way, biologists have observed the complex network of filaments that constitute the cytoskeleton, the structure that is also responsible for the mechanical support of the cell. Advances in experimental techniques have finally opened the possibility of observing traffic inside the cell. This transport system is of vital importance to the functioning of the entire cell; just as an ordinary traffic jam, or a defect in the transportation network of a city, can impair its organized functioning, occasional problems in the transport of chemical components inside the cell can be the cause of serious cardiovascular diseases or neurological disorders.
The study of the transportation system and its molecular components is therefore of great relevance to medicine. Recent advances in single-molecule experiments [2,3] allowed us to be spectators at the ground level for the first time, i.e. to observe the single molecular elements of the traffic flow inside the cells. These components are called protein motors.
1.1 Fuel: ATP
The fuel for such motors is ATP (adenosine triphosphate). ATP is an organic molecule, formed by the nucleotide adenine, ribose sugar and three phosphate groups, held together by two high-energy phosphoanhydride bonds [4]. Removal of a phosphate group from ATP leaves ADP (adenosine diphosphate) and an inorganic molecule of phosphate, Pi, as in the hydrolysis reaction:

ATP + H2O → ADP + Pi + H+        (1)
This reaction corresponds to a release of 7.3 kcal/mol of free energy. Indeed, under standard chemical conditions, this reaction requires almost one week to occur [5], but it is accelerated, or catalyzed, by proteins. These proteins are called motors, since they are able to transduce one form of energy (chemical) into useful mechanical work.
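To put this figure on the molecular scale (a standard conversion, not part of the original text):

```latex
\Delta G \simeq 7.3~\mathrm{kcal/mol} \approx 30.5~\mathrm{kJ/mol}
\quad\Rightarrow\quad
\frac{\Delta G}{N_A} \approx 5.1\times 10^{-20}~\mathrm{J} \approx 12\,k_B T
\qquad (T = 310~\mathrm{K}),
```

so a single hydrolysis event delivers roughly a dozen units of thermal energy, a useful yardstick for the work a motor can perform per step.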
1.2 Characteristics of motor proteins
Protein motors are very different from our everyday-life motors: first, they are microscopic, and therefore they are subject to a totally different physical environment, where, for instance, thermal agitation has a strong influence on their motion. In addition, their design has been driven by evolutionary principles operating for millions of years, and therefore they are optimized to have a high efficiency and specialized for the many different purposes required in the functioning of living cells. Our everyday motors usually operate with temperature differences; therefore, no matter how clever we are in designing the motor, its efficiency is always limited by the Carnot theorem [6]. This is no longer true for motor proteins. Indeed, any temperature difference on the length scale of proteins (the nanometer) would disappear in a few picoseconds; therefore they are isothermal machines, operating at the constant temperature of our body. They are not limited by the Carnot theorem, and their efficiency could be rather close to 1, meaning that they are able to convert chemical energy almost entirely into useful work.
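The picosecond estimate follows from a simple diffusive argument (our back-of-the-envelope, assuming the thermal diffusivity of water, α ≈ 1.4 × 10⁻⁷ m²/s):

```latex
\tau \sim \frac{L^2}{\alpha}
     \approx \frac{(10^{-9}~\mathrm{m})^2}{1.4\times 10^{-7}~\mathrm{m^2/s}}
     \approx 7~\mathrm{ps},
```

far shorter than the millisecond timescale of a typical mechanochemical cycle, so no temperature gradient can be sustained across a protein.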
1.3 Different families, different tasks
Most molecular motors perform sliding movements along tracks, using the energy released from the hydrolysis of ATP to generate macroscopic motion, such as muscle contraction, and to maintain cell activities. Among these, the most important families are the myosins, kinesins and dyneins. The study of myosin dates back to 1864. It is usually found in bundles, as in the thick filaments of our muscle cells, and is extremely important for muscle contraction. Kinesin, discovered in 1985, is a highly processive motor, i.e. it can take several hundred steps on a filament called a microtubule without detaching [7,8], whereas muscle myosin was shown to execute a single "stroke" and then dissociate. Kinesins form a large superfamily, and the individual superfamily proteins operate as motor molecules in various cell types with diverse cargoes. Given that transportation requirements are particularly demanding and complex in neurons, it is not a surprise that the highest diversity of kinesins is found in the brain [1]. The discovery of kinesin only partially explained how membrane vesicles are transported in cells. Some movements, such as retrograde axonal transport, are in the direction opposite to this kinesin-dependent movement. Thus there must be a second group of motor proteins responsible for this motility. Such a group exists; it is composed of the dyneins, a superfamily of exceptionally huge proteins. In neurons, kinesin and dynein motor molecules are involved not only in intracellular axonal and dendritic transport, but also in neuronal pathfinding and migration. Given the various fundamental cellular functions they serve in neurons, such mechanisms, if defective, are expected to contribute to the onset or progression of neurological disorders. But these are not the only motor proteins. Another important track is the DNA: specific machines move upon DNA filaments, unzip them and copy them into RNA.
1.4 Structure
Until 1992, it appeared as though kinesin and myosin had little in common 9. In addition to moving on different filaments, kinesin's motor domain is less than one-half the size of myosin's, and initial sequence comparisons failed to reveal any important similarities between the two motors. Their motile properties also appeared to be quite different. In the last few years of research, however, the crystal structures of kinesin have revealed a striking similarity to myosin, the structural overlap pointing
to short stretches of sequence conservation 10,11. This suggested that myosin and kinesin originated from a common ancestor. The opportunity to study and compare numerous kinesin and myosin motors provides a valuable resource for understanding the mechanism of motility. Because kinesin and myosin share a similar core structure and evolutionary ancestry, comparison of these motors has the potential to reveal common principles by which they convert chemical energy into motion 9.

Members of each family have similar motor domains, with about 30-50% identical residues, that can function as autonomous units. The proteins are differentiated by their nonmotor or tail domains. Between the two families, the motor domains, also called head domains, have no significant identity in amino acid sequence, but they share a common fold for binding the nucleotide (ATP). Adjacent to the head domain lies the highly α-helical neck region. It regulates the binding of the head domain by binding either calmodulin or calmodulin-like regulatory light chain subunits (called essential or regulatory light chains, depending on their function). The tail domain contains the binding sites that determine whether the tail binds to the membrane, binds to other tails to form a filament, or attaches to a cargo. Motor proteins are composed of one or two (rarely three) motor domains, linked together by the neck regions, which form the neck linker part of the motor.

To understand how the hydrolysis of ATP is coupled to the movement of a motor head along filaments, we need to know the three-dimensional structure of the head domain. An important feature in the structure of the myosin head is the presence of two clefts on its surface. One cleft is bound to the filament, while the other contains the ATP binding site. The clefts are separated by 3.5 nm, a long distance in a protein. The presence of surface clefts provides a mechanism for generating large movements of the head domain: we can imagine how opening or closing of a cleft in the head domain, by binding or releasing ATP, causes the head domain to pivot about the neck region, so that a change in the conformation of the protein may occur.
1.5 Conformational change: experiments
Conformational changes have been detected using advanced experimental techniques such as Fluorescence Resonance Energy Transfer 12 (FRET). FRET determines the distance between two probes on a protein, called donor (D) and acceptor (A). When the emission spectrum of the donor fluorophore and the excitation spectrum of the acceptor fluorophore overlap, and the two are located close to each other (on the order of nanometers), the excited energy of
the donor is transferred to the acceptor without radiation, resulting in acceptor fluorescence. When the donor and acceptor are far apart, the donor fluoresces. It is therefore possible to determine the distance between donor and acceptor fluorophores attached to two different sites on a protein by monitoring the color of the fluorescence. For myosin, it has been shown 13 that the fluorescence intensities of the donor and acceptor vary spontaneously in a flip-flop fashion, indicating that the distance between the donor and acceptor changes in the range of hundreds of angstroms; that is, the structure of myosin is not stable but instead thermally fluctuates. These results suggest that myosin can go through several metastable states, undergoing slow transitions between the different states.
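The distance readout behind such experiments is the standard Förster relation, E = 1/(1 + (r/R0)^6). The sketch below is illustrative; the Förster radius R0 of 5 nm is an assumed, typical value for common dye pairs, not a number taken from the text.

```python
# Sketch of the Foerster relation used to turn FRET efficiencies into distances.
R0 = 5.0  # assumed Foerster radius in nm (dye-pair specific)

def fret_efficiency(r, r0=R0):
    """Energy-transfer efficiency at donor-acceptor separation r."""
    return 1.0 / (1.0 + (r / r0) ** 6)

def distance_from_efficiency(e, r0=R0):
    """Invert the relation: separation that yields efficiency e."""
    return r0 * ((1.0 - e) / e) ** (1.0 / 6.0)

for r_nm in (2.0, 5.0, 8.0):
    print(f"r = {r_nm:.0f} nm -> E = {fret_efficiency(r_nm):.2f}")
print(f"E = 0.80 -> r = {distance_from_efficiency(0.80):.2f} nm")
```

The steep r^-6 dependence near R0 is what makes FRET so sensitive a "spectroscopic ruler" on the nanometer scale.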
1.6 The problem of conformational change
The ATP binding pocket is a rather small region in the myosin (or kinesin) motor domain. Yet the information that ATP is bound to this well localized site can be transferred to very distant regions of the domain, so that the entire protein may undergo a conformational change. FRET can be used to monitor the distance between parts of the motor domain, but it is not possible to probe all of the regions, because of time and budget limitations. The identification of possible targets for FRET experiments is therefore a task for theoretical modeling. A theoretical model would be of great help in answering some of the following questions: which parts are expected to undergo the largest displacements in a conformational change? Which are sensitive to ATP binding? Which are important for the biological function of the protein? Which are responsible for the transfer of information?
2 Gaussian Network Model (GNM)
Many theoretical models have been proposed for the analysis of protein structure and properties. The problem with protein motors is that they are huge proteins, whose size prevents any attack by present all-atom computer simulations. To make things worse, even under the optimistic assumption of immense computer memory to store all the necessary coordinates, the calculations would cover only a few nanoseconds of the dynamics, while the conformational rearrangements usually lie in the time range of milliseconds. Therefore a detailed simulation of the dynamics is not feasible, and furthermore it is not guaranteed that all the details would shed light on the general mechanism. Yet, in recent years, dynamical studies have increased our appreciation of
the importance of protein structure and have shed some light on the central problem of protein folding 14,15,16. Interestingly, coarse grained models proved to be very reliable for specific problems in this field. The scheme is as follows. Proteins are linear polymers assembled from about 20 amino acid monomers, or residues. The sequence of amino acids (primary structure) varies for different molecules. Sequences of amino acid residues fold into typical patterns (secondary structure), consisting mostly of helical (α helices) and sheetlike (β sheets) patterns. These secondary structure elements bundle into a roughly globular shape (tertiary structure) in a way that is unique to each protein (the native state). Therefore the information on the detailed sequence of amino acids composing the protein uniquely encodes its native state. Once the latter is known, one may forget about the former (this is the topological point of view). The GNM is a recently developed, simple technique which drives this principle to its extreme. It has been applied with success to a number of large proteins 17 and even to nucleic acids 18,19.
2.1 Theory
Bahar et al. 20 proposed a model for the equilibrium dynamics of a folded protein in which interactions between residues in close proximity are replaced by linear springs. The model assumes that the protein in the folded state is equivalent to a three-dimensional elastic network. The nodes are identified with the Cα atoms^a in the protein. These undergo Gaussian distributed fluctuations, hence the name Gaussian Network Model. The native structure of a given protein, together with the amplitudes of atomic thermal fluctuations measured by x-ray crystallography, is reported in the Brookhaven Protein Data Bank 21 (PDB). Given the structure of a protein, the Kirchhoff matrix of its contacts is defined as follows:
$$\Gamma_{ij} = \begin{cases} -1 & \text{if } r_{ij} \le r_c \\ 0 & \text{if } r_{ij} > r_c \end{cases} \qquad (i \ne j) \tag{2}$$

$$\Gamma_{ii} = -\sum_{j \ne i}^{N} \Gamma_{ij} \tag{3}$$
where the non-zero off-diagonal elements refer to residue pairs i and j that are connected via springs, their separation r_ij being shorter than a cutoff value r_c for inter-residue interactions. The diagonal elements are found from the negative sum of the off-diagonal terms in the same row (or column); they represent the coordination number, i.e. the number of individual residues found within a sphere of radius r_c. The Kirchhoff matrix is conveniently used 22 for evaluating the overall conformational potential of the structure:

$$V = \frac{\gamma}{2}\, \Delta\mathbf{R}^{T}\, \Gamma\, \Delta\mathbf{R}. \tag{4}$$

^a Carbon atoms in amino acids are labeled with Greek letters: for each residue there is at least one carbon atom, Cα, but there may also be additional carbon atoms, called Cβ, etc.
Here ΔR is the N-dimensional vector whose elements are the three-dimensional fluctuation vectors ΔR_i of the individual residues around their native positions, while γ represents a free parameter of the model. The cross-correlations between residue fluctuations are found from the simple Gaussian integral:

$$\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle = \frac{1}{Z_N} \int \left( \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \right) \delta\!\Big( \sum_i \Delta\mathbf{R}_i \Big)\, e^{-V/k_B T}\, d\{\Delta\mathbf{R}\} \tag{5}$$
where the integration is carried over all possible fluctuation vectors ΔR, Z_N is the partition function, and δ(Σ_i ΔR_i) accounts for the constraint of fixing the position of the center of mass. This integral can be calculated exactly, yielding:

$$\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ij} \tag{6}$$
where Γ^{-1} is the inverse of Γ in the space orthogonal to the eigenvector with zero eigenvalue. This inverse can be expressed in a more elegant way as a sum over all non-zero eigenvalues λ_k and eigenvectors u_k of Γ, so that $\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle$ can finally be expressed as the sum over the contributions $[\Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j]_k$ of the individual modes:

$$\left[ \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \right]_k = \frac{3 k_B T}{\gamma}\, \lambda_k^{-1} \left[ \mathbf{u}_k \mathbf{u}_k^{T} \right]_{ij}. \tag{7}$$
The summation is performed over all N − 1 non-zero eigenvalues of Γ. The mean square (ms) fluctuations of individual residues can be readily found from eq. (7), taking i = j. The analysis of single modes yields much more information: it has been argued 20 that residues active in the fastest modes (largest λ_k) have a
very strong resistance to conformational changes and are therefore thought to be important in maintaining the structure, or in underlying the stability of the folded state. Residues active in the slowest modes, on the other hand, are susceptible to large scale (global) motions. It is reasonable to conceive that such motions are associated with the collective dynamics of the overall tertiary structure, and are thereby relevant to biological function.
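A minimal numerical sketch of eqs. (2)-(7) follows: build the Kirchhoff matrix from bead coordinates with a distance cutoff, diagonalize it, and sum the non-zero modes to obtain relative mean square fluctuations. The random chain is only a stand-in for real Cα coordinates, and the prefactor 3k_BT/γ is omitted since it merely sets the overall scale.

```python
import numpy as np

def kirchhoff_matrix(coords, r_cut=7.0):
    """Eqs. (2)-(3): Gamma_ij = -1 if r_ij <= r_cut (i != j), else 0;
    the diagonal is minus the sum of the off-diagonal row elements."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    gamma = -(dist <= r_cut).astype(float)
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))   # coordination numbers
    return gamma

def gnm_modes(gamma):
    """Eigenvalues/eigenvectors of Gamma, dropping the single zero mode."""
    vals, vecs = np.linalg.eigh(gamma)            # ascending eigenvalues
    return vals[1:], vecs[:, 1:]

def ms_fluctuations(vals, vecs):
    """Eq. (7) with i = j, summed over the N-1 non-zero modes
    (prefactor 3 k_B T / gamma omitted)."""
    return (vecs ** 2 / vals).sum(axis=1)

# Illustrative random chain standing in for real Calpha coordinates; the
# small step size keeps consecutive beads within the 7 Angstrom cutoff.
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=1.5, size=(100, 3)), axis=0)
vals, vecs = gnm_modes(kirchhoff_matrix(coords))
print("slowest mode eigenvalue:", round(vals[0], 4))
print("most mobile bead:", int(np.argmax(ms_fluctuations(vals, vecs))))
```

For a real protein the coordinates would be read from a PDB file; the slow modes (small λ_k) then dominate the sum and pick out the functionally mobile regions discussed below.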
2.2 Application: Kinesin Motor Domain
The coordinates of the backbone of the human kinesin motor domain were downloaded from the Brookhaven Protein Data Bank. This protein structure has been resolved in the presence of the ADP nucleotide and a magnesium ion 11. We concentrated only on the backbone chain and assumed that each amino acid is well represented by an effective centroid coinciding with the Cα atom. Hence, in this simplified model, each amino acid is represented by one bead. The ADP molecule is represented by 4 beads, located at the centers of mass of the adenine and of the ribose and at the positions occupied by the two P atoms. The resulting structure is modeled by 327 beads. The Kirchhoff matrix was constructed assuming a cutoff distance of 7 Å, which has been shown to be a reasonable assumption for a wide range of folded proteins 20. The matrix was diagonalized; its eigenvalues were sorted in ascending order and the corresponding eigenvectors were normalized so that:

$$\sum_{i=1}^{N} \left( \mathbf{u}_k \right)_i^2 = 1 \quad \forall k, \tag{8}$$
where N = 323, i.e. we normalized the fluctuations so that the sum of the residue fluctuations is 1. We focus on the first 4 vibrational modes, which might be related to the biological function of the protein. The normalized fluctuations corresponding to the first vibrational mode of the kinesin structure are reported in figure 1(a). The first vibrational mode, which should be the most relevant for the biological function of the motor domain, implies a large motion of the loop L11 (230-253) coupled to the helix α4 (254-268). Indeed this helix is known as the relay helix 9 and is thought to play a very important role in the biological function of the motor domain. Its motion is coupled to the binding of ATP at the phosphate switches (residues 199-200 for switch I in the current nomenclature, and residue 232 for switch II), which have been identified to constitute the γ-phosphate sensor, i.e. the regions that sense the presence of the third phosphate group on ATP.
[Figure 1 appears here: four panels of normalized fluctuation (0.01-0.09) versus residue number (50-350); panel (a) labels loop L11 and the relay helix.]
Figure 1. Normalized fluctuations in the first four vibrational modes of the motor domain of kinesin. (a) Mode 1: loop L11 experiences the largest fluctuation; (b) Mode 2: vibration of both ends; (c) Mode 3: microtubule binding regions; (d) Mode 4: switch I.
Inspection of the switch regions of myosin, kinesin, and G proteins indeed suggested that ATP binding and phosphate release trigger the most critical structural changes in the cycle. Comparison of the myosin and kinesin structures revealed that small movements of the γ-phosphate sensor are transmitted to distant regions of the protein using a similar element: a long helix that is connected to the switch II loop 9. This highly conserved helix was called the relay helix. The relay helix is the key structural element in the communication pathway linking the catalytic site, the binding site of the microtubule and the mechanical elements in both kinesin and myosin. Therefore the first mode of the GNM is in agreement with the current picture of a switch-based mechanism. It describes the transmission of elastic energy between the most important regions of the protein: the switches sense the presence of ATP; their vibration is coupled to that of loop L11, which lies on the microtubule, to weaker fluctuations of other microtubule binding regions and to structural elements located at the tip of the protein (loops L6 and L10 with adjacent secondary
motifs), as shown in figure 1(a). It is also remarkable that the fluctuations of residues in the relay helix are not uniform: those closer to loop L11 experience a larger amplitude of vibration than the farther ones. This is in striking agreement with the recent observation 23 of a 20° rotation of the relay helix in the monomeric kinesin motor KIF1A.

The second vibrational mode is also of fundamental importance in the biological function of the motor domain of kinesin. It is reported in figure 1(b). It is evident that the largest fluctuations are experienced by those residues in close proximity to the N- and C-termini: in particular, loop L14 (301-303) and the helix α6 (304-319) for the C-terminus, and loop L1 (14-47) with the adjacent β2 (48-50) for the N-terminus. The analysis of the second vibrational mode draws our attention to the transmission of information through the structure of the motor domain. The phosphate sensor, in fact, interacts with the structural elements pointed out in the first mode. The same structural elements interact with both termini in the second vibrational mode. This might be relevant for negative processivity. In fact, the opposite directions of motion of kinesin and Ncd (a motor from Drosophila) have inspired various chimera experiments aimed at mapping a directionality element. In conventional plus-end-directed kinesins the neck-linker region is attached to the C-terminus, while in minus-end-directed kinesins it is usually joined to the N-terminus. In these experiments 9, the motor domain of Ncd was joined at its C-terminus to the neck of kinesin. Surprisingly, the resultant chimeras moved to the plus-end, even though the catalytic core belonged to the minus-end motor, Ncd. The converse was also shown to be true: in successive experiments, the motor domain of kinesin was joined to the neck of Ncd at its N-terminus. This chimera moved to the minus-end, despite the presence of a catalytic core belonging to a plus-end-directed motor. The same experiments also showed that the correct junction between the motor domain and the neck was important for allowing the motor chimera to move towards the minus-end. Therefore the experiments indicate that the direction of motion is not determined by the motor domain, but rather by the adjacent joint with the neck region. The second vibrational mode indeed shows that the two regions where the neck-linker may bind have the same importance, and undergo the same vibrational mean square displacement, upon the vibration of mechanical elements correlated to the phosphate sensors.

The third vibrational mode is represented in figure 1(c). This vibrational mode involves the microtubule binding regions in the motor domain, in particular again the microtubule binding loop L11, part of loop L8 (147-173), the
relay helix, loop L12 (269-268), and helix α5 (279-290). The tip of the protein is again involved in a large amplitude vibration, which is now correlated with the microtubule binding elements and also with the C-terminus of the protein.

The fourth vibrational mode is depicted in figure 1(d). This mode draws our attention to the vibrations of the two switches of the motor domain (switch I: residues 199 and 200, and switch II: residue 232) and the mechanical elements in their neighborhoods, in particular those in proximity of switch I, helix α3 (174-189) and loop L9 (190-202); there is also a lower peak corresponding to switch II. This may explain how the chemistry is affected by a mechanical force acting on the protein. If we suppose that this force is transmitted through the neck-linker to the C-terminus, then the elastic structure of the protein transmits these vibrations to the switches, and therefore the rate of binding and/or dissociation of nucleotides can be affected by the mechanical force acting on the protein, as observed by Visscher et al. 24.
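Profiles like those of figure 1 can be produced, in outline, by plotting the squared components of each slow eigenvector against residue number; eq. (8) is then satisfied automatically, since the eigenvectors are unit-normalized. The sketch below reuses the kirchhoff_matrix and gnm_modes helpers from the earlier listing, again with a stand-in chain instead of the 327 real beads.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: per-mode fluctuation profiles in the style of figure 1.
rng = np.random.default_rng(1)
coords = np.cumsum(rng.normal(scale=1.5, size=(327, 3)), axis=0)  # stand-in beads
vals, vecs = gnm_modes(kirchhoff_matrix(coords, r_cut=7.0))

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for k, ax in enumerate(axes.flat):        # the four slowest modes
    ax.plot(vecs[:, k] ** 2)              # eq. (8): squared components sum to 1
    ax.set_title(f"Mode {k + 1}")
    ax.set_xlabel("residue index")
    ax.set_ylabel("normalized fluctuation")
fig.tight_layout()
plt.show()
# For the real structure, peaks in mode 1 would be compared against loop L11
# (residues 230-253) and the relay helix (254-268) discussed in the text.
```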
3 Conclusions
In this paper we analyzed the motor domain of kinesin with the simple Gaussian Network Model. This analysis relies on the fact that the conformational change of the kinesin protein should not be sought in a conformational change of the motor domain. Indeed, as proposed by Vale et al. 9, it is likely that motions within the motor domain are small. It is also unlikely that the catalytic core undergoes the large interdomain motions which would be needed to drive efficient unidirectional motility. As shown by experiments, the directionality is not determined by the catalytic core, but rather by the adjacent neck-linker region. Therefore the search for a conformational change of the kinesin motor domain might be fruitless, since the conformational change may not be observable experimentally, or detectable by computer simulations. Instead, the transmission of mechanical strain between regions in the motor domain is of extreme importance. Despite its simplicity, the GNM has been used to address this issue. The slowest modes drew our attention to structural elements which were indeed shown to be important in recent experiments, but also to other elements which have not been investigated yet. In particular, the GNM analysis seems to indicate that the tip region (which is also thought to interact with the neck-linker) plays an important role not only in counterbalancing motions of the other parts, but mostly as a possible mechanical communication channel^b among the slowest vibrational modes and therefore the structural elements that are most important for the biological function of the motor.

^b A more comprehensive analysis, where the kinetics of motor proteins is also taken into account 26,27, can be found in GL's PhD Thesis 28, available on request.
Our opinion on the way the motor domain of kinesin may effectively make use of the binding energy of ATP to generate strain on the neck-linker is as follows: the phosphate sensor senses the presence of ATP by direct contacts with the third phosphate. These newly formed contacts activate some of the slowest vibrational modes of the motor domain, the first and fourth in our model, for instance. The vibration of the switch regions and their adjacent parts is accompanied by the activation of other regions which may be far apart in the structure, yet whose vibrations are strongly correlated with those in the proximity of ATP, in particular the tip. This works as a mechanical amplifier: its vibrations activate all the slowest vibrational modes, in particular the first one, activating the relay helix, the second one, activating the neck-linker joined at one of the termini, and the third one, allowing the motor domain to rotate on the microtubule binding site. This scheme is consistent with previous experiments and with the current switch-based mechanism, as recently proposed by Kikkawa et al. 23; it is also consistent with the picture obtained by Wriggers and Schulten 25 using all-atom computer simulations, but requires only a negligible fraction of the corresponding CPU time (the only CPU intensive calculation being the Jacobi diagonalization of a symmetric matrix).

In addition, our analysis suggests a direct correlation between switches I and II and the C-terminal part of the domain. This dependence could effectively explain how the chemistry could be affected by a mechanical force, as observed by Visscher et al. 24. The correlation was weaker (essentially active only in mode 2) for the N-terminus, which seems to be more stable against vibrational motions, at least in the available structure. This may imply that chimeras with the neck-linker attached to the N-terminus of this motor domain could be less efficient than their natural counterparts. More importantly, our analysis seems to suggest that a particularly well designed experiment aimed at constraining the tip of the protein could affect the communication among mechanical elements of the motor domain, by killing the main communication channel among the slowest vibrational modes; such an experiment, if possible, could therefore affect motility and/or the rate of ATP binding/ADP dissociation.

Our conclusion is that the GNM, or similar coarse grained models, could be extremely useful in predicting the pathway along which mechanical strain
could be transported, reduced or amplified in motor proteins and, in general, in all other cases of extremely massive macromolecules involved in complex reactions upon binding of a nucleotide or any other chemical substance.

References

1. H. Tiedge et al., Proc. Natl. Acad. Sci. USA 98, 6997 (2001).
2. Y. Ishii et al., TRENDS Biotech. 19, 211 (2001).
3. A. Ishijima and T. Yanagida, TRENDS Bioch. Sci. 26, 438 (2001).
4. H. Lodish et al., Molecular Cell Biology (Scientific American Books, New York, 2001).
5. J. Howard, Mechanics of Motor Proteins and the Cytoskeleton (Sinauer Associates, Sunderland, MA, 2001).
6. R. P. Feynman et al., The Feynman Lectures on Physics (Addison-Wesley, Reading, MA, 1966).
7. J. Howard et al., Nature 342, 154 (1989).
8. S. M. Block et al., Nature 348, 348 (1990).
9. R. D. Vale and R. A. Milligan, Science 288, 88 (2000) and references therein.
10. I. Rayment et al., Science 261, 50 (1993).
11. F. J. Kull et al., Nature 380, 550 (1996).
12. S. Weiss, Science 283, 1689 (1999).
13. Y. Ishii et al., Chem. Phys. 247, 163 (1999).
14. C. Micheletti et al., Proteins 42, 422 (2001).
15. A. Maritan et al., Phys. Rev. Lett. 84, 3009 (2000).
16. A. Maritan et al., Nature 406, 6793 (2000).
17. A. R. Atilgan et al., Biophys. J. 80, 505 (2001) and references therein.
18. I. Bahar and R. L. Jernigan, J. Mol. Biol. 281, 871 (1998).
19. B. Lustig et al., Nucl. Ac. Res. 26, 5212 (1998).
20. I. Bahar et al., Phys. Rev. Lett. 80, 2733 (1998) and references therein.
21. F. C. Bernstein et al., J. Mol. Biol. 112, 535 (1977).
22. P. J. Flory, Proc. Roy. Soc. London A 351, 351 (1976).
23. M. Kikkawa et al., Nature 411, 439 (2001).
24. K. Visscher et al., Nature 400, 184 (1999).
25. W. Wriggers and K. Schulten, Bioph. J. 75, 646 (1998).
26. G. Lattanzi and A. Maritan, Phys. Rev. Lett. 86, 1134 (2001).
27. G. Lattanzi and A. Maritan, Phys. Rev. E 64, 061905 (2001).
28. G. Lattanzi, Statistical Physics Approach to Protein Motors, PhD thesis, International School for Advanced Studies, SISSA, Trieste, 2001.
PHASING PROTEINS: EXPERIMENTAL LOSS OF INFORMATION AND ITS RECOVERY

C. GIACOVAZZO 1,2, F. CAPITELLI 2, C. GIANNINI 2, C. CUOCCI 1 AND M. IANIGRO 2

1 Dipartimento Geomineralogico, Universita di Bari, Campus Universitario, via Orabona 4, 70125 Bari, Italy
2 IRMEC (Istituto di Ricerca per lo Sviluppo di MEtodologie Cristallografiche) c/o Dipartimento Geomineralogico, Universita di Bari, Campus Universitario, via Orabona 4, 70125 Bari, Italy
E-mail: c.giacovazzo@area.ba.cnr.it
1 Introduction
X-ray (and neutron) diffraction is a classical example where some relevant information on the object under study is lost. A three-dimensional crystal (see Fig. 1), with electron density distribution described by the function ρ(r), interacts with an incident X-ray beam, so producing thousands of secondary diffracted beams. Just before the screen the situation is fully described by the Fourier transform of ρ(r),

$$F(\mathbf{r}^{*}) = T[\rho(\mathbf{r})] = \int_{S} \rho(\mathbf{r}) \exp(2\pi i\, \mathbf{r}^{*} \cdot \mathbf{r})\, d\mathbf{r}$$
where S denotes the crystal space.
Figure 1. Genesis of diffracted beams by interaction of a three-dimensional crystal with an incident x-ray beam. Reconstruction of the crystal by inverse Fourier transform.
If we were able to measure F(r*) in modulus and phase [F(r*) is a complex quantity], the trivial calculation of the inverse Fourier transform would provide us with the complete information on ρ(r): ρ(r) = T^{-1}[F(r*)] = T^{-1}T[ρ(r)]. Unfortunately, in the experiment we lose the phase value of F(r*) and are only able to measure its modulus. Indeed the intensity of each diffracted beam (the only observable), marked by a triple of integers (h k l), is related to |F_hkl|^2 by

$$I_{hkl} = k_1 k_2\, I_0\, L\, P\, T\, E\, |F_{hkl}|^2 \tag{1}$$
where I_0 is the intensity of the incident beam; k_1 = e^4/(m^2 c^4) takes into account universal constants (charge and mass of the electron, light velocity); k_2 = λ^3 Ω / V^2 is a constant for a given diffraction experiment (Ω is the volume of the crystal, V is the volume of the unit cell); P is the polarization factor; T is the transmission factor, which depends on the capacity of the crystal to absorb the radiation; L is the Lorentz factor, which depends on the diffraction technique; E is the extinction coefficient, which depends on the mosaic structure of the crystal.

$$F_{hkl} = \sum_{j=1}^{N} f_j \exp(2\pi i\, \mathbf{h} \cdot \mathbf{r}_j) = |F_{hkl}| \exp(i \phi_{hkl}) \tag{2}$$
is the structure factor with vectorial index h = (h k l); f_j is the scattering factor of the jth atom (thermal factor included), r_j is its position in the unit cell, N is the number of atoms in the cell and φ_hkl is the phase of the structure factor F_hkl. A typical experimental outcome is shown in Fig. 2, where a set of diffracted beam intensities is collected over an area detector.
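Equation (2) translates directly into a few lines of code; the toy cell below, with positions and scattering factors invented for illustration, computes one structure factor and shows the modulus/phase split of eq. (2).

```python
import numpy as np

def structure_factor(hkl, frac_coords, f_atoms):
    """Eq. (2): F_hkl = sum_j f_j exp(2*pi*i h.r_j), r_j in fractional coords."""
    phases = 2.0 * np.pi * frac_coords @ np.asarray(hkl, dtype=float)
    return np.sum(f_atoms * np.exp(1j * phases))

# Illustrative toy cell: three atoms with constant scattering factors.
frac_coords = np.array([[0.10, 0.20, 0.30],
                        [0.40, 0.15, 0.75],
                        [0.60, 0.80, 0.05]])
f_atoms = np.array([6.0, 7.0, 8.0])

F = structure_factor((1, 2, 3), frac_coords, f_atoms)
print(f"|F| = {abs(F):.3f}, phase = {np.angle(F):.3f} rad")
# Only |F|^2 reaches the detector (eq. 1); the phase is the lost information.
```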
Figure 2. Reciprocal-space plane of an oxidized form of the enzyme rhodanese, space group C2, a=156.2 Å, b=49.04 Å, c=42.25 Å, β=98.6° [from Gliubich F., Gazerro M., Zanotti G., Delbono S., Bombieri G. and Berni R., Active Site Structural Features for Chemically Modified Forms of Rhodanese. J. Biol. Chem. 271 (1996) pp. 21054-21061].
The question now is: how can we obtain the phases from the moduli? If the inverse Fourier transform of |F(r*)|^2 is calculated, the result is the Patterson function:

$$P(\mathbf{u}) = T^{-1}\big[\, |F(\mathbf{r}^{*})|^2 \,\big] = T^{-1}\big[ F(\mathbf{r}^{*})\, \bar{F}(\mathbf{r}^{*}) \big]$$

where F̄(r*) is the complex conjugate of F(r*). Owing to the convolution theorem we have

$$P(\mathbf{u}) = T^{-1}\big[ F(\mathbf{r}^{*}) \big] * T^{-1}\big[ \bar{F}(\mathbf{r}^{*}) \big] = \rho(\mathbf{r}) * \rho(-\mathbf{r}). \tag{3}$$
Thus P(u) is the autoconvolution of the electron density: its maxima correspond to the interatomic vectors, not to the atomic positions. These last quantities may be obtained only if the Patterson is "deconvoluted", which is a quite difficult problem
for large structures. In spite of this difficulty, a general suggestion comes out of eq. (3). If we assume that: i) the Patterson function is univocally defined by the collected diffraction moduli; ii) the Patterson function univocally defines (in principle) the interatomic vectors; iii) the interatomic vectors univocally define the crystal structure, then we can conclude that the set of diffraction moduli contains all the information necessary to define the crystal structure. The above conclusion encouraged several scientists to obtain phases directly from the moduli without passing through the Patterson function: these methods are called "direct methods", and their basic concepts are described in section 3. Since

$$P(\mathbf{u}) = \sum_{h,k,l} |F_{hkl}|^2 \exp(-2\pi i\, \mathbf{h} \cdot \mathbf{u}),$$
the larger the number of measured moduli, the larger the amount of saved experimental information. Accordingly, the aim of each diffraction experiment is to collect a set of experimental intensities as extended as possible. The ideal situation occurs when the number of observations is much larger than the number of structural parameters to find (in this case we say that the structure is overdetermined by the data). It is usual to allocate nine parameters per atom in the asymmetric unit: i.e., the three coordinates x, y, z and the six anisotropic thermal parameters b_ij. In case of scarcity of experimental data, four parameters per atom are defined: the three spatial coordinates and the isotropic vibrational factor B. We will see in section 4 that for proteins a single-wavelength diffraction experiment does not usually provide sufficient information to overdetermine the crystal structure via one set of experimental data.
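The Patterson synthesis above lends itself to a one-dimensional toy illustration, a minimal sketch with invented atoms: discard the phases of a known density, keep only |F|², and observe that the resulting map peaks at interatomic vectors rather than at atomic positions.

```python
import numpy as np

# Sketch: the Patterson function as the Fourier synthesis of |F|^2 alone.
n = 256
rho = np.zeros(n)
for pos, z in ((30, 6.0), (75, 8.0), (160, 16.0)):   # toy 1-D 'atoms'
    rho[pos] = z

F = np.fft.fft(rho)                                   # complex structure factors
patterson = np.fft.ifft(np.abs(F) ** 2).real          # phases discarded

strongest = sorted(np.argsort(patterson)[::-1][:7])
print("Patterson peaks at u =", strongest)
# Peaks fall at 0 and at the interatomic vectors 45, 85, 130 (and their
# negatives mod n), not at the atomic positions 30, 75, 160.
```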
2 From the moduli to the phases
The prior information available before a diffraction experiment usually reduces to: the positivity of the electron density, i.e. ρ(r) > 0, and consequently f_j > 0 for j = 1, ..., N; and the atomicity, i.e. the fact that the electrons are concentrated around the nuclei. The above information, even if apparently trivial, constitutes a strong restraint on the allowed phase values. Indeed, let:

a) S = {h_0 = 0, h_1, ..., h_n} be a finite set of indices, origin included;

b) $$H[\rho(\mathbf{r})] = \sum_{i,j=0}^{n} F_{\mathbf{h}_i - \mathbf{h}_j}\, u_i \bar{u}_j$$

be the Hermitian form of order n associated to ρ(r). Since ρ is non-negative definite, then H is also non-negative definite: as a consequence all the Toeplitz determinants
$$D_S = \det\big[(F_{\mathbf{h}_i - \mathbf{h}_j})\big] = \begin{vmatrix} F_0 & F_{\mathbf{h}_1} & F_{\mathbf{h}_2} & \cdots & F_{\mathbf{h}_n} \\ F_{-\mathbf{h}_1} & F_0 & F_{\mathbf{h}_2 - \mathbf{h}_1} & \cdots & F_{\mathbf{h}_n - \mathbf{h}_1} \\ \vdots & \vdots & \vdots & & \vdots \\ F_{-\mathbf{h}_n} & F_{\mathbf{h}_1 - \mathbf{h}_n} & F_{\mathbf{h}_2 - \mathbf{h}_n} & \cdots & F_0 \end{vmatrix} \tag{4}$$
are non-negative. The converse is also true: if D_S ≥ 0 for all S, then ρ is non-negative. Since the analytical expression of D_S may involve phases, (4) may be considered as a mathematical restraint on the phase values, generated by the positivity of the electron density distribution. The above result has been exploited in the crystallographic literature by several authors: we quote Harker and Kasper [19], Karle and Hauptman [26], and Goedkoop [16]. More recently the determinantal techniques have been integrated with probabilistic methods, giving rise to effective procedures for the solution of the phase problem (Tsoucaris [34]; de Graaff and Vermin [3]). Let us now come to the problem of extracting the phase information from the moduli. We observe that the direct passage {|F|} → {φ} is not allowed.
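Condition (4) is easy to verify numerically on a toy model: the Fourier coefficients of any non-negative one-dimensional density, arranged as in (4), form a Hermitian positive semi-definite matrix. A minimal sketch, with an invented density:

```python
import numpy as np

# Sketch: Karle-Hauptman (Toeplitz) non-negativity for a toy 1-D density.
n = 128
x = np.arange(n) / n
rho = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * x) + 0.3 * np.cos(2 * np.pi * 7 * x)
assert rho.min() >= 0.0                      # a non-negative 'electron density'

F = np.fft.fft(rho) / n                      # Fourier coefficients F_h
S = [0, 1, 3, 4, 7]                          # indices {h_0 = 0, h_1, ..., h_n}
M = np.array([[F[(hi - hj) % n] for hj in S] for hi in S])

print("smallest eigenvalue:", np.linalg.eigvalsh(M).min())  # >= 0 up to roundoff
print("D_S =", np.linalg.det(M).real)                       # non-negative
```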
Figure 3. A unit cell with origin in O. A shift of origin to O' changes the positional vector r_j into r'_j.
The generic jth atom is in P_j, and r_j is its positional vector; F_h is the structure factor with vectorial index h when the origin is taken in O. If we move the origin to O', the new structure factor will be

$$F'_{\mathbf{h}} = \sum_{j=1}^{N} f_j \exp(2\pi i\, \mathbf{h} \cdot \mathbf{r}'_j) = \sum_{j=1}^{N} f_j \exp[2\pi i\, \mathbf{h} \cdot (\mathbf{r}_j - \mathbf{X}_0)] = \exp(-2\pi i\, \mathbf{h} \cdot \mathbf{X}_0)\, F_{\mathbf{h}}, \tag{5}$$
where X_0 is the origin shift. We observe that the change of origin generates a phase change equal to −2πh·X_0. Many sets {φ_h} are therefore compatible with the same set of magnitudes, each set corresponding to a given choice of the origin. This implies that single phases (which are origin dependent) cannot be determined from the diffraction moduli alone (observable quantities, and therefore origin independent). Luckily there are combinations of phases which are origin independent: they depend only on the crystal structure and therefore can be estimated via the diffraction moduli. Let us consider the product

$$F_{\mathbf{h}_1} F_{\mathbf{h}_2} \cdots F_{\mathbf{h}_n}. \tag{6}$$
According to (5), an origin translation will modify (6) into

$$F'_{\mathbf{h}_1} F'_{\mathbf{h}_2} \cdots F'_{\mathbf{h}_n} = F_{\mathbf{h}_1} F_{\mathbf{h}_2} \cdots F_{\mathbf{h}_n} \exp[-2\pi i\, (\mathbf{h}_1 + \mathbf{h}_2 + \cdots + \mathbf{h}_n) \cdot \mathbf{X}_0]. \tag{7}$$
The relation (7) shows that the product of structure factors (6) is invariant under origin translation if h_1 + h_2 + ... + h_n = 0. These products are called structure invariants. The simplest examples are:

a) for n = 1, $F_{000} = \sum_{j=1}^{N} Z_j$ is the simplest structure invariant (Z_j is the atomic number of the jth atom);
b) for n = 2, eq. (6) reduces to |F_h|^2;
c) for n = 3, eq. (6) reduces to $F_{\mathbf{h}_1} F_{\mathbf{h}_2} \bar{F}_{\mathbf{h}_1 + \mathbf{h}_2}$;
d) for n = 4, the relation (6) reduces to $F_{\mathbf{h}} F_{\mathbf{k}} F_{\mathbf{l}} \bar{F}_{\mathbf{h}+\mathbf{k}+\mathbf{l}}$.

Quintet, sextet, ... invariants are defined by analogy. Invariants of order 3 or larger are phase dependent and therefore potentially useful to solve the phase problem. Triplets and quartets are the most important invariants.
3 Basic concepts of Direct Methods
The so-called direct methods may be divided into two categories: a) reciprocal space techniques; b) real space techniques. Both of them try to find phases directly from the moduli. Let us first describe set a). The properties of the structure invariants mentioned in section 2 encouraged the calculation of the conditional distribution functions

$$P\left( \Phi \mid \{R\} \right) \tag{8}$$

where $\Phi = \phi_{\mathbf{h}_1} + \phi_{\mathbf{h}_2} + \cdots + \phi_{\mathbf{h}_n}$
is a structure invariant and {R} is a suitable set of diffraction magnitudes. The mathematical technique to calculate (8) is the following: first, the set of reflections

$$\{E\} = \{E_1, E_2, \ldots, E_n, \ldots, E_p\}, \qquad p \ge n,$$
which is considered useful for the estimation of the structure invariant, is defined. This may be done via the neighbourhoods principle by Hauptman [20] or by the representation theory by Giacovazzo [9, 11]. Then the joint probability distribution function

$$P(E_1, E_2, \ldots, E_n, \ldots, E_p) = P(\phi_{\mathbf{h}_1}, \phi_{\mathbf{h}_2}, \ldots, \phi_{\mathbf{h}_n}, \ldots, \phi_{\mathbf{h}_p}, R_{\mathbf{h}_1}, R_{\mathbf{h}_2}, \ldots, R_{\mathbf{h}_p}) \tag{9}$$

is derived. This distribution is of basic interest since $R_{\mathbf{h}_1}, R_{\mathbf{h}_2}, \ldots, R_{\mathbf{h}_p}$ are known
from experiments, and therefore they constitute the prior information of the probabilistic approach. Finally, the conditional distribution $P(\Phi \mid R_{\mathbf{h}_1}, R_{\mathbf{h}_2}, \ldots, R_{\mathbf{h}_p}) \propto P(\Phi, R_{\mathbf{h}_1}, R_{\mathbf{h}_2}, \ldots, R_{\mathbf{h}_p})$ is obtained, from which the structure invariant can be estimated. The real space techniques, set b), instead iterate cycles of the form

$$\{F\}_i \rightarrow \rho_i(\mathbf{r}) \rightarrow \rho_{\mathrm{mod},i}(\mathbf{r}) \rightarrow \{F\}_{i+1}$$

where {F}_i is the set of structure factors at the ith cycle, ρ_i(r) the corresponding electron density, ρ_mod,i(r) the modified electron density function (adjusted to match the expected behaviour of ρ), and {F}_{i+1} the set of structure factors in the (i+1)th cycle. These techniques directly exploit the positivity condition without transforming it into complex reciprocal space relationships.
4 Phasing proteins
The impressive achievements in protein phasing obtained in the last few years are mainly due (see the discussion below) to simultaneous advances in (see Fig. 4): theoretical developments, increases in computer power, new radiation sources, and sophisticated experimental techniques.
[Figure 4 appears here; the diagram's central label reads "THE PHASE PROBLEM".]
Figure 4. Phasing proteins: factors allowing new advances.
There are intrinsic difficulties in recovering the protein phases directly from the diffraction moduli:

a) the large number of atoms in the unit cell (only in a few cases does the asymmetric unit contain fewer than 500 atoms; often it contains more than 5000). Under these conditions direct methods provide rather flat phase probability distributions;

b) in accordance with equation (1), the relation

$$I_{hkl} \propto 1/V^2$$
holds. Since the unit cell volume of a protein is large, the diffraction intensities will be weaker than for small molecules, and therefore their measurement less accurate. Quite efficient experimental techniques are therefore necessary for collecting data for which the ratio I/σ(I) is sufficiently good;

c) protein molecules are irregular in shape and they pack together with gaps between them filled by a liquid. This constitutes an unordered region, which ranges from 35 to 75 per cent of the volume of the unit cell, giving rise to diffuse scattering;

d) protein molecules are intrinsically flexible (to secure their biological function), and their thermal vibration is generally high. Consequently their atoms are bad scatterers.

The above drawbacks limit the number of observations available from a diffraction experiment. While, for small molecules, the available number of reflections per atom in the asymmetric unit is about 100 (i.e., for data up to 0.77 Å resolution), the same number for proteins drops to about 12.5 if the data resolution is 1.54 Å. Unfortunately the resolution for proteins is usually between 3 and 1.5 Å, so that we are often in the case in which the diffraction data do not overdetermine the crystal structure (number of observations comparable with, or inferior to, the number of parameters to define).

We will briefly consider two cases. The first occurs when the data resolution is better than or equal to 1 Å: in this case the ab initio crystal structure solution of the protein may be attempted directly, without any use of supplementary data. The second case occurs when the resolution is worse than 1 Å: supplementary information is then necessary.

Reciprocal space techniques were able to extend the complexity of the solvable structures up to 200 atoms in the asymmetric unit. This limit has been overcome in recent years (Weeks et al. [37]), when Shake-and-Bake introduced a new approach: reciprocal and direct space techniques are cyclically and repeatedly alternated. An effective variant of Shake-and-Bake is the program Half-baked SHELX-D (Sheldrick [32]), which preserves the cyclic combination of direct and reciprocal space techniques, but relies more on real space techniques. A third program, SIR2000 (Burla et al. [2]), proved able to solve crystal structures with more than 2000 atoms in the asymmetric unit without any user intervention. It is mainly based on real space techniques: the role of the tangent formula is ancillary. In all the above mentioned programs the procedure is the following: random phases are given to a subset of structure factors, and direct methods are applied to drive them towards the correct values. The approach is a multisolution one: several random sets are explored to obtain the structure. The computing time needed to succeed may be remarkable. As an example, in Table 1 we show, for some protein structures, the cpu time needed to find the correct solution by application of SIR2000: N_asym is the number of non-hydrogen atoms in the asymmetric unit, N_H2O is the number of bound water molecules.
Table 1. Large size structures (up to 2000 atoms in the asymmetric unit); the average structure solution time is 76.1 hours. Structure references in square brackets.

STRUCTURE CODE    Reference    N_asym - N_H2O    SIR2000 TIME (H)
TOXIN II          [33]         508 - 96          6.3
LACTAL            [18]         935 - 164         52.9
LYSOZIME          [4]          1001 - 108        1.0
OXIDOREDUCTASE    [7]          1106 - 283        78.4
HIPIP             [30]         1229 - 334        76.2
MYOGLOBINE        [35]         1241 - 186        19.1
CUTINASE          [28]         1141 - 264        293.2
ISD               [6]          1910 - 374        87.4
Non-ab-initio methods

The supplementary information is generally provided by: isomorphous replacement techniques (Green et al. [17]; Bragg and Perutz [1]); anomalous dispersion techniques (Hoppe and Jakubowski [25]; Hendrickson et al. [23]); molecular replacement (Rossmann and Blow [31]; Navaza [29]); and crystallochemical restraints. Let us first examine the nature of the first three techniques.

Isomorphous Replacement. The method requires the preparation of one or more heavy-atom-containing derivatives in the crystalline state. The most common technique is soaking the protein crystal in a solution of the reagent. Then X-ray intensity data are collected both for the native protein and for its derivatives. One speaks of SIR (Single Isomorphous Replacement) or MIR (Multiple Isomorphous Replacement) according to whether one or more derivatives are available.

Anomalous dispersion. Atomic electrons can be considered as oscillators with natural frequencies. If the frequency of the primary beam is near some of these natural frequencies, resonance will take place, and the scattering factor may be analytically expressed via the complex quantity

$$f = f_0 + \Delta f' + i f'' \tag{10}$$
where f_0 is the atomic scattering factor in the absence of anomalous scattering. Δf' and f'' are called the real and the imaginary dispersion corrections, and assume specific values for each wavelength. Owing to (10) the Friedel law $F_{hkl} = \bar{F}_{\bar{h}\bar{k}\bar{l}}$, and the widely accepted rule $f_j = \bar{f}_j$ (i.e., that the scattering factors are real), are no longer fulfilled. One speaks of SAS (Single-wavelength Anomalous Scattering) and MAD (Multiple-wavelength Anomalous Dispersion) according to whether diffraction data are collected at one or more wavelengths.
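The breakdown of the Friedel law is easy to see numerically; in the minimal sketch below (all values invented for illustration), purely real scattering factors give Friedel mates of equal moduli, while giving one atom an imaginary correction if'' makes them differ, and this difference is the signal exploited by SAS and MAD.

```python
import numpy as np

def F(h, coords, f):
    """Toy 1-D structure factor with (possibly complex) scattering factors."""
    return np.sum(f * np.exp(2j * np.pi * h * np.asarray(coords)))

coords = [0.15, 0.40, 0.77]
h = 3
f_real = np.array([8.0, 8.0, 30.0])            # no anomalous scattering
f_anom = np.array([8.0, 8.0, 30.0 + 4.0j])     # third atom carries i*f''

for label, f in (("normal   ", f_real), ("anomalous", f_anom)):
    print(f"{label}: |F(+h)| = {abs(F(h, coords, f)):.4f}, "
          f"|F(-h)| = {abs(F(-h, coords, f)):.4f}")
```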
Molecular replacement. The same protein, under different crystallization conditions, may crystallize in different space groups. Analogously, homologous proteins (representatives of a divergent evolution from a single primitive protein, but very similar in tertiary structure) often crystallize in different space groups. One can expect that the diffraction patterns of such similar proteins will be related to each other, and that the knowledge of one protein structure can provide useful information to find the structure of other homologous proteins. The problem is a six-dimensional one (three rotation angles and three components of the translation vector have to be fixed). The problem, however, may be approached as the sum of two three-dimensional ones: first the rotation angles are determined, then the translation is searched for.

Let us now examine the nature of the supplementary information provided by the three techniques, allowing one to solve the protein crystal structure. As soon as the diffraction data of the native protein and of one derivative are available, the differences Δ_iso = |F_d| − |F_p| can be obtained, where F_d represents the generic structure factor of the derivative and F_p the corresponding structure factor of the native protein. Magnitudes and signs of the Δ_iso are determined by the heavy atom substructure; however, Δ_iso does not coincide with F_H (the generic structure factor of the heavy atom substructure), owing to the fact that F_H = F_d − F_p. SIR and MIR techniques aim first at finding the heavy atom substructure: then they use this information as a prior to phase the native protein. Alternative techniques directly phase the protein from the Δ_iso (Hauptman [21]; Giacovazzo and Siliqi [8]; Giacovazzo et al. [10, 12-14]). The overall problem is not trivial, especially when lack of isomorphism occurs between the native and the derivative (i.e., when the introduction of the heavy atoms into the native crystal structure framework generates too many conformational changes). In general, the signal (say the Δ_iso) is of the same size as the error, and sophisticated techniques have to be used to succeed.

Also SAS and MAD techniques use a two-step procedure: first the substructure of the anomalous scatterers is found, and then this information is used as a prior for phasing the protein. Alternative techniques directly phasing the protein from the experimental data have also been proposed (Hauptman [22]; Giacovazzo [15]). If SAS is used, only the anomalous differences Δ_ano = |F⁺| − |F⁻| are employed to locate the anomalous scatterers; however, Δ_ano does not coincide with
$$F'' = \sum_{j=1}^{N} f''_j \exp(2\pi i\, \mathbf{h} \cdot \mathbf{r}_j).$$
If MAD is used, besides the anomalous differences, the dispersion differences (i.e., differences between diffraction moduli measured at different wavelengths) may also be employed. Unlike for SIR and MIR techniques, no lack of isomorphism occurs when MAD or SAS techniques are used, but the signal is smaller. SAS and MAD tend to become the method of choice for an ever-increasing number of structural biologists. The reasons are manifold: among others, the tunability of the synchrotron beamlines, which are able to select wavelengths for which the signal to noise ratio is a maximum, and the capability of modern molecular biology techniques to produce selenomethionine-containing proteins easily and in large quantities.

Molecular Replacement. The rotation and the translation searches may fail because the model molecule differs too much from the unknown structure. Additional difficulties arise from the necessity of limiting the data resolution. In general quite low-resolution reflections are omitted because they strongly depend on the solvent; the high-resolution cut-off depends on the similarity between the model and the molecule of the structure under study (high resolution reflections are too sensitive to differences between the model and the protein under study). Since more and more protein crystal structures are solved, more and more model structures are available for applying molecular replacement techniques.

Let us now discuss the role of the information provided by crystallochemistry. Suppose that an imperfect structural model of the protein is available and that the crystal structure is underdetermined (i.e., the ratio between the number of observations and the number of parameters to refine is low). The classical tool used in crystallography to optimize structural parameters, namely minimizing by least squares the quantity

$$S = \sum_{j} w_j \left( |F_j|_{\mathrm{obs}} - |F_j|_{\mathrm{calc}} \right)^2,$$
where the summation is extended over all the measured reflections, is not very useful. Luckily, bond lengths and valence angles in amino acids are very well known, so they can be held fixed at their theoretical values during the refinement, and only torsion angles around single bonds are allowed to vary (Diamond [5]). Also, a group of atoms can be treated as a rigid entity when the geometry of the group is believed to be insensitive to the environment. This is the classical case of the phenyl ring: the eighteen positional variables are then reduced to only six (three of rotational type, to define the orientation of the ring, and three of translational type, to locate the ring). In this way
the number of parameters to refine decreases and the ratio (number of observations)/(number of parameters to refine) increases, so improving the efficiency of the least squares procedure. A different but quite efficient alternative is to increase the number of observations: any crystallochemical information may be used as a supplementary observation in the least squares procedure (Konnert [27], Hendrickson and Konnert [24], Waser [36]). Since distances and valence angles are not expected to deviate significantly from their ideal values, one can minimize

$$S_2 = \sum_{j} w_j \left( d_{j(\mathrm{ideal})} - d_{j(\mathrm{calc})} \right)^2$$
where d_j(calc) is calculated from the structural model and d_j(ideal) is the expected value. Deviations from planarity can also be minimized (for planar groups), as well as deviations in the volume of the chiral atoms (defined, for an α-carbon, by the product of the interatomic vectors of the three atoms bound to it). The above restraints introduce the amount of information necessary to obtain quite reliable structural models of proteins.
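In practice the diffraction term S and the restraint term S_2 are minimized together as one penalized least-squares target; the sketch below (weights and data purely illustrative) shows such a combined objective in the Konnert-Hendrickson spirit.

```python
import numpy as np

def restrained_target(F_obs, F_calc, d_ideal, d_calc, w_f=1.0, w_d=10.0):
    """Combined target: diffraction residual S plus geometric restraints S_2,
    the restraints acting as supplementary 'observations'."""
    s = np.sum(w_f * (np.abs(F_obs) - np.abs(F_calc)) ** 2)
    s2 = np.sum(w_d * (d_ideal - d_calc) ** 2)
    return s + s2

# Illustrative numbers: three reflection moduli and two bond-length restraints.
F_obs = np.array([120.0, 45.0, 88.0])
F_calc = np.array([115.0 + 3.0j, 47.0 - 1.0j, 90.0 + 0.5j])
d_ideal = np.array([1.53, 1.33])   # target bond lengths in Angstrom
d_calc = np.array([1.55, 1.30])    # values from the current model

print("restrained target:", round(restrained_target(F_obs, F_calc,
                                                     d_ideal, d_calc), 3))
```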
References

1. Bragg W. L. and Perutz M. F., The structure of haemoglobin. VI. Fourier projections on the 010 plane. Proc. R. Soc. London Ser. A 225 (1954) pp. 315-329.
2. Burla M. C., Cavalli M., Carrozzini B., Cascarano G., Giacovazzo C., Polidori G. and Spagna R., SIR2000, a program for the automatic ab initio crystal structure solution of proteins. Acta Cryst. A 56 (2000) pp. 451-457.
3. de Graaff R. A. G. and Vermin W. J., The use of Karle-Hauptman determinants in small-structure determinations. II. Acta Cryst. A 38 (1982) pp. 464-470.
4. Deacon A. M., Weeks C. M., Miller R. and Ealick S. E., The Shake-and-Bake structure determination of triclinic lysozyme. Proc. Natl. Acad. Sci. USA 95 (1998) pp. 9284-9289.
5. Diamond R., A real-space refinement procedure for proteins. Acta Cryst. A 27 (1971) pp. 436-452.
6. Esposito L., Vitagliano L., Sica F., Sorrentino G., Zagari A. and Mazzarella L., The Ultrahigh Resolution Crystal Structure of Ribonuclease A Containing an Isoaspartyl Residue: Hydration and Stereochemical Analysis. J. Mol. Biol. 297 (2000) pp. 713-732.
7. Ferraroni M., Rypniewski W., Wilson K. S., Viezzoli M. S., Banci L., Bertini I. and Mangani S., The Crystal Structure of the Monomeric Human SOD Mutant F50E/G51E/E133Q at Atomic Resolution. The Enzyme Mechanism Revisited. J. Mol. Biol. 288 (1999) pp. 413-426.
8. Giacovazzo C. and Siliqi D., Improving Direct-Methods Phases by Heavy-Atom Information and Solvent Flattening. Acta Cryst. A 53 (1997) pp. 789-798.
9. Giacovazzo C., A general approach to phase relationships: the method of representations. Acta Cryst. A 33 (1977) pp. 933-944.
10. Giacovazzo C., Cascarano G. and Zheng C.-D., On integrating the techniques of direct methods and isomorphous replacement. A new probabilistic formula for triplet invariants. Acta Cryst. A 44 (1988) pp. 45-51.
11. Giacovazzo C., Direct Methods in Crystallography (Academic, London 1980).
12. Giacovazzo C., Siliqi D. and Spagna R., The ab initio crystal structure solution of proteins by direct methods. II. The procedure and its first applications. Acta Cryst. A 50 (1994) pp. 609-621.
13. Giacovazzo C., Siliqi D. and Zanotti G., The ab initio crystal structure solution of proteins by direct methods. III. The phase extension process. Acta Cryst. A 51 (1995) pp. 177-188.
14. Giacovazzo C., Siliqi D., Gonzalez Platas J., Hecht H.-J., Zanotti G. and York B., The Ab Initio Crystal Structure Solution of Proteins by Direct Methods. VI. Complete Phasing up to Derivative Resolution. Acta Cryst. D 52 (1996) pp. 813-825.
15. Giacovazzo C., The estimation of two-phase and three-phase invariants in P1 when anomalous scatterers are present. Acta Cryst. A 39 (1983) pp. 585-592.
16. Goedkoop J. A., Remarks on the theory of phase-limiting inequalities and equalities. Acta Cryst. 3 (1950) pp. 374-378.
17. Green E. A., Ingram V. M. and Perutz M. F., The structure of haemoglobin. IV. Sign determination by the isomorphous replacement method. Proc. R. Soc. London Ser. A 225 (1954) pp. 287-307.
18. Harata K., Abe Y. and Muraki M., Crystallographic Evaluation of Internal Motion of Human α-Lactalbumin Refined by Full-matrix Least-squares Method. J. Mol. Biol. 287 (1999) pp. 347-358.
19. Harker D. and Kasper J. S., Phases of Fourier coefficients directly from crystal diffraction data. Acta Cryst. 1 (1948) pp. 70-75.
20. Hauptman H., A new method in the probabilistic theory of the structure invariants. Acta Cryst. A 31 (1975) pp. 680-687.
21. Hauptman H., On integrating the techniques of direct methods and isomorphous replacement. I. The theoretical basis. Acta Cryst. A 38 (1982) pp. 289-294.
22. Hauptman H., On integrating the techniques of direct methods with anomalous dispersion. I. The theoretical basis. Acta Cryst. A 38 (1982) pp. 632-641.
23. Hendrickson W. A., Pahler A., Smith J. L., Satow Y., Merrit E. A. and Phizackerley R. P., Crystal structure of core streptavidin determined from multiwavelength anomalous diffraction of synchrotron radiation. Proc. Natl. Acad. Sci. USA 86 (1989) pp. 2190-2194.
24. Hendrickson W. A. and Konnert J. H., Incorporation of Stereochemical Restraints into Crystallographic Refinement. In Computing in Crystallography, ed. by R. Diamond, R. Ramaseshan and K. Venkatesan (The Indian Academy of Sciences, Bangalore 1980) pp. 13.01-13.23.
25. Hoppe W. and Jakubowski U., The determination of phases of erythrocruorin using the two-wavelength method with iron as anomalous scatterer. In Anomalous Scattering, ed. by S. Ramaseshan and S. C. Abrahams (Munksgaard, Copenhagen 1975) pp. 437-461.
26. Karle J. and Hauptman H., The phases and magnitudes of the structure factors. Acta Cryst. 3 (1950) pp. 181-187.
27. Konnert J. H., A restrained-parameter structure-factor least-squares refinement procedure for large asymmetric units. Acta Cryst. A 32 (1976) pp. 614-617.
28. Longhi S., Czjzek M., Lamzin V., Nicolas A. and Cambillau C., Atomic Resolution (1.0 Å) Crystal Structure of Fusarium solani Cutinase: Stereochemical Analysis. J. Mol. Biol. 268 (1997) pp. 779-799.
29. Navaza J., AMoRe: an automated package for molecular replacement. Acta Cryst. A 50 (1994) pp. 157-163.
30. Parisini E., Capozzi F., Lubini P., Lamzin V., Luchinat C. and Sheldrick G. M., Ab initio solution and refinement of two high-potential iron protein structures at atomic resolution. Acta Cryst. D 55 (1999) pp. 1773-1784.
31. Rossmann M. G. and Blow D. M., The detection of sub-units within the crystallographic asymmetric unit. Acta Cryst. 15 (1962) pp. 24-31.
32. Sheldrick G. M., SHELX: applications to macromolecules. In Direct Methods for Solving Macromolecular Structures, ed. S. Fortier (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998) pp. 401-411.
33. Smith G. D., Pangborn W. A. and Blessing R. H., Phase changes in T3R3f human insulin: temperature or pressure induced? Acta Cryst. D 57 (2001) pp. 1091-1100.
34. Tsoucaris G., A new method for phase determination. The maximum determinant rule. Acta Cryst. A 26 (1970) pp. 492-499.
35. Vojtechovsky J., Berendzen J., Chu K., Schlichting I. and Sweet R. M., Implications for the Mechanism of Ligand Discrimination and Identification of Substates Derived from Crystal Structures of Myoglobin-Ligand Complexes at Atomic Resolution. To be published (PDB code 1A6M).
36. Waser J., Least-squares refinement with subsidiary conditions. Acta Cryst. 16 (1963) pp. 1091-1094.
37. Weeks C. M., DeTitta G. T., Hauptman H. A., Thuman P. and Miller R., Structure solution by minimal-function phase refinement and Fourier filtering. II. Implementation and applications. Acta Cryst. A 50 (1994) pp. 210-220.
LIST OF PARTICIPANTS
N. Accornero - Dipartimento di Neurologia, Universita di Roma I, Italy
N. Ancona - IESI-CNR, Bari, Italy
L. Angelini - Dipartimento di Fisica, Universita di Bari, Italy
E.O. Ayoola - ICTP Trieste, and Mathematics Department, University of Nigeria
A. Bazzani - Dipartimento di Fisica, Universita di Bologna, Italy
D. Bellomo - Dipartimento di Elettronica, Politecnico di Bari, Italy
R. Bellotti - Dipartimento di Fisica, Universita di Bari, Italy
A. Bertolino - Dipartimento di Psichiatria, Universita di Bari, Italy
G. Bhanot - IBM and Princeton University, USA
M. Bilancia - Dipartimento di Scienze Statistiche, Universita di Bari, Italy
P. Blonda - IESI-CNR, Bari, Italy
F. Bovenga - Dipartimento di Fisica, Universita di Bari, Italy
M. Caselle - Dipartimento di Fisica, Universita di Torino, Italy
P. Cea - Dipartimento di Fisica, Universita di Bari, Italy
G. Cibelli - Dipartimento di Fisiologia Umana, Universita di Foggia, Italy
P. Colangelo - INFN, Sezione di Bari, Italy
E. Conte - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
L. Cosmai - INFN, Sezione di Bari, Italy
N. Cufaro Petroni - Dipartimento di Fisica, Universita di Bari, Italy
F. De Carlo - Dipartimento di Fisica, Universita di Bari, Italy
C. De Marzo - Dipartimento di Fisica, Universita di Bari, Italy
M. De Tommaso - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
E. Domany - Weizmann Institute, Israel
M.R. Falanga - Dipartimento di Fisica, Universita di Salerno, Italy
A. Federici - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
F. Franci - Dipartimento di Matematica Applicata, Universita di Firenze, Italy
C. Giacovazzo - IRMEC-CNR, Universita di Bari, Italy
R. Giuliani - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
G. Gonnella - Dipartimento di Fisica, Universita di Bari, Italy
L. Guerriero - Dipartimento di Fisica, Politecnico di Bari, Italy
P.Ch. Ivanov - CPS, Boston U. & Harvard Medical School, USA
H.J. Kappen - Nijmegen University, The Netherlands
A. Lamura - Dipartimento di Fisica, Universita di Bari, Italy
G. Lattanzi - SISSA, Trieste, Italy
S. Lecchini - INI, ETH Zurich, Switzerland
K. Lehnertz - Dept. of Epileptology, Medical Center, University of Bonn, Germany
M. Leone - SISSA, Trieste, Italy
P. Livrea - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
M. Mannarelli - Dipartimento di Fisica, Universita di Bari, Italy
C. Marangi - IRMA-CNR, Bari, Italy
E. Marinari - Dipartimento di Fisica, Universita di Roma I, Italy
C. Micheletti - SISSA, Trieste, Italy
G. Nardulli - Dipartimento di Fisica, Universita di Bari, Italy
L. Nitti - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
G. Paiano - Dipartimento di Fisica, Universita di Bari, Italy
G. Palasciano - Dipartimento di Medicina Interna, Universita di Bari, Italy
A.M. Papagni - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
S. Pascazio - Dipartimento di Fisica, Universita di Bari, Italy
G. Pasquariello - IESI-CNR, Bari, Italy
M. Pellicoro - Dipartimento di Fisica, Universita di Bari, Italy
G. Perchiazzi - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
M.V. Piztalis - Dipartimento di Metodologie Cliniche, Universita di Bari, Italy
A. Refice - Dipartimento di Fisica, Universita di Bari, Italy
P. Rizzon - Dipartimento di Metodologie Cliniche, Universita di Bari, Italy
R. Santostasi - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
G. Satalino - IESI-CNR, Bari, Italy
E. Scrimieri - Dipartimento di Fisica, Universita di Bari, Italy
F. Simone - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
R. Stoop - INI, ETH Zurich, Switzerland
S. Stramaglia - Dipartimento di Fisica, Universita di Bari, Italy
F. Tecchio - IESS-CNR, Unita MEG, Ospedale Fatebenefratelli, Roma, Italy
G. Turchetti - Dipartimento di Fisica, Universita di Bologna, Italy
A. Xu - Dipartimento di Fisica, Universita di Bari, Italy
A. Vena - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
M. Villani - Dipartimento di Fisica, Universita di Bari, Italy
A. Zenzola - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
AUTHOR INDEX
Accornero, N. 93
Andrzejak, R. G. 17
Angelini, L. 196
Attimonelli, M. 196
Baraldi, A. 157
Bazzani, A. 123
Bellotti, R. 144
Bertolino, A. 132
Bhanot, G. 67
Blasi, G. 132
Blonda, P. 157
Capitelli, F. 264
Capozza, M. 93
Caselle, M. 209
Castellani, G. 123
Chung-Chuan, Lo 28
Cibelli, G. 221
Conte, E. 51
Cuocci, C. 264
D'Addabbo, A. 157
David, P. 17
de Blasi, R. 157
de Carlo, F. 144
de Robertis, M. 196
de Tommaso, M. 144
di Cunto, F. 209
Difruscolo, O. 144
Domany, E. 175
Elger, C. E. 17
Federici, A. 51
Fiore, T. 60, 165
Giacovazzo, C. 264
Giannini, C. 264
Giuliani, R. 60, 165
Hedenstierna, G. 165
Ianigro, M. 264
Insolera, G. M. 60
Intrator, N. 123
Ivanov, P. Ch. 28
Kappen, H. J. 3
Kreuz, T. 17
Lattanzi, G. 251
Lecchini, S. 107
Lehnertz, K. 17
Luciani, F. 80
Mannarelli, M. 196
Marangi, C. 196
Mariani, L. 80
Maritan, A. 251
Massafra, R. 144
Micheletti, C. 234
Mormann, F. 17
Nitti, L. 196
Pasquariello, G. 157
Pellegrino, M. 209
Pellicoro, M. 196
Perchiazzi, G. 60, 165
Pesole, G. 196
Provero, P. 209
Remondini, D. 123
Rieke, C. 17
Ruggiero, L. 165
Saccone, C. 196
Satalino, G. 157
Sciruicchio, V. 144
Stoop, R. 107
Stramaglia, S. 144, 196
Tommaseo, M. 196
Turchetti, G. 80
Vena, A. 60, 165