Water and Biomolecules: Physical Chemistry of Life Phenomena


biological and medical physics, biomedical engineering

The fields of biological and medical physics and biomedical engineering are broad, multidisciplinary and dynamic. They lie at the crossroads of frontier research in physics, biology, chemistry, and medicine. The Biological and Medical Physics, Biomedical Engineering Series is intended to be comprehensive, covering a broad range of topics important to the study of the physical, chemical and biological sciences. Its goal is to provide scientists and engineers with textbooks, monographs, and reference works to address the growing need for information. Books in the series emphasize established and emergent areas of science including molecular, membrane, and mathematical biophysics; photosynthetic energy harvesting and conversion; information processing; physical principles of genetics; sensory communications; automata networks, neural networks, and cellular automata. Equally important will be coverage of applied aspects of biological and medical physics and biomedical engineering such as molecular electronic components and devices, biosensors, medicine, imaging, physical principles of renewable energy production, advanced prostheses, and environmental control and engineering.

Editor-in-Chief: Elias Greenbaum, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

Editorial Board: Masuo Aizawa, Department of Bioengineering, Tokyo Institute of Technology, Yokohama, Japan

Judith Herzfeld, Department of Chemistry, Brandeis University, Waltham, Massachusetts, USA

Olaf S. Andersen, Department of Physiology, Biophysics & Molecular Medicine, Cornell University, New York, USA

Mark S. Humayun, Doheny Eye Institute, Los Angeles, California, USA

Robert H. Austin, Department of Physics, Princeton University, Princeton, New Jersey, USA

James Barber, Department of Biochemistry, Imperial College of Science, Technology and Medicine, London, England

Howard C. Berg, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA

Victor Bloomfield, Department of Biochemistry, University of Minnesota, St. Paul, Minnesota, USA

Robert Callender, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA

Britton Chance, Department of Biochemistry/Biophysics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Steven Chu, Lawrence Berkeley National Laboratory, Berkeley, California, USA

Louis J. DeFelice, Department of Pharmacology, Vanderbilt University, Nashville, Tennessee, USA

Johann Deisenhofer, Howard Hughes Medical Institute, The University of Texas, Dallas, Texas, USA

George Feher, Department of Physics, University of California, San Diego, La Jolla, California, USA

Hans Frauenfelder, Los Alamos National Laboratory, Los Alamos, New Mexico, USA

Ivar Giaever, Rensselaer Polytechnic Institute, Troy, New York, USA

Sol M. Gruner, Cornell University, Ithaca, New York, USA

Pierre Joliot, Institute de Biologie Physico-Chimique, Fondation Edmond de Rothschild, Paris, France

Lajos Keszthelyi, Institute of Biophysics, Hungarian Academy of Sciences, Szeged, Hungary

Robert S. Knox, Department of Physics and Astronomy, University of Rochester, Rochester, New York, USA

Aaron Lewis, Department of Applied Physics, Hebrew University, Jerusalem, Israel

Stuart M. Lindsay, Department of Physics and Astronomy, Arizona State University, Tempe, Arizona, USA

David Mauzerall, Rockefeller University, New York, New York, USA

Eugenie V. Mielczarek, Department of Physics and Astronomy, George Mason University, Fairfax, Virginia, USA

Markolf Niemz, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany

V. Adrian Parsegian, Physical Science Laboratory, National Institutes of Health, Bethesda, Maryland, USA

Linda S. Powers, University of Arizona, Tucson, Arizona, USA

Earl W. Prohofsky, Department of Physics, Purdue University, West Lafayette, Indiana, USA

Andrew Rubin, Department of Biophysics, Moscow State University, Moscow, Russia

Michael Seibert, National Renewable Energy Laboratory, Golden, Colorado, USA

David Thomas, Department of Biochemistry, University of Minnesota Medical School, Minneapolis, Minnesota, USA

Kunihiro Kuwajima Yuji Goto Fumio Hirata Mikio Kataoka Masahide Terazima (Editors)

Water and Biomolecules Physical Chemistry of Life Phenomena With 125 Figures


Professor Kunihiro Kuwajima, National Institutes of Natural Sciences, Okazaki Institute for Integrative Bioscience, 5-1 Higashiyama, Myodaiji, Okazaki 444-8787, Japan

Professor Yuji Goto, Osaka University, Institute for Protein Research, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan

Professor Fumio Hirata, National Institutes of Natural Sciences, Institute for Molecular Science, Department for Theoretical and Computational Molecular Science, 38 Nishigo-Naka, Myodaiji, Okazaki 444-8585, Japan

Professor Mikio Kataoka, Nara Institute of Science and Technology, Graduate School of Materials Science, 8916-6 Takayama, Ikoma, Nara 630-0192, Japan

Professor Masahide Terazima, Kyoto University, Graduate School of Science, Department of Chemistry, Oiwakecho, Kitashirakawa, Kyoto 606-8502, Japan

Biological and Medical Physics, Biomedical Engineering ISSN 1618-7210 ISBN 978-3-540-88786-7

e-ISBN 978-3-540-88787-4

Library of Congress Control Number: 2008944102

© Springer-Verlag Berlin Heidelberg 2009

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by SPI Publisher Services, Pondicherry
Cover design: eStudio Calamar Steinen
SPIN 12251513 57/3180/SPI
Printed on acid-free paper
987654321
springer.com

Preface

“Biomolecules”, including proteins, nucleic acids and saccharides, perform various biological activities in “water”. Biomolecules and water molecules are simply “chemical substances” when each of them exists alone. However, various biological processes emerge when these substances function together. This book, “Water and Biomolecules – Physical Chemistry of Life Phenomena”, covers the physical chemistry of such biological processes, and deals with the “folding”, “dynamics”, and “function” of biomolecules as they are expressed in close relation to water molecules. Protein misfolding and amyloidogenesis are also included, because they are closely related to protein folding and functional expression and are responsible for a number of human diseases.

This book is also related to our recent Research Project “Water and Biomolecules”, which was supported for five years by a Grant-in-Aid for Scientific Research in Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, and concluded at the end of March 2008. During the project period, we held an open workshop annually, at which we had several invited talks by expert researchers in our field, several oral activity reports from our project members, and poster presentations representing the activities of all the members of the project team. The last workshop was organized by Mikio Kataoka (Nara Institute of Science and Technology), and held in Nara, the oldest capital of Japan, on January 24 and 25, 2008.

This book thus consists of 15 chapters, including seven chapters contributed by seven invited speakers (C.M. Dobson, H.J. Dyson, R.M. Levy, J.A. McCammon, C.A. Royer, C.M. Rao, and P.E. Wright) in the last workshop and eight chapters contributed by eight members (Y. Goto, F. Hirata, M. Kataoka, K. Kuwajima, Y. Okamoto, M. Sakurai, M. Terazima, and K. Yoshikawa) who were involved in our project. The chapters are arranged thematically: Chaps. 1–5 describe experimental and simulation studies on the folding of biomolecules, Chaps. 6–12 are related to the dynamics and function of biomolecules, and Chaps. 13–15 deal with the amyloidogenesis of proteins.


In Chap. 1, Peter E. Wright and his colleagues describe recent advances in mapping transient long-range interactions, which are directly implicated in the kinetic folding pathways of apomyoglobin. They use NMR relaxation techniques to map out the apomyoglobin folding landscape. Kunihiro Kuwajima and colleagues in Chap. 2 describe experimental and simulation studies of the folding/unfolding of goat α-lactalbumin, and demonstrate the power of combining experiments and simulations for studying the problems of protein folding. Chapter 3 by Takahiro Sakaue and Kenichi Yoshikawa gives an overview of, and recent developments in, the higher-order structure transition between dispersed coil and condensed compact states in giant DNA molecules. The rich transition behaviors found in experiments are analyzed using statistical mechanical concepts and are discussed in relation to their biological significance. Chapters 4 and 5 deal with theoretical and computational studies of protein folding. Yuko Okamoto in Chap. 4 gives an excellent overview of generalized-ensemble algorithms for molecular simulations of protein folding, and Ron M. Levy and his colleagues in Chap. 5 describe studies using replica-exchange simulations to explore the complex binding and folding landscapes of proteins, particularly focusing on their recent work using simplified continuous and discrete representations of these landscapes.

In Chap. 6, H. Jane Dyson and her colleagues describe the structural properties and dynamics of sizable disordered proteins in solution, as characterized by spectroscopic methods such as NMR. The chapter thus deals with intrinsically disordered proteins, whose functional role in crucial areas such as transcriptional regulation, translation and cellular signal transduction has only recently been recognized. Chapter 7, by Mikio Kataoka and Hironari Kamikubo, describes studies on protein dynamics and the effect of hydration water on the dynamics, using photoactive yellow protein as a model protein. Chapter 8 by Masahide Terazima describes studies of biological reactions using several new techniques developed by his group. These techniques, the pulsed-laser-induced transient grating and transient lens methods, can monitor spectrally silent dynamics in the time domain. Catherine A. Royer and Roland Winter in Chap. 9 describe the use of pressure perturbation calorimetry, along with results from many previous densitometric and high-pressure studies, to calculate quantitatively the specific volumes of a model protein, staphylococcal nuclease, in both the folded and unfolded states as a function of temperature. Chapters 10 and 11 deal with theoretical and computational studies of protein dynamics and functions. Fumio Hirata and his colleagues in Chap. 10 describe the application of the 3D-RISM theory, a statistical mechanics theory of molecular liquids, to the characterization of proteins in aqueous solutions, particularly focusing on the detection of water molecules and ions trapped in pores of proteins. J. Andrew McCammon in Chap. 11 gives an excellent overview of how computer simulations can be used quantitatively to interpret the behavior of proteins, including their binding of ligands. Minoru Sakurai in Chap. 12 describes studies on the biological functions of a non-reducing disaccharide, α,α-trehalose, as a substitute for water, and on their underlying mechanisms from the viewpoints of the thermodynamic, hydration and structural characteristics of this sugar.

In Chap. 13, Chris M. Dobson gives an overview of the problems of protein folding and misfolding and their conceptual basis. Misfolding can often give rise to serious cellular malfunctions that frequently lead to disease. He also describes the results of experiments designed to link the principles of misfolding and aggregation to the effects of such processes in model organisms such as Drosophila. Chapter 14 by Abhay Kumar Thakur and Ch. Mohan Rao describes the recent studies of their group on the possibility of using UV exposure as a structural perturbant, with mouse prion protein and other amyloidogenic proteins as model systems. Finally, Chap. 15 by Yuji Goto and his colleagues describes the results of recent studies of their group on the direct observation of nucleation and growth of amyloid fibrils using total internal reflection fluorescence microscopy combined with thioflavin and atomic force microscopy.

We thank all the contributors to this book for their time and effort in preparing the manuscripts, and particularly Chris M. Dobson (Cambridge) and Ron M. Levy (Rutgers), who were international advisors to our project, for their interest in the project and a number of very useful suggestions regarding it. Thanks are also due to Claus E. Ascheron, Balamurugan Elumalai and Adelheid Duhm of Springer Science for their help in publishing this book.

Okazaki, January 2009

Kunihiro Kuwajima Yuji Goto Fumio Hirata Mikio Kataoka Masahide Terazima

Contents

1 Mapping Protein Folding Landscapes by NMR Relaxation
P.E. Wright, D.J. Felitsky, K. Sugase, and H.J. Dyson
1.1 NMR Techniques for Studying Protein Folding
1.2 The Apomyoglobin Folding Landscape
1.3 Structure of the Kinetic Molten Globule State
1.4 The Upper Reaches of the Folding Landscape
1.5 Paramagnetic Relaxation Probes: Spin Labeling of Apomyoglobin
1.6 Model for Transient Interactions
1.7 Information from Relaxation Dispersion Measurements
1.8 Folding of an Intrinsically Disordered Protein Upon Binding to a Target
References

2 Experimental and Simulation Studies of the Folding/Unfolding of Goat α-Lactalbumin
K. Kuwajima, T. Oroguchi, T. Nakamura, M. Ikeguchi, and A. Kidera
2.1 Introduction
2.2 Goat α-Lactalbumin
2.3 Differences Between the Unfolding Behaviors of Authentic and Recombinant Goat α-Lactalbumin
2.3.1 Experimental Studies
2.3.2 Simulation Studies
2.3.3 Conclusions
2.4 Folding/Unfolding Pathways of Goat α-Lactalbumin
2.4.1 Experimental Studies
2.4.2 Simulation Studies
2.4.3 Conclusions
2.5 Summary and Perspectives
References


3 Transition in the Higher-order Structure of DNA in Aqueous Solution
T. Sakaue and K. Yoshikawa
3.1 Introduction
3.2 Long DNA Molecules in Aqueous Solution
3.2.1 Primary, Secondary, and Higher-order Structures
3.2.2 DNA Condensation
3.2.3 Looking at Single DNA Molecules
3.3 Statistical Physics of Folding of a Long Polymer
3.3.1 Some Basis
3.3.2 Continuous Transition in Flexible Polymers: Coil-Globule Transition
3.3.3 Discontinuous Transition in Semiflexible Polymers
3.3.4 Instability Due to the Remanent Charge
3.4 Summary and Perspectives
3.4.1 Higher-order Structure and Genetic Activity
3.4.2 Toward Chromatin Structure
References

4 Generalized-Ensemble Algorithms for Studying Protein Folding
Y. Okamoto
4.1 Introduction
4.2 Generalized-Ensemble Algorithms
4.2.1 Multicanonical Algorithm
4.3 Multidimensional Extensions of Multicanonical Algorithm
4.3.1 Replica-Exchange Method
4.3.2 Multidimensional Extensions of Replica-Exchange Method
4.4 Examples of Simulation Results
4.5 Conclusions
References

5 Protein Folding and Binding: Effective Potentials, Replica Exchange Simulations, and Network Models
A.K. Felts, M. Andrec, E. Gallicchio, and R.M. Levy
5.1 Introduction
5.2 Methods
5.2.1 The OPLS-AA/AGBNP Effective Potential
5.2.2 Replica Exchange Molecular Dynamics
5.2.3 The Network Model of Protein Folding
5.2.4 Loop Prediction with Torsion Angle Sampling
5.3 Folding of Peptides
5.3.1 G-Peptide Folding
5.3.2 Folding of Other Small Peptides
5.3.3 Loop Prediction
5.4 Kinetic Model of the G-Peptide
5.4.1 The G-Peptide has Apparent Two-State Kinetics After a Small Temperature Jump Perturbation
5.4.2 The G-Peptide has an α-Helical Intermediate During Folding from Coil Conformations
5.4.3 A Molecular View of Kinetic Pathways
5.5 Ligand Conformational Equilibrium in a Cytochrome P450 Complex
5.5.1 Methodology
5.5.2 The Population of the Proximal State as a Function of Temperature
5.6 Simple Continuous and Discrete Models for Simulating Replica Exchange
5.6.1 Discrete Network Replica Exchange (NRE)
5.6.2 RE Simulations using MC on a Continuous Potential
5.7 Conclusion
References

6 Functional Unfolded Proteins: How, When, Where, and Why?
H.J. Dyson, S.-C. Sue, and P.E. Wright
6.1 What is a Functional Unfolded Protein?
6.2 Where do Functional Unfolded Proteins Occur?
6.3 How Are Functional Unfolded Proteins Studied?
6.4 NMR Spectra: Practical Considerations
6.5 Dynamic Complexes in CBP
6.6 Role of Flexibility in the Function of IκBα
References

7 Structure of the Photointermediate of Photoactive Yellow Protein and the Propagation Mechanism of Structural Change
M. Kataoka and H. Kamikubo
7.1 Solution X-ray Scattering
7.2 Photoactive Yellow Protein
7.3 Solution Structure Analysis of Photointermediate of PYP
7.3.1 High-Angle X-ray Scattering of PYP in the Dark and in the Light
7.3.2 Analysis of High Angle Scattering
7.4 Propagation Mechanism of the Structural Change
7.5 Summary
References


8 Time-Resolved Detection of Intermolecular Interaction of Photosensor Proteins
M. Terazima
8.1 Introduction
8.2 Principle
8.3 Diffusion Coefficient
8.4 Time-Resolved Detection of Interprotein Interactions
8.4.1 Protein–Protein Interaction of the Photoexcited Photoactive Yellow Protein
8.4.2 Photoinduced Dimerization of AppA
8.4.3 Photoinduced Dimerization and Dissociation of Phototropins
8.4.4 Diffusion Detection of Interprotein Interaction
References

9 Volumetric Properties of Proteins and the Role of Solvent in Conformational Dynamics
C.A. Royer and R. Winter
9.1 Introduction
9.2 Thermodynamics
9.3 Thermal Expansivity and ΔV
9.4 Conclusions
References

10 A Statistical Mechanics Theory of Molecular Recognition
T. Imai, N. Yoshida, A. Kovalenko, and F. Hirata
10.1 Introduction
10.2 Outline of the RISM and 3D-RISM Theories
10.3 Recognition of Water Molecules by Protein
10.4 Noble Gas Binding to Protein
10.5 Selective Ion-Binding by Protein
10.6 Pressure-Induced Structural Transition of Protein and Molecular Recognition
10.7 Perspective
References

11 Computational Studies of Protein Dynamics
J.A. McCammon
11.1 Introduction
11.2 Brief Survey of Protein Motions
11.3 Binding and Selectivity
11.4 Concerted Binding and Release
11.5 Molecular Clocks
References


12 Biological Functions of Trehalose as a Substitute for Water
M. Sakurai
12.1 Introduction
12.2 Hydration Property of Trehalose
12.2.1 Property of the Aqueous Solution of Trehalose
12.2.2 Atomic-Level Picture of Hydration of Trehalose
12.3 Solid-State Property of Trehalose
12.3.1 Polymorphism
12.3.2 Glassy State of Trehalose
12.4 Biological Roles of Trehalose
12.4.1 Possible Mechanisms of Anhydrobiosis
12.4.2 Strategy for Desiccation Tolerance in the Sleeping Chironomid
12.4.3 Other Biological Roles of Trehalose
12.5 Conclusion
References

13 Protein Misfolding Diseases and the Key Role Played by the Interactions of Polypeptides with Water
C.M. Dobson
13.1 Introduction
13.2 The Importance of Normal and Aberrant Protein Folding in Biology
13.3 Protein Aggregation and Amyloid Formation
13.4 Molecular Evolution and the Control of Protein Misfolding
13.5 Impaired Misfolding Control and the Onset of Disease
13.6 Probing Misfolding and Aggregation in Living Organisms
13.7 The Recent Proliferation of Misfolding Diseases and Prospects for Effective Therapies
13.8 Concluding Remarks
References

14 Effect of UV Light on Amyloidogenic Proteins: Nucleation and Fibril Extension
A.K. Thakur and Ch. Mohan Rao
14.1 Introduction
14.2 Amyloid
14.2.1 Structural Perturbation
14.2.2 Nucleation
14.2.3 Fibril Extension
14.3 UV Light as a Potent Structural Perturbant
14.3.1 UV-Induced Aggregation of Prion Protein
14.3.2 Prevention of UV-Induced Aggregation of Prion Protein
14.3.3 UV Exposure Alters Conformation of Prion Protein
14.3.4 UV-Exposed Proteins Failed to Form Amyloid De Novo
14.3.5 Is Subcritical Concentration of UV-Exposed Protein Responsible for Failure to Form Amyloid Fibrils?
14.3.6 UV-Exposed Amyloidogenic Proteins Form Amyloid Upon Seeding
14.3.7 UV-Exposed Prion Protein Fibrils Show Altered Fibril Morphology
14.4 Discussion
References

15 Real-Time Observation of Amyloid Fibril Growth by Total Internal Reflection Fluorescence Microscopy
H. Yagi, T. Ban, and Y. Goto
15.1 Introduction
15.2 Total Internal Reflection Fluorescence Microscopy
15.3 Real-Time Observation of β2-m and Aβ Fibrils
15.4 Effects of Various Surfaces on the Growth of Aβ Fibrils
15.5 Spontaneous Formation of Aβ(1–40) Fibrils and Classification of Morphologies
15.6 Conclusion
References

Index

List of Contributors

Michael Andrec
Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway NJ 08854, USA

Tadato Ban
National Advanced Institute of Advanced Science and Technology, Midorigaoka 1-8-31, Ikeda, Osaka 563-8577, Japan

Christopher M. Dobson
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK

H. Jane Dyson
Department of Molecular Biology MB2, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

Daniel J. Felitsky
Department of Molecular Biology MB2, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

Anthony K. Felts
Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway NJ 08854, USA

Emilio Gallicchio
Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway NJ 08854, USA

Yuji Goto
Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan

Fumio Hirata
Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan, and Department of Functional Molecular Science, School of Physical Sciences, Graduate University for Advanced Studies (SOKENDAI), 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8585, Japan

Mitsunori Ikeguchi
International Graduate School of Arts and Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan

Takashi Imai
Computational Science Research Program, RIKEN, Wako, Saitama 351-0198, Japan

Hironari Kamikubo
Graduate School of Materials Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan

Mikio Kataoka
Graduate School of Materials Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan

Akinori Kidera
International Graduate School of Arts and Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan

Andriy Kovalenko
National Institute for Nanotechnology, and Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta T6G 2M9, Canada

Kunihiro Kuwajima
Okazaki Institute for Integrative Bioscience, National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8787, Japan, and Department of Functional Molecular Science, School of Physical Sciences, Graduate University for Advanced Studies (SOKENDAI), 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8787, Japan

Ronald M. Levy
Department of Chemistry and Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway NJ 08854, USA

J. Andrew McCammon
Department of Chemistry and Biochemistry, Department of Pharmacology, Center for Theoretical Biological Physics, and Howard Hughes Medical Institute, University of California at San Diego, La Jolla, CA 92093-0365, USA

Takashi Nakamura
Okazaki Institute for Integrative Bioscience, National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8787, Japan

Yuko Okamoto
Department of Physics, Nagoya University, Nagoya, Aichi 464-8602, Japan

Tomotaka Oroguchi
International Graduate School of Arts and Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan

Ch. Mohan Rao
Centre for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad 500 007, India, www.ccmb.res.in/staff/mohan

Catherine A. Royer
INSERM, U554, CNRS UMR5048, 29 rue de Navacelles, 34090 Montpellier Cedex, France

Takahiro Sakaue
Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto 606-8103, Japan

Minoru Sakurai
Center for Biological Resources and Informatics, Tokyo Institute of Technology, B-62 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan

Shih-Che Sue
Department of Molecular Biology MB2, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

Kenji Sugase
Department of Molecular Biology MB2, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

Masahide Terazima
Department of Chemistry, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan

Abhay Kumar Thakur
Centre for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad 500 007, India

Roland Winter
Department of Chemistry, Physical Chemistry I – Biophysical Chemistry, Dortmund University of Technology, Otto-Hahn Str. 6, D-44227 Dortmund, Germany

Peter E. Wright
Department of Molecular Biology MB2, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

Hisashi Yagi
Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan

Norio Yoshida
Department of Theoretical and Computational Molecular Science, Institute for Molecular Science, Okazaki, Aichi 444-8585, Japan

Kenichi Yoshikawa
Department of Physics, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan

1 Mapping Protein Folding Landscapes by NMR Relaxation

P.E. Wright, D.J. Felitsky, K. Sugase, and H.J. Dyson

Abstract. The process of protein folding provides an excellent example of the interactions of water with biomolecules. The changes in the water–protein interactions along the protein folding pathway provide an important impetus for the formation of the final natively folded structure of the protein. NMR spectroscopy provides unique insights into the dynamic protein folding process, and during the past 20 years we have seen the development of a wide range of NMR techniques to probe the kinetic and thermodynamic aspects of protein folding. In particular, with the advent of high-field spectrometers and stable isotope labeling techniques, the structure and dynamics of a wide range of disordered and partly ordered proteins at equilibrium have been characterized by NMR. Efforts in our laboratory over a number of years have allowed the sequence-specific identification of sites of local hydrophobic collapse, as well as secondary structure formation and transient long-range interactions in several protein systems, most notably for apomyoglobin, which will be highlighted in this article.

1.1 NMR Techniques for Studying Protein Folding

Kinetic folding pathways for proteins that fold on a millisecond timescale can be probed using hydrogen exchange pulse labeling [1,2], where differential protection of amide protons at various points during folding is detected by NMR. More recently, with the advent of high-field spectrometers and ¹³C, ¹⁵N, and ²H labeling techniques, the structure and dynamics of disordered and partly ordered proteins at equilibrium have been characterized by NMR. The upper reaches of the protein folding landscape can be mapped using chemical shift, nuclear Overhauser effect (NOE), spin labeling, relaxation data, and residual dipolar coupling measurements (reviewed in [3]). Efforts in our laboratory over a number of years have allowed the sequence-specific identification of sites of local hydrophobic collapse, secondary structure formation, and transient long-range interactions in several protein systems, most notably in apomyoglobin.


1.2 The Apomyoglobin Folding Landscape

Apomyoglobin, the heme-free version of the muscle protein myoglobin, contains eight helices folded into the canonical globin fold. The kinetic folding pathway, elucidated by hydrogen exchange pulse labeling [4,5], shows the rapid formation of an intermediate species containing the A, B, G, and H helices, which is followed by the slower (∼ms) folding of the remainder of the protein.

The equilibrium folding landscape for apomyoglobin is typical for a single-domain protein. In the presence of high concentrations of urea, the protein is completely unfolded [6], and populates an ensemble of structures with little detectable propensity for structure formation. In the acid-unfolded state at pH 2, the protein is largely unfolded, but samples transient secondary structure and hydrophobic clusters in certain parts of the protein but not in others [7]. Equilibrium intermediates corresponding to the ABGH kinetic intermediate are formed at intermediate pH values in the absence of urea. These species, termed molten globules, contain relatively stable helical secondary structure, but fluid tertiary structure. Resonances of the F helix of folded apomyoglobin at pH 6 are invisible because of an exchange on an intermediate timescale between two or more structures with different chemical shifts [8].

1.3 Structure of the Kinetic Molten Globule State

All globins so far studied pass through a kinetic molten globule intermediate that contains some but not all of the helices. The particular helices that are present in the kinetic intermediate vary according to the amino acid sequence; for example, the intermediate in the folding of the monomeric plant hemoglobin apoleghemoglobin contains the E, G, and H helices instead of the A, B, G, and H helices of apomyoglobin. An extensive series of kinetic and equilibrium folding studies on mutants of apomyoglobin [9–11] has identified a non-native structure that slows down folding and allows the intermediate to be detected. This is illustrated in Fig. 1.1, which shows the proton occupancies in the molten globule intermediate of apomyoglobin mapped onto the structure of the native, fully folded protein. The most highly protected areas in the intermediate, which likely correspond to the coalesced portion of the polypeptide, do not correspond to contiguous regions in the fully folded protein. Instead, the H helix appears to be translocated in the intermediate by about one helical turn. We conclude that the transition state for folding thus involves resolution of this small area of non-native structure before the final native contacts can be made.

Fig. 1.1. Model of the apomyoglobin kinetic folding intermediate based on hydrogen exchange pulse labeling and mutagenesis data. The proton occupancies are mapped onto the structure of holomyoglobin [12]. The degree of amide proton exchange protection is indicated by the intensity of the gray shading and the thickness of the backbone. The most protected regions are indicated by the darkest shade and the thickest backbone. The figure was prepared using MolMol [13]

1.4 The Upper Reaches of the Folding Landscape

One of the strengths of NMR is that it can give per-residue structural information on ensembles of molecules that may contain different local structures. An example of this is the acid-unfolded state of apomyoglobin. Chemical shift data show that there is a detectable propensity for helical backbone dihedral angles in the regions of the protein that correspond to the H and A helices in the native folded state. Relaxation data [7] and spin-labeling studies [14] show the presence of transient native-like long-range interactions between the A and G helix regions in acid-unfolded apoMb. That these transient interactions are native-like and nonrandom must be a consequence of the amino acid sequence alone, and a series of mutant studies of apomyoglobin [9–11] showed that the propensity for local and transient long-range ordering in acid-unfolded apomyoglobin could be correlated with the property “average area buried upon folding” (AABUF) [15] or the modified hydrophobic effect [16]. In addition, the proton occupancy in the kinetic intermediate also correlates with the AABUF, and changing the local AABUF by designed point mutations also changes the pattern of proton occupancy in the kinetic intermediate [10] (Fig. 1.2). These experiments showed conclusively that the local regions with high AABUF adopt stable structure early in the protein folding process.

We next turned to the question of the means whereby the hydrophobic clusters, sometimes separated by long intermediate stretches of the unfolded polypeptide, can interact, and to the hierarchy of folding events. These questions are addressed by using paramagnetic relaxation enhancement (PRE) (spin labels) and ¹⁵N R2 relaxation dispersion.


Fig. 1.2. Correlation between proton occupancies in the kinetic burst phase intermediate (black circles) and average area buried upon folding (AABUF, gray lines) for wild-type apomyoglobin and for a quadruple mutant (Leu11Gly, Trp14Gly, Ala71Leu, Gly73Trp – termed the GGLW mutant). Reproduced with permission from [10]

1.5 Paramagnetic Relaxation Probes: Spin Labeling of Apomyoglobin

The incorporation of a paramagnetic spin label results in broadening of the NMR resonances of nuclei within 15–20 Å from the site of spin labeling. This makes spin labels powerful probes of conformational ensembles. A preliminary spin-label study of apomyoglobin [14] showed that the transient contacts that occur at equilibrium in acid-unfolded apoMb are sequence specific and region specific. Resonances are broadened in the immediate vicinity of the spin label, but for some spin-label sites, such as E18 (Fig. 1.3), broadening is observed at long range in the G and H helix regions, while for a spin-label site in the E helix, no such long-range broadening is observed. We have recently undertaken a comprehensive spin-label study of apomyoglobin, using the data to derive a model that gives rise to a quantitative evaluation of the population of various transient collapsed states [17].

Fig. 1.3. Paramagnetic relaxation enhancement profiles for apomyoglobin unfolded at pH 2.3 in the presence (left panels) and absence (right panels) of 8 M urea. Data for spin labels attached at residues 18 and 77 are shown. The plots show the ratio of HSQC cross-peak intensity with the spin label oxidized (paramagnetic) and reduced (diamagnetic) as a function of residue number. The solid lines in the left panels represent the broadening profile expected for a random coil polypeptide. The figure is adapted from data reported in [14]. The positions of the helices in holomyoglobin are shown by the bars at the top of the figure

1.6 Model for Transient Interactions

For unfolded and partly folded states, the spin label reports on parts of the polypeptide chain that are in transient contact with the segment bearing the spin label. The extent of relaxation enhancement (line broadening) depends on both the distance to the paramagnetic spin label and the lifetime of the interaction. When the chain conformers rapidly interconvert, as is the case in unfolded apomyoglobin, the relaxation enhancement becomes a weighted average over all members of the ensemble:

$$R_{2P} = \sum_i K_i \, p_i / r_i^6$$

where $p_i$ is the fractional population of state $i$, $r_i$ is the distance between the backbone amide proton which gives rise to the NMR cross-peak and the spin label, and $K_i$ is a proportionality constant which depends on both the gyromagnetic ratio of the nucleus under investigation and the correlation time for the electron–nuclear dipole–dipole interaction. The magnitude of $K_i$ is such that even very small populations (<∼1%) can contribute measurably to the overall relaxation rate.

Detailed PRE measurements on a total of 14 spin-label sites distributed throughout the molecule confirm that transient long-range interactions in the acid-unfolded apomyoglobin chain are restricted to the five regions corresponding to the AABUF maxima in the primary sequence; these are designated as regions A, B, C, G, and H (corresponding to the high-AABUF sequences in the A and B helices, the CD loop, and the G and H helices, respectively). The localization of interaction sites to distinct segments of the chain suggests that the paramagnetic relaxation may be modeled via a chemical kinetics approach whereby these five sites transiently associate in various combinations. The unfolded ensemble can thereby be divided into 52 macrostates, which correspond to the 51 different possible combinatorial arrangements of the five interaction sites and the completely dissociated substate. For example, A can combine with G and H in one cluster, while B and C are interacting in a second cluster to form the macrostate AGH–BC.

The long-range intramolecular contacts in a given macrostate act as a set of topological restraints which reduce the chain’s configurational entropy. For nonspecific, transient interactions, the relative entropy loss for the formation of different contacts (loops) may well be expected to dominate the thermodynamics and thus determine the relative populations of the different macrostates. This entropy loss depends directly on the length of the intervening chain segment(s) and relates to the distance distribution function $P(r)$ between two noninteracting sites separated by the same linker length. More specifically, the entropy loss is defined by the fraction of the distribution in which the interaction site centers are close enough for the two regions to coalesce; this distance is typically estimated from the sum of the radii of two spheres with volumes equivalent to the total van der Waals volumes of all residues within each interaction site. The required distance distributions can be extracted from the paramagnetic relaxation enhancement of spin labels attached centrally in the chain (such as at positions 57 or 77) where no long-range interactions occur, utilizing radius of gyration information to help determine the long-range tails of the distributions.
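The figure of 52 macrostates is the number of ways of partitioning five labeled sites into non-empty clusters (the Bell number B5): 51 arrangements with at least one contact plus the fully dissociated chain. The Python sketch below is an illustration added for the reader and is not the authors' code; the function names and the hyphenated labels such as AGH-BC are hypothetical conventions chosen for this example.

```python
SITES = ["A", "B", "C", "G", "H"]  # the five interaction sites named in the text


def set_partitions(items):
    """Yield every partition of `items` into non-empty, unordered clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing cluster in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it form a cluster of its own.
        yield [[first]] + partition


def label(partition):
    """Render a partition as a label such as 'AGH-BC' (hypothetical convention)."""
    clusters = sorted("".join(sorted(cluster)) for cluster in partition)
    return "-".join(clusters)


macrostates = [label(p) for p in set_partitions(SITES)]
dissociated = "A-B-C-G-H"  # every site in its own cluster: no long-range contact
with_contacts = [m for m in macrostates if m != dissociated]

print(len(macrostates), len(with_contacts))
```

Each label lists the clusters of a macrostate; single-letter clusters are sites that are not in contact with any other site, so AGH-BC is the example macrostate described in the text.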
A model in which the relative stabilities of various clusters (and macrostates) are determined solely by the entropic barriers to loop closure discussed above can be fitted to the experimental paramagnetic relaxation data, with a single parameter reflecting the mean favorable free energy of interaction required to overcome the entropic penalty for contact formation. The model fits the experimental data surprisingly well (pale gray lines in Fig. 1.4), but not perfectly, as it cannot explain the experimentally observed preference for the C-terminus (G/H regions) to interact with A/B over C. (This latter region should interact more strongly solely on the basis of loop entropy considerations.) When this fact is accounted for (by modeling in weaker pairwise interaction free energies of the C region with other interaction sites), the loop entropy model gives an excellent fit to the PRE data for the 12 spin-label sites that show long-range interactions, as shown via the fit in Fig. 1.4 (dark gray lines). Spin-label data obtained in the presence of 8 M urea at pH 2.3 could also be well fitted by this model; the only contacts persisting under these more destabilizing conditions are relatively short range (AB and GH).

Fig. 1.4. Paramagnetic relaxation enhancement profiles for apomyoglobin unfolded at pH 2.3 with spin labels at the positions indicated by the arrows. The fitted curves show the initial (pale gray) fits and the fits obtained after correction for differences in pairwise cluster interaction free energies (dark gray lines). The locations of hydrophobic clusters defined from regions of high AABUF are indicated by bars at the top of the figure. Reproduced with permission from [17]

The most highly populated macrostates in the pH 2.3 ensemble in the absence of urea include, as expected, the species with no association (A–B–C–G–H). What is perhaps more surprising is that this completely unfolded substate is a minority (30%); most ensemble members have one or more long-range interactions. The most populated (17–40% population) interactions involve some of the smallest loop closure events (e.g., AB, GH, and BC) and are independent of whether the interaction is native (GH, BC) or non-native (AB). The former of these observations rationalizes the relatively modest reduction (∼15%) in radius of gyration relative to more completely (denaturant) unfolded ensembles despite the presence of long-range interactions in a majority of the ensembles. Transient interactions also occur between the N- and C-termini of the protein. The populations of these contacts are quite small (less than 4%), consistent with the greater reduction in entropy required to close loops involving the extended intervening linkers. The model predicts multiple species involving different combinations of A, B, G, and H, all of similar stability. Because of its stochastic basis, however, the model is likely to underestimate cooperativity; the results are not incompatible with a single ABGH cluster. Strikingly, spin labels that probe this interaction induce quite different extents of nonlocal line broadening. A spin label attached at position R139 (in the H region) induces much more N-terminal (A/B) line broadening than one attached at position K140. Similarly, spin labels at positions 11, 15, and 18 in the A region all induce different extents of line broadening in G and H. This differential line broadening must reflect the relative orientations/positions of the side chains within the cluster, indicating a significant degree of specificity of interaction. In contrast to this observation, spin labels that probe interactions restricted to a single chain terminus all enhance relaxation to similar degrees, suggesting that more localized interactions are significantly more heterogeneous and less specific.
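The statement earlier in this section that very small populations can contribute measurably to the relaxation rate follows from the 1/r^6 weighting in the ensemble average R2P = Σi Ki pi / ri^6. The Python sketch below illustrates this with a two-state ensemble. It is an added illustration under stated assumptions: a single proportionality constant K is used for all states (the text allows a state-dependent Ki), and the populations and distances are hypothetical values chosen for the example, not measured data.

```python
def pre_contributions(states, k=1.0):
    """Per-state terms K * p_i / r_i**6 of the ensemble-averaged PRE rate.

    `states` is a list of (fractional population, distance in angstrom) pairs.
    A single constant `k` for all states is a simplifying assumption.
    """
    total_population = sum(p for p, _ in states)
    if abs(total_population - 1.0) > 1e-9:
        raise ValueError("fractional populations must sum to 1")
    return [k * p / r**6 for p, r in states]


# Hypothetical ensemble: 99% extended (40 A from the label), 1% compact (10 A).
ensemble = [(0.99, 40.0), (0.01, 10.0)]
terms = pre_contributions(ensemble)
r2p_total = sum(terms)
compact_share = terms[1] / r2p_total

print(f"compact state share of R2P: {compact_share:.3f}")
```

With these illustrative numbers, the 1% compact state accounts for most of the total enhancement, which is why PRE data are sensitive to sparsely populated collapsed states.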

1.7 Information from Relaxation Dispersion Measurements

Relaxation dispersion measurements are applied to systems that are undergoing an exchange process on the microsecond to millisecond timescale. Measurement of the $R_2$ relaxation rate in a series of experiments where the pulsing frequency is varied results in additional intensity for resonances of nuclei involved in the exchange process [18,19]. The resulting dispersion curve, showing $R_2^{\mathrm{eff}}$ as a function of $1/\tau_{CP}$, can be fitted to functions such as [19, 20]

$$R_2^{\mathrm{eff}} = R_2^0 + \frac{1}{2}\left\{ k_{AB} + k_{BA} - \frac{1}{\tau_{CP}} \cosh^{-1}\left[ D_+ \cosh(\eta_+) - D_- \cos(\eta_-) \right] \right\}$$

$$D_\pm = \frac{1}{2}\left[ \pm 1 + \frac{\psi + 2\Delta\omega^2}{\sqrt{\psi^2 + \xi^2}} \right]$$

$$\eta_\pm = \frac{\tau_{CP}}{\sqrt{2}} \left[ \pm\psi + \sqrt{\psi^2 + \xi^2} \right]^{1/2}$$

$$\psi = (k_{AB} + k_{BA})^2 - \Delta\omega^2$$

$$\xi = 2\Delta\omega \, (k_{AB} - k_{BA})$$

The parameters derived from these fits give information on the relative population of the two states $p_A$ and $p_B$, on the rate of exchange $k_{ex}$ ($= k_{AB} + k_{BA}$) between the two states, and on the structure of the excited state, which is given by the chemical shift difference $\Delta\omega$.
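As a numerical illustration of the two-site dispersion function in this section, the Python sketch below evaluates R2eff at several pulse spacings. It assumes the commonly used two-site form with equal intrinsic R2 for both states, that is, ψ = (kAB + kBA)² − Δω², ξ = 2Δω(kAB − kBA), D± = ½[±1 + (ψ + 2Δω²)/√(ψ² + ξ²)], and η± = (τCP/√2)[±ψ + √(ψ² + ξ²)]^½. The function name and all parameter values (a 5% excited state, kex = 2000 s⁻¹, Δω = 2π × 100 rad s⁻¹, R2⁰ = 10 s⁻¹) are hypothetical and are not taken from the chapter.

```python
import math


def r2_eff(tau_cp, r2_0, k_ab, k_ba, delta_omega):
    """Two-site exchange dispersion function, assuming equal intrinsic R2.

    tau_cp      : pulse spacing in s
    r2_0        : exchange-free transverse relaxation rate in 1/s
    k_ab, k_ba  : forward and reverse exchange rates in 1/s
    delta_omega : chemical shift difference between the states in rad/s
    """
    k_ex = k_ab + k_ba
    psi = k_ex**2 - delta_omega**2
    xi = 2.0 * delta_omega * (k_ab - k_ba)
    root = math.sqrt(psi**2 + xi**2)
    d_plus = 0.5 * (1.0 + (psi + 2.0 * delta_omega**2) / root)
    d_minus = 0.5 * (-1.0 + (psi + 2.0 * delta_omega**2) / root)
    eta_plus = tau_cp / math.sqrt(2.0) * math.sqrt(psi + root)
    eta_minus = tau_cp / math.sqrt(2.0) * math.sqrt(-psi + root)
    arg = d_plus * math.cosh(eta_plus) - d_minus * math.cos(eta_minus)
    return r2_0 + 0.5 * (k_ex - math.acosh(arg) / tau_cp)


# Hypothetical parameters for illustration only.
P_B = 0.05                        # population of the minor (excited) state
P_A = 1.0 - P_B
K_EX = 2000.0                     # s^-1
K_AB, K_BA = P_B * K_EX, P_A * K_EX
DELTA_OMEGA = 2.0 * math.pi * 100.0
R2_0 = 10.0

for tau in (1e-5, 1e-3, 5e-2):
    print(f"tau_CP = {tau:g} s  R2eff = {r2_eff(tau, R2_0, K_AB, K_BA, DELTA_OMEGA):.3f} s^-1")
```

Rapid pulsing (small τCP) suppresses the exchange contribution so that R2eff approaches R2⁰, whereas slow pulsing leaves the full exchange broadening in place; fitting measured curves to this function is what yields the populations, kex, and Δω.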

1.8 Folding of an Intrinsically Disordered Protein Upon Binding to a Target

Coupled folding and binding is a frequent theme in the field of intrinsically disordered proteins (see Chap. 6). One of the earliest examples of this phenomenon was the interaction of the phosphorylated kinase-inducible domain (pKID) of the transcription factor CREB with the KIX domain of the transcriptional coactivator CBP. Free pKID is unfolded in solution [21], but folds into an orthogonal pair of helices, αA and αB, upon binding to the folded KIX domain (Fig. 1.5) [23]. We have recently posed the question, what is the mechanism by which the folding of the disordered pKID is coupled with binding to the KIX domain? Recent NMR studies including relaxation dispersion have provided intriguing insights into this question.

Fig. 1.5. Coupled folding and binding during the interaction of the phosphorylated kinase inducible activation domain of the transcription factor CREB (termed pKID) with the globular KIX domain of the CREB binding protein (CBP). The free pKID domain is intrinsically disordered (represented as the unfolded chain on the left), and in the absence of the binding partner it populates an ensemble of conformations. Upon binding to the globular KIX domain (shown as gray surface), it folds into a pair of orthogonal helices (dark gray backbone trace). Reproduced with permission from [22]

Coupled folding and binding of pKID to KIX was studied by recording a series of HSQC titrations and by ¹⁵N R2 dispersion measurements performed using ¹⁵N-labeled pKID at two magnetic fields and over a range of pKID:KIX concentration ratios [24] (Fig. 1.6). The HSQC titrations show that at least two processes occur as KIX is titrated into pKID. Both fast (low-affinity) and slow (high-affinity) exchange processes are observed. The NMR data can be fitted to a pseudo four-site exchange model, which gives important new insights into the mechanism of coupled folding and binding in this system:

pKID + KIX (free) ↔ pKID···KIX (encounter complex) ↔ pKID:KIX* (intermediate) ↔ pKID:KIX (bound)

Fig. 1.6. ¹⁵N R2 relaxation dispersion profile for Arg124 of pKID recorded at 800 (filled circles) and 500 MHz (open circles). Dispersion curves for 1 mM [¹⁵N]-pKID in the presence of 0.95, 1.00, 1.05, and 1.10 mM KIX are shown

The encounter complex represents an ensemble in which nonspecific hydrophobic interactions occur at a number of sites. The primary interactions in the encounter complex involve a hydrophobic cluster (Y134, I137, L138, and L141) in the unfolded αB region of pKID contacting hydrophobic patches on KIX. The encounter complex was invoked to reconcile the behavior of the cross-peaks in the HSQC titrations with the Δω values obtained from the relaxation dispersion measurements: a better correlation is observed between the Δω values and equilibrium chemical shift differences Δδ which utilize the encounter complex (Fig. 1.7). The structure of the binding intermediate can also be inferred from the chemical shift and relaxation data. The αA helix is nearly fully folded in the intermediate, whereas the αB helix is only partially folded.

In summary, our NMR measurements show that the coupled folding and binding landscape of pKID is complex: disordered pKID first makes transient hydrophobic contacts with KIX, forming an ensemble of encounter complexes that evolve to folded states without dissociation of pKID from the KIX surface. As for the folding of apomyoglobin, the most important interaction in the initiation of coupled folding and binding of pKID is the formation of hydrophobic interactions, which can then play a key role in directing the folding process towards the final folded state.

Fig. 1.7. Correlation of ¹⁵N chemical shift differences (Δω*) determined from the R2 dispersion measurements with equilibrium shift differences. Chemical shift differences between free pKID and the fully bound state (ΔδFB) are shown as black squares, and between the encounter complex and fully bound state (ΔδEB) are shown as gray circles, with matching shades for the lines of best fit. Reproduced with permission from [24]


References

1. J.B. Udgaonkar, R.L. Baldwin, Nature 335, 694–699 (1988)
2. H. Roder, G.A. Elöve, S.W. Englander, Nature 335, 700–704 (1988)
3. H.J. Dyson, P.E. Wright, Chem. Rev. 104, 3607–3622 (2004)
4. P.A. Jennings, P.E. Wright, Science 262, 892–896 (1993)
5. C. Nishimura, H.J. Dyson, P.E. Wright, J. Mol. Biol. 322, 483–489 (2002)
6. S. Schwarzinger, P.E. Wright, H.J. Dyson, Biochemistry 41, 12681–12686 (2002)
7. J. Yao, J. Chung, D. Eliezer, P.E. Wright, H.J. Dyson, Biochemistry 40, 3561–3571 (2001)
8. D. Eliezer, P.E. Wright, J. Mol. Biol. 263, 531–538 (1996)
9. C. Nishimura, P.E. Wright, H.J. Dyson, J. Mol. Biol. 334, 293–307 (2003)
10. C. Nishimura, M.A. Lietzow, H.J. Dyson, P.E. Wright, J. Mol. Biol. 351, 383–392 (2005)
11. C. Nishimura, H.J. Dyson, P.E. Wright, J. Mol. Biol. 355, 139–156 (2006)
12. J. Kuriyan, S. Wilz, M. Karplus, G.A. Petsko, J. Mol. Biol. 192, 133–154 (1986)
13. R. Koradi, M. Billeter, K. Wüthrich, J. Mol. Graphics 14, 51–55 (1996)
14. M.A. Lietzow, M. Jamin, H.J. Dyson, P.E. Wright, J. Mol. Biol. 322, 655–662 (2002)
15. G.D. Rose, A.R. Geselowitz, G.J. Lesser, R.H. Lee, M.H. Zehfus, Science 229, 834–838 (1985)
16. H.J. Dyson, P.E. Wright, H.A. Scheraga, Proc. Natl. Acad. Sci. U. S. A. 103, 13057–13061 (2006)
17. D.J. Felitsky, M.A. Lietzow, H.J. Dyson, P.E. Wright, Proc. Natl. Acad. Sci. U. S. A. 105, 6278–6283 (2008)
18. J.P. Loria, M. Rance, A.G. Palmer, J. Am. Chem. Soc. 121, 2331–2332 (1999)
19. M. Tollinger, N.R. Skrynnikov, F.A. Mulder, J.D. Forman-Kay, L.E. Kay, J. Am. Chem. Soc. 123, 11341–11352 (2001)
20. D.G. Davis, M.E. Perlman, R.E. London, J. Magn. Reson. B 104, 266–275 (1994)
21. I. Radhakrishnan, G.C. Pérez-Alvarado, H.J. Dyson, P.E. Wright, FEBS Lett. 430, 317–322 (1998)
22. H.J. Dyson, P.E. Wright, Nat. Rev. Mol. Cell Biol. 6, 197–208 (2005)
23. I. Radhakrishnan, G.C. Pérez-Alvarado, D. Parker, H.J. Dyson, M.R. Montminy, P.E. Wright, Cell 91, 741–752 (1997)
24. K. Sugase, H.J. Dyson, P.E. Wright, Nature 447, 1021–1025 (2007)

2 Experimental and Simulation Studies of the Folding/Unfolding of Goat α-Lactalbumin

K. Kuwajima, T. Oroguchi, T. Nakamura, M. Ikeguchi, and A. Kidera

Abstract. We studied (1) the unfolding behavior of the authentic and recombinant forms of goat α-lactalbumin and (2) the structure of the transition state of folding/unfolding of the protein, both experimentally and by simulation of the molecular dynamics. Experimentally, the recombinant protein exhibited remarkable destabilization and unfolding-rate acceleration as compared to those of the authentic protein; these differences were caused by the presence of an extra N-terminal methionine residue in the recombinant form. We also characterized the transition-state structure by mutational Φ-value analysis, based on which the structure was localized in a region containing the C-helix and the Ca²⁺-binding site of the protein. Simulation of the molecular dynamics of unfolding at high temperatures (398 and 498 K) yielded good reproduction of the experimental observations and gave atomically detailed descriptions of the unfolding behavior and the transition-state structure of folding/unfolding. The present series thus demonstrated the power of combination of experiments and simulations for studying the problems of protein folding.

2.1 Introduction

One of the major objectives of the physical chemistry studies in water and biomolecules is to fully reproduce the experimentally observed folding/unfolding behavior of a typical model protein in water by means of molecular simulation. However, the all-atom molecular dynamics (MD) simulation of the folding of a protein from the fully unfolded state to the native structure remains computationally intractable when the size of the target protein is larger than 100 residues and when simulation is carried out with explicit water molecules (i.e., when complete, contextualized simulation is attempted) [1–3]. Nevertheless, explicit-water all-atom MD simulations of unfolding of proteins at high temperatures are expected to yield important insights into the molecular mechanisms of protein folding [4–13]. It can frequently be assumed that even at high temperatures unfolding may basically represent a reversal of the folding transition [14–16]. Thus, unfolding MD simulations will typically yield an atomically detailed picture of both the unfolding and folding behaviors of a protein. Recent experimental advances, including a hydrogen-exchange NMR technique and site-directed mutagenesis (Φ-value analysis), have enabled us to obtain the structures of intermediates and the transition state of folding with the resolution of amino acid residues [17–20]; hence, such experimental data serve as diagnostic criteria for the validity of simulation results. Therefore, the combined use of experiments and simulations is considered extremely advantageous for gaining a comprehensive understanding of the molecular mechanisms of protein folding [13, 21].

This chapter describes experimental and simulation studies of the folding/unfolding of goat α-lactalbumin. Experimentally, we have accomplished two goals. First, we have shown that recombinant α-lactalbumin, which has an additional methionine residue at the N-terminus, is remarkably less stable and unfolds faster than the authentic protein prepared from goat milk [22]. Second, we have characterized the molten globule intermediate and the transition state of folding using a hydrogen-exchange 2D NMR technique and mutational Φ-value analysis, respectively [23]. Here, we carried out unfolding MD simulations of recombinant and authentic α-lactalbumin at high temperatures (398 and 498 K), and the simulation results were then compared with the experimental observations mentioned above [24, 25]. We have thus demonstrated that MD simulations reliably reproduce the experimentally observed faster unfolding of the recombinant protein and the transition-state structure of the folding/unfolding reactions; furthermore, MD simulations provide very useful, atomically detailed descriptions for elucidating protein-folding mechanisms.
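For readers unfamiliar with Φ-value analysis, the quantity is conventionally defined as the ratio of the mutation-induced change in activation free energy to the mutation-induced change in equilibrium stability, Φ = ΔΔG‡/ΔΔGeq, with ΔΔG‡ = RT ln(k_wt/k_mut) for the folding rate constants. The Python sketch below applies this textbook definition. It is an added illustration, not the authors' analysis: the reference state (fully unfolded or molten globule), the function name, and the numerical values are all hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def phi_value(k_fold_wt, k_fold_mut, ddg_eq, temperature=298.15):
    """Phi = ddG(transition state) / ddG(equilibrium) for a point mutation.

    k_fold_wt, k_fold_mut : folding rate constants of wild type and mutant (same units)
    ddg_eq                : destabilization of the native state by the mutation, kJ/mol
    """
    ddg_ts = R_KJ * temperature * math.log(k_fold_wt / k_fold_mut)
    return ddg_ts / ddg_eq


# Hypothetical mutant: folding slowed twofold, native state destabilized by 4 kJ/mol.
phi = phi_value(k_fold_wt=10.0, k_fold_mut=5.0, ddg_eq=4.0)
print(f"phi = {phi:.2f}")
```

A value near 1 indicates that the mutated site is as structured in the transition state as in the native state, and a value near 0 indicates that it is as unstructured as in the reference state; intermediate values are usually read as partial structure.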

2.2 Goat α-Lactalbumin

Goat α-lactalbumin is a globular milk protein of 123 amino acid residues with a molecular weight of 14,200 [26]. High-resolution X-ray crystallographic structures are available for both the authentic protein prepared from goat milk and the recombinant protein expressed in Escherichia coli (E. coli) [22, 26]. The structures of the authentic and recombinant proteins are superimposable onto each other, although the recombinant protein has an extra methionine residue (Met0) at the N-terminus (Fig. 2.1) [22]. The structure of goat α-lactalbumin is composed of two subdomains, an α-domain formed by four α-helices (A-, B-, C-, and D-helices from the N-terminus) and the C-terminal 3₁₀-helix, and a β-domain formed by a three-stranded β-sheet and a 3₁₀-helix [22, 26]. α-Lactalbumin is a Ca²⁺-binding protein, and the Ca²⁺-binding site is located at the interface between the α- and β-domains, namely, at a loop from the C-terminal side of the 3₁₀-helix involved in the β-domain to the N-terminal side of the C-helix [26–28]. Ca²⁺ binding to α-lactalbumin remarkably stabilizes the native structure of the protein [29–31].

α-Lactalbumin is a useful model protein for protein-folding studies. It shows the molten globule intermediate under mildly denaturing conditions

2 Experimental and Simulation Studies of the Folding/Unfolding of Goat


Fig. 2.1. The backbone structures of authentic and recombinant goat α-lactalbumin in the crystal form. The backbone of Mol A of the authentic protein, represented by a wire model, was superimposed on the backbone of the recombinant protein. Gray and black wires represent the authentic and recombinant proteins, respectively. The Cα-atom RMSD value between the two proteins was 0.54 Å. The PDB codes for the authentic and recombinant proteins are 1HFY and 1HMK, respectively

at equilibrium, and the identity between the molten globule intermediate and a kinetic folding intermediate has been well established [32, 33]. In the kinetic refolding of the protein from the fully unfolded state, the molten globule intermediate forms ﬁrst within the dead time of a stopped-ﬂow experiment (a few milliseconds), and then the intermediate folds into the native state [34, 35]. A transition state located around the state of maximum free energy exists between the molten globule intermediate and the native state. Recently, the molten globule state of α-lactalbumin has been shown to possess antitumor activity when complexed with a fatty acid [36, 37], and hence the protein may possess secondary biological activity in addition to the primary activity of native α-lactalbumin, i.e., substrate speciﬁcity modiﬁer activity in a lactose synthase system [38, 39]. The molten globule of α-lactalbumin thus provides an example of the folding intermediate of a protein exhibiting a secondary biological activity.

2.3 Differences Between the Unfolding Behaviors of Authentic and Recombinant Goat α-Lactalbumin

2.3.1 Experimental Studies

On the basis of the equilibrium unfolding curves of authentic and recombinant α-lactalbumin, the recombinant protein is remarkably less stable than the authentic one (Fig. 2.2) [22]. The transition midpoints were 3.2 and 2.7 M guanidine hydrochloride (GdnHCl) for the authentic and recombinant proteins, respectively, and the difference in the stabilization free energy, ΔΔG_NU,


Fig. 2.2. GdnHCl-induced unfolding transition curves for authentic and recombinant goat α-lactalbumin [22]. The filled diamonds indicate the unfolding transition of the methionine-free recombinant protein produced by CNBr cleavage. The unfolding was carried out at 25 °C in the presence of 1 mM CaCl₂, 50 mM NaCl, and 50 mM sodium cacodylate (pH 7.0). The transitions were monitored by CD measurements at 222 nm (circles and diamonds) and at 270 nm (triangles), and the transition curves were normalized between the native and fully unfolded baselines. The black line with symbols represents the authentic form, and the gray line with symbols represents the recombinant form. Reproduced with permission from [22]

between the two proteins was as much as 1.1 kcal mol⁻¹ at 3.2 M GdnHCl and 25 °C [22]. This difference in stability between the two proteins was solely due to the presence of an extra methionine residue at the N-terminus of the recombinant protein, because the methionine-free recombinant protein produced by cyanogen bromide (CNBr) cleavage recovered its stability and produced an unfolding transition curve coincident with that of the authentic protein (Fig. 2.2) [22]; there is no methionine residue in the mature sequence of goat α-lactalbumin, and therefore, only the N-terminal methionine was removed by CNBr. Berliner and coworkers reported that the recombinant ΔE1 mutant, in which the Glu1 residue of the authentic sequence was genetically removed leaving an N-terminal methionine in its place, showed higher stability than the authentic protein [40, 41]. Therefore, the destabilization of recombinant α-lactalbumin was not due to the presence of the methionine at the N-terminus, but rather due to the extension of the N-terminus by the extra residue. Considering that the X-ray crystallographic structures of authentic and recombinant goat α-lactalbumin were essentially identical to each other, with the exception of the N-terminal region and a loop region between residues 105 and 110 (Fig. 2.1), the difference in stability between the two proteins was remarkable.

We therefore studied the unfolding and refolding kinetics of authentic and recombinant goat α-lactalbumin, induced by GdnHCl concentration jumps,


using stopped-ﬂow circular dichroism (CD) and ﬂuorescence spectroscopy (Fig. 2.3) [22]. Although the refolding kinetics of the two proteins coincided with each other, the unfolding kinetics was ninefold faster in the case of the

Fig. 2.3. GdnHCl-induced (a) unfolding and (b) refolding kinetic progress curves of authentic and recombinant goat α-lactalbumin [22]. Unfolding was initiated by a concentration jump from 1.0 to 5.4 M, and the refolding process was initiated by a concentration jump from 5.5 to 0.5 M at 25 °C in the presence of 1 mM CaCl₂, 50 mM NaCl, and 50 mM sodium cacodylate (pH 7.0); the refolding and unfolding kinetics were monitored by the measurement of CD ellipticity at 225 nm using stopped-flow CD. The continuous line denotes the authentic protein, and the filled squares denote the recombinant protein. The inset in (b) shows the refolding progress curve within 2 s, with the same notation for the reaction curves. Theoretical kinetic progress curves are also shown in (a) and (b). Reproduced with permission from [22]


recombinant protein than the authentic protein at 5.4 M GdnHCl and 25 °C. This indicates that the destabilization of the recombinant protein is primarily associated with the acceleration of the unfolding rate.

The molecular mechanisms by which the extension of the N-terminus by the extra methionine residue destabilized recombinant α-lactalbumin remain unclear. Additional conformational entropy of the extra methionine residue in the unfolded state could account for the destabilization and unfolding-rate acceleration of the recombinant protein [22]. Ishikawa and coworkers reported the destabilization of recombinant bovine α-lactalbumin, similarly induced by the extra N-terminal methionine residue, and showed that the enthalpy change of thermal unfolding was the same for the authentic and recombinant proteins, indicating that the destabilization was caused by an entropic effect [42]. However, the destabilization by the extra methionine residue in the lysozyme homologous to α-lactalbumin was rather enthalpic and accompanied by a disruption of hydrogen-bond networks in the N-terminal region [43, 44].

2.3.2 Simulation Studies

We carried out all-atom MD unfolding simulations for the authentic and recombinant forms of goat α-lactalbumin at 398 and 498 K using the MARBLE program package [24, 25, 45]. The simulations produced 10 trajectories of 5 ns for each form, resulting in 100 ns total at each temperature. Our simulations included explicit water molecules of the TIP3P model [46], and we used the CHARMM22 force field to calculate the potential energy [47]. We applied a periodic boundary condition to the system, in which a protein molecule was immersed in water within a rectangular box that contained the protein molecule and 8,571 water molecules for the authentic protein or 8,787 water molecules for the recombinant protein [24].
We observed the initial stages of unfolding in the 398-K simulations and global unfolding in the 498-K simulations, both at atomic-level resolution. In this section, we primarily focus on the results of the 398-K simulations to explore the early stages of the unfolding transition.

Unfolding Dynamics Observed by MD Simulations

We monitored the unfolding trajectories obtained by the simulations in terms of three structural parameters: (1) the Cα-atom root-mean-square deviation (RMSD) from the native structure, (2) the fractional number of the native contacts (Q), and (3) the radius of gyration (Rg) of the molecule, and the results thus obtained qualitatively agreed with the experimental observations shown above [24]. Although the kinetic unfolding curves monitored by the above parameters (RMSD, Q, and Rg) fluctuated greatly with the simulation time and trajectory, the averaged kinetic unfolding curves, obtained by averaging the parameters for the 10 trajectories, indicated that recombinant α-lactalbumin unfolded faster than the authentic protein [24].
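The three structural parameters named above can be illustrated with a minimal NumPy sketch. It is not the analysis code used in [24]: the 6.5 Å contact cutoff, the minimum sequence separation of three residues, the unweighted Rg, and the helper `toy_helix` that generates test coordinates are all assumptions made for illustration only.

```python
import numpy as np

def toy_helix(n_residues):
    """Hypothetical helix-like C-alpha trace, used only to exercise the functions."""
    idx = np.arange(n_residues)
    angle = np.deg2rad(100.0) * idx
    return np.column_stack((2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * idx))

def kabsch_rmsd(coords, reference):
    """RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    p = coords - coords.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def native_contact_pairs(native, cutoff=6.5, min_seq_sep=3):
    """Index arrays (i, j) of residue pairs in contact in the native structure.

    cutoff (in angstroms) and min_seq_sep are assumed values, not those of [24].
    """
    dist = np.linalg.norm(native[:, None, :] - native[None, :, :], axis=-1)
    i, j = np.triu_indices(len(native), k=min_seq_sep)
    in_contact = dist[i, j] < cutoff
    return i[in_contact], j[in_contact]

def fraction_native_contacts(coords, pairs, cutoff=6.5):
    """Q: fraction of the native contact pairs still within the cutoff."""
    i, j = pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float((dist < cutoff).mean())

def radius_of_gyration(coords):
    """Unweighted radius of gyration of an (N, 3) coordinate array."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))
```

Averaging each quantity frame by frame over the 10 trajectories of one protein form would then give averaged unfolding curves of the kind described in the text.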


To examine the structural changes along the unfolding trajectories in more detail, we investigated the Q values for the five core regions (CoreN-term, CoreABC, CoreAB, CoreC-term, and Coreβ) (Fig. 2.4(a)), which were uniquely defined from the native contact map of goat α-lactalbumin, as a function of simulation time [24]. As shown in Fig. 2.4(b), the CoreN-term that was composed of contacts between residues from 1 (the N-terminus) to 3 and those from 36 to 38 was disrupted within the first few hundred picoseconds in the recombinant protein at 398 K, i.e., much faster than in the authentic protein. On the other hand, simulations of the other core regions, particularly CoreC-term and Coreβ, did not exhibit any clear differences between the two proteins during the first 5 ns (Fig. 2.4(c)–(f)). Simulations at higher temperatures also revealed that the disruption of CoreN-term allowed water molecules to penetrate into the hydrophobic core (CoreABC), and this penetration by water triggered global unfolding of the protein molecule (see below) [25, 48].

Hydrogen-Bond Network

To further elucidate the faster disruption of CoreN-term in the recombinant protein, we examined the hydrogen-bond network around the N-terminus of the proteins, and analyzed the disruption of these hydrogen bonds during the early stage of unfolding [24]. In authentic goat α-lactalbumin, the N-terminal ammonium group of Glu1 formed three hydrogen bonds with the side-chain oxygen atoms of Asp37, Thr38, and Gln39 (Fig. 2.5(a)). The presence of the extra Met0 in the recombinant protein removed the two hydrogen bonds in Glu1 with Thr38 and Gln39, but the Gln39 side chain formed an alternative hydrogen bond with the new N-terminus of Met0 (Fig. 2.5(b)).
As a result, the net difference in the number of the N-terminal hydrogen bonds was –1 between the recombinant and authentic α-lactalbumin, and the distance between the Oγ atom of Thr38 and the backbone amide N atom of Glu1 was 5 Å in the native recombinant structure. The faster disruption of CoreN-term in the recombinant protein was thus caused by the loss of the hydrogen bond formed by Thr38 with Glu1 and the weakening of the hydrogen bonds formed by Asp37 and Gln39 with the backbone N atom(s). In particular, the time course of the increase in the atomic distance between the Oγ of Thr38 and the backbone N of Glu1 coincided with the time course of the CoreN-term disruption, indicating that both events occurred in concert (Fig. 2.5) [24]. The importance of the hydrogen bond between the Oγ of Thr38 and the backbone N of Glu1 in the thermodynamic stability of goat α-lactalbumin was confirmed experimentally using the T38A mutant, in which Thr38 was replaced by an Ala residue [24]. As expected, the T38A mutation had greater impact on the stability of the authentic form than on the stability of the recombinant form.


Fig. 2.4. (a) The five core regions, CoreN-term, CoreABC, CoreAB, CoreC-term, and Coreβ, defined in terms of native contacts. The core regions are shown on the crystal structure of recombinant goat α-lactalbumin (PDB code: 1HMK) [24]. The extra methionine residue, denoted as M0 in the figure, at the N-terminus in the recombinant protein is shown in the CPK model. (b)–(f) show the Q value calculated for each core region: (b) CoreN-term, (c) CoreABC, (d) CoreAB, (e) CoreC-term, and (f) Coreβ. The light-gray and dark-gray curves represent the averaged data for the authentic and recombinant proteins, respectively. Reproduced with permission from [24]


Fig. 2.5. Close views of interacting atoms in CoreN-term for (a) the authentic and (b) the recombinant proteins. The distance between N of the N-terminus and (c) Oδ of Asp37, (d) Oγ of Thr38, and (e) Nε of Gln39, for the authentic protein, and the distance between N of Glu1 and (c) Oδ of Asp37, and (d) Oγ of Thr38, and (e) the distance between N of Met0 and Nε of Gln39, for the recombinant protein, during simulation at 398 K, averaged for ten trajectories [24]. The light-gray and dark-gray curves represent those for authentic and recombinant goat α-lactalbumin, respectively. Reproduced with permission from [24]

Hinge-Bending Motions of Dynamic Domains

To investigate the relationship between the unfolding behavior and the structural dynamics of goat α-lactalbumin, we carried out a principal component analysis of the MD trajectories for the equilibrium dynamics at 298 K and unfolding dynamics at 398 K, and characterized the principal modes as screw motions of dynamic domains according to the method of Hayward et al. [24, 49]. For both the authentic and recombinant proteins, we identified the same two dynamic domains, i.e., domain 1 formed by residues 1–35 and 101–120, and domain 2 formed by residues 36–100. The C-helix (residues 86–98) thus belonged to domain 2, and moved together with the Ca²⁺-binding site (residues 79–88) and the β-domain (residues 36–85). We found that the hinge-bending motions between the two dynamic domains were characteristic of the dynamics of α-lactalbumin; more importantly, the screw axis of the interdomain motion passes through the protein interior from the C-terminal end of the C-helix to the N-terminus of the protein (Fig. 2.6) [24].
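The principal component analysis step mentioned above can be sketched generically as follows. This is a plain PCA of Cartesian coordinates via singular value decomposition, assuming the frames have already been superimposed on a reference structure; the function name `trajectory_pca` is hypothetical, and the subsequent dynamic-domain and screw-axis analysis of Hayward et al. [49] is not reproduced here.

```python
import numpy as np

def trajectory_pca(frames):
    """PCA of a trajectory given as a (T, N, 3) array of superimposed frames.

    Returns the variance along each mode, the modes (one per row, each of
    length 3N), and the projection of every frame onto every mode.
    """
    n_frames = frames.shape[0]
    flat = frames.reshape(n_frames, -1)
    centered = flat - flat.mean(axis=0)          # remove the average structure
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (n_frames - 1)          # eigenvalues of the covariance matrix
    projections = u * s                          # equals centered @ vt.T
    return variances, vt, projections
```

The first few rows of the returned mode matrix are the largest-amplitude collective motions; in the chapter, such modes were then interpreted as hinge-bending (screw) motions between dynamic domains.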


Fig. 2.6. The dynamic domains of goat α-lactalbumin, domain 1 (dark gray) and domain 2 (light gray), and the screw axis of the interdomain motion [24]. The C-helix is involved in domain 2 and moves together with the Ca²⁺-binding site and the β-domain. Reproduced with permission from [24]

The location of the end of the interdomain screw axis at the N-terminus exerts a dramatic impact on the unfolding behavior of α-lactalbumin because the screw axis corresponds to the hinge axis of the hinge-bending motion [49]. The hinge-bending motion imposes its strongest forces on the hinge axis, and this may lead to the faster disruption of CoreN-term in the recombinant protein, which has a weaker hydrogen-bond network around the N-terminus than the authentic protein. Finally, the conformational entropy effect of the extra methionine residue of the recombinant protein should not be excluded from the present analysis [22]. Such an effect must be present as long as the methionine assumes a fixed structure in the native state, but this effect is difficult to measure by MD simulation [50].

2.3.3 Conclusions

(1) The unfolding behaviors of the authentic and recombinant forms of goat α-lactalbumin are remarkably different, although both forms have an essentially identical three-dimensional structure. The recombinant form was found to be 1.1 kcal mol⁻¹ less stable than the authentic form, and the recombinant form unfolded at a ninefold faster rate than the authentic form. The destabilization and unfolding-rate acceleration were due to the presence of an extra methionine residue at the N-terminus in the recombinant protein.

(2) We carried out two series of unfolding MD simulations, one for the authentic form and the other for the recombinant form of goat α-lactalbumin at 398 K. The unfolding simulations reasonably reproduced the experimentally observed difference between the proteins, i.e., the faster rate of unfolding of the recombinant protein.

(3) The principal component analysis of the dynamics revealed the hinge-bending motions of the protein. One end of the screw axis of the motions


was located at the N-terminus, and this location of the screw axis and the weakening of the hydrogen-bond network in the N-terminal region were responsible for the faster unfolding of the recombinant protein.

2.4 Folding/Unfolding Pathways of Goat α-Lactalbumin

2.4.1 Experimental Studies

The folding of α-lactalbumin is well represented by a framework model in which secondary structure units form first, prior to the organization of specific tertiary interactions of amino acid side chains [32–35]. An early kinetic folding intermediate of the protein has characteristics of the molten globule state, which assumes the native-like secondary structure and the compact shape of the molecule without specific side-chain packing. The rate-limiting step of folding is the process from the molten globule intermediate to the fully native state of α-lactalbumin, and hence the transition state of folding is located between the molten globule and native states.

Molten Globule Intermediate

We studied the hydrogen exchange kinetics of the authentic and recombinant forms of goat α-lactalbumin in the molten globule state at pHobs 1.7 and 25 °C using the ¹H–¹⁵N HSQC spectra (Nakamura et al., unpublished); pHobs is the pH meter reading of a D₂O solution. At pH 2, the molten globule state is stable, and its identity with the kinetic folding intermediate has been well established [32–35]. The proteins were ¹⁵N-labeled, and the ¹⁵N-labeled authentic form was obtained by the CNBr cleavage of the ¹⁵N-labeled recombinant protein. We carried out hydrogen-exchange reactions of the two proteins in the molten globule state for a variety of exchange times in 90% D₂O, and then quenched the exchange by rapid refolding of the protein at pHobs 5.9 and 25 °C in the presence of 1.0 mM CaCl₂. The cross-peaks of the HSQC spectra of the hydrogen-exchange-labeled and refolded proteins provided the exchange kinetics of the individual peptide amide protons, which were assigned in the HSQC spectra in the native state (Fig. 2.7).
The protection factor, $p_i$, for the amide proton of residue $i$ is given by the ratio of the rate constants $k_{\mathrm{int},i}$ and $k_{\mathrm{obs},i}$ for the intrinsic chemical exchange and the observed exchange reactions, respectively, for residue $i$ as follows:

$$p_i = \frac{k_{\mathrm{int},i}}{k_{\mathrm{obs},i}}. \tag{2.1}$$

$k_{\mathrm{int},i}$ was determined by the amino acid sequence of the protein as reported by Bai et al. [51], and $k_{\mathrm{obs},i}$ was obtained from the hydrogen-exchange experiment.
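As a small worked illustration of equation (2.1), the sketch below computes a protection factor from a pair of rate constants. The function name and the numerical values are hypothetical and chosen only to show the arithmetic; they are not measured values from this study.

```python
def protection_factor(k_int, k_obs):
    """Protection factor p_i = k_int,i / k_obs,i (equation 2.1).

    Both rate constants must be positive and expressed in the same units.
    """
    if k_int <= 0 or k_obs <= 0:
        raise ValueError("rate constants must be positive")
    return k_int / k_obs

# Hypothetical rate constants (same arbitrary units), for illustration only:
# an amide proton exchanging 15 times more slowly than the intrinsic rate.
example_p = protection_factor(k_int=1.5, k_obs=0.1)
```

A value of 1 means no protection relative to the intrinsic chemical exchange rate, and larger values indicate that the amide proton is shielded from exchange.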


Fig. 2.7. [¹⁵N, ¹H]-HSQC spectrum of ¹⁵N-labeled recombinant goat α-lactalbumin at pH 6.3 and 25 °C in 95% H₂O/5% D₂O (Nakamura et al., unpublished). Peaks are labeled with their residue-specific assignments

The hydrogen-exchange protection profile, given by $p_i$ values as a function of $i$, for the molten globule state of goat α-lactalbumin is shown in Fig. 2.8. The profile was identical, within experimental error, between the authentic and recombinant forms, and only the C-helix was weakly protected with a protection factor of 10–20. This result is thus in contrast with those previously reported for intermediates of guinea pig and human α-lactalbumin [52, 53] and several species of lysozyme, in which the A- and B-helices are more strongly protected (protection factor range of 5–500) [54–58]. Indeed, the present findings are more similar to those obtained for bovine α-lactalbumin, except for the smaller protection factor ($p_i$ = 5–7) of the C-helix in the molten globule form of the bovine than the goat protein [59]. On the basis of the findings of kinetic refolding studies of goat α-lactalbumin obtained by stopped-flow CD measurements, the molten globule folding intermediate at neutral pH was found to possess the capacity to weakly bind to Ca²⁺ with a binding constant on the order of 10³ M⁻¹ (unpublished results), i.e., a value that is 10³–10⁴-fold smaller than the binding constant of the native protein [30, 60–62]. Therefore, the C-helix and the Ca²⁺-binding site are weakly but significantly organized in the molten globule state of goat α-lactalbumin.

Transition State

In mutational Φ-value analyses of the transition state of protein folding, site-directed mutagenesis is used to introduce nondisruptive mutations at various


Fig. 2.8. Histograms showing the distribution of protection factors from amide hydrogen exchange for (a) the authentic form and (b) the recombinant form of goat α-lactalbumin in the MG-state at pHobs 1.7 and 25 °C (Nakamura et al., unpublished)

amino acid residue sites of a protein studied in order to experimentally investigate the thermodynamic stability and kinetic folding and unfolding reactions of the wild-type and mutant proteins [19, 20, 63]. Thermodynamic analyses of the equilibrium unfolding transition curves of proteins provide information about changes in the stabilization Gibbs energy ΔΔG induced by mutation. Kinetic folding and unfolding experiments provide the folding and unfolding rate constants for wild-type and mutant proteins. On the basis of the obtained data, the Φ-value of the transition state is given by the following equation:

$$\Phi = 1 - \frac{RT \ln\left(k_{\mathrm{unf}}^{\mathrm{WT}} / k_{\mathrm{unf}}^{\mathrm{mutant}}\right)}{\Delta\Delta G}, \tag{2.2}$$

where $k_{\mathrm{unf}}^{\mathrm{WT}}$ and $k_{\mathrm{unf}}^{\mathrm{mutant}}$ are unfolding rate constants for the wild-type and mutant proteins, respectively; Φ = 1 when the transition-state structure is fully native at the mutation site, whereas Φ = 0 when the transition-state structure is still fully unfolded. Because the unfolding kinetics is often simpler than the refolding kinetics, which may be affected by the heterogeneity of the unfolded state or by the presence of folding intermediates, we typically use the unfolding rate constants to determine the Φ value. To avoid errors caused by long extrapolation along increasing or decreasing denaturant concentrations, we usually employ the ΔΔG, $k_{\mathrm{unf}}^{\mathrm{WT}}$, and $k_{\mathrm{unf}}^{\mathrm{mutant}}$ values at the midpoint of the unfolding transition for the wild-type protein.
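Equation (2.2) can be evaluated with a few lines of Python, shown here as a hedged sketch rather than the authors' analysis code. The function name, the default temperature of 298.15 K, and the use of kcal mol⁻¹ are assumptions; the sketch also assumes that ΔΔG is expressed with the sign for which a destabilizing mutation gives a negative value, a convention the text does not spell out, so the sign of the input should be checked against the definition used for the data at hand.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal mol^-1 K^-1

def phi_value(k_unf_wt, k_unf_mutant, ddg, temperature=298.15):
    """Phi = 1 - RT ln(k_unf_WT / k_unf_mutant) / ddG (equation 2.2).

    ddg is in kcal/mol; its sign convention is an assumption (negative for a
    destabilizing mutation). Rate constants must share the same units.
    """
    if k_unf_wt <= 0 or k_unf_mutant <= 0:
        raise ValueError("rate constants must be positive")
    if ddg == 0:
        raise ValueError("Phi is undefined when ddG is zero")
    rt = GAS_CONSTANT * temperature
    return 1.0 - rt * math.log(k_unf_wt / k_unf_mutant) / ddg

# Limiting case: the mutation leaves the unfolding rate unchanged, so the
# mutated site is as structured in the transition state as in the native state.
phi_native_like = phi_value(k_unf_wt=0.2, k_unf_mutant=0.2, ddg=-1.0)  # 1.0
```

Because ΔΔG appears in the denominator, mutations with very small |ΔΔG| give unreliable Φ values, which is one reason the mutations are chosen to be nondisruptive yet measurably destabilizing.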


We introduced 17 mutations (V8A, L12A, V27A, T29I, L52A, I55V, W60A, D87N, I89V, V90A, K93A, I95V, L96A, Y103F, L105A, L110A, and W118F) into recombinant goat α-lactalbumin, and carried out mutational Φ-value analyses of the transition state of folding [23]. As shown by Vanhooren and coworkers, the W60A mutation was disruptive because of a large volume change; thus, for this particular mutation site we employed the Φ value of the W60F mutant reported by Vanhooren et al. in the following series [64, 65]. The results of the Φ-value analysis revealed that the mutants with mutations located in the A-helix (V8A, L12A), the B-helix (V27A, T29I), the C-helix (K93A, L96A), the C–D loop (Y103F), the D-helix (L105A, L110A), and the C-terminal 3₁₀-helix (W118F) have low Φ values of less than 0.2. On the other hand, D87N, which is located at the Ca²⁺-binding site, and W60F, which is involved in the β-domain, have relatively high Φ values of larger than 0.9, indicating that the tight packing of the side chains around these residues (Asp87 and Trp60) occurs in the transition state [23]. Another β-domain mutant (I55V) and three C-helix mutants (I89V, V90A, and I95V) were shown to have intermediate Φ values ranging from 0.4 to 0.7. The folding nucleus in the transition state of goat α-lactalbumin is thus not extensively distributed across the molecule, but instead is very localized within a region containing the Ca²⁺-binding site as well as the interface between the C-helix and the β-domain (Fig. 2.9) [23].

The above findings, when taken together with the preceding results regarding the hydrogen-exchange protection profile of the molten globule intermediate, demonstrated that the folding reaction of goat α-lactalbumin is a hierarchical and sequential process, in which folding is initiated in the region of the C-helix and the Ca²⁺-binding site, and proceeds from there by organization of the structure around this region (the folding nucleus).
2.4.2 Simulation Studies

The MD unfolding simulations at 498 K led to the global unfolding of both authentic and recombinant goat α-lactalbumin, but comparison of the

Fig. 2.9. The Φ-values ((a) experimental Φ-values, and (b) ΦMD obtained from MD trajectories) mapped onto the three-dimensional structure of goat α-lactalbumin [25]. Reproduced with permission from [25]


trajectories with those at 398 K revealed that the early stages of unfolding at 498 K were very similar to the unfolding trajectories at 398 K [25]. We therefore investigated the unfolding trajectories at 498 K, and the MD simulation results were compared with the experimentally observed kinetics of folding and unfolding at room temperature [25]. In particular, we identiﬁed the transition state of unfolding on the basis of the MD unfolding trajectories, and then compared that structure with the transition-state structure of folding/unfolding experimentally observed by the mutational Φ-value analysis [23]. To identify the transition state of unfolding based on the MD trajectories, however, we have to represent the protein structures, which appear along the unfolding trajectories, in a proper but coarse-grained manner (see below). We therefore developed a coarse-grained coordinate system (segmental Q-coordinates) in which the secondary structure segments can be regarded as folding units [25]. According to the framework model of protein folding, secondary-structure segments (α-helices and β-hairpins) act as the folding units that initially form at an early stage of kinetic folding, and the consolidation and assembly of the folding units are rate-limiting steps of the folding reaction [35, 66–68]. The experimental results for the kinetic folding of goat α-lactalbumin shown above supported the framework model of folding. 
Segmental Q-Coordinates

To construct the segmental Q-coordinates, we divided the structure of the goat α-lactalbumin molecule into eight segments, segment 1 (the N-terminal region and the A-helix), segment 2 (the C-terminal side of the A-helix and the B-helix), segment 3 (a β-hairpin formed by the first and second β-strands in the β-domain), segment 4 (the third β-strand and the subsequent loop), segment 5 (the Ca²⁺-binding site that includes a 3₁₀-helix and the N-terminal side of the C-helix), segment 6 (the C- and D-helices), segment 7 (the C-terminal 3₁₀-helix), and segment 8 (the C-terminal region) (Fig. 2.10) [25]. These segments were local contiguous segments along the primary sequence. Division into the eight segments was based on the contact map of the three-dimensional structure of goat α-lactalbumin, and most of the segments corresponded to secondary-structure segments, i.e., α-helices, or β-hairpins [25]. Each axis of the segmental Q-coordinates is given by the fractional native contact $Q_S(i, j)$ between a pair of different segments $i$ and $j$ when $j > i$, or within segment $i$ when $i = j$:

$$Q_S(i, j) = \frac{\text{number of native contacts between } i \text{ and } j \text{ (within } i \text{ when } i = j\text{) in } X}{\text{total number of native contacts between } i \text{ and } j \text{ (within } i \text{ when } i = j\text{) in } N}, \tag{2.3}$$

where $X$ and $N$ denote a protein structure produced by the MD simulation and the native structure, respectively [25]. Although the theoretically possible


Fig. 2.10. (a) Contact map of the native structure observed in the equilibrium MD simulation at 298 K [25]. The color scheme corresponds to the eight local segments: 1 (red, residues 0–11), 2 (blue, 12–36), 3 (green, 37–54), 4 (yellow, 55–74), 5 (cyan, 75–88), 6 (orange, 89–106), 7 (gray, 107–119), and 8 (black, 120–123). Seventeen segment pairs (gray) were considered in the segmental Q-coordinates. (b) The crystallographic structure of recombinant goat α-lactalbumin shown in the respective colors representing the eight local segments [25]. (c)–(f) The structural characteristics of Clusters 1, 4, 5, and 9 represented in the respective colors used to depict the eight local segments [25]. The panels on the left show the superimposition of the structures randomly selected from each cluster. The panels on the right show a two-dimensional lattice representation of the 17 segmental Q-coordinates averaged within each cluster. Reproduced with permission from [25]

maximum number of dimensions of conformational hyperspace of the segmental Q-coordinates was 36 (= 8 × 9/2), there were only 17 coordinates, as trivial coordinates containing fewer than three native contacts were neglected (blank areas in Fig. 2.10(a)). The advantage of the use of the segmental Q-coordinate system becomes apparent when we compare the unfolding trajectories represented in the conformational hyperspace of the segmental Q-coordinates and those in the hyperspace of the Cartesian coordinates (Fig. 2.11) [25]. Because unfolded conformations are very widely distributed in the hyperspace of the Cartesian


Fig. 2.11. Two-dimensional representations of the structural ensemble observed in the unfolding trajectories, which were mapped onto the two largest principal components in the 17-dimensional segmental Q-coordinates (a) and in the hyperspace of the Cartesian coordinates (b) [25]. Reproduced with permission from [25]

coordinates, the unfolding trajectories depict a funnel-like shape. On the other hand, the unfolded conformations are all close to each other in the hyperspace of the segmental Q-coordinates, so that pathway-like unfolding trajectories are observed (Fig. 2.11(a)), from which the pathway, intermediates, and transition state of unfolding can be explored. It is of note that whether the protein folding/unfolding is described by a folding pathway or funnel may depend on the coordinate system used to represent the protein structure.

Cluster Analysis, Unfolding Pathway, and Transition State

By k-means cluster analysis with Euclidean distance in the segmental Q-coordinates, we divided the structure ensemble of the MD unfolding trajectories into nine clusters [25]. The clustering was performed using all data obtained for the authentic and recombinant proteins, and the clusters were numbered in the order of the distance from the native structure. Figure 2.10(c)–(f) shows protein structures in four representative clusters (Clusters 1, 4, 5, and 9), in which Cluster 1 is almost identical to the native structure with all of the 17 Q-coordinates close to unity, whereas Cluster 9, which lost 84% of its native contacts, represents the unfolded state.

Twenty MD unfolding trajectories were obtained at 498 K, i.e., 10 for the authentic protein and the remaining 10 for the recombinant protein [25]. Each trajectory was characterized by flows between different clusters of the MD structure ensemble. Such trajectory flows may thus represent the unfolding pathway. To investigate similarities and differences between the individual unfolding trajectories in terms of trajectory flow (i.e., the unfolding pathway), we carried out multiple trajectory alignments analogous to multiple sequence alignments of biological sequences [25, 69]. As a result, we found that the 20 unfolding


K. Kuwajima et al.

Fig. 2.12. The trajectory flows of the clades: (a) Clade 1, (b) Clade 2, and (c) Clade 5. Circles represent the nine clusters, Clusters 1–9 [25]. Each arrow represents the net frequency of the transition. A thicker arrow indicates a larger flow. Reproduced with permission from [25]

trajectories could be classified into five groups; three of these (Clades 1, 2, and 5) each included at least five trajectories, and these major groups are shown in Fig. 2.12. Each of the five groups is referred to here as a “Clade” by analogy with a clade in a phylogenetic tree (cladogram) constructed by multiple sequence alignment. Clade 1 consists only of trajectories of the authentic protein, and indicates a cooperative unfolding from Cluster 1 to Cluster 6 via Cluster 5. On the other hand, Clade 5 consists only of trajectories of the recombinant protein, and indicates a noncooperative unfolding that reaches Cluster 5 via Clusters 2–4 and ultimately Cluster 6 or higher clusters. Clade 2 represents a mixture of trajectories of the authentic and recombinant proteins, and shows features intermediate between those of Clades 1 and 5. In all five clades, the unfolding pathway necessarily passes through Cluster 5, and hence Cluster 5 is the bottleneck of the unfolding transition (Fig. 2.12) [25]. This indicates that Cluster 5 may correspond to the transition state of unfolding. To validate the identity of Cluster 5 as the transition state, we estimated theoretical Φ values ($\Phi_{\mathrm{MD}}$) from the MD trajectories. The $\Phi_{\mathrm{MD}}$ value was based on the fractional native contact of amino acid residues in the structures produced by MD simulations [5, 25]. The correlation coefficient between $\Phi_{\mathrm{MD}}$ and the experimental Φ values given by equation (2.2) was highest around the center of Cluster 5, demonstrating that Cluster 5 represents the transition state of unfolding (Fig. 2.9).
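The bookkeeping behind the trajectory flows and the bottleneck argument above can be illustrated with a minimal sketch. It assumes that each trajectory has already been reduced to a sequence of cluster labels; the function names (`net_flows`, `obligatory_clusters`) and the toy trajectories are hypothetical and are not taken from [25]. The multiple trajectory alignment itself is not reproduced here.

```python
from collections import Counter


def net_flows(trajectories):
    """Return net transition counts between clusters, pooled over trajectories.

    Each trajectory is a sequence of cluster labels (one per saved frame).
    The result maps (i, j) -> n, meaning n more i->j than j->i transitions.
    """
    counts = Counter()
    for labels in trajectories:
        for a, b in zip(labels, labels[1:]):
            if a != b:  # ignore frames that stay in the same cluster
                counts[(a, b)] += 1
    net = {}
    for (a, b), n in counts.items():
        diff = n - counts.get((b, a), 0)
        if diff > 0:  # keep only the direction with a positive net flow
            net[(a, b)] = diff
    return net


def obligatory_clusters(trajectories, start, end):
    """Clusters visited by every trajectory, other than the two end states."""
    visited_by_all = set.intersection(*(set(t) for t in trajectories))
    return sorted(visited_by_all - {start, end})


# Hypothetical toy trajectories (cluster labels 1-9), not data from [25].
toy = [
    [1, 1, 5, 6, 7, 9],        # direct route through Cluster 5
    [1, 2, 3, 4, 5, 6, 8, 9],  # stepwise route through Clusters 2-4
    [1, 2, 1, 2, 4, 5, 7, 9],  # route with one back-transition
]
flows = net_flows(toy)
bottlenecks = obligatory_clusters(toy, start=1, end=9)
```

In this toy input, every route from Cluster 1 to Cluster 9 passes through Cluster 5, which is the sense in which a cluster is called a bottleneck in the text; the arrow thickness in a flow diagram would correspond to the values stored in `flows`.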
Hydration of Protein Interior During Unfolding

To further characterize the structural changes of goat α-lactalbumin during unfolding, we examined the probability distributions of the following four structural parameters in each of the nine clusters of the structural ensemble of MD trajectories: (1) the fractional native contact (Q) of the entire molecule, (2) the RMSD of Cα atoms between a pair of structures that belong to the same cluster, (3) the solvent-accessible surface area (SASA) of hydrophobic side chains, and (4) the SASA of hydrophilic side chains [25].
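As an illustration of the first parameter, the fraction of native contacts can be sketched as follows. This is only a schematic definition: the distance cutoff (6.5 Å), the minimum sequence separation, the use of one point per residue, and the function names are all assumptions made for this example, and the contact definition actually used in [25] may differ.

```python
import math


def native_contacts(coords, min_seq_sep=2, cutoff=6.5):
    """Residue pairs (i, j) in contact in the reference (native) structure.

    coords: list of (x, y, z) tuples, one representative point per residue.
    min_seq_sep and cutoff (in angstroms) are illustrative assumptions.
    """
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def fraction_native(coords, native_pairs, cutoff=6.5):
    """Q: fraction of the native contact pairs still within the cutoff."""
    if not native_pairs:
        raise ValueError("reference structure has no native contacts")
    kept = sum(
        1 for i, j in native_pairs
        if math.dist(coords[i], coords[j]) <= cutoff
    )
    return kept / len(native_pairs)


# Hypothetical five-residue toy structures (coordinates in angstroms).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
          (0.0, 3.8, 0.0), (0.0, 0.0, 3.8)]
extended = [(3.8 * k, 0.0, 0.0) for k in range(5)]

pairs = native_contacts(native)
q_native = fraction_native(native, pairs)
q_extended = fraction_native(extended, pairs)
```

With this definition Q equals 1 for the reference structure and falls toward 0 as contacts are lost, which is the scale on which the cluster distributions of Q are compared.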

2 Experimental and Simulation Studies of the Folding/Unfolding of Goat


Fig. 2.13. The probability distributions of four structural parameters calculated for the structures of each cluster [25]. (a) The fraction of the native tertiary contacts Q. (b) The RMSD value of Cα atoms between a pair of structures that belong to the same cluster. The SASA for (c) hydrophobic and (d) hydrophilic side chains. Reproduced with permission from [25]

As shown in Fig. 2.13, both the Q and RMSD values of the transition state (Cluster 5) are located between those of the native state (Cluster 1) and the unfolded state (Clusters 6–9). However, the RMSD distribution of the transition state remains native-like, and a sudden broadening occurs between Clusters 5 and 6 or after passing through the transition state. The SASA distribution of hydrophobic side chains shows a more characteristic behavior, and a large increase in the hydrophobic SASA occurs only after the protein passes through the transition state, while no significant changes are observed in the hydrophilic SASA in any of the clusters. The above results thus suggest that extensive hydration of the hydrophobic interior of the protein occurs only after the protein passes through the transition state, and this hydration of the protein interior leads to the extensive unfolding (i.e., the increase in RMSD) of the protein molecule [25]. Provided that folding is the reverse of unfolding, an important rate-limiting step of protein folding may be the dehydration of hydrated hydrophobic groups to form a hydrophobic interior. The formation of partial native contacts (Q ≈ 0.5) accompanies this rate-limiting step of folding (Fig. 2.13(a)), and this partial structural organization occurs around the folding nucleus formed by the C-helix and Ca²⁺-binding site in goat α-lactalbumin. Molecular simulations of other proteins or even small peptides are known to exhibit similar extensive dehydration of hydrated hydrophobic groups at the rate-limiting step of folding [4, 70–73], and hence this is probably a general mechanism of protein folding.


2.4.3 Conclusions

(1) We experimentally characterized the molten globule state and the folding/unfolding transition state of goat α-lactalbumin using a hydrogen-exchange 2D NMR technique and mutational Φ value analysis. The folding reaction occurs in a hierarchical manner, with the C-helix and Ca²⁺-binding site being weakly organized in the molten globule intermediate and the structure around the same region becoming further organized in the transition state.

(2) We carried out unfolding MD simulations of goat α-lactalbumin at 498 K. The protein structure was represented in the segmental Q-coordinate, and cluster analyses and multiple-trajectory alignments were carried out to obtain the transition-state structure solely from the MD simulation. The structure obtained by this approach was very close to that obtained experimentally, and hence the results of the kinetic unfolding experiments were well reproduced by the simulations.

(3) The analysis of the probability distributions of different structural parameters in each cluster of the MD structural ensemble revealed that the hydration of most of the hydrophobic surface of the protein occurs after passage through the transition state of unfolding, and this hydration of the protein interior leads to the extensive unfolding of the protein molecule. Thus, the dehydration of hydrated hydrophobic groups, which enables the formation of a hydrophobic interior, may be an important rate-limiting step of protein folding.

2.5 Summary and Perspectives

We studied the unfolding behavior and the folding/unfolding transition state of goat α-lactalbumin both experimentally and by MD simulation. The MD simulations reproduced well the experimentally observed differences in the unfolding behaviors of the authentic and recombinant proteins and also reliably reproduced the experimentally observed transition-state structure, together with atomically detailed descriptions of the unfolding process [24, 25]. The present study thus demonstrates the power of the combined use of experimentation and simulation for investigating protein folding. In future studies, it will be necessary not only to combine experimental and simulation results but also to address more critical questions regarding the underlying mechanisms of protein folding. For goat α-lactalbumin, additional questions will need to be answered, e.g., why the region containing the C-helix and the Ca²⁺-binding site acts as the folding nucleus and what determines the folding nucleus. To address issues of this sort, the combined results of experimental and simulation studies of the folding/unfolding of different proteins will be needed. Particularly intriguing in this regard would be a comparative study of goat α-lactalbumin and canine milk lysozyme. The


latter protein is homologous to α-lactalbumin and has the same Ca²⁺-binding site at the interface of the α- and β-domains [74]. Nevertheless, the folding nucleus of canine milk lysozyme differs greatly from that of α-lactalbumin and is probably located at the A- and B-helices, distant from the Ca²⁺-binding site [75].

Acknowledgments

We would like to thank our former colleagues in the Department of Physics in the School of Science, University of Tokyo, including Tapan K. Chaudhuri, Kimiko Saeki, Munehito Arai, and Takao Yoda, all of whom assumed important roles in the experimental portion of this study. We are also grateful to Professor Motonori Ota (Nagoya University), who introduced the multiple trajectory alignment method in this study. This study was supported by a Grant-in-Aid for Scientific Research on Priority Areas (project numbers 15076201, 15076209, and 15076101).

References

1. Y. Duan, P.A. Kollman, Science 282, 740 (1998)
2. R. Day, V. Daggett, Adv. Protein Chem. 66, 373 (2003)
3. H.A. Scheraga, M. Khalili, A. Liwo, Annu. Rev. Phys. Chem. 58, 57 (2007)
4. A. Caflisch, M. Karplus, J. Mol. Biol. 252, 672 (1995)
5. V. Daggett, A.J. Li, L.S. Itzhaki, D.E. Otzen, A.R. Fersht, J. Mol. Biol. 257, 430 (1996)
6. J. Tsai, M. Levitt, D. Baker, J. Mol. Biol. 291(1), 215 (1999)
7. U. Mayor, C.M. Johnson, V. Daggett, A.R. Fersht, Proc. Natl. Acad. Sci. U. S. A. 97(25), 13518 (2000)
8. L.J. Smith, R.M. Jones, W.F. van Gunsteren, Proteins 58(2), 439 (2005)
9. F. Ding, W. Guo, N.V. Dokholyan, E.I. Shakhnovich, J.E. Shea, J. Mol. Biol. 350(5), 1035 (2005)
10. H. Lei, S.G. Dastidar, Y. Duan, J. Phys. Chem. B 110(43), 22001 (2006)
11. N. Smolin, R. Winter, Biochim. Biophys. Acta 1764(3), 522 (2006)
12. A. Das, C. Mukhopadhyay, J. Chem. Phys. 127(16), 165103 (2007)
13. R.D. Schaeffer, A. Fersht, V. Daggett, Curr. Opin. Struct. Biol. 18(1), 4 (2008)
14. V. Daggett, Chem. Rev. 106(5), 1898 (2006)
15. R. Day, V. Daggett, J. Mol. Biol. 366(2), 677 (2007)
16. M.E. McCully, D.A. Beck, V. Daggett, Biochemistry 47(27), 7079 (2008)
17. H.J. Dyson, P.E. Wright, Annu. Rev. Phys. Chem. 47, 369 (1996)
18. M.M. Krishna, L. Hoang, Y. Lin, S.W. Englander, Methods 34(1), 51 (2004)
19. A. Matouschek, J.T. Kellis, L. Serrano, A.R. Fersht, Nature 340(6229), 122 (1989)
20. A. Fersht, Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding (W.H. Freeman, New York, 1998)
21. C.D. Snow, E.J. Sorin, Y.M. Rhee, V.S. Pande, Annu. Rev. Biophys. Biomol. Struct. 34, 43 (2005)


22. T.K. Chaudhuri, K. Horii, T. Yoda, M. Arai, S. Nagata, T.P. Terada, H. Uchiyama, T. Ikura, K. Tsumoto, H. Kataoka, M. Matsushima, K. Kuwajima, I. Kumagai, J. Mol. Biol. 285, 1179 (1999) (Erratum in: J. Mol. Biol. 336(3), 825 (2004))
23. K. Saeki, M. Arai, T. Yoda, M. Nakao, K. Kuwajima, J. Mol. Biol. 341(2), 589 (2004)
24. T. Oroguchi, M. Ikeguchi, K. Saeki, K. Kamagata, Y. Sawano, M. Tanokura, A. Kidera, K. Kuwajima, J. Mol. Biol. 354(1), 164 (2005)
25. T. Oroguchi, M. Ikeguchi, M. Ota, K. Kuwajima, A. Kidera, J. Mol. Biol. 371(5), 1354 (2007)
26. A.C.W. Pike, K. Brew, K.R. Acharya, Structure 4, 691 (1996)
27. Y. Hiraoka, T. Segawa, K. Kuwajima, S. Sugai, N. Murai, Biochem. Biophys. Res. Commun. 95(3), 1098 (1980)
28. D.I. Stuart, K.R. Acharya, N.P. Walker, S.G. Smith, M. Lewis, D.C. Phillips, Nature 324(6092), 84 (1986)
29. M. Ikeguchi, K. Kuwajima, S. Sugai, J. Biochem. (Tokyo) 99(4), 1191 (1986)
30. T. Hendrix, Y.V. Griko, P.L. Privalov, Biophys. Chem. 84(1), 27 (2000)
31. A. Chedad, H. Van Dael, Proteins 57(2), 345 (2004)
32. K. Kuwajima, Proteins 6, 87 (1989)
33. K. Kuwajima, FASEB J. 10, 102 (1996)
34. M. Arai, K. Kuwajima, Fold. Des. 1(4), 275 (1996)
35. M. Arai, K. Kuwajima, Adv. Protein Chem. 53, 209 (2000)
36. M. Svensson, A. Håkansson, A.K. Mossberg, S. Linse, C. Svanborg, Proc. Natl. Acad. Sci. U. S. A. 97(8), 4221 (2000)
37. K.H. Mok, J. Pettersson, S. Orrenius, C. Svanborg, Biochem. Biophys. Res. Commun. 354(1), 1 (2007)
38. U. Brodbeck, W.L. Denton, N. Tanahashi, K.E. Ebner, J. Biol. Chem. 242(7), 1391 (1967)
39. B. Ramakrishnan, P.K. Qasba, J. Mol. Biol. 310(1), 205 (2001)
40. D.B. Veprintsev, M. Narayan, S.E. Permyakov, V.N. Uversky, C.L. Brooks, A.M. Cherskaya, E.A. Permyakov, L.J. Berliner, Proteins 37(1), 65 (1999)
41. S.E. Permyakov, G.I. Makhatadze, R. Owenius, V.N. Uversky, C.L. Brooks, E.A. Permyakov, L.J. Berliner, Protein Eng. Des. Sel. 18(9), 425 (2005)
42. N. Ishikawa, T. Chiba, L.T. Chen, A. Shimizu, M. Ikeguchi, S. Sugai, Protein Eng. 11, 333 (1998)
43. K. Takano, K. Tsuchimori, Y. Yamagata, K. Yutani, Eur. J. Biochem. 266(2), 675 (1999)
44. S. Goda, K. Takano, Y. Yamagata, Y. Katakura, K. Yutani, Protein Eng. 13(4), 299 (2000)
45. M. Ikeguchi, J. Comput. Chem. 25(4), 529 (2004)
46. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79(2), 926 (1983)
47. A.D. MacKerell Jr., D. Bashford, M. Bellott, R.L. Dunbrack Jr., J.D. Evanseck, M.J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F.T.K. Lau, C. Mattos, S. Michnick, T. Ngo, D.T. Nguyen, B. Prodhom, W.E. Reiher III, B. Roux, M. Schlenkrich, et al., J. Phys. Chem. B 102, 3586 (1998)
48. T. Yoda, M. Saito, M. Arai, K. Horii, K. Tsumoto, M. Matsushima, I. Kumagai, K. Kuwajima, Proteins 42, 49 (2001)

49. S. Hayward, A. Kitao, H.J. Berendsen, Proteins 27(3), 425 (1997)
50. H. Meirovitch, Curr. Opin. Struct. Biol. 17(2), 181 (2007)
51. Y. Bai, J.S. Milne, L. Mayne, S.W. Englander, Proteins 17, 75 (1993)
52. C.L. Chyan, C. Wormald, C.M. Dobson, P.A. Evans, J. Baum, Biochemistry 32, 5681 (1993)
53. B.A. Schulman, C. Redfield, Z.Y. Peng, C.M. Dobson, P.S. Kim, J. Mol. Biol. 253, 651 (1995)
54. S.E. Radford, C.M. Dobson, P.A. Evans, Nature 358, 302 (1992)
55. S.D. Hooke, S.E. Radford, C.M. Dobson, Biochemistry 33, 5867 (1994)
56. L.A. Morozova-Roche, C.C. Arico-Muendel, D.T. Haynie, V.I. Emelyanenko, H. Van Dael, C.M. Dobson, J. Mol. Biol. 268, 903 (1997)
57. L.A. Morozova-Roche, J.A. Jones, W. Noppe, C.M. Dobson, J. Mol. Biol. 289, 1055 (1999)
58. Y. Kobashigawa, M. Demura, T. Koshiba, Y. Kumaki, K. Kuwajima, K. Nitta, Proteins 40, 579 (2000)
59. V. Forge, R.T. Wijesinha, J. Balbach, K. Brew, C.V. Robinson, C. Redfield, C.M. Dobson, J. Mol. Biol. 288(4), 673 (1999)
60. K. Kuwajima, M. Mitani, S. Sugai, J. Mol. Biol. 206(3), 547 (1989)
61. K. Nitta, Methods Mol. Biol. 172, 211 (2002)
62. A. Vanhooren, K. Vanhee, K. Noyelle, Z. Majer, M. Joniau, I. Hanssens, Biophys. J. 82, 407 (2002)
63. A.R. Fersht, A. Matouschek, L. Serrano, J. Mol. Biol. 224(3), 771 (1992)
64. A. Vanhooren, A. Chedad, V. Farkas, Z. Majer, M. Joniau, H. Van Dael, I. Hanssens, Proteins 60(1), 118 (2005)
65. A. Chedad, H. Van Dael, A. Vanhooren, I. Hanssens, Biochemistry 44(46), 15129 (2005)
66. R.L. Baldwin, G.D. Rose, Trends Biochem. Sci. 24, 77 (1999)
67. B. Nölting, K. Andert, Proteins 41(3), 288 (2000)
68. S. Nishiguchi, Y. Goto, S. Takahashi, J. Mol. Biol. 373(2), 491 (2007)
69. M. Ota, M. Ikeguchi, A. Kidera, Proc. Natl. Acad. Sci. U. S. A. 101(51), 17658 (2004)
70. M.S. Cheung, A.E. Garcia, J.N. Onuchic, Proc. Natl. Acad. Sci. U. S. A. 99(2), 685 (2002)
71. W. Guo, S. Lampoudi, J.E. Shea, Biophys. J. 85(1), 61 (2003)
72. Y.M. Rhee, E.J. Sorin, G. Jayachandran, E. Lindahl, V.S. Pande, Proc. Natl. Acad. Sci. U. S. A. 101(17), 6456 (2004)
73. J. Juraszek, P.G. Bolhuis, Proc. Natl. Acad. Sci. U. S. A. 103(43), 15859 (2006)
74. T. Koshiba, M. Yao, Y. Kobashigawa, M. Demura, A. Nakagawa, I. Tanaka, K. Kuwajima, K. Nitta, Biochemistry 39(12), 3248 (2000)
75. H. Nakatani, K. Maki, K. Saeki, T. Aizawa, M. Demura, K. Kawano, S. Tomoda, K. Kuwajima, Biochemistry 46(17), 5238 (2007)

3 Transition in the Higher-order Structure of DNA in Aqueous Solution

T. Sakaue and K. Yoshikawa

Abstract. Recent progress in single-chain observation techniques is revealing the fascinating world of individual long DNA molecules in higher-order structures. Examples include a large discontinuous folding transition between disordered coil and ordered compact states, the phenomenon of intrachain segregation, in which folded and unfolded parts coexist along the chain, and the multi-stability between different “phases,” which implies the importance of dynamic degrees of freedom in the system. Although these behaviors are apparently much more complex than naively expected from conventional knowledge, the essential physics can be captured by a simple polymer model with an appropriate degree of coarse-graining. The semiflexibility, that is, the local stiffness, of the chain and the electrostatic properties, together with the effects associated with finite chain length, are shown to be crucial in dictating the large-scale behaviors of long DNA chains at the higher-order level. In the regulation of genetic activity, living cells may utilize physico-chemical properties inherent in genomic DNA molecules, which are highly charged, locally stiff, and very long.

3.1 Introduction

Thanks to the remarkable progress in molecular biology during the past quarter century, we have accumulated a great deal of knowledge on the molecular processes taking place in living cells [1]. The underlying technique here is transgenic manipulation. For instance, one can knock out a particular gene, and the consequences of this action can be investigated by comparison with the wild type. Based on such experimental methodology, the correlation between a certain function and a specific protein is revealed. Indeed, a large number of such specific proteins have been identified, reflecting the complexity of functions in cells. The question then arises as to how the cell organizes these specific events to create a spatio-temporal order to maintain its life. Cells have a hierarchical dynamic structure from the nanometer to the micrometer scale. Therefore, to answer this question, it is necessary to explore the mesoscopic level of description, given the molecular knowledge at the nanometer scale.


An important example is seen in the mechanism of gene expression regulation, which is one of the fundamental problems in biology. Despite the fact that all the cells are, in principle, equipped with the same DNA molecules as the genetic code, they nevertheless exhibit different phenotypes robustly, depending on the cell type. These so-called epigenetic phenomena are sustained through cell divisions. How does each cell differentiate spontaneously? In addition, how are the levels of expression in specific cells self-regulated? Various specific proteins have been identified as the regulatory factors for particular characteristics, and mathematical models of the molecular network with multiple stable attractors and feedback loops have been actively investigated to clarify the underlying mechanism. Despite extensive efforts in this direction, however, a comprehensive view remains elusive. Recent elaborate experiments appear to pose severe questions for the current framework by revealing its vulnerability to the large fluctuations inherent at the cell scale [2–4].

Here, we would like to look at the problem from a different viewpoint. Genetic information is stored in the chain-like molecules known as DNA. One of the striking properties of genomic DNA is its extremely long length L compared with the molecular thickness a = 2 nm. To better grasp this property, let us assume that the DNA is a rope of radius ∼1 cm. The total length of the human DNA inside a cell measures L ∼ 1 m, which means that the length of the rope would be $\sim 10^{7}$ m, comparable with the diameter of the earth. Why is DNA so long? It is tempting to examine the intrinsic properties of such long polymers and ask what they imply for DNA as a genetic material.

In the present chapter, we review recent progress in the study of the higher-order structures of long DNA molecules. In Sect. 3.2, the physico-chemical properties of the DNA folding transition in aqueous solution are surveyed. In particular, the flexibility and rich potentiality in the higher-order structure of long DNA chains are investigated by single-chain observation. In Sect. 3.3, we analyze these observed phenomena from the viewpoint of the statistical physics of long, semiflexible polymers. Here, coarse-grained phenomenological arguments and computer simulations with simple models are demonstrated to be powerful tools for revealing the fundamental features of DNA. We also discuss the recent attempt to reconstitute the chromatin-like structure and summarize the possible biological importance and perspectives.
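The rope analogy used in this introduction is a simple rescaling, and the arithmetic can be checked with a short sketch. The DNA radius (half of the thickness a = 2 nm), the rope radius, and the DNA length are the values quoted in the text; the Earth's diameter (about 1.27 × 10⁷ m) and the function name `rope_analogy` are assumptions added for the comparison.

```python
def rope_analogy(dna_radius_m=1e-9, dna_length_m=1.0, rope_radius_m=1e-2):
    """Scale DNA up so that its radius matches that of a rope."""
    scale = rope_radius_m / dna_radius_m          # magnification factor
    return {
        "scale": scale,
        "rope_length_m": dna_length_m * scale,    # scaled contour length
        "aspect_ratio": dna_length_m / (2.0 * dna_radius_m),  # L / a
    }


EARTH_DIAMETER_M = 1.27e7  # assumption: mean diameter of the Earth

result = rope_analogy()
ratio_to_earth = result["rope_length_m"] / EARTH_DIAMETER_M

print(f"magnification: {result['scale']:.1e}")
print(f"rope length: {result['rope_length_m']:.1e} m")
print(f"rope length / Earth diameter: {ratio_to_earth:.2f}")
print(f"aspect ratio L/a: {result['aspect_ratio']:.1e}")
```

The magnification is a factor of about 10⁷, so a 1 m chain becomes a rope of roughly 10⁷ m, which is of the same order as the Earth's diameter. The aspect ratio L/a is unchanged by the rescaling.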

3.2 Long DNA Molecules in Aqueous Solution

3.2.1 Primary, Secondary, and Higher-order Structures

The genetic information of living organisms is coded in DNA in the form of base pair sequences. There are four types of nucleotides, which are linked to a polynucleotide with a sugar-phosphate backbone. The arrangement of nucleotides along the one-dimensional chain is called the primary structure of DNA, which directly encodes the primary structure of proteins by means of


Fig. 3.1. Hierarchy in DNA molecules (figure labels: phosphate group, sugar, base (A, G, C, T); length scales: 2 nm, 3.4 nm, 0.1 μm)

the genetic code. Usually, the complementary pairs of nucleotides are connected via hydrogen bonds and two polynucleotide chains are wound around each other to form a double helix. This double helical structure is called the secondary structure of DNA and is regarded as a fundamental unit for the spatial organization of long DNA chains at larger length scales, that is, higher-order structures (Fig. 3.1). Because of the rigid double helix structure (and the electrostatic repulsion between phosphate groups), the DNA chain is locally stiff, with a conformation that is almost a straight line with small thermal fluctuations. Quantitatively, this leads to a large characteristic decay length of the orientation correlation, known as the persistence length, $l_p \simeq 50$ nm under usual aqueous conditions, which is much larger than the molecular thickness of the DNA chain, a = 2 nm. It should be stressed that while the secondary structure is determined by local interactions, that is, affected by segments located in proximity along the chain, the higher-order structure is governed by the global influence created by the entire chain. Therefore, the structural transition at the higher-order level is essentially different from the helix-coil transition that occurs at the secondary structure level.¹ DNA molecules of biological origin are extremely long, having contour lengths of L. It is expected that the large-scale behaviors of long DNA molecules do not depend strongly on molecular details such as the base pair sequence.

¹ The coupling of these two transitions on different scales is possible, which may merit future investigation. Note that helices are frequently adopted motifs at the secondary structure level in biopolymers, and investigating their impact on the higher-order structures is an important theme [5].

The following subsections describe the phenomenology of the higher-order structural transitions in long


DNA molecules. Then, in Sect. 3.3, we demonstrate that the essential features are indeed described by a small number of material properties, such as $l_p$ and L, and the environmental parameters.

3.2.2 DNA Condensation

When dissolved in water, a long DNA molecule takes a dispersed random-coil conformation. However, the DNA molecules found in living organisms look very different. They are, in general, tightly packed inside a limited space. For instance, T4 phage DNA with a contour length of 57 μm (166 kbp) is packed inside a virus capsid of linear dimension ∼100 nm. The full length of the genomic DNA of Escherichia coli is as long as ∼1.4 mm, yet it is packed in a nucleoid region on the order of micrometers. Moreover, the random coil and the compactly packed states should be regarded as different “phases.” When we add a sufficient quantity of polyamines to the dilute DNA solution, the DNA molecules aggregate and may even precipitate from the solution. As observed by electron microscopy, the DNA aggregates often take an ordered toroidal morphology reminiscent of intraphage DNA. This phenomenon is called DNA condensation [6–8]. Not only polyamines but also other multivalent cations, cationic surfactants, water-soluble polymers, alcohol, etc. are capable of inducing condensation. These agents are collectively referred to as condensing agents.

These observations have given rise to a number of interesting questions. In particular, the following two questions have attracted considerable attention. (1) The DNA condensation phenomenon is governed primarily by electrostatic interactions. Then, what is the origin of the attractive force between highly charged DNA segments [9]? (2) In a very dilute solution of long DNA molecules, collapse at the single-chain level would occur. Then, given some effective attraction, how can we describe the folding of a long DNA into the compact ordered state? We shall be mainly concerned with the second question, but it should be kept in mind that these two are not completely separable and the nature of the effective interactions may affect the manner of the transition in some cases. Note that it is more common to observe multimolecular condensates with conventional techniques such as total intensity and dynamic laser light scattering, which are not well suited to such a dilute solution. The term “condensation” was introduced to distinguish the situation in which the aggregate is of finite size and orderly morphology from the usual aggregation or precipitation [6, 7].

3.2.3 Looking at Single DNA Molecules

As stated previously, genomic DNA molecules are generally very long and exhibit a large flexibility at the micrometer scale. The behaviors of single DNA molecules are, thus, described statistically, the understanding of which is highly


Fig. 3.2. Different scenarios in the folding transition of long polymers. (Top) Gradual shrinkage, that is, a continuous transition; (middle) all-or-none discontinuous transition at the level of single chains; and (bottom) multiple-step transition through intrachain segregation. Note that, owing to the coexistence region characteristic of the finite-size system, all the cases look similar to the continuous transition in macroscopic measurements

required for various purposes in biological and material sciences. As noted earlier, conventional experiments measure ensemble-averaged quantities, so that the ambiguity associated with multimolecular events in the condensation is unavoidable. However, it is of critical importance to recognize the hierarchy involved in the system under consideration (Fig. 3.2). In the dilute solution, there are a large number of long DNA chains, each of which should be regarded as a statistical subsystem. Reflecting the finiteness of the subsystem, the unique characteristics at the single-chain level may be smoothed out by ensemble averaging. A clear picture at the single-DNA level has become attainable through the use of fluorescence microscopy [10–12]. The direct observation of single DNA molecules has revealed basic characteristics inherent in the folding of long DNA molecules. Above all, the folding is markedly discrete, that is, it is a first-order transition from the swollen coil to the compactly folded state (cf. Fig. 3.2 (middle)). In Fig. 3.3, the dependence of the long-axis length of T4 DNA on the concentration of the trivalent cation spermidine is plotted. Here, individual DNA molecules are folded in an all-or-none fashion and there is a certain range of coexistence, in which both the swollen coil and the compactly folded states are observed. The same trend has been reported for various cases with different condensing agents.
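The way ensemble averaging can hide an all-or-none transition may be illustrated with a minimal two-state sketch. Everything numerical here is hypothetical: the coil and globule sizes, the logistic form chosen for the folded fraction, and the midpoint and width parameters are assumptions for illustration, not values fitted to Fig. 3.3.

```python
import math

COIL_SIZE = 3.5     # hypothetical long-axis length of a coil (arbitrary units)
GLOBULE_SIZE = 0.5  # hypothetical long-axis length of a folded chain


def folded_fraction(conc, midpoint=100.0, width=0.5):
    """Assumed fraction of folded chains, logistic in log-concentration."""
    return 1.0 / (1.0 + math.exp(-(math.log(conc) - math.log(midpoint)) / width))


def ensemble_mean_size(conc, midpoint=100.0, width=0.5):
    """Ensemble-averaged size when every chain is either coil or globule."""
    p = folded_fraction(conc, midpoint, width)
    return (1.0 - p) * COIL_SIZE + p * GLOBULE_SIZE


# Each chain takes only one of two sizes, yet the average varies smoothly.
for conc in (1.0, 30.0, 100.0, 300.0, 10000.0):
    print(f"c = {conc:8.1f}  folded fraction = {folded_fraction(conc):.3f}  "
          f"mean size = {ensemble_mean_size(conc):.3f}")
```

In this picture a bulk measurement that reports only the mean would show a gradual decrease across the coexistence range, whereas the distribution of single-chain sizes remains bimodal throughout, which is why single-chain observation is needed to distinguish the scenarios of Fig. 3.2.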


Fig. 3.3. Folding transition of T4 phage DNA induced by the addition of the trivalent cation spermidine. The abscissa and ordinate are the spermidine concentration, C_SPR (μM, logarithmic scale), and the long-axis length of DNA (μm) measured by fluorescence microscopic observation (see [11] for more details). Panels (a) and (b) are insets; scale bar, 5 μm

Recent progress in experiments has also revealed fascinating phenomena and rich scenarios of the folding transition. Most noteworthy, the phenomenon of intrachain segregation has been shown to be possible in long DNA molecules [13–19] (cf. Fig. 3.2(bottom)). Careful observation of individual DNA molecules around the region of the folding transition has revealed that such partially folded states appear in long DNA molecules under various situations. Long DNA chains can take not only coil and completely folded states, but also intrachain segregated states with various morphologies as higher-order structures, which can be controlled by suitable environmental conditions. What is the underlying mechanism behind such rich behaviors? We shall proceed to the theoretical description from the viewpoint of the statistical physics of macromolecules. We start from the classical theory of the coil-globule transition, and then the recent developments and attempts inspired by the single chain observation are also discussed.

3.3 Statistical Physics of Folding of a Long Polymer

3.3.1 Some Basics

In this section, we review the statistical mechanical approach to the problem of the folding of long polymer chains. From the standpoint of physics, considerable efforts have been made to extract simple and universal laws of biopolymers’ behavior regardless of their complexity and diversity. This leads


to the development of the theory of the coil-globule transition [20, 21]. The coil-globule transition, if necessary with appropriate modifications, can be used to understand many features of real biopolymers. However, it is also obvious that it is insufficient, and there still remain some gaps between our understanding and the transition behavior of real biopolymers. As possible reasons for these gaps, we may cite the heterogeneity of the monomer sequence, the effect of chain stiffness, and electrostatics, none of which is considered in the ideal version of the coil-globule transition. Since DNA molecules are locally stiff, strongly (negatively) charged, and approximately treated as a homopolymer with the appropriate coarse-graining, it is expected that many of the conformational behaviors of DNA are described by a relatively simple homopolymer model with the effects of chain stiffness and electrostatics. In particular, it is expected to be a reasonable model for studying the conformational transition of long DNA molecules at the higher-order level. Further, we shall see the importance of the proper degree of coarse-graining to capture the diversity and universality behind the phenomena.

Let us start with some basics and definitions. The basic feature of polymer molecules is connectivity. A linear polymer chain, that is, one with no branching, with the contour length L = Nl can be described as a sequence of N segments of size l. The number N is proportional to the molecular weight and the length l is called the Kuhn segment length.² For various phenomena, including the folding transition, it is important to distinguish l from the monomer size a, which corresponds to the thickness of the chain. The ratio l/a is a measure of the local chain stiffness. A small value, on the order of $l/a \simeq 1$, means that the directional memory along the chain is lost at the monomer scale, and such polymers are referred to as flexible polymers. On the other hand, a large value, $l/a \gg 1$, indicates that the chain is rigid and resists bending at the scale of the Kuhn length, while manifesting flexibility at larger scales due to the entropic elasticity. Polymers with such a hierarchical property are referred to as semiflexible polymers. One may also define stiff polymers, in which the Kuhn length is comparable to or exceeds the chain length, l ≥ L. Examples of stiff polymers include actin filaments in cells ($l \simeq 35$ μm and $L \simeq 0.5$–1 μm) and fragment DNA molecules with ∼100 bp. On the other hand, long DNA molecules are typical examples of semiflexible polymers.

² Note that the Kuhn length is comparable to the persistence length $l_p$, which is an alternative measure of the chain stiffness (see Sect. 3.2). For DNA (more generally, chains with worm-like elasticity), $l = 2l_p$.

3.3.2 Continuous Transition in Flexible Polymers: Coil-Globule Transition

A basic characteristic of a single polymer is its spatial dimensions, such as the radius of gyration. The average size of the ideal chain is identical to the root-mean-square displacement of the random walker, $\langle R_{\mathrm{id}}^{2}\rangle^{1/2} \simeq lN^{1/2}$ (the bracket


T. Sakaue and K. Yoshikawa

indicates ensemble averaging). The conformation of the polymer corresponds to the trajectory of the random walker, called a random coil, which is the origin of the entropic elasticity of polymeric materials. In reality, this conformation is modified depending on the compatibility with the solvent. When the compatibility is high, the solvent is called a good solvent, and a long polymer chain is swollen due to the repulsive interaction between segments (excluded volume effect). In the opposite case, called the poor solvent regime, the polymer collapses into a compact globule state to minimize contact with the solvent. At the simplest level, this transformation, that is, the coil-globule transition driven by a change in the solvent quality, can be analyzed with the following free energy [20, 21]:

$$\frac{F}{T} \sim \alpha^{2} + \alpha^{-2} + x\alpha^{-3} + y\alpha^{-6}, \tag{3.1}$$

where the swelling ratio $\alpha$ is the ratio of the polymer size $R$ to the ideal chain size, $\alpha^2 \equiv R^2/R_{\mathrm{id}}^2$. The parameters $x = BN^{1/2}/l^3$ and $y = C/l^6$ depend on the second ($B$) and third ($C$) virial coefficients, respectively, and $T$ is the bath temperature (the Boltzmann constant is implicit throughout this chapter). The first two terms arise from the conformational entropy, and the remaining two terms represent the interactions between segments, where the segment density is assumed not to be very high, so that the virial expansion (up to triple interactions) is valid. In usual systems (such as flexible chains), changes in the solvent quality are reflected in the second virial coefficient, where $B > 0$ ($B < 0$) corresponds to a good (poor) solvent and the condition $B = 0$ is called the θ point. If the solvent quality is controlled by the temperature,³ then one can write $B \simeq -al^2\tau$ around the θ temperature, with the reduced temperature $\tau = (\theta - T)/\theta$, so that $B < 0$ below the θ temperature. The equilibrium swelling ratio is obtained via the minimization of (3.1):

$$\alpha^{5} - \alpha = x + y\alpha^{-3}. \tag{3.2}$$
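As a numerical companion to (3.2), the following Python sketch solves the equation for $\alpha$ at given $x$ and $y$. Setting $\mathrm{d}F/\mathrm{d}\alpha = 0$ in (3.1) gives $2\alpha - 2\alpha^{-3} - 3x\alpha^{-4} - 6y\alpha^{-7} = 0$, which reduces to (3.2) once numerical factors of order unity are absorbed into $x$ and $y$. The function names are illustrative and not taken from the original work, and the bisection assumes that $x(\alpha)$ is monotonic, which for the form (3.2) holds when $y \geq 1/60$ (the derivative $5\alpha^4 - 1 + 3y\alpha^{-4}$ is bounded below by $2\sqrt{15y} - 1$).

```python
def x_of_alpha(alpha, y):
    """Value of x that satisfies (3.2) for a given swelling ratio alpha."""
    return alpha**5 - alpha - y / alpha**3


def swelling_ratio(x, y, lo=1e-3, hi=50.0, tol=1e-12):
    """Solve alpha^5 - alpha = x + y*alpha^-3 for alpha by bisection.

    Assumes x_of_alpha is monotonic in alpha (true for y >= 1/60),
    so that the root inside [lo, hi] is unique.
    """
    if not (x_of_alpha(lo, y) <= x <= x_of_alpha(hi, y)):
        raise ValueError("x is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if x_of_alpha(mid, y) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def has_loop(y, lo=0.05, hi=2.0, n=20000):
    """True if x(alpha) decreases somewhere on the grid, i.e. alpha(x) is multivalued."""
    values = [x_of_alpha(lo + (hi - lo) * i / n, y) for i in range(n + 1)]
    return any(b < a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Swelling ratio along the x axis for a flexible chain (y = 1).
    for x in (-2.0, -1.0, 0.0, 1.0):
        print(f"x = {x:+.1f}  alpha = {swelling_ratio(x, y=1.0):.4f}")
    # Check on which side of y = 1/60 the curve becomes multivalued.
    for y in (1.0, 0.1, 0.005):
        print(f"y = {y:<6}  loop: {has_loop(y)}")
```

Working with $x(\alpha)$ rather than $\alpha(x)$ keeps the sketch valid on both sides of $y = 1/60$: the grid test in `has_loop` needs no root finding, whereas `swelling_ratio` should only be trusted where the curve is single-valued.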

This framework is most appropriate for the coil-globule transition in flexible polymers with $y \simeq 1$.⁴ In Fig. 3.4, we show how the coil-globule transition proceeds with the temperature change for flexible polymers of various lengths. With decreasing temperature, the chain size shrinks gradually and, at some point, becomes equal to the ideal chain size ($\alpha = 1$) due to the cancellation of the attractive binary interactions and the repulsive higher-order interactions (in this case represented by $C$), which leads to the definition of the apparent transition temperature $T_{\mathrm{tr}}$. It is seen that $T_{\mathrm{tr}}$ lies slightly below the θ

³ This simple case suffices to demonstrate the basic features of the more general situation, in which the solvent quality can also be controlled by changing the solution composition.

⁴ The calculation of $C$ for the anisotropic molecule leads to $C \sim a^3 l^3$, thus $y \sim (a/l)^3$.

3 Transition in the Higher-order Structure of DNA


Fig. 3.4. Coil-globule transition in a flexible polymer ($y = 1$) with various lengths, plotted as the swelling ratio $\alpha$ (vertical axis, 0 to 2) versus the reduced temperature $\tau$ (horizontal axis, $-1$ to 1). The solid, long-dashed, short-dashed, and dotted curves correspond to the chain lengths $N = 10^4$, $10^3$, $10^2$, and $10$, respectively. The horizontal and vertical dotted lines represent $\alpha = 1$ and $\tau = 0$, respectively

temperature. By substituting $\alpha = 1$ in (3.2), the width of the transition region is obtained as $(\theta - T_{\mathrm{tr}})/\theta = N^{-1/2}$; that is, the sharpness of the transition increases with the chain length. A sophisticated mean-field theory predicts that this transition becomes a second-order transition in the limit of infinite chain length [20]. In addition, an analogy with critical phenomena suggests that the coil-globule transition point corresponds to a tricritical point [21]. These results indicate that the global features do not depend on the molecular details and highlight the universality of the coil-globule transition. The above analysis implies that the coil-globule transition is essentially a gas–liquid transition within a single chain. Unlike in usual molecular gases, the translational entropy is absent due to the chain connectivity, and the conformational entropy appears instead. The collapsed state is a spherical droplet, that is, a globule, which minimizes the surface area and whose size is self-adjusted to satisfy the mechanical balance between the inside and the outside of the globule.

3.3.3 Discontinuous Transition in Semiflexible Polymers

It is known that the coil-globule transition in flexible polymers is well explained by a theory of the type discussed above [22]. Note that the chain length and the solvent quality enter the theory in the combined form $x = BN^{1/2}/l^3$, which is the only dimensionless parameter governing the transition. The presence of the master curve (see Fig. 3.5 below) implies that the phase behavior in the thermodynamic limit $N \to \infty$ can be readily discussed from measurements on shorter chains via finite-size scaling. What about semiflexible polymers? It is, in principle, possible to include the effect of the chain stiffness through the parameter $y$ in (3.1). As shown


Fig. 3.5. Plots of $\alpha$ (vertical axis) as a function of $x$ (horizontal axis) for various values of $y$ from (3.2). The solid, long-dashed, short-dashed, and dotted curves correspond to the parameters $y = 1$, $0.1$, $1/60$, and $0.005$, respectively

in Fig. 3.5, the dependence of the swelling ratio $\alpha$ on $x$ becomes sharper for smaller values of $y$ (stiffer chains, since $y \sim (a/l)^3$) and develops a metastable loop below the critical value $y_{\mathrm{cri}} = 1/60$, which is reminiscent of the van der Waals theory of the gas–liquid transition. Although this feature seems to have an interesting connection with the large discontinuous transition observed in the folding of long DNA molecules, it may be applicable only to the ideal situation of asymptotically long chains. In most practical cases, nontrivial features associated with the finite chain length show up. Moreover, anisotropic segments are capable of orientational ordering in dense states [20, 23], which implies that a description based on the sole order parameter $\alpha$, that is, the segment density, becomes inadequate. The fact that DNA chains over a rather wide range of lengths form a compact toroid, the size of which is comparable to the Kuhn length in many situations [6, 7], indicates that coarse-graining over the Kuhn length scale may be insufficient. These features make the folding transition in semiflexible polymers much more exotic than the simple coil-globule transition in flexible polymers.

Equilibrium Aspects

Computer simulation is a powerful method for studying the folding transition of semiflexible polymers, in which both intersegment and larger-scale degrees of freedom can be treated reliably [24–27]. A suitable model is a sequence of spherical beads connected by bonds, in which the stiffness is controlled by a bending potential that depends on the angle between adjacent bonds. The solvent quality is tuned by the strength of the short-ranged attractive interaction between beads. An example of the result from Monte


Fig. 3.6. Dependence of the chain size (gyration radius) on the inverse temperature calculated through Monte Carlo simulations. (Top) A semiflexible chain with contour length $L/a = 512$ and Kuhn length $l/a \simeq 20$, and (bottom) a flexible chain ($l/a \simeq 2$) with the same contour length. The error bars represent the standard deviations. The insets show snapshots of (a) coil states and (b) folded states

Carlo simulation is shown in Fig. 3.6 (top), in which the gyration radius of the chain with $L/a = 500$ and $l/a \simeq 20$ is plotted as a function of the inverse temperature $\epsilon/T$, where $\epsilon$ denotes the energy scale of the attractive interaction. The chain size is almost unaffected by the solvent quality up to a threshold point, at which the chain folds discontinuously into the compact state. There is a narrow but finite region of coexistence, in which both the coil and the compact states are observed. The compact state is no longer a spherical globule, but has a toroidal morphology reminiscent of the typical folded product of DNA chains. Neighboring segments inside the toroid exhibit high orientational ordering, which characterizes the folding of semiflexible polymers as a disorder–order transition. All these features resemble a typical trend in the


folding of long DNA molecules revealed by single-chain observation, and this strongly indicates that semiflexibility is one of the crucial factors. For comparison, we also show the result of the same Monte Carlo calculation for a flexible chain ($L/a = 500$ and $l/a \simeq 2$) in Fig. 3.6 (bottom). As the solvent quality decreases, the chain gradually shrinks into the globule state through the θ point, in accordance with the classical scenario of the coil-globule transition (Sect. 3.3.2).

Two factors have been identified as controlling the torus morphology and its size in the poor solvent condition. One is the surface energy, which tends to reduce the surface area, and the other is the bending energy, which prefers straighter conformations. In a compact state, these two factors compete, leading to the torus as the optimum compromise [23, 28–32]. Let us discuss the optimum size of the torus. The torus is characterized by two radii of curvature: the average radius $R$ and the thickness $r$ of the torus (Fig. 3.7). The relevant energy consists of the surface and bending energies:

$$U \simeq \gamma S + \kappa \frac{L}{R^{2}}, \tag{3.3}$$

where $\gamma$ ($\simeq \epsilon/a^2$) is the surface tension, $S = 4\pi^2 rR$ is the surface area of the torus, and $\kappa = Tl/2$ is the bending modulus.⁵ To discuss the optimum shape of the torus, let us assume that the torus is made up of densely packed segments with parallel alignment. Then, the volume $2\pi^2 r^2 R = \pi a^2 L/4$ does not depend on the torus shape, and one of the variables ($r$ or $R$) can be eliminated. By minimizing (3.3) with respect to the remaining variable, the optimum size of the torus is deduced as given in (3.4) (Fig. 3.7).

Fig. 3.7. (Left) Schematic image of a torus. (Right) Double-logarithmic plot of the torus size, average radius $R$, and thickness $r$ (vertical axis, $R/a$ and $r/a$ from $10^0$ to $10^2$) vs. chain length $L/a$ (horizontal axis, $10^3$ to $10^4$) from (3.4) with parameters $\gamma a^2/T = 4$ and $l/a = 15$. The curves are labeled $R$ (neutral), $r$ (neutral), $R$ (charged), and $r$ (charged). Also shown are the results for a charged semiflexible chain (cf. Sect. 3.3.4 and [32] for more details)

⁵ While it may appear that the high curvature near the center hole of the torus would lead to a much higher bending energy, the trace of the chain segment is not necessarily a circle. Rather, the chain can reduce the bending energy by distributing the curvature more uniformly [33].


Fig. 3.8. Typical snapshots (top and side views) of folded semiflexible polymers with $l/a \simeq 20$ from Monte Carlo simulations. The chain lengths are (a) $L/a = 500$, (b) $L/a = 1000$, and (c) $L/a = 2000$. The dependence of the radius of gyration on the chain length obeys the scaling law $R_g \sim L^{\nu}$ with the exponent $\nu = 0.197 \pm 0.019$ (see [32] for more details)

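$$r \sim \left(\frac{\gamma a^{6} L^{2}}{\kappa}\right)^{1/5}, \qquad R \sim \left(\frac{\kappa^{2} L}{\gamma^{2} a^{2}}\right)^{1/5}. \tag{3.4}$$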
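The scaling in (3.4) can be checked numerically. The following is a minimal Python sketch, assuming that (3.3) is taken literally with unit prefactors on both terms, that the volume constraint $2\pi^2 r^2 R = \pi a^2 L/4$ is used to eliminate $r$, and that the parameters quoted for Fig. 3.7 apply ($\gamma a^2/T = 4$, $l/a = 15$). Lengths are in units of $a$ and energies in units of $T$. The function names `torus_energy` and `optimal_torus` are illustrative and not taken from the original work.

```python
import math


def torus_energy(R, L, gamma=4.0, l=15.0, a=1.0, T=1.0):
    """Energy (3.3), U = gamma*S + kappa*L/R^2, with r fixed by the volume constraint."""
    kappa = T * l / 2.0                          # bending modulus, kappa = T*l/2
    r = a * math.sqrt(L / (8.0 * math.pi * R))   # from 2*pi^2*r^2*R = pi*a^2*L/4
    surface = 4.0 * math.pi**2 * r * R           # S = 4*pi^2*r*R
    return gamma * surface + kappa * L / R**2


def optimal_torus(L, gamma=4.0, l=15.0, a=1.0, T=1.0):
    """Minimize torus_energy over R by golden-section search; return (R, r)."""
    lo, hi = 1e-2 * a, L                         # generous bracket around the minimum
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        m1 = hi - phi * (hi - lo)
        m2 = lo + phi * (hi - lo)
        if torus_energy(m1, L, gamma, l, a, T) < torus_energy(m2, L, gamma, l, a, T):
            hi = m2
        else:
            lo = m1
    R = 0.5 * (lo + hi)
    r = a * math.sqrt(L / (8.0 * math.pi * R))
    return R, r


if __name__ == "__main__":
    previous = None
    for L in (500.0, 1000.0, 2000.0, 4000.0):
        R, r = optimal_torus(L)
        line = f"L/a = {L:6.0f}  R/a = {R:.3f}  r/a = {r:.3f}"
        if previous is not None:
            L0, R0, r0 = previous
            # Local log-log slopes, to be compared with the exponents 1/5 and 2/5 in (3.4).
            line += f"  slope_R = {math.log(R / R0) / math.log(L / L0):.3f}"
            line += f"  slope_r = {math.log(r / r0) / math.log(L / L0):.3f}"
        print(line)
        previous = (L, R, r)
```

Because the energy has a single minimum in $R$ (a growing surface term against a decaying bending term), a bracketing search suffices and no derivative is needed. The absolute values of $R$ and $r$ depend on the order-unity prefactors assumed here, so only the exponents should be compared with (3.4).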

The mean radius of the torus is rather insensitive to the chain length. Consequently, as the chain length increases, the thickness $r$ of the torus increases more rapidly than the mean radius $R$. Beyond the critical length $L^*$ (obtained from $R(L^*) = r(L^*)$), a hole is no longer formed, and a fat disk is formed instead. These predictions are in reasonable agreement with the results of Monte Carlo simulations (Fig. 3.8).

Folding Kinetics

It is interesting to ask about the kinetic aspect of the folding. How does a long fluctuating coil fold into an ordered torus structure when the solvent quality decreases? The discontinuous nature of the transition implies that the folding process would be similar to crystallization from a supersaturated solution, in which nucleation and growth are the typical kinetic processes. Figure 3.9 shows a typical example of the folding process obtained by Brownian dynamics simulations. A semiflexible chain is initially in a good solvent condition (leftmost snapshot). After the quench, the chain remains in a coil state for a while. During this metastable period, pairs of monomeric units stick to each other for a short time owing to the effective attractive interactions in the course of thermal fluctuation. However, such pairs soon break and separate. When a sufficiently large doughnut-shaped nucleus (critical nucleus) is formed at a certain


Fig. 3.9. Dynamical process of the folding of a semiflexible polymer with contour length $L/a = 512$ and Kuhn length $l/a \simeq 20$. (Top) Snapshots obtained through Brownian dynamics simulations, and (bottom) the fluorescence intensity profile of T4 DNA during the folding and corresponding schematic pictures; the panels are annotated with a time arrow, a 10 µm scale bar, and the intervals 6 sec, 1.5 sec, and 1.5 sec (see [26] and [34] for more details)

occasion, the remaining coil part is pulled into the nucleus in order, and finally the torus structure is formed. The critical nucleus is most likely to be created at a chain end, reflecting the large motional freedom there. The typical characteristic of torus formation is the almost constant speed of the growth process, reflecting the quasi-one-dimensional nature of the polymer chain.⁶ Note that not only tori but also rod-shaped products are frequently formed, although these rod structures have slightly higher energies than the torus and so are metastable under the conditions investigated here. Close inspection of the folding process indicates that the final structure is largely determined at the nucleation stage; that is, a rod-shaped nucleus forms more easily than a doughnut-shaped nucleus, resulting in a rather high probability of metastable rod formation. These results demonstrate the crucial importance of the pathway in the free energy landscape in semiflexible chain folding.

⁶ For a more rigorous argument, the finite-size effect in the torus state (surface, bending energies, etc.) and the dissipation involved in the process should be correctly characterized. It is worthwhile to point out the similarity between the growth process (sucking the coil part into the nucleus) and the dynamics of polymer translocation (sucking the coil part into the localized hole). For the latter, a lucid theoretical description has recently been proposed [35].


Fig. 3.10. Typical snapshot of the core-shell structure formed from a long semiflexible chain with $L/a = 2400$ and $l/a \simeq 20$ obtained through Monte Carlo simulations

Core-Shell Structure in Long Chains

So far, we have observed unique characteristics in the folding of semiflexible polymers, which are mostly associated with torus formation in the compact state. Although the chains studied were long enough to reveal the semiflexibility, the number $N = L/l$ of statistically independent segments was very small (on the order of 10). The torus is indeed the product of a chain of finite length, as discussed earlier, and several distinctive features are expected for the folding of longer semiflexible chains. A recent study has demonstrated that a long semiflexible polymer may assume a partially folded state, in which a dense core is surrounded by a disperse fringe, in the moderately poor solvent condition [36] (Fig. 3.10). Inside the core, the segment density is rather high, and there is weak orientational ordering. Upon further quenching, this core-shell structure is transformed into a more ordered, completely folded state, such as a torus or a disk structure. Therefore, long semiflexible polymers exhibit multiple-step folding transitions.

3.3.4 Instability Due to the Remanent Charge

In Sect. 3.3.3, we examined the impact of the chain stiffness on the folding transition. Several aspects of the DNA higher-order transition can be discussed in terms of the semiflexible chain model. However, experiments also reveal situations that do not seem to be explained by the stiffness effect alone. In this section, we deal with another important effect, which arises from the polyelectrolyte nature of DNA molecules. One of the central issues here is the origin of the attractive force between like-charged segments as the driving force of the folding. However, our stance here is to investigate the large-scale conformation of the polymer, given some effective interaction, as mentioned in Sect. 3.2.2.

The primary difference from the neutral chain case lies in the electrostatic self-energy of the structure due to the possible incomplete


charge compensation. This may have a crucial effect on the folding manner in both flexible and semiflexible polymers.

Rayleigh Instability

Given a constant volume $\sim R^3$, the shape with minimum surface area is a sphere of radius $\sim R$. Therefore, a liquid drop usually takes a spherical shape to minimize the surface energy, which is then $\sim \gamma R^2$. Now imagine that an electric charge $Q$ is accumulated in the droplet, which creates an electrostatic self-energy $\sim T l_B Q^2/(e^2 R)$.⁷ When the charge exceeds the critical value $Q_{\mathrm{cr}} \simeq e\,(\gamma R^3/(T l_B))^{1/2}$, at which the two energies become comparable, the spherical drop becomes locally unstable and spontaneously deforms. This is called the Rayleigh instability, and the equilibrium state is a set of smaller droplets, each carrying a charge lower than the critical value, which are infinitely separated from each other [37]. The same instability occurs for a charged globule made from flexible polyelectrolytes, but the final equilibrium state is now a set of smaller globules connected by narrow strings due to the connectivity of the chain. This pearl-necklace globule was first predicted based on a scaling argument [38] and was validated by subsequent extensive computer simulations [39, 40].

Rings-on-a-String Conformation in Semiflexible Polyelectrolytes

Let us start with a recent experimental observation [18, 19] summarized in Fig. 3.11. Here, T4 DNA molecules are folded by a gemini (dimeric) surfactant as a condensing agent. Fluorescence microscope (FM) observations show the appearance of a partially folded structure as a stable state in a certain range of surfactant concentration. This is an example of the stepwise folding transition through intrachain segregation, cf. Fig. 3.2 (bottom). Atomic force microscopy (AFM) has clearly revealed the fine structure, in which several tori are interconnected by strings; that is, a single DNA molecule takes a rings-on-a-string structure. How can we explain this phenomenon?
The preceding sections have identified several mechanisms that control the size and the morphology of folded polymers. Surface tension is always important and is responsible for the spherical morphology of the flexible polymer globule. The size of the globule is determined by the condition of mechanical equilibrium between the inside of the globule and the outer solution. For semiflexible polymers of moderate length, the bending stress favors the torus morphology, the size of which is determined by the balance between the surface and the bending energies. If flexible polymers are charged, the globule may split due to the Rayleigh instability, and the pearl-necklace conformation appears as a result of the

⁷ The length $l_B = e^2/(\varepsilon T)$ is called the Bjerrum length, which corresponds to the distance at which the electrostatic energy between two unit charges in a medium of (effective) dielectric constant $\varepsilon$ becomes equal to the thermal energy.


Fig. 3.11. Folding of T4 DNA by the addition of the gemini surfactant. Distributions of the long-axis length of T4 DNA at different concentrations $[c_s]$ of the surfactant. Coil, partially folded, and completely folded states are distinguished by different colorings. Also shown are FM and AFM images with the corresponding schematic representations of the partially folded state ($[c_s] = 0.2$ μM) and the completely folded state ($[c_s] = 1.0$ μM). The FM and AFM observations are of the same DNA molecules attached to a mica surface. A rings-on-a-string structure is clearly seen for the partially folded DNA, while the completely folded DNA assumes a network structure composed of many fused rings (see [19] for more details)

competition between the surface and the electrostatic energies. The natural question, then, is what to expect for the folding transition of semiflexible polyelectrolytes. The rings-on-a-string structure is characterized by the coexistence of ordered domains (tori) and disordered domains (coil), and is thus regarded as microphase segregation within a single chain. Since the generation of an ordered folded structure from a semiflexible chain can be considered a kind of crystal growth (Sect. 3.3.3), the appearance of such intrachain segregated structures is somewhat counterintuitive. In the simulation of the folding of a single semiflexible chain, in which the process of torus nucleation and growth is clearly observed, a partially folded structure with a growing torus is only transient and is never stable [26]. One may naively suppose that this phenomenon is caused by the Rayleigh instability, that is, that a single torus may split upon the accumulation of charge.


However, it is not immediately obvious that this mechanism is responsible for the rings-on-a-string structures observed for DNA in solution with a moderate concentration of monovalent salt. In fact, a simple energetic consideration suggests the following unique characteristic of the charged torus [32]. At a given segment density, a torus is characterized by two radii of curvature, the ring radius $R$ and the ring thickness $r$, and therefore possesses a greater degree of freedom than a spherical globule, which is characterized solely by its radius or, equivalently, by the number of segments inside the globule. This additional freedom provides an escape pathway that allows the torus to grow without accumulating electrostatic self-energy; that is, unlike a spherical globule, a torus does not necessarily split upon charging. In other words, the electrostatic self-energy limits the ring thickness, but not the ring radius. Thus, the ground state of the charged torus is a thin ring, the radius of which increases rapidly with the chain length $L$ (Fig. 3.7).

Let us briefly discuss a possible alternative scenario, which has been proposed based on the unique characteristics of the charged torus and the crucial role of the combinatorial entropy of the distribution of segment states along the chain [41]. The free energy $F(N)$ of a folded polymer with $N$ segments is generally written in the following form:

$$F(N) = F_b(N) + \Delta F(N), \tag{3.5}$$

where the first term $F_b(N) \sim N$ is the bulk term and the second term represents the nonextensive part. For a globule of neutral flexible polymers, the latter comes from the surface energy, $\Delta F(N) \sim N^{2/3}$, and for a neutral torus formed by semiflexible polymers, the minimization of (3.3) leads to $\Delta F(N) \sim N^{3/5}$. Therefore, splitting into two parts is forbidden by the high energetic penalty: $F(N) < F(N_1) + F(N_2)$ (with $N = N_1 + N_2$).⁸ On the other hand, if the residual charge inside the torus limits its thickness, splitting does not alter the total volume and the surface area of the object. Thus, the only contribution to the nonextensive part of the free energy arises from the bending energy, which can be evaluated as $\Delta F(N) \sim N^{-1}$ from (3.3). The energetic cost of splitting is then very low, in particular for a long chain; therefore, a multiple-tori structure (Fig. 3.11 (right)) may appear as an entropically stabilized state, reflecting the increase in the number of possible states.⁹ A simple model calculation of the folding transition along the lines of the above analysis has demonstrated that the degree of the remanent charge inside the folded part is a crucial factor for the manner of the transition (Fig. 3.12). If the folded part is completely neutralized by oppositely charged low-molecular-weight species, then the scenario developed for neutral semiflexible polymers can be applied.

⁸ There is an additional penalty associated with the "boundary" between two parts, which may be regarded as a defect.

⁹ In "low temperature" states like this, a kinetic effect would also be important for the generation of multiple-tori structures. (See [32] for more details.)
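The energetic argument around (3.5) can be illustrated with a few lines of Python. This is a sketch under stated assumptions: all prefactors are set to one, the chain is split into two parts, and the bulk term $F_b \sim N$ is taken to cancel exactly because $N = N_1 + N_2$. The function name `splitting_penalty` is illustrative and does not appear in the original work.

```python
def splitting_penalty(N, exponent, fraction=0.5):
    """Nonextensive cost Delta F(N1) + Delta F(N2) - Delta F(N) for Delta F(N) = N**exponent.

    The bulk term F_b ~ N cancels because N1 + N2 = N; prefactors are set to one.
    """
    N1 = fraction * N
    N2 = N - N1
    return N1**exponent + N2**exponent - N**exponent


# Exponents of the nonextensive part quoted in the text.
cases = {
    "neutral flexible globule (2/3)": 2.0 / 3.0,
    "neutral torus (3/5)": 3.0 / 5.0,
    "thickness-limited charged torus (-1)": -1.0,
}

if __name__ == "__main__":
    for N in (100, 1000, 10000):
        for label, exponent in cases.items():
            penalty = splitting_penalty(N, exponent)
            print(f"N = {N:6d}  {label:38s} penalty = {penalty:.4g}")
```

With these assumptions the penalty grows with $N$ for the two neutral cases and decays as $1/N$ for the thickness-limited torus, which is the sense in which splitting becomes nearly free for long chains.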

Fig. 3.12. Diagrams of the folding transition of semiflexible polyelectrolytes ($l/a = 20$) upon addition of the condensing agent, in the plane of the concentration of the condensing agent $c_g$ (vertical axes, labeled $c_g$ ($\times 10^{-3}$) and $c_g$ ($\times 10^{-2}$)) and the segment number $N = L/l$ (horizontal axes, 0 to 1200). The regions are labeled coil, rings-on-a-string, and fully folded. (Left) An all-or-none transition from coil to fully folded torus is observed for the case of almost complete charge neutralization (degree of the remanent charge $\alpha = 10^{-4}$). (Right) Rings-on-a-string structures emerge in the folding of long chains due to the presence of the remanent charge ($\alpha = 0.2$) (see [41] for more details)

On the other hand, the presence of the remanent charge may have a qualitative effect. At the onset of folding, the chain may fold discontinuously into the rings-on-a-string structure. As can be easily guessed from the earlier discussion, this structure is stabilized by the large number of ways in which tori and coils can be arranged along the chain. Reflecting the finite number of degrees of freedom of the system, structures with different numbers of rings coexist in the intrachain segregated state. As the solvent quality decreases, the probability distribution changes, and finally the completely folded state composed of many fused mini rings is reached. The essential requisite for the present scenario is the unique property of the charged torus, that is, its instability against thickening beyond a certain size. Therefore, its applicability is not limited to the case in which the torus thickness is limited by the electrostatic mechanism. For example, we expect that surfactant molecules, which are sometimes used as condensing agents, may affect this structural property through the packing inside folded structures. Note also that finite-sized bundles are rather ubiquitous in other semiflexible polyelectrolyte systems and biopolymer solutions. Exploring the consequences of this remains an open problem.
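To give a feeling for the "large number of ways" in which tori and coils can be arranged along the chain, the following Python sketch counts arrangements in a deliberately simplified toy model. The model is an assumption made purely for illustration and is not the calculation of [41]: the chain is treated as a line of `n_segments` sites, every ring occupies the same number `ring_size` of consecutive sites, and rings may not overlap. All names are hypothetical.

```python
import math


def arrangements(n_segments, n_rings, ring_size):
    """Number of ways to place n_rings non-overlapping rings of ring_size sites on a linear chain."""
    free = n_segments - n_rings * ring_size      # sites left in the coil state
    if free < 0:
        return 0                                 # the rings do not fit on the chain
    # Arrange n_rings blocks among (free + n_rings) objects.
    return math.comb(free + n_rings, n_rings)


def configurational_entropy(n_segments, n_rings, ring_size):
    """Entropy ln W of the ring/coil arrangement (Boltzmann constant implicit)."""
    count = arrangements(n_segments, n_rings, ring_size)
    if count == 0:
        raise ValueError("no arrangement exists for these parameters")
    return math.log(count)


if __name__ == "__main__":
    # Toy numbers only: a chain of 1200 sites and rings of 100 sites each.
    for n_rings in range(0, 6):
        s = configurational_entropy(1200, n_rings, 100)
        print(f"rings = {n_rings}  ln W = {s:.2f}")
```

In this toy counting the entropy grows with the number of rings as long as a substantial part of the chain remains in the coil state, which is the qualitative ingredient that competes with the small energetic cost of splitting discussed above.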

3.4 Summary and Perspectives

Reliable control of the higher-order structures of DNA is required for various problems, ranging from the biological sciences and nanoscience to medical applications. We have seen that, on the mesoscopic length scale, a DNA


molecule can be reasonably well described by a simple polymer model with uniform physical properties. It has become evident that this simple polymer is capable of exhibiting much richer conformational transitions than naively expected. One striking example is the phenomenon of intrachain segregation. We have discussed one of the possible scenarios in Sect. 3.3.4. However, other scenarios are also conceivable under different experimental conditions. A related topic is the appearance of the core-shell structure in long semiflexible polymers discussed in Sect. 3.3.3, which requires further investigation. There are many open questions and various future directions, both fundamental and application-oriented. We close this chapter by adding two comments, which we consider fundamental from the biological point of view.

3.4.1 Higher-order Structure and Genetic Activity

One of the most interesting questions is the relationship with genetic activity. Although it is known that the part of chromatin in the genetically active state is somewhat relaxed, this phenomenon has not been discussed from the viewpoint of the material properties inherent in long DNA chains. A recent in vitro study reported that the transcriptional activity of long DNA molecules (40 kbp containing one gene) can be abruptly switched off at a critical concentration of added condensing agents [42]. Importantly, this inhibition is shown to be directly correlated with the all-or-none discontinuous folding transition [43]. On the other hand, under the same conditions, a system composed of short DNA fragments on the order of the persistence length does not show such on/off switching of the transcriptional activity. Here, it should be noted that fragment DNA molecules are commonly used in biochemical and molecular biology experiments because long DNA molecules are difficult to handle, so that the correlation between the higher-order structure of DNA and its function is missed.

What can be expected for longer DNA molecules with multiple genes? The typical domain size involved in the intrachain segregation is on the order of several dozen kilobase pairs. This implies that several dozen genes can be simultaneously switched off by the formation of one segregated domain through partial folding. Such a higher-order transition can be induced by a slight change in the environmental conditions and may provide global control of the accessibility of regulatory proteins [44].

3.4.2 Toward Chromatin Structure

In eukaryotic cells, high compaction of genomic DNA is achieved by complexation with cationic structural proteins called histones [45]. DNA first wraps around the histones to form a basic unit known as the nucleosome, which is


organized into the hierarchical chromatin structure. The structure and function of chromatin have been actively studied at the molecular level. For example, chemical modifications such as acetylation and methylation of the histone tail are known to greatly affect chromatin activity. However, the higher-order structure has not yet been clarified. Although it is widely recognized that chromatin functions by utilizing various specific mechanisms, here again one should also acknowledge the important impact of general (most importantly electrostatic) interactions. In this direction, let us introduce a recent study that focuses on a simple model system composed of T4 DNA and cationic nanosized particles instead of histones [46] (Fig. 3.13). The constructed chromatin analogue does not involve specific DNA–histone interactions and is therefore governed by general interactions only, the properties of which can be controlled with comparative ease. The result shows that the global structure is indeed controlled by apparent physical parameters, such as the size and/or charge and the concentration of the nanoparticles, and the ambient salt concentration. In particular, under suitable conditions with the regular wrapping mode, structures reminiscent of real chromatin are obtained. The correlation between the higher-order structures and the transcriptional

Fig. 3.13. (Top) An electron micrograph of an artificial chromatin model composed of T4 DNA and cationic nanoparticles of diameter 15 nm. (Bottom) Typical snapshots of a model DNA (semiflexible polyelectrolyte) complexed with cationic nanoparticles. At low salt concentration (Debye screening length $r_D/a = 1$), a beads-on-a-string nucleosome-like structure is observed (left), while locally segregated clusters are formed at higher salt concentrations ($r_D/a = 0.3$) (right) (see [46] for more details)


activity has also been examined, providing useful insight for gene delivery applications as well as for the function of real chromatin [47]. It is likely that cells, in the course of their functioning, utilize the physico-chemical properties inherent in genomic DNA molecules, which are highly charged, locally stiff, and very long. The higher-order structure and its relation to the function of DNA and chromatin remain to be unveiled.

References

1. B. Alberts et al., Molecular Biology of the Cell, 3rd edn. (Garland, New York, 1994)
2. A. Arkin, J. Ross, H.H. McAdams, Genetics 149, 1633 (1998)
3. C.V. Rao, D.M. Wolf, A.P. Arkin, Nature (London) 420, 231 (2002); 421, 190E (2003)
4. J.M. Raser, E.K. O'Shea, Science 309, 2010 (2005)
5. A.A. Kornyshev, D.J. Lee, S. Leikin, A. Wynveen, Rev. Mod. Phys. 79, 943 (2007)
6. V.A. Bloomfield, Biopolymers 31, 1471 (1991)
7. V.A. Bloomfield, Curr. Opin. Struct. Biol. 6, 334 (1996)
8. J. Widom, R.L. Baldwin, Biopolymers 22, 1595 (1983)
9. W.M. Gelbart, R.F. Bruinsma, P.A. Pincus, V.A. Parsegian, Phys. Today 53, 38 (2000)
10. K. Yoshikawa, M. Takahashi, V.V. Vasilevskaya, A.R. Khokhlov, Phys. Rev. Lett. 76, 3029 (1996)
11. M. Takahashi, K. Yoshikawa, V.V. Vasilevskaya, A.R. Khokhlov, J. Phys. Chem. B 101, 9396 (1997)
12. K. Yoshikawa, Y. Yoshikawa, in Pharmaceutical Perspectives of Nucleic Acid-Based Therapeutics, ed. by R.I. Mahato, S.W. Kim (Taylor & Francis, London, 2002)
13. S.G. Starodubsev, K. Yoshikawa, J. Phys. Chem. 100, 19702 (1996)
14. M. Ueda, K. Yoshikawa, Phys. Rev. Lett. 77, 2133 (1996)
15. K. Yoshikawa, Y. Yoshikawa, Y. Koyama, T. Kanbe, J. Am. Chem. Soc. 119, 6473 (1997)
16. S. Takagi, K. Tsumoto, K. Yoshikawa, J. Chem. Phys. 114, 6942 (2001)
17. Y. Yoshikawa, Yu.S. Velichko, Y. Ichiba, K. Yoshikawa, Eur. J. Biochem. 268, 2593 (2001)
18. A.A. Zinchenko, V.G. Sergeyev, S. Murata, K. Yoshikawa, J. Am. Chem. Soc. 125, 4414 (2003)
19. N. Miyazawa, T. Sakaue, K. Yoshikawa, R. Zana, J. Chem. Phys. 122, 044902 (2005)
20. A.Yu. Grosberg, A.R. Khokhlov, Statistical Physics of Macromolecules (American Institute of Physics, New York, 1994)
21. P.-G. de Gennes, Scaling Concepts in Polymer Physics (Cornell University Press, Ithaca, 1979)
22. G. Swislow, S. Sun, I. Nishio, T. Tanaka, Phys. Rev. Lett. 44, 796 (1980)
23. A.Yu. Grosberg, A.R. Khokhlov, Adv. Polym. Sci. 41, 53 (1981)
24. H. Noguchi, K. Yoshikawa, J. Chem. Phys. 109, 5070 (1998)

3 Transition in the Higher-order Structure of DNA


25. V.A. Ivanov, W. Paul, K. Binder, J. Chem. Phys. 109, 5659 (1998)
26. T. Sakaue, K. Yoshikawa, J. Chem. Phys. 117, 6323 (2002)
27. M.R. Stukan, V.A. Ivanov, A.Yu. Grosberg, W. Paul, K. Binder, J. Chem. Phys. 118, 3392 (2003)
28. A.Yu. Grosberg, Biofizika (USSR) 24, 32 (1979)
29. A.Yu. Grosberg, A.V. Zhestkov, J. Biomol. Struct. Dyn. 3, 859 (1986)
30. J. Ubbink, T. Odijk, Europhys. Lett. 33, 353 (1996)
31. V.V. Vasilevskaya, A.R. Khokhlov, S. Kidoaki, K. Yoshikawa, Biopolymers 41, 51 (1997)
32. T. Sakaue, J. Chem. Phys. 120, 6299 (2004)
33. N.V. Hud, K.H. Downing, R. Balhorn, Proc. Natl. Acad. Sci. USA 92, 3581 (1995)
34. Y. Matsuzawa, Y. Yonezawa, K. Yoshikawa, Biochem. Biophys. Res. Commun. 225, 796 (1996)
35. T. Sakaue, Phys. Rev. E 76, 021803 (2007)
36. Y. Higuchi, T. Sakaue, K. Yoshikawa, Chem. Phys. Lett. 461, 42 (2008)
37. L. Rayleigh, Philos. Mag. 14, 184 (1882)
38. A.V. Dobrynin, M. Rubinstein, S.P. Obukhov, Macromolecules 29, 2974 (1996)
39. A.V. Lyulin, B. Dünweg, O.V. Borisov, A.A. Darinskii, Macromolecules 32, 3264 (1999)
40. H.J. Limbach, C. Holm, K. Kremer, Europhys. Lett. 49, 189 (2000)
41. T. Sakaue, K. Yoshikawa, J. Chem. Phys. 125, 074904 (2006)
42. K. Tsumoto, L. François, K. Yoshikawa, Biophys. Chem. 106, 23 (2003)
43. A. Yamada, K. Kubo, T. Nakai, K. Tsumoto, K. Yoshikawa, Appl. Phys. Lett. 86, 223901 (2005)
44. K. Yoshikawa, J. Biol. Phys. 28, 701 (2002)
45. A.P. Wolffe, Chromatin: Structure and Function (Academic Press, New York, 1998)
46. A.A. Zinchenko, T. Sakaue, S. Araki, K. Yoshikawa, D. Baigl, J. Phys. Chem. B 111, 3019 (2007)
47. A.A. Zinchenko, L. François, K. Yoshikawa, Biophys. J. 92, 1318 (2007)

4 Generalized-Ensemble Algorithms for Studying Protein Folding

Y. Okamoto

Abstract. Conventional simulations of biomolecular systems tend to get trapped in states of local-minimum energy. A simulation in a generalized ensemble overcomes this difficulty by performing a random walk in potential energy space and in other parameter spaces. From a single simulation run, one can obtain accurate canonical-ensemble averages of physical quantities as functions of temperature and other parameters of the system by the single-histogram and/or multiple-histogram reweighting techniques. In this article, we review the generalized-ensemble algorithms. Two well-known methods, namely, the multicanonical algorithm and the replica-exchange method, are described first. Both Monte Carlo and molecular dynamics versions of the algorithms are given. We then present further extensions of these two methods.

4.1 Introduction

Canonical fixed-temperature simulations of complex systems such as biomolecules are greatly hampered by the multiple-minima problem. Because simulations at low temperatures tend to get trapped in a few of the huge number of local-minimum-energy states, which are separated by high energy barriers, it is very difficult to obtain accurate canonical distributions at low temperatures by conventional Monte Carlo (MC) and molecular dynamics (MD) methods. One way to overcome this multiple-minima problem is to perform a simulation in a generalized ensemble, where each state is weighted by an artificial, non-Boltzmann probability weight factor so that a random walk in potential energy space may be realized. This class of simulation methods is referred to as the generalized-ensemble algorithms (for reviews see, e.g., [1–7]). The random walk allows the simulation to cross any energy barrier and to sample a much wider conformational space than conventional methods do. By monitoring the energy in a single simulation run, one can obtain not only the global-minimum-energy state but also canonical-ensemble averages as functions of temperature by the single-histogram [8] or multiple-histogram [9, 10] reweighting techniques (an extension of the multiple-histogram method is also referred to as the weighted histogram analysis method (WHAM) [10]).


One of the best-known generalized-ensemble methods is perhaps the multicanonical algorithm (MUCA) [11, 12] (for reviews see, e.g., [13, 14]). (The method is also referred to as entropic sampling [15] and adaptive umbrella sampling [16] of the potential energy [17]. MUCA can also be considered as a sophisticated, ideal realization of a class of algorithms called umbrella sampling [18]. Closely related methods are the transition matrix methods reviewed in [19] and the Wang-Landau method [20, 21], which is also referred to as density of states Monte Carlo [22]. See also [23].) MUCA and its generalizations have been applied to spin systems (see, e.g., [24–29]). MUCA was also introduced to the molecular simulation field [30]. Since then, MUCA and its generalizations have been extensively used in many applications in protein and related systems [31–65]. A molecular dynamics version of MUCA has also been developed [17, 38, 42] (see also [38, 66] for a Langevin dynamics version). MUCA has been extended so that flat distributions in parameters other than the potential energy may be obtained (see, e.g., [25, 26, 37, 43, 45, 60, 64]). This can be considered as a special case of the multidimensional (or multivariable) extensions of MUCA, where a multidimensional random walk in potential energy space and in other parameter spaces is realized (see, e.g., [37, 43, 44, 62, 65]). In this article, we present just one of these methods, namely, the multibaric-multithermal algorithm, where a two-dimensional random walk in both potential energy space and volume space is realized [62, 63]. The multicanonical algorithms are powerful, but the probability weight factors are not a priori known and have to be determined by iterations of short trial simulations. This process can be nontrivial and very tedious for complex systems with many degrees of freedom. In the replica-exchange method (REM) [67–69], the difficulty of weight factor determination is greatly alleviated.
(A closely related method was independently developed in [70]. Similar methods in which the same equations are used but emphasis is laid on optimizations have been developed [71, 72]. REM is also referred to as the multiple Markov chain method [73] and parallel tempering [74]. Details of the literature about REM and related algorithms can be found in recent reviews [2, 75].) In this method, a number of noninteracting copies (or replicas) of the original system at different temperatures are simulated independently and simultaneously by the conventional MC or MD method. Every few steps, pairs of replicas are exchanged with a specified transition probability. The weight factor is just the product of Boltzmann factors, and so it is essentially known. REM has already been used in many applications in protein systems [76–91]. Other molecular simulation fields have also been studied by this method in various ensembles [92–96]. Moreover, REM was applied to cluster studies in the field of quantum chemistry [97]. The details of the molecular dynamics algorithm have been worked out for REM in [77]. This led to a wide application of REM in protein folding and related problems (see, e.g., [98–115]). However, REM also has a computational difficulty: As the number of degrees of freedom of the system increases, the required number of replicas also


greatly increases, whereas only a single replica is simulated in MUCA. This demands a lot of computer power for complex systems. Our solution to this problem is to use REM to determine the weight factor of MUCA, which is much simpler than previous iterative methods of weight determination, and then to perform a long MUCA production run. The method is referred to as the replica-exchange multicanonical algorithm (REMUCA) [82, 87, 88]. In REMUCA, a short replica-exchange simulation is performed, and the multicanonical weight factor is determined by the multiple-histogram reweighting techniques [9, 10]. Finally, one is naturally led to a multidimensional (or multivariable) extension of REM, which we refer to as the multidimensional replica-exchange method (MREM) [80]. (The method is also referred to as generalized parallel sampling [116], Hamiltonian replica-exchange method [86], and Model Hopping [117].) A special realization of MREM is replica-exchange umbrella sampling (REUS) [80], and it is particularly useful in free energy calculations (see also [81] for a similar idea). In this article, we present just one of these methods, namely, the replica-exchange method in the isobaric-isothermal ensemble, where not only temperature values but also pressure values are exchanged in the replica-exchange processes [3, 94, 96, 104, 105]. (The results of the first such application of two-dimensional replica-exchange simulations in the isobaric-isothermal ensemble were presented in [3].) This approach is complementary to the multibaric-multithermal algorithm above. In this article, we describe the generalized-ensemble algorithms mentioned earlier. Namely, we first review the two familiar methods: MUCA and REM. We then describe multidimensional extensions of these methods. Examples of the results obtained by some of these algorithms are then presented.

4.2 Generalized-Ensemble Algorithms

4.2.1 Multicanonical Algorithm

Let us consider a system of $N$ atoms of mass $m_k$ ($k = 1, \ldots, N$) with their coordinate vectors and momentum vectors denoted by $q \equiv \{\boldsymbol{q}_1, \ldots, \boldsymbol{q}_N\}$ and $p \equiv \{\boldsymbol{p}_1, \ldots, \boldsymbol{p}_N\}$, respectively. The Hamiltonian $H(q, p)$ of the system is the sum of the kinetic energy $K(p)$ and the potential energy $E(q)$:

$$H(q, p) = K(p) + E(q), \tag{4.1}$$

where

$$K(p) = \sum_{k=1}^{N} \frac{\boldsymbol{p}_k^2}{2 m_k}. \tag{4.2}$$

In the canonical ensemble at temperature $T$, each state $x \equiv (q, p)$ with the Hamiltonian $H(q, p)$ is weighted by the Boltzmann factor

$$W_{\rm B}(x; T) = \exp\left(-\beta H(q, p)\right), \tag{4.3}$$

where the inverse temperature $\beta$ is defined by $\beta = 1/k_{\rm B}T$ ($k_{\rm B}$ is the Boltzmann constant). The average kinetic energy at temperature $T$ is then given by

$$\langle K(p) \rangle_T = \left\langle \sum_{k=1}^{N} \frac{\boldsymbol{p}_k^2}{2 m_k} \right\rangle_T = \frac{3}{2} N k_{\rm B} T. \tag{4.4}$$

Because the coordinates $q$ and momenta $p$ are decoupled in (4.1), we can suppress the kinetic energy part and write the Boltzmann factor as

$$W_{\rm B}(x; T) = W_{\rm B}(E; T) = \exp(-\beta E). \tag{4.5}$$

The canonical probability distribution of potential energy $P_{NVT}(E; T)$ is then given by the product of the density of states $n(E)$ and the Boltzmann weight factor $W_{\rm B}(E; T)$:

$$P_{NVT}(E; T) \propto n(E)\, W_{\rm B}(E; T). \tag{4.6}$$

Because $n(E)$ is a rapidly increasing function and the Boltzmann factor decreases exponentially, the canonical ensemble yields a bell-shaped distribution, which has a maximum around the average energy at temperature $T$. The conventional MC or MD simulations at constant temperature are expected to yield $P_{NVT}(E; T)$. An MC simulation based on the Metropolis algorithm [118] is performed with the following transition probability from a state $x$ of potential energy $E$ to a state $x'$ of potential energy $E'$:

$$w(x \to x') = \min\left(1, \frac{W_{\rm B}(E'; T)}{W_{\rm B}(E; T)}\right) = \min\left(1, \exp(-\beta \Delta E)\right), \tag{4.7}$$

where

$$\Delta E = E' - E. \tag{4.8}$$
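The acceptance rule in (4.7) and (4.8) can be sketched in a few lines. The following is a minimal illustration, not the chapter's code: the one-dimensional harmonic potential $E(q) = q^2/2$, the uniform trial move, and the names `metropolis_accept` and `sample_harmonic` are assumptions made for the example, and energies are measured in units where $k_{\rm B} = 1$.

```python
import math
import random


def metropolis_accept(delta_e, beta, rng=random):
    """Accept a move with energy change delta_e = E' - E using min(1, exp(-beta * delta_e))."""
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-beta * delta_e)


def sample_harmonic(beta, n_steps=200_000, step=1.0, seed=0):
    """Metropolis sampling of the toy potential E(q) = q**2 / 2; returns the mean potential energy."""
    rng = random.Random(seed)
    q, e = 0.0, 0.0
    total = 0.0
    for _ in range(n_steps):
        q_new = q + rng.uniform(-step, step)  # symmetric trial move
        e_new = 0.5 * q_new * q_new
        if metropolis_accept(e_new - e, beta, rng):
            q, e = q_new, e_new
        total += e  # rejected moves count the old state again
    return total / n_steps
```

For this toy potential the canonical average of the potential energy is $1/(2\beta)$, which gives a simple check that the sampled distribution is the Boltzmann one.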

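An MD simulation, on the other hand, is based on the following Newton equations of motion:

$$\dot{\boldsymbol{q}}_k = \frac{\boldsymbol{p}_k}{m_k}, \tag{4.9}$$

$$\dot{\boldsymbol{p}}_k = -\frac{\partial E}{\partial \boldsymbol{q}_k} = \boldsymbol{f}_k, \tag{4.10}$$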

where $\boldsymbol{f}_k$ is the force acting on the $k$th atom ($k = 1, \ldots, N$). This set of equations actually yields the microcanonical ensemble, and we have to add a thermostat to obtain the canonical ensemble at temperature $T$. Here, we just follow Nosé's prescription [119, 120], and we have

$$\dot{\boldsymbol{q}}_k = \frac{\boldsymbol{p}_k}{m_k}, \tag{4.11}$$

$$\dot{\boldsymbol{p}}_k = -\frac{\partial E}{\partial \boldsymbol{q}_k} - \frac{\dot{s}}{s}\, \boldsymbol{p}_k = \boldsymbol{f}_k - \frac{\dot{s}}{s}\, \boldsymbol{p}_k, \tag{4.12}$$

$$\dot{s} = s\, \frac{P_s}{Q}, \tag{4.13}$$

$$\dot{P}_s = \sum_{k=1}^{N} \frac{\boldsymbol{p}_k^2}{m_k} - 3 N k_{\rm B} T = 3 N k_{\rm B} \left(T(t) - T\right), \tag{4.14}$$

where $s$ is Nosé's scaling parameter, $P_s$ is its conjugate momentum, $Q$ is its mass, and the "instantaneous temperature" $T(t)$ is defined by

$$T(t) = \frac{1}{3 N k_{\rm B}} \sum_{k=1}^{N} \frac{\boldsymbol{p}_k(t)^2}{m_k}. \tag{4.15}$$

However, in practice, it is very difficult to obtain accurate canonical distributions of complex systems at low temperatures by conventional MC or MD simulation methods. This is because simulations at low temperatures tend to get trapped in one or a few of the local-minimum-energy states. In the multicanonical ensemble [11, 12], on the other hand, each state is weighted by a non-Boltzmann weight factor $W_{\rm mu}(E)$ (which we refer to as the multicanonical weight factor), so that a uniform potential energy distribution $P_{\rm mu}(E)$ is obtained:

$$P_{\rm mu}(E) \propto n(E)\, W_{\rm mu}(E) \equiv \text{constant}. \tag{4.16}$$

The flat distribution implies that a free one-dimensional random walk in the potential energy space is realized in this ensemble. This allows the simulation to escape from any local-minimum-energy state and to sample the configurational space much more widely than the conventional canonical MC or MD methods. The definition in (4.16) implies that the multicanonical weight factor is inversely proportional to the density of states, and we can write it as follows:

$$W_{\rm mu}(E) \equiv \exp\left[-\beta_0 E_{\rm mu}(E; T_0)\right] = \frac{1}{n(E)}, \tag{4.17}$$

where we have chosen an arbitrary reference temperature, $T_0 = 1/k_{\rm B}\beta_0$, and the "multicanonical potential energy" is defined by

$$E_{\rm mu}(E; T_0) \equiv k_{\rm B} T_0 \ln n(E) = T_0 S(E). \tag{4.18}$$

Here, $S(E)$ is the entropy in the microcanonical ensemble. Since the density of states of the system is usually unknown, the multicanonical weight factor has to be determined numerically by iterations of short preliminary runs [11, 12]. A multicanonical MC simulation is performed, for instance, with the usual Metropolis criterion [118]: The transition probability of state $x$ with potential energy $E$ to state $x'$ with potential energy $E'$ is given by

$$w(x \to x') = \min\left(1, \frac{W_{\rm mu}(E')}{W_{\rm mu}(E)}\right) = \min\left(1, \frac{n(E)}{n(E')}\right) = \min\left(1, \exp(-\beta_0 \Delta E_{\rm mu})\right), \tag{4.19}$$

where

$$\Delta E_{\rm mu} = E_{\rm mu}(E'; T_0) - E_{\rm mu}(E; T_0). \tag{4.20}$$

The MD algorithm in the multicanonical ensemble also naturally follows from (4.17), in which the regular constant temperature MD simulation (with $T = T_0$) is performed by replacing $E$ by $E_{\rm mu}$ in (4.12) [38, 42]:

$$\dot{\boldsymbol{p}}_k = -\frac{\partial E_{\rm mu}(E; T_0)}{\partial \boldsymbol{q}_k} - \frac{\dot{s}}{s}\, \boldsymbol{p}_k = \frac{\partial E_{\rm mu}(E; T_0)}{\partial E}\, \boldsymbol{f}_k - \frac{\dot{s}}{s}\, \boldsymbol{p}_k. \tag{4.21}$$
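The Metropolis rule (4.19) can be tried out on a toy model whose density of states is known exactly. The sketch below is an illustration under stated assumptions rather than the procedure used in the cited works: it takes $N$ independent two-state units with $E$ equal to the number of excited units, so that $n(E)$ is the binomial coefficient $\binom{N}{E}$, and the function name `multicanonical_spin_walk` is hypothetical. In a real application $\ln n(E)$ is not known in advance and has to be estimated iteratively.

```python
import math
import random


def multicanonical_spin_walk(n_spins=10, n_steps=300_000, seed=1):
    """Single-flip Metropolis walk with weight W_mu(E) = 1 / n(E); returns the energy histogram."""
    rng = random.Random(seed)
    # ln n(E) for E = 0..n_spins; this plays the role of beta_0 * E_mu(E; T_0) in (4.17).
    ln_n = [math.log(math.comb(n_spins, e)) for e in range(n_spins + 1)]
    spins = [0] * n_spins
    energy = 0
    hist = [0] * (n_spins + 1)
    for _ in range(n_steps):
        k = rng.randrange(n_spins)
        e_new = energy + (1 - 2 * spins[k])  # flipping 0 -> 1 raises E by one, 1 -> 0 lowers it
        delta_mu = ln_n[e_new] - ln_n[energy]  # beta_0 * Delta E_mu of (4.20)
        if delta_mu <= 0.0 or rng.random() < math.exp(-delta_mu):
            spins[k] ^= 1
            energy = e_new
        hist[energy] += 1
    return hist
```

With the exact weight the histogram comes out approximately flat over all eleven energy levels, whereas an unweighted walk on the same model would visit $E = 5$ about 252 times as often as $E = 0$.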

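If the exact multicanonical weight factor $W_{\rm mu}(E)$ is known, one can calculate the ensemble averages of any physical quantity $A$ at any temperature $T$ ($= 1/k_{\rm B}\beta$) as follows:

$$\langle A \rangle_T = \frac{\displaystyle\sum_E A(E)\, P_{NVT}(E; T)}{\displaystyle\sum_E P_{NVT}(E; T)} = \frac{\displaystyle\sum_E A(E)\, n(E) \exp(-\beta E)}{\displaystyle\sum_E n(E) \exp(-\beta E)}, \tag{4.22}$$

where the density of states is given by (see (4.17))

$$n(E) = \frac{1}{W_{\rm mu}(E)}. \tag{4.23}$$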

The summation instead of integration is used in (4.22), because we often discretize the potential energy $E$ with a finite step size ($E = E_i$; $i = 1, 2, \ldots$). Here, the explicit form of the physical quantity $A$ should be known as a function of potential energy $E$. For instance, $A(E) = E$ gives the average potential energy $\langle E \rangle_T$ as a function of temperature, and $A(E) = \beta^2 (E - \langle E \rangle_T)^2$ gives the specific heat. In general, the multicanonical weight factor $W_{\rm mu}(E)$, or the density of states $n(E)$, is not a priori known, and one needs its estimator for a numerical simulation. This estimator is usually obtained from iterations of short trial multicanonical simulations. The details of this process are described, for instance, in [24, 33]. However, the iterative process can be nontrivial and very tedious for complex systems. In practice, it is impossible to obtain the ideal multicanonical weight factor with a completely uniform potential energy distribution. The question is when to stop the iteration for the determination of the weight factor. Our criterion for a satisfactory weight factor is that, as long as we do get a random walk in potential energy space, the probability distribution $P_{\rm mu}(E)$ does not have to be completely flat; a deviation of, say, an order of magnitude is tolerated. In such a case, we usually perform with this weight factor a multicanonical simulation with high statistics (production run) to get an even better estimate of the density of states. Let $N_{\rm mu}(E)$ be the histogram of the potential energy distribution $P_{\rm mu}(E)$ obtained by this production run. The best estimate of the density of states can then be given by the single-histogram reweighting techniques [8] as follows (see the proportionality relation in (4.16)):

$$n(E) = \frac{N_{\rm mu}(E)}{W_{\rm mu}(E)}. \tag{4.24}$$
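As a small worked sketch of how (4.24) feeds into (4.22), the code below builds $\ln n(E)$ from a production histogram and a logarithmic weight factor, then evaluates $\langle E \rangle_T$ and $\beta^2 (\langle E^2 \rangle_T - \langle E \rangle_T^2)$ on a discrete energy grid. The function names and the choice to work with logarithms (to avoid overflow when $n(E)$ spans many orders of magnitude) are assumptions of this example.

```python
import math


def ln_density_of_states(histogram, ln_w_mu):
    """(4.24) in logarithmic form: ln n(E) = ln N_mu(E) - ln W_mu(E).

    Assumes every energy bin was visited (no zero counts).
    """
    return [math.log(h) - lw for h, lw in zip(histogram, ln_w_mu)]


def canonical_averages(energies, ln_n, beta):
    """Evaluate (4.22) for A = E and for A = beta^2 (E - <E>)^2 on a discrete energy grid."""
    log_terms = [ln - beta * e for e, ln in zip(energies, ln_n)]
    shift = max(log_terms)  # subtract the maximum before exponentiating
    terms = [math.exp(t - shift) for t in log_terms]
    z = sum(terms)
    mean_e = sum(e * t for e, t in zip(energies, terms)) / z
    mean_e2 = sum(e * e * t for e, t in zip(energies, terms)) / z
    return mean_e, beta ** 2 * (mean_e2 - mean_e ** 2)
```

An overall constant in $n(E)$, such as the total number of histogram counts, cancels between the numerator and denominator of (4.22), so the histogram does not need to be normalized.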

By substituting this quantity in (4.22), one can calculate ensemble averages of physical quantity $A(E)$ as a function of temperature. Moreover, ensemble averages of any physical quantity $A$ (including those that cannot be expressed as functions of potential energy) at any temperature $T$ ($= 1/k_{\rm B}\beta$) can now be obtained as long as one stores the "trajectory" of configurations (and $A$) from the production run. Namely, we have

$$\langle A \rangle_T = \frac{\displaystyle\sum_{k=1}^{n_0} A(x(k))\, W_{\rm mu}^{-1}(E(x(k))) \exp\left[-\beta E(x(k))\right]}{\displaystyle\sum_{k=1}^{n_0} W_{\rm mu}^{-1}(E(x(k))) \exp\left[-\beta E(x(k))\right]}, \tag{4.25}$$

where $x(k)$ is the configuration at the $k$th MC (or MD) step and $n_0$ is the total number of configurations stored. Note that when $A$ is a function of $E$, (4.25) reduces to (4.22), where the density of states is given by (4.24).
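The trajectory average (4.25) amounts to a weighted mean over the stored configurations. A minimal sketch follows, assuming that the stored data are three parallel lists (the values $A(x(k))$, the energies $E(x(k))$, and $\ln W_{\rm mu}(E(x(k)))$); the function name and this data layout are hypothetical.

```python
import math


def reweighted_average(a_values, energies, ln_w_mu, beta):
    """Estimate <A>_T from stored multicanonical samples, following (4.25)."""
    # Logarithm of W_mu^{-1}(E(x(k))) * exp(-beta * E(x(k))) for each stored configuration.
    log_weights = [-ln_w - beta * e for e, ln_w in zip(energies, ln_w_mu)]
    shift = max(log_weights)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(lw - shift) for lw in log_weights]
    return sum(a * w for a, w in zip(a_values, weights)) / sum(weights)
```

For example, two stored configurations with energies 0 and 1 and weights $W_{\rm mu} = 1$ and $1/3$ give $\langle E \rangle_T = 3e^{-\beta}/(1 + 3e^{-\beta})$, the canonical average of a two-level system with degeneracies 1 and 3.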

4.3 Multidimensional Extensions of Multicanonical Algorithm

In the multicanonical ensemble, a one-dimensional random walk is realized in the potential energy space. This algorithm can be generalized to multidimensions, where a random walk in other quantities besides potential energy is performed. There are many possibilities for this generalization. Here, we give an example of two-dimensional extensions of the multicanonical algorithm, the multibaric-multithermal algorithm [62, 63]. In the isobaric-isothermal ensemble [119–122], the probability distribution $P_{NPT}(E, \mathcal{V}; T, \mathcal{P})$ for potential energy $E$ and volume $\mathcal{V}$ at temperature $T$ and pressure $\mathcal{P}$ is given by

$$P_{NPT}(E, \mathcal{V}; T, \mathcal{P}) \propto n(E, \mathcal{V})\, W_{NPT}(E, \mathcal{V}; T, \mathcal{P}) = n(E, \mathcal{V})\, e^{-\beta \mathcal{H}}. \tag{4.26}$$

Here, the density of states $n(E, \mathcal{V})$ is given as a function of both $E$ and $\mathcal{V}$, and $\mathcal{H}$ is the "enthalpy" (without the kinetic energy contributions):

$$\mathcal{H} = E + \mathcal{P}\mathcal{V}. \tag{4.27}$$


This weight factor produces an isobaric-isothermal ensemble at constant temperature ($T$) and constant pressure ($\mathcal{P}$), and this ensemble yields bell-shaped distributions in both $E$ and $\mathcal{V}$. To perform the isobaric-isothermal MC simulation [122], we perform Metropolis sampling on the scaled coordinates $\boldsymbol{r}_i = L^{-1} \boldsymbol{q}_i$ ($\boldsymbol{q}_i$ are the real coordinates) and the volume $\mathcal{V}$ (here, the particles are placed in a cubic box of size $L \equiv \sqrt[3]{\mathcal{V}}$). The trial moves from state $x$ with the scaled coordinates $r$ and volume $\mathcal{V}$ to state $x'$ with the scaled coordinates $r'$ and volume $\mathcal{V}'$ are generated by uniform random numbers. The enthalpy is accordingly changed from $\mathcal{H}(E(r, \mathcal{V}), \mathcal{V})$ to $\mathcal{H}'(E(r', \mathcal{V}'), \mathcal{V}')$ by these trial moves. The trial moves will be accepted with the probability

$$w(x \to x') = \min\left(1, \exp\left[-\beta\left\{\mathcal{H}' - \mathcal{H} - N k_{\rm B} T \ln(\mathcal{V}'/\mathcal{V})\right\}\right]\right), \tag{4.28}$$
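The acceptance probability (4.28) can be checked on the simplest possible case. The sketch below assumes an ideal gas ($E = 0$, so $\mathcal{H} = \mathcal{P}\mathcal{V}$) and only performs volume moves; the function names, the step size, and the units ($k_{\rm B} = 1$) are choices made for this illustration only.

```python
import math
import random


def npt_accept(h_old, h_new, vol_old, vol_new, n_atoms, beta, rng=random):
    """Metropolis test of (4.28); note that beta * N * kB * T * ln(V'/V) = N * ln(V'/V)."""
    log_ratio = -beta * (h_new - h_old) + n_atoms * math.log(vol_new / vol_old)
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


def mean_ideal_gas_volume(n_atoms=5, beta=1.0, pressure=1.0,
                          n_steps=200_000, step=2.0, seed=2):
    """Volume-only NPT Monte Carlo for an ideal gas; returns the average volume."""
    rng = random.Random(seed)
    vol = (n_atoms + 1) / (beta * pressure)
    total = 0.0
    for _ in range(n_steps):
        vol_new = vol + rng.uniform(-step, step)  # uniform trial move in the volume
        # Non-positive volumes are rejected outright; E = 0, so the enthalpy is P * V.
        if vol_new > 0.0 and npt_accept(pressure * vol, pressure * vol_new,
                                        vol, vol_new, n_atoms, beta, rng):
            vol = vol_new
        total += vol
    return total / n_steps
```

For this toy model the sampled volume distribution is proportional to $\mathcal{V}^N e^{-\beta \mathcal{P} \mathcal{V}}$, whose mean is $(N + 1)/(\beta \mathcal{P})$, so the returned average should be close to 6 for the default parameters.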

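where $N$ is the total number of atoms in the system. As for the MD method in this ensemble, we just present the Nosé-Andersen algorithm [119–121]. The equations of motion in (4.11)–(4.14) are now generalized as follows:

$$\dot{\boldsymbol{q}}_k = \frac{\boldsymbol{p}_k}{m_k} + \frac{\dot{\mathcal{V}}}{3\mathcal{V}}\, \boldsymbol{q}_k, \tag{4.29}$$

$$\dot{\boldsymbol{p}}_k = -\frac{\partial \mathcal{H}}{\partial \boldsymbol{q}_k} - \left(\frac{\dot{s}}{s} + \frac{\dot{\mathcal{V}}}{3\mathcal{V}}\right) \boldsymbol{p}_k = \boldsymbol{f}_k - \left(\frac{\dot{s}}{s} + \frac{\dot{\mathcal{V}}}{3\mathcal{V}}\right) \boldsymbol{p}_k, \tag{4.30}$$

$$\dot{s} = s\, \frac{P_s}{Q}, \tag{4.31}$$

$$\dot{P}_s = \sum_{i=1}^{N} \frac{\boldsymbol{p}_i^2}{m_i} - 3 N k_{\rm B} T = 3 N k_{\rm B} \left(T(t) - T\right), \tag{4.32}$$

$$\dot{\mathcal{V}} = s\, \frac{P_{\mathcal{V}}}{M}, \tag{4.33}$$

$$\dot{P}_{\mathcal{V}} = \frac{1}{3\mathcal{V}} \left( \sum_{i=1}^{N} \frac{\boldsymbol{p}_i^2}{m_i} - \sum_{i=1}^{N} \boldsymbol{q}_i \cdot \frac{\partial \mathcal{H}}{\partial \boldsymbol{q}_i} \right) - \frac{\partial \mathcal{H}}{\partial \mathcal{V}} = \mathcal{P}(t) - \mathcal{P}, \tag{4.34}$$

where $M$ is the artificial mass associated with the volume, $P_{\mathcal{V}}$ is the conjugate momentum for the volume, and the "instantaneous pressure" $\mathcal{P}(t)$ is defined by

$$\mathcal{P}(t) = \frac{1}{3\mathcal{V}} \left( \sum_{i=1}^{N} \frac{\boldsymbol{p}_i(t)^2}{m_i} - \sum_{i=1}^{N} \boldsymbol{q}_i(t) \cdot \frac{\partial \mathcal{H}}{\partial \boldsymbol{q}_i}(t) \right) = \frac{1}{3\mathcal{V}} \left( \sum_{i=1}^{N} \frac{\boldsymbol{p}_i(t)^2}{m_i} + \sum_{i=1}^{N} \boldsymbol{q}_i(t) \cdot \boldsymbol{f}_i(t) \right). \tag{4.35}$$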

We now introduce the idea of the multicanonical technique into the isobaric-isothermal ensemble method and refer to this generalized-ensemble algorithm as the multibaric-multithermal algorithm (MUBATH) [62, 63, 123–125].


The molecular simulations in this generalized ensemble perform random walks both in the potential energy space and in the volume space. In the multibaric-multithermal ensemble, each state is sampled by the multibaric-multithermal weight factor $W_{\rm mbt}(E, \mathcal{V}) \equiv \exp\{-\beta_0 \mathcal{H}_{\rm mbt}(E, \mathcal{V})\}$ ($\mathcal{H}_{\rm mbt}$ is referred to as the multibaric-multithermal enthalpy), so that a uniform distribution in both potential energy $E$ and volume $\mathcal{V}$ is obtained [62]:

$$P_{\rm mbt}(E, \mathcal{V}) \propto n(E, \mathcal{V})\, W_{\rm mbt}(E, \mathcal{V}) = n(E, \mathcal{V}) \exp\{-\beta_0 \mathcal{H}_{\rm mbt}(E, \mathcal{V})\} \equiv \text{constant}, \tag{4.36}$$

where we have chosen an arbitrary reference temperature, $T_0 = 1/k_{\rm B}\beta_0$. The multibaric-multithermal MC simulation can be performed by replacing $\mathcal{H}$ by $\mathcal{H}_{\rm mbt}$ in (4.28):

$$w(x \to x') = \min\left(1, \exp\left[-\beta_0\left\{\mathcal{H}'_{\rm mbt} - \mathcal{H}_{\rm mbt} - N k_{\rm B} T_0 \ln(\mathcal{V}'/\mathcal{V})\right\}\right]\right). \tag{4.37}$$

To perform the multibaric-multithermal MD simulation, we just solve the above equations of motion (4.29)–(4.34) for the regular isobaric-isothermal ensemble (with arbitrary reference temperature $T = T_0$ and reference pressure $\mathcal{P} = \mathcal{P}_0$), where the enthalpy $\mathcal{H}$ is replaced by the multibaric-multithermal enthalpy $\mathcal{H}_{\rm mbt}$ in (4.30) and (4.34) [63]. After an optimal weight factor $W_{\rm mbt}(E, \mathcal{V})$ is obtained, a long production simulation is performed for data collection. We employ the reweighting techniques [8] for the results of the production run to calculate the isobaric-isothermal-ensemble averages. The probability distribution $P_{NPT}(E, \mathcal{V}; T, \mathcal{P})$ of potential energy and volume in the isobaric-isothermal ensemble at the desired temperature $T$ and pressure $\mathcal{P}$ is given by

$$P_{NPT}(E, \mathcal{V}; T, \mathcal{P}) = \frac{N_{\rm mbt}(E, \mathcal{V})\, W_{\rm mbt}^{-1}(E, \mathcal{V})\, e^{-\beta(E + \mathcal{P}\mathcal{V})}}{\displaystyle\sum_{E, \mathcal{V}} N_{\rm mbt}(E, \mathcal{V})\, W_{\rm mbt}^{-1}(E, \mathcal{V})\, e^{-\beta(E + \mathcal{P}\mathcal{V})}}, \tag{4.38}$$

where $N_{\rm mbt}(E, \mathcal{V})$ is the histogram of the probability distribution $P_{\rm mbt}(E, \mathcal{V})$ of potential energy and volume that was obtained by the multibaric-multithermal production run. The expectation value of a physical quantity $A$ at $T$ and $\mathcal{P}$ is then obtained from

$$\langle A \rangle_{T, \mathcal{P}} = \sum_{E, \mathcal{V}} A(E, \mathcal{V})\, P_{NPT}(E, \mathcal{V}; T, \mathcal{P}). \tag{4.39}$$

4.3.1 Replica-Exchange Method

The system for the replica-exchange method (REM) consists of $M$ noninteracting copies (or replicas) of the original system in the canonical ensemble at $M$ different temperatures $T_m$ ($m = 1, \ldots, M$). We arrange the replicas so that there is always exactly one replica at each temperature. Then there exists a one-to-one correspondence between replicas and temperatures; the label $i$ ($i = 1, \ldots, M$) for replicas is a permutation of the label $m$ ($m = 1, \ldots, M$) for temperatures, and vice versa:

$$\begin{cases} i = i(m) \equiv f(m), \\ m = m(i) \equiv f^{-1}(i), \end{cases} \tag{4.40}$$

where $f(m)$ is a permutation function of $m$ and $f^{-1}(i)$ is its inverse.

Let $X = \left\{x_1^{[i(1)]}, \ldots, x_M^{[i(M)]}\right\} = \left\{x_{m(1)}^{[1]}, \ldots, x_{m(M)}^{[M]}\right\}$ stand for a "state" in this generalized ensemble. Each "substate" $x_m^{[i]}$ is specified by the coordinates $q^{[i]}$ and momenta $p^{[i]}$ of $N$ atoms in replica $i$ at temperature $T_m$:

$$x_m^{[i]} \equiv \left(q^{[i]}, p^{[i]}\right)_m. \tag{4.41}$$

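Because the replicas are noninteracting, the weight factor for the state $X$ in this generalized ensemble is given by the product of Boltzmann factors for each replica (or at each temperature):

$$\begin{aligned} W_{\rm REM}(X) &= \prod_{i=1}^{M} \exp\left\{-\beta_{m(i)} H\left(q^{[i]}, p^{[i]}\right)\right\} = \prod_{m=1}^{M} \exp\left\{-\beta_m H\left(q^{[i(m)]}, p^{[i(m)]}\right)\right\} \\ &= \exp\left\{-\sum_{i=1}^{M} \beta_{m(i)} H\left(q^{[i]}, p^{[i]}\right)\right\} = \exp\left\{-\sum_{m=1}^{M} \beta_m H\left(q^{[i(m)]}, p^{[i(m)]}\right)\right\}, \end{aligned} \tag{4.42}$$

where $i(m)$ and $m(i)$ are the permutation functions in (4.40). We now consider exchanging a pair of replicas in this ensemble. Suppose we exchange replicas $i$ and $j$, which are at temperatures $T_m$ and $T_n$, respectively:

$$X = \left\{\ldots, x_m^{[i]}, \ldots, x_n^{[j]}, \ldots\right\} \longrightarrow X' = \left\{\ldots, x_m^{[j]\prime}, \ldots, x_n^{[i]\prime}, \ldots\right\}. \tag{4.43}$$

Here, $i$, $j$, $m$, and $n$ are related by the permutation functions in (4.40), and the exchange of replicas introduces a new permutation function $f'$:

$$\begin{cases} i = f(m) \longrightarrow j = f'(m), \\ j = f(n) \longrightarrow i = f'(n). \end{cases} \tag{4.44}$$

The exchange of replicas can be written in more detail as

$$\begin{cases} x_m^{[i]} \equiv \left(q^{[i]}, p^{[i]}\right)_m \longrightarrow x_m^{[j]\prime} \equiv \left(q^{[j]}, p^{[j]\prime}\right)_m, \\ x_n^{[j]} \equiv \left(q^{[j]}, p^{[j]}\right)_n \longrightarrow x_n^{[i]\prime} \equiv \left(q^{[i]}, p^{[i]\prime}\right)_n, \end{cases} \tag{4.45}$$

where the definitions for $p^{[i]\prime}$ and $p^{[j]\prime}$ will be given below. We remark that this process is equivalent to exchanging a pair of temperatures $T_m$ and $T_n$ for the corresponding replicas $i$ and $j$ as follows:

$$\begin{cases} x_m^{[i]} \equiv \left(q^{[i]}, p^{[i]}\right)_m \longrightarrow x_n^{[i]\prime} \equiv \left(q^{[i]}, p^{[i]\prime}\right)_n, \\ x_n^{[j]} \equiv \left(q^{[j]}, p^{[j]}\right)_n \longrightarrow x_m^{[j]\prime} \equiv \left(q^{[j]}, p^{[j]\prime}\right)_m. \end{cases} \tag{4.46}$$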


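In the original implementation of the replica-exchange method (REM) [67–69], the Monte Carlo algorithm was used, and only the coordinates $q$ (and the potential energy function $E(q)$) had to be taken into account. In the molecular dynamics algorithm, on the other hand, we also have to deal with the momenta $p$. We proposed the following momentum assignment in (4.45) (and in (4.46)) [77]:

$$\begin{cases} p^{[i]\prime} \equiv \sqrt{\dfrac{T_n}{T_m}}\; p^{[i]}, \\[2ex] p^{[j]\prime} \equiv \sqrt{\dfrac{T_m}{T_n}}\; p^{[j]}, \end{cases} \tag{4.47}$$

which we believe is the simplest and the most natural. This assignment means that we just rescale uniformly the velocities of all the atoms in the replicas by the square root of the ratio of the two temperatures so that the temperature condition in (4.4) may be satisfied. The transition probability of this replica-exchange process is given by the usual Metropolis criterion:

$$w(X \to X') \equiv w\left(x_m^{[i]} \,\middle|\, x_n^{[j]}\right) = \min\left(1, \frac{W_{\rm REM}(X')}{W_{\rm REM}(X)}\right) = \min\left(1, \exp(-\Delta)\right), \tag{4.48}$$

where in the second expression (i.e., $w(x_m^{[i]} | x_n^{[j]})$) we explicitly wrote the pair of replicas (and temperatures) to be exchanged. From (4.1), (4.2), (4.42), and (4.47), we have

$$\begin{aligned} \frac{W_{\rm REM}(X')}{W_{\rm REM}(X)} &= \exp\Big\{ -\beta_m \left[K\left(p^{[j]\prime}\right) + E\left(q^{[j]}\right)\right] - \beta_n \left[K\left(p^{[i]\prime}\right) + E\left(q^{[i]}\right)\right] \\ &\qquad\quad + \beta_m \left[K\left(p^{[i]}\right) + E\left(q^{[i]}\right)\right] + \beta_n \left[K\left(p^{[j]}\right) + E\left(q^{[j]}\right)\right] \Big\} \\ &= \exp\Big\{ -\beta_m \frac{T_m}{T_n} K\left(p^{[j]}\right) - \beta_n \frac{T_n}{T_m} K\left(p^{[i]}\right) + \beta_m K\left(p^{[i]}\right) + \beta_n K\left(p^{[j]}\right) \\ &\qquad\quad - \beta_m \left[E\left(q^{[j]}\right) - E\left(q^{[i]}\right)\right] - \beta_n \left[E\left(q^{[i]}\right) - E\left(q^{[j]}\right)\right] \Big\}. \end{aligned} \tag{4.49}$$

As the kinetic energy terms in this equation all cancel out, $\Delta$ in (4.48) becomes

$$\Delta = \beta_m \left(E\left(q^{[j]}\right) - E\left(q^{[i]}\right)\right) - \beta_n \left(E\left(q^{[j]}\right) - E\left(q^{[i]}\right)\right) \tag{4.50}$$

$$\phantom{\Delta} = \left(\beta_m - \beta_n\right) \left(E\left(q^{[j]}\right) - E\left(q^{[i]}\right)\right). \tag{4.51}$$

Here, $i$, $j$, $m$, and $n$ are related by the permutation functions in (4.40) before the replica exchange:

$$\begin{cases} i = f(m), \\ j = f(n). \end{cases} \tag{4.52}$$

Without loss of generality, we can assume $T_1 < T_2 < \cdots < T_M$. A simulation of the replica-exchange method (REM) is then realized by alternately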


performing the following two steps:

1. Each replica in the canonical ensemble at its fixed temperature is simulated simultaneously and independently for a certain number of MC or MD steps.
2. A pair of replicas at neighboring temperatures, say $x_m^{[i]}$ and $x_{m+1}^{[j]}$, are exchanged with the probability $w\left(x_m^{[i]} \,\middle|\, x_{m+1}^{[j]}\right)$ in (4.48).

Note that in Step 2 we exchange only pairs of replicas corresponding to neighboring temperatures, because the acceptance ratio of the exchange process decreases exponentially with the difference in the two $\beta$'s (see (4.51) and (4.48)). Note also that whenever a replica exchange is accepted in Step 2, the permutation functions in (4.40) are updated. A random walk in "temperature space" is realized for each replica, which in turn induces a random walk in potential energy space. This alleviates the problem of getting trapped in states of energy local minima.

The REM simulation is particularly suitable for parallel computers. Because one can minimize the amount of information exchanged among nodes, it is best to assign each replica to its own node (exchanging pairs of temperature values among nodes is much faster than exchanging coordinates and momenta). This means that we keep track of the permutation function $m(i; t) = f^{-1}(i; t)$ in (4.40) as a function of MC or MD step $t$ during the simulation. After parallel canonical MC or MD simulations for a certain number of steps (Step 1), $M/2$ pairs of replicas corresponding to neighboring temperatures are simultaneously exchanged (Step 2), and the pairing is alternated between the two possible choices, i.e., $(T_1, T_2), (T_3, T_4), \ldots$ and $(T_2, T_3), (T_4, T_5), \ldots$.

After a long production run of a replica-exchange simulation, the canonical expectation value of a physical quantity $A$ at temperature $T_m$ ($m = 1, \ldots, M$) can be calculated by the usual arithmetic mean as follows:

$$\langle A \rangle_{T_m} = \frac{1}{n_m} \sum_{k=1}^{n_m} A\left(x_m(k)\right), \tag{4.53}$$
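The two alternating steps and the arithmetic mean (4.53) can be put together in a short Monte Carlo sketch. It is an illustration under simplifying assumptions, not the implementation of [67–69] or [77]: the system is the one-dimensional toy potential $E(q) = q^2/2$ with $k_{\rm B} = 1$, the function name `rem_harmonic` is hypothetical, and because only coordinates matter in MC, coordinates are swapped between temperature slots instead of tracking the permutation function.

```python
import math
import random


def rem_harmonic(betas, n_cycles=20_000, sweeps_per_cycle=5, step=1.5, seed=3):
    """Replica exchange on E(q) = q**2 / 2; returns the mean energy recorded at each temperature."""
    rng = random.Random(seed)
    n_rep = len(betas)
    q = [0.0] * n_rep  # q[m]: coordinate of the replica currently at temperature index m
    sums = [0.0] * n_rep
    for cycle in range(n_cycles):
        # Step 1: independent canonical Metropolis moves at each temperature.
        for m in range(n_rep):
            for _ in range(sweeps_per_cycle):
                q_new = q[m] + rng.uniform(-step, step)
                d_e = 0.5 * (q_new * q_new - q[m] * q[m])
                if d_e <= 0.0 or rng.random() < math.exp(-betas[m] * d_e):
                    q[m] = q_new
        # Step 2: exchange neighbouring temperatures, alternating between the pairings
        # (T1,T2),(T3,T4),... on even cycles and (T2,T3),(T4,T5),... on odd cycles.
        for m in range(cycle % 2, n_rep - 1, 2):
            e_i, e_j = 0.5 * q[m] ** 2, 0.5 * q[m + 1] ** 2
            delta = (betas[m] - betas[m + 1]) * (e_j - e_i)  # Delta of (4.51)
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                q[m], q[m + 1] = q[m + 1], q[m]
        # Accumulate the arithmetic mean of (4.53) at each temperature.
        for m in range(n_rep):
            sums[m] += 0.5 * q[m] ** 2
    return [s / n_cycles for s in sums]
```

Since the exchange step preserves the canonical distribution at every temperature, each returned mean should agree with the exact value $1/(2\beta_m)$ for this potential within statistical error.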

where $x_m(k)$ ($k = 1, \ldots, n_m$) are the configurations obtained at temperature $T_m$, and $n_m$ is the total number of measurements made at $T = T_m$. The expectation value at any intermediate temperature can also be obtained from (4.22), where the density of states is given by the multiple-histogram reweighting techniques [9, 10] as follows. Let $N_m(E)$ and $n_m$ be, respectively, the potential-energy histogram and the total number of samples obtained at temperature $T_m = 1/k_{\rm B}\beta_m$ ($m = 1, \ldots, M$). The best estimate of the density of states is then given by [9, 10]

$$n(E) = \frac{\displaystyle\sum_{m=1}^{M} g_m^{-1} N_m(E)}{\displaystyle\sum_{m=1}^{M} g_m^{-1} n_m \exp(f_m - \beta_m E)}, \tag{4.54}$$

where we have for each $m$ ($= 1, \ldots, M$)

$$\exp(-f_m) = \sum_E n(E) \exp(-\beta_m E). \tag{4.55}$$
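Equations (4.54) and (4.55) define $n(E)$ and $f_m$ in terms of each other, so in practice they are solved by fixed-point iteration. The sketch below is one possible way to organize that iteration, assuming $g_m = 1$ for all $m$, histograms stored as lists on a common energy grid, and logarithmic arithmetic to avoid overflow; the function names, the convergence tolerance, and the iteration cap are choices made for this example.

```python
import math


def _logsumexp(values):
    """Numerically stable ln(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def wham(energies, histograms, betas, max_iter=10_000, tol=1e-10):
    """Iterate (4.54) and (4.55) with g_m = 1; returns (ln n(E) per bin, f_m per temperature)."""
    n_samples = [sum(h) for h in histograms]
    totals = [sum(h[i] for h in histograms) for i in range(len(energies))]
    f = [0.0] * len(betas)  # start from f_m = 0
    ln_n = []
    for _ in range(max_iter):
        # (4.54): ln n(E) = ln sum_m N_m(E) - ln sum_m n_m exp(f_m - beta_m E)
        ln_n = []
        for e, total in zip(energies, totals):
            denom = _logsumexp([math.log(n_m) + f_m - b_m * e
                                for n_m, f_m, b_m in zip(n_samples, f, betas)])
            ln_n.append(math.log(total) - denom if total > 0 else -math.inf)
        # (4.55): f_m = -ln sum_E n(E) exp(-beta_m E)
        f_new = [-_logsumexp([ln - b_m * e for ln, e in zip(ln_n, energies)])
                 for b_m in betas]
        converged = max(abs(a - b) for a, b in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    return ln_n, f
```

The solution is determined only up to a common additive constant in $\ln n(E)$, so results are best compared through differences such as $\ln n(E) - \ln n(E_0)$.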

Here, $g_m = 1 + 2\tau_m$, and $\tau_m$ is the integrated autocorrelation time at temperature $T_m$. For many systems, the quantity $g_m$ can safely be set to be a constant in the reweighting formulae [10], and hereafter we set $g_m = 1$. Note that (4.54) and (4.55) are solved self-consistently by iteration [9, 10] to obtain the density of states $n(E)$ and the dimensionless Helmholtz free energy $f_m$. Namely, we can set all the $f_m$ ($m = 1, \ldots, M$) to, e.g., zero initially. We then use (4.54) to obtain $n(E)$, which is substituted into (4.55) to obtain the next values of $f_m$, and so on. Moreover, ensemble averages of any physical quantity $A$ (including those that cannot be expressed as functions of potential energy) at any temperature $T$ ($= 1/k_{\rm B}\beta$) can now be obtained from the "trajectory" of configurations of the production run. Namely, we first obtain $f_m$ ($m = 1, \ldots, M$) by solving (4.54) and (4.55) self-consistently, and then we have [87]

$$\langle A \rangle_T = \frac{\displaystyle\sum_{m=1}^{M} \sum_{k=1}^{n_m} A(x_m(k))\, \frac{1}{\displaystyle\sum_{\ell=1}^{M} n_\ell \exp\left[f_\ell - \beta_\ell E(x_m(k))\right]}\, \exp\left[-\beta E(x_m(k))\right]}{\displaystyle\sum_{m=1}^{M} \sum_{k=1}^{n_m} \frac{1}{\displaystyle\sum_{\ell=1}^{M} n_\ell \exp\left[f_\ell - \beta_\ell E(x_m(k))\right]}\, \exp\left[-\beta E(x_m(k))\right]}, \tag{4.56}$$

where $x_m(k)$ ($k = 1, \ldots, n_m$) are the configurations obtained at temperature $T_m$.

The major advantage of REM over other generalized-ensemble methods such as the multicanonical algorithm [11, 12] lies in the fact that the weight factor is a priori known (see (4.42)), while in the multicanonical algorithm the determination of the weight factors can be very tedious and time-consuming. In REM, however, the number of required replicas increases greatly as the system size $N$ increases, while only one replica is used in the multicanonical algorithm. This demands a lot of computer power for complex systems. Moreover, so long as optimal weight factors can be obtained, the multicanonical algorithm is more efficient in sampling than the replica-exchange method [88].

4.3.2 Multidimensional Extensions of Replica-Exchange Method

We now present our multidimensional extension of REM, which we refer to as the multidimensional replica-exchange method (MREM) [80]. The crucial observation that led to the new algorithm is the following: as long as we have $M$ noninteracting


replicas of the original system, the Hamiltonian $H(q, p)$ of the system does not have to be identical among the replicas and it can depend on a parameter with different parameter values for different replicas. Namely, we can write the Hamiltonian for the $i$th replica at temperature $T_m$ as

\[
H_m(q^{[i]}, p^{[i]}) = K(p^{[i]}) + E_{\lambda_m}(q^{[i]}), \tag{4.57}
\]

where the potential energy $E_{\lambda_m}$ depends on a parameter $\lambda_m$ and can be written, for instance, as

\[
E_{\lambda_m}(q^{[i]}) = E_0(q^{[i]}) + \lambda_m V(q^{[i]}). \tag{4.58}
\]

This expression for the potential energy is often used in simulations. For instance, in umbrella sampling [18], $E_0(q)$ and $V(q)$ can be, respectively, taken as the original potential energy and the "biasing" potential energy with the coupling parameter $\lambda_m$. In simulations of spin systems, on the other hand, $E_0(q)$ and $V(q)$ (here, $q$ stands for spins) can be, respectively, considered as the zero-field term and the magnetization term coupled with the external field $\lambda_m$.

While replica $i$ and temperature $T_m$ are in one-to-one correspondence in the original REM, replica $i$ and "parameter set" $\Lambda_m \equiv (T_m, \lambda_m)$ are in one-to-one correspondence in the new algorithm. Hence, the present algorithm can be considered as a multidimensional extension of the original replica-exchange method, where the "parameter space" is one-dimensional (i.e., $\Lambda_m = T_m$). Because the replicas are noninteracting, the weight factor for the state $X$ in this new generalized ensemble is again given by the product of Boltzmann factors for each replica (see (4.42)):

\[
W_{\mathrm{MREM}}(X)
= \exp\left\{ -\sum_{i=1}^{M} \beta_{m(i)} H_{m(i)}\left(q^{[i]}, p^{[i]}\right) \right\}
= \exp\left\{ -\sum_{m=1}^{M} \beta_m H_m\left(q^{[i(m)]}, p^{[i(m)]}\right) \right\},
\tag{4.59}
\]

where $i(m)$ and $m(i)$ are the permutation functions in (4.40). Then the same derivation that led to the original replica-exchange criterion follows, and the transition probability of replica exchange is given by (4.48), where we now have (see (4.50)) [80]

\[
\Delta = \beta_m \left( E_{\lambda_m}(q^{[j]}) - E_{\lambda_m}(q^{[i]}) \right)
- \beta_n \left( E_{\lambda_n}(q^{[j]}) - E_{\lambda_n}(q^{[i]}) \right).
\tag{4.60}
\]

Here, $E_{\lambda_m}$ and $E_{\lambda_n}$ are the total potential energy (see (4.57)). Note that we need to newly evaluate the potential energy for exchanged coordinates, $E_{\lambda_m}(q^{[j]})$ and $E_{\lambda_n}(q^{[i]})$, because $E_{\lambda_m}$ and $E_{\lambda_n}$ are in general different functions.

For obtaining the canonical distributions, the multiple-histogram reweighting techniques [9,10] are particularly suitable. Suppose we have made a single


run of the present replica-exchange simulation with $M$ replicas that correspond to $M$ different parameter sets $\Lambda_m \equiv (T_m, \lambda_m)$ ($m = 1, \ldots, M$). Let $N_m(E_0, V)$ and $n_m$ be, respectively, the potential-energy histogram and the total number of samples obtained for the $m$th parameter set $\Lambda_m$. The WHAM equations that yield the canonical probability distribution $P_{T,\lambda}(E_0, V) = n(E_0, V) \exp(-\beta E_\lambda)$ with any potential-energy parameter value $\lambda$ at any temperature $T = 1/k_B\beta$ are then given by [80]

\[
n(E_0, V) = \frac{\displaystyle \sum_{m=1}^{M} N_m(E_0, V)}{\displaystyle \sum_{m=1}^{M} n_m \exp\left(f_m - \beta_m E_{\lambda_m}\right)},
\tag{4.61}
\]

and for each $m$ ($= 1, \ldots, M$)

\[
\exp(-f_m) = \sum_{E_0, V} n(E_0, V) \exp\left(-\beta_m E_{\lambda_m}\right).
\tag{4.62}
\]

Here, $n(E_0, V)$ is the generalized density of states. Note that $n(E_0, V)$ is independent of the parameter sets $\Lambda_m \equiv (T_m, \lambda_m)$ ($m = 1, \ldots, M$). The density of states $n(E_0, V)$ and the "dimensionless" Helmholtz free energy $f_m$ in (4.61) and (4.62) are solved self-consistently by iteration.

We now present an example of MREM. We consider an isobaric-isothermal ensemble and exchange not only the temperature but also the pressure values of pairs of replicas during a MC or MD simulation [94]. Namely, suppose we have $M$ replicas with $M$ different values of temperature and pressure $(T_m, P_m)$. We set $E_0 = E$, $V = \mathcal{V}$ (the volume), and $\lambda_m = P_m$ in (4.58). We exchange replicas $i$ and $j$, which are at $(T_m, P_m)$ and $(T_n, P_n)$, respectively. The transition probability of this replica-exchange process is then given by (4.48), where (4.60) now reads [3, 80, 96]

\[
\Delta = (\beta_m - \beta_n) \left( E(q^{[j]}) - E(q^{[i]}) \right)
+ (\beta_m P_m - \beta_n P_n) \left( \mathcal{V}^{[j]} - \mathcal{V}^{[i]} \right).
\tag{4.63}
\]

We can alternately exchange pairs of neighboring temperature values and pairs of neighboring pressure values during the replica-exchange simulation. Moreover, if we fix the temperature, we can have only the pressure-exchange process as a special case, which yields a one-dimensional random walk in the volume space.
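As an illustration of the temperature-pressure exchange step, the following Python sketch evaluates $\Delta$ of (4.63) and turns it into an acceptance decision. It assumes that the transition probability (4.48), which is not reproduced in this section, has the usual Metropolis form $\min(1, e^{-\Delta})$. The function names, the argument order, and the choice of kcal mol$^{-1}$ as energy unit are illustrative assumptions, and the caller is expected to supply the product of pressure and volume in the same energy unit as $E$.

```python
import math
import random

# Boltzmann constant in kcal mol^-1 K^-1 (assumed unit system for this sketch).
K_B = 0.0019872041


def exchange_delta(T_m, P_m, T_n, P_n, E_i, V_i, E_j, V_j, k_B=K_B):
    """Delta of (4.63) for replica i at (T_m, P_m) and replica j at (T_n, P_n).

    E_i, E_j are potential energies and V_i, V_j are volumes of the two
    replicas; P * V must already be expressed in the energy unit of E.
    """
    beta_m = 1.0 / (k_B * T_m)
    beta_n = 1.0 / (k_B * T_n)
    return ((beta_m - beta_n) * (E_j - E_i)
            + (beta_m * P_m - beta_n * P_n) * (V_j - V_i))


def acceptance_probability(delta):
    """Metropolis form assumed for (4.48): min(1, exp(-delta))."""
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(delta, rng=random):
    """Return True if the exchange is accepted (hypothetical helper)."""
    return rng.random() < acceptance_probability(delta)
```

With equal parameter sets the exchange is always accepted, and an exchange that moves the lower-energy configuration to the lower temperature gives $\Delta < 0$.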

4.4 Examples of Simulation Results

We now present some of the simulation results by the generalized-ensemble algorithms that were described in the previous section.


The first example is the calculation of the residual entropy of ordinary ice [126,127]. This calculation shows how accurately the density of states can be obtained by multicanonical simulations from the reweighting formula of (4.24). In the crystal structure of ordinary ice, each oxygen atom is located at the center of a tetrahedron and straight lines (bonds) through the sites of the tetrahedron point towards four nearest-neighbor oxygen atoms. Hydrogen atoms are distributed according to the ice rules [128]:

A. There is one hydrogen atom on each bond (then called hydrogen bond).
B. There are two hydrogen atoms near each oxygen atom (these three atoms constitute a water molecule).

Extrapolating low temperature calorimetric experimental data (then available down to about 10 K) towards zero absolute temperature, it was found that ice has a residual entropy [129]:

\[
S_0 = k_B \ln(\Omega) > 0, \tag{4.64}
\]

where $\Omega$ is the number of states for $N$ molecules. Subsequently, Linus Pauling [128] derived estimates of $\Omega = (\Omega_1)^N$ by approximate methods, obtaining

\[
\Omega_1^{\mathrm{Pauling}} = 3/2. \tag{4.65}
\]

Thus, $\Omega = (3/2)^N$ is the number of Pauling configurations. Assuming that the H$_2$O molecules are essentially intact in ice, one of his arguments is that a given molecule can orient itself in six ways satisfying ice rule B. Choosing the orientations of all molecules at random, the chance that the adjacent molecules permit a given orientation is 1/4. The total number of configurations is thus $\Omega = (6/4)^N$. Equation (4.65) converts to the residual entropy

\[
S_0^{\mathrm{Pauling}} = 0.80574\ldots\ \mathrm{cal\ deg^{-1}\ mol^{-1}}, \tag{4.66}
\]

where we have used $R = 8.314472\,(15)$ J deg$^{-1}$ mol$^{-1}$ for the gas constant [130]. This is in good agreement with the experimental estimate

\[
S_0^{\mathrm{experimental}} = 0.82\,(5)\ \mathrm{cal\ deg^{-1}\ mol^{-1}}, \tag{4.67}
\]

which was subsequently obtained by Giauque and Stout [131] using refined calorimetry (we give error bars with respect to the last digit(s) in parentheses). Pauling's arguments omit correlations induced by closed loops when one requires fulfillment of the ice rules for all atoms, and it was shown by Onsager and Dupuis [132] that $\Omega_1 = 1.5$ is in fact a lower bound. Onsager's student Nagle used a series expansion method to derive the estimate [133]

\[
\Omega_1^{\mathrm{Nagle}} = 1.50685\,(15), \tag{4.68}
\]


or

\[
S_0^{\mathrm{Nagle}} = 0.81480\,(20)\ \mathrm{cal\ deg^{-1}\ mol^{-1}}. \tag{4.69}
\]

Here, the error bar is not statistical but reflects higher order corrections of the expansion, which are not entirely under control. Despite Nagle's high precision estimate, there has apparently been almost no improvement on the accuracy of the experimental value (4.67). Some of the difficulties are addressed in a careful study by Haida et al. [134]. But their final estimate remains (4.67), with no reduction of the error bar. We noted that by treating the contributions in their table 3 as statistically independent quantities and using Gaussian error propagation (instead of adding up the individual error bars), the final error bar becomes reduced by almost a factor of two and their value would then read $S_0 = 0.815\,(26)$ cal deg$^{-1}$ mol$^{-1}$. Still Pauling's value is safely within one standard deviation. Modern electronic equipment should allow for a much better precision. We think that an experimental verification of the difference to Pauling's estimate would be an outstanding confirmation of structures imposed by the ice rules.

Our calculations are based on two simple statistical models, which reflect Pauling's arguments. In the first model, called six-state H$_2$O molecule model, we allow for six distinct orientations of each H$_2$O molecule and define its energy by

\[
E = -\sum_{b} h(b, s_b^1, s_b^2). \tag{4.70}
\]

Here, the sum is over all bonds $b$ of the lattice and ($s_b^1$ and $s_b^2$ indicate the dependence on the states of the two H$_2$O molecules, which are connected by the bond)

\[
h(b, s_b^1, s_b^2) =
\begin{cases}
1 & \text{for a hydrogen bond,} \\
0 & \text{otherwise.}
\end{cases}
\tag{4.71}
\]

In the second model, called two-state H-bond model, we do not consider distinct orientations of the molecule, but allow two positions for each hydrogen nucleus on the bonds. The energy is defined by

\[
E = -\sum_{s} f(s, b_s^1, b_s^2, b_s^3, b_s^4), \tag{4.72}
\]

where the sum is over all sites $s$ (oxygen atoms) of the lattice. The function $f$ is given by

\[
f(s, b_s^1, b_s^2, b_s^3, b_s^4) =
\begin{cases}
2 & \text{for two hydrogen nuclei close to } s, \\
1 & \text{for one or three hydrogen nuclei close to } s, \\
0 & \text{for zero or four hydrogen nuclei close to } s.
\end{cases}
\tag{4.73}
\]

The groundstates of each model fulfill the ice rules. The results of a multicanonical simulation will give an accurate estimate of the density of states $n(E)$ from (4.24), and we can write

\[
\Omega(E) = C\, n(E). \tag{4.74}
\]
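The conversions between a per-molecule count $\Omega_1$ and the molar residual entropy quoted in (4.65)–(4.69) can be checked with a few lines of Python. This is only a numerical sanity check, assuming $S_0 = R \ln \Omega_1$ per mole, the gas constant quoted in the text, and the thermochemical calorie (4.184 J), which the text does not state explicitly. The function names are illustrative.

```python
import math

R_J = 8.314472     # gas constant in J deg^-1 mol^-1, value quoted in the text
CAL_IN_J = 4.184   # thermochemical calorie in J (assumed conversion factor)


def residual_entropy(omega_1):
    """Molar residual entropy R ln(Omega_1) in cal deg^-1 mol^-1."""
    return R_J * math.log(omega_1) / CAL_IN_J


def residual_entropy_error(omega_1, d_omega_1):
    """First-order propagation of an error bar on Omega_1 to S_0."""
    return R_J * d_omega_1 / omega_1 / CAL_IN_J


# Pauling: six orientations, each permitted with probability 1/4.
s_pauling = residual_entropy(6 / 4)
# Nagle's series-expansion estimate (4.68) with its error bar.
s_nagle = residual_entropy(1.50685)
ds_nagle = residual_entropy_error(1.50685, 0.00015)

print(f"Pauling: {s_pauling:.5f}")
print(f"Nagle:   {s_nagle:.5f} +/- {ds_nagle:.5f}")
```

Under these assumptions the two values round to the figures given in (4.66) and (4.69).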


At $\beta = 0$, the number of states is $6^N$ for the six-state model and $2^{2N}$ for the two-state model. Once these normalizations at $\beta = 0$ are given, the proportionality constant $C$ can be determined from the results of the multicanonical simulations [24]. Hence, one can obtain an accurate estimate of the number of lowest-energy states, $\Omega(E_0)$, where $E_0$ is here the energy of the lowest-energy state. Using periodic boundary conditions (BCs), our simulations are based on a lattice construction set up earlier by Berg [135]. We have performed multicanonical MC simulations for the two models with the lattice sizes that correspond to the number of water molecules $N$ = 128, 360, 576, 896, and 1,600. Combining the two fit results in the thermodynamic limit ($N \to \infty$) leads to our final estimate

\[
\Omega_1^{\mathrm{MUCA}} = 1.50738\,(16). \tag{4.75}
\]

This converts into

\[
S_0^{\mathrm{MUCA}} = 0.81550\,(21)\ \mathrm{cal\ deg^{-1}\ mol^{-1}} \tag{4.76}
\]

for the residual entropy [126]. This is at present the most accurate value for the residual entropy of ordinary ice.

The next example is the multicanonical MD simulation of the C-peptide of ribonuclease A in explicit water [136]. In the simulation model, the N-terminus and the C-terminus of the C-peptide analogue were blocked with the acetyl group and the N-methyl group, respectively. The number of amino acids is 13 and the amino-acid sequence is Ace-Ala-Glu⁻-Thr-Ala-Ala-Ala-Lys⁺-Phe-Leu-Arg⁺-Ala-His⁺-Ala-Nme [137,138]. The initial configuration of our simulation was first generated by a high temperature molecular dynamics simulation (at T = 1,000 K) in gas phase, starting from a fully extended conformation. We randomly selected one of the structures that do not have any secondary structures such as α-helix and β-sheet. The peptide was then solvated in a sphere of radius 22 Å, in which 1,387 water molecules were included (see Fig. 4.1). A harmonic restraint was applied to prevent the water molecules from going out of the sphere. The total number of atoms is 4,365. The dielectric constant was set equal to 1.0. The force-field parameters for the protein were taken from the all-atom version of AMBER parm99 [141], which was found to be suitable for studying helical peptides [142], and the TIP3P model [143] was used for water molecules. The unit time step, Δt, was set to 0.5 fs.

As a production run, we carried out a 15 ns multicanonical MD simulation and the results of this production run were analyzed in detail. In Fig. 4.2a we show the time series of potential energy from this production run. We indeed observe a random walk covering as much as 5,000 kcal mol⁻¹ of energy range (note that 23 kcal mol⁻¹ ≈ 1 eV). We show in Fig. 4.2b the average potential energy as a function of temperature, which was obtained from the trajectory of the production run by the reweighting techniques in (4.22) and (4.24). The average potential energy monotonically increases as the temperature increases.
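The reweighting step that turns a multicanonical trajectory into a canonical average such as the curve of Fig. 4.2b can be sketched as follows. Equations (4.22) and (4.24) are not reproduced in this section, so the code assumes the standard single-histogram form, in which each sample is weighted by $W_{\mathrm{mu}}^{-1}(E)\,e^{-\beta E}$. The function name and the convention of passing $\ln W_{\mathrm{mu}}$ as a callable are illustrative choices, not the author's implementation.

```python
import math


def reweighted_average(values, energies, log_w_mu, beta):
    """Canonical average at inverse temperature beta from multicanonical samples.

    values   : observable A evaluated on each sampled configuration
    energies : potential energy E of each sampled configuration
    log_w_mu : callable returning ln W_mu(E), the log multicanonical weight
    """
    # Log of the per-sample reweighting factor exp(-beta * E) / W_mu(E).
    log_terms = [-beta * e - log_w_mu(e) for e in energies]
    # Subtract the maximum before exponentiating to avoid overflow.
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    norm = sum(weights)
    return sum(a * w for a, w in zip(values, weights)) / norm
```

A small consistency check: for a two-level system with degeneracies 1 and 3 at energies 0 and 1, a flat multicanonical histogram corresponds to $W_{\mathrm{mu}}(E) = 1/n(E)$, and the reweighted mean energy at $\beta = \ln 3$ is 1/2.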


Fig. 4.1. The initial configuration of C-peptide in explicit water. The filled circles stand for the oxygen atoms of water molecules. The number of water molecules is 1,387, and they are placed in a sphere of radius 22 Å. As for the peptide, besides the backbone structure (in dark gray), side chains of only Glu⁻-2, Phe-8, Arg⁺-10, and His⁺-12 are shown (in light gray). The figure was created with Molscript [139] and Raster3D [140]

[Figure 4.2: panel (a) plots E [kcal/mol] against Time [nsec]; panel (b) plots the average potential energy against T [K].]

Fig. 4.2. Time series of potential energy of the C-peptide system from the multicanonical MD production run (a) and the average potential energy as a function of temperature (b). The latter was obtained from the trajectory of the multicanonical MD production run by the single-histogram reweighting techniques

By analyzing the free energy landscape, we identified three distinct local minima in free energy. We show representative conformations at these minima in Fig. 4.3. The structure of the global-minimum free-energy state (GM) has a partially distorted α-helix with the salt bridge between Glu⁻-2 and Arg⁺-10. The structure is in good agreement with the experimental structure obtained by both NMR and X-ray experiments. In this structure, there also exists a contact between Phe-8 and His⁺-12. This contact is again observed in the corresponding residues of the X-ray structure. At LM1, the structure has a contact between Phe-8 and His⁺-12, but the salt bridge between Glu⁻-2 and Arg⁺-10 is not formed. On the other hand, the structure at LM2 has this salt bridge, but it does not have a contact between Phe-8 and His⁺-12. Thus, only the structures at GM satisfy all the interactions that have been observed by the X-ray and other experimental studies.

Fig. 4.3. Representative structures at the global-minimum free-energy state ((a) GM) and the two local-minimum states ((b) LM1 and (c) LM2). As for the peptide structures, besides the backbone structure, side chains of only Glu⁻-2, Phe-8, Arg⁺-10, and His⁺-12 are shown in ball-and-stick model

The next example is the multibaric-multithermal (MUBATH) MD simulation [144, 145]. This simulation was performed for a system consisting of one alanine dipeptide molecule ((S)-2-(acetylamino)-N-methylpropanamide) and 63 water molecules. We used the AMBER parm96 force field [146] for the alanine dipeptide molecule and the TIP3P [143] rigid-body model for the water molecules. The initial values of the alanine-dipeptide dihedral angles were set to φ = ψ = 180°. We employed a cubic unit cell with periodic boundary conditions. The electrostatic potential was calculated by the Ewald method. We calculated the van der Waals interaction, which is given by the Lennard–Jones 12-6 term, for all pairs of atoms within the minimum image convention instead of introducing a spherical potential cutoff. Here, we used the symplectic time-development formalism [147], which is based on the Nosé–Poincaré thermostat [148, 149], the Andersen barostat [121], and the symplectic quaternion scheme [150]. The time step was taken as Δt = 0.5 fs.

Figure 4.4a–c shows the time series of potential energy E in the isobaric-isothermal MD simulations at (T0, P0) = (240 K, 0.1 MPa), (298 K, 0.1 MPa), and (298 K, 300 MPa), respectively. The potential energy fluctuates in narrow ranges. On the other hand, Fig. 4.4d shows that the MUBATH MD simulation realizes a random walk in the potential-energy space and covers a wide energy range. Figures 4.5a–c show the time series of volume V obtained by the conventional isobaric-isothermal MD simulations. The volume fluctuates in narrow ranges. The MUBATH MD simulation, on the other hand, performs a random walk that covers a range of V = 1.8–3.5 nm³, as shown in Fig. 4.5d, which is 3–5 times wider than that covered by the isobaric-isothermal MD simulations.

[Figure 4.4: four panels (a)–(d), each plotting E / (100 kcal/mol) against t / ns.]

Fig. 4.4. Time series of potential energy E from (a) the conventional isobaric–isothermal MD simulation at T0 = 240 K and P0 = 0.1 MPa; (b) the conventional isobaric–isothermal MD simulation at T0 = 298 K and P0 = 0.1 MPa; (c) the conventional isobaric–isothermal MD simulation at T0 = 298 K and P0 = 300 MPa; and (d) the multibaric–multithermal MD simulation

The probability distributions $P(\phi, \psi)$ of φ and ψ at wide ranges of temperature and pressure have been calculated by the reweighting techniques. The MUBATH MD simulation sampled not only the states of $P_{II}$ and $C_5$ but also the states of $\alpha_R$, $\alpha_P$, and $\alpha_L$. The volume under the surface $P(\phi, \psi)$ around each peak corresponds to the population $W$ of each state. To calculate $W$, the whole (φ, ψ) plane was divided into six states as listed in Table 4.1. For example, the population $W_{P_{II}}$ of the $P_{II}$ state is calculated by the integral of $P(\phi, \psi)$ in the area in which φ and ψ take the $P_{II}$ configuration:

\[
W_{P_{II}} = \int_{(\phi,\psi) \in P_{II}} d\phi \, d\psi \, P(\phi, \psi), \tag{4.77}
\]

where the integration range of (φ, ψ) stands for the range for the corresponding state in Table 4.1. The population of each state at T = 298 K and P = 0.1 MPa is also shown in Table 4.1.

Estimation of the partial molar enthalpy and partial molar volume is important in solution chemistry, because these values control the population of each state when temperature and pressure are changed. It is the MUBATH algorithm that enables us to calculate the partial molar enthalpy and partial molar volume accurately.

[Figure 4.5: four panels (a)–(d), each plotting V / nm³ against t / ns.]

Fig. 4.5. Time series of volume V from (a) the conventional isobaric–isothermal MD simulation at T0 = 240 K and P0 = 0.1 MPa; (b) the conventional isobaric–isothermal MD simulation at T0 = 298 K and P0 = 0.1 MPa; (c) the conventional isobaric–isothermal MD simulation at T0 = 298 K and P0 = 300 MPa; and (d) the multibaric–multithermal MD simulation

Table 4.1. The dihedral-angle ranges of (φ, ψ) for six states and their population at T = 298 K and P = 0.1 MPa, which were obtained by the reweighting techniques from the MUBATH MD simulation

State                 φ                ψ                Population
$P_{II}$              (−100°, 0°)      (30°, −120°)     0.412(18)
$C_5$                 (120°, −100°)    (30°, −120°)     0.496(20)
$\alpha_R$            (−100°, 0°)      (−120°, 30°)     0.041(6)
$\alpha_P$            (120°, −100°)    (−120°, 30°)     0.046(10)
$\alpha_L$            (0°, 120°)       (0°, 120°)       0.004(4)
$C_7^{\mathrm{ax}}$   (0°, 120°)       (120°, 0°)       0.0008(7)

The numbers in parentheses for the population are the estimated uncertainties
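The state assignment of Table 4.1 and a discrete version of the population integral (4.77) can be sketched in Python as below. The sketch assumes that a range written as (lo, hi) with lo > hi wraps through ±180°, that intervals are half-open, and that reweighting factors are supplied per sample. The names `STATE_RANGES`, `classify`, and `populations` are hypothetical and do not come from the original work.

```python
# (phi range, psi range) in degrees for each state, copied from Table 4.1.
STATE_RANGES = {
    "PII":     ((-100.0, 0.0),   (30.0, -120.0)),
    "C5":      ((120.0, -100.0), (30.0, -120.0)),
    "alpha_R": ((-100.0, 0.0),   (-120.0, 30.0)),
    "alpha_P": ((120.0, -100.0), (-120.0, 30.0)),
    "alpha_L": ((0.0, 120.0),    (0.0, 120.0)),
    "C7ax":    ((0.0, 120.0),    (120.0, 0.0)),
}


def wrap(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def in_range(angle, lo, hi):
    """Half-open test lo <= angle < hi; lo > hi means the range wraps at 180."""
    a = wrap(angle)
    if lo <= hi:
        return lo <= a < hi
    return a >= lo or a < hi


def classify(phi, psi):
    """Return the name of the Table 4.1 state containing (phi, psi)."""
    for name, ((p_lo, p_hi), (s_lo, s_hi)) in STATE_RANGES.items():
        if in_range(phi, p_lo, p_hi) and in_range(psi, s_lo, s_hi):
            return name
    raise ValueError("angles fall outside every state range")


def populations(samples, weights=None):
    """Normalized population of each state from (phi, psi) samples.

    weights are optional per-sample reweighting factors (default: all 1).
    """
    if weights is None:
        weights = [1.0] * len(samples)
    totals = {name: 0.0 for name in STATE_RANGES}
    for (phi, psi), w in zip(samples, weights):
        totals[classify(phi, psi)] += w
    norm = sum(totals.values())
    return {name: t / norm for name, t in totals.items()}
```

With these conventions the six ranges tile the whole (φ, ψ) plane, so every sample is assigned to exactly one state.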


Figure 4.6 shows the population ratios $W_{C_5}/W_{P_{II}}$, $W_{\alpha_R}/W_{P_{II}}$, $W_{\alpha_P}/W_{P_{II}}$, and $W_{\alpha_L}/W_{P_{II}}$ as functions of the inverse temperature 1/T at the constant pressure of P = 0.1 MPa. The error bars were estimated by the jackknife method [151]. As temperature increases, $W_{C_5}/W_{P_{II}}$, $W_{\alpha_R}/W_{P_{II}}$, and $W_{\alpha_P}/W_{P_{II}}$ increase, although the error bars of $W_{\alpha_L}/W_{P_{II}}$ are too large to discuss its temperature dependence. Thermodynamics tells us that an increase in temperature at constant pressure causes an increase in enthalpy. The increases in the population ratios $W/W_{P_{II}}$ relative to the $P_{II}$ state with increasing temperature indicate that the enthalpy of the $C_5$, $\alpha_R$, and $\alpha_P$ states is higher than that of the $P_{II}$ state. The difference of partial molar enthalpy ΔH of the $C_5$ state from that of the $P_{II}$ state is, for example, calculated from the derivative of $\log(W_{C_5}/W_{P_{II}})$ with respect to 1/T:

\[
\Delta H = -R \left[ \frac{\partial \log\left(W_{C_5}/W_{P_{II}}\right)}{\partial (1/T)} \right]_P,
\tag{4.78}
\]

where R is the gas constant. The derivative was calculated here by least-squares fitting. The error bars were estimated again by the jackknife method [151]. These enthalpy differences are listed in Table 4.2.
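A minimal sketch of how (4.78) can be evaluated from tabulated population ratios is given below. It assumes that "log" denotes the natural logarithm and that a single straight-line least-squares fit of $\log(W/W_{P_{II}})$ against 1/T over the chosen temperatures is adequate; the jackknife error estimate is not included. The function names and the synthetic data in the usage lines are illustrative only and are not results from the simulation.

```python
R_KJ = 8.314472e-3  # gas constant in kJ mol^-1 K^-1


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s_xy / s_xx


def enthalpy_difference(temperatures, log_ratios):
    """Delta H in kJ mol^-1 from ln(W/W_ref) tabulated at several temperatures.

    Implements Delta H = -R * d ln(W/W_ref) / d(1/T) at constant pressure.
    """
    inverse_t = [1.0 / t for t in temperatures]
    return -R_KJ * least_squares_slope(inverse_t, log_ratios)


# Usage with synthetic data generated from an assumed Delta H of 1.1 kJ/mol.
temps = [260.0, 280.0, 300.0, 320.0, 340.0]
synthetic = [-1.1 / (R_KJ * t) + 0.6 for t in temps]
print(enthalpy_difference(temps, synthetic))
```

Because the synthetic data are exactly linear in 1/T, the fit returns the assumed value up to rounding.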

[Figure 4.6: three panels (a)–(c), each plotting log(W/W_PII) against 1/T / (10⁻³ K⁻¹); curves labeled C5/PII and αR/PII in (a), αP/PII in (b), and αL/PII in (c).]

Fig. 4.6. The population ratios as functions of the inverse of temperature 1/T at constant pressure of P = 0.1 MPa, which were obtained by the reweighting techniques from the results of the multibaric–multithermal MD simulation: (a) those of $W_{C_5}/W_{P_{II}}$ and $W_{\alpha_R}/W_{P_{II}}$, (b) that of $W_{\alpha_P}/W_{P_{II}}$, and (c) that of $W_{\alpha_L}/W_{P_{II}}$

Table 4.2. Differences of partial molar enthalpy ΔH (kJ mol⁻¹) and partial molar volume ΔV (cm³ mol⁻¹) of the $C_5$, $\alpha_R$, $\alpha_P$, and $\alpha_L$ states from that of the $P_{II}$ state

              ΔH (kJ mol⁻¹)               ΔV (cm³ mol⁻¹)
State         MUBATH MD      Raman        MUBATH MD       Raman
$C_5$         1.1 ± 0.9      2.5          0.7 ± 0.9       0.1
$\alpha_R$    10.8 ± 2.8     4.4          −1.2 ± 5.4      1.1
$\alpha_P$    7.2 ± 4.3      −            2.8 ± 2.6       −
$\alpha_L$    −3 ± 56        −            −8.1 ± 11.9     −

The Raman spectroscopy data [152] are also given

[Figure 4.7: three panels (a)–(c), each plotting log(W/W_PII) against P / MPa; curves labeled C5/PII and αR/PII in (a), αP/PII in (b), and αL/PII in (c).]

Fig. 4.7. The population ratios as functions of pressure P at constant temperature of T = 298 K, which were obtained by the reweighting techniques from the results of the multibaric–multithermal MD simulation: (a) those of $W_{C_5}/W_{P_{II}}$ and $W_{\alpha_R}/W_{P_{II}}$, (b) that of $W_{\alpha_P}/W_{P_{II}}$, and (c) that of $W_{\alpha_L}/W_{P_{II}}$

Table 4.2 also lists the experimental data by Raman spectroscopy for the $C_5$ and $\alpha_R$ states [152]. Considering the errors, the differences of the partial molar enthalpy ΔH by the MUBATH MD simulation agree well with those by the Raman spectroscopy.

Figure 4.7 shows the population ratios $W_{C_5}/W_{P_{II}}$, $W_{\alpha_R}/W_{P_{II}}$, $W_{\alpha_P}/W_{P_{II}}$, and $W_{\alpha_L}/W_{P_{II}}$ as functions of pressure P at the constant temperature of T = 298 K. As pressure increases, both $W_{C_5}/W_{P_{II}}$ and $W_{\alpha_P}/W_{P_{II}}$ decrease, although the $W_{\alpha_R}/W_{P_{II}}$ and $W_{\alpha_L}/W_{P_{II}}$ data have error bars too large to discuss their pressure dependence. An increase in pressure at constant temperature generally causes a decrease in volume. The decreases in $W_{C_5}/W_{P_{II}}$ and $W_{\alpha_P}/W_{P_{II}}$ mean that the volumes of the $C_5$ and $\alpha_P$ states are larger than that of the $P_{II}$ state. The difference of partial molar volume ΔV of the $C_5$ state from that of the $P_{II}$ state is, for instance, calculated from the derivative of $\log(W_{C_5}/W_{P_{II}})$ with respect to pressure P by

\[
\Delta V = -RT \left[ \frac{\partial \log\left(W_{C_5}/W_{P_{II}}\right)}{\partial P} \right]_T.
\tag{4.79}
\]

The difference between the partial molar volume of the $\alpha_R$, $\alpha_P$, and $\alpha_L$ states and that of the $P_{II}$ state was also obtained in the same way. These volume differences are shown in Table 4.2. The partial molar volume difference ΔV between $C_5$ and $P_{II}$ and that between $\alpha_R$ and $P_{II}$ obtained by the MUBATH MD simulation agree well with those by the Raman spectroscopy.

The MUBATH method has the merits of both the multicanonical algorithm and the isobaric-isothermal method. It can escape from local-minimum free-energy states and is not restricted to a specific temperature and pressure. From a single MUBATH simulation run, we could obtain thermodynamic quantities at pressures ranging from 1 MPa to several hundred MPa. Hence, this generalized-ensemble algorithm is particularly suitable for studying pressure-induced denaturation of proteins.

The next example is the application of REM MC simulations to the prediction of membrane protein structures [153–156].


It is estimated that 20–30% of all genes in most genomes encode membrane proteins [157]. However, only a small number of detailed structures have been obtained for membrane proteins because of technical difficulties in experiments, such as the growth of high-quality crystals. Therefore, it is desirable to develop a method for predicting membrane protein structures by computer simulations.

Our method consists of two parts. In the first part, amino-acid sequences of the transmembrane helix regions of the target protein are identified. It is already established that the transmembrane helical segments can be predicted by analyzing mainly the hydrophobicity of amino-acid sequences, without having any information about the higher order structures. Many WWW servers exist, such as TMHMM [157], MEMSAT [158], SOSUI [159], and HMMTOP [160], which, given the amino-acid sequence of a protein, judge whether the protein is a membrane protein and, if so, predict the regions in the amino-acid sequence that correspond to the transmembrane helices. In the second part, we perform a REM simulation of the transmembrane helices that were identified in the first part. Given the amino-acid sequences of transmembrane helices, we first construct α-helices of these sequences. For our simulations, we introduce the following rather drastic approximations. (1) We treat the backbone of the α-helices as a rigid body and only side-chain structures are made flexible. (2) We neglect the rest of the amino acids of the membrane protein (such as loop regions). (3) We neglect surrounding molecules such as lipids. In principle, we can also use the molecular dynamics method, but we employ the Monte Carlo algorithm here. We update configurations with rigid translations and rigid rotations of each α-helix and torsion rotations of side chains. We use a standard force field such as CHARMM [161, 162] for the potential energy of the system.
We also add the following simple harmonic constraints to the original force-field energy:

\[
\begin{aligned}
E_{\mathrm{constr}} = {} & \sum_{i=1}^{N_H - 1} k_1 \, \theta\left(r_{i,i+1} - d_{i,i+1}\right) \left[ r_{i,i+1} - d_{i,i+1} \right]^2 \\
& + \sum_{i=1}^{N_H} \left\{ k_2 \, \theta\left( \left|z_i^L - z_0^L\right| - d_i^L \right) \left[ \left|z_i^L - z_0^L\right| - d_i^L \right]^2
+ k_2 \, \theta\left( \left|z_i^U - z_0^U\right| - d_i^U \right) \left[ \left|z_i^U - z_0^U\right| - d_i^U \right]^2 \right\} \\
& + \sum_{C_\alpha} k_3 \, \theta\left(r_{C_\alpha} - d_{C_\alpha}\right) \left[ r_{C_\alpha} - d_{C_\alpha} \right]^2,
\end{aligned}
\tag{4.80}
\]

where $N_H$ is the total number of transmembrane helices in the protein and $\theta(x)$ is the step function:

\[
\theta(x) =
\begin{cases}
1, & \text{for } x \ge 0, \\
0, & \text{otherwise,}
\end{cases}
\tag{4.81}
\]

and $k_1$, $k_2$, and $k_3$ are the force constants of the harmonic constraints; $r_{i,i+1}$ is the distance between the C atom of the C-terminus of the $i$th helix and the N atom of the N-terminus of the $(i+1)$th helix; $z_i^L$ and $z_i^U$ are the z-coordinate values of the $C_\alpha$ (or C) atom of the N-terminus (or C-terminus) of the $i$th helix near the fixed lower boundary value $z_0^L$ and the upper boundary value $z_0^U$ of the membrane, respectively; $r_{C_\alpha}$ are the distances of $C_\alpha$ atoms from the origin; and $d_{i,i+1}$, $d_i^L$, $d_i^U$, and $d_{C_\alpha}$ are the corresponding central value constants of the harmonic constraints.

The first term in (4.80) is the energy that constrains pairs of adjacent helices along the amino-acid chain not to be apart from each other too much (loop constraints). This term has a nonzero value only when the distance $r_{i,i+1}$ becomes longer than $d_{i,i+1}$. The second term in (4.80) is the energy that constrains the helix N-terminus and C-terminus to be located near the membrane boundary planes. This term has a nonzero value only when the C atom of each helix C-terminus and the $C_\alpha$ atom of each helix N-terminus are apart from the corresponding boundary plane by more than $d_i^L$ (or $d_i^U$). Based on the knowledge that most membrane proteins are placed in parallel, this constraint energy is included so that helices are not too much apart from the perpendicular orientation with respect to the membrane boundary planes. The third term in (4.80) is the energy that constrains all $C_\alpha$ atoms within the sphere (centered at the origin) of radius $d_{C_\alpha}$. This term has a nonzero value only when $C_\alpha$ atoms go out of this sphere. The term is introduced so that the center of mass of the molecule stays near the origin. The radius of the sphere is set to a large value to guarantee that a wide conformational space is sampled.

In the first part of the present method, we obtain amino-acid sequences of the transmembrane helix regions from existing WWW servers such as those in [157–160].
However, the precision of these programs in the WWW servers is about 85% and needs improvement. We thus focus our attention on the effectiveness of the second part of our method, leaving this improvement to the developers of the WWW servers. Namely, we use the experimentally known amino-acid sequence of helices (without relying on the WWW servers) and try to predict their conformations, following the prescription of the second part of our method described earlier.

The results that we present here are those of bacteriorhodopsin [156]. We thus have $N_H = 7$. Other parameter values that we used in (4.80) are $k_1 = 1.0$ (kcal mol⁻¹) Å⁻², $d_{i,i+1} = 20.0$ Å, $k_2 = 1.0$ (kcal mol⁻¹) Å⁻², $z_0^L = 0.0$ Å, $z_0^U = 31.5$ Å, $d_i^L = d_i^U = 2.0$ Å, $k_3 = 0.05$ (kcal mol⁻¹) Å⁻², and $d_{C_\alpha} = 100$ Å. We performed a REM MC simulation of 168,000,000 MC steps. We used the following 32 temperatures: 200, 218, 238, 260, 284, 310, 338, 369, 410, 455, 505, 561, 623, 691, 768, 853, 947, 1,052, 1,125, 1,202, 1,285, 1,374, 1,469, 1,642, 1,835, 2,051, 2,293, 2,679, 3,132, 3,660, 4,278, and 5,000 K. This temperature distribution was chosen so that all the acceptance ratios of replica exchange are almost uniform and sufficiently large (>10%) for computational efficiency. The highest temperature was chosen sufficiently high so that no trapping in local-minimum-energy states occurs. Replica exchange was attempted once every 50 MC steps.
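The constraint energy (4.80) with the parameter values listed above can be written as a short Python function. This is a sketch under two assumptions: the membrane term is taken on the absolute distance $|z - z_0|$ from each boundary plane, which is how the description of the second term reads, and the distances and z-coordinates are assumed to have been computed elsewhere and passed in as plain lists in Å. The function and argument names are illustrative.

```python
def step(x):
    """Step function of (4.81): 1 for x >= 0, otherwise 0."""
    return 1.0 if x >= 0.0 else 0.0


def harmonic_penalty(k, excess):
    """k * theta(excess) * excess**2, nonzero only beyond the allowed range."""
    return k * step(excess) * excess ** 2


def constraint_energy(loop_distances, z_lower, z_upper, ca_radii,
                      k1=1.0, d_loop=20.0,
                      k2=1.0, z0_lower=0.0, z0_upper=31.5,
                      d_lower=2.0, d_upper=2.0,
                      k3=0.05, d_ca=100.0):
    """Constraint energy of (4.80) in kcal/mol (lengths in Angstrom).

    loop_distances : r_{i,i+1} for the N_H - 1 pairs of adjacent helices
    z_lower        : z-coordinates of helix ends near the lower boundary
    z_upper        : z-coordinates of helix ends near the upper boundary
    ca_radii       : distances of all C-alpha atoms from the origin
    """
    e_loop = sum(harmonic_penalty(k1, r - d_loop) for r in loop_distances)
    e_lower = sum(harmonic_penalty(k2, abs(z - z0_lower) - d_lower)
                  for z in z_lower)
    e_upper = sum(harmonic_penalty(k2, abs(z - z0_upper) - d_upper)
                  for z in z_upper)
    e_sphere = sum(harmonic_penalty(k3, r - d_ca) for r in ca_radii)
    return e_loop + e_lower + e_upper + e_sphere
```

For example, a single loop stretched to 25 Å contributes 25 kcal/mol, a helix end 3 Å from the lower plane contributes 1 kcal/mol, and a $C_\alpha$ atom 110 Å from the origin contributes 5 kcal/mol, while a configuration inside all allowed ranges gives zero.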


Fig. 4.8. Typical snapshots from the REM simulation for Replica 14. The configurations were taken at the 43,146,000th MC step (a), at the 47,664,000th MC step (b), at the 48,155,000th MC step (c), at the 48,822,000th MC step (d), at the 49,500,000th MC step (e), and at the 58,398,000th MC step (f). The RMSD from the native configuration is 7.78 Å (a), 10.84 Å (b), 15.18 Å (c), 14.76 Å (d), 11.71 Å (e), and 5.72 Å (f) with respect to all Cα atoms. The corresponding temperatures are 3,132 K (a), 2,679 K (b), 3,132 K (c), 3,132 K (d), 2,051 K (e), and 561 K (f). The color of the helices from the N terminus is as follows: Helix A (blue), Helix B (aqua), Helix C (green), Helix D (yellow-green), Helix E (yellow), Helix F (orange), and Helix G (red). The figures were created with RasMol [163]

In Fig. 4.8, typical snapshots of one of the 32 replicas, Replica 14, from the REM simulation are shown. In Fig. 4.8a, the helix conﬁguration is diﬀerent from the native one (see Fig. 4.9a below). In particular, Helix G is trapped in the center. As the simulation proceeds, the temperature becomes high and then drops to low values by the replica-exchange process, and the same helix conﬁguration (“topology”) as the native one is ﬁnally obtained in Fig. 4.8f. These ﬁgures conﬁrm that our simulations indeed sampled a wide conﬁgurational space. We see that the REM simulation performs random walks not only in energy space but also in conformational space and that they do not get trapped in one of a huge number of local-minimum-energy states. In Fig. 4.9, the PDB structure and the smallest RMSD structure obtained by the REM simulation are compared. The retinal molecule is included in the native PDB structure (Fig. 4.9a), but it was not used in our simulation. Nevertheless, the structure obtained by Replica 14 (Fig. 4.9b) has the same

88

Y. Okamoto

Fig. 4.9. (a) The PDB structure of bacteriorhodopsin (PDB code: 1C3W) with retinal. (b) The smallest RMSD conﬁguration that was obtained by the REM simulation. (a1), (a2) and (b1), (b2) are the same structures viewed from diﬀerent angles (from top and from side), respectively. Dark-color atoms in the center in (a) represent the retinal (a) was drawn by eliminating the loop regions and lipids from the PDB ﬁle. The RMSD of the structure in (b) from the native structure of (a) is 4.42 ˚ A with respect to all Cα atoms. The ﬁgures were created with RasMol [163]

helix topology (relative helix configuration) as the native structure. The two structures are indeed quite similar. We remark that the initial conformation of Replica 14 is very different from the native one (RMSD = 16.39 Å). It is remarkable that we could obtain a native-like structure from a random initial conformation, even though we neglected loop regions, retinal, lipids, and surrounding water molecules in our simulation. This suggests that helix–helix interactions are the main driving force in the final stage of the structure formation of membrane proteins.

The final example is an application of REMD simulations to the folding of a small protein, namely, the B1 domain of streptococcal protein G [164]. The simulations were performed on the Earth Simulator. Protein G consists of 56 amino acids, and the total number of atoms in the protein is 855. For the force fields, we used OPLS-AA/L [165] for the protein molecule and TIP3P [143] for water molecules. We first performed a REMD simulation of protein G in vacuum with 96 replicas. The initial conformation of the REMD simulation was a fully extended one. We then solvated one of the obtained

4 Generalized-Ensemble Algorithms for Studying Protein Folding


Fig. 4.10. The canonical probability distributions of the total potential energy of protein G obtained from the REMD simulation with 224 temperatures. They are all bell-shaped with suﬃcient overlaps with the neighboring ones

Fig. 4.11. Snapshots from the REMD simulation of protein G in explicit solvent

compact conformations in a sphere of water of radius 50 Å. The total number of water molecules was 17,187 (the total number of atoms was then 52,416 including the protein atoms). Using 112 nodes of the Earth Simulator, we performed a REMD simulation of this system with 224 replicas. The REMD simulation was successful in the sense that we observed a random walk in potential energy space, which suggests that a wide conformational space was sampled. In Fig. 4.10 we show the canonical probability distributions of the total potential energy at the corresponding 224 temperatures ranging from 250 to 700 K. As is clear from the figure, each distribution overlaps sufficiently with its neighbors, which is consistent with the random walk in potential energy space that we observed. This random walk in potential energy space induced a random walk in conformational space, and we observed many occasions of the formation of native-like secondary structures (α-helices and β-strands) during the REMD simulation. In Fig. 4.11 we show some of the snapshots from this REMD simulation. Although we observed many native-like secondary-structure formations, the simulation has not yet reached the native structure. We have to improve the force-field parameters and need more computation time.
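The link between overlapping energy distributions and successful exchanges can be illustrated with a minimal Python sketch. It assumes the standard Metropolis criterion for temperature exchange, and it assumes a geometric temperature ladder, which is a common choice but is not stated in the text for this simulation. The function names and the example energies are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures with a constant ratio between neighbors (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(t_m, t_n, e_i, e_j):
    """Metropolis probability of swapping replica i (at t_m) with replica j (at t_n).

    e_i and e_j are the current potential energies in kcal/mol.
    """
    beta_m = 1.0 / (K_B * t_m)
    beta_n = 1.0 / (K_B * t_n)
    delta = (beta_n - beta_m) * (e_i - e_j)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


# 224 temperatures between 250 K and 700 K, as in the simulation described above.
temps = geometric_ladder(250.0, 700.0, 224)

# Hypothetical energies: the swap is always accepted when the colder replica
# currently holds the higher energy, and is penalized otherwise.
p_favorable = exchange_probability(temps[0], temps[1], -100.0, -110.0)
p_unfavorable = exchange_probability(temps[0], temps[1], -110.0, -100.0)
```

When neighboring distributions overlap well, energy pairs with small or negative values of the exponent occur often, so exchanges are accepted frequently and each replica can diffuse through temperature space.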


4.5 Conclusions

In this article, we have reviewed some powerful generalized-ensemble algorithms for both Monte Carlo simulations and molecular dynamics simulations. A simulation in a generalized ensemble realizes a random walk in potential energy space, alleviating the multiple-minima problem that is a common difficulty in simulations of complex systems with many degrees of freedom. Detailed formulations of the two well-known generalized-ensemble algorithms, namely, the multicanonical algorithm (MUCA) and the replica-exchange method (REM), were given. We then introduced further extensions of these two methods. We have shown the effectiveness of these algorithms by applying them to various biomolecular systems.

Acknowledgements

The author thanks his co-workers for useful discussions. In particular, he is grateful to Drs. B.A. Berg, M. Kawata, A. Kitao, H. Kokubo, M. Mikami, A. Mitsutake, C. Muguruma, T. Nishikawa, T. Okabe, H. Okumura, Y. Sugita, and T. Yoda for collaborations that led to the results presented in the present article. The computations were performed on the Earth Simulator, computers at the Computer Center in the Institute for Molecular Science, and those at the Nagoya University Computer Center. This work was supported, in part, by Grants-in-Aid for Scientific Research in Priority Areas (“Water and Biomolecules”), for the Next Generation Super Computing Project, Nanoscience Program from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, and for the JST-BIRD Project.

References

1. U.H.E. Hansmann, Y. Okamoto, in Annual Reviews of Computational Physics VI, ed. by D. Stauffer (World Scientific, Singapore, 1999), pp. 129–157
2. A. Mitsutake, Y. Sugita, Y. Okamoto, Biopolymers (Peptide Science) 60, 96–123 (2001)
3. Y. Sugita, Y. Okamoto, in Lecture Notes in Computational Science and Engineering, ed. by T. Schlick, H.H. Gan (Springer-Verlag, Berlin, 2002), pp. 304–332; e-print: cond-mat/0102296
4. Y. Okamoto, J. Mol. Graphics Mod. 22, 425–439 (2004); e-print: cond-mat/0308360
5. H. Kokubo, Y. Okamoto, Mol. Sim. 32, 791–801 (2006)
6. S.G. Itoh, H. Okumura, Y. Okamoto, Mol. Sim. 33, 47–56 (2007)
7. Y. Sugita, A. Mitsutake, Y. Okamoto, in Lecture Notes in Physics, ed. by W. Janke (Springer-Verlag, Berlin, 2008), pp. 369–407; e-print: arXiv:0707.3382v1 [cond-mat.stat-mech]
8. A.M. Ferrenberg, R.H. Swendsen, Phys. Rev. Lett. 61, 2635–2638 (1988); 63, 1658 (1989)


9. A.M. Ferrenberg, R.H. Swendsen, Phys. Rev. Lett. 63, 1195–1198 (1989)
10. S. Kumar, D. Bouzida, R.H. Swendsen, P.A. Kollman, J.M. Rosenberg, J. Comput. Chem. 13, 1011–1021 (1992)
11. B.A. Berg, T. Neuhaus, Phys. Lett. B 267, 249–253 (1991)
12. B.A. Berg, T. Neuhaus, Phys. Rev. Lett. 68, 9–12 (1992)
13. B.A. Berg, Fields Institute Communications 26, 1–24 (2000); also see e-print: cond-mat/9909236
14. W. Janke, Physica A 254, 164–178 (1998)
15. J. Lee, Phys. Rev. Lett. 71, 211–214 (1993); 71, 2353 (1993)
16. M. Mezei, J. Comput. Phys. 68, 237–248 (1987)
17. C. Bartels, M. Karplus, J. Phys. Chem. B 102, 865–880 (1998)
18. G.M. Torrie, J.P. Valleau, J. Comput. Phys. 23, 187–199 (1977)
19. J.S. Wang, R.H. Swendsen, J. Stat. Phys. 106, 245–285 (2002)
20. F. Wang, D.P. Landau, Phys. Rev. Lett. 86, 2050–2053 (2001)
21. F. Wang, D.P. Landau, Phys. Rev. E 64, 056101 (2001)
22. Q. Yan, R. Faller, J.J. de Pablo, J. Chem. Phys. 116, 8745–8749 (2002)
23. S. Trebst, D.A. Huse, M. Troyer, Phys. Rev. E 70, 046701 (2004)
24. B.A. Berg, T. Celik, Phys. Rev. Lett. 69, 2292–2295 (1992)
25. B.A. Berg, U.H.E. Hansmann, T. Neuhaus, Phys. Rev. B 47, 497–500 (1993)
26. W. Janke, S. Kappler, Phys. Rev. Lett. 74, 212–215 (1995)
27. B.A. Berg, W. Janke, Phys. Rev. Lett. 80, 4771–4774 (1998)
28. N. Hatano, J.E. Gubernatis, Prog. Theor. Phys. (Suppl.) 138, 442–447 (2000)
29. B.A. Berg, A. Billoire, W. Janke, Phys. Rev. B 61, 12143–12150 (2000)
30. U.H.E. Hansmann, Y. Okamoto, J. Comput. Chem. 14, 1333–1338 (1993)
31. U.H.E. Hansmann, Y. Okamoto, Physica A 212, 415–437 (1994)
32. M.H. Hao, H.A. Scheraga, J. Phys. Chem. 98, 4940–4948 (1994)
33. Y. Okamoto, U.H.E. Hansmann, J. Phys. Chem. 99, 11276–11287 (1995)
34. N.B. Wilding, Phys. Rev. E 52, 602–611 (1995)
35. A. Kolinski, W. Galazka, J. Skolnick, Proteins 26, 271–287 (1996)
36. N. Urakami, M. Takasu, J. Phys. Soc. Jpn. 65, 2694–2699 (1996)
37. S. Kumar, P. Payne, M. Vásquez, J. Comput. Chem. 17, 1269–1275 (1996)
38. U.H.E. Hansmann, Y. Okamoto, F. Eisenmenger, Chem. Phys. Lett. 259, 321–330 (1996)
39. U.H.E. Hansmann, Y. Okamoto, Phys. Rev. E 54, 5863–5865 (1996)
40. U.H.E. Hansmann, Y. Okamoto, J. Comput. Chem. 18, 920–933 (1997)
41. H. Noguchi, K. Yoshikawa, Chem. Phys. Lett. 278, 184–188 (1997)
42. N. Nakajima, H. Nakamura, A. Kidera, J. Phys. Chem. B 101, 817–824 (1997)
43. C. Bartels, M. Karplus, J. Comput. Chem. 18, 1450–1462 (1997)
44. J. Higo, N. Nakajima, H. Shirai, A. Kidera, H. Nakamura, J. Comput. Chem. 18, 2086–2092 (1997)
45. Y. Iba, G. Chikenji, M. Kikuchi, J. Phys. Soc. Jpn. 67, 3327–3330 (1998)
46. A. Mitsutake, U.H.E. Hansmann, Y. Okamoto, J. Mol. Graphics Mod. 16, 226–238; 262–263 (1998)
47. U.H.E. Hansmann, Y. Okamoto, J. Phys. Chem. B 103, 1595–1604 (1999)
48. H. Shimizu, K. Uehara, K. Yamamoto, Y. Hiwatari, Mol. Sim. 22, 285–301 (1999)
49. S. Ono, N. Nakajima, J. Higo, H. Nakamura, Chem. Phys. Lett. 312, 247–254 (1999)
50. A. Mitsutake, Y. Okamoto, J. Chem. Phys. 112, 10638–10647 (2000)


51. K. Sayano, H. Kono, M.M. Gromiha, A. Sarai, J. Comput. Chem. 21, 954–962 (2000)
52. F. Yasar, T. Celik, B.A. Berg, H. Meirovitch, J. Comput. Chem. 21, 1251–1261 (2000)
53. A. Mitsutake, M. Kinoshita, Y. Okamoto, F. Hirata, Chem. Phys. Lett. 329, 295–303 (2000)
54. M.S. Cheung, A.E. Garcia, J.N. Onuchic, Proc. Natl. Acad. Sci. U.S.A. 99, 685–690 (2002)
55. N. Kamiya, J. Higo, H. Nakamura, Protein Sci. 11, 2297–2307 (2002)
56. S.W. Jang, Y. Pak, S.M. Shin, J. Chem. Phys. 116, 4782–4786 (2002)
57. J.G. Kim, Y. Fukunishi, H. Nakamura, Phys. Rev. E 67, 011105 (2003)
58. N. Rathore, T.A. Knotts IV, J.J. de Pablo, J. Chem. Phys. 118, 4285–4290 (2003)
59. T. Terada, Y. Matsuo, A. Kidera, J. Chem. Phys. 118, 4306–4311 (2003)
60. B.A. Berg, H. Noguchi, Y. Okamoto, Phys. Rev. E 68, 036126 (2003)
61. M. Bachmann, W. Janke, Phys. Rev. Lett. 91, 208105 (2003)
62. H. Okumura, Y. Okamoto, Chem. Phys. Lett. 383, 391–396 (2004)
63. H. Okumura, Y. Okamoto, Chem. Phys. Lett. 391, 248–253 (2004)
64. S.G. Itoh, Y. Okamoto, Chem. Phys. Lett. 400, 308–313 (2004)
65. S.G. Itoh, Y. Okamoto, Phys. Rev. E 76, 026705 (2007)
66. T. Munakata, S. Oyama, Phys. Rev. E 54, 4394–4398 (1996)
67. K. Hukushima, K. Nemoto, J. Phys. Soc. Jpn. 65, 1604–1608 (1996)
68. K. Hukushima, H. Takayama, K. Nemoto, Int. J. Mod. Phys. C 7, 337–344 (1996)
69. C.J. Geyer, in Computing Science and Statistics: Proc. 23rd Symp. on the Interface, ed. by E.M. Keramidas (Interface Foundation, Fairfax Station, 1991), pp. 156–163
70. R.H. Swendsen, J.-S. Wang, Phys. Rev. Lett. 57, 2607–2609 (1986)
71. K. Kimura, K. Taki, in Proc. 13th IMACS World Cong. on Computation and Appl. Math. (IMACS ’91), ed. by R. Vichnevetsky, J.J.H. Miller, vol. 2, pp. 827–828
72. D.D. Frantz, D.L. Freeman, J.D. Doll, J. Chem. Phys. 93, 2769–2784 (1990)
73. M.C. Tesi, E.J.J. van Rensburg, E. Orlandini, S.G. Whittington, J. Stat. Phys. 82, 155–181 (1996)
74. E. Marinari, G. Parisi, J.J. Ruiz-Lorenzo, in Spin Glasses and Random Fields, ed. by A.P. Young (World Scientific, Singapore, 1998), pp. 59–98
75. Y. Iba, Int. J. Mod. Phys. C 12, 623–656 (2001)
76. U.H.E. Hansmann, Chem. Phys. Lett. 281, 140–150 (1997)
77. Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 314, 141–151 (1999)
78. A. Irbäck, E. Sandelin, J. Chem. Phys. 110, 12256–12262 (1999)
79. M.G. Wu, M.W. Deem, Mol. Phys. 97, 559–580 (1999)
80. Y. Sugita, A. Kitao, Y. Okamoto, J. Chem. Phys. 113, 6042–6051 (2000)
81. C.J. Woods, J.W. Essex, M.A. King, J. Phys. Chem. B 107, 13703–13710 (2003)
82. Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 329, 261–270 (2000)
83. A. Mitsutake, Y. Okamoto, Chem. Phys. Lett. 332, 131–138 (2000)
84. D. Gront, A. Kolinski, J. Skolnick, J. Chem. Phys. 113, 5065–5071 (2000)
85. G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone, P.W. Rose, Chem. Phys. Lett. 337, 181–189 (2001)

86. H. Fukunishi, O. Watanabe, S. Takada, J. Chem. Phys. 116, 9058–9067 (2002)
87. A. Mitsutake, Y. Sugita, Y. Okamoto, J. Chem. Phys. 118, 6664–6675 (2003)
88. A. Mitsutake, Y. Sugita, Y. Okamoto, J. Chem. Phys. 118, 6676–6688 (2003)
89. A. Sikorski, P. Romiszowski, Biopolymers 69, 391–398 (2003)
90. C.Y. Lin, C.K. Hu, U.H.E. Hansmann, Proteins 52, 436–445 (2003)
91. G. La Penna, A. Mitsutake, M. Masuya, Y. Okamoto, Chem. Phys. Lett. 380, 609–619 (2003)
92. M. Falcioni, M.W. Deem, J. Chem. Phys. 110, 1754–1766 (1999)
93. Q. Yan, J.J. de Pablo, J. Chem. Phys. 111, 9509–9516 (1999)
94. T. Nishikawa, H. Ohtsuka, Y. Sugita, M. Mikami, Y. Okamoto, Prog. Theor. Phys. (Suppl.) 138, 270–271 (2000)
95. D.A. Kofke, J. Chem. Phys. 117, 6911–6914 (2002)
96. T. Okabe, M. Kawata, Y. Okamoto, M. Mikami, Chem. Phys. Lett. 335, 435–439 (2001)
97. Y. Ishikawa, Y. Sugita, T. Nishikawa, Y. Okamoto, Chem. Phys. Lett. 333, 199–206 (2001)
98. A.E. Garcia, K.Y. Sanbonmatsu, Proteins 42, 345–354 (2001)
99. R.H. Zhou, B.J. Berne, R. Germain, Proc. Natl. Acad. Sci. U.S.A. 98, 14931–14936 (2001)
100. A.E. Garcia, K.Y. Sanbonmatsu, Proc. Natl. Acad. Sci. U.S.A. 99, 2782–2787 (2002)
101. R.H. Zhou, B.J. Berne, Proc. Natl. Acad. Sci. U.S.A. 99, 12777–12782 (2002)
102. M. Feig, A.D. MacKerell, C.L. Brooks III, J. Phys. Chem. B 107, 2831–2836 (2003)
103. Y.M. Rhee, V.S. Pande, Biophys. J. 84, 775–786 (2003)
104. D. Paschek, A.E. Garcia, Phys. Rev. Lett. 93, 238105 (2004)
105. D. Paschek, S. Gnanakaran, A.E. Garcia, Proc. Natl. Acad. Sci. U.S.A. 102, 6765–6770 (2005)
106. J.W. Pitera, W. Swope, Proc. Natl. Acad. Sci. U.S.A. 100, 7587–7592 (2003)
107. M.K. Fenwick, F.A. Escobedo, Biopolymers 68, 160–177 (2003)
108. A. Mitsutake, Y. Okamoto, J. Chem. Phys. 121, 2491–2504 (2004)
109. M.K. Fenwick, F.A. Escobedo, J. Chem. Phys. 119, 11998–12010 (2003)
110. K. Murata, Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 385, 1–7 (2004)
111. A.K. Felts, Y. Harano, E. Gallicchio, R.M. Levy, Proteins 56, 310 (2004)
112. A. Mitsutake, M. Kinoshita, Y. Okamoto, F. Hirata, J. Phys. Chem. B 108, 19002–19012 (2004)
113. A. Baumketner, J.E. Shea, Biophys. J. 89, 1493 (2005)
114. T. Yoda, Y. Sugita, Y. Okamoto, Proteins 66, 846–859 (2007)
115. A.E. Roitberg, A. Okur, C. Simmerling, J. Phys. Chem. B 111, 2415–2418 (2007)
116. T.W. Whitfield, L. Bu, J.E. Straub, Physica A 305, 157–171 (2002)
117. W. Kwak, U.H.E. Hansmann, Phys. Rev. Lett. 95, 138102 (2005)
118. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, J. Chem. Phys. 21, 1087–1092 (1953)
119. S. Nosé, Mol. Phys. 52, 255–268 (1984)
120. S. Nosé, J. Chem. Phys. 81, 511–519 (1984)
121. H.C. Andersen, J. Chem. Phys. 72, 2384 (1980)
122. I.R. McDonald, Mol. Phys. 23, 41 (1972)
123. H. Okumura, Y. Okamoto, Phys. Rev. E 70, 026702 (2004)
124. H. Okumura, Y. Okamoto, J. Phys. Soc. Jpn. 73, 3304–3311 (2004)
125. H. Okumura, Y. Okamoto, J. Comput. Chem. 27, 379–395 (2006)
126. B.A. Berg, C. Muguruma, Y. Okamoto, Phys. Rev. B 75, 092202 (2007)
127. C. Muguruma, Y. Okamoto, B.A. Berg, Phys. Rev. E 78, 041113 (2008)
128. L. Pauling, J. Am. Chem. Soc. 57, 2680 (1935)
129. W.F. Giauque, M. Ashley, Phys. Rev. 43, 81 (1933)
130. National Institute of Standards and Technology (NIST) at http://physics.nist.gov/cuu/
131. W.F. Giauque, J.W. Stout, J. Am. Chem. Soc. 58, 1144 (1936)
132. L. Onsager, M. Dupuis, Re. Scu. Int. Fis. ‘Enrico Fermi’ 10, 294 (1960)
133. J.F. Nagle, J. Math. Phys. 7, 1484 (1966)
134. O. Haida, T. Matsuo, H. Suga, S. Seki, J. Chem. Thermodynamics 6, 815 (1974)
135. B.A. Berg, 2005 (unpublished)
136. Y. Sugita, Y. Okamoto, Biophys. J. 88, 3180–3190 (2005)
137. K.R. Shoemaker, P.S. Kim, E.J. York, J.M. Stewart, R.L. Baldwin, Nature 326, 563–567 (1987)
138. K.R. Shoemaker, R. Fairman, D.A. Schultz, A.D. Robertson, E.J. York, J.M. Stewart, R.L. Baldwin, Biopolymers 29, 1–11 (1990)
139. P.J. Kraulis, J. Appl. Crystallogr. 24, 946–950 (1991)
140. E.A. Merritt, D.J. Bacon, Methods Enzymol. 277, 505–524 (1997)
141. J. Wang, P. Cieplak, P.A. Kollman, J. Comput. Chem. 21, 1049–1074 (2000)
142. T. Yoda, Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 386, 460–467 (2004)
143. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79, 926–935 (1983)
144. H. Okumura, Y. Okamoto, Bull. Chem. Soc. Jpn. 80, 1114–1123 (2007)
145. H. Okumura, Y. Okamoto, J. Phys. Chem. B 112, 12038–12049 (2008)
146. P.A. Kollman, R. Dixon, W. Cornell, T. Fox, C. Chipot, A. Pohorille, in Computer Simulation of Biomolecular Systems, Vol. 3, ed. by A. Wilkinson, P. Weiner, W.F. van Gunsteren (Kluwer, Dordrecht, 1997), pp. 83–96
147. H. Okumura, S.G. Itoh, Y. Okamoto, J. Chem. Phys. 126, 084103 (2007)
148. S.D. Bond, B.J. Leimkuhler, B.B. Laird, J. Comput. Phys. 151, 114 (1999)
149. S. Nosé, J. Phys. Soc. Jpn. 70, 75 (2001)
150. T.F. Miller, M. Eleftheriou, P. Pattnaik, A. Ndirango, D. Newns, G.J. Martyna, J. Chem. Phys. 116, 8649 (2002)
151. B.A. Berg, Introduction to Monte Carlo Simulations and Their Statistical Analysis (World Scientific, Singapore, 2004)
152. T. Takekiyo, T. Imai, M. Kato, Y. Taniguchi, Biopolymers 73, 283 (2004)
153. H. Kokubo, Y. Okamoto, Chem. Phys. Lett. 383, 397–402 (2004)
154. H. Kokubo, Y. Okamoto, J. Chem. Phys. 120, 10837–10847 (2004)
155. H. Kokubo, Y. Okamoto, J. Phys. Soc. Jpn. 73, 2571–2585 (2004)
156. H. Kokubo, Y. Okamoto, Chem. Phys. Lett. 392, 168–175 (2004)
157. A. Krogh, B. Larsson, G.v. Heijne, E.L.L. Sonnhammer, J. Mol. Biol. 305, 567 (2001)
158. D.T. Jones, W.R. Taylor, J.M. Thornton, Biochemistry 33, 3038 (1994)
159. T. Hirokawa, S. Boon-Chieng, S. Mitaku, Bioinformatics 14, 378 (1998)
160. G.E. Tusnady, I. Simon, J. Mol. Biol. 283, 489 (1998)
161. W.E. Reiher III, Theoretical Studies of Hydrogen Bonding, Ph.D. Thesis, Department of Chemistry, Harvard University, Cambridge, MA, USA, 1985
162. E. Neria, S. Fischer, M. Karplus, J. Chem. Phys. 105, 1902 (1996)
163. R.A. Sayle, E.J. Milner-White, Trends Biochem. Sci. 20, 374 (1995)
164. A. Mitsutake, Y. Sugita, T. Yoda, T. Nishikawa, Y. Okamoto, in preparation
165. G.A. Kaminski, R.A. Friesner, J. Tirado-Rives, W.L. Jorgensen, J. Phys. Chem. B 105, 474 (2001)

5 Protein Folding and Binding: Effective Potentials, Replica Exchange Simulations, and Network Models

A.K. Felts, M. Andrec, E. Gallicchio, and R.M. Levy

Abstract. Advances in computational biophysics depend on the development of accurate effective potentials and powerful sampling methods to traverse rugged energy landscapes. We have developed an approach that makes use of the combined power of replica exchange simulations and a network model for kinetics. We carry out replica exchange simulations to generate a very large set of states using an all-atom effective potential function and construct a kinetic model for the folding, using an ansatz that allows kinetic transitions between states based on structural similarity. We are also using replica exchange simulations to study the binding of ligands to proteins such as cytochrome P450. A better understanding of the relationship between the physical kinetics of the systems being studied and their “kinetics” in the replica exchange ensemble is needed to use this new technology to maximum advantage. To illustrate some of the challenges, we will discuss the results of using a network model to “simulate” replica exchange simulations of protein folding.

5.1 Introduction

Molecular simulations of protein structural changes and ligand binding are built upon two foundations: (1) the design of effective potentials, which are matched with the requirements of accuracy and speed appropriate to particular modeling problems, and (2) the design of algorithms to sample the effective potentials in highly efficient ways so as to facilitate the convergence of the simulations in a thermodynamic sense. Developing algorithms to satisfy the competing goals of accuracy and speed is at the heart of the problem.

The protein folding problem is of fundamental importance in modern structural biology. Recent advances in experimental techniques have helped to elucidate thermodynamic and kinetic mechanisms that underlie different stages of the folding process [1–6]. Computer simulations performed at various levels of molecular detail have played a central role in the interpretation of experimental studies. Molecular simulations using models based on fully atomic representations are becoming more accurate and more practical and are increasingly


employed to simulate protein folding and predict protein structures [7–15]. Because of the large number of degrees of freedom, however, these simulations require extensive computer resources to obtain meaningful results, especially with explicit solvent models [16]. Because of this, many recent computational studies have been carried out with implicit solvent models [15, 17–20]. The question of how well implicit solvent effective potentials, when combined with detailed atomic protein models, can predict thermodynamic as well as kinetic aspects of protein folding is under active investigation [9, 10, 12, 13, 15, 19–27].

Numerous stringent requirements make the development of practically useful solvation free energy models for biological applications very challenging. To be applicable to ligand binding affinity prediction, the model should be accurate over a wide range of molecular sizes and over a wide range of functional groups. To study protein folding, allosteric reactions, and flexible receptor and ligand docking, the model must be able to describe hydration free energy differences between different molecules as well as between different conformations of the same molecule. Finally, the model needs to be computationally efficient and should be expressed in analytical form with analytical gradients for seamless incorporation in a molecular mechanics code to perform conformational sampling and energy optimization calculations. Although models with some of these characteristics exist [9, 14, 22, 28–33], only a few meet all the above requirements.

In modern implicit solvent models [31], the solvation free energy is typically decomposed into a nonpolar component and an electrostatic component. Dielectric continuum methods account for the electrostatic component by treating the water solvent as a uniform high-dielectric continuum [34]. Methods based on the numerical solution of the Poisson–Boltzmann (PB) equation [35, 36] provide a virtually exact representation of the response of the solvent within the dielectric continuum approximation. Their computational complexity is, however, still comparable to that of explicit solvent models, and they are not easily integrated in molecular dynamics simulation programs. Recent advances extending dielectric continuum approaches have focused on the development of Generalized Born (GB) models [22, 37], which have been shown to reproduce PB [33, 38, 39] and explicit solvent [40] results with good accuracy at a fraction of the computational expense. The development of computationally efficient analytical and differentiable GB methods with gradients based on pairwise descreening schemes [41, 42] has made possible the integration of GB models in molecular dynamics packages for biological simulations [29, 43–45].

Although nonpolar hydration forces dominate whenever hydrophobic interactions [46] are important, accurate models for the nonpolar component of the hydration free energy are not generally available. The structure and properties of proteins in water are strongly influenced by hydrophobic interactions [47–50]. Hydrophobic interactions also play a key role in the mechanism of ligand binding to proteins [30, 51–53]. Empirical surface area models [54] for the nonpolar component of the solvation free energy are widely used [28, 37, 55–62]. Surface area models are useful as a first


approximation; however, deficiencies are observed [57, 63, 64] that are particularly severe in the context of high resolution modeling and force field transferability [65]. We developed the Analytical Generalized Born plus Non-Polar (AGBNP) model, an implicit solvent model based on the Generalized Born model [37–40, 44, 66] for the electrostatic component and on the decomposition of the nonpolar hydration free energy into a cavity component based on the solute surface area and a solute–solvent van der Waals interaction free energy component modeled using an estimator based on the Born radius of each atom.

Recent advances in parallel sampling techniques [67–69] and the widespread availability of large numbers of processors have now made possible the calculation of the full potential of mean force of small-to-medium sized peptides in solution [15, 19, 69–71]. One class of methods for studying equilibrium properties of quasi-ergodic systems that has received a great deal of recent attention is based on the Replica Exchange (RE) [72, 73] algorithm (also known as parallel tempering). RE methods, particularly Replica Exchange Molecular Dynamics (REMD) [67], have become very popular for the study of protein biophysics, including peptide and protein folding [15, 74, 75], aggregation [76–78], and protein–ligand interactions [79, 80]. Previous studies of protein folding appear to show a significant increase in the number of reversible folding events in REMD simulations vs. conventional MD [81, 82].

The effectiveness of RE methods is determined by the number of temperatures (replicas) that are simulated, their range and spacing, the rate at which exchanges are attempted, and the kinetics of the system at each temperature. While the determination of “optimal” Metropolis acceptance rates and temperature spacings has been the subject of various studies [73, 83–88], the role played by the intrinsic temperature-dependent conformational kinetics, which is central to understanding RE, has not received much attention. Recent work [88–91] recognizes the exploration of conformational space and the crossing of barriers between conformational states as the key limiting factor for the RE algorithm. Molecular kinetics can have a strong effect on RE beyond the entropic effects that have been discussed [89, 91], particularly if the kinetics does not have a simple temperature dependence. It is known from experimental and computational studies that the folding rates of proteins and peptides can exhibit anti-Arrhenius behavior, where the folding rate decreases as the temperature increases [92–97]. Different models have been proposed to explain the physical origin of this effect [98, 99].

We have investigated various systems to illustrate the importance of having a sound effective potential and a powerful sampling technique. Predicting the conformations of peptides that form secondary structure in solution provides a test of the effectiveness of OPLS-AA/AGBNP and REMD [75]. We demonstrate how we can determine the kinetics of folding of one of those peptides, the G-peptide, based on the conformations generated during a replica exchange simulation using network models [100]. We also successfully


predict with OPLS-AA/AGBNP protein loop conformations that are themselves “peptides” tethered to a protein frame [101]. We also demonstrate our ability to explore the thermodynamics of binding using REMD with the OPLS-AA/AGBNP potential for the system of N-palmitoylglycine complexed to cytochrome P450 BM-3 [80]. Finally, the behavior of RE methods is demonstrated with simple models that capture the kinetics of RE [102, 103].

5.2 Methods

5.2.1 The OPLS-AA/AGBNP Effective Potential

The total free energy of folding for a protein in solution can be represented approximately as the sum of two terms:

$$\Delta G_{\mathrm{tot}} \approx \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{solv}}, \tag{5.1}$$

where $\Delta G_{\mathrm{int}}$ is the internal free energy of folding corresponding to the intramolecular degrees of freedom of the protein and $\Delta G_{\mathrm{solv}}$ is the difference of solvation free energy between the folded and unfolded states. The internal entropy change can be estimated from MD simulations; however, calculating the internal entropy is quite expensive [55]. Nevertheless, it has been found that the internal entropy changes between conformations are all roughly the same [55, 104]. Since in this work different conformations of a given molecule are compared, it is not necessary to include $\Delta S_{\mathrm{int}}$ in the total free energy change [12]; therefore, an effective free energy function

$$\Delta G_{\mathrm{eff}} = \Delta U_{\mathrm{int}} + \Delta G_{\mathrm{solv}} \tag{5.2}$$

can be used in lieu of $\Delta G_{\mathrm{tot}}$. The OPLS all-atom (OPLS-AA) force field [105, 106] is used to model $\Delta U_{\mathrm{int}}$, the internal energy for all atomic interactions and intramolecular degrees of freedom. The solvation free energy, $\Delta G_{\mathrm{solv}}$, of each structure is estimated using the analytical generalized Born model [27] with nonpolar free energy estimator (AGBNP, as described later) as implemented in the IMPACT modeling program [107].

In the original development of the OPLS-AA force field, the partial charges and van der Waals parameters were adjusted to reproduce experimental heats of vaporization and densities for a series of pure liquids [105, 108–112]. These parameters were further tested by comparison with experimental solvation energies, using explicit-solvent simulations. Additional comparisons were made in some cases to hydrogen-bond dimer-interaction energies obtained from quantum-chemical calculations. These comparisons were used to detect large discrepancies that, when present, called for a reinvestigation of the nonbonded parameters. The OPLS-AA torsional parameters were fit to reproduce gas-phase conformational energies obtained from quantum-chemical calculations [106], and stretching and bending parameters were adapted from the CHARMM22 or AMBER force fields.


The generalized Born model is given by the following equation [37],

$$\Delta G_{\mathrm{GB}} = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}} - \frac{1}{\epsilon_{w}}\right)\sum_{ij}\frac{q_i q_j}{f_{ij}(r_{ij})}, \tag{5.3}$$

where $q_i$ is the charge of atom $i$ and $r_{ij}$ is the distance between atoms $i$ and $j$. It gives the electrostatic component of the free energy of transfer of a molecule with interior dielectric $\epsilon_{\mathrm{in}}$ from vacuum to a continuum medium of dielectric constant $\epsilon_{w}$, by interpolating between the two extreme cases that can be solved analytically: the one in which the atoms are infinitely separated and the other in which the atoms are completely overlapped. The interpolation function $f_{ij}$ in (5.3) is defined as

$$f_{ij} = \left[r_{ij}^{2} + B_i B_j \exp\left(-r_{ij}^{2}/4B_iB_j\right)\right]^{1/2}, \tag{5.4}$$

where $B_i$ is the Born radius of atom $i$, defined as the effective radius that reproduces, through the Born equation

$$\Delta G^{i}_{\mathrm{single}} = -\frac{1}{2}\left(\frac{1}{\epsilon_{\mathrm{in}}} - \frac{1}{\epsilon_{w}}\right)\frac{q_i^{2}}{B_i}, \tag{5.5}$$

the electrostatic free energy of the molecule when only the charge of atom $i$ is present in the molecular cavity.

The analytical generalized Born (AGB) implicit solvent model is based on a novel pairwise descreening implementation [27] of the generalized Born model [29]. The combination of AGB with a recently proposed nonpolar hydration free energy estimator described later is referred to as AGBNP [27]. AGB employs a parameter-free and conformation-dependent analytical scheme to obtain the pairwise descreening scaling coefficients used in the computation of the Born radii used in the generalized Born equation (5.3). The agreement between the AGB Born radii and exact numerical calculations was found to be excellent [27]. The AGBNP nonpolar model consists of an estimator for the solute–solvent van der Waals interaction energy in addition to an analytical surface area component corresponding to the work of cavity formation [27]. Because AGBNP is fully analytical with first derivatives, it is well suited for energy minimization as well as for MD sampling. A detailed description of the AGBNP model and its implementation is provided in [27].

The nonpolar solvation free energy is given by the sum of two terms: the free energy to form the cavity in solvent filled by the solute and the dispersion attraction between solute and solvent [65, 113]. The nonpolar free energy is written as [27]

$$\Delta G_{\mathrm{np}} = \sum_{i}\left[\gamma_i A_i + \Delta G^{(i)}_{\mathrm{vdW}}\right], \tag{5.6}$$

where the first term is the cavity term, $\gamma_i$ is the surface tension proportionality constant for atom $i$, and $A_i$ is the solvent-exposed surface area of atom $i$. The second term is the dispersion interaction term, which is given by [27]

$$\Delta G^{(i)}_{\mathrm{vdW}} = \alpha_i\,\frac{-16\pi\rho_w\,\epsilon_{i,w}\,\sigma_{i,w}^{6}}{3\,(B_i + R_w)^{3}}, \tag{5.7}$$

where $\alpha_i$ is an adjustable solute–solvent van der Waals dispersion parameter for atom $i$. The parameter $\rho_w$ is the number density of water at standard conditions (0.033428 Å$^{-3}$). $\epsilon_{i,w}$ and $\sigma_{i,w}$ are the pairwise Lennard–Jones (LJ) well-depth and diameter parameters for atom $i$ and the TIP4P water oxygen as given by the OPLS-AA force field [105, 106]. ($\epsilon_{i,w} = \sqrt{\epsilon_i \epsilon_w}$, where $\epsilon_i$ is the LJ well-depth for atom $i$ and $\epsilon_w$ is that of the TIP4P water oxygen. The $\epsilon$ for water hydrogens is set to zero. $\sigma_{i,w}$ is defined in a similar manner.) $R_w$ is the radius of a water molecule (1.4 Å). Because the Lennard–Jones parameters are not absorbed into the dispersion parameter $\alpha_i$, atoms with different though similar values of $\epsilon_i$ and $\sigma_i$ are assigned the same $\alpha$ so as to minimize the number of adjustable parameters. $B_i$ is the Born radius of atom $i$. The form of (5.7) for the solute–solvent van der Waals interaction energy component has been derived on the basis of simple physical arguments [27].

We use two sets of parameterizations of $\alpha$ and $\gamma$ to test the full nonpolar function described earlier relative to a simpler nonpolar function. In past implementations [14], the total nonpolar solvation free energy is given by a term proportional to the solvent-accessible surface area; in terms of (5.6) and (5.7), this corresponds to setting all values of $\alpha_i$ to zero and setting $\gamma_i$ for all atoms to 0.015 kcal mol$^{-1}$ Å$^{-2}$. This implicit solvent model with the less-detailed nonpolar function is referred to as “AGB-γ.” When we use the full nonpolar function including the dispersion term using the parameters set forth in the work of Gallicchio and Levy [27], the implicit solvent model is referred to as “AGBNP.” A third parameterization aimed at implementing a correction for salt bridge interactions (which are generally overestimated by generalized Born solvent models) [75, 114] is also investigated. To correct for the overstabilization of salt bridges by the generalized Born model, we used modified radii and $\gamma_i$ for carboxylate oxygens [101].
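The functional forms in (5.3), (5.4), (5.6), and (5.7) can be illustrated with a short Python sketch. This is not the AGBNP implementation in IMPACT: the Born radii, surface areas, and per-atom parameters are simply taken as inputs, whereas AGBNP computes them analytically. The function names, the Coulomb conversion factor, and the default dielectric constants (1 for the solute interior, 80 for water) are assumptions made here for illustration only.

```python
import math

COULOMB = 332.06371  # kcal Å/(mol e^2); unit conversion assumed, not given in the text
RHO_W = 0.033428     # number density of water in Å^-3 (value quoted in the text)
R_W = 1.4            # radius of a water molecule in Å (value quoted in the text)


def f_gb(r_ij, b_i, b_j):
    """Interpolation function of (5.4); reduces to the Born radius at r_ij = 0."""
    return math.sqrt(r_ij ** 2 + b_i * b_j * math.exp(-r_ij ** 2 / (4.0 * b_i * b_j)))


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_w=80.0):
    """Generalized Born energy of (5.3); the double sum includes the i == j terms."""
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_w)
    total = 0.0
    for x_i, q_i, b_i in zip(coords, charges, born_radii):
        for x_j, q_j, b_j in zip(coords, charges, born_radii):
            total += q_i * q_j / f_gb(math.dist(x_i, x_j), b_i, b_j)
    return prefactor * total


def vdw_term(alpha_i, eps_i, eps_w_lj, sigma_iw, b_i):
    """Solute-solvent dispersion term of (5.7) for a single atom.

    eps_w_lj is the LJ well depth of the water oxygen (not the dielectric constant).
    """
    eps_iw = math.sqrt(eps_i * eps_w_lj)
    return alpha_i * (-16.0 * math.pi * RHO_W * eps_iw * sigma_iw ** 6) / (3.0 * (b_i + R_W) ** 3)


def nonpolar_energy(gammas, areas, vdw_terms):
    """Nonpolar free energy of (5.6): cavity term plus dispersion term for each atom."""
    return sum(g * a + v for g, a, v in zip(gammas, areas, vdw_terms))


# A single unit charge with Born radius 2 Å recovers the Born equation (5.5).
born_limit = gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
```

Setting every dispersion parameter to zero in `vdw_term` and using a single surface tension for all atoms reproduces the surface-area-only limit described above as the simpler nonpolar function.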
The implicit solvent model that has additional descreening of ion pairing is referred to as “AGBNP+.”

5.2.2 Replica Exchange Molecular Dynamics

The MD replica exchange canonical sampling method (REMD) has been implemented in the molecular simulation package IMPACT [107] following the approach proposed by Sugita and Okamoto [67]. In this method, a series of structures (the replicas) are simulated in parallel using MD at different temperatures. The temperatures, Tm and Tn, of two replicas, i and j, respectively, are exchanged with the following Metropolis transition probability [67]:

$$W(\{T_m, T_n\} \to \{T_n, T_m\}) = \begin{cases} 1 & \text{for } \Delta \le 0, \\ \exp(-\Delta) & \text{for } \Delta > 0, \end{cases} \tag{5.8}$$

where

$$\Delta \equiv (\beta_n - \beta_m)(E_i - E_j), \tag{5.9}$$
βm is 1/kTm and Ei is the current potential energy of replica i (and similarly for βn and Ej). After the exchange, the velocities of replicas i and j are rescaled to the new temperatures. In our simulations, several replicas are run in parallel over a particular temperature range.

5.2.3 The Network Model of Protein Folding

During a REMD simulation of, for instance, the G-peptide (the C-terminal β-hairpin of the B1 domain of protein G), a series of conformations (“states”) is generated at each of the 20 temperatures. The REMD simulation of the G-peptide resulted in 40,000 conformational snapshots at each of the 20 temperatures, for a total of 800,000 states. In our kinetic network model, these REMD snapshots can be visualized as nodes in a network. The edges connecting these nodes represent allowed conformational transitions, which are determined by the structural similarity of the two states involved [115–117]. This network structure can be viewed as an approximate representation of effects caused by frictional interactions with the environment [115]. For each state, 42 Cα–Cα distances were calculated, and structural similarity was defined as the Euclidean distance between points in this distance space. The structural similarities for all sequential pairs of MD snapshots along a given REMD walker having the same temperature were tabulated. Any two states with the same REMD temperature were joined by an edge if their structural similarity was less than or equal to a cutoff value. No connections were allowed between conformations not belonging to adjacent REMD temperatures. The resulting kinetic network has 800,000 nodes and 7.374 × 10⁹ edges. As in previous works [115,117–121], we simulate the kinetics on our graph as a jump Markov process with discrete states using the Gillespie algorithm [122], where each (directed) edge is assigned a microscopic rate constant.
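
As a rough illustration of how such a network might be assembled, the sketch below joins toy states whose distance vectors lie within a cutoff. The function name `build_network`, the tuple layout, and the toy data are hypothetical. The sketch also assumes that edges are permitted between states at the same or adjacent temperatures, which is one reading of the two connection rules above rather than something the text states outright. The all-pairs loop is practical only for small examples, not for 800,000 states.

```python
import math
from itertools import combinations

def build_network(states, cutoff):
    """Return an adjacency dict for a toy kinetic network.

    states: list of (temperature_index, distance_vector) tuples, where the
            vector stands in for the 42 C-alpha distances of a snapshot.
    cutoff: maximum Euclidean distance for two states to share an edge.
    """
    adjacency = {k: set() for k in range(len(states))}
    for a, b in combinations(range(len(states)), 2):
        (ta, va), (tb, vb) = states[a], states[b]
        # Assumption: only same or adjacent temperatures may be connected.
        if abs(ta - tb) > 1:
            continue
        if math.dist(va, vb) <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

toy_states = [
    (0, (1.0, 2.0)),   # state 0, temperature index 0
    (0, (1.1, 2.1)),   # close to state 0, same temperature
    (1, (1.0, 2.2)),   # close to states 0 and 1, adjacent temperature
    (2, (1.0, 2.0)),   # same structure as state 0, two temperatures away
    (1, (5.0, 5.0)),   # structurally distant from every other state
]
network = build_network(toy_states, cutoff=0.5)
```

In this toy case state 3 is not linked to state 0 despite having the same structure, because the two lie two temperature steps apart; it can be reached only through state 2.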
Such simulations allow us to characterize the sequence of folding events more directly. We make the equilibrium probability of being in any given state equal to that of being in any other state at the same replica exchange temperature. Such an equilibrium can be arranged by setting the microscopic rate constant for each transition equal to the rate constant for the reverse process. We chose the relative equilibrium populations for states from different temperatures such that the probability of being in states extracted from different replica exchange temperatures is peaked near a “reference” or “simulation” temperature T0, which is a parameter of the kinetic model. This model allows a given path to sample states having instantaneous temperatures above or below the reference temperature T0 in a physically realistic manner [100].

5.2.4 Loop Prediction with Torsion Angle Sampling

The loop prediction algorithm implemented in the Protein Local Optimization Program (PLOP) is described in detail in [123]. During loop build-up,
a series of filters of increasing complexity is applied to eliminate unreasonable conformations as early as possible, and clustering is performed to remove redundant conformations. For long loops (≥9 residues), we have adopted prediction schemes based on multiple executions of PLOP with different parameters [123, 124]. The initial predictions with the most favorable energy scores are subjected to a series of constrained refinement calculations with PLOP in which selected loop backbone atoms are either held fixed or allowed to move only within a given range [123]. Further enhancements, such as allowing for more atomic overlaps and increasing the number of clusters in the K-means algorithm [125], have been incorporated into the loop sampling algorithms [101].

We have tested the loop prediction algorithms on two sets of protein loops of known structure (see [101]). The first set is composed of the 57 nine-residue loops that were originally compiled by Fiser et al. [126] and by Xiang et al. [127]. The second set, of 35 13-residue loops, is the same as the one investigated by Zhu et al. [124]. We judge a predicted loop to be correct on the basis of its root mean square deviation (RMSD) from the crystallographically determined native structure, with cutoffs of 1.5 Å for nine-residue loops and 2.0 Å for 13-residue loops. Errors are classified as sampling errors if the predicted loop’s energy is higher than the native’s and as energy errors if the predicted energy is lower than the native’s. A minority of incorrect predictions were not classifiable as either energy or sampling errors; in the following, we label these cases as marginal errors. Marginal errors are effectively incorrect predictions due to subtle and not easily attributable energetic, entropic, and methodological causes [101].
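
The classification rules above can be summarized in a few lines. This is a minimal sketch, not the procedure of [101]: the function name, the choice of "at or below the cutoff" as correct, and the `tol` window used to flag marginal cases are all assumptions, since the text gives no numerical criterion for a marginal error.

```python
RMSD_CUTOFF = {9: 1.5, 13: 2.0}  # angstroms, by loop length (from the text)

def classify_prediction(loop_length, rmsd, e_predicted, e_native, tol=0.0):
    """Label one loop prediction as correct, energy, sampling or marginal.

    tol is a hypothetical energy window inside which an incorrect prediction
    is called "marginal"; it is not taken from the source.
    """
    # Assumption: a prediction at or below the cutoff counts as correct.
    if rmsd <= RMSD_CUTOFF[loop_length]:
        return "correct"
    gap = e_predicted - e_native
    if abs(gap) <= tol:
        return "marginal"
    # A wrong loop scoring below the native points at the energy function;
    # one scoring above it means the search failed to find the native basin.
    return "energy" if gap < 0 else "sampling"
```

For example, `classify_prediction(9, 2.3, -110.0, -105.0)` describes a nine-residue loop that is 2.3 Å from the native structure yet scores lower than the native, which the rules above count as an energy error.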

5.3 Folding of Peptides

5.3.1 G-Peptide Folding

REMD simulations of the C-terminal β-hairpin (residues 41–56) of the B1 domain of protein G (G-peptide) were conducted with the OPLS-AA force field [105] and the AGBNP implicit solvent model [27]. Details of the simulations can be found in [75]. When using the surface-area-only model (AGB-γ) for the nonpolar interactions, the hydrophobic core (W43, Y45, F52, and V54) does not collapse to an appreciable extent; at 270 K, only 12.8% of the structures have a collapsed hydrophobic core (a conformation is said to have a collapsed hydrophobic core when the radius of gyration of the side chains of residues W43, Y45, F52, and V54 is less than 6 Å). When the full nonpolar function of AGBNP is used, the percentage of hydrophobic collapse increases to 37.8% with the default dielectric screening (AGBNP) and 94.1% with the increased dielectric screening of charged side chains (AGBNP+) [75]. The decreased degree of hydrophobic collapse with the default dielectric screening (AGBNP) as compared with additional dielectric screening (AGBNP+) is due to a salt bridge forming between the side chains of K50
and E56 that hinders the formation of the hydrophobic core. However, significantly more of the structures generated with the full AGBNP nonpolar function have a collapsed hydrophobic core than those generated with AGB-γ. The full nonpolar model of the OPLS-AA/AGBNP potential favors the formation of the collapsed hydrophobic core of the peptide even in the presence of the disruptive salt bridge.

While previous replica exchange simulations of the C-terminal polypeptide from the B1 domain of protein G in explicit and implicit solvent have been carried out using the capped peptide [19,20,70,71], the experiments have been performed on the uncapped form of the peptide [94, 128–130]. A salt bridge between the N- and C-termini can be formed in the uncapped polypeptide. The β-hairpin population of the uncapped peptide (26%) is significantly larger than that of the capped peptide (10%) with the same solvation model (AGBNP) [75]. This is due to the stabilizing effect of the salt bridge between the N- and C-termini, which compensates for the disruptive interaction between the charged residues K50 and E56. The population of this disruptive salt bridge is reduced when increased dielectric screening of the charged side chains is applied with AGBNP+; the β-hairpin population increases from 26% to 40%. The predicted β-hairpin population of the uncapped peptide generated with AGBNP+ agrees well with the experimental results of Blanco et al. (42% at 283 K) [128]. The degree of hydrophobic collapse (98%) agrees reasonably well with the experimental results reported by Muñoz et al., who observed around 80% hydrophobic collapse at 270 K [94].

5.3.2 Folding of Other Small Peptides

To demonstrate the accuracy of OPLS-AA/AGBNP+, we predicted the conformations of a series of small peptides that adopt either an α-helical conformation (CheY2-mu peptide [131], C-peptide [132], and the S-peptide-analog [133]), no secondary structure (the CheY2 peptide [131]), or a mix of β and α conformation (the FSD1 mini-protein [134]). We performed REMD simulations to sample the conformational space available to these peptides. The results are summarized in Table 5.1. We achieve reasonable accuracy for these peptides. It is also apparent that there is no bias toward forming α-helical conformations with OPLS-AA/AGBNP+, as is evident from the prediction of the coil conformation for the CheY2 peptide, which is similar in sequence to the α-helical CheY2-mu [131].

5.3.3 Loop Prediction

Loop prediction is a form of peptide folding: in this case, the peptide is tethered to a protein frame and feels an energy field generated by the frame. Loop prediction is a stringent test of the OPLS-AA/AGBNP energy function because during the search with PLOP to find the native conformation, many energetically competing conformations are also generated [101]. The results
of the loop prediction tests are summarized in Table 5.2 for the combined standard and extended conformational sampling procedures [101]. All loop predictions summarized in Table 5.2 were performed in solution rather than in the presence of the crystallographically related molecules (crystal symmetry), which Jacobson et al. [123] and Zhu et al. [124, 135] included in their loop predictions with PLOP for the 9- and 13-residue loops, respectively. We viewed loop prediction as a step in homology modeling where the crystal environment is not known a priori; therefore, we predicted loops in solution rather than in the crystal environment. For the 57 nine-residue loops, loop prediction tests were conducted with OPLS-AA and the following implicit solvent models: distance-dependent dielectric, AGB-γ, AGBNP, and AGBNP+. Loop prediction tests for the 35 13-residue loops were conducted with AGBNP+. Table 5.2 reports the total number of errors and the number of energy, sampling and marginal errors, and the mean and median RMSD of the predictions from the X-ray structure.

Table 5.1. Summary of the small peptides we have predicted with REMD simulations using OPLS-AA/AGBNP+

Name                Sequence                      Structure  % Content
                                                             Experimental  REMD
G-peptide [128]     GEWTYDDATKTFTVTE              β          42            40
CheY2-mu [131]      EDAVEALRKLQAGGY               α          39            45
CheY2 [131]         EDGVDALNKLQAGGY               α          2             2
C-peptide [132]     KETAAAKFERQHM                 α          29            41
S-pep-analog [133]  AETAAAKFLREHMDS               α          45–63         55
FSD1 [134]          QQYTAKIKGRTFRNEKELRDFIEKFKGR  ββα        >80           59

The simulations were carried out for up to 10 ns.

Table 5.2. Summary of the loop conformational prediction results with the combination of standard and enhanced sampling procedures

              9-Residue                           13-Residue
              ddd     AGB-γ   AGBNP   AGBNP+      AGBNP+
E             19      6       4       2           2
S             4       4       4       5           5
M             3       1       0       1           1
E+S+M         26      11      8       8           8
RMSD          2.31    1.10    1.04    1.00        1.87
Median RMSD   1.27    0.52    0.52    0.58        0.67

ddd refers to distance-dependent dielectric; E, S, and M are energy, sampling, and marginal errors, respectively; RMSD is the average RMSD (in Å) of the lowest energy loops [101].

Prediction Accuracy

The loop prediction procedure based on PLOP with the AGBNP+ solvation model and the extended sampling schemes we devised is very successful in predicting the conformations of the 9- and 13-residue loops we have investigated. Fiser et al. used MD along with simulated annealing to predict loop conformations with an all-atom force field and a statistical treatment of solvation [126]. The percentage of predictions they report within 2 Å RMSD (described as good and medium predictions) is 55% [126]. Using a tighter RMSD cutoff of 1.5 Å, we obtain with PLOP and AGBNP+ an 86% success rate in our predictions for nine-residue loops. For a set of 13-residue loops, Fiser et al., using the same 2 Å RMSD cutoff, report a very low 15% success rate [126], compared to the 77% success rate we obtained using the AGBNP+ scoring function. Xiang et al. performed a search over a discrete rotamer library with scoring based on their colony energy. For nine-residue loops, they report an average RMSD of 2.68 Å [127]. In comparison, the average RMSD we have obtained with PLOP and AGBNP+ is 1.00 Å. De Bakker et al. [136] generated loop conformations with their program RAPPER [137] and scored them with a knowledge-based potential and with a physics-based potential, AMBER/GBSA. For nine-residue loops from the Fiser set [126], the average RMSD of the lowest energy loops was over 2 Å when scored with the AMBER/GBSA potential, which produced their best results [136]. Jacobson et al. [123] performed loop prediction calculations on a large set of nine-residue loops using the SGB/NP model [40, 138], with the crystal symmetry included [123]. They obtained ten energy errors and eight sampling errors [123]. We obtained two energy errors and five sampling errors using AGBNP+ without the presence of the crystal environment [101].

A recent study based on the comparison of X-ray and NMR structures of identical proteins suggests that in most cases the impact of the crystal environment on protein structures is relatively small and not strongly correlated with crystal packing [139]. Recently, Zhu et al. [124, 135] have reported loop prediction results for the same 35 13-residue loops investigated here using the SGB/NP potential with crystal symmetry, supplemented by hydrophobic correction terms and a variable dielectric model. Zhu et al. showed that these promising models lower the average backbone RMSDs of the 13-residue predictions substantially, from 2.73 Å to 1.08 Å. In comparison, we obtain for the 13-residue loop set with AGBNP+ without crystal symmetry an average RMSD of 1.87 Å, which lies within the range of RMSD values reported by Zhu et al. [124, 135]. The best performing model reported by Zhu et al. [135] produces, according to our definition, five energy errors on the 13-residue loop set, compared with the two energy errors obtained here [101].
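
The 86% and 77% success rates quoted above follow directly from the total error counts (E+S+M) listed for AGBNP+ in Table 5.2. The short check below reproduces them; the function name is illustrative only.

```python
def success_rate(n_loops, n_errors):
    """Percentage of loops predicted within the RMSD cutoff."""
    return 100.0 * (n_loops - n_errors) / n_loops

# Total errors (E + S + M) for AGBNP+ as listed in Table 5.2.
nine_residue = success_rate(n_loops=57, n_errors=8)
thirteen_residue = success_rate(n_loops=35, n_errors=8)
print(f"9-residue: {nine_residue:.0f}%, 13-residue: {thirteen_residue:.0f}%")
# prints "9-residue: 86%, 13-residue: 77%"
```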

5.4 Kinetic Model of the G-Peptide

5.4.1 The G-Peptide has Apparent Two-State Kinetics After a Small Temperature Jump Perturbation

Previous experimental work in the Eaton laboratory [94] has shown that the time dependence of loss of hairpin structure in the G-peptide after a small temperature-jump perturbation is well fit by a single exponential. To confirm that our kinetic model is consistent with this previous experimental kinetic work, we performed a series of simulations modeling this temperature-jump experiment. We began each simulation by constructing an ensemble of starting points distributed according to an equilibrium distribution, with T0 ranging from 300 to 615 K. We then performed a Markov process simulation for 2,000–5,000 time units beginning from each starting point, using a reference temperature 60 K higher than the temperature used to construct the initial starting point ensemble. For each temperature, the number of trajectories residing in a β-hairpin state was monitored as a function of time. In all cases, the loss of hairpin structure is fit well by a single exponential decay, with the exception of a small initial “burst phase” [100]. Our results are qualitatively consistent with experimental observations [94].

5.4.2 The G-Peptide has an α-Helical Intermediate During Folding from Coil Conformations

Protein folding is a process by which conformations without identifiable secondary structure adopt a native conformation. To study this process in the G-peptide with our kinetic network model, we performed a temperature quench experiment similar to the temperature-jump experiment described earlier, but for which the starting ensemble was chosen from the equilibrium distribution at T0 = 700 K, and the simulation was run at a reference temperature of 300 K. The fraction of α-helix and β-hairpin states as a function of time displays a rapid rise in the amount of α-helix initially, which reaches a maximum and then decreases. Simultaneously, the amount of β-hairpin rises initially at a rapid rate, then continues to rise at a slower rate similar to the rate of decrease in the fraction of α-helix. This finding is suggestive of a mechanism in which there are a small number of fast direct paths from unfolded coil states to the β-hairpin, but in which the majority of molecules quickly fold to α-helical states, which then convert into β-hairpins on a longer time scale. A similar phenomenon is not observed for the unfolding process: temperature-jump simulations from 300 to 700 K do not show appreciable α-helix formation. That the folding and unfolding kinetic paths are different reflects the quite different nonequilibrium cooling and heating conditions that are being simulated [100]. We can assign approximate absolute time scales to the processes observed here. Based on this finding, the appearance of β-hairpin has a time constant of ∼2,500 time units, which would correspond in physical units to ∼50 μs,
whereas the rapid initial formation of α-helix occurs with a time constant of nine time units, or ∼180 ns [100]. These rates are in qualitative agreement with experimental observations (6 μs) [94]. To confirm that this mechanism is indeed the basis for our “ensemble averaged” observations, we performed an analogous single-molecule quenching experiment in which we chose ∼4,000 states at random from among the coil states at 690 K and used each as a starting point for a simulation at a reference temperature of 300 K. Only 9% of the trajectories reach the β-hairpin macrostate without passing through any α-helix-containing states. This finding confirms that in our kinetic network model the β-hairpin folding mechanism consists of two parallel pathways: the direct formation of the β-hairpin structure from coil states, and the formation of α-helical conformations, which then interconvert into β-hairpins.

5.4.3 A Molecular View of Kinetic Pathways

One of the advantages of the kinetic network model proposed here is that we are able to explore a large number of potential pathways that join two macrostates. The number of such paths will typically be extremely large. Furthermore, each state along the path has associated with it all of the atomic coordinates from the REMD simulation. Therefore, the molecular aspects of the paths can be analyzed in detail. This ability allows us to explore the multitude of folding pathways that the system can potentially have at its disposal. One way in which this model can be used is to generate many paths by using Markovian kinetic Monte Carlo simulations. Such an approach with all-atom models has been useful for enumerating and quantifying the relative flux through parallel kinetic pathways in small systems [119,120]. Alternatively, it is possible to investigate thermodynamically favorable pathways by a detailed analysis of the structure of the kinetic network, for example, by searching for a small number of short paths connecting the two macrostates under the constraint that the instantaneous temperature remain below a predetermined maximum value. We use this approach to analyze pathways connecting the α-helix and β-hairpin macrostates in the G-peptide [100].

Two short pathways that link the α-helical and β-hairpin macrostates without making use of microstates with an instantaneous temperature above 488 K are shown in Fig. 5.1. The path shown in Fig. 5.1(upper) involves the unwinding of both ends of the helix, leaving approximately one turn of helix in the middle of the molecule. This turn then serves as a nucleation point for the formation of the β-turn, which is stabilized by hydrophobic interactions between the side chains of Y45 and F52. The native hydrogen bonds nearest to the turn then form, after which the remainder of the native hairpin structure forms. This pathway is similar to previously proposed mechanisms for the folding of the G-peptide β-hairpin from a coil state, which emphasize the formation of hydrophobic contacts before hydrogen bond formation [17, 18, 140–143] and the persistence of the β-turn even in the unfolded state [143].
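
A temperature-capped path search of the kind described above can be sketched as a breadth-first search that refuses to enter any state hotter than a chosen maximum. This is an illustration under stated assumptions, not the procedure of [100]: the function name, the toy network, and the use of an unweighted fewest-steps criterion are all hypothetical, and only the 488 K cap is taken from the text.

```python
from collections import deque

def shortest_path_below(adjacency, temperature, start, goal, t_max):
    """Breadth-first search for a fewest-step path whose states all satisfy
    temperature[state] <= t_max. Returns a list of states, or None."""
    if temperature[start] > t_max or temperature[goal] > t_max:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent and temperature[nxt] <= t_max:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy network: a two-step route through a hot state and a three-step cool route.
adjacency = {"helix": ["hot", "c1"], "hot": ["helix", "hairpin"],
             "c1": ["helix", "c2"], "c2": ["c1", "hairpin"],
             "hairpin": ["hot", "c2"]}
temperature = {"helix": 300, "hot": 550, "c1": 400, "c2": 450, "hairpin": 300}
cool_path = shortest_path_below(adjacency, temperature, "helix", "hairpin", t_max=488)
```

With the 488 K cap the search is forced onto the longer route through `c1` and `c2`; raising `t_max` above 550 would let it take the shorter route through `hot`.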

Fig. 5.1. Two possible pathways for the interconversion of an α-helix into a β-hairpin of the G-peptide. The backbone trace is shown as ribbons and cylinders, and the side chains of the hydrophobic core residues (W43, Y45, F52, and V54) are shown as sticks. (Upper) The path corresponds to an unraveling of the helix at both ends and formation of a β-turn from a residual turn of the α-helix. (Lower) The path corresponds to an unraveling of one end of the helix, which loops back

The novel aspect of the path shown in Fig. 5.1(upper) is the preformation of the β-turn from a residual turn in an otherwise unfolded α-helix. An alternative pathway (Fig. 5.1(lower)) involves the unwinding of the C-terminal half of the α-helix, which then loops back so as to be nearly parallel to the remaining helix. This proximity allows for the possibility of side-chain interactions between the helix and the C-terminal half of the molecule, including hydrophobic interactions between F52 in the helix and either W43 or Y45. This pathway is very similar to the one we previously identified on the basis of the analysis of the potential of mean force for the G-peptide along two principal component degrees of freedom [144]. In both pathways, it is clear that formation of native β-hairpin contacts can occur without the complete loss of helical secondary structure, making the idea of the α-helix as an on-path intermediate in the formation of the β-hairpin physically plausible [100].

5.5 Ligand Conformational Equilibrium in a Cytochrome P450 Complex

The cytochrome P450 enzymes catalyze the oxidation of a wide variety of hydrophobic substrates [145]. P450 enzymes are ubiquitous. In humans they are found in the liver and are important in cellular housekeeping processes, including the metabolism of pharmaceutical agents and detoxification [145]. P450 enzymes are thus important in the study of drug metabolism and toxicity. The mechanism of catalysis by P450 is centered on the iron of the heme group [146]. However, the crystal structures of many P450 enzyme-substrate complexes [147–150] show the substrate bound far from the iron, in a position that is evidently unproductive for chemistry. Based on UV–vis and NMR measurements and induced fit docking, Jovanovic et al. [151] have proposed
that the structure of one of these complexes (P450 BM-3 bound to NPG [147]) depends on temperature, and that at biologically relevant temperatures the ligand moves from a position distant from the heme iron, as seen in the low temperature X-ray crystal structure, into a position proximal to the iron, leading to the displacement of the iron-coordinated water molecule and the initiation of the oxidation mechanism. In this study we use REMD [67, 75] to study the thermodynamic equilibrium between the conformations of the P450 BM-3/NPG complex in which the terminal carbon atoms of NPG are distant from the heme iron, as in the low temperature X-ray crystal structure [147] (the distal state, see Fig. 5.2a), and conformations with the terminal carbon atoms of NPG proximal to the heme iron, as in the conformation proposed by Jovanovic et al. [151] (the proximal state, see Fig. 5.2b). REMD is ideally suited for this problem not only because it improves conformational sampling but also because it yields the populations of conformational states over a range of temperatures.

Fig. 5.2. Active site of the P450 BM-3/NPG complex in (a) the low temperature X-ray conformation (PDB 1jpz), representative of the distal state, where the NPG (shown in green) is distant from the heme iron, with Phe87 (shown in magenta) interposed between NPG and the heme iron (shown in blue), and (b) the alternative active site conformation predicted by Jovanovic et al., representative of the proximal state, where Phe87 has changed its rotameric state to allow NPG to approach the heme iron

5.5.1 Methodology

We apply REMD [67, 75] to the P450 BM-3/NPG complex starting from the low temperature crystal structure [147] (PDB id 1jpz) over a temperature range from 260 to 457 K with 24 replicas. This range was chosen to study the system at biologically relevant temperatures and at the same time (1) to connect with low temperature experimental information and (2) to enhance sampling at low temperature. A receptor restraining scheme was designed to prevent unfolding of the protein at high temperatures, but to allow enough flexibility to observe the conformational change at the active site. The REMD simulation employed the OPLS-AA all-atom force field [105] and the
AGBNP [27] implicit solvent model to mimic the water environment. The replica exchange acceptance ratio was 25% on average. The total simulation time, including equilibration, was 3 ns for each of the 24 replicas, for a total of 72 ns. Population distributions were obtained by collecting the distances between the ω−1 carbon atom of NPG (the main substrate oxidation site) and the catalytic Fe atom, as well as the potential energies of conformations, from 10 replicas in the temperature range from 260 to 357 K. These quantities are binned into histograms, which are then used as the input for the temperature weighted histogram method (T-WHAM) [144] to give the population distributions. T-WHAM [144] makes it possible to resolve the population distributions corresponding to conformations of relatively high free energy, which are rarely sampled at room temperature but are needed to determine the mechanism of interconversion between stable conformations. T-WHAM accomplishes this by exploiting information contained in the high temperature replicas, where high free energy conformations are generated. Using this tool we postulate a mechanism for the conformational interconversion between the distal and proximal states [80].

5.5.2 The Population of the Proximal State as a Function of Temperature

The ω−1-Fe distance in the low temperature X-ray crystal structure [147], which corresponds to the distal state, is 8.5 Å, and in the conformation proposed by Jovanovic et al. [151], which corresponds to the proximal state, it is 4.5 Å. By defining the proximal state to comprise all conformations with ω−1-Fe distances less than 6.5 Å, we obtain the population of the proximal state as a function of temperature shown in Fig. 5.3. The population of the proximal state is 32% at 260 K, increases with temperature, and finally plateaus at 318 K with 90% of the population in the proximal state. Both proximal and distal states exist at all temperatures: rather than a sharp conformational transition from distal to proximal state at a specific transition temperature, a gradual shift in population from distal to proximal state occurs with increasing temperature. These findings are in agreement with the thermal activation mechanism proposed by Jovanovic et al. [151]. The predicted midpoint of the transition from the distal to the proximal state is 268 K (see Fig. 5.3), ∼20 K higher than the observed transition temperature [151]. The increase in population of the proximal state with increasing temperature indicates that the proximal state is stabilized by conformational entropy [80].
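
The entropy expression quoted in the caption of Fig. 5.3 can be recovered in two steps if the equilibrium is treated as a two-state system, with population p(T) in the proximal state and 1 − p(T) in the distal state. The following is a sketch of that reasoning under the two-state assumption, not a derivation taken from [80]; ΔG and ΔS denote proximal minus distal differences.

```latex
\begin{align*}
\Delta G(T) &= -kT \ln\frac{p}{1-p}, \\
\Delta S(T) &= -\frac{\partial \Delta G}{\partial T}
  = k \ln\frac{p}{1-p} + kT\,\frac{\partial}{\partial T}\ln\frac{p}{1-p} \\
 &= k \ln\frac{p}{1-p} + \frac{kT}{p(1-p)}\,\frac{\partial p}{\partial T}.
\end{align*}
```

The last step uses ∂/∂T ln[p/(1 − p)] = [1/p + 1/(1 − p)] ∂p/∂T = [1/(p(1 − p))] ∂p/∂T. The first term is positive whenever p > 1/2, and the second is positive whenever the proximal population grows with temperature.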

Fig. 5.3. Population as a function of temperature, p(T), corresponding to the conformations in which the ligand is proximal to the heme iron. The proximal state population increases monotonically with temperature, indicating that the proximal state is stabilized by conformational entropy at temperatures greater than at least 268 K. This is borne out by the expression for the conformational entropy difference between the proximal and the distal states, ΔS = k ln[p/(1 − p)] + {kT/[p(1 − p)]} ∂p/∂T, where the second term is positive and the first term is positive for T > 268 K (p(T) > 1/2)

5.6 Simple Continuous and Discrete Models for Simulating Replica Exchange

One cannot systematically explore the convergence properties of RE as a function of the simulation parameters and/or the underlying kinetics of the
molecular system by brute force molecular simulations, since RE simulations of protein folding are very difficult to converge. As an alternative, it is useful to study simplified low dimensionality systems. While these models do not capture all of the complexities of the “real” molecular simulation, they do capture some of the essential features of RE and allow us to study these fundamental aspects of the algorithm at relatively low computational cost and in a controlled setting. We discuss here two simplified models of RE. The first is a discrete two-state network model, containing two conformational states (Folded and Unfolded) at each of several temperatures [102]. This model reduces the atomic complexity of the system to discrete conformational states, which evolve in continuous time according to Markovian kinetics for both conformational transitions and exchange between replicas. The second makes use of a continuous two-dimensional potential, which is sufficiently simple to be amenable to accurate analytical and numerical solution, while including some characteristics of molecular systems that are absent from the discrete network model. In both cases, the efficiency of RE conformational sampling will be monitored by measuring N_TE, the number of round-trip transitions in the conformational state of a replica, conditional on the low temperature of interest T0, that occur in a given observation time. A transition event is a transit of a given replica from one conformation at T0 to the other conformation at T0 and back again, regardless of route. Conceptually, this measure reflects the
potential of RE to achieve rapid equilibration at the temperature of interest by means of conformational transitions at temperatures other than the temperature of interest.

5.6.1 Discrete Network Replica Exchange (NRE)

In the NRE model, the protein is assumed to exist in one of two macrostates, F and U (for “folded” and “unfolded”), which do not possess any internal structure. Instead, it is assumed that the system evolves in time as a Poisson process, in which instantaneous transitions between F and U occur after waiting periods given by exponentially distributed random variables with means equal to the reciprocals of the folding or unfolding rates. If the transition events are Markovian, then the simultaneous behavior of two uncoupled noninteracting replicas can be represented by the four composite states {F1F2, F1U2, U1F2, U1U2}. In each symbol, the first letter represents the configuration of replica 1, the second letter the configuration of replica 2, and the subscripts denote the temperature of each replica. The four-state composite system for two noninteracting replicas can be extended to create a network model of replica exchange by introducing temperature exchanges between replicas, that is, by allowing transitions such as F1U2 → F2U1. This leads to a system with eight states arranged in a cubic network, with “horizontal” folding and unfolding transitions and “vertical” temperature exchange transitions (Fig. 5.4). The effect of the rate of temperature exchanges is included by introducing the rate parameter α, which controls the overall scaling of the temperature exchange rate relative to the folding and unfolding rates. For canonical equilibrium probabilities to be preserved under temperature exchanges, it is sufficient that detailed balance is satisfied by scaling α by a factor w = Peq(F2U1)/Peq(F1U2) as appropriate.
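
The single-replica part of this picture, a two-state Poisson process with exponentially distributed waiting times, is simple enough to sketch directly. The code below is illustrative only and leaves out temperature exchange entirely; the function name, the rate values, and the run length are arbitrary choices made for the example.

```python
import random

def simulate_two_state(k_fold, k_unfold, total_time, seed=0):
    """Simulate F <-> U as a Poisson process; return the fraction of time in F.

    Waiting times are exponential with mean 1/k_unfold in F and 1/k_fold in U.
    """
    rng = random.Random(seed)
    state, clock, time_folded = "U", 0.0, 0.0
    while clock < total_time:
        rate = k_unfold if state == "F" else k_fold
        # Exponentially distributed dwell time, truncated at the end of the run.
        dwell = min(rng.expovariate(rate), total_time - clock)
        if state == "F":
            time_folded += dwell
        clock += dwell
        state = "U" if state == "F" else "F"
    return time_folded / total_time

fraction_folded = simulate_two_state(k_fold=2.0, k_unfold=1.0, total_time=20000.0)
expected = 2.0 / (2.0 + 1.0)  # equilibrium value k_fold / (k_fold + k_unfold)
```

Over a long run the folded fraction should approach k_fold/(k_fold + k_unfold), which gives a quick consistency check on the waiting-time logic before any exchange transitions are added.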
Kinetics in the NRE model is simulated using a standard method for continuous-time Markov processes with discrete states, known as the “Gillespie algorithm” [122]. It was found that the convergence of NRE for a two-replica system in the limit of very rapid temperature exchanges is fastest when the high temperature is chosen to maximize the harmonic mean of the folding and unfolding rates. Thus, if protein folding follows anti-Arrhenius kinetics, there exists an optimal maximal temperature, beyond which the efficiency of the replica exchange method is degraded. Both the convergence rate and efficiency are reduced if the temperature exchange rate is finite, and the optimal value of the high temperature is reduced.

[Figure 5.4: cubic network diagram; node labels F1F2, F1U2, U1F2, U1U2, F2F1, F2U1, U2F1, U2U1]

Fig. 5.4. The kinetic network model for the discrete NRE model used by Zheng et al. [102]. The state labels represent the conformation (letter) and temperature (subscript) for each replica. For example, F2U1 represents the state in which replica 1 is folded and at temperature T2, while replica 2 is unfolded at temperature T1. Gray and black arrows correspond to folding and unfolding transitions, respectively, while the temperature at which the transition occurs is indicated by the solid and dashed lines (for T2 and T1, respectively). The bold arrows correspond to temperature exchange transitions, with the solid and dashed lines denoting transitions with rate parameters α and wα, respectively.

5.6.2 RE Simulations using MC on a Continuous Potential

In contrast to the NRE model, the simplified model of RE based on the continuous potential has macrostates which, like real molecular systems, have microscopic internal structure, and it is therefore not guaranteed to have Markovian kinetics. The two-dimensional potential was constructed to mimic the anti-Arrhenius temperature dependence of the folding rate seen in proteins by having an energetic barrier when going from the “folded” to the “unfolded” region, and an entropic barrier in the reverse direction. This was achieved by imposing a hard wall constraint that limits the space accessible to the folded region, combined with a potential energy function that has an energetic well in the folded region and increases as one goes further into the unfolded region. This results in a two-well free energy profile as a function of the folding coordinate, where the activation free energy for folding increases with increasing temperature. Metropolis kinetic Monte Carlo (MC) sampling was used to simulate the movement of a particle in this two-dimensional potential, and rate constants were obtained by calculating the mean first passage times (FPTs) between the two macrostates. The resulting FPT distributions were exponential and in agreement with the activation free energies obtained from the free energy profile along the folding coordinate. Replica exchange simulations were performed with a kinetic MC propagator, and exchanges of configurations were attempted every N_X MC steps. Behavior similar to that seen for the NRE model is also observed for the continuous potential: the efficiency is nonmonotonic and exhibits a maximum at an optimal high temperature given by the maximal harmonic mean of the folding and unfolding rates. However, the number of transitions is significantly lower than that predicted from the average of the harmonic means of the rates as seen in the NRE model. A comparison of continuous and discrete RE simulations has revealed non-Markovian effects. By simultaneously studying a discrete network model of RE and RE on a simplified two-dimensional potential, it is possible to clarify to some degree the origins and effects of anti-Arrhenius and non-Markovian kinetics on the efficiency of RE. Furthermore, these results suggest that the use of “training” simulations to explore some aspects of the temperature dependence for folding of the atomic level models prior to performing replica exchange studies could be useful in improving the overall efficiency of the calculation [102].
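The existence of an optimal high temperature can be illustrated with a short numerical sketch. The rate laws below are assumptions chosen only for illustration (reduced units with k_B = 1, a purely anti-Arrhenius folding rate that decreases with temperature, and an Arrhenius unfolding rate); the prefactors, energies, and function names are hypothetical and are not taken from [102] or [103]. Under these assumptions the harmonic mean of the two rates passes through a single maximum, which the code locates on a temperature grid.

```python
import math


def harmonic_mean(kf, ku):
    """Harmonic mean of the folding and unfolding rates."""
    return 2.0 * kf * ku / (kf + ku)


def folding_rate(T, A_f=0.1, E_f=2.0):
    # Assumed anti-Arrhenius form: the folding rate decreases as T increases.
    return A_f * math.exp(E_f / T)


def unfolding_rate(T, A_u=100.0, E_u=6.0):
    # Assumed Arrhenius form: the unfolding rate increases with T.
    return A_u * math.exp(-E_u / T)


def optimal_high_temperature(temps):
    """Grid temperature at which the harmonic mean of the rates is largest."""
    return max(
        temps,
        key=lambda T: harmonic_mean(folding_rate(T), unfolding_rate(T)),
    )


# Temperature grid from 0.5 to 5.0 in steps of 0.01 (reduced units).
temps = [0.5 + 0.01 * i for i in range(451)]
T_opt = optimal_high_temperature(temps)
print(T_opt)
```

For these particular rate laws the maximum can also be found analytically, at T* = (E_f + E_u)/ln[(A_u E_f)/(A_f E_u)], which gives a convenient check on the grid search. Above T*, the falling folding rate dominates the harmonic mean, so raising the high temperature further no longer helps.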

5.7 Conclusion

We have demonstrated that OPLS-AA/AGBNP+ and REMD can capture the thermodynamics of peptide folding (for instance, the G-peptide and C-peptide [75]) and protein–ligand binding (N-palmitoylglycine complexed to cytochrome P450 BM-3 [80]). OPLS-AA/AGBNP+ is effective in discriminating the correct fold of a loop on a protein from competing misfolded conformations [101]. Considered in conjunction with our previous work on detecting native folds from misfolded decoys [14], this is an indication that our effective potential is suitable for protein folding. While thermodynamics can be calculated directly from replica exchange, kinetics cannot. We have shown, however, that network models can be constructed from the conformations generated from REMD to calculate the kinetics of the system [100]. We have also shown that a kinetic network model with a discrete model of the RE system can provide insights into the kinetics of RE [102]. We have extended our investigation into the behavior of RE with a simple continuous potential, which captures some of the kinetics of protein folding [103]. These simple models have demonstrated some of the pitfalls of RE, which can occur under certain circumstances, such as when systems exhibit anti-Arrhenius behavior.

Acknowledgments

This project has been supported in part by National Institutes of Health grant GM-30580.

References

1. W.A. Eaton, V. Muñoz, S.J. Hagen, G.S. Jas, L.J. Lapidus, E.R. Henry, J. Hofrichter, Annu. Rev. Biophys. Biomol. Struct. 29, 327 (2000)
2. J.K. Myers, T.G. Oas, Annu. Rev. Biochem. 71, 783 (2002)
3. A.R. Dinner, A. Šali, L.J. Smith, C.M. Dobson, M. Karplus, Trends Biochem. Sci. 25, 331 (2000)
4. J. Rumbley, L. Hoang, L. Mayne, S.W. Englander, Proc. Natl. Acad. Sci. USA 98, 105 (2001)
5. A.R. Fersht, V. Daggett, Cell 108, 573 (2002)
6. M. Vendruscolo, E. Paci, Curr. Opin. Struct. Biol. 13, 82 (2003)
7. T. Lazaridis, M. Karplus, J. Mol. Biol. 288, 477 (1999)
8. D. Petrey, B. Honig, Protein Sci. 9, 2181 (2000)
9. T. Lazaridis, M. Karplus, Curr. Opin. Struct. Biol. 10, 139 (2000)
10. B.D. Bursulaya, C.L. Brooks III, J. Phys. Chem. B 104, 12378 (2000)
11. B.N. Dominy, C.L. Brooks III, J. Comput. Chem. 23, 147 (2002)
12. Y. Liu, D.L. Beveridge, Proteins: Struct. Funct. Genet. 46, 128 (2002)
13. M. Feig, C.L. Brooks III, Proteins: Struct. Funct. Genet. 49, 232 (2002)
14. A.K. Felts, E. Gallicchio, A. Wallqvist, R.M. Levy, Proteins: Struct. Funct. Genet. 48, 404 (2002)
15. Y.M. Rhee, V.S. Pande, Biophys. J. 84, 775 (2003)
16. R.M. Levy, E. Gallicchio, Annu. Rev. Phys. Chem. 49, 531 (1998)
17. A.R. Dinner, T. Lazaridis, M. Karplus, Proc. Natl. Acad. Sci. USA 96, 9068 (1999)
18. B. Zagrovic, E.J. Sorin, V. Pande, J. Mol. Biol. 313, 151 (2001)
19. R. Zhou, B.J. Berne, Proc. Natl. Acad. Sci. USA 99, 12777 (2002)
20. R. Zhou, Proteins: Struct. Funct. Genet. 53, 148 (2003)
21. B. Roux, T. Simonson, Biophys. Chem. 78, 1 (1999)
22. D. Bashford, D.A. Case, Annu. Rev. Phys. Chem. 51, 129 (2000)
23. T. Simonson, Curr. Opin. Struct. Biol. 11, 243 (2001)
24. J. Zhu, Y. Shi, H. Liu, J. Phys. Chem. B 106, 4844 (2002)
25. M. Król, J. Comput. Chem. 24, 531 (2003)
26. A. Suenaga, J. Mol. Struct. (Theochem) 634, 235 (2003)
27. E. Gallicchio, R.M. Levy, J. Comput. Chem. 25, 479 (2004)
28. B. Marten, K. Kim, C. Cortis, R.A. Friesner, R.B. Murphy, M.N. Ringnalda, D. Sitkoff, B. Honig, J. Phys. Chem. 100, 11775 (1996)
29. D. Qiu, P.S. Shenkin, F.P. Hollinger, W.C. Still, J. Phys. Chem. A 101, 3005 (1997)
30. N. Froloff, A. Windemuth, B. Honig, Protein Sci. 6, 1293 (1997)
31. C.J. Cramer, D. Truhlar, Chem. Rev. 99, 2161 (1999)
32. E. Gallicchio, L.Y. Zhang, R.M. Levy, J. Comput. Chem. 23, 517 (2002)
33. M.S. Lee, M. Feig, F.R. Salsbury Jr., C.L. Brooks III, J. Comput. Chem. 24(11), 1348 (2003)
34. J. Tomasi, M. Persico, Chem. Rev. 94, 2027 (1994)
35. C.M. Cortis, R.A. Friesner, J. Comput. Chem. 18, 1591 (1997)
36. W. Rocchia, S. Sridharan, A. Nicholls, E. Alexov, A. Chiabrera, B. Honig, J. Comput. Chem. 23, 128 (2002)
37. W.C. Still, A. Tempczyk, R.C. Hawley, T. Hendrickson, J. Am. Chem. Soc. 112, 6127 (1990)
38. A. Onufriev, D. Bashford, D.A. Case, J. Phys. Chem. B 104, 3712 (2000)
39. A. Ghosh, C.S. Rapp, R.A. Friesner, J. Phys. Chem. B 102, 10983 (1998)
40. L. Zhang, E. Gallicchio, R. Friesner, R.M. Levy, J. Comput. Chem. 22, 591 (2001)
41. M. Schaefer, C. Froemmel, J. Mol. Biol. 216, 1045 (1990)
42. G.D. Hawkins, C.J. Cramer, D.G. Truhlar, J. Phys. Chem. 100, 19824 (1996)

43. M. Schaefer, M. Karplus, J. Phys. Chem. 100, 1578 (1996)
44. B.N. Dominy, C.L. Brooks III, J. Phys. Chem. B 103, 3765 (1999)
45. V. Tsui, D.A. Case, Biopolymers 56, 275 (2000)
46. A. Ben-Naim, Hydrophobic Interactions (Plenum Press, New York, 1980)
47. W. Kauzmann, Adv. Prot. Chem. 14, 1 (1959)
48. K.A. Dill, Biochemistry 29, 7133 (1990)
49. P.L. Privalov, G.I. Makhatadze, J. Mol. Biol. 232, 660 (1993)
50. B. Honig, A.S. Yang, Adv. Prot. Chem. 46, 27 (1995)
51. J.M. Sturtevant, Proc. Natl. Acad. Sci. USA 74, 2236 (1977)
52. D.H. Williams, M.S. Searle, J.P. Mackay, U. Gerhard, R.A. Maplestone, Proc. Natl. Acad. Sci. USA 90, 1172 (1993)
53. X. Siebert, G. Hummer, Biochemistry 41, 2965 (2002)
54. T. Ooi, M. Oobatake, G. Nemethy, H.A. Scheraga, Proc. Natl. Acad. Sci. USA 84, 3086 (1987)
55. M.R. Lee, Y. Duan, P.A. Kollman, Proteins 39(4), 309 (2000)
56. P.H. Hünenberger, V. Helms, N. Narayana, S.S. Taylor, J.A. McCammon, Biochemistry 38(8), 2358 (1999)
57. T. Simonson, A.T. Brünger, J. Phys. Chem. 98, 4683 (1994)
58. D. Sitkoff, K.A. Sharp, B. Honig, J. Phys. Chem. 98, 1978 (1994)
59. C.S. Rapp, R.A. Friesner, Proteins: Struct. Funct. Genet. 35, 173 (1999)
60. F. Fogolari, G. Esposito, P. Viglino, H. Molinari, J. Comput. Chem. 22, 1830 (2001)
61. E. Pellegrini, M.J. Field, J. Phys. Chem. A 106, 1316 (2002)
62. C. Curutchet, C.J. Cramer, D.G. Truhlar, M.F. Ruiz-López, D. Rinaldi, M. Orozco, F.J. Luque, J. Comput. Chem. 24, 284 (2003)
63. A. Wallqvist, D.G. Covell, J. Phys. Chem. 99, 13118 (1995)
64. E. Gallicchio, M.M. Kubo, R.M. Levy, J. Phys. Chem. B 104, 6271 (2000)
65. R.M. Levy, L.Y. Zhang, E. Gallicchio, A.K. Felts, J. Am. Chem. Soc. 125(31), 9523 (2003)
66. M. Nina, D. Beglov, B. Roux, J. Phys. Chem. B 101, 5239 (1997)
67. Y. Sugita, Y. Okamoto, Chem. Phys. Lett. 314, 141 (1999)
68. A. Mitsutake, Y. Sugita, Y. Okamoto, Biopolymers 60, 96 (2001)
69. S. Gnanakaran, H. Nymeyer, J. Portman, K.Y. Sanbonmatsu, A.E. García, Curr. Opin. Struct. Biol. 13, 168 (2003)
70. A.E. García, K.Y. Sanbonmatsu, Proteins: Struct. Funct. Genet. 42, 345 (2001)
71. R. Zhou, B.J. Berne, R. Germain, Proc. Natl. Acad. Sci. USA 98, 14931 (2001)
72. R.H. Swendsen, J.S. Wang, Phys. Rev. Lett. 57, 2607 (1986)
73. K. Hukushima, K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996)
74. H. Nymeyer, S. Gnanakaran, A.E. García, Meth. Enzymol. 383, 119 (2004)
75. A.K. Felts, Y. Harano, E. Gallicchio, R.M. Levy, Proteins: Struct. Funct. Bioinform. 56, 310 (2004)
76. M. Cecchini, F. Rao, M. Seeber, A. Caflisch, J. Chem. Phys. 121, 10748 (2004)
77. H.H.G. Tsai, M. Reches, C.J. Tsai, K. Gunasekaran, E. Gazit, R. Nussinov, Proc. Natl. Acad. Sci. USA 102, 8174 (2005)
78. A. Baumketner, J.E. Shea, Biophys. J. 89, 1493 (2005)
79. G.M. Verkhivker, P.A. Rejto, D. Bouzida, S. Arthurs, A.B. Colson, S.T. Freer, D.K. Gehlhaar, V. Larson, B.A. Luty, T. Marrone, P.W. Rose, Chem. Phys. Lett. 337, 181 (2001)
80. K.P. Ravindranathan, E. Gallicchio, R.A. Friesner, A.E. McDermott, R.M. Levy, J. Am. Chem. Soc. 128, 5786 (2006)

81. F. Rao, A. Caflisch, J. Chem. Phys. 119, 4035 (2003)
82. M.M. Seibert, A. Patriksson, B. Hess, D. van der Spoel, J. Mol. Biol. 354, 173 (2005)
83. D.A. Kofke, J. Chem. Phys. 117, 6911 (2002)
84. A. Kone, D.A. Kofke, J. Chem. Phys. 122, 206101 (2005)
85. C. Predescu, M. Predescu, C.V. Ciobanu, J. Chem. Phys. 120, 4119 (2004)
86. C. Predescu, M. Predescu, C.V. Ciobanu, J. Phys. Chem. B 109, 4189 (2005)
87. N. Rathore, M. Chopra, J.J. de Pablo, J. Chem. Phys. 122, 024111 (2005)
88. S. Trebst, M. Troyer, U.H.E. Hansmann, J. Chem. Phys. 124, 174903 (2006)
89. D.M. Zuckerman, E. Lyman, J. Chem. Theory Comput. 2, 1200 (2006)
90. D.M. Zuckerman, J. Chem. Theory Comput. 2, 1693 (2006)
91. D.A.C. Beck, G.W.N. White, V. Daggett, J. Struct. Biol. 157, 514 (2007)
92. S.I. Segawa, M. Sugihara, Biopolymers 23, 2473 (1984)
93. M. Oliveberg, Y.J. Tan, A.R. Fersht, Proc. Natl. Acad. Sci. USA 92, 8926 (1995)
94. V. Muñoz, P.A. Thompson, J. Hofrichter, W.A. Eaton, Nature 390, 196 (1997)
95. M. Karplus, J. Phys. Chem. B 104, 11 (2000)
96. P. Ferrara, J. Apostolakis, A. Caflisch, J. Phys. Chem. B 104, 5000 (2000)
97. W.Y. Yang, M. Gruebele, Biochemistry 43, 13018 (2004)
98. M.L. Scalley, D. Baker, Proc. Natl. Acad. Sci. USA 94, 10636 (1997)
99. J.D. Bryngelson, P.G. Wolynes, J. Phys. Chem. 93, 6902 (1989)
100. M. Andrec, A.K. Felts, E. Gallicchio, R.M. Levy, Proc. Natl. Acad. Sci. USA 102, 6801 (2005)
101. A.K. Felts, E. Gallicchio, D. Chekmarev, K.A. Paris, R.A. Friesner, R.M. Levy, J. Chem. Theory Comput. 4, 855 (2008)
102. W. Zheng, M. Andrec, E. Gallicchio, R.M. Levy, Proc. Natl. Acad. Sci. USA 104, 15340 (2007)
103. W. Zheng, M. Andrec, E. Gallicchio, R.M. Levy, J. Phys. Chem. B 112, 6083 (2008)
104. Y.N. Vorobjev, J.C. Almagro, J. Hermans, Proteins: Struct. Funct. Genet. 32, 399 (1998)
105. W.L. Jorgensen, D.S. Maxwell, J. Tirado-Rives, J. Am. Chem. Soc. 118, 11225 (1996)
106. G.A. Kaminski, R.A. Friesner, J. Tirado-Rives, W.L. Jorgensen, J. Phys. Chem. B 105, 6474 (2001)
107. J.L. Banks, H.S. Beard, Y. Cao, A.E. Cho, W. Damm, R. Farid, A.K. Felts, T.A. Halgren, D.T. Mainz, J.R. Maple, R. Murphy, D.M. Philipp, M.P. Repasky, L.Y. Zhang, B.J. Berne, R.A. Friesner, E. Gallicchio, R.M. Levy, J. Comput. Chem. 26, 1752 (2005)
108. W.L. Jorgensen, N.A. McDonald, Theochem 424, 145 (1998)
109. W.L. Jorgensen, N.A. McDonald, J. Phys. Chem. B 102, 8094 (1998)
110. R.C. Rizzo, W.L. Jorgensen, J. Am. Chem. Soc. 121, 4827 (1999)
111. E.K. Watkins, W.L. Jorgensen, J. Phys. Chem. A 105, 4118 (2001)
112. D.J. Weininger, J. Chem. Info. Comput. Sci. 28, 31 (1988)
113. J.A. Wagoner, N.A. Baker, Proc. Natl. Acad. Sci. USA 103, 8331 (2006)
114. R. Geney, M. Layten, R. Gomperts, V. Hornak, C. Simmerling, J. Chem. Theory Comput. 2, 115 (2006)
115. S.B. Ozkan, K.A. Dill, I. Bahar, Protein Sci. 11, 1958 (2002)

116. F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)
117. N. Singhal, C.D. Snow, V.S. Pande, J. Chem. Phys. 121, 415 (2004)
118. W.C. Swope, J.W. Pitera, F. Suits, J. Phys. Chem. B 108, 6571 (2004)
119. W.C. Swope, J.W. Pitera, F. Suits, M. Pitman, M. Eleftheriou, B.G. Fitch, R.S. Germain, A. Rayshubski, T.L.C. Ward, Y. Zhestkov, R. Zhou, J. Phys. Chem. B 108, 6582 (2004)
120. D.S. Chekmarev, T. Ishida, R.M. Levy, J. Phys. Chem. B 108, 19487 (2004)
121. D.A. Evans, D.J. Wales, J. Chem. Phys. 121, 1080 (2004)
122. D.T. Gillespie, Markov Processes: An Introduction for Physical Scientists (Academic Press, Boston, 1992)
123. M.P. Jacobson, D.L. Pincus, C.S. Rapp, T.J.F. Day, B. Honig, D.E. Shaw, R.A. Friesner, Proteins: Struct. Funct. Bioinform. 55, 351 (2004)
124. K. Zhu, D.L. Pincus, S. Zhao, R.A. Friesner, Proteins: Struct. Funct. Bioinform. 65, 438 (2006)
125. J.A. Hartigan, M.A. Wong, Appl. Stat. 28, 100 (1979)
126. A. Fiser, R.K.G. Do, A. Šali, Protein Sci. 9, 1753 (2000)
127. Z.X. Xiang, C.S. Soto, B. Honig, Proc. Natl. Acad. Sci. USA 99, 7432 (2002)
128. F.J. Blanco, G. Rivas, L. Serrano, Nat. Struct. Biol. 1, 584 (1994)
129. F.J. Blanco, L. Serrano, Eur. J. Biochem. 230, 634 (1995)
130. V. Muñoz, E.R. Henry, J. Hofrichter, W.A. Eaton, Proc. Natl. Acad. Sci. USA 95, 5872 (1998)
131. V. Muñoz, L. Serrano, J. Mol. Biol. 245, 275 (1995)
132. A. Bierzynski, P.S. Kim, R.L. Baldwin, Proc. Natl. Acad. Sci. USA 79, 2470 (1982)
133. C. Mitchinson, R.L. Baldwin, Proteins: Struct. Funct. Genet. 1, 23 (1986)
134. B.I. Dahiyat, S.L. Mayo, Science 278, 82 (1997)
135. K. Zhu, M.R. Shirts, R.A. Friesner, J. Chem. Theory Comput. 3, 2108 (2007)
136. P.I.W. de Bakker, M.A. DePristo, D.F. Burke, T.L. Blundell, Proteins: Struct. Funct. Bioinform. 51, 21 (2003)
137. M.A. DePristo, P.I.W. de Bakker, S.C. Lovell, T.L. Blundell, Proteins: Struct. Funct. Bioinform. 51, 44 (2003)
138. A. Ghosh, C.S. Rapp, R.A. Friesner, J. Phys. Chem. B 102, 10983 (1998)
139. M. Andrec, D.A. Snyder, Z. Zhou, J. Young, G.T. Montelione, R.M. Levy, Proteins: Struct. Funct. Bioinform. 69, 449 (2007)
140. D.K. Klimov, D. Thirumalai, Proc. Natl. Acad. Sci. USA 97, 2544 (2000)
141. V. Pande, D.S. Rokhsar, Proc. Natl. Acad. Sci. USA 96, 9062 (1999)
142. B. Ma, R. Nussinov, J. Mol. Biol. 296, 1091 (2000)
143. P.G. Bolhuis, Proc. Natl. Acad. Sci. USA 100, 12129 (2003)
144. E. Gallicchio, M. Andrec, A.K. Felts, R.M. Levy, J. Phys. Chem. B 109, 6722 (2005)
145. P.R.O. Montellano, Cytochrome P450: Structure, Mechanism and Biochemistry, 2nd edn. (Plenum Press, New York, 1995)
146. V. Guallar, R.A. Friesner, J. Am. Chem. Soc. 126, 8501 (2004)
147. D.C. Haines, D.R. Tomchick, M. Machius, J.A. Peterson, Biochemistry 40, 13456 (2001)
148. P.A. Williams, J. Cosme, A. Ward, H.C. Angove, D.M. Vinkovic, H. Jhoti, Nature 424, 464 (2003)
149. P.A. Williams, J. Cosme, D.M. Vinkovic, A. Ward, H.C. Angove, P.J. Day, C. Vonrhein, I.J. Tickle, H. Jhoti, Science 305, 683 (2004)

150. G.A. Schoch, J.K. Yano, M.R. Wester, K.J. Griffin, C.D. Stout, E.F. Johnson, J. Biol. Chem. 279, 9497 (2004)
151. T. Jovanovic, R. Farid, R.A. Friesner, A.E. McDermott, J. Am. Chem. Soc. 127, 13548 (2005)

6 Functional Unfolded Proteins: How, When, Where, and Why?

H.J. Dyson, S.-C. Sue, and P.E. Wright

Abstract. Recent advances in the sequencing of whole genomes have given fascinating insights into the overall composition of the encoded proteins. Many of the amino acid sequences that have been deduced in this way have highly biased sequences and are predicted to be unfolded. A significant number of these sequences correspond to parts of functional proteins, and in a surprising number of cases, the unstructured regions correspond to the most relevant parts of the protein for function – the actual sites for the binding of activators, repressors, and other ligands. This is particularly true for proteins involved in signaling networks – that is, signal transduction, transcriptional activation, translation, and cell cycle regulation. The intrinsically disordered regions facilitate interactions with multiple binding partners and also provide a means for efficiently dissociating the complex after the signal has been transduced. This article briefly reviews some of the recent experimental evidence from our own and other labs, upon which these conclusions are based.

6.1 What is a Functional Unfolded Protein?

As long as biochemical studies were focused on the characterization of proteins purified from cells and tissues, it was inevitable that the proteins studied were well-behaved, folded, and of a recognizable structure. Classic biochemical separations, including salting-out, column chromatography of various kinds, and gel filtration, all relied on the presence of well-folded proteins. Those proteins that were incompletely folded were generally badly behaved under these conditions, and were frequently discarded as refractory. We therefore built up a picture of the protein world where the members were in most cases folded into distinct globular states, which could be characterized by X-ray crystal structure analysis. Any unstructured regions of such proteins had to be removed or otherwise immobilized, sometimes by the packing in the crystals themselves. Order was thus equated with intact functional proteins. With the advent of genetic methods in the 1990s, culminating in the sequencing of whole genomes, it became possible to map the function of proteins by altering genes. Refinement of these techniques now allows us to pinpoint the areas of a given protein that are vital for its function. It was at this stage that the puzzling widespread occurrence of proteins that were clearly unstructured but nevertheless functional was observed [1–6]. Such behavior had previously been observed for peptide hormones, rationalized as a case of specific folding upon binding to a specific receptor [7, 8]. However, it was not recognized until later that this phenomenon was not only operative within cells [9], but was widespread, particularly among the most important proteins in the control mechanisms of the cell. The recognition came almost simultaneously from experimental and theoretical studies. Several examples of functional unstructured proteins from cellular signal transduction pathways, cell cycle control and transcriptional activation were noted [10–15]. At the same time, scanning of published genome sequences showed that there were frequently long stretches of the coded amino acid sequences that could not, by any of the rules of normal globular protein structure, form folded three-dimensional structures in water environments [16, 17]. These sequences contained, for example, repeated units of hydrophilic amino acids, or patterns of hydrophobic and hydrophilic amino acids that did not correspond to any known secondary structure. In addition, these sequences (up to 30% of protein sequences derived from published genomes) appeared to be disproportionately present in cancer-related genes [18]. Thus it appears that intrinsically unstructured proteins take part in many of the most important processes that go on in the cell.

6.2 Where do Functional Unfolded Proteins Occur?

Functional unfolded proteins, and unfolded domains of otherwise folded proteins, frequently occur among the most important cellular processes, including signal transduction [19, 20], transcriptional regulation [21–24], regulation of translation [25] and cell cycle regulation [10]. The biological function of unstructured protein domains frequently involves coupled folding and binding [26] and the various components of a complex may show different degrees of structure/lack of structure (Fig. 6.1).

6.3 How Are Functional Unfolded Proteins Studied?

Because an unfolded or partly folded protein consists of a conformational ensemble containing a wide range of different structures, it is impossible to obtain meaningful results from crystal structures; even if the molecule will form crystals, the resulting structure will not be representative of the ensemble in solution. It is necessary to obtain information on unfolded proteins in solution. Spectroscopic methods are therefore employed to give information on conformational preferences within the ensemble. These include circular dichroism, fluorescence, Raman and NMR spectroscopy. NMR gives a great deal of site-specific information, and is preferred whenever usable NMR spectra can be obtained.


Fig. 6.1. Schematic representation showing various types of disorder that may occur in proteins. Adapted by permission from [5] (Macmillan Publishers Ltd., copyright 2005)

6.4 NMR Spectra: Practical Considerations

Because the chemical environments of all of the nuclei in the polypeptide chain are very similar when the chain is disordered in water solution, the NMR signals, which rely for their dispersion on small local differences in the environment, will be largely overlapped, although the resonances themselves may be quite narrow (Fig. 6.2). In the past this resulted in the study of unfolded proteins being, in most cases, abandoned. The use of 3D spectra and triple resonance experiments, as well as the availability of high-field NMR instruments, means that the assignment of resonances in the NMR spectra of unfolded proteins is no longer a deterrent to the study of these systems by NMR. Partly folded systems can be more problematic, since they frequently consist of a series of conformations in intermediate exchange, which causes broadening of the resonance lines. The NMR spectra of unfolded proteins are assigned mainly using the intrinsic resonance dispersion of the backbone 15N and 13CO resonances, which are highly sequence-dependent [27]. Other problems arise if the complex that is formed is of high molecular weight – in this case the T2 relaxation time, which depends on the molecular weight, becomes short and the resonances broaden, although this problem can be overcome by the use of relaxation-optimized (TROSY) techniques.
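The link between molecular weight and line broadening can be made explicit with two standard relations, given here as a rough guide rather than as part of the original analysis. They assume a Lorentzian line shape and a rigid, roughly spherical complex tumbling isotropically; the symbols Δν_{1/2} (full linewidth at half height), τ_c (rotational correlation time), r_h (hydrodynamic radius), η (solvent viscosity) and T (absolute temperature) are introduced here for illustration.

```latex
\Delta\nu_{1/2} = \frac{1}{\pi T_2},
\qquad
\tau_c \approx \frac{4\pi \eta \, r_h^{3}}{3 k_B T}
```

Because the hydrodynamic volume grows roughly in proportion to molecular weight, τ_c increases with the size of the complex. For slowly tumbling macromolecules the transverse relaxation rate 1/T2 rises approximately in proportion to τ_c, so the linewidth increases accordingly.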


Fig. 6.2. 1H–15N HSQC spectra of folded apomyoglobin at pH 6 (left) and unfolded apomyoglobin at pH 2 (right). Note the wide dispersion in the 1H dimension in the left spectrum, and the narrow dispersion on the right. Also, the cross peaks are broader in the left spectrum, due to isotropic tumbling of the folded, globular protein. The cross peaks are narrower in the right spectrum due to rapid segmental motion of the unfolded polypeptide chain

6.5 Dynamic Complexes in CBP

Our group has been particularly interested in the transcriptional activator CBP and its partners, which show a wide range of different modes of interaction of unstructured and partly folded proteins (Fig. 6.3). The first CBP system where an unstructured component was identified was the KIX domain and its partner pKID, the phosphorylated kinase-inducible domain of CREB [13, 14]. KIX is a folded domain, but pKID is unstructured in solution, becoming folded into a pair of helices when bound to the KIX domain (Fig. 6.4). The mechanism of the coupled folding and binding process for the pKID–KIX system has recently been elucidated by NMR, utilizing HSQC titrations and relaxation dispersion measurements [28]. These results are described in more detail in Chap. 1 (Wright).

A particularly intriguing example occurs in the complex of the interaction domain of ACTR and the nuclear coactivator binding domain (NCBD) of CBP. CD spectra show that although neither of the free proteins is cooperatively folded, the complex is folded and stable. The 3D structure of the complex [23] demonstrates one of the rationales for the existence of intrinsically unstructured proteins: the surface area of contact between the two proteins (Fig. 6.5) is much larger than could be expected from the interaction of folded proteins of comparable size, as has been pointed out [29].

Another functional application of intrinsically unstructured proteins is illustrated by the complex between the TAZ1 domain of CBP and the interaction domain of the hypoxia-inducible factor, HIF-1α. Like the KIX–pKID complex, the TAZ1–HIF-1α complex involves the folding of an unstructured partner (in this case HIF-1α) onto a folded domain (in this case TAZ1). The 3D structure of the complex [21] not only shows the extensive surface area of contact seen for other such complexes, but also illustrates the operation of a biological switch, another major rationale for the participation of unstructured proteins in systems such as this. The TAZ1–HIF-1α interaction is primed by the presence or absence of a hydroxyl group on a particular asparagine residue, Asn803, in HIF-1α. The enzyme that accomplishes this hydroxylation reaction, termed FIH, binds the sequence containing Asn803 as part of a β-strand, according to the crystal structure [30] (Fig. 6.6a), but the same sequence is present in a well-formed helix in the NMR structure of the TAZ1–HIF-1α complex (Fig. 6.6b). That is, the same sequence can take up two functionally important, quite different, structures, as a consequence of its conformational freedom as an intrinsically unstructured protein in the uncomplexed state.

Fig. 6.3. Schematic representation of the domain structure of human CREB-binding protein CBP. Folded domains are shown as spheres. Adapted by permission from [5] (Macmillan Publishers Ltd., copyright 2005)

Fig. 6.4. Illustration of the unfolded nature of the phosphorylated kinase-inducible domain (pKID) of CREB (left) and its conformation after folding upon binding to the KIX domain of CBP (right). The mechanism of this process has recently been elucidated by NMR [28] and is discussed more fully in Chap. 1 (Wright). Adapted by permission from [5] (Macmillan Publishers Ltd., copyright 2005)


Fig. 6.5. Illustration of the extensive binding surface of the ACTR domain on the NCBD of CBP. The right-hand structure is obtained by rotation of the left-hand structure in the manner indicated by the arrow. The backbone and side chains of ACTR are indicated by a wire, while the CBP is represented by a van der Waals surface. Adapted by permission from [23] (Macmillan Publishers Ltd., copyright 2002)

Fig. 6.6. Conformations of the HIF-1α sequence containing the regulatory asparagine 803 that is hydroxylated under normoxic conditions. (a) Extended conformation in the X-ray crystal structure of the complex with the hydroxylating enzyme FIH [30]. (b) Helical conformation in the NMR structure of the complex with the TAZ1 domain of CBP [21]

6.6 Role of Flexibility in the Function of IκBα

One of the major roles of intrinsically unstructured proteins and domains, as well as partially folded domains and domains that undergo significant internal motion, is in cellular signaling. The dynamic nature of such systems makes them well-suited to the reception, transduction, and eventual turning-off of cellular signals. Indeed, the cessation of the response upon removal of the signal is a vital part of the process, and is frequently accomplished by integration of signaling pathways with the proteolytic destruction of the intermediary molecules, many of which are partly or completely unstructured. The interaction of NF-κB with IκB provides a wealth of examples of several different kinds of order–disorder processes. This work was started in our lab as a collaboration with Dr. E.A. Komives at the University of California, San Diego.

Nuclear factor-kappaB (NF-κB) is a dimeric transcription factor widely employed for the transcription of stress-response genes, as it binds to κB upstream enhancer DNA sequences, where it recruits the transcriptional activator CBP. In an unstressed cell, the majority of the NF-κB resides in the cytoplasm, in complex with the inhibitor of NF-κB (IκB). Response to stress involves phosphorylation and ubiquitination of IκB and its subsequent degradation by the proteasome. The free NF-κB is transported to the nucleus, where it binds to the κB enhancer sequences and mediates the transcription of genes that include that of IκB, which acts subsequently to remove NF-κB from the DNA and return it to the cytoplasm as the NF-κB–IκB complex.

A number of X-ray crystal structures of complexes of NF-κB have illustrated the interactions that occur with DNA and with IκBα [31–33] (Fig. 6.7). The most common form of NF-κB consists of a heterodimer of two proteins, p65 and p50, which each consist of two immunoglobulin-like domains, together with various linker sequences that are unstructured in solution. Figure 6.7a shows that the N-terminal domains of each of the two molecules form the major sites of DNA binding, while Fig. 6.7b shows that the interaction with IκBα occurs with the two dimerization domains. IκBα is seen from Fig. 6.7b to consist of an ankyrin-repeat (ANK) structure containing six ankyrin repeats. As well as binding to the dimerization domains of p65 and p50, IκBα appears to form a cooperative interaction with the nuclear localization signal (NLS) of p65: this observation was used to form a hypothesis about the mechanism of inhibition by IκBα. By this hypothesis, NF-κB binds DNA in an open conformation (Fig. 6.7a), but when IκBα binds NF-κB, the N-terminal domain of p65 rotates into the DNA-binding site (the N-terminal domain of p50 is missing in the X-ray structure). Binding of the p65 NLS causes the complex to remain largely in the cytoplasm. Upon activation, IκBα is removed and targeted for degradation, releasing the NLS, which allows NF-κB to be transported to the nucleus for gene activation.

Fig. 6.7. (a) X-ray crystal structure of the complex of the p50/p65 heterodimer of NF-κB with the cognate DNA sequence [32]. Adapted by permission from [32] (Macmillan Publishers Ltd., copyright 1998). (b) X-ray crystal structure of the complex of the p50/p65 heterodimer of NF-κB with IκBα [31]. Adapted by permission from [31] (Elsevier, copyright 1998)

This picture does not give the complete story. The interaction of NF-κB and IκBα is mediated and orchestrated by changes in flexibility and motion in both molecules. Parts of IκBα are highly fluxional in the free protein, and different parts appear to be fluxional in the complex, which may be functionally relevant. Initial evidence for the fluxional nature of IκBα came from H/D exchange monitored by mass spectrometry [34, 35]. These studies demonstrated that while amide protons in the first four ankyrin repeats remained protected both in the free protein and when bound to NF-κB, repeats 5 and 6 were highly exchanged in the free protein but not in the complex. Figure 6.7b shows that all of the ankyrin repeats are equally well-structured in the complex, and ankyrin repeats are normally highly stable proteins. Further circumstantial evidence for motion or heterogeneity in IκBα comes from the inability of the protein to form crystals in the free state. Although repeat 6 has a lower similarity to the consensus ankyrin repeat sequence, neither repeat 5 nor repeat 6 appears more likely than repeats 1–4 to form a stable structure.

We decided to apply NMR to the problem, to quantitate structural and dynamic differences both between individual repeats and between free and bound IκBα. The initial spectra of a construct of IκBα containing repeats 1–6 showed that only some parts of the protein give rise to observable cross peaks. We were able to show that, consistent with the mass spectrometry H/D results, the signals that were observed arose from repeats 1–4, which are well-structured in the free protein. The remaining signals are badly broadened, and some are completely missing, indicating that there is conformational exchange within repeats 5 and 6, probably on an intermediate time scale. This circumstance made our stated aim of comparing dynamic behavior of the free protein with the bound protein more difficult. We developed a streamlined production method that takes advantage of the differential expression levels of p50 and p65 in the E. coli expression system. Using this

6 Functional Unfolded Proteins: How, When, Where, and Why?

Fig. 6.8. Schematic diagram showing the assignment strategy for the 94 kDa complex of p50/p65 with IκBα(67–287) based on transfer of assignments from smaller proteins and complexes. The top row shows putative structures of the complexes, modeled from X-ray crystal structures [Jacobs and Harrison [31] for (a–d) and Chen et al. [32] for (d)]. The approximate position of the flexible PEST sequence is indicated by a dotted line. The bottom row shows the 600 MHz ¹H–¹⁵N HSQC spectra (a, b) or 900 MHz TROSY-type HSQCs (c, d) for each IκBα fragment. (a) [²H, ¹⁵N, ¹³C]-labeled IκBα(67–206) (15 kDa), (b) [²H, ¹⁵N, ¹³C]-IκBα(67–206) in complex with p65 NLS (residues 289–321 of human p65) (19 kDa), (c) [²H, ¹⁵N, ¹³C]-IκBα(67–287) in complex with the heterodimer of the C-terminal dimerization domains of p50 and p65 (52 kDa), (d) [²H, ¹⁵N, ¹³C]-IκBα(67–287) in complex with p50/p65

method, we were able to produce diﬀerentially labeled complexes of IκBα and NF-κB, and were able to complete the resonance assignments of IκBα even in very large complexes containing both the dimerization and DNA-binding domains of both p65 and p50, as well as IκBα. Since very large complexes cause diﬃculty in resonance assignment, we transferred assignments from smaller complexes to larger ones, as described for other large systems [36, 37]. The process is shown in Fig. 6.8. Problems remain – the assignments of both free and bound IκBα are far from complete, since a signiﬁcant number of resonances are missing from both sets of NMR spectra, mainly in repeats 5 and 6 of the free protein and in repeat 3 of the bound protein. However, if we infer that these resonances are missing due to a dynamic process, we can use this information to build up a picture of the dynamics of IκBα in the presence and absence of NF-κB. Figure 6.9 shows the backbone nitrogens of the missing resonances mapped onto the backbone of IκBα in the NF-κB complex (there is no structural information on the free form of IκBα). Missing resonances abound in repeats 5 and 6 of the

Fig. 6.9. Mapping of missing resonances onto IκBα. Left: Representation of the ankyrin repeat structure of IκBα derived from the X-ray structure (no direct structural information is available for free IκBα), showing missing resonances mainly in ANK5 and 6. Right: Structure of the ankyrin repeat region of IκBα from the X-ray structure [31], showing missing resonances in ANK3

free protein, but this appears to be the best-structured region in the bound protein. Surprisingly, the flexibility measured by missing resonances appears to be enhanced in repeat 3 of the bound protein, compared to the free protein.

We can therefore classify IκBα into four regions according to their dynamic characteristics (Fig. 6.10). Region 1, comprising the majority of ankyrin repeats 1 and 2, appears to be well-folded and stable in both the free protein and in the complexes with NF-κB, as shown by the presence of most resonances in the NMR spectra, the high protection factors in the H/D exchange experiments and the uniform values of the ¹H–¹⁵N NOE and other relaxation measurements. According to the crystal structure of the complex, Region 1 makes intimate contact with the NLS of p65. We know from a comparison of the dispersion of the NMR spectra of the NLS when free or bound to IκBα that these 20 residues of p65 are unstructured in solution in the free state, but become well-structured in the complex. Thus, Region 1 of IκBα provides a structured scaffold upon which the intrinsically unstructured NLS can bind in a specific manner.

Fig. 6.10. Regions of IκBα with different dynamic properties in the free and complexed states (see text). Region 1 consists of parts of ANK1 and 2, and appears rigid in both free and complexed states. Region 2, consisting of ANK3 and part of ANK4, is more rigid in the free state than in the complexed state. Region 3 consists of ANK5 and part of ANK6, which are more rigid in the complexed state than in the free state. Region 4 consists of the C-terminal portion of ANK6 and the PEST sequence; this region is flexible in both free and complexed states. Note that the C-terminal helix of ANK4 (marked with asterisk) is not included in any of these regions, as it appears to be rigid in both free and complexed IκBα. Adapted by permission from Sue et al. [38]

Region 2 comprises much of ankyrin repeat 3 and the N-terminal part of repeat 4. This region shows some enigmatic properties. According to the NMR H/D exchange measurements, this region is well-structured in the free state, but is destabilized to exchange in the complex, consistent with the loss or broadening of many of the resonances in repeat 3 upon complex formation (Fig. 6.9). This region of IκBα spans the interval between repeats 1 and 2, bound to the NLS in the complex, and repeats 5–6, which are bound to the bulk of the dimerization domains of p50 and p65 in the complex. Thus, we may expect that Region 2 might undergo some intermediate time-scale exchange processes concomitant with segmental motion of the two ends of the complex.

Region 3 consists largely of repeat 5 and the N-terminal part of repeat 6. This region is distinguished by segmental motion on an intermediate time scale in the free state, such that many of the resonances are completely missing and others are severely broadened. Yet upon complexation, this region becomes well structured, with high protection factors and well-dispersed resonances. Clearly in this case there is a transition from less-structured to

more-structured upon complex formation. Region 3 shows the classic folding-upon-binding behavior that is frequently observed for intrinsically unstructured domains [26].

Finally, Region 4 remains unstructured in both the free and complexed IκB protein. This region, containing the C-terminal portion of ankyrin repeat 6 and the PEST sequence, might well undergo conformational transitions to a more structured form in other contexts. For example, this region is thought to be involved in the removal of NF-κB from the DNA after the signal is no longer needed [33].

Given the wide variety in the behavior of structurally similar ankyrin repeats in IκBα, it is interesting to speculate about the possible reasons. Part of the activation of NF-κB in the cytoplasm in response to signaling involves the dissociation and degradation of IκBα. The presence of a rather mobile, solvent-accessible region, such as is seen for Region 2 in the complex, might predispose the complex to dissociation, perhaps in the presence of an accessory factor associated with the phosphorylation and ubiquitylation processes that ultimately decide the fate of the IκBα molecule. From a thermodynamic standpoint, the loss of conformational entropy that accompanies the formation of the stable and rigid structure of Region 3 in the complex from a relatively flexible form in the free state would require a considerable enthalpic term for the complex to be formed. However, this complex must be readily dissociated in response to a signal, so the complex cannot be too stable; a compromise position may be to transfer some of the entropy loss from repeats 5 and 6, as the complex is formed, to repeat 3, thus lowering the requirement for a large enthalpy term. The NF-κB–IκBα system provides examples of many different types of unfolded protein interactions, which are unified into a delicately balanced set of interactions that enable NF-κB to be rapidly deployed in response to cellular signaling.
However, the means by which nuclear IκBα dissociates NF-κB from the κB site on the DNA after its job is done is not at all clear from structural studies, and remains an intriguing challenge to future spectroscopic studies.

Acknowledgments

We thank Elizabeth Komives, Stephanie Truhlar, Carla Cervantes, Gourisankar Ghosh, Maria Yamout, and Gerard Kroon for helpful discussions. This work was supported by grant GM71862 from the National Institutes of Health.

References

1. P.E. Wright, H.J. Dyson, J. Mol. Biol. 293, 321 (1999)
2. V.N. Uversky, Protein Sci. 11, 739 (2002)
3. A.K. Dunker, C.J. Brown, Adv. Protein Chem. 62, 25 (2002)
4. P. Tompa, Trends Biochem. Sci. 27, 527 (2002)
5. H.J. Dyson, P.E. Wright, Nat. Rev. Mol. Cell Biol. 6, 197 (2005)
6. R.B. Russell, T.J. Gibson, FEBS Lett. 582, 1271 (2008)
7. C. Bösch, A. Bundi, M. Oppliger, K. Wüthrich, Eur. J. Biochem. 91, 209 (1978)
8. X. He, D. Chow, M.M. Martick, K.C. Garcia, Science 293, 1657 (2001)
9. A.J. Daniels, R.J.P. Williams, P.E. Wright, Neuroscience 3, 573 (1978)
10. R.W. Kriwacki, L. Hengst, L. Tennant, S.I. Reed, P.E. Wright, Proc. Natl. Acad. Sci. USA 93, 11504 (1996)
11. G.W. Daughdrill, M.S. Chadsey, J.E. Karlinsey, K.T. Hughes, F.W. Dahlquist, Nat. Struct. Biol. 4, 285 (1997)
12. G.W. Daughdrill, L.J. Hanely, F.W. Dahlquist, Biochemistry 37, 1076 (1998)
13. I. Radhakrishnan, G.C. Pérez-Alvarado, D. Parker, H.J. Dyson, M.R. Montminy, P.E. Wright, Cell 91, 741 (1997)
14. I. Radhakrishnan, G.C. Pérez-Alvarado, H.J. Dyson, P.E. Wright, FEBS Lett. 430, 317 (1998)
15. D. Liu, R. Ishima, K.I. Tong, S. Bagby, T. Kokubo, D.R. Muhandiram, L.E. Kay, Y. Nakatani, M. Ikura, Cell 94, 573 (1998)
16. P. Romero, Z. Obradovic, C.R. Kissinger, J.E. Villafranca, E. Garner, S. Guilliot, A.K. Dunker, Pac. Symp. Biocomput. 3, 437 (1998)
17. P. Romero, Z. Obradovic, C.R. Kissinger, J.E. Villafranca, A.K. Dunker, Proc. IEEE Int. Conf. Neural Networks 1997, 90 (1997)
18. L.M. Iakoucheva, C.J. Brown, J.D. Lawson, Z. Obradovic, A.K. Dunker, J. Mol. Biol. 323, 573 (2002)
19. N. Abdul-Manan, B. Aghazadeh, G.A. Liu, A. Majumdar, O. Ouerfelli, K.A. Siminovitch, M.K. Rosen, Nature 399, 379 (1999)
20. A.H. Huber, D.B. Stewart, D.V. Laurents, W.J. Nelson, W.I. Weis, J. Biol. Chem. 276, 12301 (2001)
21. S.A. Dames, M. Martinez-Yamout, R.N. De Guzman, H.J. Dyson, P.E. Wright, Proc. Natl. Acad. Sci. USA 99, 5271 (2002)
22. R.N. De Guzman, M. Martinez-Yamout, H.J. Dyson, P.E. Wright, J. Biol. Chem. 279, 3042 (2004)
23. S.J. Demarest, M. Martinez-Yamout, J. Chung, H. Chen, W. Xu, H.J. Dyson, R.M. Evans, P.E. Wright, Nature 415, 549 (2002)
24. N.K. Goto, T. Zor, M. Martinez-Yamout, H.J. Dyson, P.E. Wright, J. Biol. Chem. 277, 43168 (2002)
25. P.E. Hershey, S.M. McWhirter, J.D. Gross, G. Wagner, T. Alber, A.B. Sachs, J. Biol. Chem. 274, 21297 (1999)
26. H.J. Dyson, P.E. Wright, Curr. Opin. Struct. Biol. 12, 54 (2002)
27. J. Yao, H.J. Dyson, P.E. Wright, FEBS Lett. 419, 285 (1997)
28. K. Sugase, H.J. Dyson, P.E. Wright, Nature 447, 1021 (2007)
29. K. Gunasekaran, C.J. Tsai, S. Kumar, D. Zanuy, R. Nussinov, Trends Biochem. Sci. 28, 81 (2003)
30. J.M. Elkins, K.S. Hewitson, L.A. McNeill, J.F. Seibel, I. Schlemminger, C.W. Pugh, P.J. Ratcliffe, C.J. Schofield, J. Biol. Chem. 278, 1802 (2003)
31. M.D. Jacobs, S.C. Harrison, Cell 95, 749 (1998)
32. F.E. Chen, D.B. Huang, Y.Q. Chen, G. Ghosh, Nature 391, 410 (1998)
33. T. Huxford, D.B. Huang, S. Malek, G. Ghosh, Cell 95, 759 (1998)
34. C.H. Croy, S. Bergqvist, T. Huxford, G. Ghosh, E.A. Komives, Protein Sci. 13, 1767 (2004)
35. S.M. Truhlar, J.W. Torpey, E.A. Komives, Proc. Natl. Acad. Sci. USA 103, 18951 (2006)
36. J. Fiaux, E.B. Bertelsen, A.L. Horwich, K. Wüthrich, Nature 418, 207 (2002)
37. R. Sprangers, L.E. Kay, Nature 445, 618 (2007)
38. Sue et al., J. Mol. Biol. 380, 917 (2008)

7 Structure of the Photointermediate of Photoactive Yellow Protein and the Propagation Mechanism of Structural Change

M. Kataoka and H. Kamikubo

Abstract. To understand the molecular mechanism of a protein function, it is important to reveal the conformational change of the protein while it functions. Time-resolved X-ray crystallography has been used for this purpose and has revealed the local structural changes that follow triggering. However, the global conformational changes demonstrated by solution studies with various spectroscopic measurements are generally difficult to observe by time-resolved crystallography. Furthermore, the structural properties of folding intermediates cannot be revealed by crystallography. Solution X-ray scattering (SOXS) is a powerful technique for studying the solution structure of a protein and its changes. We describe the solution structure analysis of the photointermediate of a light-absorbing protein by high-angle solution X-ray scattering.

7.1 Solution X-ray Scattering

Solution X-ray scattering (SOXS) experiments in the small-angle region (SAXS) give the overall structural parameters of a protein, such as the radius of gyration, the maximum dimension of the particle, and the molecular shape, under various physiological conditions [1,2]. Low-resolution structural models can be constructed from the SAXS profile without any prior assumptions. This so-called ab initio shape prediction [3, 4] is widely used to characterize protein structures under physiological conditions [5, 6]. On the other hand, high-angle profiles contain information about secondary structure packing and tertiary folds [7–11]. It has also been suggested that high-angle scattering is sensitive to subtle structural changes [8]. Furthermore, high-angle scattering is quite useful for characterizing the structure of folding intermediates [12–14] as well as the protein folding process [15]. Although some theoretical treatments have been proposed to analyze high-angle scattering [7, 8], no successful application that derives structural information on real proteins has been reported. This is mainly due to the difficulty of recording high-angle scattering profiles with high accuracy. When we observed high angle scattering of hemoglobin solution

with the second-generation synchrotron, Photon Factory, it required a fairly high concentration (100 mg ml−1) and a long exposure time (10 min) [9]. The detector used was a one-dimensional position-sensitive proportional counter. However, recent improvements in two-dimensional X-ray detectors and the availability of third-generation synchrotron radiation sources have improved the quality of X-ray solution scattering profiles even in the higher-angle region, with momentum transfer (Q) values up to 6 Å−1. We can observe high-angle scattering of photoactive yellow protein (PYP) at 5 mg ml−1 with a 1 min exposure. Quantitative analysis of high-angle scattering is now required. A promising method would be the combination of molecular dynamics simulation and high-angle solution scattering [8]. Here we describe the structural change of PYP upon light absorption, as revealed by high-angle scattering combined with fluctuation analysis [16].
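The radius of gyration mentioned at the start of this section is commonly read off from the lowest-angle data with the Guinier approximation, I(Q) ≈ I(0) exp(−Q²Rg²/3). The following is a minimal Python sketch of such a fit, not the analysis used in this chapter: the function name `guinier_rg`, the cutoff Q·Rg ≤ 1.3, and the synthetic profile (Rg = 15 Å) are illustrative assumptions.

```python
import numpy as np

def guinier_rg(q, intensity, qrg_max=1.3):
    """Estimate Rg and I(0) from ln I = ln I(0) - (Rg**2 / 3) * Q**2.

    The cutoff q * Rg <= qrg_max is an assumed, commonly used rule of thumb.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.ones(q.size, dtype=bool)
    for _ in range(20):  # the valid Q range depends on Rg, so iterate
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("intensity does not decay with Q; no Guinier region")
        rg = np.sqrt(-3.0 * slope)
        new_mask = q * rg <= qrg_max
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, np.exp(intercept)

# Synthetic, noise-free profile (hypothetical values, Q in 1/Angstrom).
q = np.linspace(0.01, 0.08, 30)
rg_true, i0_true = 15.0, 100.0
profile = i0_true * np.exp(-(q * rg_true) ** 2 / 3.0)

rg_fit, i0_fit = guinier_rg(q, profile)
print(f"Rg = {rg_fit:.2f} A, I(0) = {i0_fit:.1f}")
```

With real data the fit would normally be weighted by the experimental errors, and the lowest-Q points checked for aggregation or interparticle interference before Rg is trusted.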

7.2 Photoactive Yellow Protein

Photoactive yellow protein (PYP) is a putative photoreceptor for negative phototaxis in the purple phototrophic bacterium Halorhodospira halophila [17, 18]. PYP is a prototype of the PAS domain, which is conserved in various proteins involved in signal transduction [19]. The crystal structure revealed that PYP is composed of four segments, namely, an N-terminal cap (residues 1–28), a PAS core (residues 29–69), a helical connector (residues 70–87), and a β scaffold (residues 88–125) [19, 20]. We refer to the latter three segments as the chromophore-binding region. Absorption of a photon by the chromophore, p-coumaric acid, triggers the isomerization of the chromophore [21] and the subsequent thermal reaction cycle [22–26]. The blue-shifted reaction intermediate PYPM, which has also been referred to as I2 or pB and forms over a timescale of ∼100 μs, is assumed to be the active state. Although the target molecule of PYP has not been identified, structural information about PYPM is crucial for understanding the molecular mechanism of PYP-dependent photosignal transduction. According to time-resolved crystallography, the structural changes in PYPM were confined to the area near the chromophore [27, 28]. A large change is observed only for R52, which is located inside the protein in the ground state but is exposed to solvent in PYPM. On the other hand, substantial conformational changes in the protein moiety of PYPM in solution have been reported [29–39].

An interesting aspect of the photoreaction of PYP is its similarity to the protein folding/unfolding reaction. Hellingwerf and his coworkers applied transition state theory to the photoreaction of PYP and estimated the thermodynamic parameters: the entropy, enthalpy, and heat capacity changes of activation [29]. They also carried out a thermodynamic analysis of the thermal denaturation of PYP.
Consequently, they found that the heat capacity changes in the photoreaction are comparable to those in the unfolding

reaction. We performed urea denaturation experiments on PYP and PYPM [30]. PYPM is more sensitive to urea than PYP. The free energy change upon denaturation is estimated as 11.0–11.5 kcal mol−1 for PYP and 7.6–7.8 kcal mol−1 for PYPM. Taking into account the fact that the isomeric state of the chromophore of the denatured state of PYPM is different from that of PYP, the free energy difference in the protein moiety between PYP and PYPM is estimated to be 6.5–11.5 kcal mol−1, which is comparable to the difference between the native state and the molten globule state in soluble proteins [30]. We concluded that PYPM has the properties of a partially unfolded state. We observed a significant change in the diffusion constant upon formation of PYPM by the transient grating method in collaboration with Terazima [31, 32]. The diffusion constant change is well explained by the unfolding of the α-helical moiety in the N-terminal region. Most of the HSQC peaks assigned to the N-terminal region disappear upon the formation of PYPM [33, 34]. The loss of α-helical content is also observed by CD [35]. However, a conflicting conclusion was obtained from the fragmentation and H/D exchange mass spectrometric analysis [36]. Therefore, detailed structural information about PYPM in solution is required to clarify the mechanism underlying the phototransduction.

There are two ways to study the structure of a short-lived photointermediate: the kinetic measurement and the static measurement. For high-angle X-ray scattering, static measurement is preferable, because the analysis of kinetic data depends on the kinetic model. Chymotrypsin cleaves PYP at the C-terminal sides of the 6th, the 15th, and the 23rd residues [40]; the resulting fragments will be called T6 (residues 7–125), T15 (residues 16–125), and T23 (residues 24–125), respectively, hereafter.
The absorption spectrum of each truncated PYP is identical to that of the intact PYP, indicating that the structure of the chromophore-binding region is not perturbed by the truncations [40]. The lifetimes of PYPM for T6, T15, and T23 are 30, 300, and 600 s, respectively, whereas the lifetime of PYPM of intact PYP is only 0.3 s. Therefore, these truncated forms are suitable for the structure analysis of the M intermediate. The crystal structure of PYP and the truncated parts are shown in Fig. 7.1.
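Why a longer PYPM lifetime helps a static measurement can be illustrated with a simple two-state picture: light converts the dark state to M at some rate, and M decays thermally with rate 1/τ. The Python sketch below rests on assumptions that are not taken from the text: single-exponential decay, and a hypothetical photoactivation rate `K_EXCITE` chosen only for illustration. The lifetimes are those quoted above.

```python
def steady_state_fraction_m(lifetime_s, k_excite_per_s):
    """Photostationary fraction of M in a two-state dark <-> M scheme.

    Assumes first-order photoactivation (k_excite) and single-exponential
    thermal decay with rate 1 / lifetime.
    """
    k_decay = 1.0 / lifetime_s
    return k_excite_per_s / (k_excite_per_s + k_decay)

# PYP_M lifetimes quoted in the text (seconds).
lifetimes_s = {"intact": 0.3, "T6": 30.0, "T15": 300.0, "T23": 600.0}

# Hypothetical photoactivation rate under continuous illumination (1/s).
K_EXCITE = 0.5

fractions = {
    name: steady_state_fraction_m(tau, K_EXCITE)
    for name, tau in lifetimes_s.items()
}
for name, fraction in fractions.items():
    print(f"{name:>6}: {100 * fraction:5.1f}% in the M state")
```

Under this assumed illumination level the intact protein stays mostly in the dark state, whereas the truncated forms accumulate almost entirely in M; the actual populations depend on the real light intensity and quantum yield.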

7.3 Solution Structure Analysis of Photointermediate of PYP

7.3.1 High-Angle X-ray Scattering of PYP in the Dark and in the Light

The N-terminal deletions of PYP may affect the scattering profile. Figure 7.2 (left) shows the experimentally observed scattering profiles of intact PYP and three truncated variants (T6, T15, and T23). The profile of intact PYP has two broad peaks at Q = 0.35 and 0.55 Å−1, with a valley around Q = 0.41 Å−1. In T6, the intensity of the peak at the lower Q value increases,

Fig. 7.1. Crystal structure of PYP and the truncated position by chymotrypsin treatment

Fig. 7.2. High-angle scattering proﬁles of wild type PYP, T6, T15, and T23 measured in solution (left), and calculated from the respective atomic structural models (right) [16]

while the peak position shifts toward a higher Q value. At the same time, the intensity of the peak at the higher Q value decreases and the valley shifts toward a higher Q value. On the other hand, both T15 and T23 resulted in similar scattering profiles with a single maximum around Q = 0.39 Å−1. These characteristic profiles indicate that the scattering profile in this Q region reflects intramolecular interference. The crystal structure of PYP (Fig. 7.1) explains the experimentally observed profiles satisfactorily. The theoretical profile of intact PYP has two broad peaks at the same positions as those observed in the experimentally obtained curve (Fig. 7.2 right). The theoretical profiles for T6 and T23

were also similar to the respective observed proﬁles. The agreement between the calculated proﬁles and the observed proﬁles indicates that the structures of T6 and T23 as well as that of intact PYP can be explained by removing the corresponding residues from the crystal structure (Fig. 7.1). The theoretical proﬁle for T15 appeared to be an intermediate between those for T6 and T23, and was diﬀerent from the observed proﬁle of T15. As shown in Fig. 7.1, the truncated position of T15 is at the center of α-helix. After removing 15 residues, the helix may be no longer stable, resulting in the disappearance of the interference between the N-terminal region and the rest of the protein. The X-ray scattering proﬁles of T6, T15, and T23 were measured under continuous illumination. Due to the long lifetime of the M intermediate for the truncated form, we can expect that more than 90% of the protein is in the PYPM state under continuous illumination [41]. Figure 7.3 shows the intensity proﬁles of the M intermediates of the truncated PYP variants compared with those obtained for their dark states. Signiﬁcant diﬀerences between the two states were observed for each truncated PYP. The proﬁles of the PYPM intermediates of the three truncated PYP variants are similar with two broad peaks

Fig. 7.3. High-angle X-ray scattering proﬁles of T6, T15, and T23 under illumination (circles with error bars) [16]: As a reference, the proﬁles of the dark states are shown (dashed lines). The characteristic bimodal proﬁles observed under the illumination are noted by the arrowheads

located at the same positions (arrowheads in the figure). The characteristic profile changes in T23, which lacks most of the N-terminal cap, indicate rearrangements of the secondary structure packing in the chromophore-binding region. The profiles of the PYPM of the three truncated PYP variants can be superimposed on the log–log plot. The differences among the profiles appear in the valley around Q = 0.3 Å−1, where the shape scattering and the intramolecular interference scattering overlap. In order to derive the contribution from the secondary structure packing, the contribution of the shape scattering was subtracted from the original profile. In general, the final slope of the shape scattering can be described as Q^−α, where α is related to the fractal dimension [42] or the protein conformational state [12]. The final slope of the shape scattering from each truncated variant is well approximated by a straight line in a log–log plot. The slope gives the value of α. The excess intensity due to the shape scattering thus estimated was subtracted to derive the corrected intramolecular interference profile of the PYPM intermediate for each truncated PYP. All the corrected profiles were identical within the statistical errors, indicating that the N-terminal regions of T6 and T15 did not influence the intramolecular interference scattering.

7.3.2 Analysis of High Angle Scattering

The change in the profile of T23 indicates a significant rearrangement in the secondary structure packing of the chromophore-binding region during the formation of PYPM. On the basis of the obtained profile, we attempted to construct a solution structural model of PYPM, especially for the chromophore-binding region. We attempted to generate plausible conformations from a variety of structures derived from the crystal structure of PYP, using the high-angle X-ray scattering profile as a boundary condition. The 500 structures were constructed using the CONCOORD program [43].
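The shape-scattering correction described in Sect. 7.3.1 (a straight-line fit in the log–log plot followed by subtraction of the Q^−α term) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names, the fitting window of 0.10–0.20 Å−1, and the synthetic profile built from a Q^−3 term plus two Gaussian interference peaks are all hypothetical.

```python
import numpy as np

def fit_power_law(q, intensity, q_min, q_max):
    """Fit I = A * Q**(-alpha) on [q_min, q_max] as a straight line in log-log."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    window = (q >= q_min) & (q <= q_max)
    slope, log_amp = np.polyfit(np.log(q[window]), np.log(intensity[window]), 1)
    return -slope, np.exp(log_amp)

def subtract_shape_scattering(q, intensity, alpha, amplitude):
    """Remove the extrapolated power-law (shape) term from the full profile."""
    q = np.asarray(q, dtype=float)
    return np.asarray(intensity, dtype=float) - amplitude * q ** (-alpha)

# Synthetic profile: power-law shape term plus two interference peaks.
q = np.linspace(0.1, 0.8, 141)  # 1/Angstrom
shape = 0.02 * q ** -3.0
interference = (0.5 * np.exp(-((q - 0.35) / 0.05) ** 2)
                + 0.3 * np.exp(-((q - 0.60) / 0.05) ** 2))
observed = shape + interference

# Fit where the interference term is negligible (assumed window).
alpha, amplitude = fit_power_law(q, observed, 0.10, 0.20)
corrected = subtract_shape_scattering(q, observed, alpha, amplitude)
print(f"alpha = {alpha:.3f}")
```

The choice of fitting window matters in practice: it must lie on the tail of the shape scattering yet below the Q range where intramolecular interference starts to contribute.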
The high angle scattering profile of each generated structure was calculated by CRYSOL [44]. Although most of the structures showed profiles similar to that of the dark state of T23 (a single broad peak at Q = 0.39 Å−1), some structures showed profiles with the bimodal shape observed for the PYPM intermediate of T23. We selected the structures that satisfied the following two properties in the calculated scattering profile as the candidate models of the PYPM structure: (1) the peak position was observed at Q < 0.39 Å−1; and (2) a clear shoulder was present around Q = 0.6 Å−1. Consequently, 51 of the 500 structures were selected. The average of the selected structures was adopted as a structural model of the chromophore-binding region of PYPM. According to the model, the loop between β4 and β5 and the α4 helix, which envelop the chromophore-binding pocket in the dark state of the protein, move away from each other, opening the chromophore-binding pocket. The root-mean-square deviation of the model structure of PYPM from the structure of intact PYP

suggests that the structural changes in PYPM are localized in the N-terminal tail (residues 24–28), the α4 helix (residues 55–58), and the loop connecting β4 and β5 (residues 96–102). The structure of the N-terminal region of T6 is similar to that of the dark state of the wild-type protein. It, however, undergoes large structural changes during the formation of PYPM that abrogate the intramolecular interference between the N-terminal and chromophore-binding regions. The lack of interference strongly suggests that the N-terminal region is substantially disordered and moves stochastically in PYPM. In fact, small-angle X-ray scattering analysis indicated that the N-terminal region moves away from the chromophore-binding region. Taking all these into consideration, a schematic structural model for wild-type PYPM was built by combining the structural model of the chromophore-binding region of the PYPM intermediate of T23 with the structural fluctuation of the N-terminal region predicted by the results for T6 (Fig. 7.4). The photosignal generated by the chromophore is propagated to the N-terminal tail (residues 24–28), the α4 helix (residues 55–58), and the loop connecting β4 and β5 (residues 96–102). The propagation direction of the structural changes is consistent with the analysis by fragmentation and mass spectrometry [36].

The NMR structures of PYP lacking the N-terminal 25 residues were reported under the dark and illuminated conditions [34]. In the NMR structure of the M intermediate, the three regions at residues 42–58, 63–78, and 96–103 (the amino-acid positions in intact PYP) are highly disordered, resulting in exposure of the hydrophobic chromophore to the solvent. Although the structural changes in the α4 helix (residues 55–58) and the loop connecting

Fig. 7.4. A schematic model of the PYPM intermediate of intact PYP (solid ribbon model) [16]: The crystal structure of the dark state of intact PYP (1NWZ; line ribbon model) is superimposed on the model

β4 and β5 (residues 96–102) revealed in the present study are conserved in the NMR structure, there are significant differences in the amplitudes of the structural displacements. In our model, the chromophore is buried inside the molecule. The scattering profiles were calculated for the 20 NMR structures of PYPM and of the dark state of the protein listed in the 1ODV and 1XFQ PDB files, respectively. The profiles of the 20 NMR structures of the dark state of PYP are similar to the observed profile of the dark state of T23, but the calculated profiles of the NMR structures of PYPM are completely different from the observed scattering profile for T23. The calculated profiles for the NMR structures are also quite different from each other. The increases in the calculated radii of gyration of the NMR structures (>2 Å) are also larger than the observed value (∼0.7 Å) [41], indicating that the NMR structures are not as compact as the native solution structure. Although the reason that NMR produced such highly disordered structures is unclear, the poor distance restraints in these regions may not yield well-converged structures, resulting in the divergent features of the obtained models. Our structural model is supported by a molecular dynamics study [45].
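The selection step of Sect. 7.3.2, in which generated structures are kept only if their calculated profile has its main peak below Q = 0.39 Å−1 and a clear shoulder near Q = 0.6 Å−1, can be sketched as a simple filter. The Python example below is a hypothetical illustration: the text does not say how "clear shoulder" was quantified, so the chord-excess test in `has_shoulder`, its thresholds, and the three synthetic profiles are assumptions.

```python
import numpy as np

def main_peak_q(q, profile, q_lo=0.25, q_hi=0.50):
    """Q position of the highest point inside an assumed search window."""
    window = (q >= q_lo) & (q <= q_hi)
    return q[window][np.argmax(profile[window])]

def has_shoulder(q, profile, q_center=0.6, half_width=0.08, min_excess=0.05):
    """Assumed criterion: intensity at q_center exceeds the midpoint of the
    chord between q_center +/- half_width by min_excess * max(profile)."""
    left, center, right = np.interp(
        [q_center - half_width, q_center, q_center + half_width], q, profile
    )
    return (center - 0.5 * (left + right)) > min_excess * profile.max()

def select_candidates(q, profiles, peak_limit=0.39):
    """Indices of profiles meeting both criteria quoted in the text."""
    return [
        i for i, p in enumerate(profiles)
        if main_peak_q(q, p) < peak_limit and has_shoulder(q, p)
    ]

# Three synthetic profiles standing in for calculated curves (hypothetical).
q = np.linspace(0.2, 0.8, 121)  # 1/Angstrom
dark_like = np.exp(-((q - 0.39) / 0.08) ** 2)       # single peak
m_like = (np.exp(-((q - 0.35) / 0.07) ** 2)
          + 0.4 * np.exp(-((q - 0.60) / 0.05) ** 2))  # peak plus shoulder
shifted_only = np.exp(-((q - 0.35) / 0.07) ** 2)    # shifted peak, no shoulder
profiles = np.vstack([dark_like, m_like, shifted_only])

selected = select_candidates(q, profiles)
print("selected structure indices:", selected)
```

In an actual workflow the rows of `profiles` would come from the scattering curves computed for each generated structure, and the coordinates of the selected structures would then be averaged.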

7.4 Propagation Mechanism of the Structural Change

The first event after light absorption by PYP is a proton transfer from E46 to the chromophore [11, 46]. In the dark state, it is considered that the chromophore is deprotonated and E46 is protonated. Therefore, E46 was postulated as the direct proton donor for the chromophore. However, it has been suggested that the protonation of the chromophore is independent of the deprotonation of E46 [47–49]. The large conformational change of PYPM is closely related to the protonation state of E46. The key to understanding these findings is the interaction between the chromophore and E46. The recent high-resolution crystal structure analysis of PYP revealed that the hydrogen bond formed between the chromophore and E46 is an unusual short strong hydrogen bond (SSHB), in which the distance between the phenolic oxygen of the chromophore and the carboxylic oxygen of E46 is 2.58 Å, much shorter than the standard donor–acceptor distance [50, 51]. When the distance between the donor and the acceptor becomes shorter, the electron orbitals overlap to form a quasicovalent bond called a low-barrier hydrogen bond (LBHB) [52]. It has been proposed that LBHBs are responsible for hydrolytic catalysis by serine proteases, and that they are formed at the transition states of enzymes [52, 53]. It is possible that the SSHB in PYP is an LBHB, although no direct evidence has been demonstrated.

The photosignal is finally propagated to the N-terminal region. The N-terminal region interacts with the C-terminal β6 of the chromophore-binding region. Hydrogen bonds would be expected to play a major role in this interaction. We prepared site-directed substitution mutants for the putative hydrogen-bonding residues, E9A, E12A, and K110A. The lifetimes of PYPM of these

mutants were 0.98, 0.39, and 1.95 s, respectively [54]. The lifetimes of the wild type PYP and T6 are 0.29 and 29 s, respectively. Therefore, the hydrogen bonds between these residues are not essential for the structural change. On the other hand, F6A substantially prolongs the lifetime of the M intermediate, 19 s, and K123A produces no pigment. We assumed that the interaction between F6 and K123 is essential for the structural change. Both K123E and K123L do not change the photochemical property, indicating that the charge is not essential but that the alkyl chain is important [55]. The substitution mutations of F6 dramatically change the properties of PYP except for F6Y [55]. Therefore, the aromatic ring is essential at the position. Based on these observations, we concluded that the weak CH/π hydrogen bond is responsible for the structural change [55]. It is interesting that both the unusual SSHB and the very weak CH/π hydrogen bond play essential roles for the photosignal transduction. In order to clarify the properties of these peculiar hydrogen bonds, the identiﬁcation of the hydrogen atom position should be most essential. Neutron crystallography is the most promising method to identify the hydrogen atom position [51]. For this purpose, the preparation of a large crystal is an essential step and we succeeded in obtaining a large crystal of PYP [56].

7.5 Summary

We developed a promising method for the analysis of high-angle X-ray solution scattering combined with fluctuation analysis. The method is especially useful for understanding the structural change during functional expression. To apply the method, it is essential to record high-angle X-ray scattering data with high accuracy, which became possible by using third-generation synchrotron radiation and a two-dimensional CCD-based detector. We succeeded in analyzing the solution structure of the functional photointermediate of PYP by high-angle scattering. PYP undergoes substantial conformational changes upon light absorption. The changes are propagated from the chromophore to the N-terminal tail (residues 24–28), the α4 helix (residues 55–58), and the loop connecting β4 and β5 (residues 96–102). The conformational change at the N-terminal tail is propagated through a hydrogen bond network including both the unusual SSHB and a very weak CH/π hydrogen bond. The structural ensembles generated from the dark-state structure by fluctuation analysis (a simplified molecular dynamics simulation) include the ensemble of intermediate structures, indicating that the conformations of the functional intermediate are contained in the ensemble of possible conformations of the resting state. Solution NMR analysis of the photointermediate is not necessarily consistent with high-angle X-ray scattering. The origin of the discrepancy should be clarified for a better understanding of the intermediate structure.

Acknowledgments

The authors are grateful to Prof. Y. Imamoto (Kyoto University), Drs. N. Shimizu (SPring-8) and M. Harigai (Kyoto University) for their help throughout the study. This work is partly supported by the Grant-in-Aid of Scientific Research in a Priority Area, “Chemistry of Biological Processes Created by Water and Biomolecules” to MK (15076208).

References

1. O. Glatter, O. Kratky, Small Angle X-ray Scattering (Academic, New York, 1982)
2. L.A. Feigin, D.I. Svergun, Structure Analysis by Small-Angle X-Ray and Neutron Scattering (Plenum, New York, 1982)
3. D.I. Svergun, Biophys. J. 76, 2879 (1999)
4. D.I. Svergun, M.V. Petoukhov, M.H.J. Koch, Biophys. J. 80, 2946 (2001)
5. S.S. Funari, G. Rapp, M. Perbandt, K. Dierks, M. Vallazza, C. Betzel, V.A. Erdmann, D.I. Svergun, J. Biol. Chem. 275, 31283 (2000)
6. R. Kato, M. Kataoka, H. Kamikubo, S. Kuramitsu, J. Mol. Biol. 309, 227 (2001)
7. B.A. Fedorov, J. Mol. Biol. 98, 341 (1975)
8. C.A. Pickover, D.M. Engelman, Biopolymers 21, 817 (1982)
9. T. Ueki, Y. Inoko, M. Kataoka, Y. Amemiya, Y. Hiragi, J. Biochem. 99, 1127 (1986)
10. R. Zhang, P. Thiyagarajan, D.M. Tiede, J. Appl. Crystallogr. 33, 565 (2000)
11. D.M. Tiede, R. Zhang, S. Seifert, Biochemistry 41, 6605 (2002)
12. M. Kataoka, I. Nishii, T. Fujisawa, T. Ueki, F. Tokunaga, Y. Goto, J. Mol. Biol. 249, 215 (1995)
13. M. Kataoka, Y. Goto, Fold. Des. 1, 107 (1996)
14. M. Kataoka, K. Kuwajima, F. Tokunaga, Y. Goto, Protein Sci. 6, 422 (1997)
15. M. Hirai et al., Biochemistry 43, 9036 (2004)
16. H. Kamikubo, N. Shimizu, M. Harigai, Y. Yamazaki, Y. Imamoto, M. Kataoka, Biophys. J. 92, 3633 (2007)
17. T.E. Meyer, Biochim. Biophys. Acta 806, 175 (1985)
18. W.W. Sprenger, W.D. Hoff, J.P. Armitage, K.J. Hellingwerf, J. Bacteriol. 175, 3096 (1993)
19. J.L. Pellequer, K.A. Wager-Smith, S.A. Kay, E.D. Getzoff, Proc. Natl. Acad. Sci. USA 95, 5884 (1998)
20. G.E. Borgstahl, D.R. Williams, E.D. Getzoff, Biochemistry 34, 6278 (1995)
21. Y. Imamoto, Y. Shirahige, F. Tokunaga, T. Kinoshita, K. Yoshihara, M. Kataoka, Biochemistry 40, 8997 (2001)
22. T.E. Meyer, E. Yakali, M.A. Cusanovich, G. Tollin, Biochemistry 26, 418 (1987)
23. W.D. Hoff et al., Biophys. J. 67, 1691 (1994)
24. Y. Imamoto, M. Kataoka, F. Tokunaga, Biochemistry 35, 14047 (1996)
25. L. Ujj et al., Biophys. J. 75, 406 (1998)
26. Y. Imamoto, M. Kataoka, F. Tokunaga, T. Asahi, H. Masuhara, Biochemistry 40, 6047 (2001)
27. U.K. Genick et al., Science 275, 1471 (1997)
28. H. Ihee et al., Proc. Natl. Acad. Sci. USA 102, 7145 (2005)
29. M.E. van Brederode et al., Biophys. J. 71, 365 (1996)
30. S. Ohishi, N. Shimizu, K. Mihara, Y. Imamoto, M. Kataoka, Biochemistry 40, 2854 (2001)
31. J.S. Khan, Y. Imamoto, M. Harigai, M. Kataoka, M. Terazima, Biophys. J. 90, 3686 (2006)
32. Y. Hoshihara, Y. Imamoto, M. Kataoka, F. Tokunaga, M. Terazima, Biophys. J. 94, 2187 (2008)
33. G. Rubinstenn et al., Nat. Struct. Biol. 5, 568 (1998)
34. C. Bernard et al., Structure 13, 953 (2005)
35. B.C. Lee et al., J. Biol. Chem. 276, 20821 (2001)
36. R. Brudler et al., J. Mol. Biol. 363, 148 (2006)
37. R. Brudler, R. Rammelsberg, T.T. Woo, E.D. Getzoff, K. Gerwert, Nat. Struct. Biol. 8, 265 (2001)
38. A. Xie, L. Kelemen, J. Hendriks, B.J. White, K.J. Hellingwerf, W.D. Hoff, Biochemistry 40, 1510 (2001)
39. N. Shimizu, H. Kamikubo, K. Mihara, Y. Imamoto, M. Kataoka, J. Biochem. 132, 257 (2002)
40. M. Harigai, S. Yasuda, Y. Imamoto, K. Yoshihara, F. Tokunaga, M. Kataoka, J. Biochem. 130, 51 (2001)
41. Y. Imamoto, H. Kamikubo, M. Harigai, N. Shimizu, M. Kataoka, Biochemistry 41, 13595 (2002)
42. P.W. Schmidt, J. Appl. Crystallogr. 24, 414 (1991)
43. B.L. de Groot et al., Proteins: Struct. Funct. Genet. 29, 240 (1997)
44. D.I. Svergun, C. Barberato, M.H.J. Koch, J. Appl. Crystallogr. 28, 768 (1995)
45. M. Shiozawa, M. Yoda, N. Kamiya, N. Asakawa, J. Higo, Y. Inoue, M. Sakurai, J. Am. Chem. Soc. 123, 7445 (2001)
46. Y. Imamoto et al., J. Biol. Chem. 272, 12905 (1997)
47. B. Borucki et al., Biochemistry 44, 13650 (2005)
48. B. Borucki, C.P. Joshi, H. Otto, M.A. Cusanovich, M.P. Heyn, Biophys. J. 91, 2991 (2006)
49. N. Shimizu, Y. Imamoto, M. Harigai, H. Kamikubo, Y. Yamazaki, M. Kataoka, J. Biol. Chem. 281, 4318 (2006)
50. S. Anderson, S. Crosson, K. Moffat, Acta Crystallogr. D 60, 1008 (2004)
51. S.Z. Fisher et al., Acta Crystallogr. D 63, 1178 (2007)
52. W.W. Cleland, M.M. Kreevoy, Science 264, 1887 (1994)
53. P.A. Frey, S.A. Whitt, J.B. Tobin, Science 264, 1927 (1994)
54. M. Harigai, M. Kataoka, Y. Imamoto, Photochem. Photobiol. 84, 1031 (2008)
55. M. Harigai, M. Kataoka, Y. Imamoto, J. Am. Chem. Soc. 128, 10646 (2006)
56. S. Yamaguchi, H. Kamikubo, N. Shimizu, Y. Yamazaki, Y. Imamoto, M. Kataoka, Photochem. Photobiol. 83, 336 (2007)

8 Time-Resolved Detection of Intermolecular Interaction of Photosensor Proteins

M. Terazima

Abstract. A recently developed method for monitoring the reaction kinetics of intermolecular interaction is reviewed. This method is based on the measurement of the time-dependent diffusion coefficient using the pulsed-laser-induced transient grating technique. Using this method, conformation change, transient association, and transient dissociation during reactions are successfully detected. The principle and some applications to studies on changes in the intermolecular interactions of photosensor proteins (e.g., photoactive yellow protein, phototropins, AppA) in the time domain are described. In particular, unique features of this time-dependent diffusion coefficient method are discussed.

8.1 Introduction

Inter- and intraprotein (domain–domain) interactions play an important role in many signal transduction processes of sensor proteins. For example, many signaling proteins consist of modulator components that regulate input, output, and also protein–protein communication. They contain characteristic transmitter and receiver domains that transfer information within and between proteins. Signaling pathways are assembled by arranging these domains. Therefore, revealing such interprotein interactions during signaling is important for understanding the molecular mechanism of sensor proteins. Furthermore, since the interprotein interaction is closely related to the oligomerization of the protein, detection of oligomer formation during the signaling process is essential. In fact, reflecting this importance, many sensor proteins exist in an oligomeric form. For example, the oligomerized state is stable for some PAS (Per-Arnt-Sim) proteins, which are well-known regulators: e.g., a dimer of the ARNT PAS-B domain, a dimer of the heme-binding PAS domain of E. coli Dos (EcDos), and a decamer of PixD [1–3].

Nevertheless, it is not generally simple to detect the dynamics of the association/dissociation change induced by external stimulations, in particular, in real time. Although optical absorption change in the time domain (ﬂash photolysis method) has been frequently used for studying reaction dynamics of proteins, one should always be careful with the fact that the whole protein size is very large compared to that of the chromophore. Since the absorption spectrum of the chromophore is sensitive to only conformational change close to the chromophore, structural changes far from the chromophore and changes in the interprotein interaction are frequently spectral silent processes. Several techniques that can detect protein binding have been developed. For example, a gel chromatographic technique has been used to monitor the association state. However, it does not have any time resolution [3]. The surface plasmon resonance (SPR) method is another highly sensitive and widely used method [4–8]. The principle of this technique is based on the refractive index change by the protein–protein binding and the refractive index dependence of the wavelength for the surface plasmon excitation. For the detection, a target protein must be ﬁxed on a metal surface and an analyte molecule is introduced on the surface. If protein association occurs, the refractive index near the surface changes and it changes the resonance angle to excite the surface plasmon. The SPR biosensor monitors this change in the resonance angle. However, it usually takes several tens of minutes to accumulate proteins on the surface for the detection and this time response is not fast enough to study protein association of a chemically unstable intermediate species that could play a key role in the signal transduction process. Furthermore, since the target protein should be ﬁxed on a metal surface, any possible interaction with the metal surface could change the protein conformation or the reactivity. 
Some other spectroscopic techniques such as NMR or IR are also very difficult to apply to monitoring protein–protein interaction in the time domain for short-lived species. Another physical property that may reflect the association state of a molecule is a transport property, such as the rotational relaxation rate or the translational diffusion coefficient. In particular, the translational diffusion coefficient (D) has been shown to be a good physical property reflecting conformational change and intermolecular interaction. Because of its importance in the field of physical chemistry, many techniques, e.g., Taylor dispersion, the capillary method, and NMR methods, have been developed to monitor molecular diffusion in the solution phase [9–13]. However, a difficulty in using the diffusion process for detecting transient interprotein interaction is again the slow time response. For example, it takes several hours to measure D by the Taylor dispersion method. This difficulty, the poor time resolution of traditional diffusion measurements, was overcome by using the pulsed-laser-induced transient grating (TG) technique [14–19]. In this chapter, the principle of the method and some applications to studies on changes in the intermolecular interactions of photosensor proteins in the time domain are reviewed. In particular, transient association and dissociation reactions are described.

8.2 Principle

In the TG method, two pulsed laser beams are crossed at an angle $\theta$ within the coherence time so that an interference (grating) pattern is created with a wavenumber $q$ (Fig. 8.1) [14–24]:

$$
q = \frac{2\pi}{\Lambda} = \frac{4\pi \sin(\theta/2)}{\lambda_{\mathrm{ex}}},
\tag{8.1}
$$

where $\Lambda$ is the fringe length and $\lambda_{\mathrm{ex}}$ is the wavelength of the excitation laser. The wavenumber $q$ can be varied by varying $\theta$. Photosensor proteins are photoexcited by this grating light, and a chemical reaction is initiated. When a probe beam is introduced to the interference region, a part of the light is diffracted as the TG signal. When the absorption change at the probe wavelength is negligible, the TG intensity ($I_{\mathrm{TG}}$) is proportional to the square of the refractive index difference ($\delta n$) between the peak and the null of the grating pattern:

$$
I_{\mathrm{TG}} = \alpha (\delta n)^2,
\tag{8.2}
$$

where $\alpha$ is a constant representing the sensitivity of the experimental system. There are several origins of the phase grating [24]. One of the important contributions is the temperature change of the medium induced by the thermal energy released from the decay of excited states and from the enthalpy change of the reaction (thermal grating; $\delta n_{\mathrm{th}}$). Furthermore, a change in absorption spectrum (population grating) and a change in molecular volume (volume grating) also contribute to the signal. The sum of the population grating and volume grating terms is called the species grating ($\delta n_{\mathrm{spe}}$) [24]. The species grating signal intensity is given by the difference between $\delta n$ due to the reactant ($\delta n_R$) and the product ($\delta n_P$). Hence, the observed TG signal [$I_{\mathrm{TG}}(t)$] is expressed as

$$
I_{\mathrm{TG}}(t) = \alpha \left[\delta n_{\mathrm{th}}(t) + \delta n_{\mathrm{spe}}(t)\right]^2
= \alpha \left[\delta n_{\mathrm{th}}(t) + \delta n_P(t) - \delta n_R(t)\right]^2 .
\tag{8.3}
$$

Fig. 8.1. Schematic illustration of the TG experiment (upper) and the principle of diffusion measurement (lower). Upper: excitation pulses, probe beam, sample, and TG signal. Lower: The white and black circles indicate the reactant and product molecules. The concentrations of the reactant and the product are spatially modulated by the sinusoidally modulated light intensity of the grating light. The fringe length Λ is also indicated
The “product” in this equation does not necessarily mean the final product, but can be any molecule produced from the reactant at the time of observation. The temporal profile of $\delta n_{\mathrm{th}}(t)$ is determined by the convolution integral between the thermal diffusion decay and the intrinsic temporal evolution of the thermal energy [$Q(t)$]:

$$
\delta n_{\mathrm{th}}(t) = \frac{dn}{dT}\,\frac{W \Delta N}{\rho C_p}\,\frac{\partial Q(t)}{\partial t} * \exp\left(-D_{\mathrm{th}} q^2 t\right),
\tag{8.4}
$$

where $*$ represents the convolution integral, $dn/dT$ is the refractive index change by the temperature variation of the solution, $W$ is the molecular weight (g mol$^{-1}$), $\rho$ is the density (g cm$^{-3}$) and $C_p$ the heat capacity of the solvent, $\Delta N$ is the molar density of the excited molecule (mol cm$^{-3}$), and $D_{\mathrm{th}}$ is the thermal diffusivity. The temporal evolution of the species grating component is determined by the chemical reaction and protein diffusion processes. When there is no chemical reaction in the detection time window, and the molecular diffusion coefficient ($D$) is time-independent, the temporal profile of the species grating signal can be calculated by the molecular diffusion equation. The Fourier component at a wavenumber $q$ of the concentration profile decays with a rate constant $Dq^2$ for both the reactant and the product. Hence, the time development of the TG signal can be expressed by [15–19, 23]

$$
I_{\mathrm{TG}}(t) = \alpha \left[\delta n_P \exp\left(-D_P q^2 t\right) - \delta n_R \exp\left(-D_R q^2 t\right)\right]^2,
\tag{8.5}
$$

where $D_R$ and $D_P$ are the diffusion coefficients of the reactant and the product, respectively. Furthermore, $\delta n_R$ (> 0) and $\delta n_P$ (> 0) are, respectively, the initial refractive index changes due to changes in reactant and product concentrations during the reaction. When a chemical reaction including a conformation change of a protein takes place within the time range of the signal detection, the apparent $D$ of the protein changes. The observed TG signal should then be calculated from the diffusion equation with a concentration-dependent term. Describing the reaction by the following model,

Scheme 1

$$
R \xrightarrow{\;h\nu\;} I \xrightarrow{\;k\;} P,
$$

where R, I, P, and $k$ represent, respectively, a reactant, an intermediate species, a final product, and the rate constant of the change, one may find the time dependence of the TG signal as [23, 25]

$$
I_{\mathrm{TG}} = \alpha \left\{ \delta n_R \exp\left(-D_R q^2 t\right)
+ \left[\delta n_I + \frac{\delta n_P k}{(D_P - D_R) q^2 - k}\right] \exp\left[-\left(D_I q^2 + k\right) t\right]
- \frac{\delta n_P k}{(D_P - D_R) q^2 - k} \exp\left(-D_P q^2 t\right) \right\}^2,
\tag{8.6}
$$

where $\delta n_I$ and $D_I$ are the refractive index change due to the formation of the intermediate species and the diffusion coefficient of the intermediate species, respectively. Here, it should be noted that $\delta n_P(t)$ describes the species grating signal of the product as well as the intermediate. When proteins are dimerized during the diffusion process, the apparent $D$ is also time dependent. For analyzing the observed TG signal, we may use the following model (Scheme 2).

Scheme 2

$$
A \xrightarrow{\;h\nu\;} A^{*} + A \xrightarrow{\;k\;} (A^{*} : A),
$$

where $A^{*}$ indicates an intermediate created by the photoexcitation, and the dimer is formed between this intermediate ($A^{*}$) and the ground-state protein ($A$) with a rate constant $k$. Under the condition that the concentration of $A$ is sufficiently large that it can be treated as a constant, we may find the time dependence of the TG signal as

$$
I_{\mathrm{TG}} = \alpha \left\{ \delta n_R \exp\left(-D_R q^2 t\right)
+ \left[\delta n_I + \frac{\delta n_P k[A]}{(D_P - D_R) q^2 - k[A]}\right] \exp\left[-\left(D_I q^2 + k[A]\right) t\right]
- \frac{\delta n_P k[A]}{(D_P - D_R) q^2 - k[A]} \exp\left(-D_P q^2 t\right) \right\}^2 .
\tag{8.7}
$$

The time range over which one can observe the protein diffusion depends on the grating wavenumber $q$. For instance, the typical $D$ of a globular protein with a size of myoglobin (18 kDa) is $10^{-10}$ m$^2$ s$^{-1}$ [14]. Hence, if one uses $q^2 = 10^{14}$ m$^{-2}$, the signal disappears with a rate constant of $Dq^2 = 10^{4}$ s$^{-1}$; i.e., $D$ within 100 μs after the photoexcitation can be detected. If one uses $q^2 = 10^{10}$ m$^{-2}$, the signal disappears with a rate constant of 1 s$^{-1}$, and $D$ within a time window of 1 s can be detected. These are the typical time ranges we can use for detecting the protein diffusion dynamics.
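
The arithmetic in the preceding paragraph can be reproduced with a few lines of Python. The sketch below evaluates (8.1) and the decay time 1/(Dq^2) of the diffusion signal. The function names and the 465 nm excitation wavelength are illustrative assumptions and are not taken from the experiments described in this chapter.

```python
import math


def grating_wavenumber(theta_rad, wavelength_m):
    """Grating wavenumber q (m^-1) from (8.1): q = 4*pi*sin(theta/2)/lambda_ex."""
    return 4.0 * math.pi * math.sin(theta_rad / 2.0) / wavelength_m


def crossing_angle(q, wavelength_m):
    """Invert (8.1): crossing angle theta (rad) that produces wavenumber q."""
    return 2.0 * math.asin(q * wavelength_m / (4.0 * math.pi))


def diffusion_decay_time(diffusion_coeff, q_squared):
    """Time constant 1/(D q^2) (s) of one Fourier component of the concentration."""
    return 1.0 / (diffusion_coeff * q_squared)


D_PROTEIN = 1e-10     # m^2 s^-1, typical value quoted in the text
WAVELENGTH = 465e-9   # m, hypothetical excitation wavelength (assumption)

for q_squared in (1e14, 1e10):   # m^-2, the two cases discussed in the text
    q = math.sqrt(q_squared)
    theta_deg = math.degrees(crossing_angle(q, WAVELENGTH))
    tau = diffusion_decay_time(D_PROTEIN, q_squared)
    print(f"q^2 = {q_squared:.0e} m^-2: theta = {theta_deg:.2f} deg, "
          f"fringe length = {2.0 * math.pi / q:.2e} m, 1/(D q^2) = {tau:.1e} s")
```

Because q is proportional to sin(θ/2), small crossing angles give small q and therefore long observation windows, while large crossing angles give access to the fast time range.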

8.3 Diffusion Coefficient

The diffusion coefficient is a physical property that represents the speed of molecular diffusion. Recently it was shown that $D$ changes during chemical reactions. Here, we describe the origin of the change in $D$. Intuitively, it may be easily understood that $D$ decreases when the molecular size increases because of an association reaction. In some cases, the relationship between $D$ and the molecular size is well described by the Stokes–Einstein equation, which is expressed by [9–13]

$$
D = \frac{k_B T}{a \eta r},
\tag{8.8}
$$

where $k_B$, $T$, $\eta$, $a$, and $r$ are the Boltzmann constant, temperature, viscosity, a constant representing the boundary condition between the diffusing molecule and the solvent, and radius of the molecule, respectively. Hence, $D$ decreases with increasing $r$. When the molecular size decreases as a result of a dissociation reaction, $D$ is expected to increase. Although the molecular size is certainly an important factor that determines $D$, $D$ also depends on the intermolecular interaction between the molecule and the solvent. A clear example has been reported for chemical reactions of aromatic molecules [15–19]. It was found that the $D$ values of organic radicals are much smaller than those of electronically closed-shell molecules with similar sizes and shapes. This change was attributed to the enhanced intermolecular interaction between the radicals and solvent molecules. It was further reported that $D$ of cytochrome c in its native form is much larger than that in the unfolded state [25–27]. This difference was attributed to the larger intermolecular interaction between the protein and water due to the unfolded conformation of the α-helices. $D$ has sometimes been expressed in terms of the hydrodynamic radius. However, we consider that “hydrodynamic radius” is not the proper term to describe the change of $D$, because it is not a well-defined radius such as the “radius of gyration,” which is clearly defined to show the molecular size. The hydrodynamic radius has just the same meaning as $D$ as long as the Stokes–Einstein relation holds.
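
As a numerical illustration of (8.8), the following Python sketch converts between $D$ and the radius $r$. It assumes the stick boundary condition ($a = 6\pi$), a temperature of 293.15 K, and a viscosity of 1.0 mPa s (roughly that of water near 20 °C). None of these values is specified in the text, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # J K^-1, Boltzmann constant (exact SI value)


def stokes_einstein_d(radius_m, temperature_k, viscosity_pa_s, a=6.0 * math.pi):
    """Diffusion coefficient (m^2 s^-1) from (8.8): D = k_B T / (a * eta * r)."""
    return K_B * temperature_k / (a * viscosity_pa_s * radius_m)


def stokes_radius(diffusion_coeff, temperature_k, viscosity_pa_s, a=6.0 * math.pi):
    """Radius r (m) in (8.8) that corresponds to a given D."""
    return K_B * temperature_k / (a * viscosity_pa_s * diffusion_coeff)


TEMPERATURE = 293.15   # K (assumption)
VISCOSITY = 1.0e-3     # Pa s, approximate value for water near 20 C (assumption)

# D = 1e-10 m^2 s^-1 is the typical protein value quoted in Sect. 8.2.
r = stokes_radius(1e-10, TEMPERATURE, VISCOSITY)
print(f"r for D = 1e-10 m^2/s: {r * 1e9:.2f} nm")

# Doubling r halves D, as expected from the 1/r dependence in (8.8).
print(f"D for 2r: {stokes_einstein_d(2.0 * r, TEMPERATURE, VISCOSITY):.2e} m^2/s")
```

The sketch only restates the size dependence of (8.8); it cannot capture changes in $D$ that originate from altered protein–solvent interaction at fixed size, which is the point emphasized above.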

8.4 Time-Resolved Detection of Interprotein Interactions

Below, we describe some examples demonstrating that the time-resolved measurement of D is a suitable way of detecting changes in the interprotein interaction.

8.4.1 Protein–Protein Interaction of the Photoexcited Photoactive Yellow Protein

A change in D of a reaction intermediate during a chemical reaction was first reported for the photochemical reaction of photoactive yellow protein (PYP) [28–32]. PYP is a 14-kDa photoreceptor protein functioning in negative phototaxis of the purple sulfur bacterium Ectothiorhodospira halophila [33]. For detecting light, it possesses a p-hydroxycinnamyl chromophore bound via a thioester bond to Cys69 [34, 35]. Upon photoexcitation of PYP, the chromophore is photoisomerized from the trans form to the cis form to initiate the photocyclic reaction [36–38]. The reaction dynamics of PYP has been extensively studied by various methods [39–44]. The ground-state species (pG) is initially converted to the first intermediates (pR1 and pR2), and then transformed to the second species (pB′ and pB). The pB species returns to pG with lifetimes of 150 ms to 2 s (Fig. 8.2). One of the intermediates should interact with proteins in the bacterium to transfer the light information. This information transfer should stimulate the biological response. However, a protein (or molecule) that interacts with the intermediate species of PYP is not known. One of the reasons is the lack of experimental techniques to monitor the protein association reaction with a time resolution better than 1 s (the lifetime of the unstable transient species). The TG method was used to monitor the intermolecular interaction with the transient intermediate species pB. To demonstrate the time-resolved detection of intermolecular interaction, D of pB was measured with various molecules extracted from the bacterium [45]. Before describing the effect of intermolecular interaction on D, the TG signal of PYP in the buffer solution without any additive is described to show the principle and the difference [28, 29].
The TG signal upon photoexcitation of PYP rose quickly and then showed a weak, slow-rising component corresponding to pR1 → pR2 [30]. After this, the signal decayed to a certain intensity with a rate constant $D_{\mathrm{th}} q^2$ and showed growth–decay curves twice (Fig. 8.3a). The decay component with $D_{\mathrm{th}} q^2$ should be the thermal grating component. The rising component represents the chemical reaction from pR2 to pB. It is the latest growth–decay curve that was attributed to the protein diffusion processes of pG and pB. The presence of the rise–decay curve implies that there are two diffusing species having different signs of the amplitude ($\delta n$). The assignment of the diffusing species was made from the sign of the refractive index change. It was found that the rate constant of the rising component represents $D_{pG} q^2$ ($D_{pG}$: diffusion coefficient of the pG species of PYP) and that of the decaying component corresponds to $D_{pB} q^2$ ($D_{pB}$: diffusion coefficient of the pB species) ($D_{pG} > D_{pB}$). This rise–decay component is a clear indication that $D$ of PYP was changed by the photoexcitation.

Fig. 8.2. A proposed photochemical reaction scheme of PYP (species shown: pG, pG*, short-lived intermediates, pR1, pR2, pB′, and pB)

Fig. 8.3. Typical TG signals (circles; $I_{\mathrm{TG}}$/a.u. versus t/s, 10^−5 to 10^−1 s) after photoexcitation of PYP (a) in a buffer solution and (b) in the buffer with an eluted fraction from the bacterium. The rise–decay components in a few milliseconds to a few hundred milliseconds range represent the protein diffusion signal. The enhancement of the diffusion peak of (b) is due to the intermolecular interaction between the pB species and DNA of the bacterium. The best-fitted curves by (8.9) are shown by the solid lines. The assignments of the signal components (thermal grating, pR2, pB′, pB, pG diffusion, pB diffusion) are shown

The TG signal in Fig. 8.3a was expressed by [28, 30]

$$
I_{\mathrm{TG}}(t) = \alpha \left[ \delta n_{\mathrm{th}} \exp\left(-D_{\mathrm{th}} q^2 t\right) + \delta n_1 \exp\left(-t/\tau_1\right) + \delta n_2 \exp\left(-t/\tau_2\right)
- \delta n_{pG} \exp\left(-D_{pG} q^2 t\right) + \delta n_{pB} \exp\left(-D_{pB} q^2 t\right) \right]^2,
\tag{8.9}
$$

where the lifetimes $\tau_1$ and $\tau_2$ represent the pR2 → pB kinetics. The peak in the latest time region (diffusion peak) appeared because $D_{pG}$ and $D_{pB}$ are different. If the difference between $D_{pG}$ and $D_{pB}$ becomes smaller, the two terms $\delta n_{pG} \exp(-D_{pG} q^2 t)$ and $\delta n_{pB} \exp(-D_{pB} q^2 t)$ cancel and the signal intensity becomes weaker, because the two terms have opposite signs. Hence the maximum amplitude of this peak is an indicator of the difference in $D$ between the reactant and the product. For the quantitative measurement of $D$, the rate constants of the rise and decay components were determined by curve-fitting and plotted against $q^2$, and from the slopes of the plots $D_{pG}$ and $D_{pB}$ were determined to be $D_{pG} = 1.3 \times 10^{-10}$ m$^2$ s$^{-1}$ and $D_{pB} = 1.2 \times 10^{-10}$ m$^2$ s$^{-1}$.

The observed reduction in $D$ by the chemical reaction was rather surprising. If the Stokes–Einstein relation is applicable, the ratio $D_{pB}/D_{pG} = 0.92$ means a volume expansion of 1.27 times. Since the partial molar volume of PYP is estimated to be ca. 1,000 cm$^3$ mol$^{-1}$, the volume increase is 270 cm$^3$ mol$^{-1}$, which is unrealistically large. This reduction in $D$ of the intermediate species was attributed to the enhanced intermolecular interaction between PYP and water molecules due to the unfolding of the N-terminal α-helices (diffusion-sensitive conformation change) [31].

Next, in order to detect protein–protein interaction between the transient species pB and molecules in the bacteria, the extracted solution from the bacteria was separated into 20 fractions by chromatography, and these were added to the PYP solution [45]. The TG signal of the PYP solution with the first eluted solution is shown in Fig. 8.3b. Most of the signal was the same as that without the protein solution. However, on addition of the protein solution, the amplitude of the diffusion peak was dramatically enhanced. Since this amplitude reflects the difference in $D$ between the reactant and the product, the larger peak amplitude should result from a larger reduction in $D_{pB}$ caused by adding the protein solution from the bacteria. From the signal, $D_{pG} = 1.3 \times 10^{-10}$ m$^2$ s$^{-1}$ and $D_{pB} = 1.10 \times 10^{-10}$ m$^2$ s$^{-1}$ were determined. The decrease of $D_{pB}$ indicated that the pB species of PYP interacted with molecules in the solution. A similar enhancement was observed by adding any fraction from the extracted solution. We investigated the target molecules in the solution and found that DNA of the bacterium was bound to the pB species in this case.
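
The two numerical steps used above, extracting D from the slope of the rate constants plotted against q^2 and converting the ratio of D values into a volume ratio through the Stokes–Einstein relation, can be sketched as follows. The q^2 settings are hypothetical, and the rate constants are synthetic, noise-free values generated from the D values quoted in the text. The fit is forced through the origin on the assumption that the rates are purely diffusional.

```python
def slope_through_origin(x_values, y_values):
    """Least-squares slope m of y = m * x, with the intercept fixed at zero."""
    numerator = sum(x * y for x, y in zip(x_values, y_values))
    denominator = sum(x * x for x in x_values)
    return numerator / denominator


def volume_ratio(d_reactant, d_product):
    """Stokes-Einstein estimate: r scales as 1/D, so volume scales as 1/D^3."""
    return (d_reactant / d_product) ** 3


Q2_VALUES = [1.0e11, 5.0e11, 1.0e12, 5.0e12]   # m^-2, hypothetical grating settings
D_PG_TEXT = 1.3e-10   # m^2 s^-1, pG value quoted in the text
D_PB_TEXT = 1.2e-10   # m^2 s^-1, pB value quoted in the text

# Synthetic rate constants standing in for the fitted rise and decay rates.
rise_rates = [D_PG_TEXT * q2 for q2 in Q2_VALUES]
decay_rates = [D_PB_TEXT * q2 for q2 in Q2_VALUES]

d_pg = slope_through_origin(Q2_VALUES, rise_rates)
d_pb = slope_through_origin(Q2_VALUES, decay_rates)
expansion = volume_ratio(d_pg, d_pb)

print(f"D_pG = {d_pg:.2e} m^2/s, D_pB = {d_pb:.2e} m^2/s")
print(f"D_pB / D_pG = {d_pb / d_pg:.2f}, volume ratio = {expansion:.2f}")
```

With real data, a nonzero intercept in the plot of rate against q^2 would indicate a contribution from reaction kinetics rather than diffusion alone, in which case the zero-intercept fit used here would not be appropriate.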
8.4.2 Photoinduced Dimerization of AppA Transient Diﬀusion Change Another example of time-resolved detection of transient protein–protein interaction for a photosensor protein was reported for photochemical reaction of AppA [46]. AppA is a light- and redox-responding regulator of photosynthesis gene transcription in Rb. sphaeroides, where it can be found in two diﬀerent functional forms [47–53]. Under anaerobic, low-light growth conditions, AppA is in a “dark-adapted” form which is able to bind and inactivate the repressor PpsR, thereby allowing the RNA polymerase to maximally transcribe photosynthesis genes. Under aerobic highlight conditions or under strong blue light illumination, FAD in AppA is photoexcited and AppA is transformed into a signaling state (“light-adapted” form), which is incapable of interacting with the photosynthesis repressor PpsR. Under these conditions, there is a maximal repression of the photosynthesis gene expression [47]. The isolated N-terminal BLUF domain exhibits a photocycle identical to that observed with full-length AppA [48]. Photoexcitation of AppA involving a singlet excited state in the ﬂavin chromophore leads to the formation of

158

M. Terazima AppABLUF

hν excited state

AppABLUF*

AppABLUF

AppABLUF*-AppABLUF

Fig. 8.4. Photochemical reaction scheme of AppA. If the reaction is monitored by the ﬂash photolysis method, the spectrally red-shifted product AppABLUF * directly returns to the ground state (broken line). However, using the diﬀusion detection method, the dimerization reaction takes place after the AppABLUF * formation

a red-shifted intermediate state (or signaling state) after 10 ns, which slowly decays to the ground state with a lifetime of 30 min (Fig. 8.4) [49]. The red shift was attributed to altered π–π stacking interactions between the isoalloxazine ring and a conserved tyrosine residue. The dark-state X-ray structure of the A resolution [52], BLUF domain of AppA (AppABLUF ) was determined at 2.3 ˚ and it indicated that AppABLUF forms the dimer in the crystal through the hydrophobic interactions of a β-sheet of two monomers. The ground state of AppABLUF exists as a dimer even in a very dilute solution [53]. Reaction dynamics of AppABLUF was monitored by the transient absorption technique. A detailed study showed that the absorption change indicated only the decay of the excited triplet state in a microsecond time range, and there was no other slow dynamics that may be expected for creating the signaling state. The observed TG signal of AppABLUF after the photoexcitation is depicted in Fig. 8.5. Initially, a weak, slow-rising component appeared with a time constant of ∼3.4 μ s [46]. After measuring the TG signal at diﬀerent q 2 it was concluded that the rising part of the TG signal represented a reaction phase of the protein, not the diﬀusion, e.g., the decay rate of the triplet state of the chromophore, ﬂavin adenine dinucleotide (FAD). After this rising component, the TG signal decayed to zero with a time constant of Dth q 2 . This was the thermal grating component created by the thermal energy due to the nonradiative transition from the excited state of FAD. After the thermal grating signal, the signal rose again and ﬁnally it decayed to the baseline. This rise–decay component depended on q 2 (Fig. 8.6a) and this q 2 dependence is a clear indication that these components represent the diﬀusion processes. On the basis of considerations similar to the previous PYP case, it was concluded that this rise–decay feature of the diﬀusion signal

8 Time-Resolved Detection of Intermolecular Interaction

159

40

triplet state decay 30

ITG / a.u.

thermal diffusion 20

reactant diffusion

product diffusion

10

0 10−5

10−4

10−3

10−2

10−1

t/s Fig. 8.5. A typical TG signal after photoexcitation of AppABLUF . The assignments of the signal components are shown

indicated different D values between the reactant and the product, with the product diffusing more slowly than the reactant. A prominent feature of this signal was that not only the rate but also the temporal profile of the signal depended on q². If the D values of the reactant and the product were constant in time, and the product was created promptly, the time dependence should be expressed by a combination of terms of the form exp(−Dq²t) (e.g., (8.5)). In this case, if the signals measured at various q² values were plotted against q²t, their shapes should be identical. However, the signals were totally different depending on the q² value (Fig. 8.6b). This behavior was explained by time-dependent diffusion.

To determine the rate constant, the observed TG signal was analyzed on the basis of the theoretical equation presented in the Principle section (8.6). To reduce the ambiguity of the fitting, some parameters were determined independently beforehand. D_R and D_P were determined from the signal in the long time region without using (8.6). After the reaction (conformational change or association/dissociation) is complete, D should be time-independent. Therefore, the temporal profile of the TG signal after this time should be expressed by a bi-exponential function (8.5), and, from the rate constants, D_R and D_P were determined to be 8.8 × 10⁻¹¹ and 7.2 × 10⁻¹¹ m² s⁻¹, respectively. The product therefore diffuses 1.22 times more slowly than the reactant. The determined D_R is smaller than that of other proteins of similar size; e.g., the value for myoglobin (18 kDa) measured by the TG method is 10 × 10⁻¹¹ m² s⁻¹ [14], whereas the molecular weight of the BLUF domain of AppA is ∼15.5 kDa. This difference in D reflects the dimeric form of AppABLUF in solution [53]. Indeed, D of green fluorescent protein, which has a molecular weight of ∼30 kDa (about the same size as the dimer of AppABLUF), was reported to be 8.7 × 10⁻¹¹ m² s⁻¹ [54]. The similarity of this value to D_R of AppABLUF supports the dimeric form of AppABLUF. Using these D values, the signals at various q² were fitted well by (8.6), and the time constant of the D change was determined to be k⁻¹ = 4.5 ms at 0.95 mM.

Fig. 8.6. (a) Grating wavenumber dependence of the TG signals (I_TG / a.u. vs. t/s) after photoexcitation of AppABLUF (0.95 mM). The signal intensity is normalized at the peak. The arrow indicates the increase of q, and the q² values are 4.5 × 10¹², 5.6 × 10¹¹, and 1.3 × 10¹¹ m⁻². (b) TG signals of the BLUF domain of AppA (0.95 mM) at various q² plotted against q²t (in units of 10¹⁰ m⁻² s)
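
The scaling argument used above (signals built from terms of the form exp(−Dq²t) should coincide when plotted against q²t) can be checked numerically. The following is a minimal sketch, assuming a bi-exponential species-grating signal with time-independent D_R and D_P; the functional form, the equal amplitudes, and the function name are illustrative assumptions, not the fitted model (8.6).

```python
import numpy as np

def tg_diffusion_signal(t, q2, d_r, d_p, dn_r=1.0, dn_p=1.0):
    """Species-grating intensity for time-independent D_R and D_P.

    Assumed form (amplitudes dn_r, dn_p are hypothetical):
    I(t) = (dn_p * exp(-D_P q^2 t) - dn_r * exp(-D_R q^2 t))**2
    """
    return (dn_p * np.exp(-d_p * q2 * t) - dn_r * np.exp(-d_r * q2 * t)) ** 2

D_R, D_P = 8.8e-11, 7.2e-11           # m^2 s^-1, values quoted in the text
Q2_VALUES = [4.5e12, 5.6e11, 1.3e11]  # m^-2, from the caption of Fig. 8.6
x = np.logspace(9, 11.5, 200)         # common grid of q^2 t, in m^-2 s

# Evaluate each signal at t = x / q^2 so that all curves share the q^2 t axis.
curves = [tg_diffusion_signal(x / q2, q2, D_R, D_P) for q2 in Q2_VALUES]
max_spread = max(float(np.max(np.abs(c - curves[0]))) for c in curves)
print(f"largest difference between rescaled curves: {max_spread:.1e}")
```

Under this assumption the rescaled curves agree to within floating-point rounding, so any q² dependence of the shape that survives the rescaling, as in Fig. 8.6b, points to a D that changes during the observation time.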


Origin of Diffusion Change

Why did D of the product decrease? The origin of the D change was investigated using the kinetics. There are two main possible origins of the observed D change, as described in Sect. 8.3. One is a conformational change of the protein that increases the interaction between the solvent and the protein. As demonstrated by the PYP reaction, D of an intermediate can be smaller than that of the ground-state species. The other possible explanation for the large reduction in D is dimerization of the BLUF domain after the photoreaction. (Since AppABLUF already exists as a dimer in the ground state even in a very dilute solution [52, 53], dimer formation in this case means tetramer formation in the signaling state. However, we call this process “dimerization” because it is a bimolecular reaction.)

To examine these possibilities, the TG signals were measured at various AppABLUF concentrations. If dimerization were the main cause of the difference in D, the reaction rate should be slower at a lower concentration. On the other hand, if a conformational change were responsible for the reduction in D, the temporal profile of the TG signal should not depend on concentration, apart from the absolute intensity. Under a low-q² condition (q² = 3.9 × 10¹⁰ m⁻²), the temporal profile of the diffusion signal was similar at all concentrations. At this low q², the diffusion peak was reproduced by a bi-exponential function with D_R = 8.8 × 10⁻¹¹ m² s⁻¹ and D_P = 7.2 × 10⁻¹¹ m² s⁻¹ after 80 ms at any concentration. Therefore, the final product should be the same at all concentrations after a sufficiently long time. On a fast timescale, by contrast, the temporal profile of the TG signals changed drastically with the concentration. The signal approached a single exponential decay as the concentration decreased (Fig. 8.7).

Considering that the diffusion peak arises from the difference between D_R and D_P, the nearly single exponential behavior indicates a small change in D in this time range. As D_R and the final D_P are constant, as shown above, the small change in D should be interpreted in terms of a slower rate of change in D_P with decreasing concentration. This single exponential behavior provided another important piece of information: D of the initially created product was similar to D_R [D_I = D_R in (8.6)]. This concentration dependence of the TG profile and the 1.22-fold decrease in D (i.e., an approximately twofold increase in molecular volume) in the product state support the dimerization mechanism in the excited state of this protein.

There are two possible reaction schemes for producing the dimer: the phototransformed AppABLUF (AppABLUF*) associates with the ground-state AppABLUF to yield a dimer (Scheme 3), or two AppABLUF* form the dimer (Scheme 4).

Scheme 3: AppABLUF* + AppABLUF → (AppABLUF*−AppABLUF)
Scheme 4: AppABLUF* + AppABLUF* → (AppABLUF*)₂
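
The link between the 1.22-fold decrease in D and the roughly twofold increase in molecular volume quoted above can be reproduced with a short calculation. This is a sketch under the Stokes–Einstein relation for a spherical particle, D = k_B T/(6πηr), which is an assumption introduced here for illustration; the function name is hypothetical.

```python
def volume_ratio_from_diffusion(d_before, d_after):
    """Hydrodynamic volume ratio V_after / V_before for spherical particles.

    Stokes-Einstein gives D = k_B T / (6 pi eta r), so r is proportional to
    1/D and the volume, proportional to r^3, scales as (D_before / D_after)^3.
    """
    return (d_before / d_after) ** 3

D_R = 8.8e-11  # m^2 s^-1, reactant, from the text
D_P = 7.2e-11  # m^2 s^-1, final product, from the text

print(f"D_R / D_P    = {D_R / D_P:.2f}")
print(f"volume ratio = {volume_ratio_from_diffusion(D_R, D_P):.2f}")
```

For a sphere the volume ratio comes out a little below 2, which is what would be expected if two units of similar size combine into a particle that is not perfectly spherical.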

Fig. 8.7. Concentration dependence of the TG signals (I_TG / a.u. vs. t/s) of AppABLUF at q² = 1.3 × 10¹² m⁻². The arrow indicates the increase of the concentration: 0.95, 0.48, 0.31, and 0.17 mM (from upper to lower curves). The gray lines are the best-fitted curves by (8.7)

Schemes 3 and 4 were distinguished by measuring the laser power dependence of the rate constant. If the concentration of AppABLUF is sufficiently high compared with that of AppABLUF*, the reaction of Scheme 3 can be treated as a pseudo-first-order reaction, and its rate constant should be essentially independent of the laser power. On the other hand, the reaction of Scheme 4 is second order in the phototransformed AppA, so that the rate depends on the laser power; that is, the profile should change with the laser power. From the laser power dependence, it was concluded that the photoexcited AppABLUF (AppABLUF*) associates with the ground-state AppABLUF to yield the dimer.

Kinetics of Dimer Formation

The dimer formation rate k was determined by fitting the TG signals at various concentrations using (8.7). The rate constant k decreased as the concentration decreased. From the slope of the plot of k vs. concentration and the relation k = k_i[AppA], we determined the second-order rate constant k_i to be ∼2.5 × 10⁵ M⁻¹ s⁻¹. Interestingly, this value is much smaller than that of a diffusion-controlled reaction (∼10⁹ M⁻¹ s⁻¹) calculated by the Smoluchowski–Einstein equation for a bimolecular reaction in solution [55]. This difference indicated that collision between two protein molecules is not the sole criterion for the aggregation process; i.e., their relative orientations impose additional constraints, which slow down the reaction by 4 orders of magnitude.
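
The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below evaluates the pseudo-first-order rate k = k_i[AppA] at 0.95 mM and compares k_i with the textbook diffusion-controlled estimate 8RT/(3η) for equal-sized spheres. The temperature (298.15 K), the viscosity of water (0.89 mPa s), and the function names are assumptions made for illustration; they are not stated in the chapter.

```python
R = 8.314462618  # gas constant, J mol^-1 K^-1

def pseudo_first_order_rate(k_i, conc_molar):
    """Observed rate k = k_i [AppA] in s^-1, with ground-state protein in excess."""
    return k_i * conc_molar

def smoluchowski_limit(temp_k, viscosity_pa_s):
    """Diffusion-controlled rate constant 8RT/(3*eta) in M^-1 s^-1.

    Textbook estimate for two equal-sized spheres reacting on every encounter.
    """
    k_si = 8.0 * R * temp_k / (3.0 * viscosity_pa_s)  # m^3 mol^-1 s^-1
    return k_si * 1.0e3                               # 1 m^3 = 1000 L

K_I = 2.5e5                                      # M^-1 s^-1, from the text
k_obs = pseudo_first_order_rate(K_I, 0.95e-3)    # 0.95 mM sample
k_diff = smoluchowski_limit(298.15, 0.89e-3)     # assumed: water at 25 degrees C

print(f"k at 0.95 mM          = {k_obs:.0f} s^-1 (time constant {1e3 / k_obs:.1f} ms)")
print(f"diffusion-limited k_d = {k_diff:.1e} M^-1 s^-1")
print(f"k_d / k_i             = {k_diff / K_I:.1e}")
```

With these assumed conditions the time constant at 0.95 mM is a few milliseconds, of the same order as the 4.5 ms obtained from the fit, and the ratio k_d/k_i is of order 10⁴, in line with the slowdown of about 4 orders of magnitude noted above.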


This photoinduced dimer finally dissociates to the original species, because the TG signal is reproducible when the repetition rate of the excitation is low enough. This leads to the conclusion that no covalent bond is formed in the aggregated state. This was the first report of the dimerization rate of a photosensor protein in a short-lived signaling process. Later, the origin of the photoinduced association was attributed to the exposure of a hydrophobic surface by the initial reaction [56].

8.4.3 Photoinduced Dimerization and Dissociation of Phototropins

Dimerization Reaction

In the previous sections, protein–protein association reactions were described. However, a dissociation reaction has also been reported for a photosensor protein: phototropins are a unique system because association and dissociation reactions upon photoexcitation are observed simultaneously [57, 58]. Phototropins (phot1 and phot2) are blue light receptors in higher plants that regulate phototropism, chloroplast relocation, and stomatal opening [59]. All of these are major mechanisms for regulating photosynthetic activity. Both proteins, phot1 and phot2, are homologous flavoproteins that contain two LOV (light–oxygen–voltage sensing) domains (LOV1 and LOV2), a typical serine/threonine kinase domain at the C-terminus, and a linker region connecting the LOV2 and kinase domains, and they act as light-regulated protein kinases [60]. Both LOV domains bind a flavin mononucleotide (FMN) as the chromophore [61]. The mechanism and the kinetics of the reaction have attracted much attention recently [62–69]. The reaction kinetics has mainly been studied by monitoring the absorption change of the chromophore [63–66]. Upon blue light illumination, the ground-state LOV2, which has an absorption maximum at 447 nm (D447), is converted to a species with a broad absorption spectrum (L660) [67]. This change is attributed to the creation of the excited triplet state through intersystem crossing from the photoexcited singlet state. This broad spectrum changes to a blue-shifted absorption spectrum peaking at 390 nm (S390) with a lifetime of 4 μs (for phot1LOV2 of Avena) [67]. This species was assigned to the FMN–cysteinyl adduct, in which the sulfur covalently binds to the C(4a) carbon of the isoalloxazine ring of FMN. This adduct is stable for tens of seconds before returning to the ground state (Fig. 8.8) [68]. The assignment of this product has been confirmed by NMR and X-ray crystallography [69, 70]. It is believed that this state is the signaling state. Therefore, as long as the reaction kinetics is monitored by UV–vis spectroscopy, the signaling state is formed with a lifetime of a few microseconds, and no significant change has been reported after this process.

Photochemical reactions of phot1 and phot2 were studied by the TG method, and a significant change in the association state was observed mainly for phot1LOV2.

Fig. 8.8. Photochemical reaction scheme of the phototropin LOV domain (phot1LOV2), showing the species D447, L660 (formed from D447 by hν), S390, S390 + D447, and S390−D447. If the reaction is monitored by the flash photolysis method, the S390 intermediate directly returns to the ground state (broken line). However, using the diffusion detection method, the dimerization reaction takes place after the formation

Fig. 8.9. A TG signal (broken line; I_TG / a.u. vs. t/s on a logarithmic time axis from 10⁻⁶ to 10⁰ s) of phot1LOV2 at 50 μM and q² = 3.4 × 10¹⁰ m⁻². The best-fitted curve to the observed TG signal based on the two-state model (8.6) is shown by the solid line. The assignments of the signal components (adduct formation, thermal diffusion, reactant diffusion, and product diffusion) are shown

A typical TG signal of a phot1LOV2 domain observed at 50 μM and at q² = 3.4 × 10¹⁰ m⁻² is shown in Fig. 8.9. The signal consisted of a rapid decay within microseconds, a subsequent rise and decay, and a peak in the time region longer than milliseconds. The TG signal in the whole time range was expressed by [57, 58, 71, 72]

$$I_{TG}(t) = \alpha \left\{ \delta n_1 \exp(-k_1 t) + \delta n_2 \exp(-D_{th} q^2 t) + \delta n_{spe}(t) \right\}^2, \qquad (8.10)$$


where k_1 > k_2. The time constant of the faster decay, 1/k_1, was determined to be 1.9 μs. This value did not depend on q². On the basis of a comparison with previously reported rate constants, the 1.9 μs dynamics was attributed to the conversion process from D447 to S390. The second term represents the thermal grating. The third term, δn_spe(t), represents the species grating signal appearing in the longer time region; it reflects the chemical reaction kinetics as well as the molecular diffusion process. The temporal profile of this part depended on the q² value and the concentration in complex ways. At a low concentration ([LOV] = 50 μM), the signal after the thermal grating decayed monotonically to the baseline in the high-q² range (q² > 5 × 10¹² m⁻²) (Fig. 8.10). This decay was expressed by a single exponential function:

$$\delta n_{spe}(t) = \delta n_3 \exp(-k_3 t). \qquad (8.11)$$
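
Equations (8.10) and (8.11) can be combined into a small function for exploring how the three terms shape the signal. This is a minimal sketch: the amplitudes, the prefactor α = 1, and the thermal diffusivity D_th = 1.4 × 10⁻⁷ m² s⁻¹ (a typical value for water) are assumptions chosen only for illustration, and the diffusion term uses k_3 = Dq² with the D value reported in this section.

```python
import numpy as np

def tg_signal(t, alpha, dn1, k1, dn2, d_th, q2, dn3, k3):
    """I_TG(t) from (8.10), with the species grating given by (8.11)."""
    dn_spe = dn3 * np.exp(-k3 * t)                      # (8.11)
    amplitude = (dn1 * np.exp(-k1 * t)                  # 1.9 us photoreaction
                 + dn2 * np.exp(-d_th * q2 * t)         # thermal grating
                 + dn_spe)                              # molecular diffusion
    return alpha * amplitude ** 2

Q2 = 5.3e12          # m^-2, highest q^2 listed for Fig. 8.10
K1 = 1.0 / 1.9e-6    # s^-1, from the 1.9 us time constant
D_TH = 1.4e-7        # m^2 s^-1, assumed thermal diffusivity of water
D = 9.8e-11          # m^2 s^-1, D_R = D_P reported in this section
K3 = D * Q2          # s^-1, diffusion decay rate k3 = D q^2

t = np.logspace(-7, -1, 400)  # 100 ns to 100 ms
signal = tg_signal(t, alpha=1.0, dn1=1.0, k1=K1, dn2=1.0,
                   d_th=D_TH, q2=Q2, dn3=0.5, k3=K3)
print(f"k3 = {K3:.0f} s^-1, diffusion time constant {1e3 / K3:.2f} ms")
```

With all three amplitudes given the same sign, as assumed here, the computed signal decays monotonically; in a real fit the signs and magnitudes of δn_1, δn_2, and δn_3 are free parameters.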

Since this rate constant depended on the q² value (e.g., Fig. 8.10), this component certainly originated from the molecular diffusion process. If a product is formed by the photoexcitation, the molecular diffusion of both the reactant and the product should be observed. The single exponential decay at high q² indicated that the D values of the reactant (D447) and the product (S390) were the same (D_R = D_P); i.e., D did not change upon the reaction in this observation time range. From the rate constant of the exponential fit and the q² value, D (= D_R = D_P) was calculated to be 9.8 × 10⁻¹¹ m² s⁻¹. Since D is one of the quantities that represent the global molecular structure of proteins, the fact that D_R = D_P suggested that phot1LOV2 does not change its conformation significantly upon photoreaction within approximately 1 ms.

Fig. 8.10. Grating wavenumber (q) dependence of the TG signals (broken lines; I_TG / a.u. vs. t/s) of a 50 μM phot1LOV2 solution. The arrow indicates the increase of q. The q² values are 4.5 × 10¹⁰, 7.3 × 10¹⁰, 3.4 × 10¹¹, 6.3 × 10¹¹, and 5.3 × 10¹² m⁻² in the order of the amplitude. The signals representing the molecular diffusion processes are shown, and these signals are normalized at the initial part of the diffusion signal

The temporal profile changed at relatively low q² (Fig. 8.10); a growth–decay signal (diffusion peak) appeared. As in the results described in the previous sections, the rise and decay components of the TG signal were attributed to the molecular diffusion of the reactant [ground-state protein (D447)] and the photoproduct, respectively; i.e., the faster rate of the rising component compared with the decay indicated that the product diffuses more slowly than the reactant (D_R > D_P) in this time range. The drastic change of the profile with q² was rationalized by the time dependence of D. The temporal profile of the TG signal was analyzed using (8.6). For this analysis, some of the parameters were determined independently. First, D_R was fixed at 9.8 × 10⁻¹¹ m² s⁻¹, which was obtained from the high-q² signal (Fig. 8.10). The determined D_R of phot1LOV2 is a typical value for a protein of this size, which suggested that phot1LOV2 existed in a monomeric form in solution at this concentration. Second, the final D_P was determined to be 8.0 × 10⁻¹¹ m² s⁻¹ from the signal in the long time range. Using these parameters, the observed TG signal was reproduced very well at various q² values with a single reaction rate k. The time constant of the change determined from the fitting is 40 ms at 50 μM. The photoreaction with the lifetime of 1.9 μs that accompanies the adduct formation (S390) should be the trigger for this diffusion change. Possible explanations for the reduction of D were a dimerization reaction of the monomeric phot1LOV2 or a conformational change upon the photoreaction. The origin of the change of D was investigated through the concentration dependence.
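
To see why a growth–decay diffusion peak appears when D_R > D_P, and why it moves to earlier times as q² increases, one can look at the long-time limit in which both diffusion coefficients are constant. The sketch below assumes a species-grating amplitude of the form δn_P exp(−D_P q²t) − δn_R exp(−D_R q²t) with equal amplitudes; this form and the function name are illustrative assumptions, and the calculation deliberately ignores the time dependence of D that (8.6) accounts for.

```python
import math

def diffusion_peak_time(q2, d_r, d_p, dn_r=1.0, dn_p=1.0):
    """Time of the maximum of dn_p*exp(-D_P q^2 t) - dn_r*exp(-D_R q^2 t).

    Obtained by setting the time derivative to zero; valid when
    dn_r * d_r > dn_p * d_p, so that the amplitude first rises.
    """
    return math.log((dn_r * d_r) / (dn_p * d_p)) / ((d_r - d_p) * q2)

D_R, D_P = 9.8e-11, 8.0e-11  # m^2 s^-1, monomer and final product, from the text

for q2 in (4.5e10, 7.3e10, 3.4e11):  # m^-2, from the caption of Fig. 8.10
    t_peak = diffusion_peak_time(q2, D_R, D_P)
    print(f"q^2 = {q2:.1e} m^-2 -> peak near {t_peak * 1e3:.0f} ms")
```

In this limit the peak time scales as 1/q², so at high q² the diffusion signal is over before a reaction with a time constant of tens of milliseconds has changed D, which is consistent with the single exponential decay seen there.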

Fig. 8.11. Concentration dependence of the TG signal (broken lines; I_TG / a.u. vs. t/s) measured at q² = 6.3 × 10¹¹ m⁻² at concentrations of 40, 60, 70, 80, 120, and 190 μM, increasing in the direction of the arrow. The signals are normalized at the initial part of the diffusion signal. The smooth solid lines are the best-fitted curves

In the q² range below 7.0 × 10¹⁰ m⁻², i.e., in a relatively long time region for the diffusion signal, the temporal profiles were rather insensitive to the concentration and were reproduced well by a bi-exponential function with D_R = 9.8 × 10⁻¹¹ m² s⁻¹ and D_P = 8.0 × 10⁻¹¹ m² s⁻¹ after 200 ms. Therefore, the final D_P of the product was independent of the concentration, at least after 200 ms. On the other hand, in the middle q² range (q² = 6.3 × 10¹¹ m⁻²), the temporal profiles depended significantly on the concentration. In particular, the intensity of the diffusion peak relative to the thermal grating intensity decreased with decreasing concentration (Fig. 8.11). Considering that the diffusion peak appears as a result of the difference between D_P and D_R, the change in D_P is smaller in this time range for a dilute sample. This should be due to the slower rate of the D_P change with decreasing concentration. This concentration dependence of the rate indicated that more than one molecule is involved in the D change process. The 1.8-fold increase in the molecular volume suggested that dimerization is the cause of the D change. From the laser power dependence, the reaction scheme was written as

$$\mathrm{LOV} \xrightarrow{h\nu} \mathrm{LOV}^{*} \xrightarrow{k} (\mathrm{LOV}^{*} - \mathrm{LOV}),$$

where k is a bimolecular reaction rate and may be written as k_2[LOV], where k_2 is the intrinsic bimolecular reaction rate constant and [LOV] is the concentration of phot1LOV2. This scheme is identical to Scheme 2. The very good fit of the observed signal by (8.7) implies that Scheme 2 is appropriate for describing the dimerization process. From the slope of the plot of k against [LOV], k_2 was determined to be 6.6 × 10⁵ M⁻¹ s⁻¹. This value is much smaller than the diffusion-limited reaction rate calculated from D_R and the reaction distance [55]. This small k_2 suggests that the dimerization reaction occurs only at a specific relative orientation of two phot1LOV2 monomers. The light-induced dimer should eventually dissociate to return to the monomers, because no permanent change was observed. It may be reasonable to assume that the dimer dissociates when the photoadduct state of LOV2 returns to the ground state. We should emphasize that this TG technique for measuring D in the time domain has been the only technique that can detect such transient dimer formation.

Photodissociation Reaction

In the previous section, the protein association reaction upon photoexcitation was described; D_P was smaller than D_R. At a higher concentration, however, the opposite change was observed. Figure 8.12 depicts the concentration dependence of the signal in the concentration range 40–250 μM. When the concentration was low enough, the species grating signal decayed single exponentially. This feature indicated that the molecular diffusion process was faster than the dimerization reaction on this timescale. When the concentration was increased, the signal showed the growth–decay feature (Fig. 8.12). The signs of δn of the rise and the decay components were positive and negative, respectively, which was opposite to what was observed for the dilute sample. Therefore, the rising component was attributed to the diffusion of the product and the decay to that of the reactant. From the rates of the rise and decay components, it is apparent that the product diffuses faster than the reactant at high concentrations (D_R < D_P). The temporal profile was again fitted by (8.6). The signal was reproduced almost perfectly with D_R = 8.0 × 10⁻¹¹ m² s⁻¹ for the reactant, D_I = D_R, D_P = 9.8 × 10⁻¹¹ m² s⁻¹ for the product, and k⁻¹ = 300 μs. Note that, from the results of the previous section, D of the dimer and of the LOV monomer are 8.0 × 10⁻¹¹ and 9.8 × 10⁻¹¹ m² s⁻¹, respectively. Therefore, in these concentrated solutions, the reactant exists in a dimeric form and the product is a monomer. The observed TG signal indicated that the dimer dissociated to yield the monomer with a time constant of 300 μs upon photoexcitation. The reactions detected by this method are summarized in Fig. 8.13.

Fig. 8.12. Concentration dependence of the TG signals (broken lines; I_TG / a.u. vs. t/ms) with the concentrations of 56, 110, 180, 200, and 300 μM (in the order of the arrow) measured at q² = 7.9 × 10¹² m⁻². The signals are normalized at the initial part of the diffusion signal. The best-fitted curves to the observed TG signals by the two-state model (8.7) are shown by the solid lines. The rise and decay components are labeled product and reactant, respectively

8.4.4 Diffusion Detection of Interprotein Interaction

In the previous sections, several examples were reviewed to show that the diffusion change is sensitive enough for the detection of protein–protein interaction. This method could be called the diffusion-detected biosensor method. Characteristic features of this method are discussed in the following.

Fig. 8.13. Schematic showing the photoreaction process of phot1LOV2 detected by TG: (a) light-induced association of two monomers and (b) light-induced dissociation of a dimeric form. Labels in the scheme: hν, 1.9 μs, k_2[LOV], and 30 s in (a); hν, 1.9 μs, 300 μs, and 30 s in (b)

First, the most prominent feature of this method is its fast time response. The time response of the TG method is fast enough to detect transient protein association or dissociation reactions, so the technique can be used to measure binding rate constants in real time. It should be noted that the TG technique sensitively monitors the refractive index change caused only by the creation of the photoexcited state, whereas gel chromatography monitors all proteins in the solution. It may be difficult to detect the dimer contribution among all the proteins by conventional gel chromatography unless the population of the dimer is dominant. Moreover, while covalently linked or stable noncovalently linked protein aggregates may be detected by size exclusion liquid chromatography, a noncovalent protein aggregate formed by a weak hydrophobic or hydrogen-bond interaction may not be detectable because it may dissociate during elution through the column.

Second, not only protein–protein interaction but any intermolecular interaction that changes D can be detected. Protein association changes the radius of the diffusing species, which leads to changes in D. However, D is determined not only by these factors, but also by the conformation of the protein or the intermolecular interaction. This distinguishes the method from the SPR method, in which a refractive index change upon association is necessary. Since the binding of a small molecule to a protein may not change the refractive index, this process should be silent in the SPR method.

Third, compared with the SPR method, it is a big advantage that the target protein need not be fixed on a metal surface. The intermolecular interaction can be detected in the solution phase. Hence this method can be used conveniently without pretreatment of the sample, such as fixing on a metal surface. For example, the protein activity can be checked during protein separation if this system is combined with a column chromatograph. Since contact of the protein with a metal surface is avoided, any possible denaturation or inactivation by the surface is avoided as well.

Fourth, since the diffusion coefficient is not sensitive to temperature fluctuations during the measurement, precise temperature control is not required. This merit may be very useful compared with the SPR technique, which is so sensitive to temperature that the sample temperature must be kept precisely constant during the measurement.

Fifth, solvent properties do not affect the measurement by this system at all. Hence, the solvent can be changed without any limitation. This is also an advantage over the SPR method, in which the refractive index of the solvent is an important property for the experiment.

We believe that these prominent characteristics of the diffusion-detected biosensor are important for studying intermolecular interactions of sensor proteins in the time domain and that the method will be used in many cases to reveal their essential features.

Acknowledgments

The author is deeply indebted to the coauthors of the papers cited in this article.

References

1. B. Card, P.J.A. Erbel, K.H. Gardner, J. Mol. Biol. 353, 664 (2005)
2. H.J. Park, C. Suquet, J.D. Satterlee, C. Kang, Biochemistry 43, 2738 (2004)
3. K. Okajima, S. Yoshihara, Y. Fukushima, X. Geng, M. Katayama, S. Higashi, M. Watanabe, S. Sato, S. Tabata, Y. Shibata, S. Itoh, M. Ikeuchi, J. Biochem. 137, 741 (2005)
4. D.A. Schultz, Curr. Opin. Biotechnol. 14, 13 (2003)
5. J.M. McDonnell, Curr. Opin. Chem. Biol. 5, 572 (2001)
6. M. Fivash, E.M. Towler, R.J. Fisher, Curr. Opin. Biotechnol. 9, 97 (1998)
7. Z. Salamon, H.A. Macleod, G. Tollin, Biochim. Biophys. Acta 1331, 131 (1997)
8. I.L. Medintz, G.P. Anderson, M.E. Lassman, E.R. Goldman, L.A. Bettencourt, J.M. Mauro, Anal. Chem. 76, 5620 (2004)
9. E.L. Cussler, Diffusion (Cambridge University Press, Cambridge, 1997)
10. H.J.V. Tyrrell, K.R. Harris, Diffusion in liquids (Butterworth, London, 1984)
11. G.I. Taylor, Proc. Roy. Soc. A 219, 186 (1953)
12. K.M. Berland, Methods Mol. Biol. 261, 383 (2004)
13. R. Pecora, Dynamic Light Scattering (Plenum, London, 1985)
14. N. Baden, M. Terazima, Chem. Phys. Lett. 393, 539 (2004)
15. M. Terazima, N. Hirota, J. Chem. Phys. 98, 6257 (1993)
16. M. Terazima, K. Okamoto, N. Hirota, J. Phys. Chem. 97, 13387 (1993)
17. M. Terazima, K. Okamoto, N. Hirota, J. Chem. Phys. 102, 2506 (1995)
18. K. Okamoto, M. Terazima, N. Hirota, J. Chem. Phys. 103, 10445 (1995)
19. M. Terazima, Acc. Chem. Res. 33, 687 (2000)
20. H.J. Eichler, P. Günter, D.W. Pohl, Laser induced dynamic gratings (Springer, Berlin, 1986)
21. M. Terazima, Adv. Photochem. 24, 255 (1998)
22. M. Terazima, J. Photochem. Photobiol. C 3, 81 (2002)
23. M. Terazima, Phys. Chem. Chem. Phys. 8, 545 (2006)
24. M. Terazima, N. Hirota, S.E. Braslavsky, A. Mandelis, S.E. Bialkowski, G.J. Diebold, R.J.D. Miller, D. Fournier, R.A. Palmer, A. Tam, Pure Appl. Chem. 76, 1083 (2004)
25. S. Nishida, T. Nada, M. Terazima, Biophys. J. 87, 2663 (2004)
26. T. Nada, M. Terazima, Biophys. J. 85, 1876 (2003)
27. S. Nishida, T. Nada, M. Terazima, Biophys. J. 89, 2004 (2005)
28. K. Takeshita, N. Hirota, Y. Imamoto, M. Kataoka, F. Tokunaga, M. Terazima, J. Am. Chem. Soc. 122, 8524 (2000)
29. K. Takeshita, Y. Imamoto, M. Kataoka, F. Tokunaga, M. Terazima, Biochemistry 41, 3037 (2002)
30. K. Takeshita, Y. Imamoto, M. Kataoka, K. Mihara, F. Tokunaga, M. Terazima, Biophys. J. 83, 1567 (2002)
31. J.S. Khan, Y. Imamoto, M. Harigai, M. Kataoka, M. Terazima, Biophys. J. 90, 3686 (2006)
32. Y. Hoshihara, Y. Imamoto, M. Kataoka, F. Tokunaga, M. Terazima, Biophys. J. 94, 2187 (2008)
33. T.E. Meyer, Biochim. Biophys. Acta 806, 175 (1985)
34. G.E.O. Borgstahl, D.R. Williams, E.D. Getzoff, Biochemistry 34, 6278 (1995)
35. W.D. Hoff, P. Düx, K. Hård, B. Devreese, I.M. Nugteren-Roodzant, W. Crielaard, R. Boelens, R. Kaptein, J. van Beeumen, K.J. Hellingwerf, Biochemistry 33, 13959 (1994)
36. R. Kort, H. Vonk, X. Xu, W.D. Hoff, W. Crielaard, K.J. Hellingwerf, FEBS Lett. 382, 73 (1996)
37. U.K. Genick, G.E.O. Borgstahl, K. Ng, Z. Ren, C. Pradervand, P.M. Burke, V. Šrajer, T. Teng, W. Schildkamp, D.E. McRee, K. Moffat, E.D. Getzoff, Science 275, 1471 (1997)
38. R. Brudler, R. Rammelsberg, T.T. Woo, E.D. Getzoff, K. Gerwert, Nat. Struct. Biol. 8, 265 (2001)
39. W.D. Hoff, I.H.M. van Stokkum, H.J. van Ramesdonk, M.E. van Brederode, A.M. Brouwer, J.C. Fitch, T.E. Meyer, R. van Grondelle, K.J. Hellingwerf, Biophys. J. 67, 1691 (1994)
40. P. Düx, G. Rubinstenn, G.W. Vuister, R. Boelens, F.A.A. Mulder, K. Hård, W.D. Hoff, A.R. Kroon, W. Crielaard, K.J. Hellingwerf, R. Kaptein, Biochemistry 37, 12689 (1998)
41. Y. Imamoto, H. Koshimizu, K. Mihara, O. Hisatomi, T. Mizukami, K. Tsujimoto, M. Kataoka, F. Tokunaga, Biochemistry 40, 4679 (2001)
42. G. Rubinstenn, G.W. Vuister, F.A.A. Mulder, P. Düx, R. Boelens, K.J. Hellingwerf, R. Kaptein, Nat. Struct. Biol. 5, 568 (1998)
43. M.E. van Brederode, W.D. Hoff, I.H.M. van Stokkum, M. Groot, K.J. Hellingwerf, Biophys. J. 71, 365 (1996)
44. K.J. Hellingwerf, J. Hendriks, T. Gensch, J. Phys. Chem. A 107, 1082 (2003)
45. J.S. Khan, Y. Imamoto, Y. Yamazaki, M. Kataoka, F. Tokunaga, M. Terazima, Anal. Chem. 77, 6625 (2005)
46. P. Hazra, K. Inoue, W. Laan, K.J. Hellingwerf, M. Terazima, Biophys. J. 91, 654 (2006)
47. S. Masuda, C.E. Bauer, Cell 110, 613 (2002)
48. B.J. Kraft, S. Masuda, J. Kikuchi, V. Dragnea, G. Tollin, J.M. Zaleski, C.E. Bauer, Biochemistry 42, 6726 (2003)
49. M. Gauden, S. Yeremenko, W. Laan, I.H. van Stokkum, J.A. Ihalainen, R. van Grondelle, K.J. Hellingwerf, J.T. Kennis, Biochemistry 44, 3653 (2005)
50. S. Masuda, K. Hasegawa, T.A. Ono, Biochemistry 44, 1215 (2005)
51. W. Laan, T. Bednarz, J. Heberle, K.J. Hellingwerf, Photochem. Photobiol. Sci. 3, 1011 (2004)
52. S. Anderson, V. Dragnea, S. Masuda, J. Ybe, K. Moffat, C. Bauer, Biochemistry 44, 7998 (2005)
53. W. Laan, M. Gauden, S. Yeremenko, R. van Grondelle, J.T.M. Kennis, K.J. Hellingwerf, Biochemistry 45, 51 (2006)
54. R. Swaminathan, C.P. Hoang, A.S. Verkman, Biophys. J. 72, 1900 (1997)
55. P. Atkins, J. de Paula, Physical Chemistry (Oxford University Press, Oxford, 2004)
56. P. Hazra, K. Inoue, W. Laan, K.J. Hellingwerf, M. Terazima, J. Phys. Chem. B 112, 1494 (2008)
57. Y. Nakasone, T. Eitoku, D. Matsuoka, S. Tokutomi, M. Terazima, Biophys. J. 91, 645 (2006)
58. Y. Nakasone, T. Eitoku, D. Matsuoka, S. Tokutomi, M. Terazima, J. Mol. Biol. 367, 432 (2007)
59. E. Huala, P.W. Oeller, E. Liscum, I.S. Han, E. Larsen, W.R. Briggs, Science 278, 2120 (1997)
60. W.R. Briggs, E. Huala, Annu. Rev. Cell Dev. Biol. 15, 33 (1999)
61. J.M. Christie, P. Reymond, G.K. Powell, P. Bernasconi, A.A. Raibekas, E. Liscum, Science 282, 1698 (1998)
62. J.M. Christie, M. Salomon, K. Nozue, M. Wada, W.R. Briggs, Proc. Natl Acad. Sci. USA 96, 8779 (1999)
63. J.A. Jarillo, H. Gabrys, J. Capel, J.M. Alonso, J.R. Ecker, A.R. Cashmore, Nature 410, 952 (2001)
64. T. Kagawa, T. Sakai, N. Suetsugu, K. Oikawa, S. Ishiguro, T. Kato, Science 291, 2138 (2001)
65. T. Kinoshita, M. Doi, N. Suetsugu, T. Kagawa, M. Wada, K. Shimazaki, Nature 414, 656 (2001)
66. T.E. Swartz, S.B. Corchnoy, J.M. Christie, J.W. Lewis, I. Szundi, W.R. Briggs, R.A. Bogomolni, J. Biol. Chem. 276, 36493 (2001)
67. T. Kottke, J. Heberle, D. Hehn, B. Dick, P. Hegemann, Biophys. J. 84, 1192 (2003)
68. T.A. Schüttrigkeit, C.K. Kompa, M. Salomon, W. Rüdiger, M.E. Michel-Beyerle, Chem. Phys. 294, 501 (2003)
69. J.T.M. Kennis, S. Crosson, M. Gauden, I.H.M. van Stokkum, K. Moffat, R. van Grondelle, Biochemistry 42, 3385 (2003)
70. E. Schleicher, R.M. Kowalczyk, C.W.M. Kay, P. Hegemann, A. Bacher, M. Fischer, R. Bittl, G. Richter, S. Weber, J. Am. Chem. Soc. 126, 11067 (2004)
71. T. Eitoku, Y. Nakasone, D. Matsuoka, S. Tokutomi, M. Terazima, J. Am. Chem. Soc. 127, 13238 (2005)
72. T. Eitoku, Y. Nakasone, K. Zikihara, D. Matsuoka, S. Tokutomi, M. Terazima, J. Mol. Biol. 371, 1290 (2007)

9 Volumetric Properties of Proteins and the Role of Solvent in Conformational Dynamics

C.A. Royer and R. Winter

Abstract. Walter Kauzmann stated in a review of protein thermodynamics that “volume and enthalpy changes are equally fundamental properties of the unfolding process, and no model can be considered acceptable unless it accounts for the entire thermodynamic behaviour” (Nature 325:763–764, 1987). While the thermodynamic basis for pressure effects has been known for some time, the molecular mechanisms have remained rather mysterious. We, and others in the rather small field of pressure effects on protein structure and stability, have attempted since that time to clarify the molecular and physical basis for the changes in volume that accompany protein conformational transitions, and hence to explain pressure effects on proteins. The combination of many years of work on a model system, staphylococcal nuclease and its large number of site-specific mutants, and the rather new pressure perturbation calorimetry approach has provided for the first time a fundamental qualitative understanding of ΔV of unfolding, the quantitative basis of which remains the goal of current work.

9.1 Introduction The physical chemical properties of proteins inform their function and as such have been the object of intense investigation for over 50 years. Indeed, major progress in the understanding of protein structure, dynamics and thermodynamics, as well as their inter-relationships has been made thanks to advances in experimental and computational approaches. Despite this gain in fundamental understanding, a complete description of the factors that control these properties has not been achieved. In particular, the characterization of the role of solvent in controlling protein conformational transitions and stability remains to be accomplished [1]. During the 1970s and 1980s the fundamental basis for the temperature dependence of protein stability and conformational changes was revealed [2]. Heat and cold denaturation were clearly attributed to the signiﬁcant decrease in heat capacity upon folding, leading to entropy-driven unfolding at high temperature and enthalpy-driven unfolding at low temperature. The amount of hydrophobic surface area that is removed from interaction with water was

C.A. Royer and R. Winter

shown to be proportional to the magnitude of the loss of heat capacity associated with the disorder–order transitions involved in protein folding or function [3]. In contrast to the fairly complete understanding of the temperature dependence of protein conformation, insight into pressure effects on proteins has lagged behind. Although a few rather complete studies on the pressure dependence of protein stability appeared early on [4–6], the number of scientists working in the field of high pressure did not increase, and indeed even diminished for a time. The pressure dependence of protein stability is based on the volume change associated with unfolding. A review in 2002 by one of the present authors [7] lists a number of volume changes obtained in pressure studies reported in the literature over about 30 years. Given the long time period, the database is indeed quite small. Moreover, these volume changes were measured for several different proteins under completely different conditions of temperature and pH, and some involved assumptions of nonzero compressibility changes. Hence, they are difficult if not impossible to compare. One clear observation is that at low temperature the volume changes upon unfolding are invariably negative, with values ranging from just below zero to −185 ml mol⁻¹. Positive volume changes reported at high temperature or low pressure, however, have served to confuse the issue, and the volumetric properties of proteins have largely been considered inextricable. In 1987 [8] and again in 1993 [9], it was pointed out that the hydrophobic liquid model could not be entirely adapted to protein folding, since it completely fails to explain the effects of pressure. Kauzmann points out that “volume and enthalpy changes are equally fundamental properties of the unfolding process, and no model can be considered acceptable unless it accounts for the entire thermodynamic behaviour” [8]. In his “Reminiscences from a Life in Protein Physical Chemistry” [10], Kauzmann further states:

“I continue to feel that the study of the volume changes in protein reactions is sorely neglected. They may be determined by dilatometry and by the effects of pressure on protein equilibrium constants. The results complement the results of the determination of enthalpy changes as measured by calorimetry and the effects of temperature on equilibrium constants. Much useful insight at the molecular level can be obtained from a knowledge of volume changes.”

So, rather than follow the example of Kauzmann’s drunk [8], who searches for his keys under the light of the street lamp, despite having lost them in the dark, we have attempted over the past 15 years to shed new light on what he termed “the darkness of pressure studies.”

9.2 Thermodynamics

The early pressure unfolding studies cited above revealed all of the essential parameters for describing combined temperature–pressure effects. Hawley first demonstrated that protein unfolding p–T diagrams were elliptical in shape

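9 Volumetric Properties of Proteins

[Fig. 9.1 graphic: pressure p versus temperature T; regions labeled native and denatured; boundary segments labeled pressure denaturation, heat denaturation and cold denaturation; lines ΔS = 0 and ΔV = 0]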

Fig. 9.1. Hypothetical general p–T phase diagram for two-state cooperative protein folding, according to (9.1). The stability decreases with increasing or decreasing temperature from the ΔS = 0 line and with increasing or decreasing pressure from the ΔV = 0 line. The shape of the ellipse depends very strongly on Δα and ΔCp

(Fig. 9.1). He analyzed the p–T diagrams using the following approximation, which incorporates the changes upon unfolding of the basic thermodynamic parameters ΔH, ΔS, and ΔV as well as their temperature (ΔCp, Δα) and pressure (Δβ) dependences. The Gibbs energy difference between the denatured (unfolded) and native state, relative to some reference point T0, p0 (e.g., the unfolding temperature at 25 °C and ambient pressure), can be approximated, assuming a second-order Taylor series of ΔG(T, p) expanded with respect to T and p around T0, p0, as [6, 11]:

$$
\Delta G = \Delta G_0 - \Delta C_p \left[ T \left( \ln\frac{T}{T_0} - 1 \right) + T_0 \right] - \Delta S\,(T - T_0) + \Delta V\,(p - p_0) + \Delta\alpha\,(T - T_0)(p - p_0) + \frac{\Delta\beta}{2}\,(p - p_0)^2 . \tag{9.1}
$$

Here ΔG0, ΔS and ΔV are the values at the reference point (T0, p0), and the signs follow from (∂ΔG/∂T)p = −ΔS and (∂ΔG/∂p)T = ΔV. In particular, these early studies clearly demonstrated that the volume change upon unfolding, like the enthalpy change, is not constant with temperature and that, also like the enthalpy change, it changes sign, being rather large and negative at low temperature but becoming positive at higher temperatures. This temperature dependence of the volume change is due to Δα, the difference in thermal expansivity between the unfolded and the folded state. Despite this rather complete description, a profound understanding of the molecular contributions to the value of the volume change has remained elusive [7], and it has been our goal to describe these contributions to ΔV and its complete temperature dependence. Hence we have sought to understand Δα as well.¹

¹ We note here that while Δβ, the difference in compressibility between the unfolded and folded state, necessarily plays a role at high temperature and pressure, we have not undertaken a complete description to date, as these are not the conditions under which most pressure unfolding studies are carried out. Indeed we have found that over most of the temperature range, a difference in compressibility between the folded and unfolded states need not be invoked. Hence we have left this parameter for future consideration. We further note that reported positive ΔV values at low pressure [11, 12] are likely due to changes in spectroscopic observables caused by simple isothermal compression of the folded state.

Fig. 9.2. Ribbon diagram of Snase, PDB 1EYO [13]. The single tryptophan is shown in dark gray, and one of the residues for which a number of site-specific mutants have been studied, valine 66, is shown in black

To approach these issues, we have studied for several years a model protein system, staphylococcal nuclease (Snase, Fig. 9.2), which presents a number of advantages. First, Snase (as well as a very large number of site-specific mutants) has been widely studied in terms of structure using multiple techniques (NMR, crystallography and other spectroscopic approaches) and in thermal and chemical denaturation, both at equilibrium and in kinetic studies. Therefore a great deal of information is available (which will not be cited here). Second, Snase is a highly basic protein, evolved to hydrolyze nucleic acids, and as such presents a high positive surface charge that minimizes aggregation phenomena. This has been quite useful in high-pressure Fourier transform infrared (FTIR), small-angle X-ray scattering (SAXS), NMR and densitometry experiments as well as in pressure perturbation calorimetry (PPC), since these techniques require rather large concentrations of protein, 2–20 mg ml⁻¹. In our hands, in contrast to Snase, many proteins fail to exhibit reversible thermodynamics under these conditions. Third, Snase at low temperature has a relatively large, negative volume change for unfolding (e.g., ∼ −90 ml mol⁻¹ at 4 °C), and the wild type presents marginal stability at ambient conditions (∼ −5 to −6 kcal mol⁻¹), rendering it rather pressure sensitive.
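As a numerical illustration of the expansion (9.1), the following minimal sketch evaluates ΔG(T, p) and locates the pressure at which ΔG = 0 at a given temperature. It assumes the usual convention dΔG = −ΔS dT + ΔV dp for the unfolding reaction. The function names and all parameter values are hypothetical and chosen only to give protein-like magnitudes; they are not fitted values from this chapter. Volumes are in ml mol⁻¹ and pressures in MPa, so that their product is in J mol⁻¹.

```python
import math

def delta_g(T, p, dG0, dS, dV, dCp, dalpha, dbeta, T0, p0):
    """Unfolding Gibbs energy (J/mol) from the second-order expansion (9.1).

    T, T0 in K; p, p0 in MPa; dV in ml/mol; dalpha in ml/(mol K);
    dbeta in ml/(mol MPa). Since 1 ml * 1 MPa = 1 J, every term is in J/mol.
    """
    dT, dp = T - T0, p - p0
    return (dG0
            - dCp * (T * (math.log(T / T0) - 1.0) + T0)
            - dS * dT
            + dV * dp
            + dalpha * dT * dp
            + 0.5 * dbeta * dp ** 2)

def transition_pressure(T, par):
    """Lowest pressure above p0 (MPa) at which delta_g(T, p) = 0, else None."""
    g_ref = delta_g(T, par["p0"], **par)                 # value at p = p0
    v_ref = par["dV"] + par["dalpha"] * (T - par["T0"])  # slope in p at p = p0
    a = 0.5 * par["dbeta"]
    if a == 0.0:
        # No compressibility term: delta_g is linear in p.
        roots = [] if v_ref == 0.0 else [-g_ref / v_ref]
    else:
        disc = v_ref ** 2 - 4.0 * a * g_ref
        roots = [] if disc < 0.0 else [
            (-v_ref + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    above = [r for r in roots if r > 0.0]
    return par["p0"] + min(above) if above else None

# Hypothetical, illustrative parameters (assumptions, not data from the text).
PAR = dict(T0=298.15, p0=0.1, dG0=20000.0, dS=100.0, dV=-70.0,
           dCp=9000.0, dalpha=1.0, dbeta=0.0)

for t_celsius in (5.0, 25.0, 45.0):
    print(t_celsius, transition_pressure(t_celsius + 273.15, PAR))
```

Scanning the temperature in this way traces one branch of the ΔG = 0 boundary. With Δβ set to zero, as in the sketch, the boundary is not a closed ellipse, which is consistent with the remark in the footnote that Δβ matters mainly at high temperature and pressure.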

[Fig. 9.3 graphic: (a) −ΔV / ml mol⁻¹ versus T / °C; (b) p / MPa versus T / °C, with regions labeled native and denatured]

Fig. 9.3. Temperature dependence of Snase high pressure unfolding. (a) Temperature dependence of the absolute value of the volume change of unfolding as measured by fluorescence (triangles) and FTIR (squares); (b) p–T phase diagram of Snase stability by fluorescence (triangles), FTIR (crosses) and SAXS (circles)

We determined several years ago the temperature dependence of the pressure unfolding of Snase [14] using fluorescence, FTIR and SAXS to build the p–T phase diagram (Fig. 9.3). These studies showed a clear decrease in the absolute value of the volume change for unfolding as a function of temperature, although the uncertainty in the recovered values of ΔV did not allow us to conclude unequivocally that the dependence is linear. Nonetheless, in the absence of any further information we assumed linearity and hence calculated from the slope the change in thermal expansivity between the folded and the unfolded state to be on the order of 1 ml mol⁻¹ K⁻¹. This value for Δα was in accord with the values reported for chymotrypsinogen [6] and metmyoglobin [5], and can clearly account for the change in sign of the volume change that may occur at high temperature. (Note that the slope of the p–T phase diagram for Snase becomes steeper at high temperature, but it never becomes positive, at least under these experimental conditions.) While these results confirmed the importance of the expansivity in defining the pressure dependence of protein stability, they did not bring much further insight into the molecular basis for such effects. Moreover, as in the earlier studies cited above, the values of ΔV and Δα were derived from analysis of spectroscopic data as a function of pressure and temperature according to a two-state unfolding model. We thus felt it important to measure the quantities of interest directly, and hence undertook densitometric studies as a function of pressure and temperature using an ultra-high-sensitivity oscillating U-tube densitometer (Anton Paar, Graz, Austria) [15]. We were also able to calculate the decrease in volume upon unfolding by temperature at atmospheric pressure and by pressure at about 40 °C (arrow at 100 MPa in Fig. 9.4). The latter value (−55 ml mol⁻¹) was in good agreement with the ΔV obtained from fitting the spectroscopic pressure-induced unfolding profiles to a two-state model, −52 ml mol⁻¹ (assuming no significant change in isothermal compressibility between the two states).
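The two-state analysis mentioned above can be sketched as follows. This is a minimal illustration, not the fitting procedure used in the cited studies: it assumes no compressibility difference, so that ΔG(p) = ΔG0 + ΔV p, and recovers ΔG0 and ΔV from a straight line through −RT ln K. The function names and the value ΔG0 = 6000 J mol⁻¹ are hypothetical; the synthetic curve is generated with ΔV = −52 ml mol⁻¹ only because that is the value quoted in the text.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def fraction_unfolded(p, dG0, dV, T):
    """Two-state unfolded fraction at pressure p (MPa).

    dG0: unfolding Gibbs energy at p = 0 (J/mol); dV in ml/mol
    (1 ml * 1 MPa = 1 J); no compressibility difference is assumed.
    """
    K = math.exp(-(dG0 + dV * p) / (R * T))
    return K / (1.0 + K)

def fit_two_state(pressures, fractions, T):
    """Least-squares line through dG(p) = -RT ln K; returns (dG0, dV)."""
    dG = [-R * T * math.log(f / (1.0 - f)) for f in fractions]
    n = len(pressures)
    p_mean = sum(pressures) / n
    g_mean = sum(dG) / n
    sxx = sum((p - p_mean) ** 2 for p in pressures)
    sxy = sum((p - p_mean) * (g - g_mean) for p, g in zip(pressures, dG))
    dV = sxy / sxx                      # slope = volume change
    return g_mean - dV * p_mean, dV     # intercept = dG0

# Synthetic, noise-free unfolding profile at 40 °C (hypothetical dG0).
T = 313.15
pressures = [25.0 * i for i in range(1, 9)]          # 25 to 200 MPa
fractions = [fraction_unfolded(p, 6000.0, -52.0, T) for p in pressures]
dG0_fit, dV_fit = fit_two_state(pressures, fractions, T)
print(dG0_fit, dV_fit)
```

With real spectroscopic data the unfolded fraction must first be derived from the observable and its folded and unfolded baselines, and points near 0 or 1 carry large errors in ln K; a nonlinear fit of the raw signal is usually preferable.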

[Fig. 9.4 graphic: specific volume Vs / ml g⁻¹ (axis from 0.7675 to 0.7800) versus p / MPa (0 to 100); arrow labeled ΔV]

Fig. 9.4. Specific volume of Snase as a function of pressure at 40 °C [15]. The protein is folded up to 50 MPa, and the slope up to that pressure is indicative of the isothermal compressibility of the folded state. The arrow at 100 MPa indicates the volume change of unfolding assuming constant compressibility of the folded state and nearly complete unfolding by 100 MPa. Unfortunately, the high-pressure densitometer was limited to 100 MPa, so the compressibility of the unfolded state could not be determined

[Fig. 9.5 graphic: V / ml mol⁻¹ (axis from 11700 to 12200) versus T / °C (0 to 70); curves labeled Vf and Vu]

Fig. 9.5. Specific molar volumes of the folded (Vf) and unfolded (Vu) states of Snase as derived from densitometric measurements [15] (crosses, diamonds), pressure perturbation calorimetry [16] (open square), and spectroscopic high-pressure unfolding experiments [14] (filled squares). Dashed lines correspond to extrapolations

Figure 9.5 shows the first and, to our knowledge, only direct experimental plot of the volume of both the folded and unfolded states of a protein. The densitometric studies yielded directly the volume V of Snase as a function of temperature for the folded state (below the transition temperature, crosses) and for the unfolded state (above the transition temperature, diamonds). It can be seen as well from Fig. 9.5 that the increase in V of the native state of Snase with temperature is not linear; indeed the folded state α decreases significantly as the temperature increases, while at high temperature the expansivity of the unfolded state appears to be a constant. Taking the values of ΔV obtained in Fig. 9.3a, we also calculated the V of the unfolded state at low temperature (filled squares), which to a first approximation appears to increase linearly over this temperature range as well, with approximately the same slope as over the high temperature range (extrapolated triangles). Thus, we have concluded that the expansivity of the unfolded state is, to a first approximation, temperature independent, while that of the folded state is not. Hence, Δα, the difference in expansivity between the two states, is most likely not constant. The crossed circle just below 50 °C corresponds to the V of the folded state calculated from the V of the unfolded state plus the volume change for folding obtained from PPC measurements of the volume change upon unfolding at the transition temperature [16]. Beyond this point, we do not know the value of the expansivity or the specific volume of the folded state. The dashed line represents the extrapolation of a polynomial fit to the curve at lower temperature. Thus, the direct measurement of volumetric properties confirms the importance of the difference in thermal expansivity of the unfolded and folded states of Snase in determining the temperature dependence of the volume change. These studies also support the notion that the difference in compressibility is small and likely only contributes to the pressure dependence of the unfolding at high temperature. Our results also suggest that the difference in expansivity is probably not constant with temperature, and indeed we have no idea how Δα may depend upon pressure. Nonetheless, these results reinforce and expand the studies from the 1970s and, at least from a thermodynamic point of view, clear up to a significant extent the confusion that has surrounded the volumetric properties of proteins. However, they still do not provide insight into the molecular nature of volume changes and pressure effects.
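The expansivities discussed above follow from the temperature dependence of the volume through α = (1/V)(∂V/∂T)p. A minimal sketch of how α might be estimated from a V(T) table by finite differences is given below. The function name is hypothetical and the data are synthetic (a constant α of 10⁻³ K⁻¹ and a volume of the order shown on the axis of Fig. 9.5); they are not the measured Snase values.

```python
import math

def expansivity_from_volume(temps, volumes):
    """Central-difference estimate of alpha = (1/V) dV/dT at interior points.

    temps must be strictly increasing; the estimate is properly centered
    only for evenly spaced temperatures. Returns a list of (T, alpha) pairs.
    """
    out = []
    for i in range(1, len(temps) - 1):
        dV_dT = (volumes[i + 1] - volumes[i - 1]) / (temps[i + 1] - temps[i - 1])
        out.append((temps[i], dV_dT / volumes[i]))
    return out

# Synthetic data: V = V_ref * exp(alpha * T) with a constant alpha (assumed).
alpha_true = 1.0e-3                                   # K^-1
temps = [float(t) for t in range(0, 75, 5)]           # 0 to 70 °C
volumes = [11800.0 * math.exp(alpha_true * t) for t in temps]

estimates = expansivity_from_volume(temps, volumes)
for T, alpha in estimates:
    print(T, alpha)
```

Because differentiation amplifies noise, an expansivity obtained this way from densitometry is expected to be less precise than one measured directly, and smoothing or a polynomial fit of V(T) before differentiation may be needed.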

9.3 Thermal Expansivity and ΔV

We can reasonably assume two major contributions to the difference in specific volume between the unfolded and folded states of a protein. The first contribution is that arising from the decrease in solvent-excluded volume when the tightly, but of course not perfectly, packed protein folded structure is disrupted. Water molecules enter this volume, thereby decreasing the overall volume of the protein–solvent system. The magnitude of this contribution is a specific property of the protein, both in its folded and unfolded state. The second contribution arises from the change in the volume of the water molecules that hydrate the newly exposed protein surface area, relative to their volume in the bulk. Much of our present understanding of the contribution of differential hydration volume has come from recent studies of model compounds and proteins based on PPC. This technique, developed by Brandts and coworkers [17] and recently reviewed by us [16, 18], is based on the measurement of the heat released or absorbed upon small (e.g., 0.5 MPa) pressure perturbations in a differential scanning calorimeter. The heat exchange is related to the entropy change (9.2). Taking the derivative with respect to pressure (9.3) and substituting the Maxwell relation (9.4) yields the expression for the heat change with pressure in terms of the thermal expansivity α (9.5). If a transition occurs, integrating the change in α over the temperature range (from T0 to Tf) of the transition yields the volume change for the transition [at that temperature (9.6)].

$$ \mathrm{d}Q_{\mathrm{rev}} = T\,\mathrm{d}S . \tag{9.2} $$

$$ \left( \frac{\partial Q_{\mathrm{rev}}}{\partial p} \right)_T = T \left( \frac{\partial S}{\partial p} \right)_T . \tag{9.3} $$

$$ \left( \frac{\partial S}{\partial p} \right)_T = - \left( \frac{\partial V}{\partial T} \right)_p . \tag{9.4} $$

$$ \left( \frac{\partial Q_{\mathrm{rev}}}{\partial p} \right)_T = -T \left( \frac{\partial V}{\partial T} \right)_p = -T V \alpha, \qquad \alpha = \frac{1}{V} \left( \frac{\partial V}{\partial T} \right)_p = - \frac{\Delta Q_{\mathrm{rev}}}{T\,V\,\Delta p} . \tag{9.5} $$

$$ \frac{\Delta V}{V} = \int_{T_0}^{T_f} \alpha \,\mathrm{d}T . \tag{9.6} $$
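Relations (9.5) and (9.6) translate into a short calculation. The sketch below is a minimal illustration in SI units with hypothetical function names and made-up numbers: it converts the heat of one small pressure step into α and integrates a list of α values over temperature with the trapezoidal rule. It ignores the baseline and reference-cell corrections that a real PPC analysis requires.

```python
def expansivity_from_heat(dq_rev, T, V, dp):
    """alpha (1/K) from (9.5): alpha = -dQ_rev / (T V dp).

    dq_rev in J (positive when absorbed), T in K, V in m^3, dp in Pa.
    """
    return -dq_rev / (T * V * dp)

def relative_volume_change(temps, alphas):
    """Trapezoidal estimate of the integral of alpha dT in (9.6), i.e. dV/V."""
    total = 0.0
    for i in range(1, len(temps)):
        total += 0.5 * (alphas[i] + alphas[i - 1]) * (temps[i] - temps[i - 1])
    return total

# Hypothetical single pressure step: 1 cm^3 of sample, +0.5 MPa at 25 °C.
T, V, dp = 298.15, 1.0e-6, 5.0e5
alpha_in = 2.6e-4                   # illustrative magnitude, close to liquid water
dq = -T * V * alpha_in * dp         # heat released on compression when alpha > 0
alpha_out = expansivity_from_heat(dq, T, V, dp)
print(dq, alpha_out)

# Hypothetical excess expansivity across a transition (negative peak).
temps = [318.15, 320.65, 323.15, 325.65, 328.15]
excess_alpha = [0.0, -1.0e-4, -2.0e-4, -1.0e-4, 0.0]
print(relative_volume_change(temps, excess_alpha))
```

A negative integrated peak corresponds to a negative ΔV/V; multiplying by the molar volume of the protein gives ΔV in molar units.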

Thus, measurement of the heat exchange every degree or two along a differential scanning calorimetry (DSC) scan for a model compound or protein provides a direct measurement of the expansivity and, in the case of proteins, of the volume change of unfolding at the folding transition temperature. Lin and coworkers [17] have measured the expansivity of individual amino acid side chains (by subtracting the value obtained for glycine) (Fig. 9.6a). They observed that the expansivity for polar amino acids was large and positive at low temperature, and decreased dramatically between 5 °C and 50 °C. Quite the opposite was observed for nonpolar amino acid side chains, which exhibited a large negative expansivity at low temperature that increased dramatically between 5 °C and 50 °C. We have carried out similar studies following a host–guest scheme, in which we subtracted the expansivity measured for a glycine tripeptide from that of peptides in which the central glycine residue was substituted with the residue of interest (Fig. 9.6b). The relative magnitude of the results from these two studies is not the same, but the overall picture is similar. In our case, we have controlled very carefully for aggregation phenomena, and we observe that the magnitude of the negative expansivities for the nonpolar amino acids is more or less proportional to their hydrophobicity (L > A > Q = M > F). Note that the black line in Fig. 9.6a corresponds to the expansivity of pure water and that it exhibits a small negative value at low temperature. This observation helps to interpret the expansivity data. A negative expansivity

[Fig. 9.6 graphic: expansivity versus T / °C. (a) Curves labeled Asn, Glu, Ser, Phe, Leu, Val, Ala and H2O; (b) curves labeled GAG−GGG, GLG−GGG, GOG−GGG, GMG−GGG and GFG−GGG]

Fig. 9.6. (a) PPC data taken from Lin and coworkers [17] for polar and nonpolar amino acids (calculated with respect to the signal obtained from glycine). (b) Similar studies obtained by us [18] for nonpolar amino acids using a host–guest approach

means that the density increases upon heating. We know this is true for water at low temperature, since ice floats. We can use the same reasoning for the nonpolar amino acids. As their solutions in water are heated, hydrating waters are released to the bulk, where they occupy a smaller partial molar volume, akin to ice melting. Hence we can conclude that at these low temperatures, the density of the waters hydrating the nonpolar residues is lower than in the bulk, or ice-like. While the Frank and Evans iceberg model has been highly controversial, these PPC results lend it some support. Indeed, Kauzmann stated in his 1987 Nature article: “I still believe that the Frank and Evans iceberg model of 40 years ago is essentially correct . . .” [8]. In contrast, the large positive expansivity for the polar amino acids indicates a degree of “electrostriction” of the hydrating water molecules around polar moieties, leading to a higher density than that of the bulk. Upon heating, these molecules are released gradually into the less dense bulk, and hence lead to a large, positive expansivity. Proteins, being composed of a combination of polar and nonpolar moieties that are more or less exposed to solvent depending upon their conformation, should exhibit expansivities that correspond, in part, to a weighted combination of the expansivities of these moieties. In addition to the hydration contribution, which can be positive or negative, we must consider for proteins the intrinsic expansivity of the protein structure itself. We and Brandts and coworkers [16–18] have measured the expansivity of a few model proteins, in particular Snase, under a variety of conditions. A typical protein PPC scan is shown in Fig. 9.7a.

[Fig. 9.7 graphic: panels (a) and (b), α / 10⁻³ K⁻¹ versus T / °C]

Fig. 9.7. (a) PPC scan for Snase (2 mg ml−1 ) taken from Ravindra et al. [16] and (b) the (less accurate) expansivity calculated from the densitometry measurements of Seemann et al. [15]

It can be seen from Fig. 9.7 that the expansivity of Snase is rather large and positive at low temperature and that it decreases dramatically up to about 43 °C. At this point the protein unfolds; the accompanying DSC scan, which reports the enthalpy, peaks at 50 °C. Above 60 °C, the expansivity of the protein corresponds to that of the unfolded state, and between 60 °C and 70 °C it is rather constant. Moreover, the agreement between the PPC measurements and those obtained by densitometry is rather astounding. The expansivities of the folded state (populated at low temperature) and the unfolded state (populated at high temperature) are nearly identical using the two techniques. In both experiments, α for the folded state decreases dramatically with temperature, while that for the unfolded state is rather constant. The expansivity profile for the folded state of Snase resembles that obtained for polar amino acid residues, and this similarity is due to the fact that protein surfaces are rather polar. If the expansivity of the unfolded state is rather constant, as suggested by the extrapolation to low temperatures in Fig. 9.5, then one may conclude that this arises from the offset of the polar and hydrophobic surface areas that are exposed in the unfolded state. From the PPC data one can reliably calculate the volume change of unfolding, ΔV (at the transition temperature), by integrating α over the unfolding transition as shown in (9.6). Under these conditions, we found it to be −19 ml mol⁻¹. A linear extrapolation of the plot in Fig. 9.3 would place the value closer to −40 ml mol⁻¹, but we do not know if the dependence is linear; indeed we suspect that Δα is not a constant. Moreover, the data in Fig. 9.3 were obtained from the analysis of high pressure data, and Δβ may play a role. In any case it is clear from this rather direct measurement of the volume change of unfolding that it is not positive at low pressure (0.5 MPa) and moderate temperature, at least in the case of Snase, in agreement with our experimental p–T diagram in Fig. 9.3. Thus the often-cited statement that the volume change for protein unfolding is negative at high pressure and positive at low pressure is not necessarily true, and likely quite often false.
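The linear estimate referred to above amounts to simple arithmetic, sketched here under the assumption of a temperature-independent Δα. The default numbers are the approximate values quoted earlier in the chapter (ΔV ≈ −90 ml mol⁻¹ at 4 °C and Δα on the order of 1 ml mol⁻¹ K⁻¹); the function names are hypothetical.

```python
def delta_v_linear(t_celsius, dv_ref=-90.0, t_ref=4.0, dalpha=1.0):
    """Delta V (ml/mol) at t_celsius, assuming a constant Delta alpha (ml mol^-1 K^-1)."""
    return dv_ref + dalpha * (t_celsius - t_ref)

def sign_change_temperature(dv_ref=-90.0, t_ref=4.0, dalpha=1.0):
    """Temperature (°C) at which the linear Delta V(T) crosses zero."""
    return t_ref - dv_ref / dalpha

print(delta_v_linear(40.0))        # -54.0
print(delta_v_linear(50.0))        # -44.0
print(sign_change_temperature())   # 94.0
```

Under this assumption the estimate near 40 °C is close to the densitometric and spectroscopic values of −55 and −52 ml mol⁻¹, whereas near the thermal transition it is about twice as negative as the −19 ml mol⁻¹ obtained by PPC, which is consistent with the suspicion that Δα is not constant.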


As a means of understanding more clearly the determinants of α, Δα, and hence ΔV, we can ask how the expansivity profiles change as a function of solution conditions that modify protein stability. We investigated the PPC profiles of Snase as a function of the concentration of the osmolyte sorbitol and of the denaturant urea. It has been amply demonstrated that these additives do not function only through some hypothetical effect on water structure [19], but rather through either positive or negative interaction energies with the protein surface [20, 21] (the peptide bond in the case of urea). Thus we can be reasonably sure that the differences observed in the PPC curves obtained in the presence of these additives arise from changes in the protein stability, structure or hydration. It can be seen in Fig. 9.8 that the transition shifts, as expected, to higher temperature with increasing osmolyte concentration. The ΔV decreases in absolute value from −19 to −5 ml mol⁻¹. This is in part due to the increase in the transition temperature: because Δα is positive (see Fig. 9.5), the volume difference between the unfolded and folded state decreases in absolute value. There may be a contribution of the effect of the osmolyte on the structure of the unfolded state as well. The value of α at low temperature increases with increasing osmolyte as a result of the preferential hydration effect. At high temperature, the differences in the expansivity of the bulk water

[Fig. 9.8 graphic: upper panels, α / 10⁻³ K⁻¹ versus T / °C; lower panels, Cp / kJ mol⁻¹ K⁻¹ versus T / °C]

Fig. 9.8. PPC (upper panels) and DSC (lower panels) proﬁles of Snase (4 mg ml−1 ) in phosphate buﬀer at pH 5.5. The eﬀects of sorbitol (left panels) (0, 0.5, and 1.5 M, curves shifting to higher temperatures) and urea (right panels) (0, 0.5, 1.5, and 2.5 M, curves shifting to lower temperatures) were tested


and the hydrating water, as well as the decrease in the hydration interaction, lead to basically indistinguishable α values. The effect of urea on the PPC profiles is just the opposite. As expected, the temperature of the transition decreases, and ΔV increases in absolute value, from −19 to −56 ml mol⁻¹. This again is due primarily to the effect of Δα, which increases the difference in volume between the two states as temperature decreases. The value of α at low temperature decreases significantly. This may be due to the decrease of hydration because of urea binding, or may involve the density differences between bound and bulk urea at low temperature.

9.4 Conclusions

The PPC studies carried out so far on proteins seem to suggest that their volumetric properties, and hence the effects of pressure on their structures and stabilities, can be largely explained by the differential hydration terms. For example, we have found recently (unpublished data) that partially destructured variants of Snase that expose more hydrophobic surface area to the solvent also exhibit lower values of α at low temperatures. However, other experiments in progress on hyperstable variants of Snase suggest that the stability and dynamics of the various states of the protein, in addition to the degree and type of hydration, may be crucial in determining the value of α as well. These studies have suggested that high stability and limited dynamics also tend to decrease the amplitude of α for the folded state at low temperature, indicating that the value of the expansivity for the folded state at low temperature results from a combination of surface hydration properties and structural flexibility. More experimental work on model compounds and on specific variants under a variety of conditions, in addition to computational approaches, will be necessary to quantify protein expansivity, which to our mind is essential to the molecular-level understanding of volume changes and pressure effects.

We have come to consider the volume change of unfolding at 4 °C as a standard value. At this low temperature, the differences in expansivity for the polar and nonpolar amino acid side chains are close to maximal. Since the expansivity (releasing water to the bulk) for polar and charged groups is large and positive, moving water molecules from the bulk to hydrate newly exposed polar surface area leads to an increase in density or a decrease in volume. Just the opposite is true for nonpolar surface area exposed upon unfolding. Hence the ΔV at this low temperature can be considered to comprise the sum of negative values for the exposure of each polar moiety, positive values for the exposure of each nonpolar moiety, and the contribution of the disappearance of solvent-excluded volume upon disruption of the tertiary packing. Given that the protein interior contains most of the nonpolar amino acid side chains, and that disruption of the structure would expose this nonpolar surface area, one might expect that the net result of the contributions from the exposed polar and nonpolar surface area could be a positive (or less negative) ΔV. This is not the case. Indeed, it is at these low temperatures that ΔV is found to be at its most negative. Hence, we propose that the difference in solvent-excluded volume is mainly responsible for the decrease in volume upon unfolding of proteins at low temperature, and that this contribution may indeed overcome a positive contribution from differential solvation. We must bear in mind that the magnitude of the difference in solvent-excluded volume depends both on the packing density of the folded state and on the degree of disruption of the unfolded state, which is rather poorly characterized in most cases.

The folded state presents a relatively more polar surface area than the unfolded state, and it has a specific three-dimensional structure that imposes constraints on its expansion. Hence its expansivity decreases drastically with increasing temperature, whereas that of the unfolded state appears to be rather constant. Thus, as the temperature increases, the unfolded state expands much more efficiently than the folded state. This is why the difference in specific volume between the unfolded and folded states of proteins decreases in magnitude with increasing temperature and may even become positive. Indeed, we have observed in PPC experiments on a hyperstable variant of Snase (unpublished results) that under certain conditions the volume change for unfolding becomes positive. Such an observation was possible because the unfolding temperature of the variant is considerably higher than that of the wild type. This leads us to suggest that at low temperature the defining contribution to ΔV comes mainly from excluded volume differences, and ΔV for unfolding is negative. In contrast, at high temperatures, differential solvation due to the increased exposed surface area of the unfolded state, in addition to its larger thermal volume linked to increased conformational dynamics, takes over, and ΔV for unfolding eventually becomes positive.

After almost two decades of wandering around in “the darkness of the field of pressure effects on protein folding” we have come to understand, at least qualitatively, the underlying molecular contributions to the volumetric properties of the various states of proteins and how these change with temperature. We have yet to reach a quantitative understanding of these contributions. While we can calculate, for example from pressure-jump relaxation studies, the fractional change in hydration between the folded and transition state or the transition state and the unfolded state [22, 23], we cannot say how many water molecules are excluded from the protein surface in these transitions; nor can we predict volumetric properties from sequence and structure. Finally, we have yet to explore in detail the pressure effects on the volumetric properties of proteins. Despite these remaining challenges, it would appear that the light of a small candle may be making its way into the darkness. We are confident that further progress in understanding the volumetric properties of proteins will provide fundamental information on adaptation and evolution that will ultimately contribute to the multiple applications involving protein design and functional modulation.


References

1. Y. Levy, J.N. Onuchic, Annu. Rev. Biophys. Biomol. Struct. 35, 389 (2006)
2. P.L. Privalov, S.J. Gill, Adv. Protein Chem. 39, 191 (1988)
3. K.P. Murphy, E. Freire, Adv. Protein Chem. 43, 313 (1992)
4. J.F. Brandts, R.J. Oliveira, C. Westort, Biochemistry 9, 1038 (1970)
5. A. Zipp, W. Kauzmann, Biochemistry 12, 4217 (1973)
6. S.A. Hawley, Biochemistry 10, 2436 (1971)
7. C.A. Royer, Biochim. Biophys. Acta 1595, 201 (2002)
8. W. Kauzmann, Nature 325, 763 (1987)
9. K.A. Dill, Biochemistry 29, 7133 (1990)
10. W. Kauzmann, Protein Sci. 2, 671 (1993)
11. E.J. Fuentes, A.J. Wand, Biochemistry 37, 9877 (1998)
12. T.M. Li, J.W. Hook III, H.G. Drickamer, G. Weber, Biochemistry 15, 5571 (1976)
13. J. Chen, Z. Lu, J. Sakon, W.E. Stites, J. Mol. Biol. 303, 125 (2000)
14. G. Panick, G.J. Vidugiris, R. Malessa, G. Rapp, R. Winter, C.A. Royer, Biochemistry 38, 4157 (1999)
15. H. Seemann, R. Winter, C.A. Royer, J. Mol. Biol. 307, 1091 (2001)
16. R. Ravindra, C. Royer, R. Winter, Phys. Chem. Chem. Phys. 6, 1952 (2004)
17. L.N. Lin, J.F. Brandts, J.M. Brandts, V. Plotnikov, Anal. Biochem. 302, 144 (2002)
18. L. Mitra, N. Smolin, R. Ravindra, C. Royer, R. Winter, Phys. Chem. Chem. Phys. 8, 1249 (2006)
19. J.D. Batchelor, A. Olteanu, A. Tripathy, G.J. Pielak, J. Am. Chem. Soc. 126, 1958 (2004)
20. T. Arakawa, S.N. Timasheff, Biophys. J. 47, 411 (1985)
21. M. Auton, L.M. Holthauzen, D.W. Bolen, Proc. Natl. Acad. Sci. USA 104, 15317 (2007)
22. L. Brun, D.G. Isom, P. Velu, B. Garcia-Moreno, C.A. Royer, Biochemistry 45, 3473 (2006)
23. L. Mitra, K. Hata, R. Kono, A. Maeno, D. Isom, J.B. Rouget, R. Winter, K. Akasaka, B. Garcia-Moreno, C.A. Royer, J. Am. Chem. Soc. 129, 14108 (2007)

10 A Statistical Mechanics Theory of Molecular Recognition

T. Imai, N. Yoshida, A. Kovalenko, and F. Hirata

Abstract. A novel theoretical approach to the molecular recognition process in protein is presented, based on the statistical mechanics of molecular liquids, or the reference interaction site model/three-dimensional reference interaction site model (RISM/3D-RISM) theory. The method requires just the structure of protein and the potential energy parameters for the biomolecule and solutions as inputs. The calculation is carried out in two steps. The ﬁrst step is to obtain the pair correlation function for solutions consisting of water and ligands based on the RISM theory. Then, given the pair correlation functions prepared in the ﬁrst step, we calculate the 3D-distribution functions of water and ligands around and inside protein based on the 3D-RISM theory. The molecular recognition of a ligand by the protein is realized by the 3D-distribution functions: if one ﬁnds some conspicuous peaks in the distribution of a ligand inside the protein, then the ligand is regarded as “recognized” by the protein. Some molecular recognition processes of small ligands, including water, noble gases, and ions, by a protein are presented in this chapter. The relation of the molecular recognition process to the pressure denaturation of protein is also discussed.

10.1 Introduction

Life phenomena are a series and a network of chemical reactions, which are regulated by genetic information inherited from generation to generation. The genetic information itself is generated and transmitted by a series of chemical processes [1]. In each of those reactions, a characteristic process takes place that distinguishes biochemical reactions from ordinary chemical reactions in solution. The process is commonly referred to as “molecular recognition (MR).” For example, for an enzymatic reaction to occur, the substrate molecule must first be accommodated by the protein in its reaction pocket to form the so-called enzyme-substrate (ES) complex [2]. The MR process is extremely selective and specific at the atomic level, and that selectivity and specificity are key for living systems to maintain their life.

Imagine what would happen if a calcium-binding protein erroneously bound, say, a potassium ion. In that respect, MR is an elementary process of life phenomena. The MR process can be defined as a molecular process in which one or a few guest molecules are bound with high probability at a particular site, a cleft or a cavity, of a host molecule in a particular orientation. In this regard, MR is a molecular process determined by specific interactions between atoms in the host and guest molecules. On the other hand, it is a thermodynamic process as well, governed by the chemical potential, or free energy, of the guest molecules in the recognition site and in the bulk solution. As an example, consider the binding of a substrate molecule at a reaction pocket of a host protein. When there is no substrate, the reaction pocket is usually filled with one or a few water molecules. For a substrate molecule to enter the reaction pocket, one or more of these water molecules must be expelled from the pocket, while the substrate molecule itself must be partially or entirely dehydrated. The free energy change associated with these processes is commonly called the “dehydration penalty.” When a guest molecule enters a cleft or a cavity of a host molecule, it has to overcome a high entropy barrier, because the space, or the number of degrees of freedom, available to the guest molecule is so small compared with that in the bulk solution. The conformation of the host molecule must fluctuate to accommodate the guest molecule dynamically. The hinge-bending motion of a protein to accommodate a ligand is an example of such induced fitting. The conformational fluctuation of biomolecules is also driven by the free energy.
The reason why the MR process is so challenging for any theoretical method lies in the fact that it is a “molecular process” governed by “thermodynamic laws.” The “docking simulation” often employed in drug design essentially uses a trial-and-error scheme to find a “best-fit complex” of host and guest molecules based on geometrical and/or energetic criteria [3,4]. However, the best-fit complex in the geometrical sense will never be the most stable one in terms of thermodynamics, because the scheme cannot account for the solvent: neither the dehydration penalty nor the entropy barrier mentioned earlier is taken into account. The so-called implicit solvent models, the generalized Born (GB) [5] and Poisson–Boltzmann (PB) [6] equations, which have been the most popular means of evaluating the solvation thermodynamics of biomolecules, are much less accurate and offer no insight into the problem at hand, because by definition they have no molecular view of the solvent. Moreover, it is impossible to define a dielectric constant of the solvent inside a host cavity; such models therefore cannot account for the dehydration penalty, especially that from the host cavity. At best, those quantities can be calculated by fitting empirical parameters, such as the boundary conditions and the dielectric constants, to experimental data, but the model then loses credibility as a first-principles theory for predicting the phenomena. Molecular simulation, on the other hand, can provide the most detailed molecular view of the process. However, a “let-it-do” type of simulation does not work for the problem at all, because the MR process is usually a slow and rare event. A common strategy adopted by the simulation community to overcome this difficulty is non-Boltzmann sampling, which defines a “reaction coordinate” or an “order parameter” onto which all other degrees of freedom are projected. The best example is “umbrella” sampling to obtain the potential of mean force, or the free energy, along the conduction path of an ion in an ion channel [7]. The method is quite powerful for sampling the configuration space around an order parameter if the parameter is unique. Unfortunately, biochemical processes are not so simple that they can be described by a unique order parameter. It is therefore often the case that the results of the simulation depend on the choice of order parameter and on the “scheduling” of the sampling. The other methodology employed to accelerate the sampling is to apply an artificial external force to the system: for example, external pressure applied to water molecules in aquaporin [8]. Simulations of that kind should verify that the configurations of water satisfy the Boltzmann distribution; otherwise, the simulation is in danger of ending up as mere “science fiction.”

Recently, a new theoretical approach to the MR process has been launched, based on the three-dimensional reference interaction site model (3D-RISM) method, a statistical mechanics theory of liquids [9–11]. The 3D-RISM equation was derived from the molecular Ornstein–Zernike (MOZ) equation, the most fundamental equation describing the density pair correlation of molecular liquids [12, 13], for a solute–solvent system at infinite dilution by taking a statistical average over the orientation of solvent molecules. By solving the combined 3D-RISM and RISM equations, the latter providing the bulk solvent structure in terms of the site–site density pair correlation functions, one can obtain the “solvation structure,” or the solvent distribution around a solute.
The solvation structure so produced retains atomic-level information, because the theory starts from a Hamiltonian in which the atom–atom interactions among molecules are embedded, just as in a molecular simulation. The method also naturally produces all the solvation thermodynamics, including the energy, entropy, free energy, and their derivatives such as the partial molar volume and compressibility. Unlike molecular simulation, there is no need to be concerned about the size of the system or the “sampling” of the configuration space, because the method essentially treats an infinite number of molecules and integrates over the entire configuration space of the solvent. The power of the 3D-RISM theory has been fully demonstrated for the solvation structure and thermodynamics of proteins. The partial molar volumes of proteins in aqueous solution calculated by Imai, Kovalenko, and Hirata showed quantitative agreement with the corresponding experimental results [14]. These were the first quantitative results for the thermodynamics of proteins obtained entirely from statistical mechanics theory. This was an accomplishment in itself, in the sense that it gave great confidence in using 3D-RISM to explore the stability of proteins in solution. However, it was only a prelude to a discovery with an even bigger impact. When we were analyzing the 3D-distribution of water around hen egg-white lysozyme, we found conspicuous peaks inside small cavities in the protein, which no doubt reveal water molecules trapped inside the macromolecule [15]. In fact, the number of water molecules and their positions inside the cavity coincide with those found by X-ray crystallography. This implies that 3D-RISM is capable of “detecting” the molecules “recognized” by the protein, or the host molecule. This is nothing but the realization of “molecular recognition.” In this chapter, we review our recent studies on molecular recognition by proteins based on the RISM and 3D-RISM theories, which have been carried out as part of the Scientific Research in Priority Areas “Water and Biomolecules” during the last 5 years.

10.2 Outline of the RISM and 3D-RISM Theories

Let us begin the section by asking the reader the following questions. “What is the structure of a liquid?” “How can the structure of a liquid be characterized?” These questions are nontrivial because, unlike individual molecules and crystals, the liquid state does not form a structure of definite shape. One can readily define the structure of a molecule by giving the bond lengths, bond angles, and dihedral angles, even for a molecule as complex as a protein. The crystalline structure of a solid can also be defined unambiguously by giving the lattice constants. However, molecules in liquids are in continuous diffusive motion, and therefore a definite geometry among the molecules cannot be defined. In such a case, we can only use statistical or probabilistic language. The probabilistic language for characterizing the structure of liquids is that of the distribution functions, which are nothing but the moments of the density field $\nu(\mathbf{r}) = \sum_i \delta(\mathbf{r} - \mathbf{r}_i)$ with respect to the Boltzmann weight. If no field is applied to the system, the first moment, or the average density, is constant everywhere in the system, namely, $\rho(\mathbf{r}) \equiv \langle \nu(\mathbf{r}) \rangle = \rho = N/V$, where $V$ and $N$ are the volume of the container and the number of molecules in the system, respectively, and $\langle \cdots \rangle$ indicates the thermal average. So, the average density does not convey any information about the structure of the liquid. However, the second moment, $\rho(\mathbf{r}, \mathbf{r}') = \langle \nu(\mathbf{r}) \nu(\mathbf{r}') \rangle$, does carry the structural information of liquids. This quantity is referred to as the density pair distribution function, which has essentially the same physical meaning as the radial distribution function (RDF) obtained from X-ray diffraction measurements.

The density pair distribution function $\rho(\mathbf{r}, \mathbf{r}')$ is proportional to the probability density of finding two molecules at the two positions $\mathbf{r}$ and $\mathbf{r}'$ at the same time, and becomes just a product of the average densities when the distance between the two positions becomes so large that there is no “correlation” between the densities at the two positions:

$$\lim_{|\mathbf{r} - \mathbf{r}'| \to \infty} \rho(\mathbf{r}, \mathbf{r}') = \rho(\mathbf{r}) \rho(\mathbf{r}') = \rho^2 \quad \text{in uniform liquids}. \tag{10.1}$$
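As a rough illustration of the limit in (10.1), the following sketch estimates a radial distribution function by histogramming pair distances and dividing by the count expected for uncorrelated particles. The function name, the cubic periodic box, and the random test configuration are assumptions made for this example only; they are not taken from the chapter. For uncorrelated points the normalized histogram should fluctuate around 1, which is the uniform-liquid limit.

```python
import numpy as np

def radial_distribution(positions, box_length, n_bins=50):
    """Estimate g(r) for points in a cubic periodic box (minimum-image convention)."""
    n = len(positions)
    r_max = box_length / 2.0
    # All pair separation vectors, wrapped to the nearest periodic image.
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    # Keep each pair once (upper triangle, no self-pairs).
    dist = np.linalg.norm(delta, axis=-1)[np.triu_indices(n, k=1)]
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # Number of pairs expected in each shell if positions were uncorrelated.
    ideal_pairs = 0.5 * n * (n - 1) / box_length ** 3 * shell_volume
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    return r_mid, counts / ideal_pairs

# Hypothetical test configuration: uncorrelated ("ideal gas") points.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(400, 3))
r, g = radial_distribution(pos, box_length=10.0)
```

For a real liquid configuration the same normalization would reveal the short-range peaks and the decay to 1 at large separation.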

The quantity $g(\mathbf{r}, \mathbf{r}') = \rho(\mathbf{r}, \mathbf{r}')/\rho^2$ represents a “correlation” between the densities at the two positions $\mathbf{r}$ and $\mathbf{r}'$. It is therefore referred to as the “pair correlation function” (PCF), or as the RDF when the liquid density is uniform and translational invariance is implied. We further define a function called the “total correlation function” by $h(\mathbf{r}, \mathbf{r}') = g(\mathbf{r}, \mathbf{r}') - 1$, which represents the correlation of the density “fluctuations” at the two positions $\mathbf{r}$ and $\mathbf{r}'$,

$$h(\mathbf{r}, \mathbf{r}') = \langle \delta\nu(\mathbf{r}) \, \delta\nu(\mathbf{r}') \rangle / \rho^2, \tag{10.2}$$

where $\delta\nu(\mathbf{r})$ $(= \nu(\mathbf{r}) - \rho)$ denotes the density fluctuation. The main task of liquid state theory is to find an equation that governs the function $h(\mathbf{r}, \mathbf{r}')$ or $g(\mathbf{r}, \mathbf{r}')$ based on statistical mechanics, and to solve that equation. As briefly described in the Introduction, an “exact” equation referred to as the Ornstein–Zernike equation, which relates $h(\mathbf{r}, \mathbf{r}')$ to another correlation function called the direct correlation function $c(\mathbf{r}, \mathbf{r}')$, can be “derived” from the grand canonical partition function by means of functional derivatives. Our theory of molecular recognition starts from the Ornstein–Zernike equation generalized to a solution of polyatomic molecules, or the molecular Ornstein–Zernike (MOZ) equation [12],

$$\mathbf{h}(1, 2) = \mathbf{c}(1, 2) + \int \mathbf{c}(1, 3) \, \boldsymbol{\rho} \, \mathbf{h}(3, 2) \, d3, \tag{10.3}$$

where $\mathbf{h}(1, 2)$ and $\mathbf{c}(1, 2)$ are the total and direct correlation functions, respectively, and the numbers in parentheses represent the coordinates of molecules in the liquid system, including both the position $\mathbf{R}$ and the orientation $\Omega$. Here $d3 = \Omega^{-1} \, d\mathbf{R}_3 \, d\Omega_3$, where $\Omega$ is the unweighted integral over the angular coordinates. The boldface letters of the correlation functions indicate that they are matrices consisting of the elements labeled by the species in the solution. In the simple case of a binary mixture, the equation can be written down, labeling the solute by “u” and the solvent by “v,” as follows. (It is straightforward to generalize the equations to multicomponent mixtures.)

$$h_{vv}(1, 2) = c_{vv}(1, 2) + \int c_{vv}(1, 3) \, \rho_v \, h_{vv}(3, 2) \, d3 + \int c_{vu}(1, 3) \, \rho_u \, h_{uv}(3, 2) \, d3, \tag{10.4}$$

$$h_{uv}(1, 2) = c_{uv}(1, 2) + \int c_{uv}(1, 3) \, \rho_v \, h_{vv}(3, 2) \, d3 + \int c_{uu}(1, 3) \, \rho_u \, h_{uv}(3, 2) \, d3, \tag{10.5}$$

$$h_{uu}(1, 2) = c_{uu}(1, 2) + \int c_{uv}(1, 3) \, \rho_v \, h_{vu}(3, 2) \, d3 + \int c_{uu}(1, 3) \, \rho_u \, h_{uu}(3, 2) \, d3. \tag{10.6}$$

By taking the limit of infinite dilution ($\rho_u \to 0$), one gets

$$h_{vv}(1, 2) = c_{vv}(1, 2) + \int c_{vv}(1, 3) \, \rho_v \, h_{vv}(3, 2) \, d3, \tag{10.7}$$

$$h_{uv}(1, 2) = c_{uv}(1, 2) + \int c_{uv}(1, 3) \, \rho_v \, h_{vv}(3, 2) \, d3. \tag{10.8}$$
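The step from (10.4) and (10.5) to (10.7) and (10.8) can be spelled out as follows. This is only a restatement of the limit, written with the solute and solvent labels as subscripts: every integral that carries the prefactor $\rho_u$ vanishes as $\rho_u \to 0$.

```latex
\begin{align*}
\int c_{vu}(1,3)\,\rho_u\,h_{uv}(3,2)\,d3 &\;\xrightarrow{\;\rho_u \to 0\;}\; 0
  && \text{(last term of (10.4))},\\
\int c_{uu}(1,3)\,\rho_u\,h_{uv}(3,2)\,d3 &\;\xrightarrow{\;\rho_u \to 0\;}\; 0
  && \text{(last term of (10.5))}.
\end{align*}
```

Equation (10.7) therefore no longer contains any solute quantity and describes the pure solvent, while (10.8) determines the solute–solvent correlation with the solvent–solvent function $h_{vv}$ as input. The solute–solute equation (10.6) is not needed for a single solute molecule at infinite dilution.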

The equations depend essentially on six coordinates in Cartesian space, and each includes a sixfold integral. It is this integral that prevents the theory from being applied to polyatomic molecules. It was the interaction-site model and the RISM approximation proposed by Chandler and Andersen [16] that made it possible to solve the equations. The idea behind the model is to project the functions onto the one-dimensional space along the distance between the interaction sites, usually placed at the centers of atoms, by taking the statistical average over the angular coordinates of the molecules while fixing the separation between a pair of interaction sites,

$$f_{\alpha\gamma}(\mathbf{r}) = \int \delta(\mathbf{R}_1 + \mathbf{l}_1^{\alpha}) \, \delta(\mathbf{R}_2 + \mathbf{l}_2^{\gamma} - \mathbf{r}) \, f(1, 2) \, d1 \, d2, \tag{10.9}$$

where $\mathbf{l}_i^{\alpha}$ is the vector displacement of site $\alpha$ in molecule $i$ from the molecular center $\mathbf{R}_i$. It follows that $\mathbf{R}_i + \mathbf{l}_i^{\alpha} = \mathbf{r}_i^{\alpha}$ denotes the position of site $\alpha$ in molecule $i$. The angular average of the second terms in (10.7) and (10.8) is formidable, but the approximation

$$c(1, 2) \approx \sum_{\alpha, \gamma} c_{\alpha\gamma}(|\mathbf{r}_1^{\alpha} - \mathbf{r}_2^{\gamma}|) \tag{10.10}$$

allows one to perform the angular average, leading to the RISM equation

$$\boldsymbol{\rho} \mathbf{h} \boldsymbol{\rho} = \boldsymbol{\omega} * \mathbf{c} * \boldsymbol{\omega} + \boldsymbol{\omega} * \mathbf{c} * \boldsymbol{\rho} \mathbf{h} \boldsymbol{\rho}, \tag{10.11}$$

where the asterisk denotes the convolution integral

$$f * g(\mathbf{r}) = \int f(\mathbf{r}') \, g(|\mathbf{r}' - \mathbf{r}|) \, d\mathbf{r}'. \tag{10.12}$$
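Because a convolution of the form (10.12) becomes an ordinary product under Fourier transformation, (10.11) turns into a small matrix equation at each wavenumber $k$: writing $X = \rho h \rho$, one has $X = \omega c \omega + \omega c X$, so that $(1 - \omega c) X = \omega c \omega$. The sketch below solves this linear system at a single $k$. It is only an illustration of the algebra; the function name and the numerical values of the matrices are made up for the example, and in a real calculation $c$ is itself unknown and must be iterated together with a closure relation.

```python
import numpy as np

def solve_rism_at_k(omega_k, c_k):
    """Solve X = w c w + w c X for X = rho h rho at one wavenumber.

    omega_k, c_k: (n_sites, n_sites) matrices holding the Fourier-space
    intramolecular and direct correlation functions at this k.
    """
    n = omega_k.shape[0]
    wc = omega_k @ c_k
    # (1 - w c) X = w c w, solved without forming an explicit inverse.
    return np.linalg.solve(np.eye(n) - wc, wc @ omega_k)

# Hypothetical three-site example with made-up numbers.
omega_k = np.array([[1.0, 0.8, 0.8],
                    [0.8, 1.0, 0.6],
                    [0.8, 0.6, 1.0]])
c_k = -0.05 * np.ones((3, 3))
x_k = solve_rism_at_k(omega_k, c_k)

# Substituting back into the k-space form of (10.11) should leave no residual.
residual = x_k - (omega_k @ c_k @ omega_k + omega_k @ c_k @ x_k)
```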

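The new function $\boldsymbol{\omega}$ appearing in the derivation of (10.11) is called the “intramolecular” correlation function, which is defined for a pair of atoms $\alpha$ and $\gamma$ in a molecule by

$$\omega_{\alpha\gamma}(r) = \rho \, \delta_{\alpha\gamma} \, \delta(r) + \rho \, (1 - \delta_{\alpha\gamma}) \, \delta(r - l_{\alpha\gamma}), \tag{10.13}$$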

in which $\delta_{\alpha\gamma}$ and $\delta(r)$ are the Kronecker and Dirac delta functions, respectively. Through the Dirac delta function, the term $\delta(r - l_{\alpha\gamma})$ imposes a distance constraint $l_{\alpha\gamma}$ between the pair of atoms. Thus, in the RISM theory, imposing the distance constraints on all pairs of atoms in the molecule defines the molecular geometry in terms of trigonometry, similar to the z-matrix in computational chemistry. The 3D-RISM equation for the solute–solvent system at infinite dilution can be derived from (10.8) by taking the statistical average over the angular coordinates of the “solvent,” but not over those of the “solute” [10, 11, 17]. The equation reads

$$h_\gamma(\mathbf{r}) = \sum_{\gamma'} \int c_{\gamma'}(\mathbf{r}') \left[ \omega^{vv}_{\gamma'\gamma}(|\mathbf{r}' - \mathbf{r}|) + \rho \, h^{vv}_{\gamma'\gamma}(|\mathbf{r}' - \mathbf{r}|) \right] d\mathbf{r}', \tag{10.14}$$

where $h_\gamma(\mathbf{r})$ and $c_\gamma(\mathbf{r})$ are, respectively, the total and direct correlation functions of solvent site $\gamma$ at position $\mathbf{r}$ in a Cartesian coordinate system whose origin is placed at an arbitrary position, generally inside the protein. The functions $\omega^{vv}_{\gamma'\gamma}(r)$ and $h^{vv}_{\gamma'\gamma}(r)$ are the correlation functions for solvent molecules that appear in (10.11). It is these equations that can be applied to the molecular recognition process. If one views the solute molecule as a “source of external force” exerted on solvent molecules, then $\rho g_\gamma(\mathbf{r})$ $(= \rho h_\gamma(\mathbf{r}) + \rho)$ is identified as the density distribution of solvent molecules in the “external force.” This identification, called the “Percus trick,” is the key concept for realizing the molecular recognition process by means of statistical mechanics. The equations described earlier contain two unknown functions, $h(\mathbf{r})$ and $c(\mathbf{r})$. Therefore, they are not closed without another equation that relates the two functions. Several approximations have been proposed for the closure relations: the hypernetted-chain (HNC), Percus–Yevick (PY), and mean spherical approximation (MSA) closures, among others [12]. The HNC closure can be obtained from the diagrammatic expansion of the pair correlation functions in terms of density by discarding a set of diagrams called “bridge diagrams,” which involve multifold integrals. It should be noted that the terms kept in the HNC closure relation still include those up to infinite order in the density. Alternatively, the relation has been derived from the linear response of a free energy functional to the density fluctuation created by a molecule fixed in space, within the Percus trick. The HNC closure relation reads

$$h(\mathbf{r}) = \exp\left( -u(\mathbf{r})/k_B T + h(\mathbf{r}) - c(\mathbf{r}) \right) - 1, \tag{10.15}$$

where $k_B$ and $T$ are the Boltzmann constant and temperature, respectively, and $u(\mathbf{r})$ is the interaction potential between a pair of atoms in the system. Equation (10.15) is the relation that incorporates the physical and chemical characteristics of the system into the theory through $u(\mathbf{r})$. The PY approximation can be obtained from the HNC relation just by linearizing the factor $\exp(h(\mathbf{r}) - c(\mathbf{r}))$. The HNC closure has been quite successful in describing the structure and thermodynamics of liquids and solutions, including water. However, the approximation is notoriously problematic in the low-density regime. The drawback sometimes becomes fatal when one tries to apply the theory to associating liquid mixtures or solutions, especially at dilute concentration, because a solution of “dilute” concentration is equivalent to a “low-density” liquid for the minor component. To get rid of the problem, Kovalenko and Hirata proposed the following approximation, the KH closure [18],

$$g(\mathbf{r}) = \begin{cases} \exp(d(\mathbf{r})) & \text{for } d(\mathbf{r}) \le 0, \\ 1 + d(\mathbf{r}) & \text{for } d(\mathbf{r}) > 0, \end{cases} \tag{10.16}$$

where $d(\mathbf{r}) = -u(\mathbf{r})/k_B T + h(\mathbf{r}) - c(\mathbf{r})$. The approximation turns out to be quite successful even for mixtures of complex liquids.
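The three closures discussed above differ only in how $g$ is built from the exponent $d = -u/k_B T + h - c$, which a few lines of code make explicit. The sketch below is illustrative only; the function names are invented for the example and it operates on plain arrays rather than on a full RISM iteration. It uses $g = h + 1$, so the HNC form is $g = \exp(d)$, the PY form linearizes the factor $\exp(h - c)$, and the KH form switches from exponential to linear where $d > 0$.

```python
import numpy as np

def closure_exponent(u_over_kT, h, c):
    """d(r) = -u(r)/kBT + h(r) - c(r)."""
    return -np.asarray(u_over_kT, dtype=float) + h - c

def g_hnc(d):
    """HNC: g = exp(d)."""
    return np.exp(d)

def g_kh(d):
    """KH: exponential where d <= 0, linearized where d > 0."""
    d = np.asarray(d, dtype=float)
    # np.minimum keeps exp() from overflowing on the branch that is discarded.
    return np.where(d <= 0.0, np.exp(np.minimum(d, 0.0)), 1.0 + d)

def g_py(u_over_kT, h, c):
    """PY: only the factor exp(h - c) is linearized."""
    return np.exp(-np.asarray(u_over_kT, dtype=float)) * (1.0 + h - c)

d = np.array([-2.0, 0.0, 3.0])
g_from_kh = g_kh(d)
g_from_hnc = g_hnc(d)
```

The two branches of the KH form join continuously at $d = 0$, and in regions of strong attraction ($d > 0$) the KH value grows only linearly, whereas the HNC value grows exponentially.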

The procedure for solving the equations consists of two steps. We first solve the RISM equation (10.11) for $h^{vv}_{\gamma'\gamma}(r)$ of the solvent, or of a mixture of solvents in the case of solutions. Then, we solve the 3D-RISM equation (10.14) for $h_\gamma(\mathbf{r})$ of a protein–solvent (solution) system, inserting into (10.14) the $h^{vv}_{\gamma'\gamma}(r)$ calculated in the first step. Given the definition $g(\mathbf{r}) = h(\mathbf{r}) + 1$, the $g(\mathbf{r})$ thus obtained is the three-dimensional distribution of solvent molecules around a protein, expressed in the interaction-site density representation of the solvent (or of the mixture of solvents in the case of solutions). The so-called solvation free energy can be obtained from the distribution functions through the following equations [18, 19], corresponding, respectively, to the two closure relations described earlier, (10.15) and (10.16):

$$\Delta\mu_{\mathrm{HNC}} = \rho k_B T \sum_\gamma \int d\mathbf{r} \left[ \frac{1}{2} h_\gamma(\mathbf{r})^2 - c_\gamma(\mathbf{r}) - \frac{1}{2} h_\gamma(\mathbf{r}) c_\gamma(\mathbf{r}) \right], \tag{10.17}$$

$$\Delta\mu_{\mathrm{KH}} = \rho k_B T \sum_\gamma \int d\mathbf{r} \left[ \frac{1}{2} h_\gamma(\mathbf{r})^2 \, \Theta(-h_\gamma(\mathbf{r})) - c_\gamma(\mathbf{r}) - \frac{1}{2} h_\gamma(\mathbf{r}) c_\gamma(\mathbf{r}) \right], \tag{10.18}$$

where $\Theta(x)$ is the Heaviside step function. The other thermodynamic quantities concerning solvation can be readily obtained from the standard thermodynamic derivatives of the free energy, except for the partial molar volume. The partial molar volume, which is a very important quantity for probing the response of the free energy (or stability) of a protein to pressure, including the so-called pressure denaturation, is not a “canonical” thermodynamic quantity for the $(V, T)$ ensemble, since volume is an independent thermodynamic variable of that ensemble. The partial molar volume of a protein at infinite dilution can be calculated from the Kirkwood–Buff equation [20] generalized to the site–site representation of liquids and solutions [21, 22],

$$\bar{V} = k_B T \chi_T \left( 1 - \rho \sum_\gamma \int c_\gamma(\mathbf{r}) \, d\mathbf{r} \right), \tag{10.19}$$

where $\chi_T$ is the isothermal compressibility of the pure solvent or solution, which is obtained from the site–site correlation functions of the solution. In the following, we present an application of the theory described earlier to demonstrate its robustness. The example is the partial molar volume of a protein, which can be calculated using (10.19) from $h(\mathbf{r})$, or equivalently from $c(\mathbf{r})$, obtained from the 3D-RISM equation. The partial molar volumes of several proteins in water that appear frequently in the protein research literature are plotted against molecular weight in Fig. 10.1 [23]. By comparing the results with the experimental ones plotted in the same figure, one can readily see that the theory is capable of reproducing the experimental results at a quantitative level. At a glance, it may seem that the results could be reproduced by a simple consideration of protein geometry, using commercial software to calculate the exclusion volume of the protein. However, that is not the case. The reason is that the partial molar volume is a “thermodynamic quantity,” not a “geometrical volume.” The partial molar volume reflects all the solvent–solvent and solute–solvent interactions as well as all the configurations of water molecules in the system, while the geometrical volume accounts only for the simplified (hard-core type) repulsive interaction between the solute and solvent. Other factors, such as attractive interactions between solute and solvent and the solvent reorganization, are entirely neglected in the geometrical volume. The contributions from the solvent reorganization are of particular importance for the partial molar volume of a protein, because they concern the so-called “cavity” volume in the protein. As is well known, a protein has many internal cavities, which may or may not accommodate water molecules. Let us carry out a simple “thought experiment” on the partial molar volume of a protein. The experiment is to dissolve a protein in water. Upon dissolution, some of the cavities in the protein may be filled by water molecules, but others may not. If a cavity stays empty, the empty space contributes to an increase in the partial molar volume of the protein. On the other hand, if the space is filled by water molecules owing to the reorganization of the solvent, it reduces the total volume of the solution and compensates for the increase due to the cavity volume. This compensation is nontrivial: if a cavity can accommodate one water molecule, the volume is reduced by 18 cm$^3$ mol$^{-1}$. In this regard, unless a theory is able to describe the reorganization of water molecules induced by a protein, it is of no use for predicting the partial molar volume. The nearly quantitative results shown in the figure demonstrate that the theory properly accounts for all the solute–solvent and solvent–solvent interactions as well as the solvent reorganization induced by the protein, including the accommodation of some water molecules in the internal cavities. In the following sections, we will demonstrate how the 3D-RISM theory is capable of describing molecular recognition processes.

Fig. 10.1. Partial molar volume of proteins plotted against the molecular weight. The theoretical results (black circles) show quantitative agreement with the experimental ones (crosses)
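Once $h_\gamma(\mathbf{r})$ and $c_\gamma(\mathbf{r})$ are tabulated on a three-dimensional grid, the integrals in (10.17), (10.18), and (10.19) reduce to sums over grid points. The following sketch shows that bookkeeping under simplifying assumptions that are not from the chapter: a uniform grid with cell volume `dV`, the same number density `rho` for every solvent site, and hypothetical function names. The units of the result follow from the units chosen for `kT`, `chi_T`, `rho`, and `dV`.

```python
import numpy as np

def solvation_free_energy(h, c, rho, kT, dV, closure="KH"):
    """Grid version of (10.17) (closure="HNC") or (10.18) (closure="KH").

    h, c: arrays of shape (n_sites, nx, ny, nz) with the site correlation functions.
    """
    h_squared = h ** 2
    if closure == "KH":
        # Heaviside factor Theta(-h): keep the h^2 term only where h < 0.
        h_squared = h_squared * (h < 0.0)
    elif closure != "HNC":
        raise ValueError("closure must be 'KH' or 'HNC'")
    integrand = 0.5 * h_squared - c - 0.5 * h * c
    return rho * kT * integrand.sum() * dV

def partial_molar_volume(c, rho, kT, chi_T, dV):
    """Grid version of (10.19): kT * chi_T * (1 - rho * sum over sites of integral of c)."""
    return kT * chi_T * (1.0 - rho * c.sum() * dV)

# Made-up arrays, used only to exercise the formulas.
h_demo = np.zeros((1, 4, 4, 4))
c_demo = -np.ones((1, 4, 4, 4))
mu_demo = solvation_free_energy(h_demo, c_demo, rho=0.1, kT=2.0, dV=0.5)
v_demo = partial_molar_volume(c_demo, rho=0.1, kT=2.0, chi_T=0.5, dV=0.5)
```

The only difference between the two free energy expressions is the step-function factor, so regions of solvent enrichment ($h > 0$) contribute no $h^2$ term in the KH form.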

10.3 Recognition of Water Molecules by Protein

It is not necessary to emphasize how important water is for living systems to maintain their life [24–26]. No wonder that many scientists in the field of X-ray and neutron diffraction measurement have been trying to determine the positions and orientations of water molecules around and inside biomolecules such as proteins and DNA [27, 28]. However, even with modern experimental technology it is not easy to locate the positions of water molecules, partly because of the limited resolution of diffraction measurements in space as well as in time. This is because water molecules at the surface of a protein are not necessarily bound firmly to a particular site of the biomolecule, but exchange their positions quite frequently. In fact, this flexibility and fluctuation of water molecules is essential for living systems to control their life. Diffraction measurements can identify only those water molecules that have a long residence time at a particular position in the biomolecule. In this study [15, 29], we carried out the 3D-RISM calculation for hen egg-white lysozyme immersed in water and obtained the 3D-distribution functions of the oxygen and hydrogen of water molecules around and inside the protein. The native 3D structure of the protein was taken from the Protein Data Bank (PDB). The protein is known to have a cavity composed of the residues from Y53 to I58 and from A82 to S91, in which four water molecules have been determined by X-ray diffraction measurement [30]. In our calculation, those water molecules are not included explicitly. In Fig. 10.2, the green surfaces or spots are an isosurface representation of $g(\mathbf{r})$ of water oxygen, which is very similar to the electron density map obtained from X-ray crystallography. We have drawn the regions where $g(\mathbf{r})$ is greater than a threshold value: the left, center, and right figures correspond, respectively, to $g(\mathbf{r}) > 2$, $g(\mathbf{r}) > 4$, and $g(\mathbf{r}) > 8$.
Since $g(\mathbf{r})$ is unity in the bulk, the left figure indicates that the probability of finding those water molecules at the surface is more than twice that in bulk water. Likewise, the water molecules depicted in the right figure are more than eight times as likely to be found at those spots as in the bulk. These water molecules are the ones bound firmly to particular atoms of the protein through, say, hydrogen bonds, and they are quite rare, as one can see from the figure. In this sense, the threshold values play the role of “temperature” in X-ray diffraction measurements: if you lower the temperature, you can observe more water molecules that have weaker interactions with the protein. The results suggest that the X-ray and neutron diffraction communities have acquired a powerful theoretical tool for analyzing their data to locate the positions and orientations of water molecules, since our theory also provides the distribution of the hydrogen sites of water molecules. The results depicted in Fig. 10.2 are what we expected before we actually carried out the calculation, although they were entirely new in the history of statistical mechanics. What was entirely unexpected was that we observed some peaks of the water distribution in a cavity “inside” the protein, which is surrounded by the residues from Y53 to I58 and from A82 to S91. The results are shown in Fig. 10.3. The left picture in Fig. 10.3 shows the isosurfaces of $g(\mathbf{r}) > 8$ for water oxygen (green) and hydrogen (pink) in the cavity. In the figure, only the surrounding residues are displayed, except for A82 and L83, which are located on the front side. There are four distinct peaks of water oxygen and seven distinct peaks of water hydrogen in the cavity. The green and pink spots indicate water oxygen and hydrogen, respectively. From the isosurface plot, we have reconstructed the most probable model of the hydration structure. It is shown in the center of Fig. 10.3, where the four water molecules are numbered in order from the left. Water 1 is hydrogen-bonded to the main-chain oxygen of Y53 and the main-chain nitrogen of L56. Water 2 forms hydrogen bonds with the main-chain nitrogen of I56 and the main-chain oxygen of L83, which is not drawn in the figure. Water 3 and 4 also form hydrogen bonds with protein sites, the former to the main-chain oxygen of S85 and the latter to the main-chain oxygens of A82 (not displayed) and of D87. There is also a hydrogen bond network among Water 2, 3, and 4.
The peak of the hydrogen between Water 3 and 4 does not appear in the figure because it is slightly less than 8, which means that the hydrogen bond is weaker or looser than the other hydrogen-bonding interactions. Although the hydroxyl group of S91 is located in the center of the four water molecules, it makes only weak interactions with them. It is interesting to compare the hydration structure obtained by the 3D-RISM theory with the crystallographic water sites of the X-ray structure [30]. The crystallographic water molecules in the cavity are depicted on the right of Fig. 10.3, showing four water sites in the cavity, just as the 3D-RISM theory has detected. Moreover, the water distributions obtained from the theory and experiment are quite similar to each other. Thus the 3D-RISM theory can predict the water-binding sites with great success. It should be noted that one peak of the 3D-distribution function does not necessarily correspond to one molecule. If a water molecule transfers back and forth between two sites in the equilibrium state, two peaks correspondingly appear in the 3D-distribution function. In fact, the number of water molecules within the cavity calculated from the 3D-distribution function is 3.6. It is less than the number of water-binding sites and is not an integer. To explain this, we carried out a molecular dynamics (MD) simulation using the same parameters and under the same thermodynamic conditions as for the 3D-RISM calculation. The only exception was that the four crystallographic water molecules in the cavity, as well as the other crystallographic water molecules, were initially placed at their own sites in the MD simulation. The MD simulation also gives a hydration number of less than 4, namely 3.5 [29]. From the MD trajectory, it is found that the two inner water molecules, Water 1 and 2, stay at their own sites throughout the simulation and make only small fluctuations around the sites. On the other hand, the two outer water molecules, Water 3 and 4, sometimes enter and leave the sites, and occasionally exchange with other water molecules from the bulk phase. As a result, the number of water molecules at the outer sites is 1.5 on average. The 3D-RISM theory provides a reasonable hydration number, including the fractional part, through statistical-mechanical relations, even though the theory takes no explicit account of the dynamics of molecules.

Fig. 10.2. Isosurface representation of the 3D distribution function g(r) of water oxygen around lysozyme calculated by the 3D-RISM theory. Green surfaces or spots show the area where the distribution function is larger than 2 (left), 4 (center), and 8 (right)

Fig. 10.3. Water molecules in a cavity of lysozyme. Only the surrounding residues are displayed. The isosurfaces of water oxygen (green) and hydrogen (pink) for the 3D distributions larger than 8 (left), the most probable model of the hydration structure reconstructed from the isosurface plots (center), and the crystallographic water sites (right)

Fig. 10.4. Xenon bound by lysozyme: protein surface, blue; xenon, yellow; water oxygen, red; water hydrogen, white. The right and left panels magnify the substrate binding site and the internal site, respectively. The X-ray xenon sites are painted as orange spheres
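A fractional hydration number such as the 3.6 quoted for the lysozyme cavity arises naturally when a distribution function is integrated over a region of space. The chapter does not spell out the integration procedure, so the sketch below only illustrates the generic relation $N = \rho \int_{\text{region}} g(\mathbf{r}) \, d\mathbf{r}$ on a uniform grid, together with the thresholding used for the isosurface plots. The function names, the spherical test region, and the grid are assumptions of the example, and the bulk density is the approximate number density of liquid water.

```python
import numpy as np

# Approximate bulk number density of liquid water, molecules per cubic angstrom.
RHO_WATER = 0.0334

def hydration_number(g, region, rho_bulk, dV):
    """N = rho * integral of g(r) over the grid cells selected by the boolean mask `region`."""
    return rho_bulk * g[region].sum() * dV

def peak_mask(g, threshold):
    """Grid cells where g exceeds a threshold (as in the g > 2, 4, 8 isosurfaces)."""
    return g > threshold

# Hypothetical cell-centered grid: 20 x 20 x 20 cells of 0.5 angstrom.
spacing = 0.5
axis = np.arange(-5.0, 5.0, spacing) + 0.5 * spacing
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
region = x ** 2 + y ** 2 + z ** 2 < 3.0 ** 2   # sphere of radius 3 angstrom

# With g = 1 everywhere the result is simply the bulk density times the region volume.
g_bulk = np.ones_like(x)
n_bulk = hydration_number(g_bulk, region, RHO_WATER, spacing ** 3)
```

Because $g(\mathbf{r})$ is an equilibrium average, the integral counts sites that are occupied only part of the time with the corresponding fractional weight, which is why the result need not be an integer.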

10.4 Noble Gas Binding to Protein

Molecular recognition by protein, or ligand binding, is one of the most fundamental functions of proteins in biological processes. In addition to its scientific interest, prediction of ligand binding sites and affinities is the starting point for drug discovery [31, 32]. Therefore, a large number of computational methods as well as experimental approaches have been proposed [3, 4, 33]. The computational methodologies fall into two categories or stages. One is the prediction of ligand binding sites in a target protein. In the most common case, the binding sites are located by a purely geometric analysis of the protein structure, in which cavities or clefts in the protein are detected and regarded as potential binding sites [3]. The binding sites can also be predicted by bioinformatics from multiple alignment of the amino acid sequences in the protein family [33]. The other is docking of a ligand molecule at binding sites that are already known or predicted in advance. Possible docking structures are then evaluated based on a force field or a scoring function [4]. Although such docking programs are increasingly popular in the fields of bioscience and pharmacology [34], the theoretical methodologies are not fully developed. One of the least developed aspects is how to incorporate the effect of water into the binding affinity or free energy. Water participates in protein–ligand binding in the following two ways. Primarily, bulk water provides the reaction field acting on the binding. This effect includes the electrostatic screening and the hydrophobic interaction between protein and ligand molecules. Moreover, individual water molecules can act as integral molecular components of the complex [35–37]. In fact, water molecules are often found at the binding interface of protein–ligand complexes, mediating hydrogen bonds or simply filling void spaces. In spite of the evident significance of such water molecules, the effect of water is usually treated at the level of continuum solvent models [4], unless the interfacial water molecules are found in advance.


The methodology described in the previous section can be applied to this process with a slight modification, and provides a powerful theoretical tool for describing ligand binding by protein. The only modification needed is to change the solvent from pure water to an aqueous solution containing ligand molecules. In this section, we present the results for binding of noble gases [38], which are the simplest model of nonpolar ligands. Figure 10.4 shows the 3D distribution functions of xenon and water (oxygen and hydrogen) around lysozyme calculated by the 3D-RISM theory for lysozyme in a water–xenon mixture at a concentration of 0.001 M. The molecular surface of the protein is painted blue. The regions where g(r) > 8 are painted with different colors for different species: yellow, xenon; red, water oxygen; white, water hydrogen. Of course, the surface painted blue is also covered by water molecules weakly bound to the protein, which are not shown. A number of well-defined peaks, the yellow and red spots, are found for xenon and water oxygen at the surface of the protein, and they are separated from each other. The result demonstrates the capability of the 3D-RISM theory to predict “preferential binding” of ligands. The distributions of ligand and water are obtained simultaneously, so that a peak of either the ligand or water is found at each site, depending on the ratio of their affinities to the site. Indeed, Fig. 10.4 indicates that there are water-preferred and xenon-preferred sites on the protein surface. Similar results are obtained for the other gases and the other concentrations. It is interesting to compare the distribution of xenon obtained by the 3D-RISM theory with the xenon sites in the X-ray structure [39], even though the conditions are different: the former is an aqueous solution under atmospheric pressure, while the latter is a crystal under a xenon gas pressure of 12 bar. There are two binding sites of xenon in lysozyme: one corresponds to the binding pocket of native ligands, which is referred to as the substrate binding site, and the other is located in a cavity inside the protein, which is referred to as the internal site [39]. The right panel of Fig. 10.4 compares the theoretical result for the 3D distribution of xenon with the X-ray xenon site at the substrate binding site. The location of the high and sharp peak found by the theory is in complete agreement with the X-ray xenon site. The left panel of Fig. 10.4 shows the result at the internal site. The xenon peak found there is actually a minor one; nevertheless, its location is again consistent with the X-ray site. It is interesting to note that the peaks of water are shifted away from the xenon binding site. Figure 10.5 shows the size dependence of the coordination number of noble gases at the two binding sites, calculated at a concentration of 0.001 M. At the substrate binding site, the coordination number becomes exponentially larger as the size of the gas increases (Fig. 10.5a). At the internal site, the coordination number becomes larger with increasing gas size up to σ ≈ 3.4 Å, while it decreases in the region where σ > 3.4 Å (Fig. 10.5b). As a result, argon has the largest binding affinity to the internal site. These results demonstrate that the 3D-RISM theory has the ability to describe ligand-size


Fig. 10.5. Coordination numbers of noble gases at the two binding sites, plotted against the atomic diameter of the gases. (a) substrate binding site. (b) internal site

selectivity in binding or molecular recognition. Although there are no corresponding experimental data, the present results serve as a representative test case. It is well known that the activity of a protein plotted against the logarithm of ligand concentration generally produces a sigmoidal curve, the so-called dose-response curve. Experimentalists use the sigmoidal dose-response curve to obtain the equilibrium constant of protein–ligand binding and the binding free energy. As in the experimental procedure, we can plot the coordination number of each noble gas against the logarithm of the gas concentration. In the present case, complete sigmoidal curves were not obtained (data not shown) because the affinities between the protein and noble gases are considerably weak. Nevertheless, it should be emphasized that a dose-response curve can be produced only if the method employed can treat a highly dilute mixture, because the typical equilibrium constant of ligand binding is on the order of μM. Ordinary molecular simulation would never cover such highly dilute conditions. In the 3D-RISM theory, the calculation can be done at an arbitrary concentration simply by setting the value of the component density ρ in the equation. We can then obtain the equilibrium constants and the binding free energies from the concentration dependence without calculating the free energy directly.
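As an illustration of how an equilibrium constant might be extracted from the concentration dependence of the coordination number, the sketch below assumes a single-site (Langmuir) binding model, N(c) = c / (K_d + c), and a 1 M standard state. Both are assumptions made for this example; the chapter does not state which model was fitted. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

R = 8.314462618e-3          # gas constant, kJ mol^-1 K^-1
AVOGADRO = 6.02214076e23    # mol^-1

def molar_to_number_density(c_molar):
    """Convert mol/L to particles per Å^3 (1 L = 1e27 Å^3)."""
    return c_molar * AVOGADRO / 1.0e27

def occupancy(c, k_d):
    """Single-site occupancy at ligand concentration c (same units as k_d)."""
    return c / (k_d + c)

def fit_kd(c, n):
    """Least-squares slope through the origin of (1/n - 1) against 1/c."""
    x = 1.0 / c
    y = 1.0 / n - 1.0        # equals K_d / c for the single-site model
    return np.sum(x * y) / np.sum(x * x)

# Synthetic "coordination numbers" generated with an assumed K_d of 5 mM.
k_d_true = 5.0e-3                       # mol/L, hypothetical
c = np.logspace(-5, -1, 9)              # mol/L
n = occupancy(c, k_d_true)

k_d_fit = fit_kd(c, n)
temperature = 298.15                    # K
delta_g = R * temperature * np.log(k_d_fit / 1.0)   # kJ/mol, 1 M standard state
rho_ligand = molar_to_number_density(1.0e-3)        # density for a 0.001 M component

print(k_d_fit, delta_g, rho_ligand)
```

The helper `molar_to_number_density` shows the kind of conversion implied by "setting the value of the component density ρ": a millimolar ligand corresponds to a number density of order 1e-7 per cubic ångström.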

10.5 Selective Ion-Binding by Protein

Ion binding is essential for a variety of physiological processes. The binding of calcium ions by certain proteins triggers processes that induce muscle contraction and enzymatic reactions [40, 41]. The initial process of the information


transmission through the ion channel is ion binding by the channel protein [42]. Ion binding sometimes plays an essential role in the folding process of a protein by inducing secondary structure [43]. Such processes are characterized by highly selective ion recognition by the proteins. It is therefore of great importance for life science to clarify the origin of the ion selectivity in molecular detail. In this section, we present theoretical results for ion binding by human lysozyme [44, 45], obtained through basically the same procedure as that described in the preceding section, but with the solution changed from a noble-gas solution to ionic solutions. We first prepare the correlation functions for the bulk solutions by solving (10.11), and then plug those functions into the 3D-RISM equation (10.14) to obtain the 3D distribution of ions along with water molecules. Special attention, however, should be paid to the treatment of the bulk solution as the reference state, because the ion–ion interactions in the solutions are Coulombic, and their contribution to the “dehydration penalty” should not be disregarded even at low concentration. To make sure that the free energy due to ion–ion interaction is reasonably accounted for, we have calculated the excess chemical potential, or the mean activity coefficient, of ions in solution. The results are given in Fig. 10.6. In general they show fair agreement with the experimental results. In particular, the theory discriminates the divalent ion from the monovalent ions quite well. The concentration dependence of the two monovalent ions, on the other hand, is not well resolved. This may be due to the potential parameters for the ions. However, it will not seriously influence the results for ion recognition by protein, because the process is determined primarily by the free energy difference of the same ion inside the protein and in bulk solution.
The 3D-RISM calculation was carried out for aqueous solutions of three different electrolytes, CaCl2, NaCl, and KCl, and for four variants of the protein (wild type, Q86D, A92D, and Q86D/A92D) that have been studied experimentally by Kuroki and Yutani [46].

Fig. 10.6. Mean activity coeﬃcient of aqueous solutions of NaCl, KCl, and CaCl2
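For readers who want to connect the excess chemical potential to the mean activity coefficient plotted in Fig. 10.6, the sketch below uses the standard definition for a salt that dissociates into ν+ cations and ν− anions, ln γ± = (ν+ ln γ+ + ν− ln γ−) / (ν+ + ν−), with each single-ion ln γ taken as the change in excess chemical potential relative to infinite dilution divided by RT. Conversions between concentration scales are ignored, the numerical inputs are hypothetical, and this is not claimed to be the exact procedure used to produce the figure.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def ln_gamma(mu_ex, mu_ex_dilute, temperature):
    """ln(activity coefficient) of one ion from excess chemical potentials in kJ/mol."""
    return (mu_ex - mu_ex_dilute) / (R * temperature)

def mean_activity_coefficient(ln_gamma_cation, ln_gamma_anion, nu_cation, nu_anion):
    """Stoichiometry-weighted geometric mean of the single-ion activity coefficients."""
    nu = nu_cation + nu_anion
    return math.exp((nu_cation * ln_gamma_cation + nu_anion * ln_gamma_anion) / nu)

# Hypothetical single-ion values for a 1:2 salt such as CaCl2 (not computed data).
ln_g_cation = -1.2
ln_g_anion = -0.3
gamma_pm = mean_activity_coefficient(ln_g_cation, ln_g_anion, nu_cation=1, nu_anion=2)
print(f"mean activity coefficient: {gamma_pm:.4f}")
```

Because the cation and anion terms are weighted by stoichiometry, a divalent salt such as CaCl2 and a monovalent salt such as NaCl can be compared on the same footing.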


Fig. 10.7. Selective ion binding by human lysozyme: upper left, wild type; upper middle, Q86D; upper right, A92D; lower left, Q86D/A92D. The lower middle picture shows the calcium binding site in the Q86D/A92D mutant detected by X-ray, while the picture in lower right exhibits the binding-site found by the 3D-RISM theory

In Fig. 10.7, the distributions of water molecules and the cations inside and around the cleft of interest, which consists of the amino acid residues from Q86 to A92, are shown. The area where the distribution function g(r) is greater than 5 is painted with a color for each species: oxygen of water, red; Na+ ion, yellow; Ca2+ ion, orange; K+ ion, purple. For the wild-type protein in the aqueous solutions of all the electrolytes studied, CaCl2, NaCl, and KCl, no areas of g(r) > 5 are observed for the ions inside the cleft, as seen in the upper left part of Fig. 10.7. The Q86D mutant exhibits essentially the same behavior as the wild type, but with a slightly changed water distribution. (There is a trace of a yellow spot, which indicates a slight possibility of finding a Na+ ion in the middle of the binding site, but it is too small to make a significant contribution to the distribution.) Instead, the distribution corresponding to water oxygen is observed, as shown in red in the figure. The distribution faithfully covers the region where the crystallographic water molecules have been detected, which are shown as gray spheres. There is one small difference between theory and experiment, which concerns the crystallographic water bound to the backbone of D91: the theory does not reproduce this water molecule, for reasons that have not been identified. Except for this difference, the observation is consistent with the experimental finding, in particular that the protein with the wild-type sequence binds neither Na+ nor Ca2+. The A92D mutant in the NaCl solution shows a conspicuous distribution of a Na+ ion bound at the recognition site, which is in accord with the


experiment (upper-right part of the figure). The Na+ ion is apparently bound to the carbonyl oxygen atoms of D92, and is distributed around those moieties. A water distribution is observed at the active site, but its shape is entirely changed from that in the wild type. The distribution indicates that the Na+ ion bound at the active site is not naked, but is accompanied by hydrating water molecules. The mutant does not show any indication of binding a K+ ion (results not shown). This suggests that the A92D mutant discriminates a Na+ ion from a K+ ion. The finding demonstrates the capability of the 3D-RISM theory to describe ion selectivity by protein. The lower panels show the distributions of Ca2+ ions and of water oxygen at the ion binding site of the holo-Q86D/A92D mutant. The mutant is known experimentally to be a calcium-binding protein. The protein, in fact, exhibits a strong calcium-binding activity, as is evident from the figure. The calcium ion is recognized by the carboxyl groups of the three aspartic acid residues, and is distributed around the oxygen atoms. The water distribution at the center of the triangle made by the three carbonyl oxygen atoms is reduced dramatically, which indicates that the Ca2+ ion is coordinated by the oxygen atoms directly, not with water molecules in between. The Ca2+ ion, however, is not entirely naked, because a persistent water distribution is observed at least at two positions where water molecules were originally located in the wild-type protein.

10.6 Pressure-Induced Structural Transition of Protein and Molecular Recognition

“Molecular recognition,” or specific hydration, in the internal cavity of a protein is of substantial importance for the stability and integrity of the protein structure itself. In this section, we present an example of such phenomena. Pressure denaturation of protein has been a focus of protein research owing not only to its scientific significance [47–49], but also to its importance in industrial applications, including food processing [50]. The molecular mechanism of the process has long remained unclear, especially concerning the role played by water or hydration. We have applied the RISM/3D-RISM theory to this problem to clarify the molecular mechanism behind the thermodynamic process [51]. The change in the equilibrium constant for the transition (N↔D) between the native (N) and denatured (D) states of a protein due to applied pressure can be described thermodynamically by

$$\left(\frac{\partial \ln K}{\partial p}\right)_T = -\frac{\Delta \bar{V}}{RT}, \tag{10.20}$$

where $\Delta \bar{V}$ denotes the partial molar volume (PMV) change associated with the transition from N to D. This equation indicates that the conformational


change induced by pressure should proceed in the direction of decreasing volume, which is nothing but “Le Chatelier’s law.” The experimental fact that a protein denatures entirely or partially under pressure indicates that $\Delta \bar{V}$ for the N to D transition should be negative. However, this simple law has never been verified in terms of molecular theories. The reason is that there was neither a molecular theory to describe the PMV nor data available for protein conformations at high pressure. As we have noted in the section outlining the theory, the RISM/3D-RISM theory is capable of describing the PMV of a protein at a quantitative level. Moreover, the structures of ubiquitin at high pressure (300 MPa) as well as at low pressure (3 MPa), shown in Fig. 10.8, have been obtained recently by the Akasaka group [52]. So, it was natural to attempt to calculate the PMV for the two structures, the high-pressure structure (HPS) and the low-pressure structure (LPS), of the protein by using the 3D-RISM theory. The data shown in Fig. 10.8 are the PMV change upon the structural transition and its decomposition into different contributions obtained by the 3D-RISM theory [53]. The decomposition is made by the following equation, which was first proposed by Chalikian and Breslauer [54] and later redefined theoretically by us [23, 55],

$$\bar{V} = V_W + V_V + V_T + V_I + k_B T \chi_T, \tag{10.21}$$

where $V_W$ is the van der Waals volume, $V_V$ is the volume of structural voids within the solvent-inaccessible core, $V_T$ is the thermal volume that results

Fig. 10.8. Changes in the structure and in the volume components associated with the pressure-induced structural transition of ubiquitin. Solid ribbon representation of low-pressure (3 MPa) and high-pressure (300 MPa) structures. The data shown are the total change in the partial molar volume ($\bar{V}$) and the changes in the van der Waals ($V_W$), void ($V_V$), thermal ($V_T$), and interaction ($V_I$) volumes


from thermally induced molecular fluctuations between the solute and solvent and is considered to be the average empty space around the solute due to imperfect packing of the solvent, $V_I$ is the change in the solvent volume induced by the intermolecular interaction between the solute and solvent, and the last term $k_B T \chi_T$ is the ideal contribution to the PMV from the translational degrees of freedom of the solute. The theoretical calculation indicates that the PMV of HPS is less than that of LPS, in accordance with Le Chatelier’s law, and that most of the volume reduction results from the void volume $V_V$. The question then is: what is the molecular mechanism by which pressure decreases the void volume? Is it simply caused by shrinking of internal cavities that contain no water molecules? The answer is “no.” (In such a case, unlike the present result, the thermal volume $V_T$ would be almost unchanged [23].) Consider the pictures in Fig. 10.9, which show the water distribution in the internal cavities of LPS (left) and HPS (right) of the protein. As indicated by the dashed circles, the water distribution inside the cavities is greatly enhanced in HPS compared with that in LPS. What happens is that part of the internal void space in LPS is filled with water molecules upon the structural change into HPS due to the pressure, which gives rise to the decrease in the void volume. The relation between the thermodynamics and the molecular process of pressure denaturation, clarified by the 3D-RISM theory, is as follows. At the low-pressure condition under which all the calculations have been carried out, HPS is not the equilibrium conformation but is one of the fluctuating structures.

Fig. 10.9. Isosurface representation of the 3D distribution function of water oxygen around the low-pressure (3 MPa) and high-pressure (300 MPa) structures of ubiquitin. The dark gray surfaces show the area where the distribution function is larger than 2. This is a top-view representation, in which the upper parts (the front parts in the ﬁgure) are clipped to bring the internal cavity (marked by dashed circle) into view


Applying pressure stabilizes the structure in ﬂuctuation at low pressure by reducing PMV through the enhanced contact with water molecules in the internal cavity. The equilibrium shifts toward HPS due to the reduced PMV.
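To make the size of this equilibrium shift concrete, (10.20) can be integrated under the simplifying assumption that the PMV change does not depend on pressure, giving K(p)/K(p0) = exp[−ΔV̄ (p − p0)/RT]. The sketch below evaluates this ratio. The value of ΔV̄ is a hypothetical placeholder, not the result reported in Fig. 10.8, and the function name is an assumption of this example.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def equilibrium_ratio(delta_v_cm3_mol, p_mpa, p0_mpa, temperature):
    """K(p)/K(p0) from integrating (10.20) with a pressure-independent PMV change.

    1 cm^3/mol times 1 MPa equals 1 J/mol, so no unit conversion factor is needed.
    """
    work = delta_v_cm3_mol * (p_mpa - p0_mpa)       # J/mol
    return math.exp(-work / (R * temperature))

delta_v = -20.0          # cm^3/mol, hypothetical PMV change for LPS -> HPS
ratio = equilibrium_ratio(delta_v, p_mpa=300.0, p0_mpa=3.0, temperature=298.15)
print(f"K(300 MPa) / K(3 MPa) = {ratio:.2f}")
```

A negative ΔV̄ gives a ratio greater than one, that is, the population shifts toward the more compactly hydrated structure as pressure rises, which is the behavior described in the text.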

10.7 Perspective

In this chapter, we have presented a new method to describe molecular recognition in biomolecules based on the statistical mechanics of molecular liquids, the RISM/3D-RISM theory. For phenomena for which thermodynamic and structural data are available, the theoretical results have exhibited at least qualitative agreement with experiment. A typical example is the positions of water molecules in a cavity of hen egg-white lysozyme, for which the theoretical and experimental results exhibited quantitative agreement. In other cases where there are no experimental data to compare with, the theory has demonstrated its predictive capability. The best example is the recognition of noble gases by lysozyme. Although no data are available for noble-gas binding by the protein, except for xenon, our theory reasonably accounts for the dependence of the binding affinity on the size of the noble-gas atoms, which shows an entirely different trend depending on the position and size of the cavities. We believe that the prediction will be confirmed sooner or later by X-ray and/or neutron diffraction measurements. Although the RISM/3D-RISM theory has proven its capability of “prediction,” there are a few other summits to be conquered before it establishes itself as the “theory of molecular recognition.” The problem concerns the conformational fluctuation of protein. For example, the present theory still requires experimental data for the structure of the protein as an “input.” In other words, we have not yet succeeded in “building” the tertiary structure of a protein from its amino acid sequence. If we become able to build the tertiary structure in different solution conditions (containing, say, electrolytes or other ligands) on the free energy surface produced by the RISM/3D-RISM method, we will be able to address at the same time two of the most prominent problems in biophysics: “protein folding” and “molecular recognition.” The phrase “different solution conditions” has an even deeper implication. Experimental results clearly indicate that some folding processes are driven or enforced by “salt bridges” or “water bridges.” This implies that methodologies that do not account for water molecules and electrolytes explicitly are fatally limited for this purpose. The RISM/3D-RISM theory certainly has the ability to describe those ions and water molecules “bridging” amino acid residues inside a protein, as has been demonstrated in this chapter. If one could sample the protein conformation on the potential of mean force or free energy surface produced by the RISM/3D-RISM method, one would attain the two goals at the same time. We have already developed such methodologies to explore large fluctuations of protein by combining the RISM/3D-RISM theory with molecular dynamics [56] and Monte Carlo methods [57–59].


Experimental analysis of protein function involves time-dependent properties such as the rate of an enzymatic reaction and the conduction rate of ions in an ion channel. These properties are related to comparatively small fluctuations of the protein around its native conformation. In enzymatic reactions, an enzyme may have to “open” its entrance “door” to accommodate substrate molecules in the reaction pocket. Ion channels have a device called the “gating” mechanism to control the flow of ions into the channel pore. These mechanisms are often regulated by conformational fluctuation of the protein. Analyses of those processes require evaluation of “dynamic” or time-dependent properties of both protein and solvent, which are sometimes closely correlated. In such a case, the “dynamics” on the free energy surface described earlier is insufficient. We have to describe the dynamics of protein and solvent on an equal footing. To the best of our knowledge, the generalized Langevin equation is the only theory that meets such a requirement. A study combining the RISM/3D-RISM theory with the generalized Langevin equation to describe the correlated dynamics of protein and solvent is in progress in our group [60]. All of the methods that we have been developing require solving the 3D-RISM equations for many conformations of a protein. Currently, it takes a few hours to solve the 3D-RISM equations for one conformation of a protein with a few hundred residues, using a modern workstation. It is therefore not feasible at present to solve the above-stated problems on conventional computational resources, even if we succeed in building the methodology. However, with the National Project of building a next-generation supercomputer, which is underway in Japan, the 3D-RISM methodology, fine-tuned to and drastically accelerated on the new supercomputer, will hopefully make a crucial contribution to solving these most important problems in the life sciences.

References

1. J.D. Watson et al., Molecular Biology of the Gene (Benjamin/Cummings, Menlo Park, CA, 1987)
2. L. Michaelis, M. Menten, Biochem. Z. 49, 333 (1913)
3. C. Sotriffer, G. Klebe, Il Farmaco 57, 243 (2002)
4. H. Gohlke, G. Klebe, Angew. Chem. Int. Ed. 41, 2644 (2002)
5. W.C. Still, A. Tempczyk, R.C. Hawley, T. Hendrickson, J. Am. Chem. Soc. 112, 6127 (1990)
6. M.K. Gilson, B. Honig, Proteins: Struct. Funct. Genet. 4, 7 (1988)
7. M. Kato, A. Warshel, J. Phys. Chem. B 109, 19516 (2005)
8. F. Zhu, E. Tajkhorshid, K. Schulten, Biophys. J. 86, 50 (2004)
9. F. Hirata (ed.), Molecular Theory of Solvation (Kluwer, Dordrecht, 2003)
10. A. Kovalenko, F. Hirata, Chem. Phys. Lett. 290, 237 (1998)
11. D. Beglov, B. Roux, J. Phys. Chem. B 101, 7821 (1997)
12. J.-P. Hansen, I.R. McDonald, Theory of Simple Liquids, 3rd edn. (Academic, London, 2006)


13. L. Blum, A.J. Torruella, J. Chem. Phys. 56, 303 (1972)
14. T. Imai, A. Kovalenko, F. Hirata, Chem. Phys. Lett. 395, 1 (2004)
15. T. Imai, R. Hiraoka, A. Kovalenko, F. Hirata, J. Am. Chem. Soc. 127, 15334 (2005)
16. D. Chandler, H.C. Andersen, J. Chem. Phys. 57, 1930 (1972)
17. C.M. Cortis, P.J. Rossky, R.A. Friesner, J. Chem. Phys. 107, 6400 (1997)
18. A. Kovalenko, F. Hirata, J. Chem. Phys. 110, 10095 (1999)
19. S.J. Singer, D. Chandler, Mol. Phys. 55, 621 (1985)
20. J.G. Kirkwood, F.P. Buff, J. Chem. Phys. 19, 774 (1951)
21. T. Imai, M. Kinoshita, F. Hirata, J. Chem. Phys. 112, 9469 (2000)
22. Y. Harano, T. Imai, A. Kovalenko, M. Kinoshita, F. Hirata, J. Chem. Phys. 114, 9506 (2001)
23. T. Imai, A. Kovalenko, F. Hirata, J. Phys. Chem. B 109, 6658 (2005)
24. E. Mayer, Protein Sci. 1, 1543 (1992)
25. Y. Zhou, J.H. Morais-Cabral, A. Kaufman, R. MacKinnon, Nature 414, 43 (2001)
26. T. Tanimoto, Y. Furutani, H. Kandori, Biochemistry 42, 2300 (2003)
27. M. Nakasako, Phil. Trans. R. Soc. Lond. B Biol. Sci. 359, 1191 (2004)
28. N. Niimura, S. Arai, K. Kurihara, T. Chatake, I. Tanaka, R. Bau, Cell. Mol. Life Sci. 62, 285 (2006)
29. T. Imai, R. Hiraoka, A. Kovalenko, F. Hirata, Proteins: Struct. Funct. Bioinformat. 66, 804 (2007)
30. K.P. Wilson, B.A. Malcolm, B.W. Matthews, J. Biol. Chem. 267, 10842 (1992)
31. D.B. Kitchen, H. Decornez, J.R. Furr, J. Bajorath, Nat. Rev. Drug Discov. 3, 935 (2004)
32. G. Klebe, Drug Discov. Today 11, 580 (2006)
33. O. Lichtarge, M.E. Sowa, Curr. Opin. Struct. Biol. 12, 21 (2002)
34. S.F. Sousa, P.A. Fernandes, M.J. Ramos, Proteins: Struct. Funct. Genet. 65, 15 (2006)
35. J.E. Ladbury, Chem. Biol. 3, 973 (1996)
36. Y. Levy, J.N. Onuchic, Annu. Rev. Biophys. Biomol. Struct. 35, 389 (2006)
37. Z. Li, T. Lazaridis, Phys. Chem. Chem. Phys. 9, 573 (2007)
38. T. Imai, R. Hiraoka, T. Seto, A. Kovalenko, F. Hirata, J. Phys. Chem. B 111, 11585 (2007)
39. T. Prange, M. Schiltz, L. Pernot, N. Colloc’h, S. Longhi, W. Bourguet, R. Fourme, Proteins: Struct. Funct. Genet. 30, 61 (1998)
40. O. Herzberg, M.N. James, Nature 313, 635 (1985)
41. M. Ikura, G.M. Clore, A.M. Gronenborn, G. Zhu, C.B. Klee, A. Bax, Science 256, 632 (1992)
42. B. Hille, Ionic Channels of Excitable Membranes (Sinauer Associates, Sunderland, MA, 2001)
43. S. Tsuda, K. Ogura, Y. Hasegawa, K. Yagi, K. Hikichi, Biochemistry 29, 4951 (1990)
44. N. Yoshida, S. Phongphanphanee, Y. Maruyama, T. Imai, F. Hirata, J. Am. Chem. Soc. 128, 12042 (2006)
45. N. Yoshida, S. Phongphanphanee, F. Hirata, J. Phys. Chem. B 111, 4588 (2007)
46. R. Kuroki, K. Yutani, J. Biol. Chem. 273, 34310 (1998)
47. J.L. Silva, G. Weber, Annu. Rev. Phys. Chem. 44, 89 (1993)
48. C. Balny, P. Masson, K. Heremans, Biochim. Biophys. Acta 1595, 3 (2002)


49. F. Meersman, C.M. Dobson, K. Heremans, Chem. Soc. Rev. 35, 908 (2006)
50. M.F. San Martin, G.V. Barbosa-Canovas, B.G. Swanson, Crit. Rev. Food Sci. Nutr. 42, 627 (2002)
51. T. Imai, Condens. Matter Phys. 10, 343 (2007)
52. R. Kitahara, S. Yokoyama, K. Akasaka, J. Mol. Biol. 347, 277 (2005)
53. T. Imai, S. Ohyama, A. Kovalenko, F. Hirata, Protein Sci. 16, 1927 (2007)
54. T.V. Chalikian, K.J. Breslauer, Biopolymers 39, 619 (1996)
55. T. Imai, Y. Harano, A. Kovalenko, F. Hirata, Biopolymers 59, 512 (2001)
56. T. Miyata, F. Hirata, J. Comput. Chem. 29, 871 (2008)
57. M. Kinoshita, Y. Okamoto, F. Hirata, J. Am. Chem. Soc. 120, 1855 (1998)
58. A. Mitsutake, M. Kinoshita, Y. Okamoto, F. Hirata, Chem. Phys. Lett. 329, 295 (2000)
59. A. Mitsutake, M. Kinoshita, Y. Okamoto, F. Hirata, J. Phys. Chem. B 108, 19002 (2004)
60. B. Kim, S.-H. Chong, R. Ishizuka, F. Hirata, Condens. Matter Phys. 11, 179 (2008)

11 Computational Studies of Protein Dynamics

J.A. McCammon

Abstract. Theoretical and computational studies of protein function have reached the point at which they are making important contributions to drug discovery and other practical applications. At the same time, they are deepening our understanding of the principles of protein activity, including the dynamical features that give rise to NMR and other experimental measurements, and the time-dependent aspects of biological function.

11.1 Introduction

Proteins are well known to exhibit a wide variety of internal motions, on timescales extending from femtoseconds to hours. These motions are also known to be involved in protein function. Examples of such functional motions include the displacement of amino acid residues in enzymes to allow substrate binding and product release, and the rearrangements of enzyme and substrate atoms during catalysis. But how important are the details of the time dependence of such motion? It appears, in fact, that the functions of proteins are governed in many cases by the detailed time dependence of their internal motions. Indeed, it appears that evolution has shaped not only the structures of proteins, but also these essential dynamical characteristics. This chapter provides an overview of protein dynamics and function. Representative experimental results are outlined, and it is shown how computer simulations can be used quantitatively to interpret the dynamical behavior of proteins, including their binding of ligands.

11.2 Brief Survey of Protein Motions

Some internal motions of proteins can be described quite simply. These include the localized vibrations within covalently bonded groups and also the elastic vibrations that involve coherent small-amplitude displacements of larger portions of the molecule. But generally, motions in proteins are more complex,


and more interesting. The ease with which dihedral angles can be varied in proteins, together with the relatively soft nature of their nonbonded interactions other than the short-range interatomic repulsions, and the dense packing of groups within globular proteins combine to yield the rugged energy landscape that is now familiar from much experimental and theoretical work [8,35]. Variations in the protonation states of titratable groups and in the binding of water molecules and ions to sites in the protein also contribute to the structure of the energy landscape, as discussed below. Motions in proteins correspond to excursions on this energy landscape, and may be correspondingly complex. Even the “simple” motions mentioned at the outset of this section will be perturbed by transitions over barriers in the protein’s energy landscape; e.g., the localized vibrations of a covalently bonded group will diﬀer to some extent, depending on which energy well in the landscape the biopolymer resides in. Spectroscopic studies on the protein myoglobin indicated that proteins may have hierarchical energy landscapes [8]. This study and many subsequent ones suggest that a typical globular protein may have a few conformational substates in its “taxonomic” tier with the largest barriers, that barriers between such taxonomic conformational substates may be on the order of 100 kJ mol−1 , and that there may typically be a few lower tiers with a small number of conformational substates in each tier [29]. A nuclear magnetic resonance study of the most slowly exchanging buried water molecule in the bovine pancreatic trypsin inhibitor indicates that its exchange can be modeled as a diﬀusion process on an energy landscape with the crossing of barriers on the order of 10 kJ mol−1 [6]. The exchange of this water molecule occurs with a characteristic time of about 170 μs at 300 K. 
Examination of the structure of the protein shows that not only side-chain motions but also signiﬁcant backbone motions must occur during the exchange of this particular water molecule, which is consistent with the many conformational substates being involved with the exchange process. The authors of this study suggest that any local process in a protein that occurs on the nanosecond to millisecond timescale and requires substantial displacements of groups in the protein may be rate-limited by interconversion of conformational substates and display features similar to those observed in their study [6]. Recent single-molecule experimental studies of proteins provide more detailed views of protein motions, and conﬁrm that a wide variety of timescales is involved in, e.g., catalytic action of enzymes [7,14,15,19,33]. Of course, molecular dynamics simulations have been used to probe motions in single proteins for many years, and advances in both theory and computational science have made simulations a powerful approach to building theoretical understanding of protein dynamics [1]. The recent introduction of “accelerated molecular dynamics” methods is helpful in this context [11]. Although detailed dynamical information is sacriﬁced to the enhanced sampling of conformational space in these methods, which have been shown to access conformational ﬂuctuations that are revealed by nuclear magnetic resonance experiments on the millisecond


timescale [17], it is possible to recover dynamical information with certain models and approximations [12]. Also, accelerated molecular dynamics simulations have revealed the important role of solvent water in contributing to the rough energy landscape of proteins. That is, the roughness does not emerge entirely from the protein; the making and breaking of hydrogen bonds between the protein and the solvent are estimated to increase the roughness of the protein landscape by about 4 kJ mol−1 , with marked eﬀects on the overall timescale of protein motions [13].
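To get a feel for what a few kJ mol−1 of roughness can do to timescales, one can borrow Zwanzig's model of diffusion on a rough one-dimensional potential, in which Gaussian-distributed roughness of root-mean-square amplitude ε lowers the effective diffusion coefficient by the factor exp[−(ε/RT)²]. The sketch below is illustrative only: the use of this model, the function name, and the treatment of the 4 kJ mol−1 quoted above as the entire roughness amplitude are assumptions made here, not necessarily the analysis performed in [13].

```python
import math

# Gas constant in kJ mol^-1 K^-1, so roughness can be given in kJ mol^-1.
R_KJ_PER_MOL_K = 8.314462618e-3


def roughness_slowdown(epsilon_kj_mol: float, temperature_k: float = 300.0) -> float:
    """Return D_eff / D for diffusion on a rough potential.

    Assumes Zwanzig's result for Gaussian roughness of rms amplitude
    epsilon: D_eff / D = exp(-(epsilon / RT)**2). This is an illustrative
    model choice, not a method taken from the chapter.
    """
    reduced = epsilon_kj_mol / (R_KJ_PER_MOL_K * temperature_k)
    return math.exp(-reduced * reduced)


# Hypothetical input: treat 4 kJ/mol as the whole roughness amplitude at 300 K.
factor = roughness_slowdown(4.0, 300.0)
print(f"D_eff / D = {factor:.3f} (motions slowed about {1.0 / factor:.0f}-fold)")
```

Under these assumptions the effective diffusion coefficient drops by roughly an order of magnitude, which is in line with the qualitative statement that solvent-induced roughness has marked effects on the overall timescale of protein motions.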

11.3 Binding and Selectivity

In a number of cases, particularly where ligand–receptor binding is fast, it appears that certain features of the internal motion of one or both partners have evolved to be rapid enough to avoid kinetic bottlenecks. The enzyme acetylcholinesterase represents one such case. Acetylcholinesterase is found in cholinergic synapses, including neuromuscular junctions. It functions to clear the neurotransmitter acetylcholine following excitation of the postsynaptic nerve or muscle. As such, it has been under tremendous evolutionary pressure to operate at the maximum possible speed; e.g., the correspondingly fast reflexes aided our ancestors in escaping from predators. In the crystallographic structures of forms of the enzyme from two different species, a gorge or channel extending approximately 2 nm from the surface of the enzyme to the active site is apparent [3, 28]. This is the most likely route for binding the substrate acetylcholine. But in both structures, a constriction exists midway down the channel that, if static, would preclude passage of substrate. Despite this, the enzyme binds substrate at or near the diffusion-controlled limit. It has been known for some time that if such obstacles can be removed frequently enough by the fluctuations in an enzyme or other receptor, the obstacles will not slow the overall rate of binding [18]; this is termed the fast gating kinetic regime. Molecular dynamics simulations suggest that this is the situation for acetylcholinesterase [25, 34]. Fluctuations in the enzyme open the channel every few picoseconds, which is often enough to allow capture of the substrate before it can diffuse away over times on the order of a few hundred picoseconds. A recent analysis by Zhou [37] presents the most complete current theory of such “gated” diffusional binding processes, and suggests that a similar picture describes the classic example of myoglobin.
Another group of gated enzymes comprises those that have a peptide loop that opens and closes over the active site. Wade et al. [32] suggest that somewhat slower but still rapid gating (times on the order of 1 ns) allows one such enzyme, triosephosphate isomerase, to operate in the diﬀusion-controlled regime. It must be noted that the molecular dynamics simulations of acetylcholinesterase mentioned above are far too short to sample transitions over barriers that separate many conformational substates of the protein. But, for acetylcholinesterase similar behavior is observed for the two subunits of the


homodimeric enzyme that was simulated [34]. Because only small displacements in the wall of the channel are required to open the gate, it may be that a relatively simple “elastic” picture is suﬃcient here. For myoglobin, where ligand binding is thought to involve more complex motions of the protein, the gate dynamics may still be suﬃciently rapid at 300 K to allow for “simple” binding. What are the possible functional implications of such gating motions? Ensuring the maximum possible speed of binding is clearly one function. For example, it is necessary to create a special environment around a substrate for enzymatic catalysis, but evolutionary pressure has forced the creation of this environment to happen very rapidly for certain enzymes. Another function may well be the contribution of gating to the selectivity of binding [39]. For ligands that are only slightly larger than the natural ones, the gate may not open frequently enough to allow unhindered binding, so that the larger ligands are less likely to be bound before diﬀusing away. The overall rate of binding can decrease very rapidly with the increasing size of the ligand, and this will be reﬂected in the probability of binding one ligand compared to another in the nonequilibrium regime typical of living systems [39]. As biophysical studies move above the molecular level to consider supramolecular and cellular scale processes, similar issues are certain to arise. In fact, the physiologically important form of acetylcholinesterase in many synapses comprises closely held tetramers, attached to collagen-like stalks, which are in turn attached to the postsynaptic membrane. X-ray crystallography has suggested that a number of arrangements of the monomers is possible in these tetrameric clusters, including structures in which one monomer may occlude the active site of a neighbor. Recent simulation studies by Gorfe et al. 
show, however, that the relative diffusional motions of the acetylcholinesterase monomers are fast enough to reduce the kinetic penalties associated with such steric hindrance; in other words, the kinetics is in the “fast gating” regime [9]. Although the above discussion focused on the binding of small molecules to biopolymers, similar issues arise in connection with the binding of biopolymers to one another. In particular, rapid motion (times of a few nanoseconds) of surface loops of proteins may facilitate the assembly of chaperonins [16] and allow the binding of multiple receptors in the case of certain fibronectin domains [4]. Recent studies have shown that conformational fluctuations of proteins can be important in structure-based drug discovery, as in the discovery of an unexpected “cryptic” binding site in the HIV integrase enzyme during the course of molecular dynamics studies (Fig. 11.1) [24]. This helped to pave the way for the discovery of the first in a new class of antiviral agents for HIV/AIDS, the compound Isentress (raltegravir), which was licensed by the U.S. Food and Drug Administration in October 2007. A recent review of work in this area has been published by Amaro et al. [2].


Fig. 11.1. Two predicted binding conformations of an HIV-1 integrase inhibitor to a molecular dynamics (MD) snapshot of the protein. The green conformation is similar to that in the crystal structure and the magenta is in a secondary predicted binding trench that opened during an MD simulation of the protein [24]

Of importance in the present context, the binding of drugs to ﬂuctuating binding sites in target molecules can be kinetically gated by the detailed dynamics of those sites. This has been shown to be the case in the binding of a number of clinically useful inhibitors to the HIV protease enzyme [5]. Simulations of the HIV protease enzyme revealed opening and closing of peptide loops or “ﬂaps” that lie over the active site. Analysis of these using gated binding theory [39] showed that the predicted order of rate constants for drugs of diﬀerent sizes agreed with the experimental results. An emerging frontier in biophysics is the characterization of the eﬀects of the crowded cellular environment on molecular processes. For the case of HIV protease, Brownian dynamics simulations using coarse-grained models of the polypeptide have shown that crowding can have a substantial eﬀect on the frequency of opening and closing of the enzyme’s active site [20]. Because only small displacements are required to open the gates in some of the systems mentioned above, biopolymer motion on short timescales (picoseconds to nanoseconds) can inﬂuence function. In other cases, larger displacements and longer timescales are important, as discussed in later sections.
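The gating picture used throughout this section can be made concrete with a small sketch. For a diffusion-controlled binder whose gate switches stochastically between open and closed, two limiting cases are commonly quoted in gating theory: when the gate switches slowly, the rate constant is the ungated value scaled by the equilibrium open fraction, and when it switches fast, the ungated rate constant is recovered. The code below encodes only these two limits. The function names, the open fraction of 0.1, the ungated rate constant, and the factor-of-ten threshold used to call a gate "fast" are hypothetical choices for illustration, and the timescales are round numbers consistent with the "few picoseconds" and "few hundred picoseconds" quoted above for acetylcholinesterase.

```python
def gated_rate_limits(k_ungated: float, p_open: float) -> dict:
    """Limiting rate constants for a diffusion-controlled, gated binding site.

    slow gating: only the open fraction of sites can bind -> p_open * k_ungated
    fast gating: the gate opens often enough that binding is unhindered -> k_ungated
    """
    if not 0.0 <= p_open <= 1.0:
        raise ValueError("p_open must lie between 0 and 1")
    return {"slow_gating": p_open * k_ungated, "fast_gating": k_ungated}


def gating_regime(gate_period_ps: float, escape_time_ps: float,
                  ratio_threshold: float = 0.1) -> str:
    """Crude classification: the gate counts as 'fast' if it opens much more
    often than the ligand needs to diffuse away. The threshold is an
    illustrative assumption, not a value from the chapter."""
    if gate_period_ps <= ratio_threshold * escape_time_ps:
        return "fast"
    return "not clearly fast"


# Hypothetical numbers: gate opens every ~5 ps, ligand escapes in ~300 ps.
limits = gated_rate_limits(k_ungated=1.0e9, p_open=0.1)
regime = gating_regime(gate_period_ps=5.0, escape_time_ps=300.0)
print(limits, regime)
```

The same comparison suggests why a slightly larger ligand can be discriminated against: if the gate opens wide enough for it only rarely, the effective gate period grows, the system leaves the fast regime, and the rate falls toward the slow-gating limit.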


11.4 Concerted Binding and Release

In the case of biopolymers or assemblies of biopolymers that bind more than one ligand, it appears that the binding of one ligand sometimes drives the release or relocation of another ligand. One example is the enzyme dihydrofolate reductase from E. coli, in which the binding of the cofactor nicotinamide adenine dinucleotide phosphate (NADPH) leads to structural changes that tend to expel the cofactor tetrahydrofolate (THF) from a different site, as part of the cyclic activity of the enzyme [23]. In some ATP synthases, the proton-driven rotation of an asymmetric axle centered in an enzymatic cluster causes changes in the conformations of these enzymes, which in turn enable substrate binding, and drive catalysis and product release, all at different sites in the cluster; an excellent discussion of coordinated events in molecular biophysics has been presented recently by Zhou [38]. The actual dynamics of the transitions involved remains to be fully determined, but is undoubtedly complex. Nevertheless, remarkable videos of the concerted motions in this system have been obtained in the laboratory of Masasuke Yoshida [30]. To have useful rates of turnover (time of about 100 ms for ATP synthase), there must be upper bounds on the roughness of the energy landscape.

11.5 Molecular Clocks

The preceding discussion has considered processes that are fast, or at least closely correlated in time. Other functional processes in biopolymers may require delay times, which in some cases may imply lower bounds on the roughness of the energy landscape. Slow kinetics is very important in signal transduction. A well-known case is that of the so-called G proteins, which typically exchange the nucleotide GDP for GTP to become activated and so able to activate downstream partners [10]. The G proteins return to their inactivated GDP-bound state by slow hydrolysis of GTP; in other words, the G proteins are intrinsically “bad” enzymes. The inactivation of the G proteins can be greatly speeded up by their interaction with “GTPase activating proteins.” Enzymes that bind two or more substrates or cofactors that interact in the active site may in some cases require that one of these species be held for some time in a particular conformation. This has been suggested to be the case in lactate dehydrogenase, where the cofactor NADH may have to retain the conformation found in its binary complex with the enzyme during the binding and reaction of substrate [31]. The nicotinamide ring is sterically hindered from rotating during the reaction, which occurs on the millisecond timescale, and this is thought to contribute to the stereospecificity of the reaction. Necessary delay times may also occur in biopolymer conformational changes. This appears to occur, for example, in certain proteins that effect


the fusion of viral and host membranes. In the case of the influenza protein hemagglutinin HA2, post-translational cleavage is thought to leave the protein in a long-lived metastable conformation, which is induced to change only in response to a reduction in pH within the host cell [27]. In another context, it has been suggested on the basis of molecular dynamics simulations that the relaxation of the bacterial photosynthetic reaction center is slow on the timescale of the initial electron transfer steps following photoexcitation, and that this slow relaxation leads to a smaller reorganization energy and faster electron transfer than would be obtained in the case of fast relaxation [21]. The delay of product release is crucial for the time-dependent organization of the cell cytoskeleton. Actin filaments are dynamic polymers whose assembly and disassembly in the cytoplasm drive cell shape changes, cell locomotion, and chemotactic migration. The ATP hydrolysis that accompanies actin polymerization and the subsequent release of the cleaved phosphate destabilize the filaments and therefore must be slow compared to their elongation [22]. The results of molecular dynamics simulations suggest that the phosphate is stabilized by a tightly bound divalent cation and by a salt bridge formed with His73 [36]. Consistent with this model, certain His73 mutants exhibit rapid depolymerization or decreased stability [26]. Actin’s phosphate release appears to act as a clock, altering in a time-dependent manner the mechanical properties of the filament and its propensity to depolymerize.

Acknowledgments

Work in the author’s laboratory is supported in part by the National Institutes of Health, the National Science Foundation, the Howard Hughes Medical Institute, the NSF Center for Theoretical Biological Physics, the NIH National Biomedical Computation Resource, the NSF Supercomputer Centers, and Accelrys.

References

1. S.A. Adcock, J.A. McCammon, Chem. Rev. 106, 1589 (2006)
2. R.E. Amaro, R. Baron, J.A. McCammon, J. Comput. Aid. Mol. Des. 22, 693 (2008)
3. Y. Bourne, P. Taylor, P. Marchot, Cell 83, 502 (1995)
4. P.A. Carr, H.P. Erickson, A.G. Palmer, Structure 5, 949 (1997)
5. C.E. Chang, T. Shen, J. Trylska, V. Tozzini, J.A. McCammon, Biophys. J. 90, 3880 (2006)
6. V.P. Denisov, J. Peters, H.D. Horlein, B. Halle, Nat. Struct. Biol. 3, 505 (1996)
7. R.M. Dickson, A.B. Cubitt, R.Y. Tsien, W.E. Moerner, Nature 388, 355 (1997)
8. H. Frauenfelder, S.G. Sligar, P.G. Wolynes, Science 254, 1598 (1991)
9. A.A. Gorfe, C.E. Chang, I. Ivanov, J.A. McCammon, Biophys. J. 94, 1144 (2008)
10. A.A. Gorfe, B.J. Grant, J.A. McCammon, Structure 16, 885 (2008)
11. D. Hamelberg, J. Mongan, J.A. McCammon, J. Chem. Phys. 120, 11919 (2004)
12. D. Hamelberg, T. Shen, J.A. McCammon, J. Chem. Phys. 122, 241103 (2005)
13. D. Hamelberg, T. Shen, J.A. McCammon, J. Chem. Phys. 125, 094905 (2006)
14. Y.W. Jia, A. Sytnik, L.Q. Li, S. Vladimirov, B.S. Cooperman, R.M. Hochstrasser, Proc. Natl. Acad. Sci. USA 94, 7932 (1997)
15. H. Kojima, E. Muto, H. Higuchi, T. Yanagida, Biophys. J. 73, 2012 (1997)
16. S.J. Landry, N.K. Steede, K. Maskos, Biochemistry 36, 10975 (1997)
17. P.R.L. Markwick, G. Bouvignies, M. Blackledge, J. Am. Chem. Soc. 129, 4724 (2007)
18. J.A. McCammon, S.H. Northrup, Nature 293, 316 (1981)
19. W. Min, B.P. English, G. Luo, B.J. Cherayil, S.C. Kou, X.S. Xie, Acc. Chem. Res. 38, 923 (2005)
20. D.D.L. Minh, C.E. Chang, J. Trylska, V. Tozzini, J.A. McCammon, J. Am. Chem. Soc. 128, 6006 (2006)
21. W.W. Parson, Z.T. Chu, A. Warshel, Biophys. J. 74, 182 (1998)
22. T.D. Pollard, I. Goldberg, W.H. Schwarz, J. Biol. Chem. 267, 20339 (1992)
23. M.R. Sawaya, J. Kraut, Biochemistry 36, 586 (1997)
24. J. Schames, R.H. Henchman, J.S. Siegel, C.A. Sotriffer, H. Ni, J.A. McCammon, J. Med. Chem. 47, 1879 (2004)
25. T. Shen, K. Tai, R.H. Henchman, J.A. McCammon, Acc. Chem. Res. 35, 332 (2002)
26. L.R. Solomon, P.A. Rubenstein, J. Biol. Chem. 262, 11382 (1987)
27. D.A. Steinhauer, J. Martin, Y.P. Lin, S.A. Wharton, M.B.A. Oldstone, J.J. Skehel, D.C. Wiley, Proc. Natl. Acad. Sci. USA 93, 12873 (1996)
28. J.L. Sussman, M. Harel, F. Frolow, C. Oefner, A. Goldman, L. Toker, I. Silman, Science 253, 872 (1991)
29. D. Thorn Leeson, D.A. Wiersma, K. Fritsch, J. Friedrich, J. Phys. Chem. B 101, 6331 (1997)
30. H. Ueno, T. Suzuki, K. Kinosita Jr., M. Yoshida, Proc. Natl. Acad. Sci. USA 102, 1333 (2005)
31. J. van Beek, R. Callender, M.R. Gunner, Biophys. J. 72, 619 (1997)
32. R.C. Wade, B.A. Luty, E. Demchuk, J.D. Madura, M.E. Davis, J.M. Briggs, J.A. McCammon, Nat. Struct. Biol. 1, 65 (1994)
33. S. Wennmalm, L. Edman, R. Rigler, Proc. Natl. Acad. Sci. USA 94, 10641 (1997)
34. S.T. Wlodek, T.W. Clark, L.R. Scott, J.A. McCammon, J. Am. Chem. Soc. 119, 9513 (1997)
35. P.G. Wolynes, Q. Rev. Biophys. 38, 405 (2005)
36. W. Wriggers, K. Schulten, Proteins 35, 262 (1999)
37. H.X. Zhou, J. Chem. Phys. 108, 8146 (1998)
38. H.X. Zhou, Phys. Biol. 2, R1 (2005)
39. H.X. Zhou, S.T. Wlodek, J.A. McCammon, Proc. Natl. Acad. Sci. USA 95, 9280 (1998)

12 Biological Functions of Trehalose as a Substitute for Water M. Sakurai

Abstract. A disaccharide, α, α-trehalose, acts as a substitute for water in biological systems. Such a function comes from the unique hydration and solid-state properties of this sugar, which ultimately originate in the presence of the α, α-1,1-glycosidic linkage. A recent study on the anhydrobiosis of Polypedilum vanderplanki, an insect that survives desiccation, brought about a signiﬁcant advance in our understanding of the functional mechanism of trehalose in vivo.

12.1 Introduction

Water is the most abundant molecule in cells, accounting for approximately 70% of the total weight of a cell. It plays a crucial role in stabilizing the higher-order structures of proteins, membranes and DNA, and is the medium for various biological reactions. However, some organisms can survive adverse environments such as drought and low temperature through various physiological and biochemical adaptations. An ultimate strategy against desiccation stress is anhydrobiosis, or “life without water” [1, 2], which is the state of an organism severely dehydrated but capable of revival after rehydration. Anhydrobiosis is found across diverse biological kingdoms, including plants, animals, mushrooms, nematodes, yeasts, fungi, brine shrimp and insects [1–3]. These anhydrobiotes commonly contain high concentrations of disaccharides, particularly α,α-trehalose (hereafter trehalose, Fig. 12.1) [1]. For example, when an African chironomid, Polypedilum vanderplanki, was dehydrated slowly, it converted as much as 20% of its dry weight into this molecule [3]. Trehalose is able to stabilize biological structures in a dehydrated form, and to keep them intact and functional as soon as the hydration and temperature conditions return to normal. Thus, it behaves as a chemical chaperone in desiccation stress. Additionally, trehalose acts as a protectant against other environmental stresses such as freezing [4,5], osmotic shock [6], oxidation [7–9],


Fig. 12.1. Molecular structure of α, α-trehalose (α-D-glucopyranosyl α-D-glucopyranoside)

among others. Our goal is to understand why trehalose behaves as a chemical chaperone, and why it is superior to other saccharides as a stress protectant. To address the first problem, it is necessary to obtain detailed information about the physicochemical properties of trehalose in the dehydrated state, especially about its solid-state phase transition and vitrification behaviors. On the other hand, as is well known, the water surrounding solute molecules such as sugars is structurally and dynamically distinct from bulk water. It is inferred that the solute-induced changes of the water structure result in a significant modification of the hydration shell near biological molecules such as proteins and membranes and thereby influence their stability. For trehalose in particular, its strong protection ability against water stresses may result from its strong perturbation effect on the surrounding water. Therefore, to address the second problem, it is necessary to elucidate the features of trehalose from the viewpoint of its hydration property. In this review, we shall first focus our attention on the hydration and solid-state properties of trehalose, and then extract the characteristic features of trehalose. Based on these findings, we shall discuss the possible mechanisms by which trehalose acts as a chemical chaperone and subsequently outline our recent study reporting the mechanism by which P. vanderplanki survives an extremely dehydrated state. Furthermore, regarding the peculiar hydration property of this sugar, we shall briefly describe the antioxidant function of trehalose and its inhibitory effect on protein aggregation. Finally, we shall describe a perspective on the application of trehalose to long-term storage of biological materials.


12.2 Hydration Property of Trehalose

12.2.1 Property of the Aqueous Solution of Trehalose

Here we compare the thermodynamic parameters of trehalose, maltose and sucrose because they have the same chemical formula (C12H22O11) and mass (molecular weight 342.3), but different structures which could be responsible for their different hydration properties. The anomaly of hydration of trehalose is understood from the following observation [10]. Namely, the amount of water used for the preparation of 1.5 M trehalose solution is smaller than the amount used for the preparation of other sugar solutions. In a 1.5 M solution, trehalose itself occupies 37.5% of the volume of the solution. However, in a 1.5 M solution, sucrose occupies 13% and maltose occupies 14%. These data suggest that trehalose has a larger hydrated volume than the other sugars. This hypothesis can be demonstrated from various thermodynamic parameters as shown in Table 12.1. The intrinsic viscosity $[\eta]$ is attributed to an overall hydrodynamic volume of the solute. The values of $[\eta]$ for the above disaccharides are close to each other but that for trehalose is slightly larger [11]. Partial molar volume $V_2^0$ is the sum of the intrinsic volume $V_{\mathrm{int}}$ of the solute and the volume contribution $V_{\mathrm{solute-solvent}}$ due to solute–solvent interactions, $V_2^0 = V_{\mathrm{int}} + V_{\mathrm{solute-solvent}}$, and is therefore informative of the character of solute–solvent interactions. The above disaccharides might have similar molecular volumes because of the same mass and formula and thus the difference of their $V_2^0$ values should reflect that of the $V_{\mathrm{solute-solvent}}$ term. The $V_2^0$ value of trehalose is smaller than those of maltose and sucrose [12], which is indicative of a more extensive solute–solvent interaction in the aqueous solution of trehalose. The

Table 12.1. Comparison of various hydration properties of disaccharides

Sugar      [η] (a)     V2^0 (b)      10^4 K2^0 (c)        Cp,2^0 (b)       nDHN (d)   τch/τc0 (d)
           (cm³ g⁻¹)   (cm³ mol⁻¹)   (cm³ mol⁻¹ bar⁻¹)    (J K⁻¹ mol⁻¹)
Trehalose  2.58        207.61        −30.2                655              48.3       7.08
Maltose    2.55        210.07        −23.7                622              23.8       4.66
Sucrose    2.45        211.92        −18.5                648              36.8       6.81
Lactose    2.5         208.96        −31.1                657              –          –

(a) Cited from [11].
(b) Cited from [12].
(c) Cited from [13].
(d) Cited from [15]. In the evaluation of τch/τc0, we used the value of nh obtained from DSC measurements (see Table 12.2).


Table 12.2. Comparison of hydration numbers as determined by various techniques

Sugar      Viscosity          Ultrasound         QENS (c)   DSC (d)
           measurements (a)   measurements (b)
Trehalose  8.0                15.3               9.0        8.0
Maltose    7.5                14.5               8.4        6.5
Sucrose    6.8                13.9               7.5        6.3

(a) Cited from [11].
(b) Cited from [12].
(c) Quasielastic neutron scattering measurements. Cited from [14].
(d) Differential scanning calorimetry measurements. Cited from [15].

partial molar compressibility $K_2^0$, corresponding to the second derivative of free energy with respect to pressure, is a more sensitive parameter that directly reflects solute–solvent interactions, since the intrinsic volume can be regarded as incompressible, which would be true for small molecules under ordinary pressures. The value of isentropic partial molar compressibility $K_{s,2}^0$ assumes a more negative value when the water in the hydration shell becomes denser and less compressible than bulk water; in other words, when the hydration shell forms more extensive or strong hydrogen bonding. As expected, trehalose has a larger negative value of $K_{s,2}^0$ than maltose and sucrose [13]. These observations are also supported by the fact that trehalose has a larger positive value of partial molar heat capacity $C_{p,2}^0$ [12], which becomes more positive when extensive or strong hydrogen bonding interaction or hydrophobic hydration occurs between a solute and its surrounding water molecules. Table 12.2 summarizes the data of hydration number $n_h$ obtained from different experimental techniques [11, 13–15]. In accord with the picture based on the above thermodynamic parameters, the hydration number $n_h$ of trehalose is larger than those of maltose and sucrose. According to the result of recent terahertz absorption spectroscopy, the dynamical hydration shell of trehalose extends from the surface to 6.5 ± 0.9 Å [16]. In order to characterize the hydration phenomena in more detail, it is worthwhile to obtain information on the dynamics of water molecules involved in the hydration shell. One of the useful techniques for such a purpose is $^{17}$O-NMR spectroscopy. In the so-called two-state model, $^{17}$O nuclei in the aqueous solution are assumed to be distributed between the following two motional states: the water in the hydration shell and the bulk water.
Under this assumption, the analysis of concentration-dependent changes of the spin–lattice relaxation time of the $^{17}$O nucleus gives the following important parameter known as the dynamic hydration number [17]:

$n_{\mathrm{DHN}} = n_h \left[ K \left( \tau_c^h / \tau_c^0 \right) - 1 \right]$,

where $n_h$ is the hydration number, $K$ is a constant related to the quadrupole coupling constant of the $^{17}$O nucleus, and $\tau_c^h$ and $\tau_c^0$ are the rotational


correlation times of hydration water and pure, i.e., bulk water, respectively. We reported the data of $n_{\mathrm{DHN}}$, which revealed a larger retardation of the water dynamics near trehalose relative to several gluco-oligosaccharides [15]. As shown in Table 12.1, $n_{\mathrm{DHN}}$ lies in the order trehalose > sucrose > maltose. In addition, and more importantly, the magnitude of $\tau_c^h/\tau_c^0$, a direct measure of retardation of the water dynamics in the hydration shell, is larger for trehalose than for the other gluco-oligosaccharides studied. Taken together, trehalose has a characteristic hydration property in terms of not only its large hydration number but also the remarkably lowered dynamics of its hydration water. Branca et al. investigated the aqueous solutions of trehalose, maltose and sucrose using Raman spectroscopy and comparatively analyzed the relative spectral contribution from the O–H vibration in the tetrabonded H2O molecules and from that in a distorted bond [18]. Of particular interest is that, with increasing sugar concentration, trehalose disrupts the tetrahedral hydrogen bond network of pure water more strongly than the other sugars do. Similar results have also been obtained from inelastic light scattering measurements [19]. What emerges from these data is that the water structure formed on the sugar surface is incompatible with that of the tetrahedral hydrogen bond network of pure water. In this regard, trehalose is a good water structure breaker. Finally, it should be noted that the thermodynamic parameters of trehalose are not anomalous compared with those of lactose (Table 12.1). This means that the peculiarity of trehalose in hydration is not necessarily deduced from the macroscopic properties of the solution alone.
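As a consistency check on Tables 12.1 and 12.2, the dynamic hydration number can be recomputed from the DSC hydration numbers and the correlation-time ratios. The sketch below assumes the two-state relation in the form nDHN = nh (K τch/τc0 − 1) and sets K = 1; both the exact form and the value of K are assumptions made here for illustration, and the function name is hypothetical. The numerical inputs are taken directly from the two tables.

```python
# n_h from DSC (Table 12.2); tau ratio and reported n_DHN from Table 12.1.
data = {
    "trehalose": {"n_h": 8.0, "tau_ratio": 7.08, "n_dhn_reported": 48.3},
    "maltose":   {"n_h": 6.5, "tau_ratio": 4.66, "n_dhn_reported": 23.8},
    "sucrose":   {"n_h": 6.3, "tau_ratio": 6.81, "n_dhn_reported": 36.8},
}


def dynamic_hydration_number(n_h: float, tau_ratio: float, k: float = 1.0) -> float:
    """n_DHN = n_h * (K * tau_c^h / tau_c^0 - 1); K = 1 is an assumption."""
    return n_h * (k * tau_ratio - 1.0)


results = {
    sugar: dynamic_hydration_number(v["n_h"], v["tau_ratio"])
    for sugar, v in data.items()
}

for sugar, value in results.items():
    reported = data[sugar]["n_dhn_reported"]
    print(f"{sugar:10s} computed {value:5.1f}   reported {reported:5.1f}")
```

With K = 1 the recomputed values agree with the tabulated ones to within about 1%, and they reproduce the ordering trehalose > sucrose > maltose. The small residual differences are what one would expect from rounding of the tabulated nh values.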
12.2.2 Atomic-Level Picture of Hydration of Trehalose

The information from the above experimental data is limited to water dynamics and structure averaged over an inhomogeneous sugar surface in various conformational states, and does not provide atomic-level detail about the hydration difference depending on the stereochemistry of sugars. Computer simulation is useful to address this issue. French’s group made much effort to elucidate the conformational property of trehalose. Their molecular mechanics and quantum chemical calculations indicated that in vacuo trehalose has only a single energy minimum around the glycosidic bond [20, 21]: the minimum is located at the glycosidic dihedral angles of (φ, ϕ) = (−60°, −60°), corresponding to the gauche conformation. This is true for the sugar in an aqueous solution as shown in Fig. 12.2a which shows the population density map for the dihedral angle distributions obtained from an MD simulation. This unique conformation is similar to a clam shell (Fig. 12.1). Such conformational rigidity of trehalose comes from the α,α-1,1 type of glycosidic linkage, which is unique to this sugar among naturally occurring gluco-disaccharides. Indeed, neotrehalose, which has an α,β-1,1 configuration, has more than two stable conformations around the glycosidic linkage (Fig. 12.2b) [20]. Similarly, other types of glycosidic linkage, including (1–4) and (1–6) bonds, among others,


Fig. 12.2. Population density map for the dihedral angle distributions obtained from MD simulations for trehalose (a) and neotrehalose (b) in aqueous solution

allow for multiple conformers. Therefore, the less flexible α,α-1,1-glycosidic linkage may be an important clue to the biological functions of trehalose. To date, MD simulations of aqueous trehalose have been reported by several groups [22–29]. Our early MD study indicated that trehalose can hydrogen-bond with the surrounding water more extensively than maltose, leading to more restrained translational diffusion of water molecules around trehalose [22]. A recent remarkable increase in computer performance allows for more rapid and accurate MD calculations for various sugars in solution. More recently, Choi et al. performed systematic computational work for a series of disaccharides to obtain an atomic-level insight into the unique biochemical role of trehalose over other glycosidically linked sugars [29]. In that study, 13 different homodisaccharides with different glycosidic linkages were examined. Analyses of the hydration number and the radial distribution function of solvent water molecules showed that a highly anisotropic hydration shell is formed around trehalose in aqueous solution. As shown in Fig. 12.3, the concave side of the clam shell is fully hydrated, while there are pockets having no first hydration shell on the convex side. In addition, they evaluated the number of long-lived hydrogen bonds, defined as having a lifetime longer than 20 ps. As a result, trehalose was shown to have an average of 2.8 long-lived hydrogen bonds with water, which is a much larger number than the average number of hydrogen bonds for the other 12 sugars. The stable hydrogen-bond network was thought to be derived from the formation of long-lived water bridges at the expense of decreasing the dynamics of the water molecules. This reduction of water dynamics by trehalose was also confirmed from the data for the translational diffusion coefficients. These results are consistent with our $^{17}$O NMR results as described above [15]. According


Fig. 12.3. Distribution of water molecules around trehalose. Cloud-like regions represent iso-probability surface of water oxygen atoms

to Choi et al., trehalose is a “dynamic reducer” for solvent water molecules, which comes from its anisotropic hydration and conformational rigidity [29]. Taken together with the findings explained in Sects. 12.2.1 and 12.2.2, trehalose is a water structure maker in the sense that it forms a highly anisotropic and immobilized hydration shell around itself. However, this simultaneously means that the tetrahedral hydrogen bond network in water is highly perturbed by the addition of trehalose. In this sense, trehalose is a water structure breaker as well. Such a dual character is the key to understanding the various biological roles of this sugar. Finally, it should again be stressed that such a peculiar property of trehalose originates from the presence of its α,α-1,1-linkage.
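The notion of a long-lived hydrogen bond used above (lifetime longer than 20 ps) can be illustrated with a short sketch. It assumes that, for each candidate sugar–water hydrogen bond, a simulation analysis has already produced a per-frame boolean series saying whether the bond is present, and it defines the lifetime as the length of an uninterrupted run of frames. That continuous-lifetime definition, the function names, the frame spacing, and the toy data are assumptions for illustration and are not taken from the study of Choi et al. [29].

```python
from typing import List, Sequence


def bond_lifetimes_ps(present: Sequence[bool], dt_ps: float) -> List[float]:
    """Durations (ps) of uninterrupted runs of True in a per-frame series."""
    lifetimes: List[float] = []
    run = 0
    for flag in present:
        if flag:
            run += 1
        elif run:
            lifetimes.append(run * dt_ps)
            run = 0
    if run:  # close a run that extends to the last frame
        lifetimes.append(run * dt_ps)
    return lifetimes


def count_long_lived(series_per_bond: Sequence[Sequence[bool]],
                     dt_ps: float, cutoff_ps: float = 20.0) -> int:
    """Count bond episodes whose continuous lifetime exceeds the cutoff."""
    return sum(
        1
        for series in series_per_bond
        for lifetime in bond_lifetimes_ps(series, dt_ps)
        if lifetime > cutoff_ps
    )


# Toy data with frames saved every 5 ps (hypothetical).
bond_a = [True] * 6 + [False] + [True] * 2   # episodes of 30 ps and 10 ps
bond_b = [True] * 4                          # one episode of exactly 20 ps
n_long = count_long_lived([bond_a, bond_b], dt_ps=5.0)
print(n_long)
```

In this toy case only the 30 ps episode exceeds the cutoff. In a real analysis the result depends on the geometric hydrogen-bond criterion and on whether brief interruptions are tolerated, so any such count should be quoted together with those choices.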

12.3 Solid-State Property of Trehalose

12.3.1 Polymorphism

In order to elucidate the mechanism by which trehalose enables organisms to survive desiccation stress, information about the solid-state properties of this sugar is indispensable. Trehalose crystallizes from aqueous solution as a dihydrate. The two crystal water molecules are easily activated on heating. Our FTIR study indicated that their bending vibration band undergoes a steep shift from 1,680 to 1,640 cm⁻¹ at around 70 °C [30], which implies that they convert from an ice-like structure to a liquid-like one before the crystal melts (90 °C). Furthermore, this was supported thermodynamically by our study of the low-temperature heat capacity of the dihydrate [31]. Owing to this labile nature of the crystal water, solid-state trehalose exhibits intriguing polymorphism. So far, three different crystal forms including the dihydrate have been identified. The dihydrate, usually referred to as


M. Sakurai

Th (or form I), has a rhombic crystalline form [32,33]. One anhydrous form, referred to as Tβ (or form III), is monoclinic [34–36]. Another anhydrous form, referred to as Tα (or form II), was first identified by Sussich et al. [37], and its properties have been extensively studied by differential scanning calorimetry (DSC), powder X-ray diffractometry [36,38–40] and FTIR [35,41,42]. However, it was only recently that we succeeded in an X-ray analysis that revealed its crystal structure [43]. It is accepted that the dehydration behavior of Th depends on the heating rate [38, 44, 45], the presence or absence of nitrogen gas flow [40] and the particle size [46,47]. On the other hand, little is known about the effect of vapor pressure (humidity) on the interconversions among the phases, including both crystalline and amorphous states, despite the fact that vapor pressure is one of the key thermodynamic parameters influencing the properties of the crystal water. This situation has led to considerable confusion and scatter in the interpretation of the dehydration behavior of Th. In order to address this problem, we investigated the de- and rehydration behavior of Th under humidity-controlled atmospheres through simultaneous X-ray diffraction and DSC measurements, and through simultaneous thermogravimetry and differential thermal analysis (DTA) [48]. It was revealed that the anhydrous forms resulting from Th dehydration depend strongly on the humidity of the surrounding atmosphere, and that the anhydrous forms obtained under different humidity conditions require different partial vapor pressures of water for their rehydration back to Th. Figure 12.4 shows the resulting pathways that link the different solid forms of trehalose. In dry atmospheres, Tα is formed at 105 °C on dehydration of Th. It is highly hygroscopic and is readily rehydrated back to Th even when exposed to atmospheres of low humidity, consistent with our previous results from FTIR measurements [35, 42]. In highly humid atmospheres, on the other hand, Th undergoes on dehydration a direct transformation (i.e., solid–solid conversion) into the stable anhydrous crystal Tβ at 90 °C, although a higher temperature (≈170 °C) is needed for the formation of this

Fig. 12.4. Phase and state transitions of trehalose


anhydrous crystal under dry conditions. Tβ is less hygroscopic, and a high partial vapor pressure of water is necessary for its rehydration back to Th. In atmospheres of intermediate humidity, the dehydration of Th leads to the formation of an unidentified state Tε, whose crystallinity is higher under more humid atmospheres. In addition to the effect of humidity, we recently investigated the effect of atmospheric pressure on the phase transition of Th using an in-house DTA apparatus and obtained valuable information for the definitive assignment of the endothermic peaks due to melting or dehydration of Th [49].

Recently, by using positron annihilation lifetime spectroscopy, Kilburn et al. observed that in the dihydrate Th, water is organized as a confined one-dimensional fluid in channels of fixed diameter that allow activated diffusion of water in and out of the crystallites [50]. They presented direct real-time evidence of water molecules unloading reversibly from these channels, which thereby act as both a sink and a source of water in low-moisture systems. They postulated that this behavior may provide the overall stability required to keep organisms viable under dehydration conditions. The empty and water-filled channels may correspond to the crystal structures of the anhydrous form Tα and the dihydrate Th, respectively [50]. Therefore, among the interconversion processes shown in Fig. 12.4, the formation of Tα and its reversible conversion to Th may be particularly important for a better understanding of the protective action of trehalose. To obtain more detailed insight into the biological role of Tα, we recently determined its crystal structure [43]. The features of Tα are summarized as follows. The trehalose molecule in Tα has an approximate C2 symmetry, as does that in Th and Tβ. The molecular arrangement in Tα is very similar to that in Th, and some hydrogen bonds are preserved in both. One of the most important findings is that there are two different holes, hole-1 and hole-2, along one crystal axis. Hole-1 is constructed by trehalose molecules with a screw diad at its center, while hole-2 has a smaller diameter and no symmetry operator (Fig. 12.5). Owing to the screw axis at the center of hole-1, hollows with diameters roughly equal to that of hole-1 are present at the side of the hole. Hole-1 and the side pockets formed by these hollows correspond to the positions of the two water molecules of the dihydrate. Therefore, hole-1 is considered to be a one-dimensional water channel with side pockets. Additionally, molecular and crystal energy calculations demonstrated that the intermolecular interactions between trehalose molecules in Tα are weaker than those in Tβ, which accounts for the more rapid water uptake into the Tα crystal.

12.3.2 Glassy State of Trehalose

Table 12.3 lists the glass transition temperatures for all of the naturally occurring gluco-disaccharides, i.e., disaccharides composed of two glucose units, and for sucrose. For trehalose, the value of 115 ± 2 °C is currently accepted as the Tg of anhydrous trehalose [51–55], although various Tg values have been


Fig. 12.5. Crystal structures of (a) Tα and (b) Th along the a-axis. Trehalose molecules are drawn by a space-filling model with a partial wireframe model. There are two different holes in Tα: hole 1 and hole 2. The diameter of each circle is 2.1 Å. In Th, these holes are occupied by two crystal water molecules

Table 12.3. Glass transition temperatures Tg and the activation energies ΔErelax of enthalpy relaxation of dry amorphous disaccharides (a)

Sugar           Tg (°C)       ΔErelax (kJ mol⁻¹)
Trehalose       116 (113)     401.0 (360.8)
Neotrehalose    105           223.4
Kojibiose       118.3         273.1
Sophorose       88.6          283.3
Nigerose        81.1          270.5
Laminaribiose   106.7         314.1
Maltose         84.5 (90)     292.4 (245.4)
Isomaltose      89.5          279.9
Cellobiose      100.2         307.4
Gentiobiose     94.4          284.1
Sucrose         (68)          (212.2)

(a) The data in parentheses were cited from [55]. The other data were cited from [57].
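For readers who wish to work with Table 12.3 numerically, the following Python sketch ranks the ten gluco-disaccharides by the ΔErelax values attributed to [57]. The dictionary merely transcribes the table as read here (sucrose is omitted because only a value from [55] is given for it); the variable names are illustrative and not part of the original work.

```python
# Activation energies of enthalpy relaxation (kJ/mol), transcribed from
# Table 12.3 (values attributed to [57]; parenthesized values from [55] omitted).
delta_e_relax = {
    "Trehalose": 401.0,
    "Neotrehalose": 223.4,
    "Kojibiose": 273.1,
    "Sophorose": 283.3,
    "Nigerose": 270.5,
    "Laminaribiose": 314.1,
    "Maltose": 292.4,
    "Isomaltose": 279.9,
    "Cellobiose": 307.4,
    "Gentiobiose": 284.1,
}

# Sort from the largest to the smallest activation energy.
ranked = sorted(delta_e_relax.items(), key=lambda item: item[1], reverse=True)

# Difference between the top entry and the runner-up.
gap_to_runner_up = ranked[0][1] - ranked[1][1]

for sugar, value in ranked:
    print(f"{sugar:<14s} {value:6.1f} kJ/mol")
print(f"Gap between first and second: {gap_to_runner_up:.1f} kJ/mol")
```

In this transcription trehalose heads the list, with laminaribiose second.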

reported so far, ranging from 73 °C [56] to 116.9 °C [52]. The Tg of trehalose is the highest among the gluco-disaccharides, although the value is not anomalous. In addition, trehalose has another noteworthy glass-forming property that favors its function as a biological protective agent for long-term storage in dry states. Kawai et al. reported the activation energy of enthalpy relaxation, ΔErelax, for trehalose, maltose and sucrose [55]. ΔErelax is thought to be the activation energy of the translational diffusion of the molecules forming the glass of interest, and is thus a direct measure of the chemical and physical stability of the vitrified matrix. According to their results, the ΔErelax


of trehalose is larger than those of maltose and sucrose by >150 kJ mol⁻¹. Recently, we extended a similar study to all the naturally occurring gluco-disaccharides and showed that trehalose has the highest Tg and the largest ΔErelax value (Table 12.3) [57]. These results indicate that trehalose is more easily vitrified than other well-known disaccharides and that its glassy state is more stable than theirs. Generally, water is a good plasticizer for glassy matrices: with an increase in water content, the glass transition temperature is lowered, which is unfavorable for preserving biomaterials in the dry state. Aldous et al. focused on the ability of a given sugar to form crystalline hydrates from the anhydrous amorphous state. They found that trehalose can crystallize as a hydrate from the amorphous state, leading to a decrease in the residual water content of the remaining amorphous matrix [58]. As a result, the glass transition temperature Tg becomes higher, or at least the Tg depression caused by plasticization through water uptake can to some extent be avoided [51,58]. In a similar way, the coexistence of Tα crystals as a sink of water is useful for reducing the risk of plasticization. Indeed, Nagase et al. reported that if a mixture of Tα and amorphous trehalose is exposed to moisture, water is taken up more rapidly through the transformation of Tα to Th than through absorption by amorphous trehalose [36].
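The plasticizing effect of water on a sugar glass can be made concrete with a small numerical sketch. The chapter does not state which mixing rule underlies such estimates; the Gordon–Taylor equation used below is a common choice for binary glass formers and is adopted here purely as an assumption. The Tg of dry trehalose is taken from the text (115 °C), whereas the Tg of water (about 136 K) and the constant `k` are illustrative placeholders rather than fitted values.

```python
def gordon_taylor_tg(w_water, tg_sugar_k=388.15, tg_water_k=136.0, k=5.0):
    """Estimate the glass transition temperature (in kelvin) of a
    sugar-water mixture with the Gordon-Taylor equation.

    w_water    : weight fraction of water (0 to 1).
    tg_sugar_k : Tg of the dry sugar in K (115 degC for trehalose, from the text).
    tg_water_k : Tg of amorphous water in K (assumed value).
    k          : Gordon-Taylor constant (illustrative placeholder, not fitted).
    """
    if not 0.0 <= w_water <= 1.0:
        raise ValueError("w_water must lie between 0 and 1")
    w_sugar = 1.0 - w_water
    numerator = w_sugar * tg_sugar_k + k * w_water * tg_water_k
    denominator = w_sugar + k * w_water
    return numerator / denominator


if __name__ == "__main__":
    for w in (0.0, 0.02, 0.05, 0.10):
        tg_c = gordon_taylor_tg(w) - 273.15
        print(f"water fraction {w:.2f}: Tg = {tg_c:.1f} degC")
```

Under these assumptions even a few weight percent of water lowers Tg substantially, which is why a crystalline water sink such as Tα is helpful for keeping the amorphous matrix in the glassy state.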

12.4 Biological Roles of Trehalose

12.4.1 Possible Mechanisms of Anhydrobiosis

It has been widely accepted that trehalose acts as a stabilizer that protects biomolecules against water stresses such as desiccation, freezing and osmotic pressure [1, 2, 4–9]. Among these, the functional mechanism for desiccation stress has been extensively investigated [1, 59, 60], and three main mechanisms have been proposed so far [60]. The vitrification hypothesis suggests that the mobility of cellular components caged by sugar glasses is severely restricted and that they can thus escape destruction [59, 60]. The water replacement hypothesis suggests that sugars can replace water molecules by forming hydrogen bonds with polar residues of lipid and/or protein molecules, thereby stabilizing their structures in the absence of water [59,60]. The water entrapment hypothesis suggests that sugars concentrate water near the surfaces of membranes and proteins, thus preserving them from destruction [61–63]. Currently, it is thought that these three mechanisms are not mutually exclusive [59, 60]. For instance, vitrification may occur simultaneously with direct interactions between the sugar and the polar residues of biomolecules. As shown in Table 12.3, dry trehalose vitrifies at a higher temperature than do other disaccharides, and the resultant glassy matrix is highly stable in the sense that enthalpy relaxation occurs with more difficulty than in other


disaccharides. Thus, trehalose is one of the sugars with which the vitrification mechanism would work most efficiently. As shown in Table 12.2, trehalose has a larger hydration number and can thereby offer a larger number of hydrogen-bonding sites to a biomolecule in place of water. Thus, trehalose is also one of the sugars well suited to the water replacement mechanism. Its high hydration ability (a larger hydration number and a larger retardation of the water dynamics) is, of course, a great advantage for the water entrapment mechanism as well. In the past decades, the above three mechanisms have been demonstrated by various in vitro experiments [59, 60] and computer simulations [62–69]. Among them, a recent model study by Albertorio et al. should be noted because it pointed to the importance of the α,α-(1-1) linkage of trehalose for preserving the membrane structure [70]. They found that disaccharide molecules containing an α,α-(1-1) linkage are more effective than other disaccharides at retaining the bilayer structure in the absence of water. They inferred that the specific arrangement of the hydroxyl groups in α,α-trehalose may optimize the hydrogen-bonding arrangement for water replacement, because somewhat less protective behavior was afforded by α,α-galactotrehalose, which differs from α,α-trehalose only by the epimerization of the two 4-hydroxyl groups from the equatorial to the axial position. The vitrification mechanism has been demonstrated well for anhydrobiotic plants, although in these cases the vitrified sugar is not trehalose but sucrose, probably mixed with proteins [60]. Our previous studies using lyophilized yeast cells provided results that could be reasonably interpreted by the water replacement mechanism [71] or the water entrapment mechanism [72]. However, not enough direct evidence has accumulated to show that these mechanisms work in vivo. In order to obtain rigorous evidence for the functional mechanism of trehalose in vivo, we recently performed a study using the larvae of the sleeping chironomid, Polypedilum vanderplanki, as described below [73].

12.4.2 Strategy for Desiccation Tolerance in the Sleeping Chironomid

P. vanderplanki is the most complex and largest multicellular animal capable of anhydrobiosis [74,75]. The larvae dwell in temporary rock pools in semiarid regions of Africa. The small, shallow pools occasionally dry up, so that the larvae become severely desiccated, but they are able to recover after rehydration when the next rain arrives. P. vanderplanki can repeat the process of dehydration/rehydration several times as long as it remains in the larval stage. According to one report, larvae of P. vanderplanki can recover from desiccation lasting up to 17 years [76]. Watanabe et al. succeeded in inducing P. vanderplanki larvae to enter anhydrobiosis under laboratory conditions and found that high levels of trehalose (about 20% of the dry body mass) are synthesized in the dehydrated larvae [3, 77].


We focused our attention on seeking evidence for the vitrification and water replacement mechanisms in P. vanderplanki [73]. For this purpose, two kinds of dehydrated larvae with very different trehalose contents were prepared by regulating the dehydration rate. That is, larvae accumulating a large amount of trehalose (36 μg per individual) were obtained by slow dehydration over 72 h, while those with comparatively little trehalose (2 μg per individual) were obtained by quick dehydration within several hours. No apparent difference was found between the two preparations in the contents of total protein, triacylglycerol and water (≈3 wt. % per dry individual). The trehalose distribution in the larval body was then visualized by FTIR imaging spectroscopy. Trehalose is known to exhibit a unique vibration band at 992 cm⁻¹ [35], which is assigned to its α,α-1,1 linkage. Indeed, a clear shoulder peak was observed at this position for a slowly dehydrated larva, whereas the corresponding peak was not detected for a quickly dehydrated one. The intensity distribution of this peak for the slowly dehydrated larva is shown in Fig. 12.6, where the peak intensity at 992 cm⁻¹ is normalized with respect to that of the amide II band. This clearly indicates that trehalose is almost uniformly distributed throughout the larval body, at least at this level of resolution. The physical state of the trehalose accumulated in the larval body was examined using DSC. The resulting thermogram for the slowly dehydrated larvae exhibited a clear step-wise baseline shift (Fig. 12.7a), indicating the occurrence of a glass transition. The onset, middle and end temperatures

Fig. 12.6. Optical (a) and FTIR (b) imaging data for a slowly dehydrated larva. Mapped are the intensities of the characteristic 992 cm⁻¹ peak, normalized by dividing by the intensity of the amide II band
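The normalization described in the caption of Fig. 12.6 (intensity at 992 cm⁻¹ divided by the amide II intensity, pixel by pixel) can be sketched as follows. This is a generic illustration, not the processing actually used in [73]: the function name, the array inputs and the threshold `min_amide2` for masking pixels with negligible protein signal are assumptions made here.

```python
import numpy as np

def normalized_band_map(i_992, i_amide2, min_amide2=1e-6):
    """Pixel-wise ratio of the 992 cm^-1 band intensity to the amide II
    band intensity.

    i_992, i_amide2 : 2-D arrays of band intensities with identical shape
                      (hypothetical inputs from an FTIR imaging data set).
    min_amide2      : pixels whose amide II intensity does not exceed this
                      value are set to NaN (assumed masking rule, e.g. for
                      pixels outside the specimen).
    """
    i_992 = np.asarray(i_992, dtype=float)
    i_amide2 = np.asarray(i_amide2, dtype=float)
    if i_992.shape != i_amide2.shape:
        raise ValueError("intensity maps must have the same shape")
    ratio = np.full(i_992.shape, np.nan)
    valid = i_amide2 > min_amide2
    ratio[valid] = i_992[valid] / i_amide2[valid]
    return ratio
```

Dividing by a protein band compensates, to a first approximation, for variations in sample thickness across the image, so that the map reflects the relative trehalose content rather than the amount of material in the beam.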


Fig. 12.7. Glass in anhydrobiotic larvae and their recovery after heat treatments. (a) DSC thermograms for slowly and quickly dehydrated larvae. (b) Dependence of the recovery rate after rehydration on exposure to high temperatures in slowly (ﬁlled symbols) and quickly (open symbols) dehydrated larvae. Circles and triangles show recovery after exposure to high temperature for 5 min and 1 h, respectively. Data from [73]

were 62 °C, 65 °C, and 71 °C, respectively, meaning that the sample was in the glassy state below 62 °C and in the rubbery state above 71 °C. In contrast, neither a baseline shift nor a peak was observed for the quickly dehydrated sample. We then compared the viability of the slowly and quickly dehydrated larvae by determining the recovery rate after rehydration following exposure to different temperatures for 5 min or 1 h (Fig. 12.7b). For the slowly dehydrated sample, a high recovery rate of 60–90% was observed for exposure up to 50 °C, and longer exposures tended to give a slightly lower survival rate. Exposure to higher temperatures gradually decreased the recovery rate, and no survival occurred beyond ca. 100 °C. The quickly dehydrated larvae never recovered, regardless of the temperature employed. Interestingly, the glass transition curve for the slowly dehydrated sample correlates closely


Fig. 12.8. (a) FTIR spectra in the region of 1,280–1,200 cm⁻¹, which shows the asymmetric stretching vibration of P=O atomic groups. (b) Temperature dependence of FTIR bands in the region 2,849–2,856 cm⁻¹, which shows the symmetric CH2 stretching vibration. Data from [73]

with the corresponding recovery rate. This result clearly indicates that trehalose acts as a protectant only when it is in the glassy state; in other words, vitrification of trehalose is a prerequisite for keeping the anhydrobiotic state stable in P. vanderplanki. Evidence for the water replacement mechanism was obtained from measurements of the P=O asymmetric stretching vibration appearing at 1,280–1,200 cm⁻¹, which sensitively reflects the hydrogen-bonding interactions of the head groups of phospholipids with other molecules. As shown in Fig. 12.8a, the peak position of this band was slightly lower in slowly than in quickly dehydrated larvae. This suggests that in the former sample hydrogen bonds are formed between the polar head groups of phospholipids and, probably, trehalose, although compounds other than phospholipids, such as DNA and RNA, could also contribute to such a peak shift. Indeed, our previous report indicated that the P=O stretching vibration of dry DNA is perturbed


by the addition of trehalose [78]. To further assess whether the above shift is related to a change in the physical state of the phospholipids, we focused on the symmetric CH2 stretching vibration of the fatty acid chains. This peak shifted from 2,850 to 2,854 cm⁻¹ with increasing temperature (Fig. 12.8b) and, interestingly, the gel-to-liquid crystalline transition temperature, defined as the midpoint of the transition curve, is significantly lowered in the slowly dehydrated larvae. It should be noted that the cellular membranes in this sample are in the liquid crystalline state at room temperature in spite of the absence of water. Thus an unfavorable phase transition is avoided during the subsequent rehydration process, a key factor that allows cellular membranes to recover successfully from desiccation. Combining the observations for both vibration peaks, it is reasonable to conclude that trehalose perturbs the head groups of phospholipids through direct hydrogen-bonding interactions, which allows the membrane to be kept in the liquid crystalline state even in a highly dehydrated environment.

Taken together, our results indicate that the vitrification and water replacement mechanisms are both involved in anhydrobiosis in P. vanderplanki, and that trehalose is a major player in this intriguing biological phenomenon. The successful anhydrobiotic larva is just like a substance assembled mainly from biological organic molecules, with the spatial arrangements required for normal physiology largely maintained by immobilization in the biological glass. The larvae of P. vanderplanki can reversibly convert from the living state to such an amorphous solid state by replacing the normal intracellular medium with trehalose to enter anhydrobiosis, and vice versa. Finally, the possibility should be pointed out that factors other than trehalose may be involved when the vitreous state is formed in the body of P. vanderplanki. This is suggested in part by the fact that the glass transition temperatures of the slowly dehydrated larvae shifted less with an increase in water content than expected from theoretical values calculated for a binary mixture of pure trehalose and water. For plant anhydrobiotes, it has been reported that proteins as well as soluble sugars may be vitrified in the cytoplasmic glass [79, 80]. Our results do not exclude such a possibility. Late embryogenesis abundant (LEA) proteins are known to occur in various anhydrobiotic organisms [81] and have been suggested to reinforce biological glasses [82]. Recently, LEA-like proteins were also found in P. vanderplanki [83]. Therefore, further studies are required for a complete understanding of desiccation tolerance in P. vanderplanki.

12.4.3 Other Biological Roles of Trehalose

Protein stabilization by trehalose in aqueous solution is another example of the biological functions of trehalose [84–86]. According to reports by Timasheff and coworkers, preferential hydration should occur when the interaction of a cosolvent with water is stronger than its interaction with a protein [85]. In other words, a water structure maker like trehalose is a good cosolvent causing


preferential hydration. The preferential hydration effect should lead to a loss in the entropy of solvation upon protein denaturation, rendering the unfolded state even more unstable and shifting the equilibrium in favor of the native state. In principle, the preferential hydration model should apply not only to proteins but also to other biological components such as membranes. Indeed, our early ³¹P NMR study indicated that trehalose stabilizes hydrated unilamellar liposomes by increasing the packing density of the constituent phospholipid molecules, leading to inhibition of liposome fusion [87]. According to the preferential hydration model, trehalose is expected to promote the aggregation of unfolded proteins because the aggregated state should have a smaller protein–solvent interface than the individually dissolved state. However, in contradiction to this expectation, trehalose has been shown to suppress the aggregation of proteins associated with Huntington’s and Alzheimer’s diseases. Tanaka et al. reported that trehalose could be used to inhibit the aggregation of polyglutamine in vivo in a rat model for Huntington’s disease [88], while an in vitro study by Liu et al. indicated that this sugar effectively inhibits the aggregation and neurotoxicity of β-amyloid (Aβ) 40 and 42 [89]. A similar inhibitory effect on protein aggregation was also observed in yeast cells during heat shock [90]. Although the underlying mechanism of these phenomena is far from being fully understood at present, a key to solving this issue may lie in another peculiar property of trehalose, namely, a specific interaction with hydrophobic compounds, as described below.

In addition to the protective function against water stresses, there is growing evidence that trehalose is capable of protecting biological molecules against oxidative damage [7–9]. In particular, we have extensively studied its antioxidant function toward unsaturated fatty acids (UFAs) from both experimental and theoretical viewpoints [91, 92]. The autoxidation of a UFA is initiated by a reaction in which activated oxygen or free radicals abstract a hydrogen atom from the allylic position of the UFA as follows:

–CH2–CH=CH–CH2–CH=CH–CH2– → –CH2–CH=CH–•CH–CH=CH–CH2–

We showed that trehalose suppresses this reaction, while other disaccharides, such as sucrose, maltose and neotrehalose, showed a negligible effect [8]. According to detailed NMR analyses, trehalose interacts specifically with UFAs possessing cis-type C=C double bond(s), such as LA (18:2, cis), with a 1:1 stoichiometry. A theoretical model for the trehalose–cis C=C bond complex is shown in Fig. 12.9, where the OH-6 of trehalose interacts with the π-orbital at the midpoint of the double bond and, simultaneously, the OH-3 forms a C–H···O type hydrogen bond at one end of the double bond. The complex formation energy (stabilization energy) was estimated to be 5.52 and 7.78 kcal mol⁻¹ from quantum chemical calculations at the HF/6-31G** and B3LYP/6-31G** levels of theory, respectively. On complex


Fig. 12.9. Optimized structure of the trehalose/2-butene complex obtained from the HF/6-31G** calculation

formation, the activation energy of the above hydrogen abstraction reaction was shown to increase greatly: in the isolated state it is 14.8 kcal mol⁻¹ (UHF/6-31G**) and 9.2 kcal mol⁻¹ (UB3LYP/6-31G**), while in the complexed state it is 37.8 kcal mol⁻¹ (UHF) and 38.6 kcal mol⁻¹ (UB3LYP). These results indicate that the multiple OH···π and CH···O hydrogen bonds with trehalose significantly modify the electronic structure of the diene moiety, leading to a kinetic suppression of the hydrogen abstraction reaction.

The above finding for the complexation of trehalose with a cis double bond leads us to expect that this sugar can also interact with benzene and its derivatives, because their double bonds are cis-like. In fact, our preliminary study using NMR and molecular dynamics simulation indicated that in aqueous solution a benzene molecule binds to trehalose in such a manner that the dehydration penalty is minimized [93]. Specifically, it binds to the convex side of trehalose, where there are less hydrated regions, as can be seen from Fig. 12.3. This peculiar interaction may account for the suppressive effect of this sugar on peptide or protein aggregation described above. Namely, there is a possibility that trehalose binds to aromatic side chains that become exposed to the aqueous phase upon unfolding and consequently acts as a spacer that inhibits direct contact between unfolded protein molecules. This interesting issue is now under investigation in our laboratory.
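To give a feeling for the size of the kinetic suppression implied by the calculated activation energies of hydrogen abstraction quoted above, the following Python sketch converts the increase in barrier height into a ratio of rate constants with the Arrhenius expression. It is a rough estimate under assumptions that are not part of the original study: equal pre-exponential factors for the isolated and complexed states, a temperature of 298.15 K, and gas-phase barriers taken at face value.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def rate_ratio(ea_isolated, ea_complexed, temperature_k=298.15):
    """Ratio k(complexed)/k(isolated) from the Arrhenius equation,
    assuming identical pre-exponential factors (an assumption made here).

    Activation energies are given in kcal/mol.
    """
    delta = ea_complexed - ea_isolated
    return math.exp(-delta / (R_KCAL * temperature_k))

# Activation energies (kcal/mol) quoted in the text: (isolated, complexed).
barriers = {
    "UHF/6-31G**": (14.8, 37.8),
    "UB3LYP/6-31G**": (9.2, 38.6),
}

if __name__ == "__main__":
    for level, (ea_free, ea_bound) in barriers.items():
        ratio = rate_ratio(ea_free, ea_bound)
        print(f"{level}: k_complexed / k_isolated = {ratio:.1e}")
```

Both levels of theory give a ratio many orders of magnitude below unity, in line with the qualitative conclusion that complexation with trehalose strongly retards hydrogen abstraction; the absolute numbers should not be over-interpreted.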

12.5 Conclusion

The physicochemical uniqueness of trehalose originates from the presence of an α,α-1,1-linkage, which brings about a rigid conformation with a clamshell-like shape. Because of this conformational rigidity, trehalose has a


unique hydration characteristic: a spatially anisotropic but dynamically stable hydration shell. This in turn brings about several characteristic thermodynamic properties of its aqueous solution. Trehalose has a dual character as both a good water structure maker and a breaker. Additionally, trehalose has characteristic solid-state properties. In particular, its glassy properties, with not only a high Tg but also a high ΔErelax, make trehalose a desiccation protectant superior to other saccharides. Furthermore, the actual glassy matrix of trehalose may be partially protected from devitrification through coexistence with the anhydrous Tα crystal, which acts as a sink of water. As revealed for P. vanderplanki, anhydrobiotes successfully maintain their viability in the dry state by making good use of these characteristic features of trehalose, especially through the vitrification and water replacement mechanisms. The view described here should bring about a significant advance in understanding the limitations and further possibilities of this sugar in various applications, and in undertaking the molecular design of more effective protectants in the future.

With progress in the understanding of the fundamental aspects of trehalose, much effort has been made to confer desiccation tolerance on nonanhydrobiotic organisms by introducing trehalose into target cells. Although human platelets were recently freeze-dried successfully with trehalose [94], a major obstacle to application is that cellular membranes are usually impermeable to trehalose. Several attempts to introduce trehalose into target cells have been made and have met with a certain degree of success. For example, introduction of bacterial trehalose biosynthetic enzyme genes into human fibroblasts increases the intracellular trehalose concentration and results in enhanced desiccation tolerance [95]. In another approach, engineered switchable pores or extracellular nucleotide-gated channels (engineered hemolysin or P2X7 purinergic receptor pore) were created in cellular membranes to allow trehalose uptake [96]. Most recently, Kikawada et al. isolated a novel trehalose transporter (TRET1) from P. vanderplanki [97]. The transport activity of TRET1 is stereochemically specific for trehalose, and the direction of transport is reversible, depending on the concentration gradient of trehalose. By combining the knowledge obtained from the study of P. vanderplanki with these new techniques, it is expected that long-term storage in a dry state will become possible for a variety of cells, tissues and even organs.

Acknowledgments

This work was supported in part by the Program for Promotion of Basic Research Activities for Innovative Biosciences (PROBRAIN) and also in part by Grants-in-Aid for Scientific Research on Priority Areas (no. 16041212 and 18031012) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.


References

1. J.H. Crowe, F.A. Hoekstra, L. Crowe, Annu. Rev. Physiol. 54, 579 (1992)
2. J.S. Clegg, Comp. Biochem. Physiol. B 128, 613 (2001)
3. M. Watanabe, T. Kikawada, N. Minagawa, F. Yukuhiro, T. Okuda, J. Exp. Biol. 205, 2799 (2002)
4. R.A. Ring, H.V. Danks, Cryo Lett. 19, 275 (1998)
5. P.O. Montiel, Cryo Lett. 21, 83 (2000)
6. A.V. Laere, FEMS Microbiol. Rev. 63, 201 (1988)
7. N. Benaroudj, D.H. Lee, L.A. Goldberg, J. Biol. Chem. 276, 24261 (2001)
8. K. Oku, M. Kurose, M. Kubota, S. Fukuda, M. Kurimoto, Y. Tujisaka, M. Sakurai, Nippon Shokuhin Kagaku Kougaku Kaishi (in Japanese) 50, 133 (2003)
9. R.S. Herderio, M.D. Pereira, A.D. Panek, E.C.A. Eleutherio, Biochim. Biophys. Acta 1760, 340 (2006)
10. M. Sola-Penna, J.R. Meyer-Fernandes, Arch. Biochem. Biophys. 360, 10 (1998)
11. M-O. Portmann, G. Birch, J. Sci. Food Agric. 69, 275 (1995)
12. P.K. Banipal, T.S. Banipal, B.S. Lark, J.C. Ahluwalia, J. Chem. Soc. Faraday Trans. 93, 81 (1997)
13. S.A. Galema, H. Høiland, J. Phys. Chem. 95, 5321 (1991)
14. S. Magazù, V. Villiari, P. Migliardo, G. Maisano, M.T.F. Telling, J. Phys. Chem. B 105, 1851 (2001)
15. H. Kawai, M. Sakurai, Y. Inoue, R. Chûjô, S. Kobayashi, Cryobiology 29, 599 (1992)
16. M. Heyden, E. Bründermann, U. Heugen, G. Niehues, D.M. Leitner, M. Havenith, J. Am. Chem. Soc. 130, 5773 (2008)
17. H. Uedaira, M. Ikura, H. Uedaira, Bull. Chem. Soc. Jpn. 62, 1 (1989)
18. C. Branca, S. Magazù, G. Maisano, P. Migliardo, J. Chem. Phys. 111, 281 (1999)
19. C. Branca, S. Magazù, G. Maisano, S.M. Bennington, B. Fåk, J. Phys. Chem. 107, 1444 (2003)
20. M.K. Dowd, P.J. Reilly, A.D. French, J. Comp. Chem. 13, 102 (1992)
21. A.D. French, G.P. Johnson, A-M. Keltere, M.K. Dowd, C.J. Cramer, J. Phys. Chem. A 106, 4988 (2002)
22. M. Sakurai, M. Murata, Y. Inoue, A. Hino, S. Kobayashi, Bull. Chem. Soc. Jpn. 70, 847 (1997)
23. Q. Liu, R.K. Schmit, B. Teo, P.A. Karplus, J.W. Brady, J. Am. Chem. Soc. 119, 7851 (1997)
24. G. Bonanno, R. Noto, S.L. Fornili, J. Chem. Soc. Faraday Trans. 94, 2755 (1998)
25. P.B. Conrad, J.J. de Pablo, J. Phys. Chem. A 103, 4049 (1999)
26. S.B. Engelsen, S. Pérez, J. Phys. Chem. B 104, 9301 (2000)
27. A. Lerbret, P. Bordat, F. Affouard, Y. Guinet, A. Hédoux, L. Paccou, D. Prévost, M. Descamps, Carbohydr. Res. 340, 881 (2005)
28. A. Lerbret, P. Bordat, F. Affouard, M. Descamps, F. Migliardo, J. Phys. Chem. B 109, 11046 (2005)
29. Y. Choi, K.W. Cho, K. Jeong, S. Jung, Carbohydr. Res. 341, 1020 (2006)
30. K. Akao, Y. Okubo, T. Ikeda, Y. Inoue, M. Sakurai, Chem. Lett. 8, 759 (1998)
31. T. Furuki, R. Abe, H. Kawaji, T. Atake, M. Sakurai, J. Chem. Thermodyn. 38, 1612 (2006)

32. G.M. Brown, D.C. Rohrer, B. Berking, C.A. Beevers, R.G. Gould, R. Simpson, Acta Crystallogr. B 28, 3145 (1972)
33. T. Taga, M. Senma, K. Osaki, Acta Crystallogr. B 28, 3258 (1972)
34. G.A. Jeffrey, R. Nanni, Carbohydr. Res. 137, 21 (1985)
35. K. Akao, Y. Okubo, N. Asakawa, Y. Inoue, M. Sakurai, Carbohydr. Res. 334, 233 (2001)
36. H. Nagase, T. Endo, H. Ueda, M. Nakagaki, Carbohydr. Res. 337, 167 (2002)
37. F. Sussich, R. Urbani, A. Cesàro, F. Princivalle, S. Brückner, Carbohydr. Lett. 2, 403 (1997)
38. F. Sussich, R. Urbani, F. Princivalle, A. Cesàro, J. Am. Chem. Soc. 120, 7893 (1998)
39. F. Sussich, C. Skopec, J. Brady, A. Cesàro, Carbohydr. Res. 334, 165 (2001)
40. H. Nagase, T. Endo, H. Ueda, T. Nagai, STP Pharm. Sci. 13, 269 (2003)
41. A.M. Gil, P.S. Belton, V. Felix, Spectrochim. Acta 52, 1649 (1996)
42. K. Akao, Y. Okubo, Y. Inoue, M. Sakurai, Carbohydr. Res. 337, 1729 (2002)
43. H. Nagase, N. Ogawa, T. Endo, M. Shiro, H. Ueda, M. Sakurai, J. Phys. Chem. B 112, 9105 (2008)
44. F. Sussich, A. Cesàro, J. Therm. Anal. Calorim. 62, 757 (2000)
45. J.F. Willart, A. De Gusseme, S. Hemon, M. Descamps, F. Leveiller, A. Rameau, J. Phys. Chem. B 106, 3365 (2002)
46. L.S. Taylor, P. York, J. Pharm. Sci. 87, 347 (1998)
47. L.S. Taylor, A.C. Williams, P. York, Pharm. Res. 15, 1207 (1998)
48. T. Furuki, A. Kishi, M. Sakurai, Carbohydr. Res. 340, 429 (2005)
49. T. Furuki, R. Abe, H. Kawaji, T. Atake, M. Sakurai, J. Therm. Anal. Calorim. 91, 561 (2008)
50. D. Kilburn, S. Townrow, V. Meunier, R. Richardson, A. Alam, J. Ubbink, Nat. Mater. 5, 632 (2006)
51. L.M. Crowe, D.S. Reid, J.H. Crowe, Biophys. J. 71, 2087 (1996)
52. D.P. Miller, J.J. de Pablo, J. Phys. Chem. B 104, 8876 (2000)
53. T. Chen, A. Fowler, M. Toner, Cryobiology 40, 277 (2000)
54. R. Surama, A. Pyne, R. Suryanarayanan, Pharm. Res. 21, 867 (2004)
55. K. Kawai, T. Hagiwara, R. Takai, T. Suzuki, Pharm. Res. 22, 490 (2005)
56. J.L. Green, C.A. Angell, J. Phys. Chem. 93, 2880 (1989)
57. K. Oku, M. Kubota, S. Fukuda, M. Kurimoto, Y. Tujisaka, M. Sakurai, Cryobiol. Cryotechnol. 50, 97 (2004)
58. B.J. Aldous, A.D. Affret, F. Franks, Cryo Lett. 16, 181 (1996)
59. J.H. Crowe, J.F. Carpenter, L.M. Crowe, Annu. Rev. Physiol. 60, 73 (1998)
60. J.H. Crowe, in Molecular Aspects of the Stress Response: Chaperones, Membranes and Networks, ed. by P. Csermely, L. Vígh (Landes Bioscience and Springer, New York, 2007), Chapter 13
61. P.S. Belton, A.H. Gil, Biopolymers 34, 957 (1994)
62. G. Cottone, G. Gicotti, L. Gordone, J. Cell. Phys. 117, 9862 (2002)
63. R.D. Lins, C.S. Pereira, P.H. Hünenberger, Proteins 55, 177 (2004)
64. A.K. Sum, R. Faller, J.J. de Pablo, Biophys. J. 85, 2830 (2003)
65. M.A. Villarreal, S.B. Díaz, E.A. Disalvo, G.G. Montich, Langmuir 20, 7844 (2004)
66. C.S. Pereira, R.D. Lins, I. Chandrasekhar, L.C.G. Freitas, P.H. Hünenberger, Biophys. J. 86, 2273 (2004)
67. A. Skibinsky, R.M. Venable, R.W. Pastor, Biophys. J. 89, 4111 (2005)

68. C.S. Pereira, P.P.H. Hünenberger, J. Phys. Chem. B 110, 15572 (2006)
69. L. Lerbret, F. Affouard, P. Bordat, A. Hédoux, Y. Guinet, M. Descamps, Chem. Phys. 345, 267 (2008)
70. F. Albertorio, V.A. Chapa, X. Chen, A.J. Diaz, P.S. Cremer, J. Am. Chem. Soc. 129, 10567 (2007)
71. F. Sano, N. Asakawa, Y. Inoue, M. Sakurai, Cryobiology 39, 80 (1999)
72. M. Sakurai, H. Kawai, Y. Inoue, A. Hino, S. Kobayashi, Bull. Chem. Soc. Jpn. 68, 3621 (1995)
73. M. Sakurai, T. Furuki, K. Akao, D. Tanaka, Y. Nakahara, T. Kikawada, M. Watanabe, T. Okuda, Proc. Natl. Acad. Sci. USA 105, 5093 (2008)
74. H.E. Hinton, J. Insect Physiol. 5, 286 (1960)
75. H.E. Hinton, Nature 188, 336 (1960)
76. S. Adams, Antenna 8, 58 (1985)
77. M. Watanabe, M. Kikawada, T. Okuda, J. Exp. Biol. 206, 2281 (2003)
78. B. Zhu, T. Furuki, T. Okuda, M. Sakurai, J. Phys. Chem. B 111, 5542 (2007)
79. W.Q. Sun, A. Leopold, Comp. Biochem. Physiol. A 117, 327 (1997)
80. J. Buitink, O. Leprince, Cryobiology 48, 215 (2004)
81. A. Tunnacliffe, M.J. Wise, Naturwissenschaften 114, 741 (2007)
82. W.F. Wolkers, S. McCready, W. Brandt, G.G. Lindsey, F.A. Hoekstra, Biochim. Biophys. Acta 1544, 196 (2001)
83. T. Kikawada, Y. Nakahara, Y. Kanamori, K. Iwata, M. Watanabe, B. McGee, A. Tunnacliffe, T. Okuda, Biochem. Biophys. Res. Commun. 348, 56 (2006)
84. T.-Y. Lin, S.N. Timasheff, Protein Sci. 5, 372 (1996)
85. G. Xie, S.N. Timasheff, Biophys. Chem. 64, 25 (1997)
86. J.K. Kaushik, R. Bhat, Proc. Natl. Acad. Sci. USA 278, 26458 (2003)
87. T. Nishiwaki, M. Sakurai, Y. Inoue, R. Chujo, S. Kobayashi, Chem. Lett. 19, 1841 (1990)
88. M. Tanaka, Y. Machida, S. Niu, T. Ikeda, N.R. Jana, H. Doi, M. Kurosawa, M. Nekooki, N. Nukina, Nat. Med. 10, 148 (2004)
89. R. Liu, H. Barkhordarian, S. Emadi, C.B. Park, M.R. Sierks, Neurobiol. Dis. 20, 74 (2005)
90. M.A. Singer, S. Lindquist, Mol. Cell 1, 639 (1998)
91. K. Oku, H. Watanabe, M. Kubota, S. Fukuda, M. Kurimoto, Y. Tsujisaka, M. Komori, Y. Inoue, M. Sakurai, J. Am. Chem. Soc. 125, 12739 (2003)
92. K. Oku, M. Kurose, M. Kubota, S. Fukuda, M. Kurimoto, Y. Tujisaka, A. Okabe, M. Sakurai, J. Phys. Chem. B 109, 3032 (2005)
93. A. Okabe, K. Oku, S. Fukuda, T. Furuki, M. Sakurai, Cryobiol. Cryotechnol. 53, 111 (2007)
94. G. Brumfiel, Nature 428, 14 (2004)
95. N. Guo, I. Puhlev, D.R. Brown, J. Mansbridge, F. Levine, Nat. Biotechnol. 18, 168 (2000)
96. A. Eroglu, M.J. Russo, R. Bieganski, A. Fowler, S. Cheley, H. Bayley, M. Toner, Nat. Biotechnol. 18, 163 (2000)
97. T. Kikawada, A. Saito, Y. Kanamori, Y. Nakahara, K. Iwata, D. Tanaka, M. Watanabe, T. Okuda, Proc. Natl. Acad. Sci. USA 104, 11585 (2007)

13 Protein Misfolding Diseases and the Key Role Played by the Interactions of Polypeptides with Water

C.M. Dobson

Abstract. The manner in which a newly synthesised chain of amino acids folds into the unique structure of a functional globular protein depends both on the intrinsic properties of the amino acid sequence and on multiple influences within the crowded aqueous milieu of the cell. But if proteins misfold, or fail to remain correctly folded, a common consequence is aggregation, a phenomenon that is involved in many highly debilitating and increasingly common medical disorders including Alzheimer’s disease and Type II diabetes. In this chapter we describe first how the concerted application of a wide range of experimental and theoretical techniques under laboratory conditions has allowed the fundamental principles of protein misfolding and aggregation to be understood at an atomic level. Then we discuss approaches that are designed to explore how these principles apply within living systems. Of particular importance in the context of this volume is the emergence of the role of aggregation propensity, closely linked to the solubility of specific states of proteins in the aqueous environment of the cell, as one of the most fundamental properties that are encoded in the sequences of peptide and protein molecules.

13.1 Introduction

One of the essential characteristics of a living system is the ability of its component molecular structures to self-assemble into their functional forms in a largely aqueous environment [1]. The folding of proteins into their compact three-dimensional structures is the most fundamental example of biological self-assembly; understanding this process therefore provides unique insight into the way in which evolutionary selection has influenced the properties of a molecular system for functional advantage [2]. The wide variety of highly specific structures that result from protein folding, and which serve to bring key functional groups into close proximity, has enabled living systems to develop astonishing diversity and selectivity in their underlying chemical processes. A key aspect of this process is the role played by water in the stability of the folded states of proteins in the cellular environment and in enabling the folding process to occur efficiently [3]. In addition, the evolutionary selection of the sequences of proteins ensures that they are able to retain solubility at the level required for the optimal functional efficiency of the organisms in which they are expressed [4].

Another important recent development in molecular biology is that we now know that the folding process does much more than simply generate biological activity, and that it is strongly coupled to many other biological processes including the trafficking of molecules to specific cellular locations and the regulation of cellular growth and differentiation. In addition, only correctly folded proteins have the ability to remain soluble in crowded biological environments and to interact selectively with their natural partners [2]. It is not surprising, therefore, that the failure of proteins to fold correctly, or to remain correctly folded, is the origin of a wide variety of pathological conditions [5].

In this chapter we explore the underlying nature and consequences of misfolding and its links with disease, with particular emphasis on the role of water. In order to achieve these objectives we show how it is possible to relate processes such as solubility, which can be studied in detail in the test tube, to their effects in living systems through the use of model organisms such as the fruit fly [6]. In this context we stress the remarkable correlations between specific physicochemical phenomena and biological phenomena ranging from locomotor abilities to lifespan.

13.2 The Importance of Normal and Aberrant Protein Folding in Biology

The manner by which a polypeptide chain folds to a specific three-dimensional protein structure has not until recently been understood at anything approaching the atomic level. The folded structures of the native states of many proteins are, however, known, and are thought almost always to correspond to the structures that are most thermodynamically stable under physiological conditions [7]. The role of water in determining this stability is critical, and globular proteins have a close-packed hydrophobic core with polar and charged groups on the surface. Burial of the hydrophobic residues is a major driving force in folding, and the nature and distribution of surface groups is crucial for ensuring solubility and independence within the crowded molecular environment of the cell [8]. Despite the fact that the native state is energetically favoured, the total number of possible conformations of any polypeptide chain is so large that a systematic search for this required structure during folding from an ensemble of highly unstructured species would take an astronomical length of time. It is now clear, however, that the folding process does not involve a series of mandatory steps between well-defined partially folded states, but rather a stochastic search of the many conformations accessible to a polypeptide chain [7, 9–11].
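The scale of this search problem can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the chain length of 100 residues, the three conformations per residue and the sampling rate of 10^13 conformations per second are assumed round numbers rather than values taken from the text, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_time_years(n_residues,
                                 states_per_residue=3,
                                 samples_per_second=1e13):
    """Years needed to visit every conformation once (assumed parameters).

    Works in log10 space so that very long chains do not overflow a float.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return 10 ** (log10_seconds - math.log10(SECONDS_PER_YEAR))


if __name__ == "__main__":
    for n in (20, 50, 100):
        years = exhaustive_search_time_years(n)
        print(f"{n:3d} residues: about {years:.1e} years")
```

Under these assumptions the estimate for a 100-residue chain exceeds the age of the universe by many orders of magnitude, which is why a systematic search cannot be the mechanism of folding.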

Natural proteins are able to fold to specific structures because, on average, native-like interactions between residues are more stable than non-native ones. The former are therefore more persistent and the polypeptide chain is able to find its lowest energy structure by a process of trial and error. Moreover, if the free energy surface or landscape has the right shape (see Fig. 13.1), only a minute fraction of all possible conformations is sampled by any given protein molecule during its transition from a random coil to a native structure [7, 9–11]. As the landscape, describing the free energies of the different possible conformations of the protein in its cellular environment (aqueous for the cytosolic proteins that we largely discuss here, although non-polar for at least some regions of membrane proteins), is encoded by the amino acid sequence, natural selection has enabled proteins to evolve so that they are able to fold rapidly and efficiently. Such a description is often referred to as the new view of protein folding and illustrates how the application of ideas from chemical physics and statistical mechanics has provided a robust and universal conceptual basis for understanding this complex biological process in molecular detail [7, 9–11].

Fig. 13.1. A highly schematic energy landscape for protein folding. This surface is derived from a computer simulation of the folding of a highly simplified model of a small protein. The surface serves to “funnel” the multitude of denatured conformations to the unique native structure. The critical region on a simple surface such as this one is the saddle point corresponding to the transition state, the barrier that all molecules must cross to be able to fold to the native state. Superimposed on this schematic surface is an ensemble of structures corresponding to the experimental transition state for the folding of a small protein; this ensemble was calculated by using computer simulations constrained by experimental data from mutational studies of the protein acylphosphatase [12]. The spheres represent the three “key residues” in the structure; when these residues have formed their native-like contacts, the overall topology of the native fold is established. The structure of the native state is shown at the bottom of the surface, while at the top are indicated schematically some contributors to the distribution of unfolded states that represent the starting point for folding. Also indicated are highly simplified trajectories for the folding of individual molecules. From [2]

In a living system, proteins are synthesised on ribosomes from the genetic information encoded in the cellular DNA. The nature of the subsequent folding process for a given protein varies significantly for different types of protein, and ranges from co-translational folding, in which the nascent chain becomes at least partially structured prior to its release from the ribosome, to folding within organelles such as mitochondria where folding may occur only after trafficking and translocation through membranes [13–15]. But despite such variety, the fundamental principles of folding, discussed above, are undoubtedly universal. And as incompletely folded proteins must inevitably expose to the solvent at least some regions of structure that are buried in the native state, they are prone to inappropriate interactions with other molecules within the crowded environment of a cell [16].
Living systems have therefore evolved a range of strategies to prevent such behaviour [14], including the presence of proteins that catalyse potentially slow steps in attaining the correct fold, such as proline isomerisation and disulphide bond formation, many varieties of molecular chaperones that play a vital role in reducing misfolding and aggregation, as well as quality control mechanisms that play crucial roles in targeting irreversibly misfolded proteins for degradation [14, 17–19].

It is increasingly recognised, however, that the process of protein folding is much more than just a fascinating example of the ability of a biological system to self-assemble to generate a functional state. Biological phenomena as apparently diverse as the translocation of proteins across membranes, their trafficking to particular locations or secretion to the outside world, the specificity of the immune response and the regulation of cell growth and proliferation are directly dependent on folding and unfolding events [2]. Failure to fold correctly, or to remain correctly folded, will therefore give rise to the malfunctioning of living systems and hence to disease [20–22]. Some of these diseases (e.g., cystic fibrosis [20] and some types of cancer [23]) result from the simple fact that if proteins do not fold correctly they will not be present in sufficient quantities to exercise their proper function; many such disorders, normally called loss of function diseases, are familial as the probability of misfolding is often greater in mutational variants than in the wild-type protein because of the likelihood of their decreased stability and reduced cooperativity. In other cases, proteins with a high propensity to misfold escape all the protective mechanisms and form intractable aggregates within cells or (more commonly) in extracellular space (Fig. 13.2). An increasing number of disorders (see Table 13.1), including Alzheimer’s and Parkinson’s diseases, the spongiform encephalopathies and type II diabetes, are known to be directly associated with the deposition of such aggregates in tissues including the brain, heart and spleen [5]. In the next section we shall look at the underlying molecular origins of the formation of these species, and of the crucial importance of the solubility of proteins in the environments in which they are located within living systems.

Fig. 13.2. Schematic representation of the possible mechanism of amyloid formation by a globular protein such as lysozyme. After synthesis on the ribosome, the protein folds in the endoplasmic reticulum (ER), aided by molecular chaperones that deter aggregation of incompletely folded species. The correctly folded protein is secreted from the cell and functions normally in its extracellular environment. Under some circumstances, the protein unfolds at least partially, and becomes prone to aggregation. This can result in the formation of fibrils and other aggregates that can accumulate in tissue. Small oligomeric or pre-fibrillar aggregates as well as highly organised fibrils and plaques can give rise to pathological conditions in some disorders, notably the neurodegenerative diseases. N, I and U refer to native, partially unfolded (intermediate) and unfolded states of the protein, respectively. QC refers to the quality control mechanism that prevents incompletely folded proteins being secreted from the ER. From [24]

Table 13.1. A selection of some of the major human diseases associated with misfolding and the formation of extracellular amyloid deposits or intracellular inclusions with amyloid-like characteristics (selected from [5], in which a more comprehensive list is given)

Disease | Aggregating protein or peptide | Length of protein or peptide^a | Structure of protein or peptide^b

Neurodegenerative diseases
Alzheimer’s disease^c | Amyloid β peptide | 40 or 42^d | Natively unfolded
Spongiform encephalopathies^c,e | Prion protein or fragments thereof | 253 | Natively unfolded (1–120) and α-helical (121–230)
Parkinson’s disease^c | α-Synuclein | 140 | Natively unfolded
Amyotrophic lateral sclerosis^c | Superoxide dismutase 1 | 153 | All-β, IG-like
Huntington’s disease^f | Huntingtin with long polyQ stretches | 3,144^g | Largely natively unfolded
Familial amyloidotic polyneuropathy^f | Mutants of transthyretin | 127 | All-β, prealbumin-like

Non-neuropathic systemic amyloidoses
AL amyloidosis^c | Immunoglobulin light chains or fragments thereof | ca. 90^d | All-β, IG-like
AA amyloidosis^c | Fragments of serum amyloid A protein | 76–104^d | All-α, unknown fold
Senile systemic amyloidosis^c | Wild-type transthyretin | 127 | All-β, prealbumin-like
Hemodialysis-related amyloidosis^c | β2-Microglobulin | 99 | All-β, IG-like
Finnish hereditary amyloidosis^f | Fragments of gelsolin mutants | 71 | Natively unfolded
Lysozyme amyloidosis^f | Mutants of lysozyme | 130 | α + β, lysozyme-fold

Non-neuropathic localised amyloidoses
ApoAI amyloidosis^f | Fragments of apolipoprotein AI | 80–93^d | Natively unfolded
Type II diabetes^c | Amylin | 37 | Natively unfolded
Medullary carcinoma of the thyroid^c | Calcitonin | 32 | Natively unfolded
Hereditary cerebral haemorrhage with amyloidosis^f | Mutants of amyloid β peptide | 40 or 42^d | Natively unfolded
Injection-localised amyloidosis^c | Insulin | 21 + 30^h | All-α, insulin-like

a The data do not refer to the number of amino acid residues of the precursor proteins, but to the lengths of the processed polypeptide chains that deposit into aggregates.
b This column reports the structural class and fold; both refer to the processed peptides or proteins that deposit into aggregates prior to aggregation and not to the precursor proteins.
c Predominantly sporadic although in some of these diseases hereditary forms associated with specific mutations are well documented.
d Fragments of various lengths are generated and reported in ex vivo fibrils.
e Five per cent of cases are infectious (iatrogenic).
f Predominantly hereditary although in some of these diseases sporadic cases are documented.
g Lengths refer to the normal sequences with non-pathogenic traits of polyQ.
h Human insulin consists of two chains (A and B with 21 and 30 residues, respectively) covalently bonded by disulphide bridges.

13.3 Protein Aggregation and Amyloid Formation

Each amyloid-associated disease involves predominantly the aggregation of a specific protein, although a range of other components including additional proteins and carbohydrates is incorporated into the deposits when they form in vivo [5]. In the case of neurodegenerative diseases, the quantities of aggregates involved can sometimes be so small as to be almost undetectable, whereas in some systemic diseases – such as that associated with lysozyme discussed below – literally kilograms of protein can be found in one or more organs [29]. The characteristics of the soluble forms of the 40 or so proteins involved in the well-defined amyloid disorders are very varied – they range from intact globular proteins to largely unstructured peptide molecules – but the aggregated forms have many common characteristics [30]. Amyloid deposits all show specific optical behaviour (such as birefringence) on binding certain dye molecules such as Congo red. The fibrillar structures typical of many of the aggregates have very similar morphologies (long, unbranched and often twisted structures a few nanometres in diameter) and a characteristic “cross-β” X-ray fibre diffraction pattern. The latter reveals that the organised core structure is composed of β-sheets whose strands run perpendicular to the fibril axis [30].

The ability of polypeptide chains to form such structures turns out, however, not to be restricted to the relatively small numbers of proteins associated with recognised clinical disorders, and, indeed, we have suggested that it could be a generic feature of polypeptide chains [21, 24]. Compelling evidence for the latter statement is that fibrils can be formed in vitro by many peptides and proteins with no known disease association, including such well-known and highly studied molecules as myoglobin [31], and also by homopolymers such as polyalanine, polythreonine or polylysine [32]. The latter finding indicates that the ability to form the amyloid structure does not need to be encoded in the sequence of the protein; in essence it is inherent in the intrinsic character of polypeptide chains, akin to analogous properties of many synthetic polymers, and this finding is reinforced by recent computer simulations of a simple model of a small homopolymeric peptide that self-assembles into a cross-β structure under a wide range of conditions (Fig. 13.3) [33]. Of particular interest is the fact that a variety of different mechanisms of assembly are observed in the simulations, ranging from the direct assembly of single β-sheets to a process in which the peptides coalesce into a disorganised oligomer within which structural reorganisation takes place to produce the cross-β structure; remarkably, the variety of assembly processes seen in an extended series of computer simulations has been observed experimentally in studies of a wide range of different systems [5].

We have determined the atomic-level structure of a peptide molecule in amyloid fibrils by solid-state NMR techniques, and the results show clearly the extended molecular conformation characteristic of β-strands and also the fact that the side chains are close-packed in remarkably specific orientations, at least within the central region of the structure [25]. Indeed, increasingly detailed models based on data from techniques such as X-ray fibre diffraction, cryo-electron microscopy (EM) and solid-state NMR are now emerging [5]; one early example showing characteristic features that have been observed in general terms in a range of more recent studies of a variety of different systems, representing variations on a common theme, is shown in Fig. 13.4 [26, 34]. An additional development is the ability to crystallise small peptides that show fibrillar-like assemblies within three-dimensional crystals, enabling the nature of the interactions between specific residues in amyloid-like structures to be explored [27]. But it is clear that the increasing capability of solid-state NMR spectroscopy to determine detailed three-dimensional structures of fibrillar structures [28] is the crucial step forward, and that it will soon lead to a knowledge of sufficient amyloid and amyloid-like structures to enable the determinants of their characteristic structural features to be understood in detail.

In addition to defining their molecular structures, it is of considerable interest to understand the physical properties of the fibrils and the nature of the forces that lead to their stability. To this end, we have been studying a range of different fibrils by means of experimental approaches originally developed within the rapidly developing field of nanotechnology, such as atomic force microscopy (AFM), in conjunction with computer simulation methods [35].
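One physical property that such measurements can address is bending rigidity. In standard beam theory the bending rigidity B of a uniform filament is the product of the Young's modulus E of the material and the cross-sectional moment of inertia I, so filaments built from the same material should fall on a straight line when B is plotted against I. The sketch below illustrates that scaling; the solid circular cross-section, the radii and the modulus value are assumptions chosen for illustration, not measurements from the work cited, and the function names are hypothetical.

```python
import math

# Hypothetical placeholder for the Young's modulus of the material, in Pa.
ASSUMED_MODULUS_PA = 3.0e9


def circular_moment_of_inertia(radius_m):
    """Second moment of area of a solid circular cross-section, in m^4."""
    return math.pi * radius_m ** 4 / 4.0


def bending_rigidity(youngs_modulus_pa, moment_of_inertia_m4):
    """Bending rigidity B = E * I, in N m^2."""
    return youngs_modulus_pa * moment_of_inertia_m4


if __name__ == "__main__":
    # Assumed filament radii of a few nanometres.
    for radius_nm in (2.0, 4.0, 8.0):
        inertia = circular_moment_of_inertia(radius_nm * 1e-9)
        rigidity = bending_rigidity(ASSUMED_MODULUS_PA, inertia)
        # B / I recovers the modulus, whatever the filament thickness.
        print(f"r = {radius_nm} nm: I = {inertia:.2e} m^4, "
              f"B = {rigidity:.2e} N m^2, B/I = {rigidity / inertia:.2e} Pa")
```

Because B/I returns the same modulus for every radius, samples that share their stabilising interactions share a slope, while a different class of material would lie on a line with a different slope.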

Fig. 13.3. Schematic illustration of the “condensation-ordering” mechanism of aggregation. This mechanism is indicated by results from computer simulations of the aggregation of a 12-residue peptide composed of identical amino acids, modelled using a simple “tube” model to describe the peptide structure [33]. The characteristic cross-β structure of amyloid fibrils is observed to emerge spontaneously, and can do so through a variety of apparently distinct processes that have been the focus of intense experimental and theoretical studies [5]. These different processes appear as different manifestations of a common underlying process and depend on the relative importance of hydrogen bonding and hydrophobic interactions. Highly hydrophobic polypeptide chains collapse first into disordered and highly dynamic oligomers and then rearrange into ordered assemblies, while more hydrophilic peptides assemble directly into an array of β-strands. As well as allowing the various processes involved in aggregation to be identified, these simulations enable the nature of the nucleation process to be revealed and provide insight into the origin of the toxicity of the oligomeric aggregates that appear in the intermediate stages of the process. From [33]

Our ﬁndings reveal that amyloid ﬁbrils represent a well-deﬁned class of highly organised materials with similar physical properties that can be compared and contrasted on the nanometre scale with well-established types of more conventional materials [35]. Speciﬁcally, the core structure of the ﬁbrils is stabilised primarily by interactions, particularly hydrogen bonds, involving the polypeptide main chain (Fig. 13.5). As the main chain is common to all

250

C.M. Dobson

Fig. 13.4. Comparison of examples of native and amyloid structures of protein molecules. On the left are ribbon diagrams of the native structures of three small proteins: an SH3 domain (top), myoglobin (bottom) and acylphosphatase (middle). The native structures diﬀer in their topologies and contents of α-helices and β-sheets resulting from the dominance of side-chain interactions within their highly evolved sequences. On the right is a molecular model of an amyloid ﬁbril (image kindly provided by Helen Saibil, Birkbeck College, London, from work reported in [26]. The ﬁbril was produced from the SH3 domain whose native structure is shown on the left, and consists of four “protoﬁlaments” that twist around one another to form a hollow tube with a diameter of approximately 6 nm. The β-strands (ﬂat arrows) are oriented perpendicular to the ﬁbril axis and are linked together by hydrogen bonds involving main chain amide and carbonyl groups, many of which are intermolecular, to form a continuous structure in each protoﬁlament. The protoﬁlaments are held together by much weaker interactions involving primarily side-chain contacts. As the main chain is common to all polypeptides, the core protoﬁlament structures of ﬁbrils from diﬀerent sequences have common features, diﬀering only in detail as a result of diﬀerences in the non-dominant eﬀects of side-chain packing. The arrow indicates that when the native states of globular proteins are destabilised, they tend to convert into the generic amyloid structure, as described in the text. From [34]

polypeptides, this observation explains why ﬁbrils formed from polypeptides of very diﬀerent sequences have marked similarities, particularly in the ﬁbril core structure, although diﬀerences in detail exist as a result of the inﬂuence of the packing of the side chains [24, 30, 35]. In some cases, only a fraction of the residues of a given protein may be involved in this core structure, with the remainder of the chain associated in some other manner with the ﬁbrillar assembly; in other cases, almost the whole polypeptide chain appears to be involved. The generic amyloid structure, characteristic of the polymeric character of polypeptide chains, contrasts strongly with the highly individualistic globular structures of most natural proteins; in these latter structures the

13 Protein Misfolding Diseases

251

Fig. 13.5. Comparison of the mechanical properties among diﬀerent classes of materials. The plot shows the correlation between the bending rigidity of a given material as a function of its cross-sectional moment of inertia. A linear relationship within a speciﬁc type of material indicates that the forces stabilising the diﬀerently sized samples of that material are identical. The dark grey band in the diagram encompasses the various examples of amyloid ﬁbrils formed from diﬀerent types of peptide or protein investigated in this study. The close correlation of the rigidity and moment of inertia indicates similar interactions in each type of ﬁbril, and analysis shows that the dominant contribution to the interactions are the main-chain hydrogen bonds between the β-strands of the amyloid cross-β structure; further support for this conclusion comes from the fact that spider silk, the strength of which is also attributed to main-chain hydrogen bonding, correlates closely with amyloid ﬁbrils. The mid-grey band encompasses materials such as actin ﬁlaments that are held together by amphiphilic interactions characteristic of amino-acid side chains; the two examples of amyloid protoﬁbrils examined in this study fall within this range, suggesting that strong main-chain interactions are not fully formed at this stage of the assembly process. Further details are given in [35], from which this ﬁgure is taken

interactions associated with the highly complementary packing of the side chains appear to override the main chain preferences (Fig. 13.4) [24, 35]. Because the interactions stabilising the two alternative types of ordered protein structure, the globular and amyloid forms, are similar in nature their stabilities can be similar under some conditions. Even though the ability to aggregate to form amyloid ﬁbrils appears to be generic, the propensity to do so under given circumstances can vary dramatically between diﬀerent sequences [5]. It has proved possible to correlate the relative aggregation rates of a wide range of peptides and proteins with

252

C.M. Dobson

Fig. 13.6. Calculated vs. observed changes in aggregation rates upon mutation. The experimental data relate to mutations of short peptides or natively unfolded proteins including amylin, the Aβ-peptide and α-synuclein. The calculated values are determined from an equation involving the changes in just three variables – hydrophobicity, charge and secondary structure propensities – caused by the mutations. The plot shows, for both experimental and calculated data, ln (υmut /υwt ), i.e., the natural logarithm of the aggregation rate of the mutant υmut divided by that of the wild-type molecule υwt . From [36]

physicochemical features of the molecules such as charge, secondary structure propensities and hydrophobicity (Fig. 13.6) [36] and indeed to predict the regions of a polypeptide chain that have the highest propensity to self-assemble and which are likely to be found in the ﬁbril cores [37]. In a globular protein the polypeptide main chain and the hydrophobic side chains are largely buried within the folded structure. Only when they are exposed, for example when the protein is partially unfolded (e.g., at low pH or as the result of destabilising mutations) or fragmented (e.g., by proteolysis), will conversion into amyloid ﬁbrils be facile. Recent studies are exploring in much greater detail than before the nature and rate of establishment of the equilibrium between the solution and ﬁbrillar states of a protein, and in essence deﬁning both the kinetic behaviour and the solubility of the peptides and proteins involved [38, 39]. The propensities of folded proteins to aggregate will therefore depend on the accessibility of such aggregation-prone species, a conclusion that is clearly demonstrated by detailed studies of the amyloidogenic mutational variants of lysozyme, which we have found to decrease the stability and cooperativity of the native state (Fig. 13.4) [40–43]. Indeed, these experiments show that the eﬀect of the disease-associated mutations is to decrease the energy diﬀerence between the native state and the intermediates populated in the normal folding of the protein, such that the latter are accessible to a much greater extent in the variants than in the wild-type protein [40]. The large mass of evidence now accumulated from studies of lysozyme has provided detailed


insight into many aspects of the likely origin of systemic amyloid disease; this topic has recently been reviewed and will not be discussed in this article [41]. Of particular interest, however, is the increasing recognition that ﬂuctuations of native-like species could be of critical importance in the aggregation of proteins to form amyloid structures under physiological conditions without the need for signiﬁcant perturbations of the environment in which the proteins normally function [44].
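The three-variable relation summarised in Fig. 13.6 can be illustrated with a short sketch. It assumes, purely for illustration, that ln(υmut/υwt) is a weighted sum of the changes in hydrophobicity, secondary structure propensity and charge caused by a mutation. The class name, the function names, the weights and the example values below are hypothetical placeholders; they are not the fitted parameters or the data of [36].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationEffect:
    """Changes caused by one mutation (mutant minus wild type), arbitrary units."""
    d_hydrophobicity: float  # change in hydrophobicity
    d_propensity: float      # change in secondary structure propensity term
    d_charge: float          # change in the magnitude of the net charge


# Hypothetical weights chosen only to illustrate the form of the relation;
# they are NOT the coefficients reported in the literature.
ILLUSTRATIVE_WEIGHTS = (0.6, 0.2, -0.5)


def ln_rate_ratio(effect, weights=ILLUSTRATIVE_WEIGHTS):
    """Return ln(v_mut / v_wt) under the assumed linear model."""
    w_hydr, w_prop, w_charge = weights
    return (w_hydr * effect.d_hydrophobicity
            + w_prop * effect.d_propensity
            + w_charge * effect.d_charge)


def rate_ratio(effect, weights=ILLUSTRATIVE_WEIGHTS):
    """Return v_mut / v_wt, the fold change in aggregation rate."""
    return math.exp(ln_rate_ratio(effect, weights))


# Made-up example: a mutation that raises hydrophobicity and lowers net charge.
example = MutationEffect(d_hydrophobicity=1.0, d_propensity=0.5, d_charge=-1.0)
print(f"ln(v_mut/v_wt) = {ln_rate_ratio(example):.2f}")
print(f"v_mut/v_wt     = {rate_ratio(example):.2f}")
```

In a model of this form, a mutation that changes none of the three variables leaves the predicted rate unchanged, and the sign of each weight determines whether an increase in that property accelerates or retards aggregation.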

13.4 Molecular Evolution and the Control of Protein Misfolding

It is apparent that biological systems have become robust not just by careful manipulation of the sequences of proteins but also by controlling, by means of molecular chaperones and degradation mechanisms, the particular conformational state adopted by a given polypeptide chain at a given time and under given conditions (Fig. 13.7). This process can be thought of as being analogous to, and just as fundamental and important as, the way that biology regulates and controls the various chemical transformations that take place in the cell by means of enzymes. And, just as the aberrant behaviour of enzymes can cause metabolic disease, the aberrant behaviour of the chaperone and other machinery regulating polypeptide conformations can contribute to misfolding and aggregation diseases [45, 46]. The ideas encapsulated in Fig. 13.7, therefore, serve as a physicochemical framework for understanding the fundamental events that underlie misfolding diseases. For example, many of the mutations associated with the familial forms of deposition diseases, as discussed earlier for lysozyme, increase the population of partially unfolded states, and hence increase the propensity to aggregate by decreasing the stability or cooperativity of the native structure [41, 47, 48]. Other familial diseases are associated with the accumulation of amyloid deposits whose primary components are fragments of native proteins; such fragments can be produced by aberrant processing or incomplete proteolysis, and are unable to fold into aggregation-resistant states. Other pathogenic mutations act by enhancing the propensities of such species to aggregate, for example, by increasing their hydrophobicity or decreasing their charge [36].
And, in the case of the prion disorders such as Kuru or Creutzfeldt–Jakob disease, it appears that ingestion of pre-aggregated states of an identical protein, e.g., by voluntary or involuntary cannibalism or by means of contaminated pharmaceuticals or surgical instruments, can increase dramatically the inherent rate of aggregation through seeding and breakage, and hence generate a mechanism for transmission [49, 50]. In some aggregation diseases, the large quantities of insoluble protein involved may physically disrupt speciﬁc organs and hence cause pathological behaviour [29]. But for neurodegenerative disorders, such as Alzheimer’s disease, the primary symptoms almost certainly result from toxicity associated


Fig. 13.7. A uniﬁed view of some of the multiple types of structure that can be formed by polypeptide chains. An unstructured chain, for example newly synthesised on a ribosome, may fold to a native structure, perhaps via one or more partially folded intermediates. It can, however, experience other fates such as degradation or aggregation. An amyloid ﬁbril is just one form of aggregate, but it is unique in having a highly organised structure, as shown in Fig. 13.5. The populations and interconversions of the various states are determined by their relative thermodynamic and kinetic stabilities under any given conditions. In living systems, however, transitions between the diﬀerent states are highly regulated by control of the environment, and by the presence of molecular chaperones, proteolytic enzymes and other factors. Failure of such regulatory mechanisms is likely to be a major factor in the onset of misfolding diseases. From [2]

with aggregation and are therefore often described as gain of function diseases [51, 52]. The early pre-ﬁbrillar aggregates of proteins associated with such diseases have been shown to be highly damaging to cells; by contrast, the mature ﬁbrils appear relatively benign [52–54]. Moreover, we have recently


found that similar aggregates of proteins that are not connected with any known diseases can be equally toxic to cells, both when added to cell culture medium [55] and also when microinjected into the brains of rats [56]. The generic nature of such aggregates and their eﬀects on cells has recently been supported by the remarkable ﬁnding that antibodies raised against early aggregates of Aβ cross-react with early aggregates of a range of diﬀerent peptides and proteins, and moreover inhibit their toxicity [57, 58]. It is possible that there are speciﬁc mechanisms for this toxicity, for example, as a result of annular species that resemble the toxins produced by bacteria that form pores in membranes and disrupt the ion balance in cells [59]. It is likely, however, that the relatively disorganised pre-ﬁbrillar aggregates are inherently toxic through a less speciﬁc mechanism, for example, as a result of the exposure of non-native hydrophobic surfaces stimulating aberrant interactions with cell membranes or other cellular components [60]. In contrast to the exquisitely designed surfaces of the correctly structured molecules within the crowded cellular environment, that have evolved to interact only with speciﬁc partners, the surfaces of any non-evolved polymeric aggregates that escape the various types of protective mechanisms, discussed below, are likely to interact inappropriately with many of the components of a biological system and hence will commonly cause malfunctions and potentially disease.

13.5 Impaired Misfolding Control and the Onset of Disease

Under normal circumstances, molecular chaperones and other “housekeeping” mechanisms are remarkably efficient in ensuring that such potentially toxic species are neutralised before they can do any damage [14, 60, 61]. Such neutralisation could result simply from the efficient targeting of misfolded proteins for degradation, but it appears that molecular chaperones are also able to alter the partitioning between harmful and harmless forms of aggregates, as a result of changing the kinetic or thermodynamic stability of one or more of the multiple species accessible to a protein (Fig. 13.7) [62]. If the efficiency of such protective mechanisms becomes impaired, however, the probability of pathogenic behaviour must increase [45, 61]. Such a scenario would explain why most of the amyloid diseases are associated with old age, where there is likely to be an increased tendency for proteins to become misfolded or damaged, coupled, at least ultimately, with a decreased efficiency of the protective chaperone and unfolded protein responses [63]. It is ironic that through our success in increasing the life expectancy of the populations of the developed world we are now seeing the limitations of our proteins and of the regulatory mechanisms that control their behaviour [60, 64]. One of the characteristics of proteins that is implied in this explanation of misfolding diseases is that relatively small changes in their sequences as a result of mutation, or in their biological environment in old age, are, at least


in some cases, enough to cause a shift from normal (soluble) to abnormal (aggregation) behaviour. This situation can be qualitatively rationalised by the argument that natural selection can only generate sequences that are good enough to allow the organism concerned to ﬂourish relative to its potential competitors; in this context, the behaviour of proteins in old age is unlikely to be of importance in such a selection process [24,64]. Dramatic evidence for this supposition has recently emerged from an analysis of the relationship between experimental aggregation rates of a set of human proteins and measurements of the level of gene expression that are likely to relate to the concentrations of the corresponding proteins in the organism itself [4]. This analysis [4] shows that the correlation coeﬃcient between the aggregation rates and expression levels of all the proteins for which both sets of data could be found, which includes proteins both associated and not associated with amyloid disease, is an astonishing 0.97 (Fig. 13.8). This very high degree of correlation is, however, exactly that predicted qualitatively by the reasoning given above concerning evolutionary selection. Speciﬁcally, it reﬂects the fact that a protein must be soluble enough to exist at the level that is optimal for the organism concerned, and this solubility is achieved by the selection during evolution of amino acid substitutions which reduce the propensity to aggregate. Most amino acid substitutions, however, increase the aggregation propensity of natural proteins [36]. So once evolutionary selection has achieved

Fig. 13.8. Correlation between expression levels and the measured aggregation rates for a set of human proteins. The aggregation rates represent all the data obtained from a comprehensive search of the amyloid aggregation literature, for studies carried out at pH values between 4.0 and 8.0. The expression levels are estimated from the cellular mRNA concentration and are taken from published databases. The standard deviations of the aggregation rates are reported only in four cases, as these values are generally not available or diﬃcult to extract from the literature. Data for two proteins not involved in any known medical conditions are included in the plot while the other points correspond to proteins that are associated with amyloid diseases. From [4]


a suﬃciently low aggregation propensity to allow the optimal level of the protein concerned to be achieved, random mutagenesis will in general prevent the aggregation propensity decreasing further; this combination of eﬀects is likely to be the explanation for cytosolic proteins tending to fall very close to the line indicated in Fig. 13.8. This result reﬂects the critical role that the interaction of proteins with water plays in the evolution of biological organisms and in the balance between the normal and aberrant behaviour that is associated with the onset of misfolding diseases.
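The kind of correlation coefficient quoted for Fig. 13.8 can be computed as in the minimal sketch below. It assumes that a Pearson coefficient is calculated on log-transformed values; the source does not state the exact procedure, so this is an assumption. The numbers in the example are invented placeholders and are not the data of [4].

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Invented placeholder values (arbitrary units), not measured data.
expression_level = [1.0, 10.0, 100.0, 1000.0]
aggregation_rate = [1e-2, 1e-3, 1e-4, 1e-5]

# Both quantities span orders of magnitude, so compare them on a log scale.
log_expression = [math.log10(v) for v in expression_level]
log_rate = [math.log10(v) for v in aggregation_rate]

r = pearson_r(log_expression, log_rate)
print(f"correlation coefficient r = {r:.3f}")
```

The magnitude of r measures how closely the points follow a straight line; a value near 1 in magnitude corresponds to points lying almost on the line, as described for the proteins in Fig. 13.8.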

13.6 Probing Misfolding and Aggregation in Living Organisms

The conclusions and ideas about the molecular basis of amyloid disease that have been discussed so far have been derived almost completely from experiments carried out in the test tube (in vitro) and in the computer (in silico). Despite the fact that there is strong circumstantial evidence to link them to events occurring in living systems (in vivo), including experiments with cells in culture, we wish to explore much more rigorously the way in which the myriad components of the intra- and extracellular environment affect the quantitative relationship between physicochemical properties such as aggregation propensity and its consequences in a living organism. To this end we are using the fruit fly (Drosophila melanogaster) as a model organism to link the chemistry and physics of aggregation to its biological effects in higher organisms [65]. The advantage of this particular system for our purposes is that the short lifespan (typically about 30 days) and low unit cost relative to, for example, transgenic rodent models permit us to carry out a very large number of experiments in a reasonable timeframe to obtain data that are statistically highly significant. The approach we have taken is to exploit the existence of transgenic fruit flies in which the 42-residue Aβ-peptide is expressed in the brain. Lines of flies had previously been generated in which deposits of the peptide can be seen to develop with time [66]. In addition, the flies develop locomotor defects, observed most easily in assays of their ability to climb up a glass surface, and have reduced lifespans. The deposits of the Aβ-peptide were found initially to occur within neurons and then to accumulate as extracellular deposits analogous to those seen in sufferers from Alzheimer’s disease as well as in transgenic mouse models designed to study this condition.
It had also been found that ﬂies expressing the Aβ-peptide having the E22G (Arctic) mutation, which results in a very early onset form of Alzheimer’s disease in humans, have very much shorter lifespans than those expressing the wild-type peptide and show a much earlier appearance of peptide-containing deposits within brain tissue and of locomotor defects [66, 67]. The conceptual basis that underlies these experiments is encapsulated in Fig. 13.8 and the accompanying explanation, which indicates that at least


many of our proteins are “on the edge” of aggregation as they can have evolved only to be as robust as is necessary to allow the living system in which they are present to compete successfully for survival [4]. If we were to make mutations in the Aβ-peptide that increase or decrease its propensity for aggregation, we predicted that they should, on the arguments made earlier, increase or decrease respectively the severity of neuronal damage in the transgenic ﬂy system. From our previous studies of aggregation in vitro, we can predict the changes in the intrinsic propensity to aggregate by using the algorithms based on physicochemical principles and derived from the experimental data (Fig. 13.6) [36, 37, 54]. We have used this approach to design a series of some 20 single mutational variants of the 42-residue peptide in the ﬁrst instance which we anticipated would give a spread of aggregation propensities. Because this peptide is not intrinsic to the ﬂy, there is no reason to suppose that the mutations will cause any other diﬀerences in their behaviour; this assumption can, however, be explored statistically when the results on the whole set of peptides are analysed. The variation in intrinsic aggregation rates is generally predicted to be signiﬁcantly less than an order of magnitude – rather modest in terms of the variations in the rates of diﬀerent naturally occurring peptides and proteins that cover more than six orders of magnitude – and representative studies carried out in vitro have validated the accuracy of these predictions [54]. In addition, quantitative analysis shows that the levels of expression of the diﬀerent peptides are similar, enabling this factor to be eliminated from the analysis of the origins of any signiﬁcantly diﬀerent behaviour found within the series of variants. 
The results of this set of experiments are dramatic, and a flavour of their remarkable nature is illustrated in a snapshot of a climbing assay involving a subset of the mutated peptides (Fig. 13.9). This experiment illustrates the effect of introducing two different single-residue mutations designed in each case to reduce the aggregation propensity of the wild-type peptide. It is immediately evident that the mutations result in the dramatic recovery of locomotor skills; similar experiments in which mutations were designed to increase the aggregation propensity show equally striking decreases in such skills [54]. By defining a “toxicity” parameter based on locomotor ability and lifespan, the experimental effects of the mutations can be compared with the predictions in a quantitative manner (Fig. 13.9); this procedure reveals that the correlation coefficient relating toxicity to the aggregation propensity of 17 mutational variants is an astonishing 0.85 [6]. We can conclude from this finding that, despite the vast machinery associated with the regulation and management of peptide and protein expression and degradation, the times of onset of restricted movement and the lifespans of the flies are quantitatively dependent simply on the physicochemical properties of the aggregation-prone species. The value of the correlation coefficient for the data shown in Fig. 13.9 shows that the probability that neuronal dysfunction is not related directly to the aggregation of the Aβ-peptide, in this system at least, is less than 1 in


Fig. 13.9. The eﬀect of mutations in the sequence of the 42-residue human Alzheimer Aβ-peptide on neuronal dysfunction in transgenic fruit ﬂies. The upper left panel (a) illustrates a climbing assay of ﬂies expressing the wild-type sequence (left) and two mutational variants predicted to reduce the peptide’s aggregation propensity; the more mobile the ﬂies, the higher up the tube they can climb. The right-hand upper panel (b) represents a similar experiment with ﬂies expressing the Aβ-peptide containing the E22G “Arctic mutation” (left-hand tube). The two right hand tubes contain ﬂies expressing peptides that contain mutations that decrease the propensity to form pre-ﬁbrillar aggregates (protoﬁbrils). The lower panel (c) shows the degree of correlation between the relative locomotor activity of a series of mutational variants against their predicted propensities to form protoﬁbrils. Figure adapted from [6]

100,000. In additional studies we have investigated the eﬀects of second mutations introduced to “rescue” ﬂies expressing an aggregation-prone variant of the Aβ-peptide, speciﬁcally the Arctic mutation (E22G) (Fig. 13.9). These


experiments show that it is possible to neutralise almost completely the effects of even this highly pathogenic mutation by a further substitution that increases the solubility of the peptide [6]. Moreover, more detailed analysis shows that the data correlate even more closely with the tendency of the various mutational variants to convert into pre-fibrillar (oligomeric) species than with their propensities to form the fully formed amyloid fibrils themselves [6]. These experiments therefore provide further very strong evidence for the proposition that oligomeric aggregates are responsible for cellular damage, and that they are the culprits in the onset of at least some of the diseases associated with the eventual appearance of amyloid fibrils. Moreover, studies of the effects of aggregation in another model organism, C. elegans, using gene knockout techniques have provided evidence for the idea that the formation of relatively harmless large aggregates could have evolved to be a protective mechanism against neuronal damage [68, 69]. We believe that the use of model organisms in the ways illustrated in these examples will play a major role in the quest to understand the underlying links between physical and chemical principles and biological function: specifically in the context of this chapter, the fundamental origins of the complex and increasingly common diseases that are associated with protein misfolding [6] and the key role that the interactions of biological systems with water play in determining their stability and solubility.
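The statistical weight of a correlation coefficient of 0.85 across 17 variants can be checked with a short calculation. The sketch below assumes a standard Student's t test of the null hypothesis of zero correlation, with t = r·sqrt((n − 2)/(1 − r²)) and n − 2 degrees of freedom; the source does not state which test was used, so this choice is an assumption. The tail probability is obtained by simple numerical integration using only the Python standard library, and the function names are illustrative.

```python
import math


def t_statistic(r, n):
    """t value for testing zero correlation, given r from n data points."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(t, dof):
    """Probability density of Student's t distribution with dof degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coeff * (1.0 + t * t / dof) ** (-(dof + 1) / 2)


def upper_tail(t, dof, upper=200.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, upper].

    The contribution beyond `upper` is neglected, which is an approximation
    that is adequate here because the density decays very rapidly.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (upper - t) / steps
    total = t_pdf(t, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, dof)
    return total * h / 3.0


r, n = 0.85, 17  # values quoted in the text
t = t_statistic(r, n)
p_one_sided = upper_tail(t, n - 2)

print(f"t = {t:.2f} with {n - 2} degrees of freedom")
print(f"one-sided p = {p_one_sided:.1e}, two-sided p = {2 * p_one_sided:.1e}")
```

Under these assumptions the t value is a little above 6, and the resulting tail probability is of the same order as the figure quoted in the text; the exact value depends on whether a one-sided or two-sided test is intended.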

13.7 The Recent Proliferation of Misfolding Diseases and Prospects for Effective Therapies

In the specific context of protein misfolding and misassembly, events that will always have a finite probability of occurring given the complex and stochastic processes involved in normal folding and assembly, these studies have shown that under normal circumstances molecular chaperones and other “housekeeping” mechanisms are remarkably efficient in ensuring that potentially toxic species such as oligomeric or pre-fibrillar amyloid aggregates are neutralised in living systems before they can do significant damage [5, 14]. Such neutralisation can result from targeting them efficiently for degradation, from disrupting them to regenerate their soluble precursors or from their conversion into less toxic aggregates such as fibrils and plaques. The evidence discussed in this chapter suggests that the reason for the recent proliferation of aggregation diseases, in the developed world in particular, is fundamentally due to the fact that at least some of our proteins are poised right at the boundary between solubility and insolubility [4]. In such a situation, relatively small changes in aggregation propensities (e.g., resulting from even a single mutation as in familial amyloid diseases such as that associated with lysozyme [40, 41]), or in protein concentration (e.g., in dialysis-related amyloidosis [74]) or decreases in the efficiency of protective mechanisms or increases in the number of misfolded or damaged proteins


(e.g., in old age [63]) can result in the initiation and slow accumulation of aggregates such as amyloid fibrils, which can in some cases result in the presence of significant quantities of toxic species such as fibril precursors. These ideas, based initially on studies in “test tubes” or of cells in culture, are now being linked to the behaviour of higher organisms through the use of model systems such as fruit flies as “living test tubes” [64]. We see as a result of this type of approach the way that the principles of chemistry and physics translate remarkably directly into the biological and physiological properties of living systems to an extent that can be attributed to the highly interdependent co-evolution of molecules and the biological environments in which they function. It is particularly satisfying, in the light of the fact that living cells contain a remarkable concentration of molecular species, typically more than 300 g/L [16], to conclude that the importance of maintaining solubility of these species reflects the key role that the interaction of water with biomolecules plays in determining whether the behaviour of a biological system is normal or aberrant. This picture that our proteins, the most abundant and ubiquitous of all molecules in biology, are poised on the brink of an aggregation precipice may appear at first sight to be a very negative conclusion about the prospects for avoiding misfolding and deposition diseases in the future.
There is, however, a very positive conclusion that can be drawn from these findings: they indicate that only relatively small reductions in intrinsic physicochemical properties such as aggregation propensities, or in factors such as protein concentration, or relatively small improvements in the efficiency of the various mechanisms, natural or otherwise, which serve to protect us from disease, can take us into the safety zone of solubility; such a situation is illustrated in the dramatic effects of the “rescue” mutations in the fly model of Alzheimer’s disease [54]. Indeed, the vast increase in our understanding of the origins and means of progression of misfolding and aggregation diseases that has taken place in the last decade is beginning to allow the rational design of strategies to combat these highly debilitating disorders in different ways. The generic process of aggregation that has been outlined earlier indicates that there are several very specific steps in the process where directed therapeutic intervention looks highly promising [70, 71]. Ultimately, if one can achieve the ability to manipulate gene sequences in humans (e.g., by “gene therapy” or stem-cell techniques), it should be possible to abolish disorders such as Alzheimer’s disease, as we see in the case of the rescue mutations in transgenic fruit flies discussed above [54]. But until then, certain classes of molecular therapeutics look particularly promising; as an example, a number of approaches based on antibodies or other specific binding agents are being explored, as such binding agents can be targeted against a particular molecular species so as, for example, to stabilise the aggregation-resistant native state or to reduce the concentration of aggregation-prone species [42, 72, 73]. Moreover, the recent discovery that antibodies can be raised against different generic forms of aggregates, including oligomeric species, suggests that they could in principle play a role analogous


to natural chaperones [57,58]. In addition, the remarkable correlation between the events occurring in vitro, in silico and in vivo not only represents a major breakthrough in showing the relevance of carefully designed studies in the test tube for understanding the equivalent processes in a living system, but also indicates the value of model organisms for exploring potential therapeutic strategies [6], and, indeed, in addition provides considerable insight into the relationships between chemistry, physics, biology and medicine.

13.8 Concluding Remarks

Application of the techniques and concepts of experimental and theoretical chemistry and physics over many years has provided great insight into the nature and properties of biological molecules at the atomic level, including the manner in which they undergo normal and aberrant self-assembly in laboratory environments; indeed, many of the fundamental principles of the latter have emerged at least in general terms from these studies [5, 7]. Concurrently, the methods of biochemistry and cell biology have revealed much about how the same molecules are associated with specific functional processes in the cellular environment and the ways in which such functions can be impaired [5, 60]. Further applications of these approaches are likely to continue to increase the depth of our understanding of the fundamental events associated with the processes of protein folding, misfolding and aggregation. The results discussed in this chapter also indicate that model organisms such as the fruit fly can be of enormous value in exploring the underlying origins of the phenomena that give rise to disease in humans, and also represent a powerful means of exploring the genetic factors that influence such diseases and the effects of processes such as ageing, and also of rapidly screening potential therapeutic compounds [6, 67]. The substantial degree of progress that has already been made in recent years provides grounds for great optimism that means will be found in the relatively near future to treat effectively, or even to prevent, at least the most common forms of this set of highly unpleasant and usually fatal disorders. Such progress is urgently needed because of the dramatic increase in the numbers of people who are suffering from, or vulnerable to, these diseases, an increase that is taking them to the top of the list of challenges to healthcare and social support in many countries around the world.
And, from the point of view of the topic of this volume, the results of the studies described in this chapter demonstrate in a dramatic manner the key role that the interaction of biological molecules with water has played in biological evolution, and still plays in determining the narrow boundary between the normal and aberrant behaviour of all living systems.

Acknowledgements

I should like to thank in particular the Wellcome Trust and the Leverhulme Trust for generous funding of the research activities described here over many


years, as well as the UK Research Councils, the European Commission, the Royal Society and numerous charitable organisations for their crucial support, without which the work described in this chapter could not have been carried out. I should also like to thank very deeply all the students, research fellows and colleagues who have contributed to all aspects of this work, the names of many of whom appear in the references in this chapter. I should also like to express my gratitude to Professors Kunihiro Kuwajima and Yuji Goto, along with the Co-ordinators and Advisers of the “Water and Biomolecules” Research Project supported by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT), for giving me the privilege of being associated with their research programme and for the stimulation that this connection has had in the development of many of the ideas discussed in this chapter.

References

1. M. Vendruscolo, J. Zurdo, C.E. MacPhee, C.M. Dobson, Philos. Trans. R. Soc. Lond. A 361, 1205 (2003)
2. C.M. Dobson, Nature 426, 884 (2003)
3. M.S. Cheung, A.E. Garcia, J.N. Onuchic, Proc. Natl. Acad. Sci. USA 99, 685 (2002)
4. G.G. Tartaglia, S. Pechmann, C.M. Dobson, M. Vendruscolo, Trends Biochem. Sci. 32, 204 (2007)
5. F. Chiti, C.M. Dobson, Annu. Rev. Biochem. 75, 333 (2006)
6. L.M. Luheshi, D.C. Crowther, C.M. Dobson, Curr. Opin. Chem. Biol. 12, 25 (2008)
7. C.M. Dobson, A. Sali, M. Karplus, Angew. Chem. Int. Ed. Engl. 37, 868 (1998)
8. J.S. Richardson, D.C. Richardson, Proc. Natl. Acad. Sci. USA 99(5), 2754 (2002)
9. P.G. Wolynes, J.N. Onuchic, D. Thirumalai, Science 267, 1619 (1995)
10. K.A. Dill, H.S. Chan, Nat. Struct. Biol. 4, 10 (1997)
11. A.R. Dinner, A. Sali, L.J. Smith, C.M. Dobson, M. Karplus, Trends Biochem. Sci. 25, 331 (2000)
12. M. Vendruscolo, E. Paci, C.M. Dobson, M. Karplus, Nature 409, 641 (2001)
13. B. Hardesty, G. Kramer, Prog. Nucleic Acid Res. Mol. Biol. 66, 41 (2001)
14. F.U. Hartl, M. Hayer-Hartl, Science 295, 1852 (2002)
15. S.T. Hsu, P. Fucini, L.D. Cabrita, H. Launay, C.M. Dobson, J. Christodoulou, Proc. Natl. Acad. Sci. USA 104, 16516 (2007)
16. R.J. Ellis, Curr. Opin. Struct. Biol. 11, 114 (2001)
17. C. Hammond, A. Helenius, Curr. Opin. Cell. Biol. 7, 523 (1995)
18. R.J. Kaufman, D. Scheuner, M. Schröder, X. Shen, K. Lee, C.Y. Liu, S.M. Arnold, Nat. Rev. Mol. Cell Biol. 3, 411 (2002)
19. M.R. Wilson, S.B. Easterbrook-Smith, Trends Biochem. Sci. 25, 95 (2000)
20. P.J. Thomas, B.H. Qu, P.L. Pedersen, Trends Biochem. Sci. 20, 456 (1995)
21. C.M. Dobson, Philos. Trans. R. Soc. Lond. B 356, 133 (2001)
22. A. Horwich, J. Clin. Invest. 110, 1221 (2002)
23. A.N. Bullock, A.R. Fersht, Nat. Rev. Cancer 1, 68 (2001)


24. C.M. Dobson, Trends Biochem. Sci. 24, 329 (1999)
25. C.P. Jaroniec, C.E. MacPhee, V.S. Bajaj, M.T. McMahon, C.M. Dobson, R.G. Griffin, Proc. Natl. Acad. Sci. USA 101, 711 (2004)
26. J.L. Jiménez, J.I. Guijarro, E. Orlova, J. Zurdo, C.M. Dobson, M. Sunde, H.R. Saibil, EMBO J. 18, 815 (1999)
27. R. Nelson, M.R. Sawaya, M. Balbirnie, A.O. Madsen, C. Riekel, R. Grothe, D. Eisenberg, Nature 435, 773 (2005)
28. C. Ritter, M-L. Maddelein, A.B. Siemer, T. Lührs, M. Ernst, B.H. Meier, S. Saupe, R. Riek, Nature 435, 844 (2005)
29. S.Y. Tan, M.B. Pepys, Histopathology 25, 403 (1994)
30. M. Sunde, C.C.F. Blake, Adv. Protein Chem. 50, 123 (1997)
31. M. Fändrich, M.A. Fletcher, C.M. Dobson, Nature 410, 165 (2001)
32. M. Fändrich, C.M. Dobson, EMBO J. 21, 5682 (2002)
33. S. Auer, C.M. Dobson, M. Vendruscolo, HFSP J. 1, 137 (2007)
34. C.M. Dobson, in Physical Biology: From Atoms to Medicine, ed. A.H. Zewail (Imperial College Press, London, 2008), pp. 289–335
35. T.P. Knowles, A.W. Fitzpatrick, S. Meehan, H.R. Mott, M. Vendruscolo, C.M. Dobson, M.E. Welland, Science 318, 1900 (2007)
36. F. Chiti, M. Stefani, N. Taddei, G. Ramponi, C.M. Dobson, Nature 424, 805 (2003)
37. A.P. Pawar, K.F. DuBay, J. Zurdo, F. Chiti, M. Vendruscolo, C.M. Dobson, J. Mol. Biol. 350, 379 (2005)
38. S. Shammas, T.P.J. Knowles, A.J. Baldwin, C.E. MacPhee, M.E. Welland, C.M. Dobson, G.L. Devlin, in preparation
39. A.J. Baldwin, G.L. Devlin, C. Waudby, M-F. Massuto, T.J.P. Knowles, S.J. Spencer-Cahill, J. Christodoulou, P.D. Barker, C.M. Dobson, in preparation
40. D.R. Booth, M. Sunde, V. Bellotti, C.V. Robinson, W.L. Hutchinson, P.E. Fraser, P.N. Hawkins, C.M. Dobson, S.E. Radford, C.C.F. Blake, M.B. Pepys, Nature 385, 787 (1997)
41. M. Dumoulin, J.R. Kumita, C.M. Dobson, Acc. Chem. Res. 39, 603 (2006)
42. M. Dumoulin, A.M. Last, A. Desmyter, K. Decanniere, D. Canet, A. Spencer, D.B. Archer, S. Muyldermans, L. Wyns, A. Matagne, C. Redfield, C.V. Robinson, C.M. Dobson, Nature 424, 783 (2003)
43. J.R. Kumita, S. Poon, G.L. Caddy, C.L. Hagan, M. Dumoulin, J.J. Yerbury, E.M. Stewart, C.V. Robinson, M.R. Wilson, C.M. Dobson, J. Mol. Biol. 369, 157 (2007)
44. F. Chiti, C.M. Dobson, Nature Chem. Biol. 5, 15 (2009)
45. N.F. Bence, R.M. Sampat, R.R. Kopito, Science 292, 1552 (2001)
46. A.J.L. Macario, E.C. Macario, Ageing Res. Rev. 1, 295 (2002)
47. J.W. Kelly, Curr. Opin. Struct. Biol. 8, 101 (1998)
48. M. Ramirez-Alvarado, J.S. Merkel, L. Regan, Proc. Natl. Acad. Sci. USA 97, 8979 (2000)
49. S.B. Prusiner, Science 278, 245 (1997)
50. M. Tanaka, S.R. Collins, B.H. Toyama, J.S. Weissman, Nature 442, 585 (2006)
51. J.P. Taylor, J. Hardy, K.H. Fischbeck, Science 296, 1991 (2002)
52. B. Caughey, P.T. Lansbury Jr., Annu. Rev. Neurosci. 26, 267 (2003)
53. D.M. Walsh, I. Klyubin, J.V. Fadeeva, W.K. Cullen, R. Anwyl, M.S. Wolfe, M.J. Rowan, D.J. Selkoe, Nature 416, 535 (2002)

13 Protein Misfolding Diseases

265

54. L.M. Luheshi, G.G. Tartaglia, A.C. Brorsson, A.P. Pawar, I.E. Watson, F. Chiti, M. Vendruscolo, D.A. Lomas, C.M. Dobson, D.C. Crowther, PLoS Biol. 5, e290 (2007) 55. M. Bucciantini, E. Giannoni, F. Chiti, F. Baroni, L. Formigli, J. Zurdo, N. Taddei, G. Ramponi, C.M. Dobson, M. Stefani, Nature 416, 507 (2002) 56. S. Baglioni, F. Casamenti, M. Bucciantini, L. Luheshi, N. Taddei, F. Chiti, C.M. Dobson, M. Stefani, J. Neurosci. 26, 8160 (2006) 57. R. Kayed, E. Head, J.L. Thompson, T.M. McIntire, S.C. Milton, C.W. Cotman, C.G. Glabe, Science 300, 486 (2003) 58. R. Kayed, C.G. Glabe, Meth. Enzymol. 413, 326 (2006) 59. H.A. Lashuel, D. Hartley, B.M. Petre, T. Walz, P.T. Lansbury Jr., Nature 418, 291 (2002) 60. M. Stefani, C.M. Dobson, J. Mol. Med. 81, 678 (2003) 61. M.Y. Sherman, A.L. Goldberg, Neuron 29, 15 (2001) 62. P.J. Muchowski, G. Schaﬀar, A. Sittler, E.E. Wanker, M.K. Hayer-Hartl, F.U. Hartl, Proc. Natl. Acad. Sci. USA 97, 7841 (2000) 63. P. Csermely, Trends Gen. 17, 701 (2001) 64. C.M. Dobson, Nature 418, 729 (2002) 65. A. Finelli, A. Kelkar, H.J. Song, H. Yang, M. Konsolaki, Mol. Cell Neurosci. 26, 365 (2004) 66. D.C. Crowther, K.J. Kinghorn, E. Miranda, R. Pase, J.A. Curry, F.A. Duthie, D.C. Gubb, D.A. Lomar, Neuroscience 132, 123 (2005) 67. J. Bilen, N.M. Bonini, Annu. Rev. Genet. 39, 153 (2005) 68. E. Cohen, J. Bieschke, R.M. Perciavalle, J.W. Kelly, A. Dillon, Science 313, 1604 (2006) 69. P.T. Lansbury, Proc. Natl. Acad. Sci. USA 96, 3342 (1999) 70. C.M. Dobson, Science 304, 1259 (2004) 71. F.E. Cohen, J.W. Kelly, Nature 426, 905 (2003) 72. D. Schenck, Nat. Rev. Neurosci. 4, 49 (2003) 73. M. Dumoulin, C.M. Dobson, Biochimie 86, 589 (2005) 74. C.M. Dobson, Nat. Struct. Mol. Biol. 13, 295 (2006)

14 Effect of UV Light on Amyloidogenic Proteins: Nucleation and Fibril Extension

A.K. Thakur and Ch. Mohan Rao

Abstract. Amyloid fibril formation is associated with a large number of neurodegenerative diseases. Understanding the molecular details of amyloidogenesis is critical for developing strategies to intervene in the pathological process. Formation of amyloid fibrils is a three-stage process: structural perturbation, nucleation and fibril extension. Absorption of UV light is known to perturb protein conformation and lead to aggregation. We have investigated the effect of UV light on three amyloidogenic proteins: prion protein, β2-microglobulin and α-synuclein, representing three different classes of proteins, largely α-helical, β-sheet and natively unstructured, respectively. Of these, only prion protein undergoes amorphous aggregation upon UV exposure. All three proteins, after UV exposure, fail to form amyloid fibrils de novo, which could mean that UV exposure compromises nucleation, fibril extension, or both. Upon seeding, however, the UV-exposed proteins do form amyloid fibrils, although the fibrils formed by UV-exposed prion protein are morphologically different from those formed by the unexposed protein. UV exposure, therefore, selectively compromises the ability of these proteins to nucleate while leaving them competent for seeded fibril growth. UV exposure might thus be of use in investigating the amyloidogenic process, especially in distinguishing the processes associated with nucleation and fibril extension.

14.1 Introduction

Molecular self-assembly is one of the key factors of biological structure and function. The forces that drive self-assembly also play a role in the folding of nascent proteins, depending on their amino acid sequences. Although several proteins have been shown to refold to their correct, functionally active structures in vitro, the situation in vivo is quite different. Owing to the molecular crowding that prevails in vivo, several nonnative interactions can cause protein misfolding and aggregation. Molecular chaperones and heat shock proteins prevent such nonproductive interactions and help proteins to achieve and maintain the native state. Small heat shock proteins such as αB-crystallin have been shown to inhibit fibril extension of α-synuclein [1] and β2-microglobulin [2]. Proteins have to balance thermodynamic stability against the flexibility required for biological function. Thus the native functional state of proteins critically depends on several factors. Misfolding and aggregation of proteins, into either amorphous or ordered aggregates, lead to diseases such as cataract, transmissible spongiform encephalopathies (TSE), Alzheimer's disease, Parkinson's disease and dialysis-related amyloidosis. Understanding the molecular details of aggregation and amyloid fibril formation is important in designing strategies to mitigate the complications.

Amyloid fibril formation involves three steps: structural perturbation, nucleation and elongation. Several modalities are being used to perturb the native structure to initiate amyloid fibril formation in vitro. Would it be possible to use UV exposure as a structural perturbant to initiate nucleation leading to amyloid fibril formation or aggregation? We have addressed this question using mouse prion protein [3], human β2-microglobulin and human α-synuclein. Interestingly, we find that UV-exposed proteins fail to form amyloid fibrils de novo; however, they remain competent for fibril extension if provided with preformed fibrils as seeds. UV exposure, therefore, selectively compromises the nucleation process. This chapter provides a brief and contextual overview of the several structural perturbants and describes the effect of UV light on the amyloidogenic proteins.

14.2 Amyloid

Amyloid fibrils are characterized by the presence of a cross-β sheet structure and show a structural hierarchy: subprotofibrils twist around each other to form protofilaments, which in turn join laterally and twist around each other to form mature fibrils. Recent lines of evidence suggest that such well-ordered structures lead to extensive H-bonding, resulting in a novel blue fluorescence [4]. The fibrils are chemically and thermodynamically stable. Despite differences in primary structure, all proteins achieve similar cross-β sheet structures in their amyloid form. This led to the suggestion that formation of amyloid fibrils might be a generic property of any polypeptide chain; all proteins can form amyloids under appropriate conditions [5]. To date, around 60 proteins have been shown to form fibrils. Amyloid fibril formation involves three major stages: structural perturbation (prenucleation stage), nucleation and fibril extension.

14.2.1 Structural Perturbation

Structural perturbation or conformational change in the soluble protein is important for amyloid formation. The observation of an amyloidogenic intermediate of transthyretin (TTR) at acidic pH led to the hypothesis of conformational perturbation as a prerequisite for amyloid fibril formation [6].


Several studies since then have supported this suggestion, and it is now widely accepted that conformational change or structural perturbation is a prerequisite for amyloid formation. Structural perturbation involves destabilization of the native state, thus forming nonnative states or partially unfolded intermediates (kinetic or thermodynamic intermediates), which are prone to aggregation. Mild to harsh conditions such as low pH, exposure to elevated temperatures, exposure to hydrophobic surfaces and partial denaturation using urea or guanidinium chloride are used to achieve nonnative states. Stabilizers of intermediate states such as trimethylamine N-oxide (TMAO) are also used for amyloidogenesis. Natively unfolded proteins, such as α-synuclein, tau protein and yeast prion, on the other hand, require some structural stabilization for the formation of partially folded intermediates that are competent for fibril formation. Conditions for partial structural consolidation include low pH, presence of sodium dodecyl sulfate (SDS), temperature or chemical chaperones.

pH

In many cases, low pH has been used to form amyloid intermediates. TTR exists as a tetramer at neutral pH; lowering the pH to 4.4 leads to monomerization. At this pH, an intermediate with well-defined, less hydrophobic tertiary structure was observed. This intermediate forms amyloid fibrils and hence is called the amyloid intermediate; pH > 5 did not result in amyloid formation [7]. The recombinant variable domain of immunoglobulin light chain (V(L) domain) forms two intermediates: one at pH 3 with native-like secondary structure and a large, exposed hydrophobic surface, and the other at pH 2, which is largely disordered but retains β-sheet structure. Of these two, the intermediate with native-like conformation, formed at pH 3, appears to act as the intermediate for fibril formation [8].
β2-Microglobulin fibril formation has been shown to be rapid below pH 4.0, and, in addition, ionic strength also plays a role in the fibril formation [9, 10]. Presence of salts at low pH increases the hydrophobicity of β2-microglobulin. A balance of electrostatic and hydrophobic interactions provided by anionic binding was shown to influence the amyloid fibril growth and stability of β2-microglobulin [11]. In contrast, natively unfolded (intrinsically disordered) proteins such as α-synuclein require partially folded intermediates to form fibrils. Low intrinsic hydrophobicity and high net charge at neutral pH result in the natively unfolded structure of α-synuclein. Lowering the pH leads to reduced net charge, inducing α-helical intermediates in α-synuclein. The radius of gyration of α-synuclein at neutral pH is 40 Å, and it decreases to 30 Å upon lowering the pH; this compaction of the protein molecule correlates with an increase in fibril formation [12]. Low pH thus induces conformational changes and facilitates fibril formation.
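As a rough illustration of how large this compaction is, one can assume (this scaling is an assumption made here, not a statement from [12]) that the volume occupied by the chain varies as the cube of its radius of gyration:

```latex
\frac{V_{\text{low pH}}}{V_{\text{neutral pH}}}
  \approx \left(\frac{R_g^{\text{low pH}}}{R_g^{\text{neutral pH}}}\right)^{3}
  = \left(\frac{30\ \text{\AA}}{40\ \text{\AA}}\right)^{3}
  = \frac{27}{64} \approx 0.42
```

Under that assumption, the 25% decrease in the radius of gyration corresponds to a decrease of roughly 58% in the volume occupied by the chain.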


Temperature

Temperature is one of the major determinants of protein conformation. Extremes of temperature, high or low, unfold proteins; these processes are referred to as thermal and cold denaturation, respectively. The process of protein folding or unfolding is commonly associated with one or more intermediates. Some of the intermediates thus generated might partition into off-pathway processes such as aggregation or amyloid fibril formation. Temperature-induced formation of partially folded intermediates has been observed in the cases of α-synuclein [12], β2-microglobulin, lysozyme [13], Aβ-peptide [14–16], prion protein [17], insulin [18] and ataxin [19].

Surface Interactions

Apart from pH and temperature, interactions with various surfaces, hydrophobic or hydrophilic, play a major role in fibril formation. It has been suggested that in vitro fibril formation induced by surface interactions could be the best mimic of in vivo fibril formation, as in vivo deposits are associated with surfaces [20]. A few studies indicate the involvement of hydrophobic surfaces such as graphite, mica and teflon; interaction with these surfaces has been shown to facilitate fibril formation [21, 22]. Charged surfaces can also induce the conformational changes required for fibril formation. In light-chain amyloidosis, pathological deposition of amyloid fibrils of immunoglobulin light-chain fragments occurs in several tissues including the walls of blood vessels. The recombinant light-chain variable domain, SMA, forms fibrils on native mica, which has a negatively charged surface. Surface interactions accelerate fibril formation and also alter its mechanism. No fibrils of SMA were observed on hydrophobic or positively charged surfaces, indicating the role of electrostatic interactions between the surface and the protein [20].

Partially Denaturing Condition

Denaturants such as urea and guanidinium chloride (GdmCl) have been used to perturb the structure of proteins. Partially denaturing conditions such as 2–5 M GdmCl [23–27] or 3 M urea [28] generate partially unfolded intermediates, which facilitate fibril formation. Higher concentrations of denaturant would prevent interprotein interactions and solubilize the aggregating species, whereas lower concentrations might not generate any intermediate species and thus would not facilitate fibril formation. An optimal concentration of denaturant, which depends on the protein and the denaturant, is therefore needed for fibril formation. A combination of temperature and GdmCl has also been used to form fibrils of the small heat shock protein bovine α-crystallin [29].


Membrane Interactions

Many pathological amyloid deposits are associated with membranes. Amphiphilic molecules such as SDS and lipids provide a membrane-mimetic environment that can be used to investigate the role of membranes in the amyloidogenic process. Such membrane mimetics have been shown to enhance fibril formation [30, 31]. On the contrary, a few other studies show that such membrane-mimetic conditions inhibit fibrillogenesis [32, 33]. We have recently addressed this apparent contradiction by investigating the interaction of SDS with α-synuclein [34]. The study showed two types of ensembles of α-synuclein and SDS: the fibrillogenic ensembles, formed at an optimal SDS concentration of around 0.5–0.75 mM, are characterized by enhanced accessible hydrophobic surfaces and extended to partially helical conformation, while the less fibrillogenic or nonfibrillogenic ensembles, formed above 2 mM SDS, are characterized by less accessible hydrophobic surfaces and maximal helical content. This finding is consistent with both sets of observations in the literature; the apparent contradiction is attributable to the relative concentrations of SDS [34]. Fibril formation of β2-microglobulin is also reported to be maximal at 0.5 mM SDS [35]. Lipids, particularly negatively charged lipids such as phosphatidyl serine [36], and free fatty acids such as palmitic acid, stearic acid, oleic acid and linoleic acid [37] have been implicated in fibril formation. Electrostatic interactions between protein and lipids have been observed [38]. These interactions accelerate the fibril formation of many amyloidogenic proteins such as α-synuclein [39], Aβ-peptide [38], lysozyme, insulin, glyceraldehyde-3-phosphate dehydrogenase, myoglobin, transthyretin, cytochrome c, histone H1 and α-lactalbumin [36]. Cholesterol and lipid rafts have also been investigated for their role in promoting amyloidogenesis [40–42].

Other Perturbants

Organic solvents such as methanol, ethanol, trifluoroethanol, propanol and hexafluoro-2-propanol [33]; osmolytes such as glycerol, betaine, taurine and TMAO [43, 44]; pesticides such as rotenone, dieldrin and paraquat [45, 46]; metal ions [47]; ultrasonication [48] and pressure [49] have been shown to influence the rate of fibril formation. In addition to these external factors, intrinsic changes such as point mutations and truncations of amyloidogenic proteins also facilitate fibril formation. The point mutation D187N in gelsolin exposes a hidden cleavage site, so that proteolysis releases the amyloidogenic fragment [50]. Mutations such as A30P, A53T and E46K in α-synuclein lead to increased self-aggregation and oligomerization into protofibrils, compared to the wild-type protein [51]. Several point mutations in prion protein, such as A117V [52], D178N [53], E200K [54], P102L [55] and F198S [56], cause onset of disease. Interestingly, metal ions such as copper, aluminum and zinc are known to promote the fibril formation of α-synuclein [47, 57, 58]. However, metal ions are also known to inhibit fibril formation of the Aβ-peptide [59].

14.2.2 Nucleation

Lansbury and his group have shown that amyloid formation is a nucleation-dependent process and that the nucleation step can be bypassed by using seeds of preformed fibrils. Nucleation is the rate-limiting step in amyloidogenesis and is characterized by a lag phase. During the time required for nucleus formation, the protein appears to be soluble. Nucleus formation requires a series of association steps that are thermodynamically unfavorable because the resultant intermolecular interactions do not outweigh the entropic cost of association [60]. Once the nucleus has formed, further addition of monomers becomes thermodynamically favorable. Nucleation is concentration dependent [61] and shows hydrophobic cooperativity [62].

14.2.3 Fibril Extension

The lag in kinetics persists until a critical nucleus has formed, after which the reaction proceeds with a rapid increase in aggregate size [63]. Bidirectional growth of the elongating fiber was observed at this stage [64]. Binding of the monomer to the continuously growing fiber and subsequent conformational change characterize this event [65, 66]. These amyloid aggregates show Congo-red birefringence and cross-β sheet structure. The organization of these fibrils remains the same among different types of proteins: two to three unbranched subprotofibrils (10–15 Å) arrange helically to form a protofilament (protofibril) (25–30 Å), and protofilaments associate laterally, or twist in bundles of five, to form mature fibrils [67].
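The lag phase, and its disappearance on seeding, can be reproduced with a generic nucleated-polymerization scheme. The Python sketch below is not a model taken from this chapter: it assumes a minimal scheme of primary nucleation, elongation and fragmentation, integrated with a plain Euler step, and every rate constant (`k_n`, `k_plus`, `k_frag`), the nucleus size `n_c`, the seed length and the concentrations are arbitrary, dimensionless, illustrative choices.

```python
def simulate(m0=1.0, seed_mass=0.0, k_n=1e-6, k_plus=1.0, k_frag=0.05,
             n_c=2, seed_len=50.0, dt=0.01, t_end=200.0):
    """Euler integration of a toy nucleation/elongation/fragmentation scheme.

    m0        : initial free monomer (arbitrary units)
    seed_mass : monomer units added as preformed fibrils
    Returns a list of (time, fraction of total protein in fibrils).
    """
    total = m0 + seed_mass
    fibril_mass = seed_mass                # monomer units inside fibrils
    fibril_number = seed_mass / seed_len   # number of fibrils (growing ends / 2)
    t = 0.0
    trajectory = [(t, fibril_mass / total)]
    while t < t_end:
        monomer = max(total - fibril_mass, 0.0)
        # New fibrils arise from primary nucleation and from fragmentation.
        d_number = k_n * monomer ** n_c + k_frag * fibril_mass
        # Fibrils grow from both ends by monomer addition.
        d_mass = 2.0 * k_plus * monomer * fibril_number
        fibril_number += d_number * dt
        fibril_mass = min(fibril_mass + d_mass * dt, total)
        t += dt
        trajectory.append((t, fibril_mass / total))
    return trajectory


def half_time(trajectory):
    """First time at which half of the protein is fibrillar, or None."""
    for t, fraction in trajectory:
        if fraction >= 0.5:
            return t
    return None


if __name__ == "__main__":
    cases = {
        "de novo": simulate(),
        "seeded": simulate(seed_mass=0.02),
        "no nucleation, unseeded": simulate(k_n=0.0),
        "no nucleation, seeded": simulate(k_n=0.0, seed_mass=0.02),
    }
    for label, trajectory in cases.items():
        print(label, "half-time:", half_time(trajectory))
```

In this toy scheme, seeding shortens the time to half-conversion, and setting `k_n` to zero abolishes unseeded growth while leaving seeded growth intact. That is the qualitative pattern expected when nucleation, but not fibril extension, is blocked.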

14.3 UV Light as a Potent Structural Perturbant

The effect of light on proteins has been known for several decades. Many proteins, such as γ-crystallin present in the eye lens, aggregate upon exposure to UV light [68]. UV light leads to photo-oxidation of aromatic residues (tryptophan, tyrosine and phenylalanine), which leads to conformational alteration and eventually to aggregation. This process is associated with reactive oxygen species (ROS). We have earlier investigated the photo-aggregation of γ-crystallin upon UV exposure and the prevention of the aggregation using α-crystallin [69]. We observed an increase in the hydrophobic surface due to partial unfolding of this protein upon UV exposure [69]. Eye-lens proteins undergo alteration in conformation, as well as in quaternary packing, leading to the opacity of the lens [70]. Therefore, UV light is a potential structural perturbant of proteins. We have investigated the possibility of using UV exposure as a structural perturbant to initiate nucleation leading to amyloid fibril formation. We have used three amyloidogenic proteins: prion protein, β2-microglobulin and α-synuclein. These proteins represent three different classes of structure: prion protein is rich in α-helix, β2-microglobulin is rich in β-sheet and α-synuclein is natively unfolded.

14.3.1 UV-Induced Aggregation of Prion Protein

Prion protein has eight tryptophans, seven of them in the flexible N-terminal region of the protein. The abundance of tryptophan raises the question of whether perturbing the N-terminal region via photo-oxidation of tryptophans would have any effect on amyloid aggregation. We exposed prion protein to UV light of 290 nm. Within a few minutes of exposure, prion protein in sodium phosphate buffer, pH 7.4, aggregated extensively. We monitored the aggregation by measuring Rayleigh scattering, with both the excitation and emission monochromators set at 465 nm. The scattering profile is shown in Fig. 14.1. Aggregation starts after a lag period of about 4 min and plateaus after 15 min. We also exposed β2-microglobulin and α-synuclein to UV light under similar conditions. Surprisingly, neither β2-microglobulin nor α-synuclein exhibited any aggregation during the period of the experiment (Fig. 14.1). Extended exposure for 1 h also did not lead to any aggregation (data not shown). Prion protein aggregates do not show an increase in Thioflavin T (ThT) fluorescence, indicating the formation of amorphous aggregates. In order to probe

Fig. 14.1. Photo-aggregation of prion protein, β2-microglobulin and α-synuclein. In each case, the protein (0.05 mg ml⁻¹) in 50 mM phosphate buffer was exposed to light of 290 nm. Light scattering was measured using a fluorescence spectrophotometer (Fluorolog FL3-22) by setting excitation and emission monochromators at 465 nm. Mouse full-length prion protein (filled square) exhibits aggregation upon UV exposure, whereas β2-microglobulin (filled triangle) and human α-synuclein (filled circle) do not show significant aggregation
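The chapter does not say how the lag period of a scattering trace such as the one in Fig. 14.1 was read off. One common convention, sketched below in Python as an assumption, takes the lag time as the moment the signal first rises above the initial baseline by a fixed fraction of the total amplitude. The function name, the 10% threshold and the data points are hypothetical; the numbers are invented for illustration and are not digitized from the figure.

```python
def lag_time(times, signal, n_baseline=3, frac=0.1, min_amplitude=0.0):
    """Estimate a lag time by threshold crossing.

    The baseline is the mean of the first n_baseline points. The lag time is
    the linearly interpolated time at which the signal first reaches
    baseline + frac * (maximum - baseline). Returns None when the trace never
    rises by more than min_amplitude, i.e. when no aggregation is detected.
    """
    baseline = sum(signal[:n_baseline]) / n_baseline
    amplitude = max(signal) - baseline
    if amplitude <= min_amplitude:
        return None
    threshold = baseline + frac * amplitude
    for i in range(1, len(signal)):
        if signal[i - 1] < threshold <= signal[i]:
            t0, t1 = times[i - 1], times[i]
            s0, s1 = signal[i - 1], signal[i]
            return t0 + (threshold - s0) / (s1 - s0) * (t1 - t0)
    return None


# Invented traces (minutes, arbitrary scattering units), for illustration only.
minutes = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
aggregating = [1.0, 1.0, 1.0, 3.0, 6.0, 9.0, 12.0, 14.0, 15.0, 15.0, 15.0]
flat = [1.0] * len(minutes)

print("aggregating trace:", lag_time(minutes, aggregating))
print("flat trace:", lag_time(minutes, flat))
```

With real, noisy data, `min_amplitude` would need to be set above the noise level so that a flat trace is not mistaken for slow aggregation.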


the nature of the aggregation (covalent or noncovalent), a photo-aggregated sample of prion protein was treated with 0.1% SDS. We observed a fast decrease in Rayleigh scattering within minutes, showing that the aggregates are soluble in SDS. SDS solubility indicates the predominance of noncovalent interactions in the aggregation of prion protein. Exposure of prion protein at high concentrations to light under partially denaturing conditions also did not lead to an increase in Rayleigh scattering, further confirming the noncovalent nature of the interactions in the aggregation of prion protein. We also analyzed the role of the disulfide bond in the photo-aggregation of prion protein by testing the samples on reducing and nonreducing SDS-PAGE. We observed that the intramolecular disulfide bond was intact both before and after exposure of prion protein to UV light. Thus, exposure to UV light perturbs the N-terminal region of prion protein (which has seven of the eight tryptophans) and leads to amorphous aggregation. Noncovalent interactions play a predominant role in the photo-aggregation of prion protein [3].

14.3.2 Prevention of UV-Induced Aggregation of Prion Protein

Tryptophan, upon absorption of light, forms the tryptophanyl radical and generates N-formyl kynurenine and kynurenine. In the presence of antioxidants, generation of reactive species such as superoxide, singlet oxygen, hydroxyl and peroxyl radicals will be inhibited. We used several antioxidants to investigate the role of ROS in the photo-aggregation of prion protein (since β2-microglobulin and α-synuclein do not photo-aggregate, the effect of antioxidants has not been investigated with these proteins). Antioxidants such as mannitol, L-cysteine, superoxide dismutase (SOD) and catalase have been used for scavenging hydroxyl radical, singlet oxygen, superoxide and peroxyl radicals, respectively. Antioxidants were added to the protein sample prior to light exposure, and Rayleigh scattering was monitored during light exposure as described above. The presence of mannitol or catalase did not alter the aggregation profile (data not shown), showing that hydroxyl and peroxyl radicals are not involved in the photo-aggregation of prion protein. On the other hand, SOD prevented 45% of the aggregation of prion protein, whereas L-cysteine prevented photo-aggregation to an extent of ∼97% (Fig. 14.2). These studies thus showed that singlet oxygen and superoxide radicals are involved in the photo-aggregation of prion protein [3].

14.3.3 UV Exposure Alters Conformation of Prion Protein

The far UV circular dichroism (CD) spectrum of prion protein is known to exhibit an α-helical structure (Fig. 14.3a inset). Upon exposure to UV light, prion protein undergoes aggregation; hence CD measurements were not possible. However, we could record the CD spectra of the UV-exposed prion protein in partially denaturing conditions (3 M urea and 1 M GdmCl). Under these


Fig. 14.2. Effect of antioxidants on the photo-aggregation of prion protein. Prion protein photo-aggregation was monitored in the presence of the antioxidants L-cysteine, superoxide dismutase (SOD), mannitol and catalase. Prion protein (PrP) was used at a concentration of 0.05 mg ml⁻¹. The concentrations of the antioxidants used were: L-cysteine, 1 mM; SOD, 20 μg ml⁻¹ (∼64 U ml⁻¹); mannitol, 50 mM and catalase, 2.5 ng ml⁻¹ (∼0.895 mU ml⁻¹). L-Cysteine (filled circle) prevents the photo-aggregation of prion protein almost completely, while SOD (filled triangle) prevents it partially. Mannitol and catalase do not prevent the aggregation (data not shown)
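The text does not state how the percentages of protection were computed. One plausible definition, sketched below in Python under that assumption, compares the rise in scattering above the pre-exposure baseline with and without the antioxidant. The function name and the intensity values are hypothetical; the values were chosen only so that the arithmetic reproduces the 45% and ∼97% figures quoted in the text, and they are not read from Fig. 14.2.

```python
def percent_protection(baseline, plateau_control, plateau_treated):
    """Percentage by which an additive suppresses the scattering increase.

    Assumed definition (not taken from the chapter):
    100 * (1 - treated rise / control rise), with both rises measured
    above the same pre-exposure baseline.
    """
    control_rise = plateau_control - baseline
    if control_rise <= 0:
        raise ValueError("control sample shows no aggregation")
    treated_rise = plateau_treated - baseline
    return 100.0 * (1.0 - treated_rise / control_rise)


# Hypothetical scattering intensities in arbitrary units.
BASELINE = 1.0
CONTROL = 21.0        # prion protein alone after UV exposure
WITH_SOD = 12.0       # with superoxide dismutase
WITH_CYSTEINE = 1.6   # with L-cysteine

sod = percent_protection(BASELINE, CONTROL, WITH_SOD)
cys = percent_protection(BASELINE, CONTROL, WITH_CYSTEINE)
print(f"SOD: {sod:.0f}% protection, L-cysteine: {cys:.0f}% protection")
```

Defining protection relative to the control rise, rather than to the raw plateau, keeps the result independent of the instrument baseline.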

conditions, prion protein is known to undergo ordered aggregation and form amyloid fibrils; hence this condition is called the amyloid condition. We did not observe any photo-aggregation of prion protein under amyloid conditions. Interestingly, UV exposure was sufficient to cause observable differences in the far UV CD spectrum of prion protein compared to the unexposed protein under partially denaturing conditions (Fig. 14.3a). We find that UV exposure leads to a decrease in the α-helical content of prion protein [3]. It is possible that photo-oxidation of aromatic amino acids, leading to side-chain modification, results in conformational change, making prion protein prone to amorphous aggregation. We have also investigated the effect of UV light exposure on the secondary structures of β2-microglobulin and α-synuclein. Since β2-microglobulin and α-synuclein do not undergo photo-aggregation, we have studied the effect of exposure to UV light under conditions that lead to their ordered aggregation. β2-Microglobulin forms amyloid aggregates at pH 2.5. We exposed β2-microglobulin in citrate buffer, pH 2.5, to UV light and monitored changes in the far UV CD spectrum. Upon exposure to UV light, we observed minor changes in the far UV CD spectrum of β2-microglobulin (Fig. 14.3b). α-Synuclein is a natively unfolded molecule. In the presence of 0.5 mM SDS in HEPES buffer, pH 7.0, it adopts a partially folded conformation [34]. Interestingly, exposure of α-synuclein to UV light under these conditions leads to no observable change in the far UV CD spectrum (Fig. 14.3c). Thus, we


Fig. 14.3. Secondary structural changes of prion protein, β2-microglobulin and α-synuclein upon UV exposure under their respective amyloid-forming conditions. Far UV CD spectra of (a) prion protein in 20 mM sodium phosphate buffer (pH 6.8) containing 100 mM NaCl, 3 M urea and 1 M GdmCl (inset: far UV CD spectrum of native prion protein); (b) β2-microglobulin in 50 mM citrate buffer (pH 2.5) containing 100 mM KCl; and (c) α-synuclein in 20 mM HEPES–NaOH buffer (pH 7.0) containing 100 mM NaCl and 0.5 mM SDS. In each panel, curves 1 and 2 show the far UV CD spectra of the protein before and after exposure to UV light, respectively. Panel (a) reproduced from [3]


see a differential effect of UV light exposure on the secondary structures of these proteins.

14.3.4 UV-Exposed Proteins Failed to Form Amyloid De Novo

As discussed earlier, conformational change or structural perturbation is a prerequisite for amyloid formation. Exposure to UV light did not initiate fibril formation; it led to amorphous aggregation of prion protein and to no observable aggregation of the other two proteins. We therefore employed conditions that are known to favor amyloid fibril formation and investigated the effect of UV light exposure of these proteins on their ability to form amyloid fibrils.

UV-Exposed Prion Protein Failed to Form Amyloid De Novo

We have investigated the amyloid formation of UV-exposed prion protein under amyloidogenic conditions (3 M urea, 1 M GdmCl, 150 mM NaCl, pH 6.8, at 37 °C with continuous shaking at 600 rpm). Under amyloidogenic conditions, prion protein that had not been exposed to UV light showed an increase in ThT fluorescence after ∼40 h and attained saturation by 120 h (Fig. 14.4a). Surprisingly, structural perturbation upon light exposure had a negative effect on amyloidogenesis. Even after incubating UV-exposed prion protein for several days under amyloidogenic conditions, we did not observe any increase in ThT fluorescence (Fig. 14.4a), indicating that prion protein completely failed to form fibrils upon mild UV exposure [3].

UV-Exposed β2-Microglobulin and α-Synuclein Failed to Form Amyloid De Novo

We also exposed β2-microglobulin and α-synuclein to UV light and investigated the de novo amyloid fibril formation ability of the UV-exposed proteins. β2-Microglobulin readily forms amyloid fibrils at pH 2.5 in 100 mM KCl. Within 8 h of incubation at 37 °C with continuous shaking at 1,000 rpm, β2-microglobulin attains saturation of fibril formation (Fig. 14.4b). We monitored the fibril formation of β2-microglobulin exposed to UV light and incubated under the above-mentioned conditions. Interestingly, we found that β2-microglobulin, like prion protein, failed to form fibrils upon UV exposure (Fig. 14.4b). In the case of α-synuclein, fibril formation occurred in the presence of 0.5 mM SDS with stirring at 1,000 rpm. α-Synuclein exhibited an increase in ThT fluorescence within 3 h of incubation and reached a plateau at ∼10 h. We exposed α-synuclein to UV light and incubated it under the conditions mentioned above. UV-exposed α-synuclein failed to form fibrils even upon prolonged incubation under amyloidogenic conditions (Fig. 14.4c). Thus, the inability to form fibrils was not confined to UV-exposed prion protein alone. All three proteins, when exposed to UV light, failed to form ordered aggregates.


Fig. 14.4. Eﬀect of UV exposure on de novo amyloid ﬁbril formation of prion protein, β2-microglobulin and α-synuclein. Amyloid ﬁbril formation of (a) prion protein (b) β2-microglobulin and (c) α-synuclein. In each panel, (ﬁlled square) represents proteins that are not exposed to UV light, and (ﬁlled circle) the UV-exposed protein. The ﬁbril formation was monitored by ThT ﬂuorescence. An aliquot of the sample was withdrawn at diﬀerent time points and added to 0.5 ml of 10 μM ThT in 50 mM glycine–NaOH buﬀer (pH 8.5), and the ﬂuorescence intensity at 485 nm with excitation wavelength set at 445 nm was measured using a Fluorolog FL3-22 ﬂuorescence spectrophotometer. The UV-exposed proteins failed to form amyloid ﬁbril de novo


14.3.5 Is Subcritical Concentration of UV-Exposed Protein Responsible for Failure to Form Amyloid Fibrils?

Figure 14.5 shows the atomic force microscopy (AFM) images of prion protein, unexposed and exposed to UV light, after incubation for 120 h in amyloid-forming conditions. The AFM image of prion protein not exposed to UV light exhibited the typical fibrillar morphology (Fig. 14.5a). The AFM image of the UV-exposed protein, on the other hand, did not show the presence of any fibrils (Fig. 14.5b). The inability of UV-exposed prion protein to form amyloid fibrils is intriguing. To test whether UV exposure reduces the available protein to a subcritical level, we investigated the concentration dependence of prion protein amyloidogenesis. Several concentrations of unexposed prion protein, ranging from 0.1 to 1.0 mg ml⁻¹, were prepared for amyloid formation. Figure 14.6 shows a rise in the ThT fluorescence of prion protein after 48 h even at 0.25 mg ml⁻¹, one-fourth of the initial concentration (i.e., even assuming the loss of available protein to be 75%). Fibril formation could be seen even at a tenfold dilution (0.1 mg ml⁻¹). We have also studied the amyloidogenic potential of β2-microglobulin and α-synuclein (not exposed to UV light) at one-fifth and one-tenth of the concentrations used in the experiments on fibril formation of UV-exposed proteins (shown in Fig. 14.4b and c). Both β2-microglobulin and α-synuclein showed an increase in ThT fluorescence at each of these concentrations. However, UV-exposed prion protein, β2-microglobulin and α-synuclein showed no fibril formation even at much higher concentrations, as monitored by ThT fluorescence (Fig. 14.4a–c) and AFM (Fig. 14.5b). Thus, the ability of all the three proteins to form amyloid fibrils even at one-tenth of

Fig. 14.5. AFM images. (a) AFM image of the amyloid ﬁbrils of prion protein. (b) AFM image of the sample of UV-exposed prion protein not showing the presence of any ﬁbrils. Reproduced from [3]


A.K. Thakur and Ch.M. Rao

Fig. 14.6. De novo amyloid formation of prion protein at different concentrations. Unexposed prion protein at different concentrations (0.1, 0.25, 0.5, 0.75 and 1.0 mg ml−1) and UV-exposed prion protein (1.0 mg ml−1) were subjected to amyloid-forming conditions. The figure shows representative data at each concentration. "Exposed" denotes UV-exposed prion protein

the concentration used for fibril formation of UV-exposed proteins rules out the trivial possibility that loss of protein causes the observed lack of amyloid formation in the UV-exposed samples.

14.3.6 UV-Exposed Amyloidogenic Proteins Form Amyloid Upon Seeding

As described earlier, amyloidogenesis involves nucleation and fibril extension. UV exposure could therefore compromise nucleation, fibril extension, or both. Fragments of preformed amyloid fibrils act as seeds when mixed with a solution of monomeric protein and lead to fibril extension. In contrast to de novo fibril formation, seeded fibril extension reactions have no lag period; seeding thus eliminates the need for nucleation. Does the UV-exposed protein remain competent for fibril extension under conditions where nucleation is not required? To test this possibility, we generated fibrils from prion protein, α-synuclein and β2-microglobulin samples and sonicated them to obtain seeds. Seeds were added to the respective UV-exposed monomeric proteins, and fibril formation was monitored using ThT fluorescence. Figure 14.7a shows the increase in ThT fluorescence of prion protein either exposed or not exposed to UV light. The UV-exposed protein shows an increase in ThT fluorescence, albeit with slower kinetics than the unexposed protein. UV-exposed β2-microglobulin (Fig. 14.7b) and α-synuclein (Fig. 14.7c) also exhibited similar behavior in terms of elongation of fibrils as well as kinetics


Fig. 14.7. Effect of UV exposure on seeded amyloid fibril formation in prion protein, β2-microglobulin and α-synuclein. Samples of unexposed (filled square) and UV-exposed (filled circle) (a) 1 mg ml−1 prion protein in 20 mM sodium phosphate buffer (pH 6.8) containing 100 mM NaCl, 3 M urea and 1 M GdmCl, (b) 0.5 mg ml−1 β2-microglobulin in 50 mM citrate buffer (pH 2.5) containing 100 mM KCl and (c) α-synuclein in 20 mM HEPES–NaOH buffer (pH 7.0) containing 100 mM NaCl and 0.5 mM SDS, were treated with the respective sonicated fibril seeds, and the fibril growth was monitored with time by ThT fluorescence


of fibril extension. Thus, UV-exposed proteins retained the ability to form fibrils upon seeding. This is an interesting result, as all these proteins (UV-exposed prion protein, α-synuclein and β2-microglobulin) failed to form fibrils de novo, yet they can elongate in the presence of amyloid fibril seeds obtained from unexposed proteins. These results suggest that UV exposure selectively affects nucleation, leaving the protein competent for fibril extension.

14.3.7 UV-Exposed Prion Protein Fibrils Show Altered Fibril Morphology

We further investigated fibril morphology under these conditions using electron microscopy (EM) and AFM. Fibrils formed from monomers of unexposed prion protein in the seeded reaction were slender and long, as shown in Fig. 14.8a. These fibrils showed a canonical organization, with subprotofibrils of 8.89 ± 0.355 nm twisting around each other to form protofilaments of 20.57 ± 0.833 nm. In contrast, fibrils obtained from monomers of UV-exposed protein in seeded reactions were thick, stout and flat in appearance, with thicknesses of 30 ± 0.916 and 47.72 ± 2.066 nm, indicating a different organization of fibrils, as observed in the EM image (Fig. 14.8b). We recorded phase images of these fibrils in the tapping mode of AFM. Phase images provide some insight into the compactness or stiffness of the material under investigation, that is, the hardness or softness of the sample. A hard sample gives a larger change in phase angle; a soft sample, in contrast, gives a smaller change. The fibrils

Fig. 14.8. EM images of ﬁbrils of seeded reactions. A small amount of sample was placed on a copper grid and stained by uranyl acetate for EM imaging. EM image of ﬁbrils formed with (a) prion protein and (b) UV-exposed prion protein. Scale bar – 500 nm. Reproduced from [3]


of unexposed prion protein show a phase angle of 37.5 ± 0.358° in the phase image. In contrast, fibrils of UV-exposed prion protein showed a significantly lower phase angle of 3.82 ± 0.1457°, indicating less compact packing (lower stiffness) of these fibrils.
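The distinction between nucleation and fibril extension on which these experiments rely can be illustrated with a toy kinetic model. The sketch below is not the analysis used in this chapter: it assumes a minimal nucleation-elongation scheme in which new fibrils appear at a rate k_n·m^n_c and existing fibrils incorporate monomer at a rate 2·k_plus·m·P. All rate constants, the nucleus size n_c and the seed concentration are hypothetical, dimensionless values chosen only to make the qualitative behavior visible.

```python
def simulate(k_n, k_plus, seeds, m_tot=1.0, n_c=2, t_end=100.0, dt=0.01):
    """Forward-Euler integration of a minimal nucleation-elongation model.

    k_n    : nucleation rate constant (hypothetical, dimensionless)
    k_plus : elongation rate constant per fibril end (hypothetical)
    seeds  : initial concentration of fibrils supplied as seeds
    Returns a list of (time, fibril_mass) pairs; seed mass is neglected.
    """
    fibrils = seeds   # number concentration of fibrils, P
    mass = 0.0        # monomer incorporated into fibrils, M
    trace = [(0.0, mass)]
    for step in range(1, int(round(t_end / dt)) + 1):
        monomer = max(m_tot - mass, 0.0)
        d_fibrils = k_n * monomer ** n_c            # primary nucleation
        d_mass = 2.0 * k_plus * monomer * fibrils   # growth at both fibril ends
        fibrils += d_fibrils * dt
        mass = min(mass + d_mass * dt, m_tot)
        trace.append((step * dt, mass))
    return trace


def mass_at(trace, t):
    """Fibril mass at the sampled time closest to t."""
    return min(trace, key=lambda point: abs(point[0] - t))[1]


# Four hypothetical scenarios. "UV-exposed" is modelled (as an assumption) by
# k_n = 0 plus a halved k_plus: lost nucleation and slower extension.
SCENARIOS = {
    "unexposed, de novo":  dict(k_n=1e-4, k_plus=1.0, seeds=0.0),
    "unexposed, seeded":   dict(k_n=1e-4, k_plus=1.0, seeds=0.01),
    "UV-exposed, de novo": dict(k_n=0.0, k_plus=0.5, seeds=0.0),
    "UV-exposed, seeded":  dict(k_n=0.0, k_plus=0.5, seeds=0.01),
}

if __name__ == "__main__":
    for name, params in SCENARIOS.items():
        trace = simulate(**params)
        early, late = mass_at(trace, 10.0), mass_at(trace, 100.0)
        print(f"{name:20s} M(t=10) = {early:.3f}   M(t=100) = {late:.3f}")
```

In this sketch, setting k_n to zero abolishes de novo growth while leaving seeded growth intact, which is the qualitative pattern reported above for the UV-exposed proteins. The de novo curve of the unexposed protein also lags behind the seeded one, in line with the absence of a lag period in seeded reactions.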

14.4 Discussion

Our investigations of the effect of UV light exposure on the amyloidogenic proteins prion protein, β2-microglobulin and α-synuclein provided interesting results. All these proteins failed to form fibrils when exposed to UV light. This failure might arise for the following plausible reasons: (1) photo-oxidation causing loss of available monomeric protein, leaving a subcritical level of protein for amyloidogenesis; (2) inability of the UV-exposed protein to participate in the amyloid process, probably due to loss of crucial structure in the monomers; and (3) an inhibitory effect of the oxidized molecules on the amyloid nucleus. The fact that prion protein, β2-microglobulin and α-synuclein not exposed to UV light form amyloid fibrils at significantly lower concentrations (one-tenth) than those used for fibril formation of the UV-exposed proteins rules out the possibility that a subcritical protein concentration is responsible for the observed lack of fibril formation upon UV exposure. We find that all three UV-exposed proteins, if provided with preformed seeds, readily form amyloid fibrils, thus ruling out possible inhibitory effects of photo-oxidized molecules. Hence it appears that UV exposure renders prion protein incapable of forming an amyloid nucleus, perhaps as a result of some structural changes. Our far-UV CD studies (Fig. 14.3) show some change in the secondary structure of prion protein upon exposure to UV light. UV-exposed β2-microglobulin also shows a small change, whereas α-synuclein does not show any change in its secondary structure. UV exposure of proteins leads to photolysis of tryptophan, which can cause conformational changes in the protein. Our earlier studies on melittin, β-lactoglobulin and crystallins have shown that photo-oxidation of a protein depends upon its conformation [70]. Photo-oxidation also depends upon the polarity of the tryptophan environment [71, 72].
Prion protein has eight tryptophans, seven of which are completely exposed and located in the N-terminal domain. Thus, photo-oxidation of prion protein can damage the N-terminal region, leading to conformational change, aggregation and loss of the ability to form fibrils de novo. Interestingly, β2-microglobulin has two tryptophans, whereas α-synuclein has none. Absorption of light by other chromophores and the subsequent generation of reactive oxygen species (ROS) might contribute to the observed failure of de novo fibril formation. Further studies are needed to understand these observations. Prion protein consists of two domains: the flexible N-terminal domain and the C-terminal domain, which consists of three α-helices and two β-strands


[73, 74]. The aggregation properties of full-length prion protein (PrP 23–231) have not been studied as extensively as those of its truncated forms (PrP 90–231, PrP 106–126 and PrP 121–231), because the flexible N-terminal domain was not considered important for amyloid formation. However, the N-terminal domain appears to be important, as prion protein with N-terminal deletions has been shown to form abnormal conformations of prion aggregates [75–77]. Moreover, transgenic mice lacking residues 32–106 are not susceptible to prion infection [78]. In the current study, we exposed full-length prion protein to UV light and followed its amyloid formation. Since most of the tryptophan residues are in the N-terminal region, we expect this region to be the most affected. Interestingly, UV-exposed prion protein failed to form amyloid fibrils de novo, indicating the importance of the N-terminal domain in amyloid formation.

Our study shows that UV exposure of prion protein, β2-microglobulin and α-synuclein abolishes the ability of these proteins to form amyloid fibrils de novo. However, they retain the ability to elongate fibrils when provided with preformed fibrils as seeds. Thus, UV exposure selectively compromises the ability to nucleate fibril growth. Figure 14.9 schematically describes the effect of UV light on the amyloidogenic proteins. Under amyloidogenic conditions, prion protein, β2-microglobulin and α-synuclein undergo structural changes and form an amyloid nucleus, which other monomers join to extend the nucleus into protofibrils and subsequently thicker fibrils and amyloid aggregates (grey arrows). UV exposure inhibits the nucleation process and hence fibril formation. UV-exposed prion protein undergoes some structural alterations and forms amorphous aggregates. β2-Microglobulin and α-synuclein, however, do not form such amorphous aggregates upon exposure to UV light. All three proteins remain competent for fibril extension if provided with preformed fibrils as seeds.
The morphology of the fibrils formed by UV-exposed β2-microglobulin and α-synuclein is comparable to that of the fibrils of the unexposed proteins, whereas fibrils of UV-exposed prion protein differ in size and compactness from those formed by the unexposed protein (Fig. 14.9).

The selective loss of the ability to nucleate fibril growth upon UV exposure is an important finding, as research on specific inhibition of the nucleation and elongation processes is scarce and faces the basic problem of separating these two intricately interwoven processes. Apolipoprotein E has been shown to specifically inhibit nucleation of Aβ-amyloid aggregation [79, 80]. Similarly, tetracycline has been shown to specifically inhibit the elongation process of the amyloid-forming W7FW14F mutant of apomyoglobin [81]. Although light exposure might not be a factor in amyloid-associated pathologies, other than perhaps in the eye and skin, it appears to be a useful perturbant for investigating amyloid fibril formation. Since UV exposure leads to failure of de novo amyloid fibril formation in three different amyloidogenic proteins, the subtle structural changes that prevent fibril formation could be investigated further. UV exposure also leads to selective compromise of


Fig. 14.9. Schematic representation of eﬀect of light on amyloid proteins. Adapted from [3]

the nucleation process. Thus, it appears that UV exposure could be exploited as a tool for investigating the amyloidogenic process, especially the different processes associated with nucleation and fibril extension.

Acknowledgments

We thank Dr T. Ramakrishna for critically editing the manuscript, which helped in improving its quality; Md. Faiz Ahmad for α-synuclein protein and construct of β2-microglobulin; and Dr Shashi Singh for electron microscopy. AKT acknowledges the award of a Senior Research Fellowship by the Council of Scientific and Industrial Research, New Delhi, India.


References

1. M.F. Ahmad, B. Raman, T. Ramakrishna, Ch.M. Rao, J. Mol. Biol. 375, 1040 (2008)
2. B. Raman, T. Ban, M. Sakai, S.Y. Pasta, T. Ramakrishna, H. Naiki, Y. Goto, Ch.M. Rao, Biochem. J. 392, 573 (2005)
3. A.K. Thakur, Ch.M. Rao, PLoS ONE 3, e2688 (2008)
4. A. Shukla, S. Mukherjee, S. Sharma, V. Agrawal, K.V. Radha Kishan, P. Guptasarma, Arch. Biochem. Biophys. 428, 144 (2004)
5. C.M. Dobson, Trends Biochem. Sci. 24, 329 (1999)
6. W. Colon, J.W. Kelly, Biochemistry 31, 8654 (1992)
7. Z. Lai, W. Colón, J.W. Kelly, Biochemistry 35, 6470 (1996)
8. S.P. Martsev, A.P. Dubnovitsky, A.P. Vlasov, M. Hoshino, K. Hasegawa, H. Naiki, Y. Goto, Biochemistry 41, 3389 (2002)
9. H. Naiki, N. Hashimoto, S. Suzuki, H. Kimura, K. Nakakuki, F. Gejyo, Amyloid 4, 223 (1997)
10. V.J. McParland, N.M. Kad, A.P. Kalverda, A. Brown, P. Kirwin-Jones, M.G. Hunter, M. Sunde, S.E. Radford, Biochemistry 39, 8735 (2000)
11. B. Raman, E. Chatani, M. Kihara, T. Ban, M. Sakai, K. Hasegawa, H. Naiki, Ch.M. Rao, Y. Goto, Biochemistry 44, 1288 (2005)
12. V.N. Uversky, J. Li, A.L. Fink, J. Biol. Chem. 276, 10737 (2001)
13. K. Sasahara, H. Yagi, H. Naiki, Y. Goto, J. Mol. Biol. 372, 981 (2007)
14. Y. Kusumoto, A. Lomakin, D.B. Teplow, G.B. Benedek, Proc. Natl. Acad. Sci. U. S. A. 95, 12277 (1998)
15. O. Gursky, S. Aleshkov, Biochim. Biophys. Acta 1476, 93 (2000)
16. J. Danielsson, J. Jarvet, P. Damberg, A. Gräslund, FEBS J. 272, 3938 (2005)
17. O.V. Bocharova, N. Makarava, L. Breydo, M. Anderson, V.V. Salnikov, I.V. Baskakov, J. Biol. Chem. 281, 2373 (2006)
18. A. Arora, C. Ha, C.B. Park, Protein Sci. 13, 2429 (2004)
19. E. Shehi, P. Fusi, F. Secundo, S. Pozzuolo, A. Bairati, P. Tortora, Biochemistry 42, 14626 (2003)
20. M. Zhu, P.O. Souillac, C. Ionescu-Zanetti, S.A. Carter, A.L. Fink, J. Biol. Chem. 277, 50914 (2002)
21. T. Kowalewski, D.M. Holtzman, Proc. Natl. Acad. Sci. U. S. A. 96, 3688 (1999)
22. Z. Wang, C. Zhou, C. Wang, L. Wan, X. Fang, C. Bai, Ultramicroscopy 97, 73 (2003)
23. Y. Sun, N. Makarava, C.I. Lee, P. Laksanalamai, F.T. Robb, I.V. Baskakov, J. Mol. Biol. 376, 1155 (2008)
24. B.A. Vernaglia, J. Huang, E.D. Clark, Biomacromolecules 5, 1362 (2004)
25. A. Ahmad, I.S. Millett, S. Doniach, V.N. Uversky, A.L. Fink, Biochemistry 42, 11404 (2003)
26. Z. Lai, J. McCulloch, H.A. Lashuel, J.W. Kelly, Biochemistry 36, 10230 (1997)
27. M. Calamai, F. Chiti, C.M. Dobson, Biophys. J. 89, 4201 (2005)
28. O.V. Bocharova, L. Breydo, A.S. Parfenov, V.V. Salnikov, I.V. Baskakov, J. Mol. Biol. 346, 645 (2005)
29. S. Meehan, Y. Berry, B. Luisi, C.M. Dobson, J.A. Carver, C.E. MacPhee, J. Biol. Chem. 279, 3413 (2004)
30. H.J. Lee, C. Choi, S.J. Lee, J. Biol. Chem. 277, 671 (2002)
31. E.N. Lee, S.Y. Lee, D. Lee, J. Kim, S.R. Paik, J. Neurochem. 84, 1128 (2003)


32. V. Narayanan, S. Scarlata, Biochemistry 40, 9927 (2001)
33. L.A. Munishkina, C. Phelan, V.N. Uversky, A.L. Fink, Biochemistry 42, 2720 (2003)
34. M.F. Ahmad, T. Ramakrishna, B. Raman, Ch.M. Rao, J. Mol. Biol. 364, 1061 (2006)
35. S. Yamamoto, K. Hasegawa, I. Yamaguchi, S. Tsutsumi, J. Kardos, Y. Goto, F. Gejyo, H. Naiki, Biochemistry 43, 11075 (2004)
36. H. Zhao, E.K. Tuominen, P.K. Kinnunen, Biochemistry 43, 10302 (2004)
37. Z. Ma, G.T. Westermark, Mol. Med. 8, 863 (2002)
38. E.Y. Chi, C. Ege, A. Winans, J. Majewski, G. Wu, K. Kjaer, K.Y. Lee, Proteins 72, 1 (2008)
39. D.P. Smith, D.J. Tew, A.F. Hill, S.P. Bottomley, C.L. Masters, K.J. Barnham, R. Cappai, Biochemistry 47, 1425 (2008)
40. P. Critchley, J. Kazlauskaite, R. Eason, T.J. Pinheiro, Biochem. Biophys. Res. Commun. 313, 559 (2004)
41. J. Kazlauskaite, N. Sanghera, I. Sylvester, C. Vénien-Bryan, T.J. Pinheiro, Biochemistry 42, 3295 (2003)
42. N. Sanghera, T.J. Pinheiro, J. Mol. Biol. 315, 1241 (2002)
43. T. Scheibel, S.L. Lindquist, Nat. Struct. Biol. 8, 958 (2001)
44. M.L. Hegde, K.S.J. Rao, Arch. Biochem. Biophys. 464, 57 (2007)
45. V.N. Uversky, J. Li, A.L. Fink, FEBS Lett. 500, 105 (2001)
46. A.B. Manning-Bog, A.L. McCormack, J. Li, V.N. Uversky, A.L. Fink, D.A. Di Monte, J. Biol. Chem. 277, 1641 (2002)
47. V.N. Uversky, J. Li, A.L. Fink, J. Biol. Chem. 276, 44284 (2001)
48. Y. Ohhashi, M. Kihara, H. Naiki, Y. Goto, J. Biol. Chem. 280, 32843 (2005)
49. E. Chatani, H. Naiki, Y. Goto, J. Mol. Biol. 359, 1086 (2006)
50. S.L. Kazmirski, M.J. Howard, R.L. Isaacson, A.R. Fersht, Proc. Natl. Acad. Sci. U. S. A. 97, 10706 (2000)
51. E.K. Tan, L.M. Skipper, Hum. Mutat. 28, 641 (2007)
52. K. Doh-ura, J. Tateishi, H. Sasaki, T. Kitamoto, Y. Sakaki, Biochem. Biophys. Res. Commun. 163, 974 (1989)
53. L.G. Goldfarb, M. Haltia, P. Brown, A. Nieto, J. Kovanen, W.R. McCombie, S. Trapp, D.C. Gajdusek, Lancet 337, 425 (1991)
54. D. Goldgaber, L.G. Goldfarb, P. Brown, D.M. Asher, W.T. Brown, S. Lin, J.W. Teener, S.M. Feinstone, R. Rubenstein, R.J. Kascsak, J.W. Boellaard, D.C. Gajdusek, Exp. Neurol. 106, 204 (1989)
55. K. Hsiao, H.F. Baker, T.J. Crow, M. Poulter, F. Owen, J.D. Terwilliger, D. Westaway, J. Ott, S.B. Prusiner, Nature 338, 342 (1989)
56. K. Hsiao, S.R. Dlouhy, M.R. Farlow, C. Cass, M. Da Costa, P.M. Conneally, M.E. Hodes, B. Ghetti, S.B. Prusiner, Nat. Genet. 1, 68 (1992)
57. Bharathi, S.S. Indi, K.S. Rao, Neurosci. Lett. 424, 78 (2007)
58. J.A. Wright, D.R. Brown, J. Neurosci. Res. 86, 496 (2008)
59. B. Raman, T. Ban, K. Yamaguchi, M. Sakai, T. Kawai, H. Naiki, Y. Goto, J. Biol. Chem. 280, 16157 (2005)
60. C. Chothia, J. Janin, Nature 256, 705 (1975)
61. A. Lomakin, D.S. Chung, G.B. Benedek, D.A. Kirschner, D.B. Teplow, Proc. Natl. Acad. Sci. U. S. A. 93, 1125 (1996)
62. R.D. Hills, C.L. Brooks Jr, J. Mol. Biol. 368, 894 (2007)


63. V.N. Uversky, A.L. Fink, Biochim. Biophys. Acta 1698, 131 (2004)
64. C. Goldsbury, J. Kistler, U. Aebi, T. Arvinte, G.J. Cooper, J. Mol. Biol. 285, 33 (1999)
65. M. Gobbi, L. Colombo, M. Morbin, G. Mazzoleni, E. Accardo, M. Vanoni, E. Del Favero, L. Cantù, D.A. Kirschner, C. Manzoni, M. Beeg, P. Ceci, P. Ubezio, G. Forloni, F. Tagliavini, M. Salmona, J. Biol. Chem. 281, 843 (2006)
66. M.J. Cannon, A.D. Williams, R. Wetzel, D.G. Myszka, Anal. Biochem. 328, 67 (2004)
67. T. Shirahama, A.S. Cohen, J. Cell Biol. 33, 679 (1967)
68. B. Chakrabarti, S.K. Bose, K. Mandal, J. Indian Chem. Soc. 63, 131 (1986)
69. B. Raman, C.M. Rao, J. Biol. Chem. 269, 27264 (1994)
70. S.C. Rao, C.M. Rao, D. Balasubramanian, Photochem. Photobiol. 51, 357 (1990)
71. L.I. Grossweiner, Curr. Top. Radiat. Res. Quart. 11, 141 (1976)
72. L.I. Grossweiner, A. Blum, A.M. Brendzel, in Trends in Photobiology, ed. by C. Helene, M. Charlier, Th. Montenay-Garestier, G. Laustriat (Plenum, New York, 1982), p. 67
73. D.G. Donne, J.H. Viles, D. Groth, I. Mehlhorn, T.L. James, F.E. Cohen, S.B. Prusiner, P.E. Wright, H.J. Dyson, Proc. Natl. Acad. Sci. U. S. A. 94, 13452 (1997)
74. R. Riek, S. Hornemann, G. Wider, R. Glockshuber, K. Wüthrich, FEBS Lett. 413, 282 (1997)
75. V.A. Lawson, S.A. Priola, K. Wehrly, B. Chesebro, J. Biol. Chem. 276, 35265 (2001)
76. V.A. Lawson, S.A. Priola, K. Meade-White, M. Lawson, B. Chesebro, J. Biol. Chem. 279, 13689 (2004)
77. K.N. Frankenfield, E.T. Powers, J.W. Kelly, Protein Sci. 14, 2154 (2005)
78. Weissmann, J. Biol. Chem. 274, 3 (1999)
79. K.C. Evans, E.P. Berger, C.G. Cho, K.H. Weisgraber, P.T. Lansbury Jr, Proc. Natl. Acad. Sci. U. S. A. 92, 763 (1995)
80. S.J. Wood, W. Chan, R. Wetzel, Biochemistry 35, 12623 (1996)
81. C. Malmo, S. Vilasi, C. Iannuzzi, S. Tacchi, C. Cametti, G. Irace, I. Sirangelo, FASEB J. 20, 346 (2005)

15 Real-Time Observation of Amyloid Fibril Growth by Total Internal Reflection Fluorescence Microscopy

H. Yagi, T. Ban, and Y. Goto

Abstract. Amyloid fibrils form through nucleation and growth. To clarify the mechanism involved, direct observations are important. We developed a unique approach to monitor fibril growth in real time at the single-fibril level using total internal reflection fluorescence microscopy (TIRFM) combined with thioflavin T (ThT), an amyloid-specific fluorescent dye. We succeeded in visualizing the fibril growth of β2-microglobulin (β2-m) and amyloid β peptide. On the basis of significant variations in amyloid morphology revealed by TIRFM, we propose that a taxonomy of amyloid supramolecular assemblies will be useful to clarify the structure–function relationship of amyloid fibrils.

15.1 Introduction

Amyloid fibrils have been a critical subject in recent protein research because they are associated with the pathology of more than 20 serious human diseases [1–3]. Additionally, various proteins and peptides that are not related to disease can also form amyloid-like fibrils, implying that the formation of amyloid fibrils is a generic property of polypeptides. Although no sequence or structural similarity has been found among the amyloid precursor proteins, amyloid fibrils share several common structural and spectroscopic properties. Irrespective of the protein species, electron microscopy (EM) and X-ray fiber diffraction indicate that amyloid fibrils are relatively rigid and straight, with a diameter of 10–15 nm and several layers of cross-β sheets. Amyloid fibrils form via a nucleation-dependent process in which nonnative forms of precursor proteins or peptides slowly associate to form a nucleus, followed by an extension reaction in which the nucleus grows by the sequential incorporation of precursor molecules. Structural studies using solid-state NMR have shown that amyloid fibrils are stabilized by juxtaposing hydrophobic segments while minimizing electrostatic repulsion [4–6]. From the hydrogen/deuterium exchange of amide protons, amyloid fibrils were shown


H. Yagi et al.

to be stabilized by an extensive network of hydrogen bonds forming the β-sheets [4–7]. On the basis of various approaches, increasingly convincing structural models of amyloid fibrils are emerging. The heterogeneity of amyloid fibrils has recently come into focus [4–6]. Solid-state NMR measurements have shown that Aβ-amyloid fibrils with different morphological features have different underlying side-chain structures, and that both the morphology and the molecular structure are self-propagated by seeding [4–6]. A similar observation of the template-dependent propagation of distinct fibrils was made with insulin [8]. More recently, mammalian prion amyloids from different species were shown to differ distinctly in secondary structure and morphology, as measured by Fourier transform infrared spectroscopy (FTIR) and atomic force microscopy (AFM), respectively [9]. Importantly, cross-seeding of prion monomers from one species with preformed fibrils from another species produced a new amyloid strain that inherited the secondary structure and morphology of the template fibrils. Strain-specific conformational differences were also found for yeast Sup35 prion amyloid fibrils [10, 11]. These findings may explain the structural basis underlying conformational memory, as suggested for prion diseases [12]. To obtain further insight into the structure and heterogeneity of amyloid fibrils, direct observation of individual fibrils is important. Here we describe a unique approach we developed to monitor fibril growth in real time at the single-fibril level [13–17]. On the basis of the dramatic diversity observed and its underlying structural basis, we classify amyloid supramolecular assemblies [18].

15.2 Total Internal Reflection Fluorescence Microscopy

TIRFM has been useful for monitoring single molecules because it effectively reduces the background fluorescence by using the evanescent field formed on the surface of a quartz slide [19–21] (Fig. 15.1). When a laser beam strikes the interface between the quartz slide (high refractive index) and an aqueous solution (low refractive index) at an angle beyond the critical angle for total internal reflection, an evanescent field is produced beyond the interface in the solution. The illumination is restricted to fluorophores either bound to the quartz slide surface or located close by, resulting in highly reduced background fluorescence. Furthermore, with careful selection of optical elements, the background fluorescence can be reduced 2,000-fold compared with ordinary epi-fluorescence microscopy. ThT, in turn, is a reagent known to become strongly fluorescent upon binding to amyloid fibrils [22], so that one can detect the fibrils specifically without covalent modification. Importantly, because the evanescent field formed by the total internal reflection of the laser light penetrates to a depth of about 150 nm, one can selectively monitor fibrils lying along the slide within 150 nm of its surface, and thus obtain the exact length of the fibrils. By combining

15 Real-Time Observation of Amyloid Fibril Growth


Fig. 15.1. Schematic representation of amyloid fibrils revealed by total internal reflection fluorescence microscopy. (a) The penetration depth of the evanescent field formed by the total internal reflection of laser light is ∼150 nm for laser light at 455 nm, so only amyloid fibrils lying parallel to the slide surface are observed. (b) Schematic diagram of a prism-type TIRFM system on an inverted microscope. ISIT: image-intensifier-coupled silicon-intensified target camera; CCD: charge-coupled device camera

amyloid fibril-specific ThT fluorescence and TIRFM, it is possible to observe the amyloid fibrils and the process by which they form without covalently attaching any fluorescent reagent to the protein molecule.
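The ∼150 nm penetration depth quoted above can be related to the optical geometry through the standard expression for the decay length of the evanescent-field intensity, d = λ / (4π √(n1² sin²θ − n2²)), which holds for incidence angles θ above the critical angle. The sketch below evaluates this expression. The refractive indices (1.46 for quartz, 1.33 for the aqueous solution) and the incidence angles are assumed illustrative values, not parameters reported in this chapter, and the function names are hypothetical.

```python
import math


def critical_angle_deg(n1, n2):
    """Critical angle (degrees) for total internal reflection, n1 > n2."""
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm, n1, n2, theta_deg):
    """1/e decay length of the evanescent-field intensity, in nm.

    wavelength_nm : vacuum wavelength of the laser
    n1, n2        : refractive indices of the slide and the solution
    theta_deg     : angle of incidence measured from the surface normal
    """
    s = (n1 * math.sin(math.radians(theta_deg))) ** 2 - n2 ** 2
    if s <= 0.0:
        raise ValueError("angle is not above the critical angle: no evanescent field")
    return wavelength_nm / (4.0 * math.pi * math.sqrt(s))


if __name__ == "__main__":
    n_quartz, n_water = 1.46, 1.33   # assumed values, for illustration only
    theta_c = critical_angle_deg(n_quartz, n_water)
    print(f"critical angle: {theta_c:.1f} deg")
    for theta in (67.0, 68.0, 70.0, 75.0):
        depth = penetration_depth_nm(455.0, n_quartz, n_water, theta)
        print(f"theta = {theta:.0f} deg -> depth = {depth:.0f} nm")
```

Under these assumed indices, incidence angles a few degrees above the critical angle give depths on the order of the ∼150 nm stated in the text, and the depth shrinks as the angle increases.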

15.3 Real-Time Observation of β2-m and Aβ Fibrils

Real-time observation of the growth of individual β2-m fibrils was carried out at pH 2.5 on the surface of quartz slides (Fig. 15.2) [13]. At time zero, the β2-m seeds appeared as bright fluorescent spots. Fibril growth then proceeded from the seed fibrils and saturated within a couple of hours, when the monomeric β2-m was depleted. The overall time course of fibril growth was similar to that in solution at similar concentrations of seeds and monomers. Intriguingly, most of the fibrils showed unidirectional growth starting from one end of the seed. Although we cannot exclude the possibility that interaction with the glass surface was responsible for the unidirectional extension, the unidirectional picture is likely to hold for the formation of fibrils of β2-m and also of Aβ(1–40) (see below).


Fig. 15.2. Direct observation of β2-m amyloid ﬁbril growth obtained by TIRFM. Adapted from ref. [13] with permission. Incubation times are 0, 30, 60, and 90 min

This approach using ThT can be applied to various amyloid fibrils, since ThT binding is common to amyloid fibrils. This was demonstrated with Aβ(1–40) amyloid fibrils [14], which gave more dramatic images because we could perform the experiments at pH 7.5, where the fluorescence of ThT is much stronger than at pH 2.5 (Fig. 15.3). The growth of fibrils occurred simultaneously at many seeds. Although several fibrils often developed from what appeared to be one seed, it is likely that clustered seeds produced such a radial pattern. Once started, unidirectional growth continued, producing remarkably long fibrils of more than 15 μm in length. Considering that TIRFM selectively monitors fibrils lying along the slide within 150 nm, we infer that the interaction of fibrils with the quartz surface caused the lateral growth. In addition, the combination of relatively rapid fibril growth and reduced aggregation of fibrils weakly fixed on the quartz surface enabled the formation of remarkably long fibrils. This length enabled an exact analysis of the growth rate of individual fibrils. Growth at the early and middle stages seems to occur in an all-or-none manner: when a fibril extends, the rate is almost constant (∼0.3 μm min−1), independent of the fibril species. There were cases where the growth paused briefly, possibly because of physical obstacles or local depletion of monomers. When the growth restarted, however, a similar rate of 0.3 μm min−1 was regained. Similar discontinuous growth, termed the stop-and-run mechanism, was also observed during the growth of α-synuclein protofibrils monitored by in situ AFM [23].
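The rate analysis described above, in which a fibril either extends at a roughly constant rate or pauses, can be sketched as follows. The length trace is synthetic: it is constructed from the ∼0.3 μm min−1 rate quoted in the text with an arbitrary two-minute pause, and it is not measured data. The function name and the pause threshold are hypothetical choices made for illustration.

```python
def run_and_pause_stats(times_min, lengths_um, pause_threshold=0.05):
    """Split a fibril length trace into 'run' and 'pause' intervals.

    An interval counts as a pause when its growth rate falls below
    pause_threshold (um/min, an arbitrary cutoff). Returns the mean
    rate over run intervals and the fraction of time spent paused.
    """
    run_rates = []
    paused_time = 0.0
    total_time = times_min[-1] - times_min[0]
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        rate = (lengths_um[i] - lengths_um[i - 1]) / dt
        if rate < pause_threshold:
            paused_time += dt
        else:
            run_rates.append(rate)
    mean_run_rate = sum(run_rates) / len(run_rates) if run_rates else 0.0
    return mean_run_rate, paused_time / total_time


# Synthetic trace (not measured): growth at 0.3 um/min, paused from 4 to 6 min.
TIMES_MIN = list(range(11))
LENGTHS_UM = [0.0, 0.3, 0.6, 0.9, 1.2, 1.2, 1.2, 1.5, 1.8, 2.1, 2.4]

if __name__ == "__main__":
    rate, paused = run_and_pause_stats(TIMES_MIN, LENGTHS_UM)
    print(f"mean rate while growing: {rate:.2f} um/min, paused {paused:.0%} of the time")
```

Separating runs from pauses matters because a naive end-to-end average over this synthetic trace (2.4 μm in 10 min) would understate the rate at which the fibril extends while it is actually growing.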

15.4 Effects of Various Surfaces on the Growth of Aβ Fibrils

The size and shape of fibrils, as well as the kinetics of their formation, depend on the physicochemical nature of the surface [24–26]. We studied the effects of the physicochemical properties of the surface on the growth of Aβ amyloid fibrils [15]. Using specific chemical modifications, it is possible to modify the properties of the quartz surface, both in terms of net charge


Fig. 15.3. Direct observation of Aβ(1–40) amyloid fibril growth by TIRFM. Real-time monitoring of fibril growth on glass slides. Arrows indicate the unidirectional growth of Aβ from a single seed fibril. The scale bar represents 10 μm. Reproduced from [14] with permission

and hydrophobicity. We observed the seed-dependent formation of Aβ(1–40) fibrils on the surfaces of various chemically modified substrates, which were created either by alternate adsorption of polyelectrolytes or with self-assembled monolayers of silanes. In the presence of Aβ(1–40) seed fibrils, enhanced fibril formation was observed on negatively charged surfaces, including quartz and polyethyleneimine (PEI)/polyvinylsulfonate (PVS). On quartz, intense growth led to remarkably long fibrils, as reported previously [14]. We often observed radial growth patterns, suggesting the presence of clustered seeds. Extensive fibril formation was generally observed on negatively charged surfaces, regardless of whether they were modified by a polyelectrolyte or a silane. In contrast, fibril growth was largely suppressed on positively charged or hydrophobic surfaces. Aβ(1–40) is negatively charged at pH 7.5, suggesting that tight interactions between Aβ(1–40) and these surfaces prevent fibril growth.


Fig. 15.4. Real-time observations of the formation of Aβ(1–40) spherulite. Real-time observations of Aβ(1–40) amyloid fibril growth on PEI/PVS at pH 7.5 and 37 °C. Concentrations of Aβ(1–40) monomers, seeds, and ThT were 50 μM, 5 μg ml−1, and 5 μM, respectively. White arrows in panels of 0–20 min indicate the hazy area detected before clear images of spherical amyloid fibrils were obtained. At time zero, large clusters were not observed on the surface. At 10 min, hazy globular objects were identified. At 15 min, fibrils emerged. Fibrils grew both in size and number with time, forming huge spherical amyloid assemblies with a radius of more than 20 μm at 120 min. Reproduced from [15] with permission

Fibril growth was especially prominent on surfaces covered with PEI/PVS, which are highly negatively charged and hydrophilic polyelectrolytes (Fig. 15.4). We initially presumed that fibril growth on PEI/PVS started from large clustered seeds attached to the surface. However, the real-time observation revealed striking images of fibril growth, producing huge spherical assemblies with a densely packed radial pattern (Fig. 15.4). Importantly, as on quartz, no branching of the growing ends was observed. Considering that TIRFM illumination has a penetration depth of ∼150 nm and the depth of focus of the objective lens is about 100 nm, we infer that the large clusters of seeds formed first in solution and were not in contact with the substrate. The hazy areas observed at the initial stages, as indicated by the arrows in Fig. 15.4, may represent the clustered seeds or aggregated intermediates formed in solution. Since the thickness of the water medium


estimated from the fine-focus stroke between the quartz slide and cover slip is about 10 μm, the spherical assemblies observed here are in fact flattened spheres. The surface used for TIRFM observation was located on the upper side of the cell, so the clustered fibrils on the surface were not deposited by gravity. Most importantly, these spherulitic structures resemble the amyloid core of senile plaques observed in the cerebral cortices of patients suffering from Alzheimer’s disease [27]. Similar spherical amyloid deposits are observed in a mouse model of Alzheimer’s disease [28], in patients with Creutzfeldt–Jakob disease [29], and in several other neurodegenerative diseases [30], indicating that they are a common architectural feature of fibrils. Furthermore, spherulites have been observed in vitro in many systems, including natural and synthetic polymers, for example insulin [31, 32], pathogenic immunoglobulin chains [33], β-lactoglobulin [34] and synthetic peptides [35], indicating that they are a common architectural feature of such fibers. We consider that the senile plaque-like spherical objects observed here correspond to “spherulites”, higher-order spherical assemblies of amyloid fibrils ranging in diameter from 10 to 150 μm. Under a polarizing light microscope, spherulites exhibit a typical “Maltese-cross” extinction pattern [31].

15.5 Spontaneous Formation of Aβ(1–40) Fibrils and Classification of Morphologies

We also studied the spontaneous formation of Aβ(1–40) fibrils without seeds on quartz slides [18]. Spontaneous fibrillation of Aβ(1–40), accelerated by a low concentration of sodium dodecyl sulfate and a high concentration of sodium chloride under the quartz slides, produced various remarkable amyloid assemblies. Densely packed spherulitic structures with radial fibril growth were typically observed. When the packing of fibrils was coarse, extremely long fibrils often protruded from the spherulitic cores. In other cases, a large number of worm-like fibrils were formed. TEM and AFM revealed relatively short and straight fibrillar blocks associated laterally without tight interaction, leading to random-walk-like fibril growth. These results suggest that, during spontaneous fibrillation, nucleation occurring in contact with surfaces is easily affected by environmental factors, creating various types of nuclei and hence variations in amyloid morphology.

On the basis of the various supramolecular fibrillar assemblies of Aβ(1–40) fibrils produced both dependently on and independently of seeds, we distinguish three basic types of amyloid supramolecular fibrillar assemblies (Fig. 15.5).

Fig. 15.5. Schematic models of supramolecular fibrillar assemblies of Aβ(1–40) fibrils. Variation in morphology can arise at the level of oligomeric species, protofilaments, or initial short fibrils. They associate together on the quartz surface, creating three types of supramolecular fibrillar assemblies: straight fibrils (Type I), spherulitic assemblies (Type II), and worm-like fibrils (Type III). A mixed architecture of type I and II fibrils (Type I/II) was also observed when the internal density is coarse. Note that the different precursors are represented together in a box and that the relationships between amyloid precursors and final products remain unclear. Reproduced from [18] with permission

Type I: Basic straight and rigid fibrils with a diameter of about 10–15 nm. Although tremendous lengths can be achieved without lateral association, as observed for the seed-dependent growth on the quartz surface, the preparation in solution tends to form clustered fibrils. Precursors of mature amyloid fibrils can be oligomeric species, protofilaments, or initial short fibrils. Variation in morphology can arise at the level of these amyloid precursors. On the other hand, it is possible that different precursors, as shown here, produce similar mature fibrils. Thus, although it is clear that interactions with surfaces at the early stages affect the final morphological features, the relationships between amyloid precursors and final products remain unclear. This is also true for type II and III fibrils below.

Type II: Spherulitic amyloid assemblies typically made of type I basic fibrils. The worm-like fibrils (Type III, see below) can also form spherulitic assemblies. Spherulitic structures were observed in the spontaneous growth of Aβ(1–40) fibrils as well as in seed-dependent growth. The diameter can exceed 30 μm. The clustered seeds or precursors probably initiate fibril growth in a radial pattern. Internal density varies from one spherulitic assembly to another. Intriguingly, a densely packed spherulitic interior ensures concerted growth, producing globular architectures. On the other hand, when the internal density is coarse, independent growth of constituent fibrils occurs, making a unique mixed architecture of type I and II fibrils, reminiscent of nerve synapses.

Type III: Another highly intriguing morphology is that of the worm-like fibrils. Although the TIRFM images suggest flexible fibrils, the TEM and AFM images showed that the worm-like fibrils are in fact made of rigid fibril blocks associated laterally. Incomplete lateral association results in curvature of the longitudinal axis, producing the random-walk-like fibril growth. This incomplete lateral association may also produce branching of fibrils at the growing ends. Thus, in internal structure, the worm-like fibrils of Aβ(1–40) are distinct from the flexible and thin protofilaments often observed for other amyloids [36, 37]. On the other hand, the remarkable length suggests that nucleation of the worm-like fibrils does not occur frequently. As far as we know, an architecture as unique as that of type III fibrils has not been reported previously.

These results suggest that amyloid fibrils have a high potential to form various higher-order structures. We anticipate that the present classification will apply to various amyloid fibrils.
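The random-walk-like growth described for type III fibrils can be illustrated with a toy model in which rigid blocks are joined end to end and each junction introduces a small random kink. The following is a minimal Python sketch of that idea. The block length, kink width, and number of blocks are hypothetical parameters chosen for illustration, not values measured in this chapter, and the two-dimensional chain ignores the real internal structure of the fibrils.

```python
import math
import random

def grow_fibril(n_blocks, block_len_nm, kink_sd_deg, seed=0):
    """Return the (x, y) joint positions of a 2D chain of rigid blocks.

    Each new block deviates from the previous direction by a Gaussian
    kink angle; kink_sd_deg = 0 gives a perfectly straight fibril.
    """
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for _ in range(n_blocks):
        heading += math.radians(rng.gauss(0.0, kink_sd_deg))
        x += block_len_nm * math.cos(heading)
        y += block_len_nm * math.sin(heading)
        points.append((x, y))
    return points

def end_to_end_nm(points):
    """Straight-line distance between the first and last joint."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.hypot(x1 - x0, y1 - y0)

# Hypothetical parameters, for illustration only
N_BLOCKS = 200
BLOCK_LEN_NM = 100.0
contour = N_BLOCKS * BLOCK_LEN_NM

straight = grow_fibril(N_BLOCKS, BLOCK_LEN_NM, kink_sd_deg=0.0)
wormlike = grow_fibril(N_BLOCKS, BLOCK_LEN_NM, kink_sd_deg=15.0, seed=1)

print(f"contour length:       {contour:.0f} nm")
print(f"straight end-to-end:  {end_to_end_nm(straight):.0f} nm")
print(f"worm-like end-to-end: {end_to_end_nm(wormlike):.0f} nm")
```

With no kinks the chain reproduces a straight, type I-like fibril whose end-to-end distance equals its contour length. With kinks, the same rigid blocks trace a meandering path that looks flexible at low resolution, which is the distinction the TEM and AFM images draw for the worm-like fibrils.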

15.6 Conclusion

We visualized the formation of amyloid fibrils in real time at the single-fibril level. On the basis of the unique images of fibrils, we classified the supramolecular fibrillar assemblies of Aβ(1–40) fibrils into three basic types: rigid and straight type I fibrils, spherulitic type II fibrils, and worm-like type III fibrils (Fig. 15.5). This classification is likely to be applicable to the fibrils of other proteins as well. Considering the increased morphological variability in spontaneous fibrillation, interactions with surfaces at the early stages determine the final morphological features. Different amyloid supramolecular assemblies will have distinct biological impacts on the development and, furthermore, the transmission of amyloidosis. Thus, clarifying the structural basis leading to the various types of amyloid fibrils at the different levels, from the structure of amyloid precursors to protofilament packing and interfibrillar interactions, is an important next step. The anatomy and taxonomy of amyloid supramolecular assemblies will be critical to progress in amyloid structural biology.

Acknowledgments

We would like to acknowledge Hironobu Naiki (Fukui University), Tetsushi Wazawa (Tohoku University), Kenichi Morigaki (AIST), and Daizo Hamada (Kobe University) for their support and encouragement. This work was supported by Grants-in-Aid from the Japanese Ministry of Education, Culture, Sports, Science and Technology, and by the Japan Society for the Promotion of Science (JSPS) Research Fellowships for Young Scientists to TB.

References

1. J.C. Rochet, P.T. Lansbury Jr., Curr. Opin. Struct. Biol. 10, 60 (2000)
2. C.M. Dobson, Nature 426, 884 (2003)
3. V.N. Uversky, A.L. Fink, Biochim. Biophys. Acta 1698, 131 (2004)
4. A.T. Petkova, R.D. Leapman, Z. Guo, W.M. Yau, M.P. Mattson, R. Tycko, Science 307, 262 (2005)
5. C. Wasmer, A. Lange, H. Van Melckebeke, A.B. Siemer, R. Riek, B.H. Meier, Science 319, 1523 (2008)
6. C. Ritter et al., Nature 435, 844 (2005)
7. M. Hoshino, H. Katou, Y. Hagihara, K. Hasegawa, H. Naiki, Y. Goto, Nat. Struct. Biol. 9, 332 (2002)
8. W. Dzwolak, V. Smirnovas, R. Jansen, R. Winter, Protein Sci. 13, 1927 (2004)
9. E.M. Jones, W.K. Surewicz, Cell 121, 63 (2005)
10. M. Tanaka, P. Chien, K. Yonekura, J.S. Weissman, Cell 121, 49 (2005)
11. M. Tanaka, P. Chien, N. Naber, R. Cooke, J.S. Weissman, Nature 428, 323 (2004)
12. P. Chien, J.S. Weissman, A.H. DePace, T.M. Annu. Rev. Biochem. 73, 617 (2004)
13. T. Ban, D. Hamada, K. Hasegawa, H. Naiki, Y. Goto, J. Biol. Chem. 278, 16462 (2003)
14. T. Ban, M. Hoshino, S. Takahashi, D. Hamada, K. Hasegawa, H. Naiki, Y. Goto, J. Mol. Biol. 344, 757 (2004)
15. T. Ban et al., J. Biol. Chem. 281, 33677 (2006)
16. T. Ban, K. Yamaguchi, Y. Goto, Acc. Chem. Res. 39, 663 (2006)
17. T. Ban, Y. Goto, Methods Enzymol. 413, 91 (2006)
18. H. Yagi, T. Ban, K. Morigaki, H. Naiki, Y. Goto, Biochemistry 46, 15009 (2007)
19. T. Funatsu, Y. Harada, M. Tokunaga, K. Saito, T. Yanagida, Nature 374, 555 (1995)
20. R. Yamasaki et al., J. Mol. Biol. 292, 965 (1999)
21. T. Wazawa, M. Ueda, Adv. Biochem. Eng. Biotechnol. 95, 77 (2005)
22. H. Naiki, K. Higuchi, M. Hosokawa, T. Takeda, Anal. Biochem. 177, 244 (1989)
23. W. Hoyer, D. Cherny, V. Subramaniam, D.M. Jovin, J. Mol. Biol. 340, 127 (2004)
24. T. Kowalewski, H.K. Holtzman, Proc. Natl. Acad. Sci. U.S.A. 96, 3688 (1999)
25. G.H. Blackley, M.C. Sanders, C.J. Davies, S.J. Roberts, M.J. Tendler, P.O. Wilkinson, J. Mol. Biol. 298, 833 (2000)
26. M. Zhu, S.A. Souillac, C. Ionescu-Zanetti, A.L. Carter, L.W. Fink, J. Biol. Chem. 277, 50914 (2002)
27. Y.G. Jin et al., Proc. Natl. Acad. Sci. U.S.A. 100, 15294 (2003)
28. K. Hsiao et al., Science 274, 99 (1996)
29. L. Manuelidis, W. Fritch, J.D. Xi, Science 277, 94 (1997)
30. P.T. Harper, M.R. Lansbury Jr., Annu. Rev. Biochem. 66, 385 (1997)
31. C.E. Krebs, A.F. Macphee, I.E. Miller, C.M. Dunlop, A.M. Dobson, S.S. Donald, Proc. Natl. Acad. Sci. U.S.A. 101, 14420 (2004)
32. M.R. Rogers, E.H. Krebs, A.M. Bromley, E. van der Linden, L.M. Donald, Biophys. J. 90, 1043 (2006)
33. R. Raffen et al., Protein Sci. 8, 509 (1999)
34. D.M. Sagis, C. Veerman, E. van der Linden, Langmuir 20, 924 (2004)
35. Y. Fezoui, D.M. Hartley, D.J. Walsh, J.J. Selkoe, D.B. Osterhout, D.P. Teplow, Nat. Struct. Biol. 7, 1095 (2000)
36. V.J. Hong, M. Gozu, K. Hasegawa, H. Naiki, Y. Goto, J. Biol. Chem. 277, 21554 (2002)
37. T.I. McParland et al., Biochemistry 39, 8735 (2000)

Index

N-formyl kynurenine, 274 θ point, 44 α-synuclein, 252, 268, 273, 292 αB-crystallin, 268 α-lactalbumin, 13, 14 α,α-1,1 linkage, 225, 231 α,α-1,1-glycosidic linkage, 219, 224 β-lactoglobulin, 295 β-sheets, 290 β2-microglobulin, 268, 273, 289 γ-crystallin, 272 Φ-value analysis, 13, 24 ¹⁷O-NMR spectroscopy, 222 ³¹P NMR, 235 p–T, 174 Aβ-peptide, 252, 257–259, 272 CH/π hydrogen bond, 145 Ca²⁺-binding protein, 14 (oligomeric) species, 260 l-cysteine, 274 “Maltese-cross” extinction pattern, 295 “condensation-ordering” mechanism of aggregation, 249 3D distribution function, 196, 198, 200, 206 3D-RISM, 190, 192, 196, 200, 202, 205, 207, 208 Aβ(1–40) amyloid fibrils, 292 Aβ-amyloid fibrils, 290 Aβ, 255, 291 AA amyloidosis, 246 ab initio shape prediction, 137

accelerated molecular dynamics, 212, 213 acetylcholinesterase, 213, 214 actin, 217 actin ﬁlaments, 251 active site, 213, 215, 216 acylphosphatase, 243, 250 AFM, 290 ageing, 262 aggregation, 241 AL amyloidosis, 246 alanine dipeptide, 80 Alzheimer’s, 245 Alzheimer’s disease, 246, 253, 261, 268, 295 Alzheimer’s disease and Type II diabetes, 241 amylin, 252 amyloid, 245, 268 amyloid β, 289 amyloid diseases, 256 amyloid ﬁbril, 249–251, 289 amyloid intermediate, 269 amyloid supramolecular assemblies, 290 amyloid supramolecular ﬁbrillar assemblies, 297 amyloidogenesis, 267 amyloidogenic, 267 amyotrophic lateral sclerosis, 246 analytical generalized born plus non-polar (AGBNP), 99 anhydrobiosis, 219, 229 ankyrin-repeat, 130

antibodies, 261 antioxidant function, 235 ApoAI amyloidosis, 246 apomyoglobin, 1 AppA, 149, 157 Arctic mutation, 259 association, 149 atomic force microscopy (AFM), 248, 290 ATP hydrolysis, 217 ATP synthases, 216 B1 domain of streptococcal protein G, 88 bacteriorhodopsin, 86 binding, 215, 216 binding free energy, 201 biological evolution, 262 biological self-assembly, 241 biological switch, 127 biosensor, 150 BLUF, 157 Boltzmann factor, 63 Brownian dynamics simulations, 215 C-peptide of ribonuclease A, 78 C. elegans, 260 calcium binding protein, 204 calcium binding site, 203 cancer, 244 capillary method, 150 catalase, 274 cellular, 214 channel, 213, 214 chaperone, 253 chaperonin, 214 chemical chaperone, 220 chromatin structure, 57 circular dichroism, 124 closure relations, 193 cluster analysis, 29 coarse-grained models, 215 coil-globule transition, 43, 44 computational, 211 computer simulation, 243 computer simulation methods, 248 conformation, 217 conformation change, 149 conformational changes, 138, 216

conformational ensemble, 124 conformational fluctuations, 212, 214 conformational substates, 212, 213 Congo red, 247 coordination numbers, 201 coupled folding and binding, 124 Creutzfeldt–Jakob disease, 253, 295 cross-β, 247 cross-β structure, 249 cross-β sheets, 289 crowded, 215 crowded molecular environment of the cell, 242 cryo-electron microscopy, 248 cryptic binding site, 214 cystic fibrosis, 244 cytochrome c, 154 cytochrome P450, 110 degradation, 260 dehydration penalty, 188, 202, 236 densitometric studies, 178 density of states, 64 density pair distribution function, 190 desiccation tolerance, 230 diagrams, 174 dialysis-related amyloidosis, 260, 268 differential scanning calorimetry (DSC), 226 differentiation, 242 diffusion, 212, 214 diffusion coefficient, 149, 150, 154 diffusion detected biosensor, 168, 170 diffusion peak, 166 diffusion-controlled, 213 dimer, 161, 162 dimerization, 161, 163 direct correlation function, 191 dissociation, 149, 163 disulphide bond formation, 244 DNA condensation, 40 docking simulation, 188 donor–acceptor distance, 144 dose-response curve, 201 Drosophila melanogaster, 257 drug design, 188 drug discovery, 211, 214 drugs, 215

DSC, 226, 231 dynamics, 216 electron microscopy, 289 electron transfer, 217 endoplasmic reticulum (ER), 245 energy landscape, 212, 213, 216, 243 energy surface, 243 enthalpy change, 152 enthalpy relaxation, 228 enzymatic reaction, 187, 208 enzyme, 212, 213 enzyme dihydrofolate reductase, 216 evolution, 211, 213 evolutionary selection, 241 familial amyloidotic polyneuropathy, 246 familial diseases, 253 fibril extension, 272 fibrillogenesis, 271 fibrils, 291 fibronectin, 214 final slope, 142 Finnish hereditary amyloidosis, 246 fluctuation analysis, 138 FMN, 163 folding, 13, 138 folding intermediate, 15 folding of proteins, 241 folding pathway, 1 Fourier transform infrared (FTIR) spectroscopy, 225, 290 fractal dimension, 142 fringe length, 151 fruit fly, 242, 257 FTIR, 225 FTIR imaging spectroscopy, 231 FTIR spectra, 233 functional unfolded proteins, 122 funnel, 243 G proteins, 216 G-peptide, 104 gain of function diseases, 254 gate, 214 gate dynamics, 214 gated, 215 gated binding, 215

gating, 213, 214 GDP, 216 gel-to-liquid crystalline temperature, 234 gene expression, 256 gene therapy, 261 generalized Born, 98 generalized ensemble, 61 generalized Langevin equation, 208 generalized-ensemble algorithm, 61, 63 generic feature of, 248 glass transition temperatures, 228 glassy state, 232 gluco-disaccharides, 228 good solvent, 44 GTP, 216 hemodialysis-related amyloidosis, 246 hen egg-white lysozyme, 196 hereditary, 247 hereditary cerebral haemorrhage with amyloidosis, 247 high-angle solution X-ray scattering, 137, 139 high-pressure structure, 205 hinge-bending motions, 21 histones, 56 HIV integrase, 214 HIV protease enzyme, 215 HIV-1 integrase inhibitor, 215 HNC closure, 193 housekeeping mechanisms, 255 human lysozyme, 202 Huntington’s disease, 246 hydration, 30, 221, 223 hydration number, 222 hydration structure, 197 hydrodynamic radius, 154 hydrodynamic volume, 221 hydrogen abstraction reaction, 236 hydrogen bonds, 213, 290 hydrogen exchange, 23 hydrogen exchange pulse labeling, 2 hydrogen/deuterium exchange, 289 hydrophobic collapse, 1 hydrophobic interaction, 199 hydrophobicity, 293 hydroxyl, 274

immune response, 244 immunoglobulin, 295 immunoglobulin-like domains, 130 inelastic light scattering, 223 influenza, 217 inhibitor of NF-κB (IκB), 129 injection-localised amyloidosis, 247 interprotein interaction, 168 intrachain segregation, 41, 42 intramolecular correlation function, 192 intrinsically disordered, 123 ion channel, 208 Isentress, 214 isobaric-isothermal ensemble, 67 isomerization, 138 KH closure, 193 kinetic measurement, 139 kinetic model of the G-Peptide, 108 kinetics, 214, 216 Kirkwood–Buff equation, 194 Kuhn segment, 43 Kuru, 253 kynurenine, 274 lactate dehydrogenase, 216 landscape, 243 late embryogenesis abundant (LEA) proteins, 234 LBHB, 144 Le Chatelier’s law, 205, 206 ligand binding, 214 ligand binding sites, 199 ligand–receptor binding, 213 liquid crystalline state, 234 locomotor defects, 257 loss of function diseases, 244 LOV, 163 low-barrier hydrogen bond, 144 lysozyme, 198, 200, 252, 260 lysozyme amyloidosis, 246 maltose, 221 mannitol, 274 maximum dimension, 137 MC, 61 MD, 61 MD simulation, 224 MD unfolding simulations, 18, 26

mean activity coeﬃcient, 202 Medullary carcinoma of the thyroid, 246 membranes, 217 methionine, 13 Metropolis algorithm, 64 model organisms, 262 molecular chaperones, 244, 254, 255 molecular clocks, 216 molecular dynamics, 13, 61, 207, 212, 215 molecular dynamics simulation, 138, 217 molecular evolution, 253 molecular Ornstein–Zernike (MOZ), 191 molecular recognition, 187, 207 molten globule, 2, 15, 23, 139 Monte Carlo, 61, 207 motions, 211 MREM, 63 MUCA, 62 multibaric-multithermal, 80 multibaric-multithermal algorithm, 68 multicanonical algorithm, 61–63 multicanonical ensemble, 67 multidimensional replica-exchange method, 63 multiple binding partners, 123 multiple hydrogen bonds, 236 myoglobin, 153, 212–214, 248 N-terminal, 13 N-terminal methionine, 16 NADH, 216 NADPH, 216 nanotechnology, 248 native and amyloid structures, 250 natural selection, 256 negative phototaxis, 138 nerve, 213 net charge, 292 network model of protein folding, 103 neurodegenerative diseases, 245, 246 neuromuscular junctions, 213 neuronal dysfunction, 258, 259 neurotransmitter acetylcholine, 213 neutron crystallography, 145 new view of protein folding, 244 nicotinamide, 216

NMR, 211 NMR relaxation, 1 NMR spectroscopy, 1, 124 noble gas, 199 non-neuropathic localised amyloidoses, 246 non-neuropathic systemic amyloidoses, 246 nonequilibrium, 214 nonpolar hydration, 101 Nosé-Andersen, 68 nuclear factor-kappaB (NF-κB), 129 nuclear localization signal, 130 nuclear magnetic resonance, 212 nucleation, 272, 295 nucleosome, 56 nucleus, 289 off-pathway, 270 old age, 256, 261 oligomeric or pre-fibrillar aggregates, 245 oligomeric species, 295 oligomerization, 149 on the edge, 258 OPLS all-atom force field, 100 Ornstein–Zernike equation, 191 P. vanderplanki, 230 pair correlation function, 191 pancreatic trypsin inhibitor, 212 Parkinson’s disease, 245, 246, 268 partial molar compressibility, 222 partial molar enthalpy, 83 partial molar heat capacity, 222 partial molar volume, 84, 194, 204, 205, 221 partial molar volume of proteins, 195 PAS domain, 138 pearl-necklace globule, 52 Percus trick, 193 peroxyl radicals, 274 persistent length, 39 phase images, 282 phospholipids, 233 photoactive yellow protein, 137, 138, 149, 155 photodissociation, 167 photoexcitation, 217

photointermediate, 137, 139 photooxidation, 273 photoreceptor, 138 photosensor, 148 photosignal transduction, 138 photosynthetic reaction center, 217 phototropins, 149, 163 plasticizer, 229 polyalanine, 248 polylysine, 248 polymorphism, 225 Polypedilum vanderplanki, 219, 230 polypeptide, 215 polyQ, 247 polythreonine, 248 poor solvent, 44 population grating, 152 positron annihilation lifetime spectroscopy, 227 pre-ﬁbrillar aggregates, 255 preferential hydration model, 235 prenucleation stage, 268 pressure denaturation of protein, 204 pressure perturbation calorimetry, 173 pressure unfolding studies, 174 prion amyloids, 290 prion disorders, 253 prion protein, 268, 273 product release, 211, 216, 217 proline isomerisation, 244 protein, 216, 217 protein aggregation, 235, 236 protein dynamics, 211, 212 protein folding, 1, 13, 61, 97, 207 protein folding, misfolding and aggregation, 262 protein loop prediction, 103 protein misfolding, 260 protein misfolding diseases, 241 protoﬁlaments, 250, 268, 282, 295 proton transfer, 144 PYP, 138, 155 quality control mechanisms, 244 radial distribution function, 190 radius of gyration, 137 raltegravir, 214 Raman spectroscopy, 223

random coil, 40 rate constants, 215 Rayleigh instability, 52 real-time observation, 291 refractive index, 150 regulation of cell growth, 244 regulation of cellular growth, 242 relaxation dispersion, 8 release, 216 REM, 62 reorganization energy, 217 replica exchange molecular dynamics, 102 replica-exchange method, 61, 62, 71 residual entropy of the ordinary ice, 76 ribosome, 244, 254 rings-on-a-string, 52, 53 RISM, 190, 192, 207, 208 rubber state, 232 salt bridge, 217 secondary structure packing, 137 segmental Q-coordinates, 27 selective ion-binding, 201 selectivity, 214 self-assemble, 241 self-assembly, 267 semiﬂexible polymers, 43 senile plaques, 295 senile systemic amyloidosis, 246 SH3 domain, 250 signal transduction, 216 signaling networks, 123 simulating replica exchange, 112 simulation, 211, 214, 215 single-molecule, 212 single-residue mutations, 258 singlet oxygen, 274 small heat shock proteins, 267 solid-state NMR, 248 solubility, 241, 252, 260 solution X-ray scattering, 137 solvation free energy, 194 species grating, 152 spectral silent processes, 150 spherulites, 295 spider silk, 251 spin label, 4 spongiform encephalopathies, 245, 246

spontaneous ﬁbrillation, 295 sporadic, 247 SPR, 169 staphylococcal nuclease, 176 static measurement, 139 stem-cell techniques, 261 stereospeciﬁcity, 216 Stokes–Einstein equation, 154 stop-and-run mechanism, 292 stopped-ﬂow circular dichroism, 17 strong short hydrogen bond, 144 structural change, 137 subprotoﬁbrils, 268, 282 substrate binding, 211, 216 sucrose, 221 Sup35, 290 superoxide, 274 superoxide dismutase (SOD), 274 supramolecular, 214 surface plasmon, 150 surface plasmon resonance (SPR), 150 synapses, 214 Taylor dispersion, 150 terahertz absorption spectroscopy, 222 therapeutic intervention, 261 thermal expansivity, 177 thermal expansivity and ΔV , 179 thermal grating, 152 thioﬂavin T, 289 third-generation synchrotron radiation sources, 138 three-dimensional distribution, 194 three-dimensional reference interaction site model (3D-RISM), 189 time dependence, 211 timescales, 215 TIRFM, 289 total correlation function, 191 total internal reﬂection, 290 total internal reﬂection ﬂuorescence microscopy, 289 toxicity, 255, 258 traﬃcking, 244 traﬃcking of molecules, 242 transcriptional activator CBP, 126 transient grating, 149 transient grating method, 139 transition state, 13, 24, 29

translocation, 244 transmissible spongiform encephalopathies (TSE), 268 trehalose, 219, 221 trehalose transporter, 237 triosephosphate isomerase, 213 type II diabetes, 245, 246 ubiquitin, 205 unfolding, 13, 138 unfolding pathway, 29 unsaturated fatty acid, 235 UV Light, 272 viscosity, 221 vitrification hypothesis, 229 volume change, 175 volume grating, 152

water, 212, 213, 241 water channel, 227 water entrapment hypothesis, 229 water replacement hypothesis, 229 water stresses, 229 water structure breaker, 223, 225 water structure maker, 225, 234 water with biomolecules, 261 water-binding sites, 197 WHAM, 61 worm-like ﬁbrils, 295, 297

X-ray diﬀractometry, 226 X-ray ﬁber diﬀraction, 248, 289 xenon, 200 xenon sites, 198
